Understanding Logistic Regression
Published:
Summary: This note explains why logistic regression is suited for classification. Unlike linear regression, probabilities must remain between 0 and 1. The logistic function enforces this constraint, mapping linear predictors into valid probabilities. Its log-odds formulation ensures interpretability, making logistic regression a well-defined, probabilistic approach for binary classification tasks.
Question
Why do we use logistic regression to solve classification problems? What ensures its validity and makes it well-defined?
Explanation
For $i=1,\cdots,N$, we have pairs $(x_1,y_1),\cdots,(x_N,y_N)$, where $y_i$ is the target variable and $x_i$ is the predictor.
For continuous $y_i$, when we want to explore the relationship between $x_i$ and $y_i$, $i=1,\cdots,N$, we apply linear regression, \(y_i = \beta_0 + \beta_1x_i+\epsilon_i,\quad i=1,\cdots,N.\) However, when $y_i$ takes only the values 0 and 1, the linear trend between $x_i$ and $y_i$ no longer holds: in the discrete case, the target does not vary smoothly with the predictor but jumps abruptly between the two classes. There are other methods for this setting, such as the Support Vector Machine (SVM), but if we still want to build on linear regression, we should change how we view the relationship between $x_i$ and $y_i$. In classification problems, we typically assign a probability to each pattern of predictors, with patterns that are closer in Euclidean distance tending to receive similar probabilities. Since the change in probability should be continuous, we can focus on the following model (which need not assume a Normal distribution):
\[\mathbb{P}(Y_i=1|X_i)=\beta_0+\beta_1x_i,\quad i=1,\cdots,N.\]However, the predictions of this linear model may go above 1 or below 0, so we must constrain the function; hence, we have
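A quick numerical sketch of this failure mode, using hypothetical coefficient values $\beta_0 = 0.2$, $\beta_1 = 0.3$ (not taken from the text, chosen only for illustration):

```python
# Linear probability model P(Y=1|x) = b0 + b1*x with assumed
# illustrative coefficients; the "probabilities" it produces
# escape [0, 1] for moderately large or small x.
b0, b1 = 0.2, 0.3  # hypothetical values for illustration

for x in [-2.0, 0.0, 1.0, 4.0]:
    p = b0 + b1 * x
    print(f"x = {x:5.1f}  ->  'probability' = {p:.2f}")
# At x = 4.0 the linear predictor is 1.40 (> 1), and at x = -2.0
# it is -0.40 (< 0), so neither is a valid probability.
```

This is exactly the constraint problem the logistic transformation below is designed to fix.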
\[\mathbb{P}(Y_i=1|X_i)=f(\beta_0+\beta_1x_i)=\frac{\exp(\beta_0+\beta_1x_i)}{1+\exp(\beta_0+\beta_1x_i)},i=1,\cdots,N\]Note that,
\[\log\frac{\mathbb{P}(Y_i=1|X_i)}{1-\mathbb{P}(Y_i=1|X_i)}=\beta_0+\beta_1x_i,\quad i=1,\cdots,N,\] so the log-odds is exactly the linear predictor, which is what makes the coefficients of logistic regression directly interpretable.
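The log-odds identity can be checked numerically: starting from a probability produced by the logistic function, taking $\log\frac{p}{1-p}$ recovers the linear predictor exactly (coefficients below are again hypothetical, for illustration only).

```python
import math

def logistic(z):
    """Logistic function: maps any real z into (0, 1)."""
    return math.exp(z) / (1.0 + math.exp(z))

b0, b1 = 0.2, 0.3  # hypothetical coefficients for illustration
for x in [-3.0, 0.0, 2.5]:
    p = logistic(b0 + b1 * x)
    log_odds = math.log(p / (1.0 - p))
    # The log-odds recovers the linear predictor b0 + b1*x.
    assert abs(log_odds - (b0 + b1 * x)) < 1e-9
    print(f"x = {x:4.1f}  log-odds = {log_odds:.4f}")
```

This round trip is why the model is well-defined: the logistic function is a bijection between the real line (linear predictors) and the interval $(0,1)$ (probabilities).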