General context without going into business details
Focus on the technical parts
Follow the data science project cycle
Response: whether a multi-year corn customer will purchase again next year
Predictors: other customer experience and behavior data
\(\boldsymbol{\beta}=(\beta_{0},\dots,\beta_{p})^{T}\) parameter vector
The log-likelihood function is as follows:
\[ln\mathcal{L}(\boldsymbol{\beta}|\mathbf{y})=\sum_{i=1}^{n}\left\{ y_{i}ln\frac{1}{1+exp(-\mathbf{x_{i}}^{T}\mathbf{\boldsymbol{\beta}})}+(1-y_{i})ln\left[1-\frac{1}{1+exp(-\mathbf{x_{i}}^{T}\boldsymbol{\beta})}\right]\right\} \]
\[D(\boldsymbol{\beta})\equiv\frac{\partial ln\mathcal{L}(\boldsymbol{\beta}|\mathbf{y})}{\partial\boldsymbol{\beta}}=\sum_{i=1}^{n}\left\{ y_{i}-\frac{1}{1+exp(-\mathbf{x_{i}}^{T}\boldsymbol{\beta})}\right\} \mathbf{x_{i}}\]
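The log-likelihood and its gradient can be checked numerically. The sketch below is an illustration only, not the original analysis code: it assumes each row of `X` starts with a 1 for the intercept, uses the fitted probability \(1/(1+exp(-\mathbf{x_{i}}^{T}\boldsymbol{\beta}))\) in the gradient, and compares the result against a finite-difference approximation. The function names and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def softplus(z):
    """ln(1 + exp(z)), written to avoid overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_likelihood(beta, X, y):
    """Sum of y_i*ln(p_i) + (1 - y_i)*ln(1 - p_i) with p_i = sigmoid(x_i^T beta)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, x_i))
        # y*ln(p) + (1-y)*ln(1-p) simplifies to y*eta - ln(1 + exp(eta)).
        total += y_i * eta - softplus(eta)
    return total


def gradient(beta, X, y):
    """D(beta) = sum_i (y_i - p_i) * x_i."""
    grad = [0.0] * len(beta)
    for x_i, y_i in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, x_i))
        resid = y_i - sigmoid(eta)
        for j, v in enumerate(x_i):
            grad[j] += resid * v
    return grad


def numeric_gradient(beta, X, y, eps=1e-6):
    """Central finite differences, used only to sanity-check gradient()."""
    out = []
    for j in range(len(beta)):
        up = list(beta)
        down = list(beta)
        up[j] += eps
        down[j] -= eps
        out.append((log_likelihood(up, X, y) - log_likelihood(down, X, y)) / (2 * eps))
    return out


# Hypothetical toy data: first column is the intercept term.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.3]]
y = [1, 0, 1, 0]
beta = [0.1, -0.4]

analytic = gradient(beta, X, y)
numeric = numeric_gradient(beta, X, y)
print(analytic)
print(numeric)
```

The two printed vectors should agree to several decimal places; a gradient that omits the `1 +` in the denominator fails this check.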
\(\mathbf{x_{i,g}}\) vector of dummy variables for the \(i^{th}\) observation in group \(g\), with \(i=1,\dots,n\) and \(g=1,\dots,G\)
\(y_{i}\) binary response for the \(i^{th}\) observation
\(df_{g}\) degrees of freedom of group \(g\)
\[\mathcal{S}_{\lambda}(\beta)=-l(\beta)+\lambda\sum_{g=1}^{G}s(df_{g})\parallel\beta_{g}\parallel_{2}\]
where \(l(\beta)\) is the log-likelihood:
\[\sum_{i=1}^{n}\{y_{i}\eta_{\beta}(\mathbf{x_{i}})-log[1+exp(\eta_{\beta}(\mathbf{x_{i}}))]\}\]
\(\lambda\) is the tuning parameter for the penalty, and \(s(\cdot)\) is \(s(df_{g})=df_{g}^{0.5}\)
where
\[\lambda_{max}=\max_{g\in\{1,\dots,G\}}\left\{ \frac{1}{s(df_{g})}\parallel\mathbf{x}_{g}^{T}(\mathbf{y}-\bar{\mathbf{y}})\parallel_{2}\right\} \]
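The penalized objective \(\mathcal{S}_{\lambda}(\beta)\) and \(\lambda_{max}\) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code behind these slides: it assumes the linear predictor is \(\eta_{\beta}(\mathbf{x_{i}})=\beta_{0}+\sum_{g}\mathbf{x_{i,g}}^{T}\beta_{g}\) with an unpenalized intercept, takes \(df_{g}\) to be the number of dummy columns in group \(g\), and stores the data as nested lists where `X_groups[i][g]` holds the dummies of observation \(i\) in group \(g\). All names and the toy data are hypothetical.

```python
import math


def linear_predictor(beta0, beta_groups, x_groups_i):
    """eta = beta0 + sum over groups of x_{i,g}^T beta_g (assumed form)."""
    return beta0 + sum(
        sum(b * v for b, v in zip(beta_g, x_g))
        for beta_g, x_g in zip(beta_groups, x_groups_i)
    )


def log_likelihood(beta0, beta_groups, X_groups, y):
    """l(beta) = sum_i { y_i * eta_i - log[1 + exp(eta_i)] }."""
    total = 0.0
    for x_groups_i, y_i in zip(X_groups, y):
        eta = linear_predictor(beta0, beta_groups, x_groups_i)
        # Overflow-safe evaluation of log(1 + exp(eta)).
        total += y_i * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total


def group_penalty(beta_groups):
    """sum_g s(df_g) * ||beta_g||_2 with s(df_g) = df_g ** 0.5."""
    return sum(
        math.sqrt(len(beta_g)) * math.sqrt(sum(b * b for b in beta_g))
        for beta_g in beta_groups
    )


def objective(lam, beta0, beta_groups, X_groups, y):
    """S_lambda(beta) = -l(beta) + lambda * group penalty."""
    return -log_likelihood(beta0, beta_groups, X_groups, y) + lam * group_penalty(beta_groups)


def lambda_max(X_groups, y):
    """max over groups of ||x_g^T (y - y_bar)||_2 / s(df_g)."""
    n = len(y)
    y_bar = sum(y) / n
    resid = [y_i - y_bar for y_i in y]
    best = 0.0
    for g in range(len(X_groups[0])):
        df_g = len(X_groups[0][g])
        # x_g^T (y - y_bar): one inner product per dummy column of group g.
        proj = [sum(X_groups[i][g][j] * resid[i] for i in range(n)) for j in range(df_g)]
        norm = math.sqrt(sum(p * p for p in proj))
        best = max(best, norm / math.sqrt(df_g))
    return best


# Hypothetical toy data: 4 observations, group 0 has 2 dummies, group 1 has 1.
X_groups = [
    [[1, 0], [1]],
    [[0, 1], [0]],
    [[0, 0], [1]],
    [[1, 0], [0]],
]
y = [1, 0, 1, 0]

lam_max = lambda_max(X_groups, y)
print(lam_max)
print(objective(0.5 * lam_max, 0.0, [[0.2, -0.1], [0.3]], X_groups, y))
```

Because each group norm is scaled by \(s(df_{g})\), a group with many dummy levels is not favored simply for having more coefficients.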
Essentially, all models are wrong, but some are useful.