General context without going into business details
Focus on the technical parts
Follow the data science project cycle
Response: if a corn multi-year customer will purchase again next year
Predictors: other customer experience and behavior data
β=(β0,…,βp)T parameter vector
The logliklihood function is as follows:
lnL(β|y)=n∑i=1{yiln11+exp(−xiTβ)+(1−yi)ln[1−11+exp(−xiTβ)]}
D(β)≡∂lnL(β|y)∂β=n∑i=1{yi−1exp(−xiTβ)}xi
xi,g vector of dummy variables ( ith observation in group g ) i=1,...,n,g=1,...,G
yi binary response for the ith observation
dfg degrees of freedom of group g
Sλ(β)=−l(β)+λG∑g=1s(dfg)∥βg∥2
where l(β) is log-likelihood:
Σni=1{yiηβ(xi)−log[1+exp(ηβ(xi))]}
λ tuning parameter for penalty and s(⋅) is s(dfg)=df0.5g
where
λmax=maxg∈1,…,G1s(dfg)∥xTg(y−ˉy)∥2
Essentially, all models are wrong, but some are useful.