10.3 Elastic Net
Elastic Net is a generalization of lasso and ridge regression (Zou and Hastie 2005) that combines the two penalties. The coefficient estimates minimize the following objective function:
\[\begin{equation} \sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^{2}+\lambda_{1}\sum_{j=1}^{p}\beta_{j}^{2}+\lambda_{2}\sum_{j=1}^{p}|\beta_{j}| \tag{10.3} \end{equation}\]
The ridge penalty shrinks the coefficients of correlated predictors towards each other, while the lasso tends to pick one of them and discard the others; as a result, lasso estimates tend to have higher variance. Ridge regression, on the other hand, has no variable selection property. The advantage of the elastic net is that it keeps the variable selection quality of the lasso penalty as well as the shrinkage effectiveness of the ridge penalty, so it handles highly correlated variables more gracefully.
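For readers who want to see the combined penalty in action outside of caret, here is a minimal sketch using the glmnet package, whose alpha argument mixes the two penalties (alpha = 1 is pure lasso, alpha = 0 is pure ridge); the data below are simulated purely for illustration:

library(glmnet)

set.seed(1)
# simulated data for illustration only: 10 predictors, 2 of them informative
x <- matrix(rnorm(100 * 10), nrow = 100)
y <- x[, 1] + 0.5 * x[, 2] + rnorm(100)

# cross-validate lambda for an even mix of ridge and lasso penalties
cvfit <- cv.glmnet(x, y, alpha = 0.5)
coef(cvfit, s = "lambda.min")   # coefficients at the best lambda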
We can still use the train() function to tune the parameters in the elastic net. As before, we set up the cross-validation, define the parameter grid, and standardize the predictors.
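Here ctrl is the resampling control object defined earlier in the chapter. If it is not already in the workspace, a minimal sketch of a matching 10-fold cross-validation setup is:

library(caret)
# 10-fold cross-validation, matching the resampling summary shown below
ctrl <- trainControl(method = "cv", number = 10)

With the control object in place, define the tuning grid and fit the model: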
enetGrid <- expand.grid(.lambda = seq(0, 0.2, length = 20),
                        .fraction = seq(0.8, 1, length = 20))
set.seed(100)
enetTune <- train(trainx, trainy,
                  method = "enet",
                  tuneGrid = enetGrid,
                  trControl = ctrl,
                  preProc = c("center", "scale"))
enetTune
Elasticnet
999 samples
10 predictor
Pre-processing: centered (10), scaled (10)
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 899, 899, 899, 899, 899, 900, ...
Resampling results across tuning parameters:
lambda fraction RMSE Rsquared MAE
0.00000 0.8000 1763 0.7921 787.5
0.00000 0.8105 1760 0.7924 784.1
...
0.09474 0.9158 1760 0.7945 782.5
0.09474 0.9263 1761 0.7947 782.5
0.09474 0.9368 1761 0.7949 782.7
0.09474 0.9474 1763 0.7950 783.3
0.09474 0.9579 1764 0.7951 784.3
0.09474 0.9684 1766 0.7953 785.7
0.09474 0.9789 1768 0.7954 787.1
0.09474 0.9895 1770 0.7954 788.8
0.09474 1.0000 1772 0.7955 790.4
[ reached getOption("max.print") -- omitted 200 rows ]
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were fraction = 0.9579 and lambda = 0.
The results show that the best values of the tuning parameters are fraction = 0.9579 and lambda = 0. Since the ridge penalty parameter lambda is 0, the final model is effectively a pure lasso fit. The corresponding RMSE and \(R^{2}\) are 1742.2843 and 0.7954, respectively.
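To inspect or reuse the tuned model, the usual caret accessors apply; note that testx below stands in for a hypothetical hold-out set not shown in this section:

plot(enetTune)      # RMSE profile across the tuning grid
enetTune$bestTune   # the selected fraction and lambda
# score new observations (testx is a hypothetical hold-out set)
pred <- predict(enetTune, newdata = testx)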