10.3 Elastic Net

Elastic net generalizes lasso and ridge regression by combining the two penalties (Zou and Hastie 2005). The coefficient estimates minimize the following function:

$$\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2} + \lambda_{1} \sum_{j=1}^{p} \beta_{j}^{2} + \lambda_{2} \sum_{j=1}^{p} |\beta_{j}| \tag{10.3}$$
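To make the three terms of equation (10.3) concrete, the objective can be written out directly in base R. This is only a sketch: enet_objective() is a hypothetical helper (it is not part of caret or elasticnet), it assumes centered data with no intercept, and the toy data are made up for illustration.

```r
# Elastic net objective from equation (10.3), written out in base R.
# enet_objective() is a hypothetical helper, not a caret/elasticnet function.
enet_objective <- function(beta, x, y, lambda1, lambda2) {
  y_hat <- as.vector(x %*% beta)     # fitted values (assumes no intercept)
  rss   <- sum((y - y_hat)^2)        # residual sum of squares
  ridge <- lambda1 * sum(beta^2)     # quadratic (ridge) penalty
  lasso <- lambda2 * sum(abs(beta))  # absolute-value (lasso) penalty
  rss + ridge + lasso
}

# Toy data, made up for illustration; beta reproduces y exactly, so RSS is 0
x    <- matrix(c(1, 0,
                 0, 1,
                 1, 1), nrow = 3, byrow = TRUE)
y    <- c(1, 2, 3)
beta <- c(1, 2)

enet_objective(beta, x, y, lambda1 = 0,   lambda2 = 0)    # 0   (RSS only)
enet_objective(beta, x, y, lambda1 = 0.5, lambda2 = 0)    # 2.5 (ridge only)
enet_objective(beta, x, y, lambda1 = 0,   lambda2 = 0.5)  # 1.5 (lasso only)
```

Setting lambda1 = 0 leaves the lasso objective and setting lambda2 = 0 leaves the ridge objective, which is the sense in which the elastic net generalizes both.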

The ridge penalty shrinks the coefficients of correlated predictors towards each other, while the lasso tends to pick one of them and discard the others. As a result, lasso estimates tend to have higher variance. Ridge regression, however, does not perform variable selection. The elastic net keeps the feature selection property of the lasso penalty along with the stabilizing shrinkage of the ridge penalty, so it deals with highly correlated variables more effectively.

We can still use the train() function to tune the elastic net parameters. As before, set up the cross-validation and the parameter grid, and standardize the predictors:

enetGrid <- expand.grid(.lambda = seq(0, 0.2, length = 20),
                        .fraction = seq(.8, 1, length = 20))
set.seed(100)
enetTune <- train(trainx, trainy,
                  method = "enet",
                  tuneGrid = enetGrid,
                  trControl = ctrl,
                  preProc = c("center", "scale"))
enetTune

Elasticnet 

999 samples
 10 predictor

Pre-processing: centered (10), scaled (10) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 899, 899, 899, 899, 899, 900, ... 
Resampling results across tuning parameters:

  lambda   fraction  RMSE  Rsquared  MAE  
  0.00000  0.8000    1763  0.7921    787.5
  0.00000  0.8105    1760  0.7924    784.1
  .
  .
  .
  0.09474  0.9158    1760  0.7945    782.5
  0.09474  0.9263    1761  0.7947    782.5
  0.09474  0.9368    1761  0.7949    782.7
  0.09474  0.9474    1763  0.7950    783.3
  0.09474  0.9579    1764  0.7951    784.3
  0.09474  0.9684    1766  0.7953    785.7
  0.09474  0.9789    1768  0.7954    787.1
  0.09474  0.9895    1770  0.7954    788.8
  0.09474  1.0000    1772  0.7955    790.4
 [ reached getOption("max.print") -- omitted 200 rows ]

RMSE was used to select the optimal model using the smallest value.
The final values used for the model were fraction = 0.9579 and lambda = 0.

The results show that the best tuning parameter values are fraction = 0.9579 and lambda = 0. Because the ridge penalty parameter lambda is 0, the final model is a pure lasso fit. The RMSE and $R^{2}$ are 1742.2843 and 0.7954, respectively.
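The selected values can be traced back to the tuning grid. The sketch below rebuilds the same grid in base R (no model is fitted) to show how many candidate combinations train() is given and where the reported fraction sits in its sequence. The variable names lambdas, fractions, and idx are introduced here for illustration only.

```r
# Rebuild the tuning grid from the call above; no model fitting happens here.
# lambdas, fractions, and idx are illustrative names, not caret objects.
lambdas   <- seq(0, 0.2, length.out = 20)
fractions <- seq(0.8, 1, length.out = 20)
enetGrid  <- expand.grid(.lambda = lambdas, .fraction = fractions)

nrow(enetGrid)                  # 400 candidate (lambda, fraction) pairs

# Position of the reported fraction (0.9579) within the fraction sequence
idx <- which.min(abs(fractions - 0.9579))
idx                             # 16
round(fractions[idx], 4)        # 0.9579

# lambda = 0 is the first grid value, so a lasso-only fit is one of the candidates
lambdas[1]                      # 0
```

The chosen fraction is an interior grid value (the 16th of 20), whereas lambda = 0 is the lower end of its range and also the smallest value the ridge penalty can take.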

References

Zou, Hui, and Trevor Hastie. 2005. “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society, Series B, 67 (2): 301–20.