5.3 Centering and Scaling

Centering and scaling is the most straightforward data transformation. It shifts and rescales a variable so that it has mean 0 and standard deviation 1. This ensures that the criterion for finding linear combinations of the predictors is based on how much variation they explain rather than on the units in which they are measured, and it therefore improves numerical stability. Models that find linear combinations of the predictors to explain variation in the response or the predictors, such as PCA (Jolliffe 2002), PLS (Geladi and Kowalski 1986) and EFA (Mulaik 2009), need centered and scaled data. You can quickly write code yourself to conduct this transformation.

Let’s standardize the variable income from sim.dat:
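A minimal sketch of this manual standardization follows. It assumes a small stand-in data frame in place of sim.dat (the values are made up for illustration, including one missing income), and the names mux, sdx, tr1 and tr2 are illustrative choices:

```r
# Hypothetical stand-in for sim.dat; the values are illustrative only
sim.dat <- data.frame(
  age    = c(57, 63, 59, 60, 51, 45),
  income = c(120963, 122008, NA, 113616, 124253, 107661)
)

income <- sim.dat$income
# mean of income, ignoring missing values
mux <- mean(income, na.rm = TRUE)
# standard deviation of income, ignoring missing values
sdx <- sd(income, na.rm = TRUE)
# centering: subtract the mean
tr1 <- income - mux
# scaling: divide by the standard deviation
tr2 <- tr1 / sdx
```

Among the observed values, tr2 has mean 0 and standard deviation 1. The missing entry stays NA.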

Alternatively, the function preProcess() from the caret package can apply this transformation to a set of predictors.
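A sketch of how this might look, assuming the caret package is installed and again using a made-up stand-in for sim.dat. The choice of age and income as the two predictors is an assumption for illustration:

```r
library(caret)

# Hypothetical stand-in for sim.dat; the values are illustrative only
sim.dat <- data.frame(
  age    = c(57, 63, 59, 60, 51, 45),
  income = c(120963, 122008, NA, 113616, 124253, 107661)
)

# keep the predictors to transform (age and income are assumed here)
sdat <- subset(sim.dat, select = c("age", "income"))
# estimate the column means and standard deviations
trans <- preProcess(sdat, method = c("center", "scale"))
# apply the estimated transformation to the data
transformed <- predict(trans, sdat)
```

Estimating the transformation with preProcess() and applying it with predict() are separate steps, so the same trans object can later be applied to new data with the same columns.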

Now the two variables are on the same scale. You can check the result using summary(transformed). Note that the data contain missing values.

References

Geladi, P., and B. Kowalski. 1986. “Partial Least Squares Regression: A Tutorial.” Analytica Chimica Acta, no. 185: 1–17.

Jolliffe, I.T. 2002. Principal Component Analysis. 2nd ed. Springer.

Mulaik, S.A. 2009. Foundations of Factor Analysis. 2nd ed. Chapman Hall/CRC.