Albert, A., and J. A. Anderson. 1984. “On the Existence of the Maximum Likelihood Estimates in Logistic Regression Models.” Biometrika 71 (1): 1–10.
Ben-Dor, A., et al. 2000. “Tissue Classification with Gene Expression Profiles.” Journal of Computational Biology 7 (3): 559–83.
Bauer, Eric, and Ron Kohavi. 1999. “An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants.” Machine Learning 36: 105–42.
Bergstra, James, Norman Casagrande, Dumitru Erhan, Douglas Eck, and Balazs Kegl. 2006. “Aggregate Features and AdaBoost for Music Classification.” Machine Learning 65: 473–84.
Box, G., and D. Cox. 1964. “An Analysis of Transformations.” Journal of the Royal Statistical Society, 211–52.
Breiman, Leo. 1998. “Arcing Classifiers.” The Annals of Statistics 26 (3): 801–49.
———. 2001a. “Random Forests.” Machine Learning 45: 5–32.
———. 2001b. “Statistical Modeling: The Two Cultures.” Statistical Science 16 (3): 199–231.
Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone. 1984. Classification and Regression Trees. ISBN 978-0412048418. CRC.
Cestnik, B., and I. Bratko. 1991. “Estimating Probabilities in Tree Pruning.” EWSL, 138–50.
Chollet, François. 2017. Deep Learning with Python. Manning.
Chollet, François, and J. J. Allaire. 2018. Deep Learning with R. Manning.
Chun, Hyonho, and Sündüz Keleş. 2010. “Sparse Partial Least Squares Regression for Simultaneous Dimension Reduction and Variable Selection.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72 (1): 3–25.
Chung, Junyoung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. “Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling.”
McClish, D. 1989. “Analyzing a Portion of the ROC Curve.” Medical Decision Making 9: 190–95.
Dudoit, S., J. Fridlyand, and T. Speed. 2002. “Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data.” Journal of the American Statistical Association 97 (457): 77–87.
DeLong, E. R., D. M. DeLong, and D. L. Clarke-Pearson. 1988. “Comparing the Areas Under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach.” Biometrics 44: 837–45.
Efron, B. 1983. “Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation.” Journal of the American Statistical Association, 316–31.
Efron, B, and R Tibshirani. 1986. “Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy.” Statistical Science, 54–75.
———. 1997. “Improvements on Cross-Validation: The 632+ Bootstrap Method.” Journal of the American Statistical Association 92 (438): 548–60.
Esposito, F., D. Malerba, and G. Semeraro. 1997. “A Comparative Analysis of Methods for Pruning Decision Trees.” IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (5): 476–91.
Freund, Y., and R. Schapire. 1997. “A Decision-Theoretic Generalization of Online Learning and an Application to Boosting.” Journal of Computer and System Sciences 55: 119–39.
Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2000. “Additive Logistic Regression: A Statistical View of Boosting.” Annals of Statistics 28 (2): 337–407.
James, Gareth, Trevor Hastie, Daniela Witten, and Robert Tibshirani. 2015. An Introduction to Statistical Learning. 6th ed. Springer.
Geladi, P., and B. Kowalski. 1986. “Partial Least Squares Regression: A Tutorial.” Analytica Chimica Acta 185: 1–17.
Gelman, Andrew, and Jennifer Hill. 2006. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press.
Hall, P., R. Hyndman, and Y. Fan. 2004. “Nonparametric Confidence Intervals for Receiver Operating Characteristic Curves.” Biometrika 91: 743–50.
Hand, D., and R. Till. 2001. “A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems.” Machine Learning 45 (2): 171–86.
Hastie, T., R. Tibshirani, and J. Friedman. 2008. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Springer.
Hochreiter, Sepp, and Jürgen Schmidhuber. 1997. “Long Short-Term Memory.” Neural Computation 9 (8): 1735–80.
Hoerl, Arthur, and Robert Kennard. 1970. “Ridge Regression: Biased Estimation for Nonorthogonal Problems.” Technometrics 12 (1): 55–67.
Hssina, Badr, Abdelkarim Merbouha, Hanane Ezzikouri, and Mohammed Erritali. 2014. “A Comparative Study of Decision Tree ID3 and C4.5.” International Journal of Advanced Computer Science and Applications (IJACSA), Special Issue on Advances in Vehicular Ad Hoc Networking and Applications 2014 4 (2).
Hyndman, R. J., and G. Athanasopoulos. 2013. Forecasting: Principles and Practice. Vol. Section 2/5. OTexts: Melbourne, Australia.
Iglewicz, Boris, and David Hoaglin. 1993. “How to Detect and Handle Outliers.” The ASQC Basic References in Quality Control: Statistical Techniques 16.
Cohen, J. 1960. “A Coefficient of Agreement for Nominal Scales.” Educational and Psychological Measurement 20: 37–46.
Jolliffe, I. T. 2002. Principal Component Analysis. 2nd ed. Springer.
Kuhn, Max, and Kjell Johnson. 2013. Applied Predictive Modeling. Springer.
Kwak, Gloria Hyun Jung, and Pan Hui. 2019. “DeepHealth: Deep Learning for Health Informatics Reviews, Challenges, and Opportunities on Medical Imaging, Electronic Health Records, Genomics, Sensing, and Online Communication Health.” 2019.
Breiman, L. 1996. “Bagging Predictors.” Machine Learning 24 (2): 123–40.
Meier, L., S. van de Geer, and P. Bühlmann. 2008. “The Group Lasso for Logistic Regression.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70: 53–71.
Valiant, L. 1984. “A Theory of the Learnable.” Communications of the ACM 27: 1134–42.
Lachiche, N., and P. Flach. 2003. “Improving Accuracy and Cost of Two-Class and Multi-Class Probabilistic Classifiers Using ROC Curves.” In Proceedings of the Twentieth International Conference on Machine Learning, 416–24.
Landis, J. R., and G. G. Koch. 1977. “The Measurement of Observer Agreement for Categorical Data.” Biometrics 33: 159–74.
Li, J., and J. P. Fine. 2008. “ROC Analysis with Multiple Classes and Multiple Tests: Methodology and Its Application in Microarray Studies.” Biostatistics 9 (3): 566–76.
Clemmensen, Line, Daniela Witten, Trevor Hastie, and Bjarne Ersbøll. 2011. “Sparse Discriminant Analysis.” Technometrics 53 (4): 406–13.
Kearns, M., and L. Valiant. 1989. “Cryptographic Limitations on Learning Boolean Formulae and Finite Automata.”
Saar-Tsechansky, M., and F. Provost. 2007. “Handling Missing Values When Applying Classification Models.” Journal of Machine Learning Research 8: 1625–57.
McElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. 2nd ed. Chapman and Hall/CRC.
Mulaik, S. A. 2009. Foundations of Factor Analysis. 2nd ed. Chapman and Hall/CRC.
Patel, Nikita, and Saurabh Upadhyay. 2012. “Study of Various Decision Tree Pruning Methods with Their Empirical Comparison in WEKA.” International Journal of Computer Applications 60 (12).
Pearl, Judea, and Dana Mackenzie. 2019. The Book of Why. Penguin Books.
Provost, F., T. Fawcett, and R. Kohavi. 1998. “The Case Against Accuracy Estimation for Comparing Induction Algorithms.” Proceedings of the Fifteenth International Conference on Machine Learning, 445–53.
Quinlan, J. 1999. “Simplifying Decision Trees.” International Journal of Human-Computer Studies 61 (2).
Tibshirani, R. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society Series B (Methodological) 58 (1): 267–88.
Wasserstein, Ronald L., and Nicole A. Lazar. 2016. “Position on p-Values: Context, Process, and Purpose.”
Serneels, S., E. De Nolf, and P. Van Espen. 2006. “Spatial Sign Preprocessing: A Simple Way to Impart Moderate Robustness to Multivariate Estimators.” Journal of Chemical Information and Modeling 46 (3): 1402–9.
Dietterich, T. 2000. “An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization.” Machine Learning 40: 139–58.
Fawcett, T. 2006. “An Introduction to ROC Analysis.” Pattern Recognition Letters 27 (8): 861–74.
Ho, T. 1998. “The Random Subspace Method for Constructing Decision Forests.” IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (8): 832–44.
Varmuza, K., P. He, and K. Fang. 2003. “Boosting Applied to Classification of Mass Spectral Data.” Journal of Data Science 1: 391–404.
Massy, W. 1965. “Principal Components Regression in Exploratory Statistical Research.” Journal of the American Statistical Association 60: 234–46.
Waal, Ton de, Jeroen Pannekoek, and Sander Scholtus. 2011. Handbook of Statistical Data Editing and Imputation. John Wiley & Sons.
Wedderburn, R. W. M. 1976. “On the Existence and Uniqueness of the Maximum Likelihood Estimates for Certain Generalized Linear Models.” Biometrika 63: 27–32.
Willett, Peter. 2004. “Dissimilarity-Based Algorithms for Selecting Structurally Diverse Sets of Compounds.” Journal of Computational Biology 6 (3–4): 447–57. doi:10.1089/106652799318382.
Wold, Herman. 1973. “Nonlinear Iterative Partial Least Squares (NIPALS) Modelling: Some Current Developments.” Academic Press, 383–407.
Wold, Herman, and K. G. Jöreskog. 1982. Systems Under Indirect Observation: Causality, Structure, Prediction. North Holland, Amsterdam.
Amit, Y., and D. Geman. 1997. “Shape Quantization and Recognition with Randomized Trees.” Neural Computation 9: 1545–88.
Kim, Y., J. Kim, and Y. Kim. 2006. “Blockwise Sparse Regression.” Statistica Sinica 16: 375–90.
Yeo, G. W., and C. B. Burge. 2004. “Maximum Entropy Modeling of Short Sequence Motifs with Applications to RNA Splicing Signals.” Journal of Computational Biology, November, 475–94.
Freund, Y., and R. Schapire. 1999. “Adaptive Game Playing Using Multiplicative Weights.” Games and Economic Behavior 29: 79–103.
Yuan, M., and Y. Lin. 2006. “Model Selection and Estimation in Regression with Grouped Variables.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68: 49–67.
Zhang, Chiyuan, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. 2017. “Understanding Deep Learning Requires Rethinking Generalization.” arXiv:1611.03530.
Zou, Hui, and Trevor Hastie. 2005. “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society, Series B, 67 (2): 301–20.

  1. This is based on “Industry recommendations for academic data science programs.” It is a collection of thoughts from data scientists across industries about what a data scientist does and what differentiates an exceptional data scientist.↩︎

  2. The image is from↩︎