
A statistics graduate student asked me some questions about how to prepare to be a “Big Data” era statistician. Since this is not the first time I have been asked questions of this kind, I decided to put them all together in the hope that they will help others who are interested in analytical work. I started working in industry right after my PhD, so the following comes from an industry point of view, not an academic one. All of the questions are great and have no right or wrong answers; I am just giving my opinions based on personal experience. Many professionals in statistics have also responded to such questions.

Q1 How could I prepare myself to be a “Big Data” era statistician?

I think the term “Big Data” has been overused, which has created plenty of bubbles. Everyone is talking about big data, but no one can explain exactly what it is. It is fun to occasionally look at Google search trends for certain words, but that does not help solve real-life problems. A lot of data, on its own, is worthless. It isn’t the size of the data that’s important; it’s what you do with it. The big data skills that so many are touting today are not skills for better solving the real problem of inference from data. As David Donoho pointed out in his article “50 Years of Data Science”:

……they are coping skills for dealing with organizational artifacts of large-scale cluster computing……the range of easily constructible algorithms shrinks dramatically compared to the single-processor model, so one inevitably tends to adopt inferential approaches which would have been considered rudimentary or even inappropriate in old times. Such coping……deforms our judgements about what is appropriate, and holds us back from data analysis strategies that we would otherwise eagerly pursue. Nevertheless, the scaling cheerleaders are yelling at the top of their lungs that using more data deserves a big shout.

A science doesn’t just spring into existence simply because a deluge of data will soon be filling telecom servers.

I couldn’t agree with David more. So I will instead talk about how to prepare to be a statistician and set “Big Data” aside.

“Statistician” is a very general word, so we need to define further which sub-field of statistics a statistician works in. You can refer to the page “Which Industries Employ Statisticians?” on the American Statistical Association’s website. If you click on the industry you are interested in, you will see a page describing how statistics fits into that industry. Reading those pages should give you an idea of which statistical skills are required in your target area.

Q2 What’s the direction of the development of statistics?

This question is too big, and I am not knowledgeable enough to answer it. Since I have been working in marketing, I will talk about what I think is promising in marketing statistics.

Q3 How can statisticians stay competitive in the job market (both industry and academia) compared with computer scientists, and show how we differ from mathematicians? At the center of these questions is the current identity crisis of statisticians in the data science industry. This prompts another question, though: what does “data science” really refer to?

You are right that there is a lot of confusion around the titles Data Scientist, Statistician, Business/Financial/Risk (etc.) Analyst, BI professional…… That is because of the obvious overlap among them. It took me two years to get through the identity crisis myself. Now I see data science as a discipline for making sense of data. To make sense of data, statistics is indispensable. Meanwhile, a data scientist needs many other skills. The article “What is a data scientist?” summarizes the differences among these roles. It provides very nice skill lists for the different roles and comparisons among them. Some of my comments:

Also, no matter what you do in the future, working for 10 years does not equal 10 years of working experience. Many people just work for one year and then repeat that first year for many years after. That is certainly one thing I try my best to avoid. The most important and far-reaching thing you can get from a PhD program is learning how to learn. It is great and necessary to prepare ahead of time, but uncertainty is the most certain thing in life. We can never be fully prepared, but we can always be preparing and learning. Being a lifelong learner is the best way to prepare to be a future statistician (and many other things). Good luck!

[I will add more questions later]

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.