2.3 Three Pillars of Knowledge

A successful data scientist draws on three pillars of essential knowledge.

  1. Analytics knowledge and toolsets

A successful data scientist needs a strong technical background in data mining, statistics, and machine learning. An in-depth understanding of modeling, combined with insight into the data, enables a data scientist to convert a business problem into a data science problem. Many chapters of this book focus on analytics knowledge and toolsets.

  2. Domain knowledge and collaboration

A successful data scientist needs in-depth domain knowledge to understand the business problem well. In any data science project, the data scientist needs to collaborate with other team members. Communication and leadership skills are critical throughout the project cycle, especially when there is only one data scientist on the project. The scientist also needs to estimate the project's timeline and impact under uncertainty.

  3. (Big) data management and (new) IT skills

The last pillar concerns the computation environment and model implementation on a big data platform. It used to be the most difficult one for a data scientist with a statistics background (i.e., one who lacks computer science knowledge or programming skills). The good news is that with the rise of big data platforms in the cloud, it is easier for a statistician to overcome this barrier. The “Big Data Cloud Platform” chapter of this book describes this pillar in detail.

FIGURE 2.2: Three pillars of knowledge