2.2 Beyond Data and Analytics

Data scientists usually have a good sense of data and analytics, but data science projects are much more than that. A data science project may involve people with different roles, especially in a large company:

  • the business owner or leader who identifies business problem and value;
  • the data owner and computation resource/infrastructure owner from the IT department;
  • a dedicated policy owner to make sure the data and model are under model governance, security and privacy guidelines and laws;
  • a dedicated engineering team to implement, maintain and refresh the model;
  • a program manager to ensure the data science project fits into the overall technical program development and to coordinate all involved parties to set periodical tasks so that the project meets the preset milestones and results;

The entire team usually will have multiple rounds of discussion of resource allocation among groups (i.e., who pay for the data science project) at the beginning of the project and during the project.

Effective communication and in-depth domain knowledge about the business problem are essential requirements for a successful data scientist. A data scientist may interact with people at various levels, from senior leaders who set the corporate strategies to front-line employees who do the daily work. A data scientist needs to have the capability to view the problem from 10,000 feet above the ground and down to the detail to the very bottom. To convert a business question into a data science problem, a data scientist needs to communicate using the language other people can understand and obtain the required information through formal and informal conversations.

In the entire data science project cycle, including defining, planning, developing, and implementing, every step needs to get a data scientist involved to ensure the whole team can correctly determine the business problem and reasonably evaluate the business value and success. Corporates are investing heavily in data science and machine learning, and there is a very high expectation of return for the investment.

However, it is easy to set an unrealistic goal and inflated estimation for a data science project’s business impact. The team’s data scientist should lead and navigate the discussions to ensure data and analytics, not wishful thinking, back the goal. Many data science projects often over-promise in business value and are too optimistic on the timeline to delivery. These projects eventually fail by not delivering the pre-set business impact within the promised timeline. As data scientists, we need to identify these issues early and communicate with the entire team to ensure the project has a realistic deliverable and timeline. The data scientist team also needs to work closely with data owners on different things. For example, identify a relevant internal and external data source, evaluate the data’s quality and relevancy to the project, and work closely with the infrastructure team to understand the computation resources (i.e., hardware and software) availability. It is easy to create scalable computation resources through the cloud infrastructure for a data science project. However, you need to evaluate the dedicated computation resources’ cost and make sure it fits the budget.

In summary, data science projects are much more than data and analytics. A successful project requires a data scientist to lead many aspects of the project.