Table of Contents

1. Marketing Data Science
2. About Me
3. Outline
4. What is Data Science?
5. Sexiest or Worst Defined Job
6. Types of Questions
7. Types of Learning
8. Types of Algorithm (1)
9. Types of Algorithm (2)
10. Types of Algorithm (3)
11. Case Study: Group Lasso Logistic Regression for Customer Retention
12. Project Cycle
13. Business Questions
14. Clarification Questions
15. Refined Question
16. Project Cycle
17. Data Preprocessing
18. Project Cycle
19. Multivariate Logistic Regression
20. Lasso: Weighted L1-norm Penalty [Tibshirani 1996]
21. Group Lasso Logistic Regression
22. Performance Measure
23. Model Training and Testing
24. Cut-off Tuning
25. Project Cycle
26. Model Comparison
27. Project Cycle
28. Data Science in Marketing Overview
29. Conjoint analysis
30. Data Science Pipeline
31. Data Scientist Skill Set
32. Some Links

Marketing Data Science

Hui Lin, DowDuPont

2018/03/23 @ University of Iowa

About Me

Hui Lin, Data Scientist (http://scientistcafe.com)
Contact
- Email: longqiman@gmail.com
- Github: https://github.com/happyrabbit
Slides: http://scientistcafe.com/IDS/slides/MarketingDataScience.html
Github repo: https://github.com/happyrabbit/Talks/tree/master/2018_03_23_UnivIowa

Outline

What is Data Science?
Case Study: Group Lasso Logistic Regression for Customer Retention
Data Science in Marketing Overview
Be a Data Scientist

What is Data Science?

Sexiest or Worst Defined Job

Types of Questions

Types of Learning

Types of Algorithm (1)

http://scientistcafe.com/2017/07/08/MachineLearningAl.html

Types of Algorithm (2)

http://scientistcafe.com/2017/07/08/MachineLearningAl.html

Types of Algorithm (3)

http://scientistcafe.com/2017/07/08/MachineLearningAl.html

Case Study: Group Lasso Logistic Regression for Customer Retention

General context without going into business details
Focus on the technical parts
Follow the data science project cycle

Project Cycle

Business Questions

How likely is a customer to purchase?
What are the key drivers?

Clarification Questions

Who are our customers?
Are there different segments of customers?
What is a purchase?
How far ahead do we need to predict?
What are the predictors?
What is the quality of the data?
Where are different data sets located?
…

Refined Question

Response: whether a multi-year corn customer will purchase again next year
Predictors: other customer experience and behavior data

Project Cycle

Data Preprocessing

Cleaning
Missing values
Transformation
- Categorical
- 0/1
- percentage
- large positive number
- counts

Project Cycle

Multivariate Logistic Regression

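$\mathbf{y}$: $n \times 1$ binary response vector
$\mathbf{X}$: $n \times p$ design matrix in which each column is an $n$-dimensional predictor vector
$\boldsymbol{\beta}$: $p \times 1$ parameter vector
The log-likelihood function is as follows:

$$l(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left\{ y_i \mathbf{x}_i^T \boldsymbol{\beta} - \log\left[1 + \exp(\mathbf{x}_i^T \boldsymbol{\beta})\right] \right\}$$

where $\mathbf{x}_i^T$ is the $i$-th row of $\mathbf{X}$.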

Problems: quasi-complete separation and significance-based variable selection
Solution: add a penalty
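
To illustrate why quasi-complete separation is a problem, here is a minimal Python sketch. The toy data and the function name are illustrative assumptions, not taken from the talk. It evaluates the logistic log-likelihood for a single predictor with no intercept, where the two classes overlap only at the tied points x = 0. The log-likelihood keeps increasing as the coefficient grows, so no finite maximum likelihood estimate exists.

```python
import math

def log_likelihood(beta, xs, ys):
    """Logistic log-likelihood: sum of y*eta - log(1 + exp(eta)), with eta = beta * x."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = beta * x
        # log(1 + exp(eta)), computed in a way that stays stable for large |eta|
        log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += y * eta - log1pexp
    return total

# Hypothetical toy data: x < 0 is always class 0, x > 0 is always class 1,
# and only the tied points at x == 0 overlap (quasi-complete separation).
xs = [-2.0, -1.0, 0.0, 0.0, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

betas = [0.5, 1.0, 5.0, 20.0, 100.0]
values = [log_likelihood(b, xs, ys) for b in betas]
for b, v in zip(betas, values):
    print(f"beta = {b:6.1f}  log-likelihood = {v:.6f}")
```

The printed values rise toward -2 log 2 (the contribution of the two tied points) without ever reaching a maximum at a finite coefficient, which is the instability a penalty term is meant to remove.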

Lasso: Weighted L1-norm Penalty [Tibshirani 1996]

Advantage: stabilizes the estimation and also works as a variable selection tool
Limitation: selects only individual dummy variables, and the estimates depend on how the dummy variables are encoded (M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, J. R. Stat. Soc. Ser. B Stat. Methodol. 68 (2006), pp. 49-67)
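
The encoding issue can be made concrete with a small Python sketch. The `dummy_encode` helper and the `region` variable are hypothetical and only serve the illustration. The same categorical predictor is expanded with two different reference levels, which produces two different sets of dummy columns.

```python
def dummy_encode(values, reference):
    """Reference (treatment) coding: one 0/1 column per non-reference level."""
    levels = sorted(set(values) - {reference})
    columns = [f"is_{level}" for level in levels]
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return columns, rows

# Hypothetical categorical predictor with three levels.
region = ["north", "south", "west", "south", "north"]

cols_a, rows_a = dummy_encode(region, reference="north")
cols_b, rows_b = dummy_encode(region, reference="west")

print(cols_a)  # ['is_south', 'is_west']
print(cols_b)  # ['is_north', 'is_south']
```

Both codings describe the same variable, but a penalty applied to each dummy coefficient separately can keep or drop different levels depending on the reference level. Treating all dummy columns of one variable as a single group avoids that dependence.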

Group Lasso Logistic Regression

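$\mathbf{x}_{i,g}$: vector of dummy variables ($i$-th observation in group $g$)
$y_i$: binary response for the $i$-th observation
$df_g$: degrees of freedom of group $g$

$$\hat{\boldsymbol{\beta}}_{\lambda} = \arg\min_{\boldsymbol{\beta}} \left\{ -l(\boldsymbol{\beta}) + \lambda \sum_{g=1}^{G} s(df_g) \|\boldsymbol{\beta}_g\|_2 \right\}$$

where $l(\boldsymbol{\beta})$ is the log-likelihood:

$$l(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left\{ y_i \eta_{\boldsymbol{\beta}}(\mathbf{x}_i) - \log\left[1 + \exp(\eta_{\boldsymbol{\beta}}(\mathbf{x}_i))\right] \right\}, \quad \eta_{\boldsymbol{\beta}}(\mathbf{x}_i) = \beta_0 + \sum_{g=1}^{G} \mathbf{x}_{i,g}^T \boldsymbol{\beta}_g$$

$\lambda$: tuning parameter for the penalty, and $s(\cdot)$ is a function that rescales the penalty by the group's degrees of freedom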
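
A minimal Python sketch of a group lasso objective for logistic regression: the negative log-likelihood plus the tuning parameter times a sum over groups of a rescaled Euclidean norm of each group's coefficients. Several things here are assumptions rather than details from the talk: the rescaling function is taken to be the square root (a common choice), the degrees of freedom of a group are taken to be its number of dummy columns, and the data and function names are hypothetical.

```python
import math

def eta(beta0, beta_groups, x_groups):
    """Linear predictor: intercept plus the sum over groups of x_g . beta_g."""
    return beta0 + sum(
        sum(x * b for x, b in zip(x_g, b_g))
        for x_g, b_g in zip(x_groups, beta_groups)
    )

def log_likelihood(beta0, beta_groups, X, y):
    """Sum over observations of y_i * eta_i - log(1 + exp(eta_i))."""
    total = 0.0
    for x_groups, y_i in zip(X, y):
        e = eta(beta0, beta_groups, x_groups)
        total += y_i * e - (max(e, 0.0) + math.log1p(math.exp(-abs(e))))
    return total

def group_penalty(beta_groups, s=math.sqrt):
    """Sum over groups of s(df_g) * ||beta_g||_2 (assumption: df_g = group size)."""
    return sum(
        s(len(b_g)) * math.sqrt(sum(b * b for b in b_g))
        for b_g in beta_groups
    )

def objective(beta0, beta_groups, X, y, lam):
    """Penalized objective to minimize: -log-likelihood + lam * group penalty."""
    return -log_likelihood(beta0, beta_groups, X, y) + lam * group_penalty(beta_groups)

# Hypothetical data: each observation holds two groups of dummy variables,
# the first with two columns and the second with one.
X = [[[1, 0], [1]], [[0, 1], [0]], [[0, 0], [1]], [[1, 0], [0]]]
y = [1, 0, 1, 0]

print(group_penalty([[3.0, 4.0], [0.0]]))           # sqrt(2) * 5 for group 1, 0 for group 2
print(objective(0.0, [[0.0, 0.0], [0.0]], X, y, lam=1.0))
```

Because the norm is taken over a whole group, the penalty is zero for a group only when all of its coefficients are zero, so variables enter or leave the model as complete groups rather than one dummy column at a time.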

Performance Measure

Maximize AUC
Grid of 148 values of the tuning parameter $\lambda$

where
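
For reference, the AUC criterion can be sketched in a few lines of Python. This is a plain pairwise implementation, not the code used in the project: AUC is computed as the share of (positive, negative) pairs in which the positive case gets the higher score, with ties counted as one half. The `best_lambda` helper and the example scores are hypothetical.

```python
def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def best_lambda(auc_by_lambda):
    """Return the tuning parameter value with the largest AUC."""
    return max(auc_by_lambda, key=auc_by_lambda.get)

# Hypothetical scores and labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(auc(scores, labels))  # 8 of 9 pairs are ranked correctly
```

In a grid search, `best_lambda` would be given a mapping from each grid value to its cross-validated AUC.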

Model Training and Testing

70/30 (Train/Test)
10-fold cross validation (Train)
Double check: one year holdout
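
The first two steps can be sketched with the Python standard library. The function names, the seed, and the sample size of 1000 are assumptions for illustration. The one-year holdout is not shown because it depends on how purchase dates are stored.

```python
import random

def train_test_split(n, train_frac=0.7, seed=0):
    """Shuffle the indices 0..n-1 and split them into train and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * train_frac))
    return idx[:cut], idx[cut:]

def k_folds(indices, k=10):
    """Split indices into k nearly equal folds and yield (fit, validate) pairs."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validate = folds[i]
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, validate

train, test = train_test_split(1000)   # 70/30 split
splits = list(k_folds(train, k=10))    # 10-fold cross validation on the training part
print(len(train), len(test), len(splits))
```

Cross validation runs only on the training indices, so the test part stays untouched until the final check.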

Cut-off Tuning

Order the scores from high to low
Calculate the sensitivity and specificity as the cutoff changes
Get the cut-off values with corresponding likelihoods
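
The steps above can be sketched in Python as follows. The data are hypothetical, and the final selection rule (maximizing sensitivity plus specificity) is only one possible choice, not necessarily the one used in the project.

```python
def cutoff_table(scores, labels):
    """Return (cutoff, sensitivity, specificity) rows, ordered from high to low cut-off.

    A case is predicted positive when its score is >= the cut-off.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    rows = []
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        rows.append((cutoff, tp / n_pos, tn / n_neg))
    return rows

# Hypothetical scores and labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

table = cutoff_table(scores, labels)
for cutoff, sensitivity, specificity in table:
    print(f"cut-off {cutoff:.1f}: sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")

# Example rule (an assumption): pick a cut-off that maximizes sensitivity + specificity.
best = max(table, key=lambda row: row[1] + row[2])
```

Lowering the cut-off raises sensitivity and lowers specificity, so the final choice depends on the relative cost of missing a likely purchaser versus targeting an unlikely one.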

Project Cycle

Model Comparison

Traditional Stepwise Regression
Random Forest
SVM
Neural Network

Essentially, all models are wrong, but some are useful.

Project Cycle

Data Science in Marketing Overview

Program and Service Analysis
- Causal inference in observational environment
Unstructured data analytics: digital marketing
- API: Twitter/Google/Wikipedia…
- Webpage: Forum, Reviews
- Survey
- Interviews
Market research analytics
- Customer Segmentation
- Choice-based conjoint analysis
- Customer perception analysis
Market Basket Analysis
…

Conjoint analysis

Data Science Pipeline

Data Scientist Skill Set

Some Links

Types of Machine Learning Algorithm
Online books:
- The Elements of Statistical Learning
- An Introduction to Statistical Learning
- Introduction to Data Science (in progress)
Hard copy books:
- Applied Predictive Modeling
- R for Marketing Research and Analytics
Online course:
- Deep Learning Specialization
Awesome-Data-Science-Materials
