Marketing Data Science

Hui Lin, DowDuPont

2018/03/23 @ University of Iowa

About Me

Outline

What is Data Science?

Sexiest or Worst-Defined Job?


Types of Questions

Types of Learning

Types of Algorithms (1)

Types of Algorithms (2)

Types of Algorithms (3)

Case Study: Group Lasso Logistic Regression for Customer Retention

Project Cycle

Business Questions

  1. How likely is a customer to purchase?
  2. What are the key drivers?

Clarification Questions

Refined Question

Project Cycle

Data Preprocessing

Project Cycle

Multivariate Logistic Regression

\[ln\mathcal{L}(\boldsymbol{\beta}|\mathbf{y})=\sum_{i=1}^{n}\left\{ y_{i}ln\frac{1}{1+exp(-\mathbf{x_{i}}^{T}\mathbf{\boldsymbol{\beta}})}+(1-y_{i})ln\left[1-\frac{1}{1+exp(-\mathbf{x_{i}}^{T}\boldsymbol{\beta})}\right]\right\} \]

\[D(\boldsymbol{\beta})\equiv\frac{\partial ln\mathcal{L}(\boldsymbol{\beta}|\mathbf{y})}{\partial\boldsymbol{\beta}}=\sum_{i=1}^{n}\left\{ y_{i}-\frac{1}{1+exp(-\mathbf{x_{i}}^{T}\boldsymbol{\beta})}\right\} \mathbf{x_{i}}\]
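A minimal NumPy sketch of the log-likelihood and its gradient D(β) above, fitted by plain gradient ascent. The function names, the fixed step size, and the simulated data are illustrative assumptions, not the estimation routine used in the case study. The log-likelihood is written in the algebraically equivalent form y_i z_i - ln(1 + exp(z_i)), with z_i = x_i^T β, to avoid taking the log of values near zero.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(beta, X, y):
    """ln L(beta | y) = sum_i { y_i ln p_i + (1 - y_i) ln(1 - p_i) }, p_i = sigmoid(x_i^T beta).
    Computed as sum_i { y_i z_i - ln(1 + exp(z_i)) } for numerical stability."""
    z = X @ beta
    return np.sum(y * z - np.logaddexp(0.0, z))

def gradient(beta, X, y):
    """D(beta) = sum_i { y_i - 1 / (1 + exp(-x_i^T beta)) } x_i."""
    return X.T @ (y - sigmoid(X @ beta))

def fit_logistic(X, y, step=0.1, n_iter=2000):
    """Gradient ascent on the average log-likelihood (hypothetical settings, illustration only)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta += step * gradient(beta, X, y) / len(y)
    return beta

# Usage on simulated data: intercept column plus two hypothetical predictors.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([-0.5, 1.0, -2.0])
y = rng.binomial(1, sigmoid(X @ true_beta))
beta_hat = fit_logistic(X, y)
```

At the maximum-likelihood estimate the gradient D(β) is zero, which is the condition the iteration drives toward.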

Lasso: Weighted L1-norm Penalty [Tibshirani 1996]

Group Lasso Logistic Regression

\[\mathcal{S}_{\lambda}(\beta)=-l(\beta)+\lambda\sum_{g=1}^{G}s(df_{g})\parallel\beta_{g}\parallel_{2}\]

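where \(l(\beta)\) is the log-likelihood

\[l(\beta)=\sum_{i=1}^{n}\left\{ y_{i}\eta_{\beta}(\mathbf{x_{i}})-\log\left[1+\exp(\eta_{\beta}(\mathbf{x_{i}}))\right]\right\} \]

with linear predictor \(\eta_{\beta}(\mathbf{x_{i}})=\mathbf{x_{i}}^{T}\beta\), \(\lambda\) is the tuning parameter controlling the strength of the penalty, and \(s(\cdot)\) rescales each group's penalty by its degrees of freedom \(df_{g}\) (the number of coefficients in group \(g\)): \(s(df_{g})=df_{g}^{0.5}\).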
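The objective \(\mathcal{S}_{\lambda}(\beta)\) can be sketched in NumPy as below. Because the penalty uses the unsquared L2 norm of each group, minimizing it shrinks a whole group at once and can set all of its coefficients to exactly zero. This sketch minimizes the objective with proximal gradient descent and group soft-thresholding, one standard solver for this problem and not necessarily the one used in the case study. The grouping, the unpenalized intercept in column 0, and the simulated data are hypothetical.

```python
import numpy as np

def neg_loglik(beta, X, y):
    """-l(beta), with l(beta) = sum_i { y_i eta_i - ln[1 + exp(eta_i)] }, eta_i = x_i^T beta."""
    eta = X @ beta
    return -np.sum(y * eta - np.logaddexp(0.0, eta))

def group_penalty(beta, groups):
    """sum_g s(df_g) * ||beta_g||_2 with s(df_g) = sqrt(df_g), df_g = group size."""
    return sum(np.sqrt(len(g)) * np.linalg.norm(beta[g]) for g in groups)

def objective(beta, X, y, groups, lam):
    """S_lambda(beta) = -l(beta) + lambda * sum_g s(df_g) ||beta_g||_2."""
    return neg_loglik(beta, X, y) + lam * group_penalty(beta, groups)

def group_soft_threshold(v, thresh):
    """Proximal map of thresh * ||.||_2: shrinks the block, zeroing it when ||v||_2 <= thresh."""
    norm = np.linalg.norm(v)
    if norm <= thresh:
        return np.zeros_like(v)
    return (1.0 - thresh / norm) * v

def fit_group_lasso(X, y, groups, lam, n_iter=3000):
    """Proximal gradient descent on S_lambda; columns outside every group are unpenalized."""
    # Step 1/L, where L = sigma_max(X)^2 / 4 bounds the curvature of -l(beta).
    step = 4.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = -X.T @ (y - 1.0 / (1.0 + np.exp(-(X @ beta))))  # gradient of -l(beta)
        z = beta - step * grad
        for g in groups:
            z[g] = group_soft_threshold(z[g], step * lam * np.sqrt(len(g)))
        beta = z
    return beta

# Usage: intercept (column 0, unpenalized) plus two hypothetical predictor groups.
rng = np.random.default_rng(1)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=(n, 5))])
groups = [np.array([1, 2, 3]), np.array([4, 5])]
beta_true = np.array([0.3, 1.0, -1.0, 0.5, 0.0, 0.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true))))
beta_hat = fit_group_lasso(X, y, groups, lam=5.0)
```

Larger \(\lambda\) zeroes more groups; with a very large \(\lambda\) only the unpenalized intercept remains.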

Performance Measure

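Here \(\lambda_{max}\), the smallest value of \(\lambda\) at which all group coefficients \(\beta_{g}\) are shrunk to zero, is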

\[\lambda_{max}=\max_{g\in\{1,\dots,G\}}\frac{1}{s(df_{g})}\parallel\mathbf{x_{g}}^{T}(\mathbf{y}-\bar{y})\parallel_{2}\]
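A small sketch of the \(\lambda_{max}\) formula, assuming each group's columns are passed as a separate matrix and \(\mathbf{y}\) is the 0/1 purchase indicator. The geometric grid below \(\lambda_{max}\) (20 values down to 1% of \(\lambda_{max}\)) is a common convention and an assumption here, not a setting taken from the case study.

```python
import numpy as np

def lambda_max(X_groups, y):
    """lambda_max = max_g ||x_g^T (y - ybar)||_2 / s(df_g), with s(df_g) = sqrt(df_g)."""
    resid = y - y.mean()
    return max(np.linalg.norm(Xg.T @ resid) / np.sqrt(Xg.shape[1]) for Xg in X_groups)

def lambda_grid(lam_max, n_lambda=20, ratio=0.01):
    """Hypothetical tuning grid: geometric sequence from lam_max down to ratio * lam_max."""
    return lam_max * np.geomspace(1.0, ratio, n_lambda)

# Toy example with two groups (one and two columns).
X1 = np.array([[1.0], [0.0], [-1.0], [0.0]])
X2 = np.array([[1.0, 1.0], [-1.0, -1.0], [0.0, -1.0], [0.0, 1.0]])
y = np.array([1.0, 0.0, 0.0, 1.0])
lam_max = lambda_max([X1, X2], y)   # sqrt(2.5), about 1.581, for this toy data
grid = lambda_grid(lam_max)
```

Starting the grid at \(\lambda_{max}\) means the first model is the intercept-only fit, and groups enter one by one as \(\lambda\) decreases.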

Model Training and Testing

Cut-off Tuning

  1. Order the scores from high to low
  2. Calculate the sensitivity and specificity as the cut-off changes
  3. Get the cut-off values and their corresponding likelihoods
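The three cut-off tuning steps can be sketched as follows. Each distinct score is tried as a cut-off (predict "purchase" when score >= cut-off), and sensitivity and specificity are recorded for each one. Picking the cut-off that maximizes Youden's J (sensitivity + specificity - 1) is an assumed selection rule for illustration; the slides do not say which criterion was used. The scores and labels are made up.

```python
import numpy as np

def cutoff_table(scores, y):
    """Return (cutoffs, sensitivity, specificity), predicting 1 when score >= cut-off."""
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(y, dtype=int)
    cutoffs = np.unique(scores)[::-1]          # step 1: order the scores from high to low
    n_pos = np.sum(y == 1)
    n_neg = np.sum(y == 0)
    sens, spec = [], []
    for c in cutoffs:                           # step 2: sweep the cut-off
        pred = scores >= c
        sens.append(np.sum(pred & (y == 1)) / n_pos)
        spec.append(np.sum(~pred & (y == 0)) / n_neg)
    return cutoffs, np.array(sens), np.array(spec)  # step 3: cut-offs with their metrics

# Toy scores (predicted purchase probabilities) and observed outcomes.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
y = [1, 1, 0, 1, 1, 0, 0, 0]
cutoffs, sens, spec = cutoff_table(scores, y)
best = np.argmax(sens + spec - 1)   # assumed rule: maximize Youden's J
best_cutoff = cutoffs[best]         # 0.55 for this toy data
```

In practice the chosen cut-off depends on the business cost of missing a purchaser versus targeting a non-purchaser, so the full table is usually more useful than a single rule.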

Project Cycle

Model Comparison

"Essentially, all models are wrong, but some are useful." (George E. P. Box)

Project Cycle

Data Science in Marketing Overview

Conjoint Analysis

Data Science Pipeline

Data Scientist Skill Set