B.1 Customer Data for Clothing Company

The simulation is not entirely straightforward, so we will break it into three parts:

  1. Define the data structure: variable names, variable distributions, customer segment names, and segment sizes
  2. Define the variable distribution parameters: mean and variance
  3. Iterate across segments and variables, simulating data according to the assigned parameters

Organizing the code this way makes it easy to change specific parts of the simulation. For example, if we want to change the distribution of one variable, we only need to change the corresponding part of the code.

Here is the code to define the data structure:
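A minimal sketch of such a definition in R follows. The object names (n_cust, var_names, var_dist, seg_names, seg_sizes), the variable and segment names, and the segment sizes are all illustrative assumptions; only the counts (8 variables, 4 segments) come from the text.

```r
# Illustrative sketch: every name and number here is an assumption.
n_cust <- 1000  # total number of simulated customers

# Variable names and the distribution family used to draw each one
var_names <- c("age", "gender", "income", "house",
               "store_exp", "online_exp", "store_trans", "online_trans")
var_dist  <- c("norm", "binom", "norm", "binom",
               "norm", "norm", "pois", "pois")

# Segment names and the number of customers in each segment
seg_names <- c("Price", "Conspicuous", "Quality", "Style")
seg_sizes <- c(250, 200, 200, 350)

# Sanity checks: one distribution per variable, one size per segment,
# and the segment sizes add up to the total
stopifnot(length(var_names) == length(var_dist),
          length(seg_names) == length(seg_sizes),
          sum(seg_sizes) == n_cust)
```

Keeping names, distributions, and sizes in parallel vectors means a later loop can look everything up by position, so changing one variable's distribution is a one-word edit.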

The next step is to define the variable distribution parameters. There are 4 customer segments and 8 parameters, and each segment has its own parameter values. Let’s store the parameters in a 4×8 matrix:
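One way to lay out such a matrix is sketched below, with one row per segment and one column per variable. All values are made up for illustration, and the names mean_mat and sd_mat are hypothetical. Because rnorm() takes a standard deviation rather than a variance, this sketch stores standard deviations in a second 4×8 matrix, which is an assumption beyond what the text states.

```r
# Illustrative sketch: names and values are assumptions.
seg_names <- c("Price", "Conspicuous", "Quality", "Style")
var_names <- c("age", "gender", "income", "house",
               "store_exp", "online_exp", "store_trans", "online_trans")

# Rows = segments, columns = variables.
# The entry is a mean for normal variables, a success probability for
# 0/1 variables, and a rate for Poisson variables.
mean_mat <- matrix(c(
  60, 0.5, 200000, 0.9,  500, 2000,  2, 20,
  40, 0.7, 100000, 0.9, 5000,  200, 10,  2,
  36, 0.5,  80000, 0.1,  300, 2000,  2, 15,
  25, 0.2,  60000, 0.1,  200, 2000,  2, 20),
  nrow = 4, ncol = 8, byrow = TRUE,
  dimnames = list(seg_names, var_names))

# Standard deviations, needed only for the normal variables (NA elsewhere)
sd_mat <- matrix(c(
  3, NA, 1000, NA,   80, 100, NA, NA,
  5, NA, 1000, NA, 1000,  50, NA, NA,
  7, NA, 2000, NA,   50, 200, NA, NA,
  2, NA, 1000, NA,   50, 500, NA, NA),
  nrow = 4, ncol = 8, byrow = TRUE,
  dimnames = list(seg_names, var_names))
```

With row and column names attached, a single parameter can be read as, for example, mean_mat["Style", "age"].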

Now we are ready to simulate data using the parameters defined above:
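The loop over segments and variables might look like the sketch below. To stay short and self-contained it uses a reduced setup of two segments and three variables; the helper draw_var() and all parameter values are hypothetical, and only the result name sim.dat is taken from the text.

```r
# Illustrative sketch on a reduced setup (2 segments, 3 variables).
set.seed(123)  # make the draws reproducible

seg_names <- c("Price", "Style")
seg_sizes <- c(5, 7)
var_names <- c("age", "gender", "store_trans")
var_dist  <- c("norm", "binom", "pois")

mean_mat <- matrix(c(60, 0.5,  2,
                     25, 0.2, 10),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(seg_names, var_names))
sd_mat <- matrix(c(3, NA, NA,
                   2, NA, NA),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(seg_names, var_names))

# Draw n values for one variable in one segment (hypothetical helper)
draw_var <- function(dist, n, mean, sd) {
  switch(dist,
         norm  = rnorm(n, mean = mean, sd = sd),
         binom = rbinom(n, size = 1, prob = mean),
         pois  = rpois(n, lambda = mean),
         stop("unknown distribution: ", dist))
}

# Outer loop over segments, inner loop over variables
seg_list <- lapply(seq_along(seg_names), function(i) {
  cols <- lapply(seq_along(var_names), function(j) {
    draw_var(var_dist[j], seg_sizes[i], mean_mat[i, j], sd_mat[i, j])
  })
  names(cols) <- var_names
  data.frame(cols, segment = seg_names[i])
})

# Stack the per-segment data frames into one data set
sim.dat <- do.call(rbind, seg_list)
```

Dispatching on the distribution name keeps the loop body identical for every variable, so supporting a new distribution only requires a new branch in draw_var().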

Now let’s edit the simulated data a little by adding labels to the 0/1 binomial variables:
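As a small sketch of this labeling step, the code below converts two 0/1 columns to factors. The toy data frame, the column names, and the label choices are assumptions made for illustration.

```r
# Toy stand-in for the simulated data (values are made up)
sim.dat <- data.frame(gender = c(0, 1, 1, 0),
                      house  = c(1, 1, 0, 0))

# Replace the 0/1 codes with readable labels; the labels are assumptions
sim.dat$gender <- factor(sim.dat$gender, levels = c(0, 1),
                         labels = c("Female", "Male"))
sim.dat$house  <- factor(sim.dat$house, levels = c(0, 1),
                         labels = c("No", "Yes"))
```

Storing these columns as factors makes summary() report counts per label instead of treating the codes as numbers.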

Real-world data almost always include some noise, such as missing values and wrong entries. So we will add some noise to the data:
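A minimal sketch of injecting both kinds of noise is shown below. The toy data frame, the 10% missing rate, and the specific wrong values (an age of 300, a negative expense) are illustrative assumptions.

```r
# Toy stand-in for the simulated data (values are made up)
set.seed(42)
sim.dat <- data.frame(age       = round(rnorm(100, mean = 40, sd = 10)),
                      income    = rnorm(100, mean = 100000, sd = 20000),
                      store_exp = rnorm(100, mean = 500, sd = 50))

# 1. Missing values: blank out income for 10 randomly chosen rows
idx_na <- sample(nrow(sim.dat), size = 10)
sim.dat$income[idx_na] <- NA

# 2. Wrong entries: an impossible age and a negative expense
idx_bad <- sample(nrow(sim.dat), size = 2)
sim.dat$age[idx_bad[1]] <- 300
sim.dat$store_exp[idx_bad[2]] <- -500
```

Running summary(sim.dat) afterwards shows the NA count for income and the out-of-range minimum and maximum, which is the kind of problem a later cleaning step has to catch.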

So far we have created part of the data. You can check it using summary(sim.dat). Next, we will simulate the survey data.