2.4 Create Table

As a data scientist, you are always building models and making predictions. You may want to take those predictions that you create and write them back to a database. This ensures that someone else could then pick up those predictions and use them in a dashboard. Or you may want to create a dashboard or visualize it with another tool that can be hooked up and used with that database. It’s also helpful if you’re extracting data off the web or scraping it from somewhere and you want to store this data in a database. As we have previously discussed, the data scientist isn’t usually the one in charge of managing the entire database, that usually left to the DBA or some type of administrator. Even so, it’s helpful to have a basic understanding of how this works. You can use the statement CREATE TABLE to do this.

CREATE TABLE Shoes
(
Id     char(10)     PRIMARY KEY,
Brand  char(10)     NOT NULL,
ShoeTypes  char(250)    NOT NULL,
Color  char(250)    NOT NULL,
Price  decimal(8,2) NOT NULL,
Descp  Varchar(750) NULL
);

In the above example, you create a table named “Shoes” by putting the name after CREATE TABLE. Then define the list of columns in the brackets. The first column is Id. It is a character with lenth 10. And the Id is going to be primary key. So you define data type (how many characters or decimals you allow to be inserted into this column), whether or not you allow null values. By default, it assumes that null values are accepted.

The syntax for creating these tables varies greatly by relational database management system that you’re using. The above example exhibits the basic structure to create a table. However, it’s important to check the specifications and syntax of the relational database management system you’re using. An important thing to note when creating a table is to define whether a column can contain a null value or is a primary key. There are several things to pay special attention here:

  1. Don’t confuse null value with empty string. Null value is the absence of everything, whereas empty string is a value there, such as space.
  2. A primary key cannot accept null values. The Id in the example can never have an empty value or not any value.
  3. If you indicate that a column cannot be null, then you will get an error if there is any missing value in that column.

There are two ways to incert data to the table after defining the variables:

  1. Use INSERT INTO statement
INSERT INTO Shoes
VALUES ( '14535974',
'Gucci',
'Slippers',
'Pink',
'695.00',
NULL
);

The first method will take the first value indicated and put it in the first column; the second value will go to the second column, and so on and so forth. It works. However, a potential problem of this method is that you have no guarantee that data is going into the correct column. It is better to be more specific.

  1. Use INSERT INTO and VALUES
INSERT INTO Shoes
(Id, 
Brand,
ShoeTypes,
Color,
Price,
Descp
)
VALUES 
( 
'14535974',
'Gucci',
'Slippers',
'Pink',
'695.00',
NULL
);

The second method also lists the columns. This can be really beneficial if you want to insert values into some columns but not all. I will recommend using this method. It’s a little safer because you have more control and know exactly which value is going and into which column.

Now you have learned how to create tables using SQL. Another option is to create a copy or get a subset of an existing table. A table created this way is a temporary table. The most important thing to know about temporary tables is that they will be deleted when the current client session is terminated. That’s why they’re called temporary tables. Why do we need temporary table? Because it is mush faster than creating a real table. If you have complex queries and you want to simplify it a bit by creating a subset and then joining to that subset and driving a new calculation from that, then temporary table is a great option. You can use the statement CREATE TEMPORARY TABLE to do this:

CREATE TEMPORARY TABLE Sandals AS
(
SELECT *
FROM Shoes
WHERE ShoeTypes = 'sandals'
)

In this case, we create a subset of data from Shoes table and name the new table Sandals. The new table is all records with shoe type sandals.