SQL for Data Science

2.3 Retrive Data with `SELECT`

The majority of what data scientists are doing with SQL is retrieving data. To get started, the first statement to learn is SELECT. There are two pieces of information to specify in SELECT statement: what to select and where it is from. It will be straightforward with an example.

SELECT prod_name
FROM Products;

prod_name
Shampoo
Toothpaste
Deodorant
Toothbrush

Ihis example selects product name which a column from the table (prod_name). Then you need to specify where to get it from. Here it is from a table named Products. The output of this is then going to look like the column listed below. Which it has a column product name and then all of the list of products. We have shampoo, toothpaste, deodorant, and toothbrush. If you want to retrieve more than a single column from a table, then what you need to add the names of the individual columns separated with a comma:

SELECT prod_name, prod_id, prod_price
FROM Products;

Request all columns by using the asterisk (*) wildcard character instead of column names:

SELECT *
FROM Products;

This is going to go ahead and grab everything from the Products table, each individual column, and put it into your output.

A lot of times we may want get a view of the table to understand what data is in there without pulling all the records. In this case, we may do a “SELECT *”. But if there are 5 million records in it, we just need a tiny sample of that. In this case, we can use LIMIT to limit the results.

SELECT prod_name
FROM Products
LIMIT 5;

Here is an example of how different RDMS differ in syntax:

If you are using SQLite and understand it’s LIMIT 5. When switch over to a DB2 system, you can easily Google in terms of the syntax in DB2.

2.3 Retrive Data with SELECT

2.3 Retrive Data with `SELECT`