SQL (Structured Query Language) is a powerful tool for managing data in Relational Database Management Systems (RDBMS). This cheat sheet guides you through the basics of SQL and provides essential SQL commands and explanations.
- A database stores objects called tables.
- Every table is broken up into smaller entities called fields, which is the name of a column in the table.
- Each entry (row/tuple in the table, containing the actual data) is called a record.
- The tuples of a table may not be distinct (tables are bags not sets).
- A schema is specifying the structure of the database including the table names (outside the parentheses), all the fields in each table (inside the parentheses), and which field(s) in each table are keys (underlined fields). For example:
**Users**(user_id, first_name, last_name, address, email)
**Products**(product_id, product_name, description, price)
**Orders**(order_id, user, product_ordered, total_paid)
To get started with SQL, you'll need to install an SQL server like MySQL or MariaDB. For installation instructions on Arch-based Linux distributions like Manjaro, follow this link.
Then, download this sample dataset or from this repo. To import the database and start playing around with it, execute the following commands:
- Open terminal and login into MariaDB
mariadb -u <your_username> -p
- Create a new database called “northwind” and exit
CREATE DATABASE northwind;
EXIT;
- Import two files: “northwind.sql “and “northwind-data.sql” into the “northwind” database
mariadb -u <your_username> -p northwind < northwind.sql
mariadb -u <your_username> -p northwind < northwind-data.sql
To start using MariaDB, enter the following in a terminal:
mariadb -u <your_username> -p
Then, a prompt like this will be displayed:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 4
Server version: 11.1.2-MariaDB Arch Linux
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]>
The “none” in “MariaDB [(none)]>” indicates that no database is selected.
USE northwind;
SHOW TABLES;
DESCRIBE customers;
mariadb -u <your_username> -p < "filename.sql"
All queries follow a roughly SELECT
, FROM
, WHERE
, GROUP BY
, HAVING
, ORDER BY
structure.
Displays the all the tuples for the specified fields. Aliases (AS
keyword for field names, no keyword for table names) can be used to shorten the name of both the field names and the table names (especially useful in more complex queries to reduce amount of typing).
SELECT first_name AS name
FROM customers C;
SELECT id, company, first_name AS name
FROM customers C;
SELECT *
FROM customers C;
SELECT DISTINCT
considers tuples distinct as long as there is at least 1 field where the values differ. SELECT DISTINCT
will consider tuples distinct even if some fields have same values. Tuples are only considered equivalent (and thus its duplicates are excluded) if the values are the same in all of the fields. Compare the followingSELECT
and differentSELECT DISTINCT
queries:
SELECT ship_name, ship_city
FROM orders; --produces 48 rows
SELECT DISTINCT ship_name, ship_city
FROM orders; --produces 15 rows
SELECT DISTINCT ship_name
FROM orders; --produces 15 rows
SELECT DISTINCT ship_city
FROM orders; --produces 12 rows
Retrieves only the tuples that meet certain conditions.
SELECT id, product_name, category, list_price
FROM products
WHERE list_price = 30;
SELECT id, product_name, category, list_price
FROM products
WHERE list_price IN (18, 30, 40);
SELECT id, product_name, category, list_price
FROM products
WHERE list_price > 15;
SELECT id, product_name, category, list_price
FROM products
WHERE list_price BETWEEN 20 AND 30;
SELECT id, product_name, category, list_price
FROM products
WHERE list_price NOT BETWEEN 20 AND 30;
- % - matches zero, one, or multiple characters of any kind
- _ - matches exactly one character of any kind
There are endless possibilities for pattern expressions, but some common patterns are:
SELECT id, first_name, last_name
FROM customers
WHERE first_name LIKE "A%"; -- starts with A
SELECT id, first_name, last_name
FROM customers
WHERE first_name LIKE "%e"; -- ends with e
SELECT id, first_name, last_name
FROM customers
WHERE first_name LIKE "%re%"; -- include re
SELECT id, first_name, last_name
FROM customers
WHERE first_name LIKE "K_ren"; -- don't know what 2nd letter is
SELECT id, first_name, last_name
FROM customers
ORDER BY first_name ASC; -- ASC keyword can be omitted
SELECT id, first_name, last_name
FROM customers
ORDER BY first_name DESC;
Sorting by multiple columns will first sort by the first column. If there are still tuples that have the same value for the first column, they will be sorted by the second column.
SELECT id, first_name, last_name
FROM customers
ORDER BY first_name DESC, last_name DESC;
-- Notice that the output contains in this order:
-- (25, "John", "Rodman")
-- (12, "John", "Edwards")
-- Since "John" was equal, it then sorted on the 2nd column
SELECT
CASE
WHEN G.GRADE < 8 THEN "NULL"
ELSE S.NAME
END AS STAT,
G.GRADE,
S.MARKS
FROM STUDENTS S
CROSS JOIN GRADES G
WHERE
MIN_MARK <= S.MARKS AND
S.MARKS <= MAX_MARK
ORDER BY GRADE DESC, S.NAME
It's often used to handle situations where you want to retrieve a value from a list of columns but you're not sure which column will have a non-null value.
The general structure of an OVER statements:
<window function> OVER(
[PARTITION BY <column names>]
[ORDER BY <column names>]
[ <ROW or RANGE clause> ]
) AS Alias
Cumulative price example:
SELECT order_date,
revenue,
SUM(revenue) OVER (ORDER BY order_date) AS cumulative_revenue
FROM sales
ORDER BY order_date;
Create Database
Drop Database
Create Table
Drop Table
The other columns will have NULL values in them.
INSERT INTO privileges (id)
VALUES (5), (6), (7);
INSERT INTO privileges
VALUES (3, "Free Lunches"), (4, "Health Insurance");
INSERT INTO employee_privileges
VALUES (5, 2);
INSERT INTO tableau;
SELECT *
FROM source_table;
DELETE FROM privileges
WHERE id BETWEEN 3 AND 7;
If a condition is not specified, all values are deleted without deleting the table itself.
DELETE FROM employee_privileges;
Temporary tables are tables that only exist for the duration of the query and are automatically dropped at the end of the session. They can be useful when performing more complex queries. They are akin to variables in a programming language.
CREATE TABLE TEMPORARY TempTable (
ID INT(16),
Name VARCHAR(50)
);
If you want to build an ordered list of tuples, where the next tuple depends on the previous tuple in some way, a recursive query is effective:
WITH RECURSIVE FactorialCTE(n, factorial) AS (
-- Base case
SELECT 1, 1
UNION ALL
-- Recursive part
SELECT n + 1, (n + 1) * factorial -- How to build next record
FROM FactorialCTE
WHERE n < 10 -- Complement of termination condition (n >= 10)
)
SELECT * FROM FactorialCTE;
Useful for creating queries where the result can vary based on some input condition.
CREATE FUNCTION AddNumbers (@num1 INT, @num2 INT)
RETURNS INT
AS
BEGIN
RETURN @num1 + @num2;
END;