• Stars
    star
    178
  • Rank 214,989 (Top 5 %)
  • Language
  • Created over 3 years ago
  • Updated about 2 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Some SQL tutorials I've been creating and posting to Twitter

SQL Tutorials

Welcome!

Here are some SQL tutorials I've been creating and posting to Twitter.

Support Me

If you find any of this useful, consider supporting me at:

Buy Me A Coffee

Let's Connect

You can also follow me at:

linkedin twitter

Let's Begin!

Let's get started! We'll begin with a quick overview of databases.

What is a database?

This is essentially a container storing organised data electronically, usually controlled by something called a database management system (DBMS for short). These can store more data (and are more secure) than a text file or spreadsheet.

What is a DBMS?

These are software packages which allow us to create, manipulate and manage our databases. You've probably heard of some before.

They include:

  • MySQL
  • SQLite
  • PostgreSQL

What is a table?

These are structured files containing data within the database. Typically, a database contains multiple tables, each composed of rows & columns. This makes data retrieval easier.

What are table columns & rows?

Picture tables like spreadsheets. We have columns storing a particular type of data, such as name, price, or date, with each row representing a specific record.

What is a type of data?

Each column in a table has a datatype. A column called age would be a numeric datatype, as an example. Setting datatypes correctly can:

  • Restrict incorrect data being inserted
  • Help if data needs to be sorted
  • Help optimise memory usage

What are constraints?

These are rules enforced on your data, ensuring the correct type of data is entered into a database, maintaining data integrity. For example, a UNIQUE type constraint would ensure all values in a table column remain unique.

What is a primary key?

This is a column or combo of columns designed to uniquely identify each table row. Without a primary key, we won't always be able to uniquely identify a table record. An example might be a 'user_ID' column, whereby each record has its own unique ID.

What is a foreign key?

This is a column or combo of columns in a table that refers to the primary key of another table. This establishes a link between tables, allowing us to join them together if required.

What is SQL?

SQL, or Structured Query Language, is what we use to communicate with databases. We use this to remove data, add data, read data, join tables, create tables, and so on. It's easy to learn, and almost all DBMS support SQL. So it's worthwhile learning this!

Getting Started

A good place to get started is with SQLite. Follow the below steps to get set-up:

  1. Download DB Browser for SQLite from https://sqlitebrowser.org
  2. Download the Chinook sample database
  3. Open up DB Browser for SQLite and open the database
  4. Under the Database Structure tab, you can see a list of tables
  5. Under the Execute SQL tab, you can run SQL queries

Alternatively, you can use the SQLite command line program, rather than DB Browser: https://www.sqlite.org/cli.html

You can view the database diagram illustrating table relationships for the Chinook database here: https://www.sqlitetutorial.net/sqlite-sample-database/

Play around with this database as you read through some of these tutorials. You can start by simply typing a basic command and running it to see the output:

Select Statment

The SELECT statement is used to retrieve table data. We only need to specify what we want to select and where we want to select it from. Simple!

Note that we use FROM to specify the table we wish to retrieve data from.

In the below example, you'll notice we've used the * character. This signifies that we want to return all columns in the table. This is useful when you are practicing; however, it's bad practice within a real production enviornment, and can slow things down. Rarely would we ever need to retun all columns.

Additionally, pay attention to column placement when we are returning more than one column. There should be a comma between each column name, but not after the last one. If we were to include a comma after the last column in a SELECT statment, we would get an error.

Sorting

When retrieving data from a table, it will generally be displayed in the order it appears within that table; therefore it's not a good idea to rely on data being in any specific order when retrieving it. Instead, you can explicitly sort using ORDER BY.

If we want to order descendingly, we can use DESC. We can also explicity write ASC for ascending. However, as this is the default, we don't really need to include it.

Where Clause

When extracting data from tables, you'll normally only want a subset of the data. Achieve this using the WHERE clause, which will filter the data.

Specify WHERE after the FROM clause.

Note that we've split our queries across multiple lines here, rather than just writing it all on a single line. In SQL, we don't have to worry about whitespace. It's usually good practice to split long queries across multiple lines for better readability.

Where Clause ~ Part 2

We can combine WHERE clauses with AND or OR.

When combining these operators, AND is processed before OR.

But... we can control evaluation order using parentheses to explicitly group operators, a bit like what we do in Maths! For example:

  • 5 + 2 x 3 = 11
  • (5 + 2) x 3 = 21

Note how we use parenthese to change the order of evaluation, and thus change the final result. We can do the same with AND and OR in SQL.

Wildcards ~ Percent ( % )

Wildcards are symbols used to substitute characters within a text string. Adding these to a search condition allows us to filter data matching a pattern.

To do this, we use the LIKE operator within a WHERE clause.

Let's look at the % wildcard.

Wildcards ~ Underscore ( _ )

Another wildcard is the underscore.

This matches 1 character, differing from the % wildcard, which represents 0, 1, or more characters.

Note: wildcard searches typically take longer to run, especially if wildcard is at the start of a search pattern. Consider this before using.

Calculated Fields ~ Concatenation

In a SELECT statement, we can calculate new columns on-the-fly without altering the database data itself.

One example is concatenating columns together.

To give a new field a proper name, we use the AS keyword. Note that we don't actually have to include the AS keyword. The query would work exactly the same if we omitted this. However it's good practice to include it, for readability.

Calculated Fields ~ Maths

We learned we can calculate fields on-the-fly without altering database data itself

One use for this is concatenation. Another is running maths operations on retrieved data.

Here are some operators:

  • Division ( / )
  • Addition ( + )
  • Subtraction ( - )
  • Multiplication ( * )

If we want to change the order or precedence, we can use parentheses (like we would in Maths).

Functions ~ Part 1

Functions are a set of instructions grouped together, used to perform a specific task.

Value(s) can be passed to a function, which the function then operates on.

This can make manipulating or converting data in SQL simpler and more efficient.

Additional Info

Here's just a bit more information on SQL statements!

Aggregate Functions

Aggregate functions run calculations on a set of rows to return a single value. These are:

  1. MAX
  2. MIN
  3. AVG
  4. SUM
  5. COUNT

These all ignore NULL (blank) values except for COUNT(*) which counts the number of table rows.

We can use DISTINCT to include only unique values. Notice how it changes the average returned when we use this.

Note that where you see DISTINCT below, you could replace this with ALL to perform the calculation on all rows. However, we would never do this, as ALL is actually the default.

So for example, we could rewrite one of the other lines, like SUM(price) AS sum_price, as SUM(ALL price) AS sum_price instead. This would do exactly the same thing.

Grouping Data

The GROUP BY clause creates a separate group for each unique value in a column or set of columns.

We can then perform aggregate functions like SUM or AVG on each individual group.

Note that, if a grouping column contains NULL, this will be classed as a group. Also, GROUP BY can list multiple columns to group by (each of these must be a retrieved column under the SELECT statement except for the aggregate function). Similarly, every column listed in the SELECT statement must be included in the GROUP BY clause; again, with the exception of the aggregate function(s).

Finally, GROUP BY comes after the FROM clause and WHERE clauses, but before the ORDER BY clause.

It's worth playing around with this on your own so you can better understand what's going on.

Filtering Grouped Data

We use HAVING to filter grouped data. This is very similar to the WHERE clause.

Only difference is WHERE filters specific rows (filters before data is grouped) and HAVING filters groups (filters after data is grouped)

Note: We can use both clauses in one statement. So for example, we might want to filter our data using WHERE, then group the data, and filter those groups usin HAVING.

Subqueries

A subquery is an SQL query embedded within another query, often used within WHERE clause IN operators

Subqueries are processed starting with the innermost SELECT statement

Note: although powerful, subqueries can negatively impact performance. There are other ways we can perform the type of data retrieval we are conducting here (more on that in a later tutorial).

Subqueries ~ Part 2

Not always the best option, but we can use subqueries to create calculated fields (columns created on-the-fly) within a SELECT statement.

Note: Because both tables below contain an "em_id" column, we've used syntax "table.column" to remove ambiguity & ensure correct results (i.e. Employees.em_id and Expenses.em_id).

Joins

In progress...

More Repositories

1

CheatSheets

Just a place to store cheatsheets
1,711
star
2

Reddit-API-Pipeline

Python
302
star
3

DataEngineerZoomCamp

I'm partaking in a Data Engineering Bootcamp / Zoomcamp. I'll store files and progress here.
Jupyter Notebook
95
star
4

Abz-Aaron

9
star
5

CoinCap-API-Pipeline

A simple pipeline utilising cron, Postgres, AWS EC2, and Metabase
Python
8
star
6

Portfolio

This contains a list of projects that make up part of my portfolio. Enjoy!
7
star
7

Reddit-API-Pipeline-DBT

Working with DBT to transform Reddit API data
4
star
8

SQL-Leetcode

A repository where I'll store my solutions for SQL Leetcode Problems.
4
star
9

AirflowOverview

Notes taken for an Airflow Udemy Course
3
star
10

DataQuestDE

This contains a collection of projects completed as part of the Data Engineering DataQuest course.
Jupyter Notebook
3
star
11

ASCII

Python
2
star
12

WebScraper-Keywords

Just a (very) simple web scraper using Beautiful Soup to scrap SQL keywords
Python
2
star
13

Databricks-Notes

Some notes taken during my Databricks learning.
1
star
14

DataMining-Lab1

This is just a lab completed as part of a Data Mining module
HTML
1
star
15

AnalysisWorkflow

Just a basic script on how to run an analysis with R
HTML
1
star
16

DirectoryTreeUtility

A small utility for generating a directory tree and saving to Excel with formatting
Python
1
star
17

MediaApp

Python
1
star
18

FileSearch

Python
1
star
19

Microblog

This is a Flask App I am creating as part of a mega-tutorial.
Roff
1
star
20

GameOfLife

Python
1
star
21

DQ-Forest-Fires

Just a short Data Quest project exploring Forest Fires with Ggplot
HTML
1
star
22

Experiment

This is Stroop Experiment, completed as part of my master's degree
Python
1
star
23

DQ-BookSales

Just a short project covering some basic R concepts
1
star
24

PythonDSA

This is work I'm doing as part of a Data Structures & Algorithms course.
Python
1
star
25

DQ-Covid-R

Basic Covid-19 Analysis with R
HTML
1
star
26

DQ-NYC-Schools

Just a short Data Quest project working with R
HTML
1
star
27

DockerOverview

Notes on Docker from a YouTube tutorial
1
star
28

TwitterTips

A place to store some tips that I share on Twitter
1
star
29

StarlingFinance

Project to extract banking data
Python
1
star