Beyond LeetCode SQL
This repository covers supplementary analysis of SQL for LeetCode and classic interview questions, tradeoff between performance optimization and developmental efficiency, and how it relates to general database design consideration (e.g. indexing and join). Specific sample databases are provided to illustrate tricky interview questions.
LeetCode Selected Probems
Only high-quality problems are selected. Pathological problems such as Find Median Given Frequency of Numbers are not discussed. Entry-level syntax problems such as Combine Two Tables are not discussed.
# | Problems | Solutions | Level | Concept |
---|---|---|---|---|
262 | Trips and Users | MySQL | Hard | Three-way join; filtering |
185 | Department Top Three Salaries | MySQL, MS SQL | Hard | Non-equijoin; aggregation; window functionsample |
579 | Cumulative Salary of Employee | MySQL, MS SQL | Hard | Self-join; left join; aggregation |
601 | Human Traffic of Stadium | MySQL, MS SQL | Hard | Self-join; de-duplication; window |
615 | Average Salary | MySQL | Hard | Case; aggregation, join |
618 | Students Report By Geography | MySQL, MS SQL | Hard | Full join, pivoting |
Other undiscussed problems have solutions lumped here.
Classic Interview Questions
This section covers commonly tested concepts during interviews. Many notebooks are inspired by problems people who violated their confidentiality agreement and posted interview materials on Quora, Glassdoor, and 1point3acres. Data are either synthetic or from my personal challenge project.
# | Problems | Solutions | Concept |
---|---|---|---|
1 | Facebook Advertiser Status | MySQL | Transition diagram; conditional update |
2 | Spotify Listening History | MySQL | Update cumulative sum |
3 | Monthly Active User | MySQL | Functional dependency; aggregation; filtering |
4 | Page Recommendation | MySQL | Undirected edge; aggregation; existance |
5 | Pivoting Numeric Data | MySQL | Pivoting numeric data with case statement |
6 | Pivoting Text Data | MySQL | Pivoting text data with self-join |
7 | Un-pivoting Tables | MySQL | Un-pivoting tables using cross-join |
8 | Group by Bins | MySQL | Create custom column to group by |
9 | Consecutive Active Users | MySQL | Self-join, LAG() |
10 | Recommend Friends | MySQL | Self-join, de-duplication, aggregation |
11 | Spotify Similar Friends | MySQL | Three-way join, de-duplication, aggregation |
12 | Invalid Search | MySQL | NULL handling, rate calculation |
13 | Text Confirmation | MySQL | Rate calculation |
14 | Facebook Common Friends | MySQL | self join, three-way join |
15 | Facebook Recommend Friend | MySQL | Self join, four-way join |
16 | Instagram Common Follower | MySQL | Self join, Directed edge |
Hacks
This section covers esoteric details of SQL language and use cases that may be completely useless in interview. Nevertheless, they come handy when judgement calls and some are simply fun to explore.
# | Concept | Notebook |
---|---|---|
1 | Random Sampling from Groups | MySQL8 |
2 | NULL Pathological Study | MySQL8 |
3 | Full Join | MySQL8 |
4 | Dynamic Query (Python) | MySQL8 |
5 | Stored Procedure | MySQL8 |
6 | Hacking Aggregation | MySQL8 |
6 | Multi Column Partition | MySQL8 |
Table Optimization
Avoid full-table scan. Used for primary key (automatic), foreign key, commonly used columns.
- cardinality: the uniqueness of the data.
Index types:
- single column index
- unique index (e.g. SSN)
- cannot create unique index on column with duplicate or NULL values
- composite index
- column that get's most recently queried gets placed first
- implicit index: primary key (automatically unique)
Create index when column:
- is frequently referneced in
order by
orgroup by
- contains lots of unique values
Avoid index when column:
- in small table
- return high percentage of matching data
- require frequent batch update (drop before updating)
- contains lots of NULL
- gets frequently manipulated
- extremely long string
best practice: rebuild index frequently to reduce fragmentation
Query Optimization
- Place smaller table first when joining multiple tables
- Largest table is the base table
- base table is placed on right hand side of equal sign (where clause)
- Place most restrictive condition last:
- The condition in the WHERE clause of a statement that returns the fewest rows of data
- the most restrictive condition was listed last in the WHERE clause,
- try to use indexed column
FROM TABLE1, -- Smallest table
TABLE2, -- to
TABLE3 -- Largest table, also base table
WHERE TABLE1.COLUMN = TABLE3.COLUMN -- Join condition
AND TABLE2.COLUMN = TABLE3.COLUMN -- Join condition
[ AND CONDITION1 ] -- Filter condition
[ AND CONDITION2 ] -- Filter condition
- Using the
like
operator and wildcards (flexible search) - Avoiding the
or
operator, usein
operator- data retrieval is measurably faster by replac- ing OR conditions with the IN predicate
- Avoiding the
HAVING
clause- try to frame the restriction earlier (
where
clause) - try to keep
HAVING
clause simple (use constant, not function)
- try to frame the restriction earlier (
- avoiding large sort operations
- it is best to schedule queries with large sorts as periodic batch processes during off-peak database usage so that the performance of most user processes is not affected.
- Prefer stored procedure
- compiled and permanently stored in the database in an executable format.
- Disabling indexes during batch loads
- When the batch load is complete, you should rebuild the indexes.
- reduction of fragmentation that is found in the index
- cost-based optimization: check database server manual
- Using view: keep the levels of code in your query as flat as possible and to test and tune the statements that make up your views
Formatting for Readability
-
Always begin a new line with each clause in the statement. For example, place the FROM clause on a separate line from the SELECT clause. Then place the WHERE clause on a separate line from the FROM clause, and so on.
-
Use tabs or spaces for indentation when arguments of a clause in the statement exceed one line.
-
Use tabs and spaces consistently.
-
Use table aliases when multiple tables are used in the statement. The use of the full table name to qualify each column in the statement quickly clutters the statement and makes reading it difficult.
-
Use remarks sparingly in SQL statements if they are available with- in your specific implementation. Remarks are great for documentation, but too many of them clutter a statement.
-
Begin a new line with each column name in the SELECT clause if many columns are being selected.
-
Begin a new line with each table name in the FROM clause if many tables are being used.
-
Begin a new line with each condition of the WHERE clause. You can easily see all conditions of the statement and the order in which they are used.
Anti-patterns
This section WILL discusses common pitfalls such as nested selects, redundant temporary tables, unnecessary cross join, unnecessary hashset using distinct key word.
# | Anti-patterns | Notebook |
---|---|---|
1 | Ambiguous Group | MySQL8 |
2 | Bad Subquery | MySQL8 |
3 | Fail to Use Index | MySQL8 |