• Stars
    star
    1,029
  • Rank 44,769 (Top 0.9 %)
  • Language
    Jupyter Notebook
  • License
    MIT License
  • Created over 5 years ago
  • Updated over 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Analysis of SQL Leetcode and classic interview questions, common pitfalls, anti-patterns and handy tricks. Sample databases.

Beyond LeetCode SQL

This repository covers supplementary analysis of SQL for LeetCode and classic interview questions, tradeoff between performance optimization and developmental efficiency, and how it relates to general database design consideration (e.g. indexing and join). Specific sample databases are provided to illustrate tricky interview questions.


LeetCode Selected Probems

Only high-quality problems are selected. Pathological problems such as Find Median Given Frequency of Numbers are not discussed. Entry-level syntax problems such as Combine Two Tables are not discussed.

# Problems Solutions Level Concept
262 Trips and Users MySQL Hard Three-way join; filtering
185 Department Top Three Salaries MySQL, MS SQL Hard Non-equijoin; aggregation; window functionsample
579 Cumulative Salary of Employee MySQL, MS SQL Hard Self-join; left join; aggregation
601 Human Traffic of Stadium MySQL, MS SQL Hard Self-join; de-duplication; window
615 Average Salary MySQL Hard Case; aggregation, join
618 Students Report By Geography MySQL, MS SQL Hard Full join, pivoting

Other undiscussed problems have solutions lumped here.


Classic Interview Questions

This section covers commonly tested concepts during interviews. Many notebooks are inspired by problems people who violated their confidentiality agreement and posted interview materials on Quora, Glassdoor, and 1point3acres. Data are either synthetic or from my personal challenge project.

# Problems Solutions Concept
1 Facebook Advertiser Status MySQL Transition diagram; conditional update
2 Spotify Listening History MySQL Update cumulative sum
3 Monthly Active User MySQL Functional dependency; aggregation; filtering
4 Page Recommendation MySQL Undirected edge; aggregation; existance
5 Pivoting Numeric Data MySQL Pivoting numeric data with case statement
6 Pivoting Text Data MySQL Pivoting text data with self-join
7 Un-pivoting Tables MySQL Un-pivoting tables using cross-join
8 Group by Bins MySQL Create custom column to group by
9 Consecutive Active Users MySQL Self-join, LAG()
10 Recommend Friends MySQL Self-join, de-duplication, aggregation
11 Spotify Similar Friends MySQL Three-way join, de-duplication, aggregation
12 Invalid Search MySQL NULL handling, rate calculation
13 Text Confirmation MySQL Rate calculation
14 Facebook Common Friends MySQL self join, three-way join
15 Facebook Recommend Friend MySQL Self join, four-way join
16 Instagram Common Follower MySQL Self join, Directed edge

Hacks

This section covers esoteric details of SQL language and use cases that may be completely useless in interview. Nevertheless, they come handy when judgement calls and some are simply fun to explore.

# Concept Notebook
1 Random Sampling from Groups MySQL8
2 NULL Pathological Study MySQL8
3 Full Join MySQL8
4 Dynamic Query (Python) MySQL8
5 Stored Procedure MySQL8
6 Hacking Aggregation MySQL8
6 Multi Column Partition MySQL8

Table Optimization

Avoid full-table scan. Used for primary key (automatic), foreign key, commonly used columns.

  • cardinality: the uniqueness of the data.

Index types:

  • single column index
  • unique index (e.g. SSN)
    • cannot create unique index on column with duplicate or NULL values
  • composite index
    • column that get's most recently queried gets placed first
  • implicit index: primary key (automatically unique)

Create index when column:

  • is frequently referneced in order by or group by
  • contains lots of unique values

Avoid index when column:

  • in small table
  • return high percentage of matching data
  • require frequent batch update (drop before updating)
  • contains lots of NULL
  • gets frequently manipulated
  • extremely long string

best practice: rebuild index frequently to reduce fragmentation


Query Optimization

alt-text

  • Place smaller table first when joining multiple tables
  • Largest table is the base table
    • base table is placed on right hand side of equal sign (where clause)
  • Place most restrictive condition last:
    • The condition in the WHERE clause of a statement that returns the fewest rows of data
    • the most restrictive condition was listed last in the WHERE clause,
  • try to use indexed column
FROM TABLE1,   -- Smallest table
     TABLE2,   -- to
     TABLE3    -- Largest table, also base table
WHERE TABLE1.COLUMN = TABLE3.COLUMN    -- Join condition
  AND TABLE2.COLUMN = TABLE3.COLUMN    -- Join condition
[ AND CONDITION1 ]                     -- Filter condition
[ AND CONDITION2 ]                     -- Filter condition
  • Using the like operator and wildcards (flexible search)
  • Avoiding the or operator, use in operator
    • data retrieval is measurably faster by replac- ing OR conditions with the IN predicate
  • Avoiding the HAVING clause
    • try to frame the restriction earlier (where clause)
    • try to keep HAVING clause simple (use constant, not function)
  • avoiding large sort operations
    • it is best to schedule queries with large sorts as periodic batch processes during off-peak database usage so that the performance of most user processes is not affected.
  • Prefer stored procedure
    • compiled and permanently stored in the database in an executable format.
  • Disabling indexes during batch loads
    • When the batch load is complete, you should rebuild the indexes.
    • reduction of fragmentation that is found in the index
  • cost-based optimization: check database server manual
  • Using view: keep the levels of code in your query as flat as possible and to test and tune the statements that make up your views

Formatting for Readability

  • Always begin a new line with each clause in the statement. For example, place the FROM clause on a separate line from the SELECT clause. Then place the WHERE clause on a separate line from the FROM clause, and so on.

  • Use tabs or spaces for indentation when arguments of a clause in the statement exceed one line.

  • Use tabs and spaces consistently.

  • Use table aliases when multiple tables are used in the statement. The use of the full table name to qualify each column in the statement quickly clutters the statement and makes reading it difficult.

  • Use remarks sparingly in SQL statements if they are available with- in your specific implementation. Remarks are great for documentation, but too many of them clutter a statement.

  • Begin a new line with each column name in the SELECT clause if many columns are being selected.

  • Begin a new line with each table name in the FROM clause if many tables are being used.

  • Begin a new line with each condition of the WHERE clause. You can easily see all conditions of the statement and the order in which they are used.


Anti-patterns

This section WILL discusses common pitfalls such as nested selects, redundant temporary tables, unnecessary cross join, unnecessary hashset using distinct key word.

# Anti-patterns Notebook
1 Ambiguous Group MySQL8
2 Bad Subquery MySQL8
3 Fail to Use Index MySQL8

More Repositories

1

Data-Science-Toolbox

Examples and illustration of basic statistic concepts, probability distribution, Monte Carlo simulation, preprocessing and visualization techniques, and statistical testing.
Jupyter Notebook
45
star
2

Lookalike-Model

Finding similar, high-valued users based on seed users. The model includes 1805 features using Hive HQL and AWS Redshift.
Jupyter Notebook
28
star
3

Data-Engineering-Toolbox

Lecture notes, lab notes, and links to helpful resources to pass Google Certification Exam for Professional Data Engineer.
Jupyter Notebook
15
star
4

AI-for-Trading

Python notebooks and scripts for Udacity's AI for Trading nanoprogram.
HTML
7
star
5

Syll2Vec-A-Phonetic-Approach-in-NLP

Implemented (in PyTorch) syllable embedding in a bidirectional attention flow (BiDAF) recurrent neural net. Improved upon baseline F1 score by 6.2% on Stanford Question and Answering Dataset (SQuAD 2.0).
Python
7
star
6

centaurwealth.io

Double-entry bookkeeping made easy for personal wealth management. Microservices. Kubernetes. CI&CD. Test driven development with NodeJs & Typescript. Ingress-nginx.
TypeScript
6
star
7

blogs

All blogs are published in Toward Data Science for which I am an official author. This page contains friend links to directly access all my Medium blog posts without limit or membership requirement.
5
star
8

GSN-with-HardHat

JavaScript
5
star
9

docker-kubernetes-guide

Complete guide on Docker & Kubernetes container orchestration on local env and AWS ECS, EKS; persistent volume built on AWS EFS; multi-stage build; utility container etc
PHP
4
star
10

Lifetime-Learning

3
star
11

IdiomGraph

GraphDB applied to 成语接龙
Jupyter Notebook
3
star
12

system-design

2
star
13

Purchase-Intent-Model

A predictive model for customer purchase behavior in e-commerce context.
Jupyter Notebook
2
star
14

Accounting

2
star
15

Grocery-Matter

A iOS accounting app for personal finance. Server code (PHP) connects the mobile app to a MySQL database. A Python notebook is also provided for SQL analytics and visualization.
Objective-C
2
star
16

SF-Simulation

How much does SF30 cost?
Jupyter Notebook
1
star
17

cassandra-masterclass

Java
1
star
18

On-Leveraged-ETF

A brief study of the pros and cons of leveraged ETF, and recommendation on how to invest.
Jupyter Notebook
1
star
19

Object-Oriented-Python

Jupyter Notebook
1
star
20

Proxy-Ownable

JavaScript
1
star
21

memori-nft

A time-lapse NFT web3 app that transform writing process into collectible art pieces; Solidity; HardHat; ETH Main-net deployed; live website dapp
JavaScript
1
star
22

GPA-Monitor

A light-weight iOS app helping New York University students calculating their GPAs, including official GPA, cumulative GPA, departmental GPA, yearly GPA.
Objective-C
1
star