Overview
Overview of SQL Intermediate
Advanced retrieval patterns, filtering logic, and data cleaning techniques β the bridge between writing basic queries and designing real-world analytical SQL.
Learning Objectives
- Write and interpret subqueries, correlated subqueries, derived tables, and CTEs β and choose the right structure for each analytical situation.
- Apply advanced filtering logic using EXISTS, ANY, ALL, CASE, COALESCE, and NULL-safe patterns to handle complex conditions cleanly and correctly.
- Use string, type, and date functions β TRIM, UPPER, LOWER, CAST, DATEADD, DATEDIFF, EXTRACT β to standardize and transform raw data into analysis-ready form.
- Detect and resolve data quality issues β NULL values, inconsistent categories, duplicate records, and messy strings β using SQL directly, without external tools.
SQL Intermediate bridges the gap between foundational SQL skills and the techniques required in real-world analytics environments. Each part addresses a distinct category of intermediate work β how to structure complex queries, how to filter data precisely, and how to clean and prepare it before it reaches reporting or analysis.
By the end of all three parts, learners will write SQL with the precision and structure of an analyst β choosing the right retrieval pattern, applying clean conditional logic, and producing data that is ready for reporting without manual cleanup.
Where SQL Intermediate Sits in the Curriculum
SQL Intermediate builds on the six-keyword workflow introduced in SQL Basics β SELECT, FROM, WHERE, JOIN, GROUP BY, HAVING β and introduces the patterns that make those keywords genuinely powerful at scale and in real data environments.
| Course | Focus | Key Skills |
|---|---|---|
| SQL Basics | Core workflow | SELECT, WHERE, JOIN, GROUP BY, HAVING, ORDER BY |
| SQL Intermediate | Query patterns & real-world logic | Subqueries, CTEs, EXISTS, CASE, COALESCE, data cleaning |
| SQL Advanced | Performance & engineering | Window functions, indexes, execution plans, partitioning |
SQL Intermediate uses the same four tables introduced in SQL Basics. Keeping the dataset consistent means all mental energy goes toward mastering the new technique β not toward understanding unfamiliar data. The same rows now serve as the basis for correlated subqueries, EXISTS filters, CASE logic, date calculations, and duplicate detection.
Correlated subquery practice (per-student calculations), EXISTS-based filtering (students who are enrolled), CASE-based cohort labelling, TRIM/UPPER/LOWER on name fields, NULL detection, and duplicate identification.
Subqueries in SELECT (course-level counts), derived table practice, ANY/ALL comparisons against credits, CASE-based credit classification, and CAST on numeric fields.
EXISTS-based filtering, subqueries against department, messy string handling on name fields, COALESCE for missing department values, and IS NULL / IS NOT NULL validation.
The most-used table across all three parts. Subqueries in WHERE (filtered enrollment sets), correlated subqueries (per-student counts), CASE-based grade flags, COALESCE on NULL grades, date functions on term fields, and duplicate detection using ROW_NUMBER().
Part 1 introduces the retrieval structures that power real-world SQL β going beyond simple SELECT statements to build queries that reference other queries, compute values dynamically, and break complex logic into named, reusable steps. Every technique is demonstrated using the four course tables.
Embedding a query inside another query at three different locations. A subquery in SELECT adds a computed column. A subquery in WHERE filters rows against a dynamic result set. A subquery in FROM acts as a temporary table.
Subqueries that reference the outer query β recalculating once per outer row. Used for row-by-row comparisons and per-entity calculations that cannot be expressed with a simple JOIN or GROUP BY.
Subqueries placed in the FROM clause and treated as a named temporary table. Derived tables allow filtered, aggregated, or transformed result sets to be joined as if they were regular tables.
Named temporary result sets defined using WITH at the start of a query. CTEs separate complex logic into readable steps β each with a clear name β making multi-step queries easier to write, read, and debug.
Subqueries placed inside other subqueries β multiple levels of nesting. Understanding when nesting is necessary versus when a CTE or JOIN would produce cleaner, more maintainable logic.
A direct comparison of three ways to express the same logic β when each approach is preferred, how they differ in readability and reuse, and the practical rules for choosing between them.
Choosing the Right Structure
Understanding which retrieval pattern to use β and why β is one of the most practically valuable skills in Part 1. Each structure has a distinct strength and the right choice depends on reuse, readability, and complexity.
| Structure | Defined Where | Best Used When | Key Strength |
|---|---|---|---|
| Subquery | Inline inside SELECT, WHERE, or FROM | The logic is simple and used only once | Concise for single-use calculations and filters |
| Correlated Subquery | Inside WHERE, referencing the outer query | The calculation depends on each outer row | Row-by-row comparisons that JOINs cannot express |
| Derived Table | Inside FROM, given an alias | A filtered or aggregated result needs joining | Acts like a table β supports JOINs on the result |
| CTE | Before the main query using WITH | Logic is complex, multi-step, or referenced more than once | Readable, named steps β easiest to debug and extend |
Subquery in SELECT Β· Subquery in WHERE Β· Subquery in FROM Β· Correlated subquery with outer reference Β· Derived table with alias Β· WITH β¦ AS CTE Β· Chained CTEs Β· Nested subquery depth.
Part 2 introduces the filtering and conditional tools that allow SQL to handle the ambiguity, incompleteness, and complexity of real-world data. Rather than filtering on simple equality conditions, learners apply logic that checks for existence, compares against sets, responds to NULL, and classifies records into meaningful categories.
Filters rows based on whether a correlated subquery returns any results β without needing to know what those results are. More efficient than IN for large subquery result sets and better at expressing presence-or-absence logic.
Comparing a value against every result returned by a subquery. ANY returns true if at least one comparison passes. ALL returns true only if every comparison passes. Used for threshold and range checks against dynamic sets.
Moving beyond simple categorization into multi-condition CASE logic β nested CASE statements, CASE inside aggregations, and CASE used to pivot data from rows into columns within a single query.
Replacing NULL values with meaningful defaults in query results. COALESCE evaluates a list of expressions and returns the first non-NULL. Applied to enrollment grades, instructor departments, and any field that may legitimately be missing.
Embedding CASE inside SUM and COUNT to count or total only the rows meeting a specific condition β producing multi-metric summary reports in a single query without multiple separate GROUP BY passes.
Using IS NULL and IS NOT NULL correctly β understanding why NULL cannot be compared with = or !=, and applying the right filtering pattern to include, exclude, or flag records with missing values.
EXISTS vs. IN β The Key Distinction
Two of the most commonly confused filtering patterns in intermediate SQL are EXISTS and IN. Each checks whether a value appears in a set β but they operate differently and have different performance characteristics on large datasets.
| Pattern | How It Works | Best Used When |
|---|---|---|
| WHERE id IN (subquery) | Evaluates the subquery once, produces a list, checks membership | The subquery returns a small, simple list of values |
| WHERE EXISTS (correlated subquery) | Evaluates the subquery once per outer row, stops at first match | Checking presence or absence β especially in large tables |
| WHERE NOT EXISTS (correlated subquery) | Returns outer rows where the subquery finds no match | Finding records with no related rows β students with no enrollments |
NULL cannot be compared using = or !=. A condition like WHERE grade = NULL will never match any row β not because no grades are NULL, but because NULL is not equal to anything, including itself. Always use IS NULL or IS NOT NULL when filtering for missing values.
EXISTS Β· NOT EXISTS Β· ANY Β· ALL Β· CASE WHEN THEN ELSE END (multi-condition) Β· CASE inside SUM/COUNT Β· COALESCE(expr, default) Β· IFNULL(expr, default) Β· IS NULL Β· IS NOT NULL.
Part 3 addresses one of the most practical and frequently underestimated skills in data work β preparing raw data for analysis directly in SQL, without external tools. Messy strings, inconsistent types, date format issues, NULL values, and duplicate records are all handled in-query before results reach a report or dashboard.
Cleans hidden whitespace from the beginning or end of a string β a common issue in data imported from spreadsheets, forms, and legacy systems. Prevents join failures caused by invisible spaces.
Converts string values to a consistent case β ensuring that “Math”, “MATH”, and “math” are treated as the same value in comparisons, GROUP BY operations, and joins.
Combining TRIM, UPPER, LOWER, REPLACE, and SUBSTRING to standardize free-text fields β removing unwanted characters, correcting formatting, and extracting the useful portion of a string.
Converting values from one data type to another β text to number, number to string, string to date. Essential when source data stores numbers as text or dates in non-standard formats.
Calculating time differences between dates, adding or subtracting intervals, and extracting specific date parts (year, month, day). Used for tenure calculations, age derivation, and period-based filtering.
Using COALESCE and IFNULL to substitute meaningful defaults for NULL values β replacing NULL grades with “Not Graded”, NULL departments with “Unknown”, or NULL dates with a reference date.
Using DISTINCT, GROUP BY + HAVING COUNT(*) > 1, and ROW_NUMBER() to identify records that appear more than once in a table β a critical step before any aggregation or reporting.
Using ROW_NUMBER() inside a CTE to assign a sequence number per group, then filtering for only the first record β the standard SQL pattern for de-duplicating a table while keeping the most recent or most relevant row.
The Data Cleaning Workflow
A consistent cleaning workflow applied in SQL before any analysis begins prevents errors from propagating into reports, dashboards, and aggregations. Part 3 teaches learners to work through cleaning problems in a structured, repeatable order.
| Step | Problem Addressed | SQL Tools |
|---|---|---|
| 1. Check for NULLs | Missing values that will affect counts, joins, and aggregations | IS NULL Β· COALESCE Β· IFNULL |
| 2. Standardize strings | Inconsistent case and whitespace preventing correct grouping and joins | TRIM Β· UPPER Β· LOWER |
| 3. Fix data types | Numbers stored as text, dates in wrong format | CAST Β· CONVERT |
| 4. Handle dates | Inconsistent date formats, missing date parts, derived date calculations | DATEADD Β· DATEDIFF Β· EXTRACT |
| 5. Detect duplicates | Repeated records inflating counts and totals | DISTINCT Β· GROUP BY HAVING COUNT(*) > 1 |
| 6. Remove duplicates | Keeping only one version of each duplicated record | ROW_NUMBER() OVER (PARTITION BY β¦ ORDER BY β¦) |
TRIM Β· UPPER Β· LOWER Β· REPLACE Β· SUBSTRING Β· CAST Β· CONVERT Β· DATEADD Β· DATEDIFF Β· EXTRACT Β· COALESCE Β· IFNULL Β· IS NULL Β· DISTINCT Β· GROUP BY + HAVING COUNT(*) > 1 Β· ROW_NUMBER() for deduplication.
Overview β Key Takeaways
Five foundational principles from SQL Intermediate.
Students, courses, instructors, and enrollments β the same dataset from SQL Basics, now supporting correlated subqueries, EXISTS filters, COALESCE logic, date functions, and duplicate detection.
Retrieval patterns, filtering logic, and data cleaning β three distinct intermediate skill sets that together produce the analytical foundation needed for real-world SQL work.
Subquery, correlated subquery, derived table, or CTE β each retrieval structure has a specific strength. Knowing which to choose for a given problem is the most important decision in Part 1.
NULL cannot be compared with =. EXISTS and NOT EXISTS express presence and absence more clearly than IN and NOT IN for large datasets. COALESCE and IFNULL replace missing values with meaningful defaults.
Raw data rarely arrives in analysis-ready form. Applying string standardization, type conversion, date normalization, and duplicate removal directly in SQL β before any aggregation β produces results that can be trusted.

