Overview

Overview of SQL Intermediate | DataSoSi
Intermediate Β· Query Patterns & Real-World Logic

Overview of SQL Intermediate

Advanced retrieval patterns, filtering logic, and data cleaning techniques β€” the bridge between writing basic queries and designing real-world analytical SQL.

SQL Intermediate moves beyond SELECT, WHERE, and GROUP BY into the query patterns that analysts, data engineers, and BI professionals use every day. The course covers three essential layers: how to retrieve data using subqueries, derived tables, and CTEs; how to apply advanced filtering with EXISTS, ANY/ALL, and conditional logic; and how to clean, standardize, and prepare raw data directly in SQL before it reaches a report or dashboard.
SQL Basics Required
Three Structured Parts
Notes & Reference Guide

Learning Objectives

  • Write and interpret subqueries, correlated subqueries, derived tables, and CTEs β€” and choose the right structure for each analytical situation.
  • Apply advanced filtering logic using EXISTS, ANY, ALL, CASE, COALESCE, and NULL-safe patterns to handle complex conditions cleanly and correctly.
  • Use string, type, and date functions β€” TRIM, UPPER, LOWER, CAST, DATEADD, DATEDIFF, EXTRACT β€” to standardize and transform raw data into analysis-ready form.
  • Detect and resolve data quality issues β€” NULL values, inconsistent categories, duplicate records, and messy strings β€” using SQL directly, without external tools.

SQL Intermediate bridges the gap between foundational SQL skills and the techniques required in real-world analytics environments. Each part addresses a distinct category of intermediate work β€” how to structure complex queries, how to filter data precisely, and how to clean and prepare it before it reaches reporting or analysis.

Part 1
Advanced Retrieval Patterns
Subqueries, CTEs, derived tables, and correlated logic β€” choosing the right structure for any query problem.
Part 2
Advanced Filtering & Logic
EXISTS, ANY/ALL, CASE patterns, COALESCE, and NULL-safe filtering for precise conditional query design.
Part 3
Data Cleaning with SQL
String functions, type conversion, date logic, NULL imputation, and duplicate detection β€” all in SQL.
Course Goal

By the end of all three parts, learners will write SQL with the precision and structure of an analyst β€” choosing the right retrieval pattern, applying clean conditional logic, and producing data that is ready for reporting without manual cleanup.

Where SQL Intermediate Sits in the Curriculum

SQL Intermediate builds on the six-keyword workflow introduced in SQL Basics β€” SELECT, FROM, WHERE, JOIN, GROUP BY, HAVING β€” and introduces the patterns that make those keywords genuinely powerful at scale and in real data environments.

Course Focus Key Skills
SQL Basics Core workflow SELECT, WHERE, JOIN, GROUP BY, HAVING, ORDER BY
SQL Intermediate Query patterns & real-world logic Subqueries, CTEs, EXISTS, CASE, COALESCE, data cleaning
SQL Advanced Performance & engineering Window functions, indexes, execution plans, partitioning

SQL Intermediate uses the same four tables introduced in SQL Basics. Keeping the dataset consistent means all mental energy goes toward mastering the new technique β€” not toward understanding unfamiliar data. The same rows now serve as the basis for correlated subqueries, EXISTS filters, CASE logic, date calculations, and duplicate detection.

students Person-level One row per student
Key Fields
student_idfirst_namelast_namestart_year
Used For

Correlated subquery practice (per-student calculations), EXISTS-based filtering (students who are enrolled), CASE-based cohort labelling, TRIM/UPPER/LOWER on name fields, NULL detection, and duplicate identification.

courses Course-level One row per course
Key Fields
course_idcourse_namedepartmentcredits
Used For

Subqueries in SELECT (course-level counts), derived table practice, ANY/ALL comparisons against credits, CASE-based credit classification, and CAST on numeric fields.

instructors Instructor-level One row per instructor
Key Fields
instructor_idfirst_namelast_namedepartment
Used For

EXISTS-based filtering, subqueries against department, messy string handling on name fields, COALESCE for missing department values, and IS NULL / IS NOT NULL validation.

enrollments Enrollment-level One row per student per course registration
Key Fields
enrollment_idstudent_idcourse_idgradeterm
Used For

The most-used table across all three parts. Subqueries in WHERE (filtered enrollment sets), correlated subqueries (per-student counts), CASE-based grade flags, COALESCE on NULL grades, date functions on term fields, and duplicate detection using ROW_NUMBER().

Part 1 introduces the retrieval structures that power real-world SQL β€” going beyond simple SELECT statements to build queries that reference other queries, compute values dynamically, and break complex logic into named, reusable steps. Every technique is demonstrated using the four course tables.

Topic 1
Subqueries in SELECT, WHERE & FROM

Embedding a query inside another query at three different locations. A subquery in SELECT adds a computed column. A subquery in WHERE filters rows against a dynamic result set. A subquery in FROM acts as a temporary table.

Topic 2
Correlated Subqueries

Subqueries that reference the outer query β€” recalculating once per outer row. Used for row-by-row comparisons and per-entity calculations that cannot be expressed with a simple JOIN or GROUP BY.

Topic 3
Derived Tables

Subqueries placed in the FROM clause and treated as a named temporary table. Derived tables allow filtered, aggregated, or transformed result sets to be joined as if they were regular tables.

Topic 4
CTEs (Common Table Expressions)

Named temporary result sets defined using WITH at the start of a query. CTEs separate complex logic into readable steps β€” each with a clear name β€” making multi-step queries easier to write, read, and debug.

Topic 5
Nested Subqueries

Subqueries placed inside other subqueries β€” multiple levels of nesting. Understanding when nesting is necessary versus when a CTE or JOIN would produce cleaner, more maintainable logic.

Topic 6
Subquery vs. CTE vs. Derived Table

A direct comparison of three ways to express the same logic β€” when each approach is preferred, how they differ in readability and reuse, and the practical rules for choosing between them.

Choosing the Right Structure

Understanding which retrieval pattern to use β€” and why β€” is one of the most practically valuable skills in Part 1. Each structure has a distinct strength and the right choice depends on reuse, readability, and complexity.

Structure Defined Where Best Used When Key Strength
Subquery Inline inside SELECT, WHERE, or FROM The logic is simple and used only once Concise for single-use calculations and filters
Correlated Subquery Inside WHERE, referencing the outer query The calculation depends on each outer row Row-by-row comparisons that JOINs cannot express
Derived Table Inside FROM, given an alias A filtered or aggregated result needs joining Acts like a table β€” supports JOINs on the result
CTE Before the main query using WITH Logic is complex, multi-step, or referenced more than once Readable, named steps β€” easiest to debug and extend
Key Concepts β€” Part 1

Subquery in SELECT Β· Subquery in WHERE Β· Subquery in FROM Β· Correlated subquery with outer reference Β· Derived table with alias Β· WITH … AS CTE Β· Chained CTEs Β· Nested subquery depth.

Part 2 introduces the filtering and conditional tools that allow SQL to handle the ambiguity, incompleteness, and complexity of real-world data. Rather than filtering on simple equality conditions, learners apply logic that checks for existence, compares against sets, responds to NULL, and classifies records into meaningful categories.

Topic 1
EXISTS

Filters rows based on whether a correlated subquery returns any results β€” without needing to know what those results are. More efficient than IN for large subquery result sets and better at expressing presence-or-absence logic.

Topic 2
ANY & ALL

Comparing a value against every result returned by a subquery. ANY returns true if at least one comparison passes. ALL returns true only if every comparison passes. Used for threshold and range checks against dynamic sets.

Topic 3
CASE β€” Advanced Patterns

Moving beyond simple categorization into multi-condition CASE logic β€” nested CASE statements, CASE inside aggregations, and CASE used to pivot data from rows into columns within a single query.

Topic 4
COALESCE & IFNULL

Replacing NULL values with meaningful defaults in query results. COALESCE evaluates a list of expressions and returns the first non-NULL. Applied to enrollment grades, instructor departments, and any field that may legitimately be missing.

Topic 5
CASE with Aggregations

Embedding CASE inside SUM and COUNT to count or total only the rows meeting a specific condition β€” producing multi-metric summary reports in a single query without multiple separate GROUP BY passes.

Topic 6
NULL-Safe Filtering

Using IS NULL and IS NOT NULL correctly β€” understanding why NULL cannot be compared with = or !=, and applying the right filtering pattern to include, exclude, or flag records with missing values.

EXISTS vs. IN β€” The Key Distinction

Two of the most commonly confused filtering patterns in intermediate SQL are EXISTS and IN. Each checks whether a value appears in a set β€” but they operate differently and have different performance characteristics on large datasets.

Pattern How It Works Best Used When
WHERE id IN (subquery) Evaluates the subquery once, produces a list, checks membership The subquery returns a small, simple list of values
WHERE EXISTS (correlated subquery) Evaluates the subquery once per outer row, stops at first match Checking presence or absence β€” especially in large tables
WHERE NOT EXISTS (correlated subquery) Returns outer rows where the subquery finds no match Finding records with no related rows β€” students with no enrollments
NULL and Equality

NULL cannot be compared using = or !=. A condition like WHERE grade = NULL will never match any row β€” not because no grades are NULL, but because NULL is not equal to anything, including itself. Always use IS NULL or IS NOT NULL when filtering for missing values.

Key Concepts β€” Part 2

EXISTS Β· NOT EXISTS Β· ANY Β· ALL Β· CASE WHEN THEN ELSE END (multi-condition) Β· CASE inside SUM/COUNT Β· COALESCE(expr, default) Β· IFNULL(expr, default) Β· IS NULL Β· IS NOT NULL.

Part 3 addresses one of the most practical and frequently underestimated skills in data work β€” preparing raw data for analysis directly in SQL, without external tools. Messy strings, inconsistent types, date format issues, NULL values, and duplicate records are all handled in-query before results reach a report or dashboard.

String Functions
TRIM
Remove Leading & Trailing Spaces

Cleans hidden whitespace from the beginning or end of a string β€” a common issue in data imported from spreadsheets, forms, and legacy systems. Prevents join failures caused by invisible spaces.

UPPER & LOWER
Standardize Case

Converts string values to a consistent case β€” ensuring that “Math”, “MATH”, and “math” are treated as the same value in comparisons, GROUP BY operations, and joins.

Messy String Handling
Combined Cleaning Patterns

Combining TRIM, UPPER, LOWER, REPLACE, and SUBSTRING to standardize free-text fields β€” removing unwanted characters, correcting formatting, and extracting the useful portion of a string.

Type Conversion & Date Functions
CAST & CONVERT
Change Data Types

Converting values from one data type to another β€” text to number, number to string, string to date. Essential when source data stores numbers as text or dates in non-standard formats.

Date Functions
DATEADD Β· DATEDIFF Β· EXTRACT

Calculating time differences between dates, adding or subtracting intervals, and extracting specific date parts (year, month, day). Used for tenure calculations, age derivation, and period-based filtering.

NULL Handling & Duplicate Detection
NULL Imputation
Replace Missing Values

Using COALESCE and IFNULL to substitute meaningful defaults for NULL values β€” replacing NULL grades with “Not Graded”, NULL departments with “Unknown”, or NULL dates with a reference date.

Duplicate Detection
Find Repeated Records

Using DISTINCT, GROUP BY + HAVING COUNT(*) > 1, and ROW_NUMBER() to identify records that appear more than once in a table β€” a critical step before any aggregation or reporting.

Duplicate Removal
Keep Only One Version

Using ROW_NUMBER() inside a CTE to assign a sequence number per group, then filtering for only the first record β€” the standard SQL pattern for de-duplicating a table while keeping the most recent or most relevant row.

The Data Cleaning Workflow

A consistent cleaning workflow applied in SQL before any analysis begins prevents errors from propagating into reports, dashboards, and aggregations. Part 3 teaches learners to work through cleaning problems in a structured, repeatable order.

Step Problem Addressed SQL Tools
1. Check for NULLs Missing values that will affect counts, joins, and aggregations IS NULL Β· COALESCE Β· IFNULL
2. Standardize strings Inconsistent case and whitespace preventing correct grouping and joins TRIM Β· UPPER Β· LOWER
3. Fix data types Numbers stored as text, dates in wrong format CAST Β· CONVERT
4. Handle dates Inconsistent date formats, missing date parts, derived date calculations DATEADD Β· DATEDIFF Β· EXTRACT
5. Detect duplicates Repeated records inflating counts and totals DISTINCT Β· GROUP BY HAVING COUNT(*) > 1
6. Remove duplicates Keeping only one version of each duplicated record ROW_NUMBER() OVER (PARTITION BY … ORDER BY …)
Key Concepts β€” Part 3

TRIM Β· UPPER Β· LOWER Β· REPLACE Β· SUBSTRING Β· CAST Β· CONVERT Β· DATEADD Β· DATEDIFF Β· EXTRACT Β· COALESCE Β· IFNULL Β· IS NULL Β· DISTINCT Β· GROUP BY + HAVING COUNT(*) > 1 Β· ROW_NUMBER() for deduplication.

Overview β€” Key Takeaways

Five foundational principles from SQL Intermediate.

Four Tables

Students, courses, instructors, and enrollments β€” the same dataset from SQL Basics, now supporting correlated subqueries, EXISTS filters, COALESCE logic, date functions, and duplicate detection.

Three Parts

Retrieval patterns, filtering logic, and data cleaning β€” three distinct intermediate skill sets that together produce the analytical foundation needed for real-world SQL work.

Right Structure

Subquery, correlated subquery, derived table, or CTE β€” each retrieval structure has a specific strength. Knowing which to choose for a given problem is the most important decision in Part 1.

NULL Awareness

NULL cannot be compared with =. EXISTS and NOT EXISTS express presence and absence more clearly than IN and NOT IN for large datasets. COALESCE and IFNULL replace missing values with meaningful defaults.

Clean Before Analyzing

Raw data rarely arrives in analysis-ready form. Applying string standardization, type conversion, date normalization, and duplicate removal directly in SQL β€” before any aggregation β€” produces results that can be trusted.