Lesson 4 of 4

Progress: 0/3 exercises solved (0%)

← PreviousFinal lesson

Solve all exercises below to unlock the next lesson.

COUNT DISTINCT and FILTER

Lesson Description

Use filtering patterns to extract exactly the rows you need and avoid noisy, incorrect result sets.

Easy Project

Mini project: Create a hiring shortlist query that filters by role, salary range, and hire date with deterministic ordering.

Two powerful extensions to standard aggregates for precise counting and conditional aggregation.

COUNT(DISTINCT col) — count unique non-NULL values

-- How many different departments are represented?
SELECT COUNT(DISTINCT department_id) AS unique_departments
FROM employees;

Compare to plain COUNT(department_id) which counts all non-NULL occurrences.

FILTER (WHERE …) — conditional aggregation

FILTER limits an aggregate to rows matching a condition. It is cleaner than CASE WHEN inside the aggregate:

SELECT
  COUNT(*) FILTER (WHERE status = 'shipped')   AS shipped,
  COUNT(*) FILTER (WHERE status = 'delivered') AS delivered,
  COUNT(*) FILTER (WHERE status = 'cancelled') AS cancelled
FROM orders;

Works with any aggregate function:

SELECT
  AVG(salary)                               AS overall_avg,
  AVG(salary) FILTER (WHERE department_id = 60) AS it_avg
FROM employees;

Combining GROUP BY with FILTER

SELECT
  department_id,
  COUNT(*)                               AS total,
  COUNT(*) FILTER (WHERE salary > 8000)  AS high_earners
FROM employees
GROUP BY department_id
ORDER BY department_id;

FILTER vs CASE WHEN inside aggregate

-- These are equivalent, but FILTER is more readable:
COUNT(*) FILTER (WHERE salary > 8000)
COUNT(CASE WHEN salary > 8000 THEN 1 END)

Tips

  • FILTER is a PostgreSQL (and SQL:2003 standard) feature — not available in older MySQL or SQLite.
  • COUNT(DISTINCT col) cannot use the FILTER clause directly in all databases; wrap in a subquery if needed.
  • For pivot-style reports, FILTER + GROUP BY replaces multiple sub-selects.

Learning context

This lesson is part of Aggregations & Grouping. Focus on the core idea in COUNT DISTINCT and FILTER, then validate with deliberate practice.

What to master

  • Separate dimensions from measures
  • Group only by columns needed in final grain
  • Use HAVING only for aggregated filters

Common mistakes

  • Mixing row-level and aggregated logic
  • Grouping by too many columns
  • Using WHERE for aggregated conditions

High-level strategy

  • Draft base rows first
  • Add GROUP BY and verify expected number of groups
  • Add HAVING and ordering last

Task ladder

  1. Baseline query: get a correct first output.
  2. Add one complication: edge case, join, filter, or ranking rule.
  3. Re-run and refine until output is stable and explainable.

Transparent data checks

  • Check row count before and after each major clause.
  • Validate one manual sample row for correctness.
  • Confirm final ordering and column naming for trust.

Retention loop

  • Run the solution once.
  • Rewrite from memory with one variation.
  • Explain the logic in plain English in one sentence.

Logical reasoning for commands

GROUP BY

Why: Defines one output row per grouping key.

Memory cue: Group key = output row identity.

HAVING

Why: Filters after aggregates are calculated.

Memory cue: WHERE before groups, HAVING after groups.

COUNT / SUM / AVG

Why: Converts row-level data into business metrics.

Memory cue: Measure only after grain is correct.

Concept check

Quick check: COUNT DISTINCT and FILTER

3 quick questions. One at a time. Instant score at the end.

Exercise 1

Count distinct departments in use

Enterprise Workforce Analyticseasy

Return a single value unique_departments — the number of distinct department_id values across all employees.

Run a query to see results here.
Exercise 2

Shipped vs cancelled order counts

Commerce Operations Intelligenceeasy

From the store sandbox, return two columns in a single query: shipped (orders with status 'shipped') and cancelled (status 'cancelled'). Use FILTER.

Run a query to see results here.
Exercise 3

Per-department headcount and high earners

Enterprise Workforce Analyticseasy

Return department_id, total employees, and high_earners (those with salary > 8000) per department. Use FILTER. Order by department_id.

Run a query to see results here.

Final checkpoint

End-of-course high-level quiz

Pass mark: 80%. Review each explanation after you submit.

1. Which clause should you use to filter grouped results?

2. What is the safest way to avoid accidental row multiplication in joins?

3. Which feature is best for ranking rows within partitions?

4. Why should final analytical queries usually include ORDER BY?

5. When query logic gets complex, what improves readability most?

Lesson 4 of 4

0/3 solved (0%)

← PreviousFinal lesson

Finish this lesson to unlock next.