COUNT DISTINCT and FILTER

Lesson Description

Build more reliable KPIs with COUNT(DISTINCT ...) and FILTER-based conditional aggregation. This lesson helps you avoid inflated counts and produce metrics stakeholders can trust.

Easy Project

Mini project: Create a hiring shortlist query that filters by role, salary range, and hire date with deterministic ordering.

Two powerful extensions to standard aggregates for precise counting and conditional aggregation.

COUNT(DISTINCT col) — count unique non-NULL values

-- How many different departments are represented?
SELECT COUNT(DISTINCT department_id) AS unique_departments
FROM employees;

Compare to plain COUNT(department_id) which counts all non-NULL occurrences.

FILTER (WHERE …) — conditional aggregation

FILTER limits an aggregate to rows matching a condition. It is cleaner than CASE WHEN inside the aggregate:

SELECT
  COUNT(*) FILTER (WHERE status `=` 'shipped')   AS shipped,
  COUNT(*) FILTER (WHERE status `=` 'delivered') AS delivered,
  COUNT(*) FILTER (WHERE status `=` 'cancelled') AS cancelled
FROM orders;

Works with any aggregate function:

SELECT
  AVG(salary)                               AS overall_avg,
  AVG(salary) FILTER (WHERE department_id `=` 60) AS it_avg
FROM employees;

Combining GROUP BY with FILTER

SELECT
  department_id,
  COUNT(*)                               AS total,
  COUNT(*) FILTER (WHERE salary `>` 8000)  AS high_earners
FROM employees
GROUP BY department_id
ORDER BY department_id;

FILTER vs CASE WHEN inside aggregate

-- These are equivalent, but FILTER is more readable:
COUNT(*) FILTER (WHERE salary `>` 8000)
COUNT(CASE WHEN salary `>` 8000 THEN 1 END)

Tips

FILTER is a PostgreSQL (and SQL:2003 standard) feature — not available in older MySQL or SQLite.
COUNT(DISTINCT col) cannot use the FILTER clause directly in all databases; wrap in a subquery if needed.
For pivot-style reports, FILTER + GROUP BY replaces multiple sub-selects.

Learning context

This lesson is part of Aggregations & Grouping. Focus on the core idea in COUNT DISTINCT and FILTER, then validate with deliberate practice.

What to master

Separate dimensions from measures
Group only by columns needed in final grain
Use HAVING only for aggregated filters

Common mistakes

Mixing row-level and aggregated logic
Grouping by too many columns
Using WHERE for aggregated conditions

High-level strategy

Draft base rows first
Add GROUP BY and verify expected number of groups
Add HAVING and ordering last

Task ladder

Baseline query: get a correct first output.
Add one complication: edge case, join, filter, or ranking rule.
Re-run and refine until output is stable and explainable.

Transparent data checks

Check row count before and after each major clause.
Validate one manual sample row for correctness.
Confirm final ordering and column naming for trust.

Retention loop

Run the solution once.
Rewrite from memory with one variation.
Explain the logic in plain English in one sentence.

Logical reasoning for commands

GROUP BY

Why: Defines one output row per grouping key.

Memory cue: Group key = output row identity.

HAVING

Why: Filters after aggregates are calculated.

Memory cue: WHERE before groups, HAVING after groups.

COUNT / SUM / AVG

Why: Converts row-level data into business metrics.

Memory cue: Measure only after grain is correct.

Concept check

Quick check: COUNT DISTINCT and FILTER

3 quick questions. One at a time. Instant score at the end.

Exercise 1

Count distinct departments in use

Enterprise Workforce AnalyticseasyPostgreSQL

Return a single value unique_departments — the number of distinct department_id values across all employees.

Autocomplete: start typing table/column namesPaste supported

No results yet

Run a query to see results here

Exercise 2

Shipped vs cancelled order counts

Commerce Operations IntelligenceeasyPostgreSQL

From the store sandbox, return two columns in a single query: shipped (orders with status 'shipped') and cancelled (status 'cancelled'). Use FILTER.

Autocomplete: start typing table/column namesPaste supported

No results yet

Run a query to see results here

Exercise 3

Per-department headcount and high earners

Enterprise Workforce AnalyticseasyPostgreSQL

Return department_id, total employees, and high_earners (those with salary > 8000) per department. Use FILTER. Order by department_id.

Autocomplete: start typing table/column namesPaste supported

No results yet

Run a query to see results here

Final checkpoint

End-of-course high-level quiz

Pass mark: 80%. Review each explanation after you submit.

1. Which clause should you use to filter grouped results?

WHEREHAVINGORDER BYLIMIT

2. What is the safest way to avoid accidental row multiplication in joins?

Use DISTINCT everywhereCheck join keys and compare row counts step-by-stepAlways use LEFT JOINOnly select one column

3. Which feature is best for ranking rows within partitions?

GROUP BYCASEWindow functionsUNION ALL

4. Why should final analytical queries usually include ORDER BY?

It always improves speedIt ensures deterministic outputIt removes duplicatesIt groups data

5. When query logic gets complex, what improves readability most?

Nested subqueries onlyCTEs with named stepsRemoving aliasesSelecting *

Aggregations & Grouping

Lesson 4 / 4

COUNT DISTINCT and FILTER

0/3 solved3 to unlock next

Previous★ Final lesson

Prev★ Final lesson