Subqueries and CTEs

This page covers nested queries with IN (SELECT ...), Common Table Expressions (WITH), CTE join types, deep join paths, and mixing CTEs with custom joins.


Subqueries in Filters

Pass a query object to @in to create a server-side IN (SELECT ...) predicate:

# Build the subquery
subquery = M.Status.objects
subquery.filter("status" => "Engine")
subquery.values("statusId")

# Use it in the main query
query = M.Result.objects
query.filter("statusId__@in" => subquery)
query.values("resultId", "statusId", "statusId__status", "grid", "driverId")
df = query |> DataFrame

This generates:

SELECT ... FROM "result" as "Tb" ...
WHERE "Tb"."statusid" IN (SELECT "Tb"."statusid" FROM "status" as "Tb" WHERE "Tb"."status" = $1)

[!TIP] Subqueries run entirely on the server — PormG does not materialize the subquery in Julia. This is much more efficient for large datasets.


Subqueries with Additional Filters

Combine subquery @in with other filter conditions:

subquery = M.Status.objects
subquery.filter("status" => "Engine")
subquery.values("statusId")

query = M.Result.objects
query.filter("statusId__@in" => subquery, "driverId__@lte" => 7)
query.values(
    "resultId",
    "statusId",
    "statusId__status",
    "grid",
    "driverId",
    "raceId__date__@quarter"
)
query.order_by("raceId__date__quarter")
df = query |> DataFrame

Common Table Expressions (CTEs)

CTEs (SQL WITH clauses) are useful when a query becomes easier to reason about in stages. PormG supports CTEs through the .with() method.

Why Use CTEs?

  • Readability — Break complex queries into named stages.
  • Aggregation — Pre-compute aggregates and join them back to the main query.
  • Reuse — Reference the same subquery multiple times without duplication.

Basic CTE with JOIN

Define a subquery, give it a name, and join it to the main query via join_field:

using PormG: Count

# Define the CTE: count results per driver (where status = 1)
driver_stats = M.Result.objects
driver_stats.filter("statusId" => 1)
driver_stats.values("driverId", "total_results" => Count("resultId"))

# Main query: join the CTE to Result via driverId
main_query = M.Result.objects
main_query.with("stats" => driver_stats, join_field="driverId" => "driverId")

main_query.filter("resultId__@lte" => 100)
main_query.values("resultId", "driverId", "stats__total_results")
df = main_query |> DataFrame

The .with() method:

  1. Emits the subquery as a WITH stats AS (SELECT ...) clause.
  2. Creates a LEFT JOIN stats ON result.driverid = stats.driverid.
  3. Makes stats__total_results available for selection via values().

CTE with Multiple Aggregated Fields

using PormG: Count, Sum

# CTE with multiple aggregates
stats = M.Result.objects
stats.filter("raceId__@lte" => 100)
stats.values(
    "driverId",
    "total_results"         => Count("resultId"),
    "total_grid_positions"  => Sum("grid")
)

# Join CTE to Driver model
query = M.Driver.objects
query.with("driver_stats" => stats, join_field="driverId" => "driverId")

query.filter("driverId__@lte" => 50)
query.values(
    "driverId",
    "forename",
    "surname",
    "driver_stats__total_results",
    "driver_stats__total_grid_positions"
)
df = query |> DataFrame

Multiple CTEs

Attach multiple CTEs to the same query:

# CTE 1: Recent races
recent_races = M.Race.objects
recent_races.filter("year__@gte" => 2020)
recent_races.values("raceId", "name", "year")

# CTE 2: Top drivers
top_drivers = M.Driver.objects
top_drivers.filter("driverId__@lte" => 100)
top_drivers.values("driverId", "forename", "surname")

# Main query with both CTEs
query = M.Result.objects
query.with("recent" => recent_races, join_field="raceId" => "raceId")
query.with("top_d" => top_drivers, join_field="driverId" => "driverId")

query.values("resultId", "recent__name", "top_d__forename", "points")
query.filter("recent__name__@isnull" => false, "top_d__forename__@isnull" => false)
df = query |> DataFrame

Each .with() call adds another WITH clause and JOIN to the final SQL.


Choosing Join Types for CTEs

By default, CTEs use LEFT JOIN. Use join_type="INNER" to filter out non-matching rows:

using PormG: Sum

high_scorers = M.Result.objects
high_scorers.filter("points__@gte" => 10)
high_scorers.values("driverId", "max_points" => Sum("points"))

query = M.Driver.objects
query.with(
    "high_scorers" => high_scorers,
    join_field="driverId" => "driverId",
    join_type="INNER"   # Only include drivers who have high scores
)

query.values("driverId", "forename", "max_points" => "high_scorers__max_points")
query.filter("driverId__@lte" => 100)
df = query |> DataFrame

Available CTE Join Types

Join TypeBehavior
"LEFT"Default. All main-table rows are kept; unmatched CTE columns are missing.
"INNER"Only rows that match the CTE are returned.

The SQL builder can also render "RIGHT" and "FULL" join keywords, but these are not the primary documented workflow.


The join_field can contain a multi-level path with __. PormG builds the intermediate joins needed to connect the main query to the CTE:

using PormG: Count

# CTE: count drivers per nationality
nat_stats = M.Driver.objects
nat_stats.values("nationality", "driver_count" => Count("driverId"))

# Main query: join CTE via a deep path (Result → Driver → nationality)
query = M.Result.objects
query.with("stats" => nat_stats, join_field="driverId__nationality" => "nationality")

query.filter("raceId__year" => 2023)
query.values("raceId__name", "driverId__surname", "stats__driver_count")
df = query |> DataFrame

PormG automatically:

  1. Joins Result → Driver (via the driverId ForeignKey).
  2. Links Driver.nationality to stats.nationality (the CTE join column).

CTE Without a JOIN

If you call .with("name" => subq) without providing join_field, PormG still emits the CTE in the WITH clause but does not join it to the main table:

subq = M.Status.objects.filter("status" => "Engine").values("statusId")

query = M.Result.objects
query.with("engine_statuses" => subq)   # Declared but not joined

This is valid SQL, but the CTE data won't be accessible through values(). The main use case is combining a CTE declaration with a subquery filter:

query = M.Result.objects
query.with("sub" => subq)
query.filter("statusId__@in" => subq)   # Reuse the subquery in a filter

[!TIP] Always provide join_field when you want CTE data accessible via .values(). Without it, the CTE is emitted but produces no additional projectable columns.


Mixing CTEs and Custom Joins

PormG allows CTEs and custom joins (cjoin) in the same query. Parameter ordering is deterministic across these combinations:

using PormG: cjoin

# 1. Define the CTE
top_const = M.Constructor.objects
    .filter("constructorId__@lte" => 5)
    .values("constructorId", "name")

query = M.Result.objects

# 2. Attach the CTE
query.with("tc" => top_const, join_field="constructorId" => "constructorId")

# 3. Add a custom join
query.cjoin("driverId" => "Driver", filters=["nationality" => "German"])

# 4. Filter and select across physical tables, custom joins, and CTEs
query.filter("positionOrder" => 1)
query.values("resultId", "tc__name", "driverId__surname")
df = query |> DataFrame

Chaining Multiple Custom Joins

query = M.Result.objects
query.cjoin("driverId" => "Driver", filters=["nationality" => "Brazilian", "forename" => "Ayrton"])
query.cjoin("raceId" => "Race", filters=["year" => 1991])

query.filter("positionOrder" => 1)
query.values("resultId", "driverId__surname", "raceId__name")
df = query |> DataFrame

See Custom Joins for the full cjoin() and on() documentation.


Summary

FeatureSyntaxUse Case
Subquery IN"field__@in" => subqueryFilter by a set computed on the server.
CTE with JOIN.with("name" => subq, join_field=...)Pre-aggregate data and join it.
CTE without JOIN.with("name" => subq)Declare for use in filters only.
CTE INNER JOINjoin_type="INNER"Only keep matching rows.
Deep join pathjoin_field="a__b" => "field"Link CTE via multi-level relationships.
CTE + cjoin.with(...) + .cjoin(...)Combine all join strategies.

Next Steps