Writing Data with PormG

This guide covers all data manipulation operations in PormG, including creating, updating, and deleting records. PormG provides both single-record and bulk operations for efficient data management.

Creating Records
Updating Records
Bulk insert
Deleting Records
F Expressions

Creating Records

Single Record Creation

Use the .create() method to insert individual records:

The .create() method returns a Dict containing the data of the newly created record, including the database-generated primary key.

# Load your models
include("db/models.jl")
import .models as M

# Create a new record
query = M.Just_a_test_deletion |> object
new_record = query.create("name" => "test", "test_result" => 1)

Upon success, new_record will contain:

Dict{Symbol, Any} with 4 entries:
  :test_result  => 1
  :id           => 172
  :name         => "test"
  :test_result2 => missing

PormG validates that all non-nullable fields are provided. If a required field is missing, it will raise an ArgumentError.

# This will fail because the 'driverref' field is required and not provided.
query = M.Driver |> object
driver = query.create(
    "forename" => "Lewis",
    "surname" => "Hamilton",
    "nationality" => "British",
    "dob" => Date(1985, 1, 7)
)

This will produce the following error:

ERROR: ArgumentError: Error in insert, the field driverref not allow null

Creating with Relationships

To create a record with a ForeignKey relationship, you first need the ID of the related record. You can either create the related record first or use an existing one.

# Create related records
circuit_query = M.Circuit |> object
circuit = circuit_query.create(
    "name" => "Monaco",
    "country" => "Monaco",
    "circuitref" => "monaco",
    "location" => "Monaco",
    "lat" => 43.7347,
    "lng" => 7.4206,
    "alt" => 5,
    "url" => "https://example.com/circuits/monaco"
)

# Create a record with a foreign key reference and other fields
race_query = M.Race |> object;
race = race_query.create(
    "year" => 2024,
    "round" => 8,
    "circuitid" => circuit[:circuitid],
    "name" => "Monaco Grand Prix",
    "date" => Date(2024, 5, 26),
    "time" => Time(13, 0, 0), # Optional: 13:00 UTC
    "url" => "https://example.com/races/monaco2024",
    "fp1_date" => Date(2024, 5, 24),
    "fp1_time" => Time(11, 30, 0),
    "fp2_date" => Date(2024, 5, 25),
    "fp2_time" => Time(11, 30, 0),
    "fp3_date" => Date(2024, 5, 26),
    "fp3_time" => Time(11, 30, 0),
    "quali_date" => Date(2024, 5, 25),
    "quali_time" => Time(15, 0, 0),
    "sprint_date" => Date(2024, 5, 26),
    "sprint_time" => Time(12, 0, 0)
)

Upon success, race will contain:

Dict{Symbol, Any} with 18 entries:
  :fp3_time    => 11:30:00
  :fp2_date    => Date("2024-05-25")
  :sprint_time => 12:00:00
  :time        => 13:00:00
  :year        => 2024
  :fp1_date    => Date("2024-05-24")
  :raceid      => 1146
  ⋮            => ⋮

Creating Multiple Records Individually

# Create multiple records in a loop
test_data = [
    ("test10", 10),
    ("test11", 11),
    ("test12", 12)
]

query = M.Just_a_test_deletion |> object
for (name, test_result) in test_data
    query.create(
        "name" => name,
        "test_result" => test_result
    )
end

Bulk Insert

Basic insertion

Efficiently insert large datasets using bulk_insert():

using CSV, DataFrames

# Prepare data
df = CSV.File("drivers.csv") |> DataFrame

# Bulk insert from DataFrame
query = M.Driver |> object
bulk_insert(query, df)

By default, bulk_insert() chunks data into batches of 1000 rows for efficient insertion. However, for larger datasets with many columns, you might need to adjust the batch size to avoid hitting LibPQ limits. Unfortunately, when an insertion exceeds these limits, LibPQ can raise an unknown error. If you encounter such an error, try reducing the chunk_size.

query = M.Driver |> object
bulk_insert(query, df, chunk_size=500)

Bulk Insert with error handling

When performing bulk inserts, especially from sources like CSV files, you might encounter data that doesn't align with your database schema. A common issue is trying to insert a string value (like \N, often used to represent NULL in CSVs) into a numeric column. This mismatch will cause bulk_insert() to raise an error and halt the operation.

Example: A Failing Insert

Imagine you're loading data from a results.csv file.

# Load data from a CSV file
df = CSV.File("results.csv") |> DataFrame

# Attempt a bulk insert
query = M.Result |> object
bulk_insert(query, df)

If the milliseconds column in your CSV contains \N for some rows, while the corresponding database column is numeric, you'll get an error:

julia> bulk_insert(query, df)
ERROR: ArgumentError: Error in bulk_insert, the field milliseconds in row 6 has a value that can't be formatted: \N

The Solution: Pre-process Your Data

To fix this, you need to "clean" the data in your DataFrame before passing it to bulk_insert(). The goal is to convert problematic strings like \N into Julia's missing value. PormG will then correctly interpret missing as a NULL value for the database.

Here’s how you can clean the DataFrame. This code iterates through potentially problematic columns and replaces any \N strings with missing.

# Define the list of columns that need cleaning
cols_to_clean = [
    :position, :time, :milliseconds, :fastestlap, :rank, 
    :fastestlaptime, :fastestlapspeed, :number
]

# In each specified column, replace "\N" with `missing`
for col in cols_to_clean
    df[!, col] = map(x -> ismissing(x) || x == "\\N" ? missing : x, df[!, col])
end

# Now, the bulk insert will succeed
bulk_insert(query, df)

# You can verify the result
query |> do_count
26759

By pre-processing the data, you ensure it's compatible with your database schema, leading to a smooth and successful bulk insert.

Updating Records

Single Record Updates

Update specific records using filters:

# Update a single record
query = M.Driver |> object;
query.filter("forename" => "Lewis");
query |> do_count
1
df = query |> DataFrame
1×9 DataFrame
 Row │ number  surname   driverid  driverref  nationality  dob         code     url                                forename 
     │ Int32?  String?   Int64?    String?    String?      Date?       String?  String?                            String?  
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │     44  Hamilton         1  hamilton   British      1985-01-07  HAM      http://en.wikipedia.org/wiki/Lew…  Lewis

query.update("nationality" => "Xylos")

# Verify the update
df = query |> DataFrame
1×9 DataFrame
 Row │ number  surname   driverid  driverref  nationality  dob         code     url                                forename 
     │ Int32?  String?   Int64?    String?    String?      Date?       String?  String?                            String?  
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │     44  Hamilton         1  hamilton   Xylos       1985-01-07  HAM      http://en.wikipedia.org/wiki/Lew…  Lewis

query.update("nationality" => "British")

# Update multiple fields
query = M.Race |> object;
query.filter("raceid" => 1);
query.update(
    "name" => "Australian Grand Prix",
    "date" => Date(2024, 3, 24),
    "round" => 1
)

# If you want to see the query (without executing it)
query.update(
    "name" => "Australian Grand Prix",
    "date" => Date(2024, 3, 24),
    "round" => 1,
    show_query=true
)

The return value of the update() when show_query is true is:

┌ Info: UPDATE "race" AS "Tb"
│ SET "name" = $2, "date" = $3, "round" = $4
└ WHERE "Tb"."raceid" = $1

Updates with Relationships

# Update with relationships
query = M.Result |> object;
query.filter("driverid__nationality" => "British", "resultid" => 1);
query.update("points" => F("points") + 10)

# if you want to see the query (without executing it)
query.update("points" => F("points") + 10, show_query=true)

# Update with complex JOINs
query = M.Result |> object;
query.filter("raceid__circuitid__name__@icontains" => "Monaco", "resultid" => 7654);
query.update("points" => 11)


query.update("points" => 10, show_query=true)

Deleting Records

Single Record Deletion

If you want to use this example, first create a few records in the Just_a_test_deletion table as show in the Creating Multiple Records Individually section for details.

# Delete specific records
query = M.Just_a_test_deletion |> object;
query.filter("test_result" => 10)
delete(query)

If you had created records with testresult values of 10, 11, and 12, the above code would delete the record with testresult equal to 10, and the return value would be:

(1, Dict{String, Integer}("just_a_test_deletion" => 1))

# Delete with multiple conditions (AND want just see the query without executing it)
query = M.Just_a_test_deletion |> object;
query.filter("test_result__@in" => [11, 12], "test_result2__@isnull" => true)
delete(query, show_query=true)

The return value of the above code is:

┌ Info: DELETE FROM just_a_test_deletion WHERE "id" IN (SELECT
│    "Tb"."id" as id
│ FROM "just_a_test_deletion" as "Tb"
│ 
│ WHERE "Tb"."test_result" = ANY($1) AND
│    "Tb"."test_result2" IS NULL
└ )
(2, Dict{String, Integer}("just_a_test_deletion" => 2))

However, if you execute the above code, remove the show_query parameter, it will delete the records with test_result values of 11 and 12, and the return value will be:

(2, Dict{String, Integer}("just_a_test_deletion" => 2))

Bulk Deletion

# Delete all records (requires explicit permission)
query = M.Just_a_test_deletion |> object
delete(query, allow_delete_all=true)

# Delete with conditions
query = M.Result |> object
query.filter("raceid__year__@lt" => 1960)
delete(query)

Cascade Deletion

Foreign key relationships with on_delete="CASCADE" will automatically delete related records:

# This will also delete related Result records if configured with CASCADE
query = M.Race |> object
query.filter("name" => "Cancelled Grand Prix")
delete(query)

Bulk Update

Update multiple records from a DataFrame:

# Get existing data
query = M.Result |> object
df = query |> DataFrame

# Modify data
for (index, row) in enumerate(eachrow(df))
    row.points = row.points + 1  # Bonus point adjustment
    row.milliseconds = row.milliseconds * 1000  # Convert to microseconds
end

# Bulk update using ID as filter
bulk_update(query, df, columns=["points", "milliseconds"], filters=["resultid"])

Bulk Update with Custom Filters

# Update with additional static filters
query = M.Result |> object
df = query |> DataFrame

# Modify lap times
for row in eachrow(df)
    if !ismissing(row.milliseconds)
        row.milliseconds = row.milliseconds - 1000  # Improve lap time by 1 second
    end
end

# Bulk update with custom filters
bulk_update(
    query, df, 
    columns=["milliseconds"], 
    filters=["resultid", "statusid" => 1],
)

Bulk Update with unsupported join in bulk_update

# Unsupported join in bulk_update, you will get a error
bulk_update(
    query, df, 
    columns=["milliseconds"], 
    filters=["resultid", "statusid__status" => "Finished"],
)
ERROR: "Error in bulk_update, the join is not allowed in bulk_update"

F Expressions

F expressions allow database-level operations without loading data into Julia, similar to Django's F objects.

Basic F Expression Usage

# Increment a counter field
query = M.Driver |> object;
query.filter("driverid" => 1);
df = query |> DataFrame

1×9 DataFrame
 Row │ number  surname   driverid  driverref  nationality  dob         code     url                                forename 
     │ Int32?  String?   Int64?    String?    String?      Date?       String?  String?                            String?  
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │     44  Hamilton         1  hamilton   British      1985-01-07  HAM      http://en.wikipedia.org/wiki/Lew…  Lewis

query.update("number" => F("number") + 1)

df = query |> DataFrame

1×9 DataFrame
 Row │ number  surname   driverid  driverref  nationality  dob         code     url                                forename 
     │ Int32?  String?   Int64?    String?    String?      Date?       String?  String?                            String?  
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │     45  Hamilton         1  hamilton   British      1985-01-07  HAM      http://en.wikipedia.org/wiki/Lew…  Lewis

query.update("number" => F("number") - 1)

# Set one field equal to another
query = M.Driver |> object;
query.filter("driverid" => 1);
query.update("number" => F("driverid"))

df = query |> DataFrame

1×9 DataFrame
 Row │ number  surname   driverid  driverref  nationality  dob         code     url                                forename 
     │ Int32?  String?   Int64?    String?    String?      Date?       String?  String?                            String?  
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │      1  Hamilton         1  hamilton   British      1985-01-07  HAM      http://en.wikipedia.org/wiki/Lew…  Lewis

query.update("number" => 44)

With F, you can do all basic mathematical operations directly in the database.

query = M.Just_a_test_deletion |> object;
query |> do_exists && delete(query; allow_delete_all = true)
query.create("name" => "fexpr", "test_result" => 1)
query.create("name" => "fexpr", "test_result" => 2)
query.create("name" => "fexpr", "test_result" => 3)

query = M.Just_a_test_deletion |> object;
query.filter("test_result" => 1)

# Addition
query.update("test_result2" => F("test_result") + 1)

# Multiplication  
query.update("test_result2" => F("test_result2") * 2)

# Division
query.update("test_result2" => F("test_result2") / 2)

# Opereations with only F expressions
query.update("test_result2" => F("test_result") + F("test_result"))

# Subtraction
query.update("test_result2" => F("test_result2") - 1)

F Expressions with Relationships

# Use values from related models
query = M.Result |> object;
query.filter("resultid" => 3);
df = query |> DataFrame
1×18 DataFrame
 Row │ fastestlapspeed  points    raceid  number  time     fastestlaptime  driverid  position  laps    rank    statusid  resultid  fastestlap  grid    milliseconds  positionorder  positiontext  constructorid 
     │ Float64?         Float64?  Int64?  Int32?  String?  Time?           Int64?    Int32?    Int32?  Int32?  Int64?    Int64?    Int32?      Int32?  Int32?        Int32?         String?       Int64?        
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │         216.719       6.0      18       7  +8.163   00:01:28.09            3         3      58       5         1         3          41       7       5697779              3  3                         3


# Update using related field values 
query.update("grid" => F("driverid__number"))

df = query |> DataFrame
1×18 DataFrame
 Row │ fastestlapspeed  points    raceid  number  time     fastestlaptime  driverid  position  laps    rank    statusid  resultid  fastestlap  grid    milliseconds  positionorder  positiontext  constructorid 
     │ Float64?         Float64?  Int64?  Int32?  String?  Time?           Int64?    Int32?    Int32?  Int32?  Int64?    Int64?    Int32?      Int32?  Int32?        Int32?         String?       Int64?        
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │         216.719       6.0      18       7  +8.163   00:01:28.09            3         3      58       5         1         3          41       6       5697779              3  3                         3

# Complex relationship traversal
query.update("positiontext" => F("raceid__circuitid__country"))

# Yeah, I know, nothing this make any sense, but it's just an example
query.update("grid" => F("driverid__number"), show_query=true)
┌ Info: UPDATE "result" AS "Tb"
│ SET "grid" = "Tb_1"."number"
│ FROM "driver" AS "Tb_1"
└ WHERE "Tb"."driverid" = "Tb_1"."driverid" AND "Tb"."resultid" = $1

query.update("positiontext" => F("raceid__circuitid__country"), show_query=true)
┌ Info: UPDATE "result" AS "Tb"
│ SET "positiontext" = "Tb_2"."country"
│ FROM "race" AS "Tb_1", "circuit" AS "Tb_2"
└ WHERE "Tb"."raceid" = "Tb_1"."raceid" AND "Tb_1"."circuitid" = "Tb_2"."circuitid" AND "Tb"."resultid" = $1

This comprehensive guide covers all data writing operations in PormG. For more details on querying data, see the read.md documentation. For field definitions and validation, refer to fields.md.