rngo

Tutorial

Outputs and Systems

By default, a stream's data is output to a JSON file. But this can be changed by explicitly specifying an output for the stream. Update the config file to output data to CSV:

streams:
  users:
    outputs:
      - format: csv
    schema:
      type: object
      # etc

Rerun the simulation. The new file will now be 001.csv and the contents will be in CSV format:

id,full_name,created_at
"1","Micaela Batz","2024-05-28T05:16:38.635516+00:00"
"2","Everette Krajcik","2024-05-29T00:50:52.439516+00:00"
...
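If you want a quick sanity check of the generated file, something like this works. The sample rows are recreated inline so the snippet stands alone; in practice you'd point it at the real 001.csv:

```shell
# Recreate a small sample of 001.csv so this check is self-contained
cat > 001.csv <<'EOF'
id,full_name,created_at
"1","Micaela Batz","2024-05-28T05:16:38.635516+00:00"
"2","Everette Krajcik","2024-05-29T00:50:52.439516+00:00"
EOF

# Confirm the header and count the data rows
head -n 1 001.csv                               # id,full_name,created_at
echo "$(($(wc -l < 001.csv) - 1)) data rows"    # 2 data rows
```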

Next we'll specify a Postgres system, but first let's set things up. Install Docker if you haven't already and add this compose.yml:

version: '3.8'
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: pw
      POSTGRES_USER: rngo_tutorial
      POSTGRES_DB: rngo_tutorial
    ports:
      - "5431:5432"
    volumes:
      - ./db:/docker-entrypoint-initdb.d/

Add db/1-users.sql with a table that corresponds to the shape of the users stream:

CREATE TABLE users (
    id bigserial PRIMARY KEY,
    full_name text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

Run docker compose up -d and update the config:

systems:
  db:
    output:
      format: csv
    scripts:
      import: |
        PGPASSWORD="$RNGO_DB_PASSWORD" \
        psql \
          -h $RNGO_DB_HOST \
          -p $RNGO_DB_PORT \
          -U $RNGO_DB_USER \
          -d $RNGO_DB_DATABASE \
          -c "TRUNCATE {{table}} CASCADE;" \
          -c "\\COPY {{table}} FROM {{dataFile}} CSV HEADER;"
streams:
  users:
    systems:
      db:
        parameters:
          table: users
    schema:
      type: object
      # etc

We've added a simple system called "db" that defines how to import data into a Postgres database. It asks for the data to be output in CSV format and defines a bash import script that copies the CSV into a table.

The script is a template: stream-specific parameters can be interpolated via mustache syntax. In this case, we're referencing {{table}}, which is defined in the stream, and {{dataFile}}, which is the path to the CSV file and is provided by rngo.
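To make the interpolation concrete, here is a rough sketch of the substitution rngo performs for the users stream. This is illustrative only, not rngo's actual template engine, and the dataFile path shown is hypothetical:

```shell
# Illustrative only: substitute {{table}} and {{dataFile}} the way a
# mustache renderer would for the users stream
template='psql -c "TRUNCATE {{table}} CASCADE;" -c "\COPY {{table}} FROM {{dataFile}} CSV HEADER;"'
table='users'
dataFile='/tmp/001.csv'   # hypothetical path; rngo supplies the real one

# Bash pattern substitution stands in for the mustache renderer here
resolved=${template//'{{table}}'/$table}
resolved=${resolved//'{{dataFile}}'/$dataFile}
echo "$resolved"
```

After substitution, every placeholder is gone and the result is a plain psql invocation that truncates the table and bulk-loads the CSV.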

The script also references system-specific environment variables that must be available when the sim command is run. Add a .env file for this:

RNGO_DB_HOST=localhost
RNGO_DB_PORT=5431
RNGO_DB_USER=rngo_tutorial
RNGO_DB_PASSWORD=pw
RNGO_DB_DATABASE=rngo_tutorial
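If you also want these values in your own shell session, for example to avoid retyping them in manual psql commands, one way is to source the file. The .env is recreated inline here so the sketch stands alone:

```shell
# Recreate the .env from above so this snippet is self-contained
cat > .env <<'EOF'
RNGO_DB_HOST=localhost
RNGO_DB_PORT=5431
RNGO_DB_USER=rngo_tutorial
RNGO_DB_PASSWORD=pw
RNGO_DB_DATABASE=rngo_tutorial
EOF

# Export every variable the file defines into the current shell
set -a
. ./.env
set +a

echo "$RNGO_DB_USER@$RNGO_DB_HOST:$RNGO_DB_PORT"   # rngo_tutorial@localhost:5431
```

With the variables exported, the psql invocation below could use them instead of hard-coded values.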

Now run rngo sim again and then run:

PGPASSWORD=pw psql \
    -c "SELECT * FROM users" \
    -h localhost \
    -p 5431 \
    -U rngo_tutorial \
    rngo_tutorial

The result should be something like:

 id | full_name        | created_at
----+------------------+--------------------------------
  1 | Micaela Batz     | 2024-05-28 05:16:38.635516+00
  2 | Everette Krajcik | 2024-05-29 00:50:52.439516+00
...
 29 | Jacey Dicki      | 2024-06-24 21:33:51.897516+00
 30 | Agnes Brakus     | 2024-06-26 02:20:23.631516+00

You'll see that the fully-resolved import script is part of the downloaded data, at .rngo/simulations/last/import.sh; rngo sim runs this script as its last step.

You're able to customize systems to meet your needs, but rngo provides a default Postgres system definition that is equivalent to the one above. Update the config to reference the default:

systems:
  db:
    type: postgres
streams:
  users:
    systems:
      db: {}
    schema:
      type: object
      # etc