Streams
A stream defines the schema, content and volume of one data set within a simulation. At a minimum, a stream must have:
- a unique name (within its namespace)
- a JSON Schema
Currently, you can only add and update streams via your project config.
Schema
Every stream defines a JSON Schema; in the config, it is specified under the schema keyword. When a stream is included in a simulation, rngo guarantees that the data it generates for that stream will be valid against its schema.
See JSON Schema for all details.
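As a rough sketch of what could sit under the schema keyword, the fragment below spells out a small object schema. It assumes that rngo accepts standard JSON Schema keywords written as YAML; the field names id and email are hypothetical and chosen only for illustration.

```yaml
streams:
  users:
    schema:
      # Standard JSON Schema keywords, expressed as YAML
      type: object
      properties:
        id:            # hypothetical field
          type: integer
          minimum: 1
        email:         # hypothetical field
          type: string
      required:
        - id
        - email
```

Under this sketch, every generated users event would be an object with an integer id of at least 1 and a string email.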
Rate
Use the rate keyword to specify the rate at which the stream should produce new events, expressed in hertz. For example, this stream will produce events at a rate of roughly 1 event per 10 seconds:
streams:
  users:
    rate: 0.1
    schema:
      #...
rngo builds in variance, so the observed rate over any sub-interval of the simulation may be higher or lower than the configured one.
The value is an expression, so to make the rate increase over time, you could do something like this:
streams:
  users:
    rate: 0.1 + (0.0001 * sim.offset)
    schema:
      #...
The expression is sampled periodically over the course of the simulation, so the rate will change in steps.
Rates will always be adjusted to be greater than or equal to zero and less than 1000 events / second.
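To make the arithmetic concrete, the annotated copy below evaluates the increasing-rate expression at a few offsets. The sample offsets are arbitrary, and the unit of sim.offset is not stated in this section, so treat the comments as illustrative only.

```yaml
streams:
  users:
    # Value of the expression at a few sample offsets:
    #   sim.offset = 0      -> 0.1 + 0.0001 * 0     = 0.1 Hz (about 1 event per 10 s)
    #   sim.offset = 1000   -> 0.1 + 0.0001 * 1000  = 0.2 Hz (about 1 event per 5 s)
    #   sim.offset = 10000  -> 0.1 + 0.0001 * 10000 = 1.1 Hz
    rate: 0.1 + (0.0001 * sim.offset)
    schema:
      #...
```

Because the expression is sampled periodically, the effective rate would move between values like these in steps rather than continuously.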
Outputs
The outputs key configures how a stream outputs its data.
The values of the associated object either directly define an output format or reference a system.
Direct Outputs
You can directly specify a stream's output format like this:
streams:
  users:
    outputs:
      JSON:
        format: json
    schema:
      #...
When run to a file sink, the users stream will output one or more JSON files under /streams/users/JSON/. See Outputs for all configuration options.
If outputs is not specified for a stream, the following output configuration will be used by default:
outputs:
  default:
    format: json
System Outputs
You can also direct a stream's output to a system like this:
systems:
  db:
    type: postgres
streams:
  orders:
    outputs:
      database:
        system: db
        parameters:
          table: ORDER
    schema:
      #...
The orders stream will output CSV, because that is the format defined by the postgres system type. The system's import script will know to look for the CSV file(s) at /streams/orders/database/.
You may override system parameters in the associated object. For the above example, the import script will attempt to load the data into the ORDER table.
You can use shorthand to specify a system output by using the system name as a key. So this is a more concise way to write the above:
systems:
  db:
    type: postgres
streams:
  orders:
    outputs:
      db:
        parameters:
          table: ORDER
    schema:
      #...
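The two output styles can presumably be combined on one stream. The sketch below assumes that the outputs object may hold more than one entry, which the description of its values suggests but this section does not state outright; it reuses only keys shown in the examples above.

```yaml
systems:
  db:
    type: postgres
streams:
  orders:
    rate: 0.1
    outputs:
      # Direct output: the format is defined inline
      JSON:
        format: json
      # System output (shorthand): the key is the system name,
      # and the format comes from the postgres system type
      db:
        parameters:
          table: ORDER
    schema:
      #...
```

If this assumption holds, the orders stream would produce JSON files for the JSON entry and CSV for the db entry, each under a directory named after its output key.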