Streams
A stream defines the schema, content and volume of one data set within a simulation. At a minimum, a stream must have:
- a unique name (within its namespace)
- a JSON Schema
Currently, you can only add and update streams via your project config.
Schema
Every stream defines a JSON Schema, specified in the config under the schema keyword. When a stream is included in a simulation, rngo guarantees that the data it generates for that stream will be valid against its schema.
rngo will eventually fully support the 2020-12 draft of JSON Schema. For now, only a subset of its vocabularies and keywords is supported.
Custom Vocabulary
rngo extends JSON Schema with a custom vocabulary to support the generation of realistic data. All extension keywords are nested under the rngo keyword and do not change the validation semantics of the schema.
rngo.value
The rngo.value keyword specifies an expression that returns either a Set or a single Value. For example:
type: object
properties:
  name:
    type: string
    rngo:
      value: enums.fullNames
In this case, a value will be randomly selected from the enums.fullNames set when generating a value for the name property.
Upon stream creation or update, rngo validates that every rngo.value expression returns a value or set of the correct type.
See Expressions for more information.
rngo.probability.type
When a schema lists multiple types, you can use the rngo.probability.type keyword to specify how likely each type is to be generated.
The most common scenario for this is to make a value nullable:
type:
  - integer
  - "null"
rngo:
  probability:
    type:
      integer: 4
      "null": 1
The keyword expects a map from type name to weight. Each type's probability is its weight divided by the sum of all weights, so the schema above produces an integer 80% of the time (4 out of 5) and null 20% of the time (1 out of 5).
Weights must be positive integers. By default, each type has a weight of 1.
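The weight-to-probability arithmetic can be sketched in a few lines of Python. This is only an illustration of the rule described above, not rngo's implementation; the function names type_probabilities and pick_type are hypothetical.

```python
import random


def type_probabilities(types, weights=None):
    """Turn per-type weights into probabilities.

    Types without an explicit weight default to 1, as described above.
    """
    weights = weights or {}
    resolved = {name: weights.get(name, 1) for name in types}
    for name, weight in resolved.items():
        if not isinstance(weight, int) or weight < 1:
            raise ValueError(f"weight for {name!r} must be a positive integer")
    total = sum(resolved.values())
    return {name: weight / total for name, weight in resolved.items()}


def pick_type(types, weights=None, rng=random):
    """Pick one type name at random, in proportion to its weight."""
    probabilities = type_probabilities(types, weights)
    names = list(probabilities)
    return rng.choices(names, weights=[probabilities[n] for n in names], k=1)[0]


# The nullable-integer example: integer has weight 4, null has weight 1.
print(type_probabilities(["integer", "null"], {"integer": 4, "null": 1}))
# {'integer': 0.8, 'null': 0.2}
```

With no weights given, both types fall back to a weight of 1 and are equally likely.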
rngo.probability.properties
For object schemas, you can specify the likelihood that a non-required property will be included in the generated value via the rngo.probability.properties keyword:
type: object
properties:
  id:
    type: integer
  name:
    type: string
  homepage:
    type: string
required:
  - id
rngo:
  probability:
    properties:
      name: 0.9
      homepage: 0.5
The keyword expects a map from a property name to a probability between 0 and 1. So, the above schema will produce an object with a name property 90% of the time and a homepage property 50% of the time.
By default, a non-required property has a 60% chance of being included. A required property may not be referenced by rngo.probability.properties.
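The inclusion rule can be sketched in Python as follows. This is a minimal illustration under the rules stated above (required properties always present, a 0.6 default for other properties), not rngo's actual generator; included_properties and DEFAULT_INCLUSION are hypothetical names.

```python
import random

# Default chance that a non-required property is included, per the text above.
DEFAULT_INCLUSION = 0.6


def included_properties(properties, required, probabilities=None, rng=random):
    """Decide which properties appear in one generated object."""
    probabilities = probabilities or {}
    # A required property may not be given an inclusion probability.
    overlap = set(required) & set(probabilities)
    if overlap:
        raise ValueError(f"required properties may not be referenced: {sorted(overlap)}")
    chosen = []
    for name in properties:
        if name in required:
            chosen.append(name)  # required properties are always included
        elif rng.random() < probabilities.get(name, DEFAULT_INCLUSION):
            chosen.append(name)
    return chosen


# One draw for the schema above: id is always present,
# name appears about 90% of the time and homepage about 50%.
print(included_properties(
    ["id", "name", "homepage"],
    required=["id"],
    probabilities={"name": 0.9, "homepage": 0.5},
))
```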
Rate
Use the rate keyword to specify the rate at which the stream should produce new events, expressed in hertz (events per second). For example, this stream will produce events at a rate of roughly 1 event every 10 seconds:
streams:
  users:
    rate: 0.1
    schema:
      #...
rngo builds in variance, so the observed rate over any sub-interval of the simulation may be higher or lower than the configured one.
The value is an expression, so to make the rate increase over time, you could do something like this:
streams:
  users:
    rate: 0.1 + (0.0001 * sim.offset)
    schema:
      #...
The expression is sampled periodically over the course of the simulation, so the rate will change in steps.
Rates are always adjusted to be greater than or equal to zero and less than 1000 events per second.
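To make the stepping and adjustment concrete, here is a rough Python sketch of how a rate expression such as 0.1 + (0.0001 * sim.offset) might be evaluated over time. It rests on assumptions the text does not confirm: that sim.offset is measured in seconds, and that the expression is sampled at a fixed interval (the sample_every parameter is hypothetical, as are all function names). It is not rngo's implementation.

```python
import math

# Largest float strictly below 1000, so that adjusted rates stay "less than 1000".
MAX_RATE = math.nextafter(1000.0, 0.0)


def rate_at(offset):
    """The example expression, with sim.offset assumed to be in seconds."""
    return 0.1 + (0.0001 * offset)


def clamp_rate(rate):
    """Keep a rate within the documented bounds: >= 0 and < 1000 events per second."""
    return min(max(rate, 0.0), MAX_RATE)


def stepped_rates(duration, sample_every):
    """Evaluate the expression every `sample_every` seconds (assumed interval).

    Between samples the rate is held constant, which is why it changes in steps.
    """
    return [(offset, clamp_rate(rate_at(offset)))
            for offset in range(0, duration, sample_every)]


for offset, rate in stepped_rates(duration=300, sample_every=100):
    print(f"offset={offset}s rate={rate:.3f} Hz")
```

Under these assumptions, the rate starts at 0.1 events per second and rises by 0.01 at each 100-second sample.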
Systems
Streams may be associated with one or more systems. See the Systems reference for details.
Outputs
You can also customize how a stream's data is output. In the config, this is specified under the outputs keyword:
streams:
  users:
    outputs:
      - format: csv
    schema:
      #...