Streams

A stream defines the schema, content and volume of one data set within a simulation. At a minimum, a stream must have:

  1. a unique name (within its namespace)
  2. a JSON Schema

Currently, you can only add and update streams via your project config.

Schema

Every stream defines a JSON Schema, specified in the config under the schema keyword. When a stream is included in a simulation, rngo guarantees that the data it generates for that stream will be valid against its schema.

rngo will eventually fully support the 2020-12 draft of JSON Schema. For now, only a subset of the vocabularies and keywords is supported.

Custom Vocabulary

rngo extends JSON Schema with a custom vocabulary to support the generation of realistic data. All extension keywords are nested under the rngo keyword and do not change the validation semantics of the schema.

rngo.value

The rngo.value keyword specifies an expression that returns either a Set or a single Value. For example:

type: object
properties:
  name:
    type: string
    rngo:
      value: enums.fullNames

In this case, a value will be randomly selected from the enums.fullNames set when generating a value for the name property.

Upon stream creation or update, rngo will validate that every rngo.value expression returns a value or set of the correct type.

See Expressions for more information.

rngo.probability.type

When a schema lists multiple types, you can specify how likely each type is to be generated with the rngo.probability.type keyword.

The most common scenario for this is to make a value nullable:

type:
  - integer
  - null
rngo:
  probability:
    type:
      integer: 4
      null: 1

The keyword expects a map from the type name to a weight. Each type's probability is its weight divided by the sum of all weights, so the above defines a schema that produces an integer 80% of the time (4 out of 5) and null 20% of the time (1 out of 5).

Weights must be positive integers. By default, each type has a weight of 1.
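The weight arithmetic can be sketched in Python. This is only an illustration of how weights translate to probabilities, not rngo's implementation; the function names type_probabilities and pick_type are hypothetical.

```python
import random


def type_probabilities(types, weights=None):
    """Convert per-type weights into probabilities.

    Types missing from `weights` get the documented default weight of 1.
    """
    weights = weights or {}
    resolved = {}
    for type_name in types:
        weight = weights.get(type_name, 1)
        # The docs require weights to be positive integers.
        if not isinstance(weight, int) or weight < 1:
            raise ValueError(f"weight for {type_name!r} must be a positive integer")
        resolved[type_name] = weight
    total = sum(resolved.values())
    return {type_name: weight / total for type_name, weight in resolved.items()}


def pick_type(types, weights=None, rng=random):
    """Pick one type name at random, proportionally to its weight."""
    probabilities = type_probabilities(types, weights)
    names = list(probabilities)
    return rng.choices(names, weights=[probabilities[n] for n in names], k=1)[0]


# The nullable integer example: weights 4 and 1.
probabilities = type_probabilities(["integer", "null"], {"integer": 4, "null": 1})
print(probabilities)  # {'integer': 0.8, 'null': 0.2}
```

If the rngo.probability.type keyword is omitted, every type falls back to a weight of 1, so a schema with two types would produce each one half of the time.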

rngo.probability.properties

For object schemas, you can specify the likelihood that a non-required property will be included in the generated value via the rngo.probability.properties keyword:

type: object
properties:
  id:
    type: integer
  name:
    type: string
  homepage:
    type: string
required:
  - id
rngo:
  probability:
    properties:
      name: 0.9
      homepage: 0.5

The keyword expects a map from a property name to a probability between 0 and 1. So, the above schema will produce an object with a name property 90% of the time and a homepage property 50% of the time.

By default, a non-required property has a 60% chance of being included. A required property may not be referenced by rngo.probability.properties.
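The inclusion rules above can be sketched in Python. This is a minimal illustration under the stated defaults, not rngo's implementation; the function name included_properties and its parameters are hypothetical.

```python
import random

DEFAULT_PROBABILITY = 0.6  # documented default for non-required properties


def included_properties(properties, required, probabilities=None, rng=random):
    """Decide which properties appear in one generated object."""
    probabilities = probabilities or {}
    # The docs state that required properties may not be referenced.
    for name in probabilities:
        if name in required:
            raise ValueError(f"required property {name!r} may not have a probability")
    included = []
    for name in properties:
        if name in required:
            included.append(name)  # required properties are always present
            continue
        chance = probabilities.get(name, DEFAULT_PROBABILITY)
        # rng.random() returns a float in [0, 1), so 1.0 always includes
        # the property and 0.0 never does.
        if rng.random() < chance:
            included.append(name)
    return included


# The example schema: id is required, name and homepage are optional.
rng = random.Random(42)  # seeded only to make the run repeatable
sample = included_properties(
    ["id", "name", "homepage"],
    required={"id"},
    probabilities={"name": 0.9, "homepage": 0.5},
    rng=rng,
)
print(sample)
```

Over many generated objects, the share containing name should approach 90% and the share containing homepage should approach 50%, while id appears in every object.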

Rate

Use the rate keyword to specify the rate at which the stream should produce new events, expressed in hertz. For example, this stream will produce events at a rate of roughly 1 event per 10 seconds:

streams:
  users:
    rate: 0.1
    schema:
      #...

rngo builds in variance, so the observed rate over any sub-interval of the simulation may be higher or lower than the configured one.

The value is an expression, so to make the rate increase over time, you could do something like this:

streams:
  users:
    rate: 0.1 + (0.0001 * sim.offset)
    schema:
      #...

The expression is sampled periodically over the course of the simulation, so the rate will change in steps.

Rates are always adjusted to be greater than or equal to zero and less than 1000 events per second.
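The stepped behavior and the rate bounds can be sketched in Python. This is an illustration only: it assumes sim.offset is measured in seconds, that out-of-range rates are clamped to the nearest allowed value, and that the expression is sampled every 1000 seconds. None of these details are specified above, and the function names are hypothetical.

```python
import math

MAX_RATE = 1000.0  # events per second; documented upper bound (exclusive)


def rate_expression(offset_seconds):
    """The example expression from the config: 0.1 + (0.0001 * sim.offset)."""
    return 0.1 + 0.0001 * offset_seconds


def clamp_rate(rate):
    """Keep a rate in [0, 1000). Clamping is an assumption of this sketch."""
    largest_allowed = math.nextafter(MAX_RATE, 0.0)  # requires Python 3.9+
    return min(max(rate, 0.0), largest_allowed)


def stepped_rates(duration_seconds, sample_interval_seconds):
    """Sample the expression once per interval.

    The rate is held constant until the next sample, which is why the
    observed rate changes in steps rather than continuously.
    """
    offsets = range(0, duration_seconds, sample_interval_seconds)
    return [(offset, clamp_rate(rate_expression(offset))) for offset in offsets]


steps = stepped_rates(3000, 1000)
for offset, rate in steps:
    print(f"from offset {offset}: {rate:.2f} events per second")
```

With these assumptions the rate starts at 0.1 events per second and rises by 0.1 at each sample, so a shorter sampling interval would give smaller, more frequent steps.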

Systems

Streams may be associated with one or more systems. See the Systems reference for details.

Outputs

You can also customize how a stream's data is output. In the config, outputs are specified like this:

streams:
  users:
    outputs:
      - format: csv
    schema:
      #...