Statically typing DataFlows using advanced TypeScript concepts
Michael
Electric UI utilises a streaming computation model for performing transformations on inbound data. We call this model DataFlow.
By expressing the transformation as an incremental computation on a stream of inputs, the transformation can be run across historical, static, and future incoming data. DataFlows can consume DataSources or other DataFlows as inputs, allowing for composition of complex transformations from simple operators such as map, filter and interleave. The computation cost is incremental and time-sliceable. If the browser has an input pending, the computation may be paused and resumed once the main thread is idle. If data is added out of order, computation can restart, guaranteeing correctness.
To make this API as ergonomic as possible, DataFlows are statically typed with TypeScript, with as much type inference as feasible. Some of this typing involves non-trivial concepts that may be useful to others.
In this article we'll take a look at lookup types, the unknown and never types, overriding the type of class constructors, the infer declaration, conditional types, grabbing the inner type out of a generic, and forming unions from the inner members of an array of generic types.
A DataFlow
The following DataFlow combines two DataSources: one that provides XYZ position information, and one that provides global state change information in the form of a color. The colorMixer DataFlow combines these XYZ positions with the latest state color.
type XYZEvent = {
  x: number
  y: number
  z: number
}

type ColorEvent = {
  color: string
}

type MixedEvent = XYZEvent & ColorEvent

function colorMixer(colorQueryable: Queryable<ColorEvent>, xyzQueryable: Queryable<XYZEvent>) {
  let currentColorState = 'blue'

  const colorSetter = forEach(colorQueryable, (data, time) => {
    currentColorState = data.color
  })

  const colorXYZ = map(xyzQueryable, (data, time) => ({
    x: data.x,
    y: data.y,
    z: data.z,
    color: currentColorState,
  }))

  return interleave([colorXYZ, colorSetter])
}
Events are passed through the DataFlow in temporal order, modifying the currentColorState, emitting new mixed events when new positions are received.
┌─────────┐
│         │   ┌─────┐                        ┌───┐
│  Color  ├───►Green├────────────────────────►Red│
│         │   └──┬──┘                        └─┬─┘
└─────────┘      │                             │
                 │                             │
                 │                             │
              ┌──▼──┐     ┌─────┐     ┌─────┐  │  ┌─────┐
              │Green│     │Green│     │Green│  │  │ Red │
              │  1  ├─────►  2  ├─────►  3  ├──┴──►  4  │
              └──▲──┘     └──▲──┘     └──▲──┘     └──▲──┘
                 │           │           │           │
                 │           │           │           │
┌─────────┐      │           │           │           │
│         │     ┌┴┐         ┌┴┐         ┌┴┐         ┌┴┐
│   XYZ   ├─────►1├─────────►2├─────────►3├─────────►4│
│         │     └─┘         └─┘         └─┘         └─┘
└─────────┘
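To make the ordering concrete, here is a toy model of the same mixing logic. It is only a sketch: plain arrays of timestamped events stand in for Queryables, and the names TimedEvent, Tagged and mixColors are hypothetical, not part of the Electric UI API. The real operators are incremental, whereas this version simply sorts everything up front.

```typescript
// Toy model: arrays of timestamped events stand in for Queryables.
type TimedEvent<T> = { time: number; data: T }
type XYZEvent = { x: number; y: number; z: number }
type ColorEvent = { color: string }
type MixedEvent = XYZEvent & ColorEvent

// Tag each event with its origin so both inputs can share one ordered list.
type Tagged =
  | { kind: 'color'; event: TimedEvent<ColorEvent> }
  | { kind: 'xyz'; event: TimedEvent<XYZEvent> }

function mixColors(
  colors: TimedEvent<ColorEvent>[],
  positions: TimedEvent<XYZEvent>[],
): TimedEvent<MixedEvent>[] {
  // Interleave both inputs into a single, temporally ordered sequence.
  const ordered: Tagged[] = [
    ...colors.map((event): Tagged => ({ kind: 'color', event })),
    ...positions.map((event): Tagged => ({ kind: 'xyz', event })),
  ].sort((a, b) => a.event.time - b.event.time)

  let currentColorState = 'blue'
  const output: TimedEvent<MixedEvent>[] = []

  for (const item of ordered) {
    if (item.kind === 'color') {
      // Color events are consumed: they update state and emit nothing.
      currentColorState = item.event.data.color
    } else {
      // Position events are re-emitted with the latest color attached.
      output.push({
        time: item.event.time,
        data: { ...item.event.data, color: currentColorState },
      })
    }
  }

  return output
}

const mixed = mixColors(
  [
    { time: 0, data: { color: 'green' } },
    { time: 35, data: { color: 'red' } },
  ],
  [
    { time: 10, data: { x: 1, y: 0, z: 0 } },
    { time: 20, data: { x: 2, y: 0, z: 0 } },
    { time: 30, data: { x: 3, y: 0, z: 0 } },
    { time: 40, data: { x: 4, y: 0, z: 0 } },
  ],
)
```

With these inputs the first three positions pick up green and the fourth picks up red, matching the diagram.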
Typing the MessageDataSource
Electric UI maintains a key-value store of the current state of hardware. The key is the MessageID, and the value can be any arbitrary data. This interface is declared globally per project.
// An easy way to declare zero-runtime-cost opaque types:
type TemperatureCelcius = number & { __temp_celcius: true }

declare global {
  interface ElectricUIDeveloperState {
    pub_time: number
    quat: [x: number, y: number, z: number, w: number]
    orient: [p: number, y: number, r: number]
    lin_acc: [x: number, y: number, z: number]
    ang_vel: [p: number, y: number, r: number]
    baro: number // kPa
    temp: TemperatureCelcius

    // In general
    [MessageID: string]: any
  }
}
MessageDataSources are tied to specific keys via the MessageID:
import { MessageDataSource } from '@electricui/core-timeseries'

const temperature = new MessageDataSource('temp')
const runtimeSent = new MessageDataSource<number>('runtime')
Ideally, the temperature MessageDataSource correctly infers its type as MessageDataSource<TemperatureCelcius>. It is also desirable for type overrides to be available as an escape hatch, such as with runtimeSent.
The type of a key can be looked up using a lookup type.
interface Store {
  test: 42
}

type TestType = Store['test']

type LookupKeyType<K extends keyof Store> = Store[K]
With a function this is relatively simple. However, the version below doesn't work as well as one would hope: the union of all possible value types is returned instead of the exact type.
interface Store {
  test: 42
  foo: string
}

const store: Store = {
  test: 42,
  foo: 'bar',
}

type TestType = Store['test']

function lookup(key: keyof Store): Store[typeof key] {
  return store[key]
}

const res = lookup('test')
//    ^?: string | 42
A generic type argument is required to have the type narrowed exactly.
// ...

function lookup<K extends keyof Store>(key: K): Store[K] {
  return store[key]
}

const res = lookup('test')
//    ^?: 42
Class constructor return types cannot be overridden, and it doesn't seem like this feature will be added any time soon.
As a result, the following does not work:
interface Store {
  key: 42
}

class Container<M extends keyof Store> {
  constructor(key: M): Container<Store[M]> {
    //               ^?
    // Error: Type annotation cannot appear on a constructor declaration. (1093)
  }
}
Instead, we alias the class and override the constructor function in a type assertion.
// Note the underscore
class _MessageDataSource<
  T = unknown // the type of the events of this MessageID
> implements Queryable<T> {
  constructor(public messageID: MessageID) {}
  // ... trimmed for brevity
}

type MessageDataSource<T> = _MessageDataSource<T>

export const MessageDataSource = _MessageDataSource as {
  new <
    M extends keyof ElectricUIDeveloperState = keyof ElectricUIDeveloperState
  >(
    messageID: M,
  ): MessageDataSource<ElectricUIDeveloperState[M]>
}
A generic type argument, M, captures the type of the messageID argument, which is then used to extract the event type from ElectricUIDeveloperState.
interface ElectricUIDeveloperState {
  key: string
  foo: 42
}

const test = new MessageDataSource('foo')
//    ^?: MessageDataSource<42>
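The same aliasing pattern can be tried in isolation. The sketch below uses hypothetical names (Store, _Box, Box) rather than the Electric UI classes, and gives the class a latest field purely so that the type parameter is visible in the checks.

```typescript
interface Store {
  temp: number
  label: string
}

// The "real" class, whose constructor cannot declare a custom return type.
class _Box<T> {
  latest: T | undefined = undefined
  constructor(public key: string) {}
}

// Type alias so that Box<T> can be used in type positions.
type Box<T> = _Box<T>

// Value alias: the same class, but with a re-typed constructor signature
// that looks up the value type from the key.
const Box = _Box as {
  new <K extends keyof Store>(key: K): Box<Store[K]>
}

const temp = new Box('temp') // inferred as _Box<number>
temp.latest = 21.5

const label = new Box('label') // inferred as _Box<string>
label.latest = 'kitchen'

// These annotations only compile if the inference above worked.
const checkedTemp: Box<number> = temp
const checkedLabel: Box<string> = label
```

At runtime Box and _Box are the same constructor; only the type seen by callers changes.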
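To support type overrides, an additional type argument (T on the class, O on the constructor signature) is allowed, defaulting to unknown. If it's unknown, the conditional type falls back to the above extraction. If it's any other type, it provides the override. One caveat: any also satisfies the unknown extends O check, so passing any explicitly falls back to the extraction as well.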
// Note the underscore
class _MessageDataSource<
  T = unknown, // the type of the events of this MessageID
  M = keyof ElectricUIDeveloperState // the MessageID
> implements Queryable<T> {
  constructor(public messageID: M & MessageID) {}
  // ... trimmed for brevity
}

type MessageDataSource<T, M> = _MessageDataSource<T, M>

export const MessageDataSource = _MessageDataSource as {
  new <
    O = unknown,
    M extends keyof ElectricUIDeveloperState = keyof ElectricUIDeveloperState
  >(
    messageID: M,
  ): MessageDataSource<unknown extends O ? ElectricUIDeveloperState[M] : O, M>
}
This results in the following behaviour:
interface ElectricUIDeveloperState {
  temp: number
  runtime: number
}

const temperature = new MessageDataSource('temp')
//    ^?: MessageDataSource<number, 'temp'>

const runtimeSent = new MessageDataSource<number>('runtime')
//    ^?: MessageDataSource<number, string>
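Unfortunately TypeScript doesn't support partial type argument inference: once one type argument is written explicitly, the rest fall back to their defaults rather than being inferred. It isn't a huge deal in this case; it just results in the second type parameter defaulting to string when doing a type override.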
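The unknown-as-default trick can also be reproduced on its own. This is a minimal sketch with hypothetical names (Store, Resolve, _Source, Source); the value types are deliberately different from the override so the effect is visible.

```typescript
interface Store {
  temp: number
  runtime: string
}

// If no override was supplied (O is still unknown), look the type up by key.
// Otherwise use the override as-is.
type Resolve<O, K extends keyof Store> = unknown extends O ? Store[K] : O

class _Source<T> {
  latest: T | undefined = undefined
  constructor(public key: string) {}
}

const Source = _Source as {
  new <O = unknown, K extends keyof Store = keyof Store>(key: K): _Source<Resolve<O, K>>
}

// No type argument: O defaults to unknown, K is inferred as 'temp'.
const inferred = new Source('temp')
inferred.latest = 21.5 // compiles because the event type is number

// Explicit override: O is boolean, K falls back to its default.
const overridden = new Source<boolean>('runtime')
overridden.latest = true // compiles because the event type is boolean, not string
```

The check is written as unknown extends O rather than O extends unknown because every type extends unknown, so the reversed form would always take the first branch.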
Typing the DataFlow
DataFlows take an input and produce an output. They are composed from primitive operators, some of which are listed below with their type signature:
// Maps an event from one form to another
map<I, O>(queryable: Queryable<I>, mapper: (data: I, time: Time) => O): Queryable<O>

// Executes a closure for each event, potentially consuming it
forEach<I>(queryable: Queryable<I>, func: (data: I, time: Time) => void, consuming: true): Queryable<never>
forEach<I>(queryable: Queryable<I>, func: (data: I, time: Time) => void, consuming: false): Queryable<I>

// Filters incoming events based on a predicate function
filter<I>(queryable: Queryable<I>, predicate: (data: I, time: Time) => boolean): Queryable<I>
Internally the DataFlow keeps track of its inputs and outputs to provide type inference for callback functions. Externally it only exposes its output type. All DataFlows alias themselves to Queryable<Output>.
We can use the infer operator and a conditional type to extract the inner type from a Queryable or DataFlow.
// For plain Queryables
type GetQueryableInner<T> = T extends Queryable<infer I> ? I : never

// For DataFlows and Queryables
type GetDataFlowInput<T> = T extends DataFlow<infer A, infer B>
  ? A // input if DataFlow
  : T extends Queryable<infer C>
  ? C // inner type if Queryable
  : never

type GetDataFlowOutput<T> = T extends DataFlow<infer A, infer B>
  ? B // output if DataFlow
  : T extends Queryable<infer C>
  ? C // inner type if Queryable
  : never
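To experiment with these helpers outside of Electric UI, the sketch below substitutes toy Queryable and DataFlow interfaces. Their shapes (an events array and a push method) are assumptions made for illustration only, and the unused infer positions are replaced with any.

```typescript
// Toy stand-ins; the real interfaces are richer than this.
interface Queryable<T> {
  events: T[]
}

interface DataFlow<I, O> extends Queryable<O> {
  push(input: I): void
}

type GetDataFlowInput<T> = T extends DataFlow<infer A, any>
  ? A // input if DataFlow
  : T extends Queryable<infer C>
  ? C // inner type if Queryable
  : never

type GetDataFlowOutput<T> = T extends DataFlow<any, infer B>
  ? B // output if DataFlow
  : T extends Queryable<infer C>
  ? C // inner type if Queryable
  : never

const source: Queryable<number> = { events: [1, 2, 3] }

const labels: string[] = []
const flow: DataFlow<number, string> = {
  events: labels,
  push: input => {
    labels.push(String(input))
  },
}

// Each annotation only compiles if the helper extracted the expected type.
const flowInput: GetDataFlowInput<typeof flow> = 5 // number
const flowOutput: GetDataFlowOutput<typeof flow> = 'five' // string
const sourceInner: GetDataFlowOutput<typeof source> = 1 // number

flow.push(flowInput)
```

A plain Queryable has no push method, so it fails the first DataFlow check and falls through to the second branch.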
The forEach operator has an overload that determines if it consumes events without re-emitting them. This consumption can be represented with an output of the never type. In a union, the never type evaporates away, consumed by the other members.
forEach<I>(queryable: Queryable<I>, func: (data: I, time: Time) => void, consuming: true): Queryable<never>
This behaviour is useful with the interleave operator.
Interleave
The interleave operator combines multiple Queryables, ordering their events temporally.
Imagine the original DataFlow again:
function colorMixer(colorQueryable: Queryable<ColorEvent>, xyzQueryable: Queryable<XYZEvent>) {
  let currentColorState = 'blue'

  const colorSetter = forEach(colorQueryable, (data, time) => {
    currentColorState = data.color
  })

  const colorXYZ = map(xyzQueryable, (data, time) => ({
    x: data.x,
    y: data.y,
    z: data.z,
    color: currentColorState,
  }))

  return interleave([colorXYZ, colorSetter])
}
The expected return type of this DataFlow would be:
colorMixer(colorQueryable: Queryable<ColorEvent>, xyzQueryable: Queryable<XYZEvent>): DataFlow<ColorEvent | XYZEvent, MixedEvent>
To achieve type inference of the members of the array of inputs to the interleave function, a generic type parameter Q is used, extending any array of Queryables.
function interleave<Q extends Queryable<any>[]>(
  queryables: Q,
): DataFlow<GetDataFlowInput<Q[number]>, GetDataFlowOutput<Q[number]>> {}
An array can be indexed by numbers, so Q[number] gives us the union of Queryables (including their inner types), in this case:
Q[number] = Queryable<ColorEvent> | Queryable<XYZEvent>
The GetDataFlowInput and GetDataFlowOutput helpers can be used to extract the relevant inner types.
Finally, the naive typing for forEach would result in DataFlow<ColorEvent | XYZEvent, ColorEvent | MixedEvent>. However, since the forEach in this case consumes its events, its output type is never, which disappears from the union and leaves only MixedEvent, resulting in the correct final type:
colorMixer(/* ... */): DataFlow<ColorEvent | XYZEvent, MixedEvent>
While DataFlows internally know both their inputs and outputs, they are presented as Queryables for downstream use, and as such the final signature is simply:
colorMixer(/* ... */): Queryable<MixedEvent>
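The Q[number] indexing and the disappearing never can both be seen in a small standalone sketch. The Queryable shape and the Inner helper here are simplified stand-ins, and the toy interleave body just concatenates events without ordering them, since only the signature matters for this illustration.

```typescript
// Toy stand-in for the real interface.
interface Queryable<T> {
  events: T[]
}

// Distributes over a union of Queryables and collects their inner types.
type Inner<Q> = Q extends Queryable<infer T> ? T : never

function interleave<Q extends Queryable<any>[]>(queryables: Q): Queryable<Inner<Q[number]>> {
  const events: Inner<Q[number]>[] = []
  for (const queryable of queryables) {
    // No temporal ordering here; this toy only demonstrates the typing.
    events.push(...queryable.events)
  }
  return { events }
}

const numbers: Queryable<number> = { events: [1, 2] }
const names: Queryable<string> = { events: ['a'] }
const consumed: Queryable<never> = { events: [] } // e.g. a consuming forEach

const merged = interleave([numbers, names, consumed])

// Compiles only if the inner type is number | string, with never gone.
const values: (number | string)[] = merged.events
```

Because never is the empty union, adding it to number | string changes nothing, which is why a consuming operator leaves no trace in the output type.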
Coalesce
The coalesce operator combines multiple Queryables into a keyed object. It emits new Events when any of its constituent members update. In the above color mixer implementation, Events are only emitted when the position changes. In the following coalesce based implementation, the DataFlow also emits events at the previous position if the color alone changes.
function colorMixer(colorQueryable: Queryable<ColorEvent>, xyzQueryable: Queryable<XYZEvent>) {
  return coalesce({
    x: map(xyzQueryable, data => data.x),
    y: map(xyzQueryable, data => data.y),
    z: map(xyzQueryable, data => data.z),
    color: map(colorQueryable, data => data.color),
  })
}
Again the expected return type is a Queryable<MixedEvent>.
To achieve this, the coalesce function is generic over the object structure it receives.
function coalesce<S extends KeyedQueryables>(structure: S)
The KeyedQueryables type is a non-nested object with string keys and Queryable values:
type KeyedQueryables = {
  [key: string]: DataFlow<any> | Queryable<any>
}
An additional helper type is created to extract the output values of an object of Queryables of the type KeyedQueryables.
type UnwrapKeyedQueryables<T extends KeyedQueryables> = {
  [K in keyof T]: GetDataFlowOutput<T[K]>
}
Each key matches a key in the original structure, and each value is inferred using the conditional infer type, GetDataFlowOutput, created above.
This results in the output of:
Queryable<UnwrapKeyedQueryables<{
  x: Queryable<number>;
  y: Queryable<number>;
  z: Queryable<number>;
  color: Queryable<string>;
}>>
Which results in:
Queryable<{
  x: number;
  y: number;
  z: number;
  color: string;
}>
Which matches MixedEvent.
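The mapped type can be exercised with a toy function that is generic over its structure in the same way. Everything below is a sketch under simplified assumptions: Queryable is reduced to an events array, Inner plays the role of GetDataFlowOutput, and latestOfEach is a hypothetical stand-in that only snapshots the latest value per key rather than emitting on every update.

```typescript
// Toy stand-in for the real interface.
interface Queryable<T> {
  events: T[]
}

type KeyedQueryables = {
  [key: string]: Queryable<any>
}

type Inner<Q> = Q extends Queryable<infer T> ? T : never

// Same keys as the input structure, with each Queryable unwrapped.
type UnwrapKeyedQueryables<S extends KeyedQueryables> = {
  [K in keyof S]: Inner<S[K]>
}

function latestOfEach<S extends KeyedQueryables>(structure: S): UnwrapKeyedQueryables<S> {
  const source: KeyedQueryables = structure
  // Built up dynamically, so typed loosely inside the function body.
  const result: any = {}
  for (const key of Object.keys(source)) {
    const events = source[key].events
    result[key] = events[events.length - 1]
  }
  return result
}

const xs: Queryable<number> = { events: [1, 2, 3] }
const colors: Queryable<string> = { events: ['green', 'red'] }

const snapshot = latestOfEach({ x: xs, color: colors })

// Compiles only if the structure was unwrapped key by key.
const typedSnapshot: { x: number; color: string } = snapshot
```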
Map
Events cannot have a data field which is purely the undefined value, because that value is used to signal "don't emit an event" in callbacks that return the event data alone. This pattern is used due to a limitation in TypeScript described under Limitations below.
function map<T, O>(
  queryable: Queryable<T>,
  mapper: (data: T, time: Time) => O extends undefined ? never : O,
): Queryable<O>
Using the never type in a conditional type, undefined can be disallowed as a return type for the mapper.
However, the implicit return of undefined by a bare return statement, or by the omission of a return statement, is considered void instead of undefined. As a result, the following doesn't error:
function foo() {
  return map(new DataSource<number>(), (data, time) => {
    return
  })
}
To capture our intent, we use void instead, which captures both undefined and the bare return, or lack of a return statement.
function map<T, O>(
  queryable: Queryable<T>,
  mapper: (data: T, time: Time) => O extends void ? never : O,
): Queryable<O>
This now errors correctly:
function foo() {
  return map(new DataSource<number>(), (data, time) => {
    return // Type 'void' is not assignable to type 'never'. ts(2345)
  })
}
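The difference between the two conditions can be checked at the type level without any of the DataFlow machinery. In this sketch, RejectUndefined, RejectVoid and the Equal helper are hypothetical names introduced for illustration; Equal is a commonly used type-equality trick and is not part of TypeScript itself.

```typescript
type RejectUndefined<O> = O extends undefined ? never : O
type RejectVoid<O> = O extends void ? never : O

// Compile-time equality check between two types.
type Equal<A, B> = (<T>() => T extends A ? 1 : 2) extends (<T>() => T extends B ? 1 : 2)
  ? true
  : false

// A callback with a bare return is inferred as returning void.
const bare = () => {
  return
}
type BareReturn = ReturnType<typeof bare>

// Each constant only compiles if the stated equality holds.
const voidSlipsThrough: Equal<RejectUndefined<BareReturn>, void> = true
const voidIsCaught: Equal<RejectVoid<BareReturn>, never> = true
const undefinedIsCaught: Equal<RejectVoid<undefined>, never> = true
const numbersPass: Equal<RejectVoid<number>, number> = true
```

void is not assignable to undefined, so the first condition lets it through, whereas undefined is assignable to void, so the second condition catches both.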
Advance
The advance operator is one operator that uses an undefined return value to signal that no Event should be emitted that round.
function advance<R, O>(
  queryable: Queryable<R>,
  callback: (time: number) => O | undefined,
): Queryable<O extends void ? never : O>
If the return of an advance operator is statically analysable as always being void, the Queryable can be typed as Queryable<never> to remove it from the union created by a later interleave operator.
function foo() {
  return advance(new DataSource<number>(), time => {
    return undefined
  })
}
// function foo(): Queryable<never>
Limitations
The iterateEmit operator gives raw access to the underlying API that powers the majority of other operators. It simply receives each event, and is allowed to emit other events.
function iterateEmit<T, O>(
  queryable: Queryable<T>,
  iterate: (event: Event<T>, emit: (event: Event<O>) => void) => void,
): Queryable<O>
Unfortunately, its return type cannot be inferred automatically from the usage of the emit callback. The Promise constructor suffers from a similar limitation.
function foo(queryable: Queryable<any>) {
  return iterateEmit(queryable, (event, emit) => {
    emit(new Event(event.time, 42))
    // The output of iterateEmit should be Queryable<number>
  })
  // Instead it is inferred as Queryable<unknown>
}
Maybe one day this will be possible, but for now the operator must be manually type annotated when it is used. As a result of this limitation, operators like map require the event data to be returned by the callback, instead of having a separate emit callback.
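The limitation is easy to reproduce with a much smaller function of the same shape. The collect helper below is hypothetical and exists only to show that a type parameter which appears solely in the parameter of a callback's callback has nothing to be inferred from.

```typescript
// O only appears as the argument type of emit and in the return type,
// so calls to emit inside the callback body are not inference sites.
function collect<O>(run: (emit: (value: O) => void) => void): O[] {
  const out: O[] = []
  run(value => {
    out.push(value)
  })
  return out
}

// Without an annotation, O falls back to unknown.
const untyped = collect(emit => {
  emit(42)
})

// With an explicit type argument the result is usable as number[].
const typed = collect<number>(emit => {
  emit(42)
})

const doubled: number[] = typed.map(value => value * 2)
```

Returning the value from the callback instead, as map does, gives the compiler a return type to infer from, which is why that style is preferred for the higher-level operators.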
Results
All charts, loggers and other consumers of Queryables are generic over the return type, resulting in autocomplete and compile-time checking of inputs and accessors.
interface LineChartProps<T> {
  /**
   * A reference to a `Queryable` for event ingestion.
   */
  dataSource: Queryable<T>
  /**
   * An accessor on the `Event`s data to produce a column of data. If the event is produced by a MessageQueryable,
   * the eventData argument will be the payload of the message.
   */
  accessor?: (data: T, time: number) => number
  /**
   * An accessor on the `Event`s data to produce the color for this point.
   */
  colorAccessor?: (data: T, time: number) => Color
}
If complex DataFlows are used to process incoming data, their types are inferred and maintained throughout the pipeline.
const pos = new DataSource<XYZEvent>()
const col = new DataSource<ColorEvent>()
const mixed = colorMixer(col, pos)

const Page = () => {
  return (
    <ChartContainer>
      <LineChart
        dataSource={mixed}
        accessor={data => data.q} // Errors!
        colorAccessor={data => data.color} // Autocompletes!
      />
    </ChartContainer>
  )
}
Finally, here's the color mixer in action, combining the liftoff, thrusting, coasting and chutes deployed state with the XYZ position to color the flight path of a model rocket.