Collections, Tables and Tags

Released on 2024-08-06

This update includes a number of improvements to DataFlow including a new Event tagging system, a new chart type, new annotations, DataFlow controlled tables, more control over domains, perceptual and wall time performance improvements, plus many bug fixes and general improvements.

New features

Collections and Tables

Previously, Charts and DataFlow could only produce and consume single items. If an annotation component needed to draw multiple items at the same time, multiple components were needed. If an operator required multiple items, an array could conceptually be passed as the 'single item', but maintaining that array was expensive: each change produced a new array and triggered redundant processing. A lack of data could not be represented without a union with null, as there was no way to distinguish between the absence of an event and an event that might arrive in the future.

Collections remove these restrictions, allowing for the incremental maintenance of an optionally ordered list of items. Annotations can now render multiple items per component. Windows of data are incrementally maintained with constant memory and compute cost per event. Collections can be interleaved natively to combine multiple sets of data.

In the example above, power generation data is ingested as a stream and split into daily time spans. Each time span is rendered as a row in the table below the charts. Per technology statistics are calculated, across each day, and combined into daily totals.
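The idea of maintaining a window incrementally, rather than rebuilding an array on every change, can be sketched in isolation. The snippet below is a minimal illustration and not the DataFlow implementation: the names TimedSample and SlidingWindowSum are hypothetical, and the window is assumed to cover the interval (latest time - duration, latest time].

```typescript
// Hypothetical names throughout: this is an illustration, not the DataFlow API.
interface TimedSample {
  time: number // milliseconds
  value: number
}

class SlidingWindowSum {
  private readonly durationMs: number
  private items: TimedSample[] = []
  private head = 0 // index of the oldest sample still inside the window
  private total = 0

  constructor(durationMs: number) {
    this.durationMs = durationMs
  }

  // Add one sample, evict samples that have left the window, return the sum.
  // Each sample is added once and evicted once, so the cost per event is
  // amortised constant regardless of how many samples the window holds.
  push(sample: TimedSample): number {
    this.items.push(sample)
    this.total += sample.value
    const cutoff = sample.time - this.durationMs
    while (this.head < this.items.length) {
      const oldest = this.items[this.head]
      if (oldest === undefined || oldest.time > cutoff) break
      this.total -= oldest.value
      this.head++
    }
    // Compact occasionally so evicted samples do not accumulate in memory.
    if (this.head > 1024 && this.head * 2 > this.items.length) {
      this.items = this.items.slice(this.head)
      this.head = 0
    }
    return this.total
  }

  get size(): number {
    return this.items.length - this.head
  }
}
```

The running total is updated by adding the new value and subtracting evicted ones, so no pass over the whole window is needed when an event arrives.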

Mouse Interactions

The Mouse signal now emits data on a per-chart basis, even when the mouse is not hovering over the chart. It also emits drag information for user-selected regions, along with the chart boundaries, so that labels can be placed on the chart out of the way of the data.

The closestPointOnLineSegment and closestLineSegmentIntersection operators provide geometric line intersection and accelerated raycasting operations. These allow for precise interactions on sparse data where linear interpolation is valid.
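The geometry behind a closest-point-on-segment query is a clamped projection. The function below is a standalone sketch of that calculation, not the operator itself; Point2D and closestPointOnSegment are hypothetical names chosen for this example.

```typescript
// Hypothetical helper: illustrates the geometry only, not the DataFlow operator.
interface Point2D {
  x: number
  y: number
}

// Project p onto the line through a and b, then clamp to the segment [a, b].
function closestPointOnSegment(a: Point2D, b: Point2D, p: Point2D): Point2D {
  const abx = b.x - a.x
  const aby = b.y - a.y
  const lengthSquared = abx * abx + aby * aby
  // A zero-length segment is a single point.
  if (lengthSquared === 0) return { x: a.x, y: a.y }
  // t is the normalised position of the projection along the segment.
  const t = ((p.x - a.x) * abx + (p.y - a.y) * aby) / lengthSquared
  const clamped = Math.max(0, Math.min(1, t))
  return { x: a.x + clamped * abx, y: a.y + clamped * aby }
}
```

Because the result is interpolated between the two end points, it is only meaningful where linear interpolation between samples is valid, as noted above.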

For displaying data next to the mouse explicitly, the new MouseAttached component reduces the jitter compared to rendering an ElementAnnotation on the closest point.

New Components

An AreaChart component has been added. It can be used for typical 'integrated' quantities such as energy in the power generation example above, or for confidence interval style charts such as below:

Confidence band

New annotations in the form of BoxAnnotation and LineSegmentAnnotation have been added and used throughout the examples on this page. In the example below, as data is bulk-loaded from a device, areas without data are marked with a faint red box and an icon. The time frontier is marked similarly.

Continuity transitions

When data is flying by too fast, you can now pause and resume the flow of data with the new ChartPauseButton component.

Authoritative Domains

Domains can now be named via a domainID prop. DataSourcePrinters can reference them specifically, or the DomainWrapper component can be used to set a default for a group of components. This makes it simple to point particular DataSourcePrinters at different domains of data.

Authoritative domains

Tags

Metadata can now be attached to events, next to the data itself. Data from multiple devices can be interleaved then filtered later in the pipeline.

<ScatterPlot
  dataSource={allDeviceSig4}
  accessor={(data, time, tags) => ({ x: time, y: data })}
  size={4}
  colorAccessor={(data, time, tags) => getDeviceIDColor(tags.deviceID)}
/>

The new tag operator lets you apply tags to data inline.

const tagged = tag(dataSource, (data, time, tags) => ({
  ...tags,
  session: 'A',
}))
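The tag, interleave and filter-later pattern can be modelled on plain arrays to show the data shape involved. This is a sketch under assumptions: TaggedSample, tagSamples and interleaveByTime are hypothetical helpers written for this example and are not the DataFlow operators.

```typescript
// Hypothetical model of tagged events on plain arrays; not the DataFlow API.
type SampleTags = Record<string, string | number>

interface TaggedSample<T> {
  data: T
  time: number
  tags: SampleTags
}

// Apply a tagging function to every sample, mirroring the (data, time, tags) signature.
function tagSamples<T>(
  samples: TaggedSample<T>[],
  tagger: (data: T, time: number, tags: SampleTags) => SampleTags
): TaggedSample<T>[] {
  return samples.map(sample => ({
    data: sample.data,
    time: sample.time,
    tags: tagger(sample.data, sample.time, sample.tags),
  }))
}

// Merge several sources into one list ordered by time.
function interleaveByTime<T>(...sources: TaggedSample<T>[][]): TaggedSample<T>[] {
  const merged: TaggedSample<T>[] = []
  for (const source of sources) merged.push(...source)
  return merged.sort((left, right) => left.time - right.time)
}

const fromDeviceA = tagSamples(
  [
    { data: 1, time: 0, tags: {} },
    { data: 2, time: 20, tags: {} },
  ],
  (data, time, tags) => ({ ...tags, deviceID: 'A' })
)
const fromDeviceB = tagSamples([{ data: 5, time: 10, tags: {} }], (data, time, tags) => ({
  ...tags,
  deviceID: 'B',
}))

// Interleave first, then filter on the tag later in the pipeline.
const allDevices = interleaveByTime(fromDeviceA, fromDeviceB)
const onlyDeviceB = allDevices.filter(sample => sample.tags.deviceID === 'B')
```

Because the tags travel alongside each item, the filter can be placed after the merge without the data itself needing to carry a device identifier.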

The new search and collect operators contain methods to tag output data given the search query.

closestSpatially(dataTaggedWithPeaks, mouseSignal, {
  positionAccessor: (data, time, tags) => ({ x: time, y: data }),
  mapResult: (data, time, tags, position, searchData) => ({
    x: position.x,
    y: position.y,
  }),
  tagResult: (data, time, tags, position, searchData) => ({
    bboxXMin: searchData.dragXMin,
    bboxXMax: searchData.dragXMax,
    bboxYMin: searchData.dragYMin,
    bboxYMax: searchData.dragYMax,
  }),
})

New operators

Various new operators have been added.

  • collectTimeSpan and collectTimeSpans select a region or regions based on a time interval.
  • collectValueSpan and collectValueSpans select a region or regions based on a value interval.
  • collectBoundingBox and collectBoundingBoxes select a region or regions based on a bounding box.
  • lineSegments turns pairs of consecutive point events into a line segment event.
  • closestPointOnLineSegment selects the closest point on a line segment.
  • closestLineSegmentIntersection selects the closest line intersection to a point.
  • integral calculates the trapezoidal area under a curve.
  • window emits a Collection over time.
  • batch and buffer emit collections, as rolling or fixed batches.
  • head and tail emit collections of a fixed number of items, ordered by a value.
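As a reference for what a trapezoidal integral computes, here is a minimal standalone sketch. XYPoint and trapezoidalArea are hypothetical names, and returning 0 for fewer than two points is a choice made for this example rather than documented operator behaviour.

```typescript
// Hypothetical helper illustrating the trapezoidal rule; not the integral operator.
interface XYPoint {
  x: number
  y: number
}

// Sum the area of the trapezoid formed by each pair of consecutive points.
function trapezoidalArea(points: XYPoint[]): number {
  let area = 0
  let previous: XYPoint | null = null
  for (const current of points) {
    if (previous !== null) {
      // Average height of the two points multiplied by the horizontal distance.
      area += ((previous.y + current.y) / 2) * (current.x - previous.x)
    }
    previous = current
  }
  return area
}
```

Each step only needs the previous point and the running total, so the calculation can be updated as points arrive.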

Better handling of null data

Many operators now return null to indicate a lack of a result. Aggregation operators can return null for mathematically impossible operations. Singular search operators return null for no events found.

These affect the following statistics operators:

  • geometricMean
  • harmonicMean
  • interquartileRange
  • min
  • max
  • quantile
  • quantiles
  • product
  • rootMeanSquare
  • standardDeviation
  • sampleStandardDeviation
  • sum
  • variance
  • sampleVariance

Note that mode now returns a Collection, instead of an array of modes.
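The pattern of returning null for an undefined result can be illustrated with two standalone functions. These are hypothetical helpers written for this example, not the DataFlow operators: the minimum of an empty set and the sample variance of fewer than two values are both undefined, so both return null.

```typescript
// Hypothetical helpers showing null as "no result"; not the DataFlow operators.
function minOrNull(values: number[]): number | null {
  if (values.length === 0) return null // no values, so there is no minimum
  let smallest = Infinity
  for (const value of values) {
    if (value < smallest) smallest = value
  }
  return smallest
}

function sampleVarianceOrNull(values: number[]): number | null {
  // The sample variance divides by (n - 1), which is undefined for n < 2.
  if (values.length < 2) return null
  let sum = 0
  for (const value of values) sum += value
  const mean = sum / values.length
  let squaredDeviations = 0
  for (const value of values) squaredDeviations += (value - mean) * (value - mean)
  return squaredDeviations / (values.length - 1)
}

// Consumers handle the null case once, at the point of display.
function describeResult(result: number | null): string {
  return result === null ? 'no data' : result.toString()
}
```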

Non-collection search operators now signal a lack of data as a null value. This fixes the previous behaviour where the result of the last successful search would 'stick' after it had stopped being valid.

This affects all search operators:

  • closestByValue
  • closestLineSegmentIntersection
  • closestPointOnLineSegment
  • closestSpatially
  • closestTemporally

Components now automatically skip these events, and that is represented in the type system to help keep implementations clean.

Previously:

<DataSourcePrinter
  dataSource={closestPeaks}
  accessor={closestEvent =>
    closestEvent ? Math.abs(closestEvent.leftX - closestEvent.rightX) : 'nothing selected'
  }
/>
<ElementAnnotation
  dataSource={closestPeaks}
  accessor={(data, time, tags) =>
    data !== null
      ? {
          x: (data.leftX + data.rightX) / 2,
          y: data.mouseData.chartYMin,
        }
      : null
  }
  visibilitySource={mouseSignal}
  visibilityAccessor={data => data.hovered || data.hasDragged}
>
  {/* */}
</ElementAnnotation>

And now:

<DataSourcePrinter
  dataSource={closestPeaks}
  accessor={closestEvent => Math.abs(closestEvent.leftX - closestEvent.rightX)}
  defaultValue="nothing selected"
/>
<ElementAnnotation
  dataSource={closestPeaks}
  accessor={(data, time, tags) => ({
    x: (data.leftX + data.rightX) / 2,
    y: data.mouseData.chartYMin,
  })}
  visibilitySource={mouseSignal}
  visibilityAccessor={data => data.hovered || data.hasDragged}
>
  {/* */}
</ElementAnnotation>

Advance scheduling

Previously the emission of events could only occur as a result of events being added. If a sliding window collected events, then discarded them as they left the screen, there was no way to update quantities that changed as a result of those events leaving. As a result, operators like min and max would be incorrect until a new event came into the window to trigger an update.

The addition of events can now schedule additional advance calls later, to efficiently allow for the removal of events in windows, such as above.
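One way to picture advance scheduling is a windowed maximum that reports when it next needs to be woken up. The sketch below is an assumption-laden illustration, not the DataFlow scheduler: ExpiringWindowMax, add and advance are hypothetical names, and an event is assumed to expire exactly one window duration after its timestamp.

```typescript
// Hypothetical sketch of a windowed max with scheduled expiry; not the DataFlow API.
interface TimedValue {
  time: number // milliseconds
  value: number
}

interface AdvanceResult {
  max: number | null // null when the window is empty
  nextAdvance: number | null // when to call advance again, or null if nothing is pending
}

class ExpiringWindowMax {
  private readonly durationMs: number
  // Candidates for the maximum, with values strictly decreasing front to back.
  private candidates: TimedValue[] = []

  constructor(durationMs: number) {
    this.durationMs = durationMs
  }

  // Adding an event also reports when the current maximum will expire.
  add(sample: TimedValue): AdvanceResult {
    while (this.candidates.length > 0) {
      const newest = this.candidates[this.candidates.length - 1]
      if (newest === undefined || newest.value > sample.value) break
      this.candidates.pop() // can never be the maximum again
    }
    this.candidates.push(sample)
    return this.advance(sample.time)
  }

  // Called at a scheduled time, with or without new events having arrived.
  advance(now: number): AdvanceResult {
    while (this.candidates.length > 0) {
      const oldest = this.candidates[0]
      if (oldest === undefined || oldest.time + this.durationMs > now) break
      this.candidates.shift() // this event has left the window
    }
    const front = this.candidates[0]
    if (front === undefined) return { max: null, nextAdvance: null }
    return { max: front.value, nextAdvance: front.time + this.durationMs }
  }
}
```

The value of nextAdvance is what lets the maximum be corrected when an event leaves the window, even if no new event arrives to trigger an update.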

Performance improvements

Incremental Rendering

DataFlow operators now have a compute budget per frame. If it is exceeded, then partial batches of results will be rendered, instead of waiting for the entire batch to be complete.
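The budgeting idea can be sketched as a loop that stops once a time allowance is spent and hands back the unprocessed remainder. This is an illustration only: processWithinBudget is a hypothetical function, the clock is injected so the behaviour is reproducible, and the engine's actual budgeting policy is not described in these notes.

```typescript
// Hypothetical sketch of per-frame budgeting; not the DataFlow execution engine.
interface BudgetedResult<T, R> {
  results: R[] // partial batch that can be rendered this frame
  remaining: T[] // work carried over to the next frame
}

function processWithinBudget<T, R>(
  queue: T[],
  work: (item: T) => R,
  budgetMs: number,
  now: () => number
): BudgetedResult<T, R> {
  const start = now()
  const results: R[] = []
  let processed = 0
  for (const item of queue) {
    // At least one item is processed per call, so progress is always made.
    results.push(work(item))
    processed++
    if (now() - start >= budgetMs) break
  }
  return { results, remaining: queue.slice(processed) }
}
```

In a browser the clock would typically be performance.now; the caller renders results immediately and passes remaining back in on the next frame.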

The DataFlow execution engine is also more performant in absolute terms.

Up to date value propagation

Previously DataFlows would defer the advance stage until a timestamp had passed, in case more events arrived at the exact timestamp about to be processed. This would result in DataFlows appearing to lag one event behind in certain situations.

Event writing and frontier advancement are now separate operations, and frontier advancement is now a declaration that events will not appear before or at the timestamp. As a result, the advance stage can be called with the most up to date timestamp, and DataFlows no longer lag an event behind.
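The separation of writing from frontier advancement can be modelled with a small buffer. This is a sketch under assumptions: FrontierBuffer, write and advanceFrontier are hypothetical names, and rejecting late writes with an exception is a choice made for this example.

```typescript
// Hypothetical model of separate write and frontier operations; not the DataFlow API.
interface PendingItem<T> {
  time: number
  data: T
}

class FrontierBuffer<T> {
  private pending: PendingItem<T>[] = []
  private frontier = -Infinity

  // Writing an event does not, by itself, make it safe to process.
  write(time: number, data: T): void {
    if (time <= this.frontier) {
      throw new Error('event written at or before the declared frontier')
    }
    this.pending.push({ time, data })
  }

  // Declares that no further events will appear at or before `time`, and
  // releases everything up to and including that timestamp in time order.
  advanceFrontier(time: number): PendingItem<T>[] {
    if (time < this.frontier) throw new Error('the frontier cannot move backwards')
    this.frontier = time
    const ready = this.pending.filter(item => item.time <= time)
    this.pending = this.pending.filter(item => item.time > time)
    return ready.sort((left, right) => left.time - right.time)
  }
}
```

Because the declaration covers the timestamp itself, events at the latest timestamp can be released straight away instead of waiting for a later event to prove that the timestamp has passed.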

Intermediate value skipping optimisation

BarCharts and DataSourcePrinters cannot display data faster than the update rate of the screen. DataFlow queries from these components are now optimised to only emit events as fast as the screen update rate. Collection operators such as histogram and quantile can defer expensive work until these updates, while internally operating on the higher rate data.

If possible, this optimisation is pushed down to the PersistenceEngine. If only the latest piece of data is required, nothing will be queried except that last piece of data.
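The effect of skipping intermediate values can be sketched as a slot that keeps only the newest value and is flushed once per screen update. LatestValueSlot is a hypothetical class written for this example; how DataFlow implements the optimisation internally is not described here.

```typescript
// Hypothetical sketch of latest-value coalescing; not the DataFlow implementation.
class LatestValueSlot<T> {
  private latest: T | undefined = undefined
  private hasPending = false

  // Called at the data rate, which may be far higher than the display rate.
  offer(value: T): void {
    this.latest = value
    this.hasPending = true
  }

  // Called once per screen update. Emits at most one value: the newest.
  flush(emit: (value: T) => void): boolean {
    if (!this.hasPending) return false
    this.hasPending = false
    emit(this.latest as T)
    return true
  }
}
```

A component that can only show one value per frame would call flush from its frame callback, so a thousand offers between two frames result in a single update.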

Interaction performance

DataFlow performance has been improved. Fast DataFlows can now be executed synchronously, just-in-time for frame paints. Previously they would be executed asynchronously, deferred to after the frame paint, and would therefore always be one frame late at a minimum regardless of execution time.

The previous behaviour is represented in the flamechart below. The pointermove event is fired offscreen to the left. A WebGL render occurs, then the frame loop fires, where the previous frame's mouse position is used to find the nearest line segment to a point. The result is a mouse based DataFlow drawing events an entire frame late!

Flamechart before

The updated behaviour is represented in this flame chart. High priority events such as the mouse move are executed just before the paint, resulting in more responsive mouse based DataFlows.

Flamechart after

Collection operator memory optimisations

Collection operators (standardDeviation, histogram, etc) with infinite sized windows now have bounded memory cost where they were unbounded before. This primarily improves the performance of fixed zoom queries.

The interquartileRange, meanAbsoluteDeviation, medianAbsoluteDeviation, quantile and quantiles operators now utilise reservoir sampling to maintain a bounded memory footprint. By default their reservoir size is 8192 items and is configurable. When they are at capacity, they deterministically replace items.
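For readers unfamiliar with reservoir sampling, the sketch below shows one common variant, Algorithm R, driven by a seeded linear congruential generator so that the same input always yields the same sample. This is an assumption for illustration: the notes above only state that replacement is deterministic, not which policy is used, and SeededReservoir is a hypothetical class.

```typescript
// Hypothetical sketch: Algorithm R with a seeded generator. The replacement
// policy actually used by the operators is not specified in these notes.
class SeededReservoir<T> {
  private readonly capacity: number
  private readonly items: T[] = []
  private seen = 0
  private state: number

  constructor(capacity: number, seed: number) {
    this.capacity = capacity
    this.state = seed >>> 0
  }

  // Small linear congruential generator returning a value in [0, 1).
  private nextRandom(): number {
    this.state = (Math.imul(this.state, 1664525) + 1013904223) >>> 0
    return this.state / 4294967296
  }

  add(item: T): void {
    this.seen++
    if (this.items.length < this.capacity) {
      this.items.push(item) // still filling the reservoir
      return
    }
    // Keep the new item with probability capacity / seen, replacing a random slot.
    const slot = Math.floor(this.nextRandom() * this.seen)
    if (slot < this.capacity) this.items[slot] = item
  }

  snapshot(): T[] {
    return this.items.slice()
  }
}
```

Memory stays bounded by the capacity however many items are added, and quantile-style statistics are then estimated from the snapshot rather than from the full history.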

Breaking changes

Improved device to persistence engine API

  • Renamed DomainManager -> ChartManager, ZoomManager -> DomainManager
  • The batchIngestion API for lines has changed.
  • MouseCapture IDs have been moved to the ChartContainer
  • coalesce now takes an options bag as a second parameter instead of a boolean. It can now have a customisable defaultValue and can optionally not wait for all keys before emitting events via the synchronizeInitial flag.
  • The batch and buffer operator dataOnly arguments have been removed as they now emit Collections.
  • The closestSpatially and closestTemporally arguments have changed, taking an options bag with additional parameters.
  • Custom message processors can be specified with QueryableMessageIDProvider.setCustomMessageProcessor, more details are in the bulk events example.

General fixes and changes

  • The DataSourcePrinter now accepts a styleAccessor to dynamically change its style based on incoming data. A formatter prop has been added to help delineate its role from that of the accessor prop.
  • The Dropdown component now automatically disables its trigger button when the menu itself is disabled.
  • TriggerDomains now support zooming, searching only within the zoom range for a trigger.
  • The TimeAxis can now be aligned to match the behaviour of the TriggerDomain with the align prop.
  • If a TriggerDomain finds no trigger, it is now cleared, as opposed to keeping the old data around.
  • WebGL errors will now be caught and displayed in place of the chart.
  • Overall improved type inference for DataFlow operators, including tuple inference for coalesce and interleave operators.
  • QueryableMessageIDProvider no longer provides a batchTime argument, reducing the latency of updates.
  • The interleave operator no longer accepts an options bag as a second parameter.
  • All collection operators now have an options bag for options such as the accessor. The duration option has been removed. Pass a window collection operator for customised window sizing.
  • All collection operators now have a window defined by the authoritative domain by default, instead of a 1 second window.
  • The branch operator has been deprecated. If you're using this functionality, please contact us. We'll be writing a tutorial on making advanced operators from scratch and would like to hear your use cases.
  • closestSpatially now supports indexing multiple events per timestamp.
  • Fixed the continuityAccessor not working for LineCharts.
  • useEventLogger has been updated to use a (data, time, tags) => data format accessor.
  • The serial transport can avoid setting things like rts if a blank object is passed to its onAttachmentSettings argument.
  • Legends now hover exclusively, making externally driven hover behaviour easier to set up.
  • The IPC infrastructure has been improved.
  • The Button component now has onMouseDown / onMouseUp writers and callbacks.
  • File save dialogs no longer get hidden behind the window on macOS.
  • Button sizes in the template header have been made consistent.
  • Updated to Yarn v4.
  • Support for NodeJS 20.
  • Support for TypeScript v5.

Update instructions

An automatic upgrade is available by running the following command in your template directory:

arc upgrade --template [email protected]

Feel free to contact support if you need any help upgrading.