Collections, Tables and Tags
Released on 2024-08-06
This update includes a number of improvements to DataFlow including a new Event tagging system, a new chart type, new annotations, DataFlow controlled tables, more control over domains, perceptual and wall time performance improvements, plus many bug fixes and general improvements.
New features
Collections and Tables
Previously, Charts and DataFlow could only produce and consume singular items. If an annotation component needed to draw multiple items at the same time, multiple components were needed. If an operator required multiple items, an array could conceptually be passed as the 'single item', but maintaining that array was expensive: each change resulted in a new array with redundant processing. A lack of data was unrepresentable without a union with null, as there was no way to distinguish between a lack of an event and an event that might arrive in the future.
Collections remove these restrictions, allowing for the incremental maintenance of an optionally ordered list of items. Annotations can now render multiple items per component. Windows of data are incrementally maintained with constant memory and compute cost per event. Collections can be interleaved natively to combine multiple sets of data.
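The incremental idea can be illustrated with a minimal sketch. It is not DataFlow's implementation, and `SlidingWindowSum` and its methods are hypothetical names. It assumes events arrive in time order and keeps a running sum over a time window, so each event is added once and removed once rather than the whole window being recomputed.

```typescript
// Hypothetical sketch of an incrementally maintained time window.
// None of these names are part of the DataFlow API.
interface WindowedValue {
  time: number // milliseconds
  value: number
}

class SlidingWindowSum {
  private items: WindowedValue[] = []
  private head = 0 // index of the oldest item still inside the window
  private total = 0
  private readonly durationMs: number

  constructor(durationMs: number) {
    this.durationMs = durationMs
  }

  // Add one event (assumed to arrive in time order) and return the updated sum.
  push(time: number, value: number): number {
    this.items.push({ time, value })
    this.total += value

    // Evict items that have left the window; each item is evicted at most once.
    let oldest = this.items[this.head]
    while (oldest !== undefined && oldest.time <= time - this.durationMs) {
      this.total -= oldest.value
      this.head++
      oldest = this.items[this.head]
    }

    // Occasionally drop the evicted prefix so memory tracks the window size.
    if (this.head > 1024 && this.head * 2 > this.items.length) {
      this.items = this.items.slice(this.head)
      this.head = 0
    }

    return this.total
  }
}
```

The sum is updated by one addition per arriving event and one subtraction per departing event, which is where the constant per-event cost comes from.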
In the example above, power generation data is ingested as a stream and split into daily time spans. Each time span is rendered as a row in the table below the charts. Per-technology statistics are calculated across each day and combined into daily totals.
Mouse Interactions
The Mouse signal now emits data on a per-chart basis, even when the mouse is not hovering over the chart. It also emits drag information for user-selected regions, as well as chart boundaries so that labels can be placed on the chart out of the way of data.
The closestPointOnLineSegment and closestLineSegmentIntersection operators provide geometric line intersection and accelerated raycasting operations. These allow for precise interactions on sparse data where linear interpolation is valid.
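For readers unfamiliar with the geometry, the following is a minimal sketch of the standard project-and-clamp calculation for the closest point on a line segment. It is a standalone function for illustration, not the operator's implementation, and the `Point2D` type and function name are assumptions.

```typescript
// Standalone geometry sketch; the names here are illustrative only.
interface Point2D {
  x: number
  y: number
}

// Project p onto the line through a and b, then clamp the result to the segment.
function closestPointOnSegment(p: Point2D, a: Point2D, b: Point2D): Point2D {
  const abx = b.x - a.x
  const aby = b.y - a.y
  const lengthSquared = abx * abx + aby * aby

  // A zero-length segment is a single point.
  if (lengthSquared === 0) return { x: a.x, y: a.y }

  // t is the normalised position of the projection along a -> b.
  const t = ((p.x - a.x) * abx + (p.y - a.y) * aby) / lengthSquared
  const clamped = Math.max(0, Math.min(1, t))

  return { x: a.x + clamped * abx, y: a.y + clamped * aby }
}
```

Because the result is interpolated between the two endpoints, it only makes sense on data where linear interpolation between samples is valid.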
For displaying data next to the mouse explicitly, the new MouseAttached component reduces jitter compared to rendering an ElementAnnotation on the closest point.
New Components
An AreaChart component has been added. It can be used for typical 'integrated' quantities such as energy in the power generation example above, or for confidence interval style charts such as below:
New annotations in the form of BoxAnnotation and LineSegmentAnnotation have been added and are used throughout the examples on this page. In the example below, as data is bulk-loaded from a device, areas without data are marked with a faint red box and an icon. The time frontier is marked similarly.
When data is flying by too fast, you can now pause and resume the flow of data with the new ChartPauseButton component.
Authoritative Domains
Domains can now be named via a domainID prop. DataSourcePrinters can reference them specifically, or the DomainWrapper component can be used to set a default for a group of components. This makes it simple to point particular DataSourcePrinters at different domains of data.
Tags
Metadata can now be attached to events, next to the data itself. Data from multiple devices can be interleaved then filtered later in the pipeline.
<ScatterPlot
  dataSource={allDeviceSig4}
  accessor={(data, time, tags) => ({ x: time, y: data })}
  size={4}
  colorAccessor={(data, time, tags) => getDeviceIDColor(tags.deviceID)}
/>
The new tag operator lets you apply tags to data inline.
const tagged = tag(dataSource, (data, time, tags) => ({
  ...tags,
  session: 'A',
}))
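To make the interleave-then-filter idea concrete outside of DataFlow, here is a minimal sketch using plain arrays. The `TaggedEvent` type and the `withTags` and `interleaveByTime` helpers are hypothetical and only mimic the concept. They are not the library's operators.

```typescript
// Hypothetical types and helpers for illustration; not the DataFlow API.
type Tags = Record<string, string>

interface TaggedEvent<T> {
  time: number
  data: T
  tags: Tags
}

// Return copies of the events with the extra tags merged over the existing ones.
function withTags<T>(events: TaggedEvent<T>[], extra: Tags): TaggedEvent<T>[] {
  return events.map(event => ({ ...event, tags: { ...event.tags, ...extra } }))
}

// Concatenate two streams and sort the result by time.
function interleaveByTime<T>(a: TaggedEvent<T>[], b: TaggedEvent<T>[]): TaggedEvent<T>[] {
  return [...a, ...b].sort((x, y) => x.time - y.time)
}

const deviceA = withTags(
  [
    { time: 0, data: 1.0, tags: {} },
    { time: 2, data: 1.2, tags: {} },
  ],
  { deviceID: 'A' },
)
const deviceB = withTags([{ time: 1, data: 7.5, tags: {} }], { deviceID: 'B' })

// Interleave first, then filter on the tag later in the pipeline.
const allDevices = interleaveByTime(deviceA, deviceB)
const onlyDeviceA = allDevices.filter(event => event.tags['deviceID'] === 'A')
```

Because the tags travel alongside the data, the data itself stays a plain number and does not need to be wrapped in an object just to carry the device ID.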
The new search and collect operators contain methods to tag output data given the search query.
closestSpatially(dataTaggedWithPeaks, mouseSignal, {
  positionAccessor: (data, time, tags) => ({ x: time, y: data }),
  mapResult: (data, time, tags, position, searchData) => ({
    x: position.x,
    y: position.y,
  }),
  tagResult: (data, time, tags, position, searchData) => ({
    bboxXMin: searchData.dragXMin,
    bboxXMax: searchData.dragXMax,
    bboxYMin: searchData.dragYMin,
    bboxYMax: searchData.dragYMax,
  }),
})
New operators
Various new operators have been added.
- collectTimeSpan and collectTimeSpans select a region or regions based on a time interval.
- collectValueSpan and collectValueSpans select a region or regions based on a value interval.
- collectBoundingBox and collectBoundingBoxes select a region or regions based on a bounding box.
- lineSegments turns pairs of consecutive point events into a line segment event.
- closestPointOnLineSegment selects the closest point on a line segment.
- closestLineSegmentIntersection selects the closest line intersection to a point.
- integral calculates the trapezoidal area under a curve.
- window emits a Collection over time.
- batch and buffer emit Collections, as rolling or fixed batches.
- head and tail emit Collections of a fixed number of items, ordered by a value.
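As a reference for what the integral operator computes, the trapezoidal rule can be sketched as below. This is a standalone illustration over an array of samples rather than the operator itself. The `Sample` type, the function name and the choice to return null for fewer than two samples are assumptions.

```typescript
// Standalone sketch of the trapezoidal rule; names are illustrative only.
interface Sample {
  time: number
  value: number
}

// Sum the areas of the trapezoids formed by each pair of consecutive samples.
// Returns null when there are fewer than two samples, as no area can be formed.
function trapezoidalIntegral(samples: Sample[]): number | null {
  if (samples.length < 2) return null

  let area = 0
  let previous: Sample | null = null

  for (const sample of samples) {
    if (previous !== null) {
      const averageHeight = (previous.value + sample.value) / 2
      area += averageHeight * (sample.time - previous.time)
    }
    previous = sample
  }

  return area
}
```

For power samples in watts against time in seconds, the result would be energy in joules, which matches the 'integrated' quantities mentioned for the AreaChart.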
Better handling of null data
Many operators now return null to indicate a lack of a result. Aggregation operators can return null for mathematically impossible operations. Singular search operators return null when no events are found.
These affect the following statistics operators:
geometricMean
harmonicMean
interquartileRange
min
max
quantile
quantiles
product
rootMeanSquare
standardDeviation
sampleStandardDeviation
sum
variance
sampleVariance
Note that mode now returns a Collection, instead of an array of modes.
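A minimal sketch of the null-for-no-result convention, using a harmonic mean as the example. The function below is a standalone illustration, and the exact conditions under which the real operators return null are an assumption here.

```typescript
// Standalone sketch; not the DataFlow harmonicMean operator.
// Returns null when the harmonic mean is not defined for the input.
function harmonicMeanOrNull(values: number[]): number | null {
  if (values.length === 0) return null // nothing to aggregate

  let reciprocalSum = 0
  for (const value of values) {
    if (value === 0) return null // the reciprocal of zero is undefined
    reciprocalSum += 1 / value
  }

  if (reciprocalSum === 0) return null // reciprocals cancelled out
  return values.length / reciprocalSum
}

// Callers handle the null case explicitly instead of checking for NaN.
function describeResult(result: number | null): string {
  return result === null ? 'no result' : result.toFixed(2)
}
```

Returning `number | null` lets the type system force the caller to decide what a missing result means, rather than letting a NaN or stale value flow downstream.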
Non-collection search operators now signal a lack of data as a null value. This fixes the previous behaviour where the last search that found data would 'stick' after it no longer made sense.
This affects all search operators:
closestByValue
closestLineSegmentIntersection
closestPointOnLineSegment
closestSpatially
closestTemporally
Components now automatically skip these events, and that is represented in the type system to help keep implementations clean.
Previously:
<DataSourcePrinter
  dataSource={closestPeaks}
  accessor={closestEvent =>
    closestEvent
      ? Math.abs(closestEvent.leftX - closestEvent.rightX)
      : 'nothing selected'
  }
/>
<ElementAnnotation
  dataSource={closestPeaks}
  accessor={(data, time, tags) =>
    data !== null
      ? {
          x: (data.leftX + data.rightX) / 2,
          y: data.mouseData.chartYMin,
        }
      : null
  }
  visibilitySource={mouseSignal}
  visibilityAccessor={data => data.hovered || data.hasDragged}
>
  {/* */}
</ElementAnnotation>
And now:
<DataSourcePrinter
  dataSource={closestPeaks}
  accessor={closestEvent => Math.abs(closestEvent.leftX - closestEvent.rightX)}
  defaultValue="nothing selected"
/>
<ElementAnnotation
  dataSource={closestPeaks}
  accessor={(data, time, tags) => ({
    x: (data.leftX + data.rightX) / 2,
    y: data.mouseData.chartYMin,
  })}
  visibilitySource={mouseSignal}
  visibilityAccessor={data => data.hovered || data.hasDragged}
>
  {/* */}
</ElementAnnotation>
Advance scheduling
Previously, the emission of events could only occur as a result of events being added. If a sliding window collected events, then discarded them as they left the screen, there was no way to update quantities that changed as a result of those events leaving. As a result, operators like min and max would be incorrect until a new event came into the window to trigger an update.
The addition of events can now schedule additional advance calls for a later time, which efficiently allows for the removal of events from windows, as in the case above.
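The scheduling idea can be sketched as follows. This is an illustration under stated assumptions rather than the engine's implementation: `ExpiringWindowMin` is a hypothetical class, events are assumed to arrive in time order, and the minimum is recomputed with a simple scan for clarity.

```typescript
// Hypothetical sketch of a window that reports when it next needs advancing.
interface ExpiringValue {
  time: number // milliseconds
  value: number
}

class ExpiringWindowMin {
  private items: ExpiringValue[] = []
  private readonly durationMs: number

  constructor(durationMs: number) {
    this.durationMs = durationMs
  }

  // Add an event (assumed to arrive in time order).
  add(time: number, value: number): void {
    this.items.push({ time, value })
  }

  // The time at which the oldest event leaves the window, or null if empty.
  // A scheduler would request an advance call for this time.
  nextExpiry(): number | null {
    const oldest = this.items[0]
    return oldest === undefined ? null : oldest.time + this.durationMs
  }

  // Called when time passes, with or without new events arriving.
  // Removes expired events and returns the up to date minimum.
  advance(now: number): number | null {
    this.items = this.items.filter(item => item.time + this.durationMs > now)
    return this.min()
  }

  // Linear scan for clarity; null when the window is empty.
  min(): number | null {
    let result: number | null = null
    for (const item of this.items) {
      if (result === null || item.value < result) result = item.value
    }
    return result
  }
}
```

Without the scheduled call at `nextExpiry()`, the minimum would stay at the expired event's value until some unrelated event happened to arrive.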
Performance improvements
Incremental Rendering
DataFlow operators now have a compute budget per frame. If it is exceeded, then partial batches of results will be rendered, instead of waiting for the entire batch to be complete.
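A minimal sketch of budgeted processing, assuming a simple queue of work items. The `processWithinBudget` function and its injectable clock are hypothetical and only illustrate the idea of stopping once the budget is spent and leaving the remainder for the next frame.

```typescript
// Hypothetical sketch of per-frame budgeted work; not the DataFlow engine.
// Processes items from the front of the queue until the time budget is spent,
// removes the processed items, and returns how many were processed.
function processWithinBudget<T>(
  queue: T[],
  processItem: (item: T) => void,
  budgetMs: number,
  now: () => number = () => Date.now(), // injectable clock, useful for testing
): number {
  const start = now()
  let processed = 0

  while (processed < queue.length && now() - start < budgetMs) {
    processItem(queue[processed] as T)
    processed++
  }

  // Whatever remains in the queue is picked up on the next frame.
  queue.splice(0, processed)
  return processed
}
```

The partial results produced so far can be rendered immediately, which is what keeps the chart responsive during a large batch.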
The DataFlow execution engine is also more performant in absolute terms.
Up to date value propagation
Previously, DataFlows would defer the advance stage until a timestamp had passed, in case more events arrived at the exact timestamp about to be processed. This would result in DataFlows appearing to lag one event behind in certain situations.
Event writing and frontier advancement are now separate operations, and frontier advancement is now a declaration that no further events will appear at or before the timestamp. As a result, the advance stage can be called with the most up to date timestamp, and DataFlows no longer lag an event behind.
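The separation of writing from frontier advancement can be sketched as below. `FrontierBuffer` is a hypothetical class written for illustration only. It assumes that advancing the frontier to a timestamp is a promise that no more events will be written at or before it.

```typescript
// Hypothetical sketch of separate write and frontier operations.
interface TimedEvent<T> {
  time: number
  data: T
}

class FrontierBuffer<T> {
  private pending: TimedEvent<T>[] = []
  private frontier = -Infinity

  // Writing an event does not release anything by itself.
  write(event: TimedEvent<T>): void {
    if (event.time <= this.frontier) {
      throw new Error('event written at or before the declared frontier')
    }
    this.pending.push(event)
  }

  // Declare that no more events will arrive at or before `time`.
  // Everything up to and including `time` is released in time order.
  advanceFrontier(time: number): TimedEvent<T>[] {
    this.frontier = Math.max(this.frontier, time)
    const ready = this.pending
      .filter(event => event.time <= this.frontier)
      .sort((a, b) => a.time - b.time)
    this.pending = this.pending.filter(event => event.time > this.frontier)
    return ready
  }
}
```

Because the declaration covers the timestamp itself, the event at the latest timestamp is released immediately instead of waiting for a later event to prove that the timestamp has passed.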
Intermediate value skipping optimisation
BarCharts and DataSourcePrinters cannot display data faster than the update rate of the screen. DataFlow queries from these components are now optimised to only emit events as fast as the screen update rate. Collection operators such as histogram and quantile can defer expensive work until these updates, while internally operating on the higher rate data.
If possible, this optimisation is pushed down to the PersistenceEngine. If only the latest piece of data is required, nothing will be queried except that last piece of data.
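A minimal sketch of skipping intermediate values, assuming a consumer that only needs the latest value once per frame. `LatestValueSlot` is a hypothetical helper, not part of DataFlow or the PersistenceEngine.

```typescript
// Hypothetical sketch: keep only the most recent value between frames.
class LatestValueSlot<T> {
  private pending: { value: T } | null = null
  private writes = 0

  // Called at the data rate; overwrites any value not yet displayed.
  write(value: T): void {
    this.pending = { value }
    this.writes++
  }

  // Called once per frame. Returns the latest value and how many
  // intermediate values were skipped, or null if nothing new arrived.
  takeForFrame(): { value: T; skipped: number } | null {
    if (this.pending === null) return null

    const result = { value: this.pending.value, skipped: this.writes - 1 }
    this.pending = null
    this.writes = 0
    return result
  }
}
```

Any expensive per-value formatting or rendering then runs once per frame rather than once per event.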
Interaction performance
DataFlow performance has been improved. Fast DataFlows can now be executed synchronously, just-in-time for frame paints. Previously they would be executed asynchronously, deferred to after the frame paint, and would therefore always be one frame late at a minimum regardless of execution time.
The previous behaviour is represented in the flame chart below. The pointermove event is fired offscreen to the left. A WebGL render occurs, then the frame loop fires, where the previous frame's mouse position is used to find the nearest line segment to a point. The result is a mouse based DataFlow drawing events an entire frame late!
The updated behaviour is represented in this flame chart. High priority events such as the mouse move are executed just before the paint, resulting in more responsive mouse based DataFlows.
Collection operator memory optimisations
Collection operators (standardDeviation, histogram, etc.) with infinite sized windows now have a bounded memory cost where it was previously unbounded. This primarily improves the performance of fixed zoom queries.
The interquartileRange, meanAbsoluteDeviation, medianAbsoluteDeviation, quantile and quantiles operators now utilise reservoir sampling to maintain a bounded memory footprint. Their reservoir size defaults to 8192 items and is configurable. When they are at capacity, they deterministically replace items.
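For background, reservoir sampling can be sketched as below using the classic Algorithm R. This is an illustration only: the `Reservoir` class is hypothetical, and using a seeded pseudo-random generator to make replacement repeatable is an assumption about what deterministic replacement means, not a description of the operators' internals.

```typescript
// Small seeded pseudo-random generator (mulberry32 style), returning values in [0, 1).
function seededRandom(seed: number): () => number {
  let state = seed >>> 0
  return () => {
    state = (state + 0x6d2b79f5) >>> 0
    let t = state
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Hypothetical fixed-capacity reservoir using Algorithm R.
class Reservoir<T> {
  private items: T[] = []
  private seen = 0
  private readonly capacity: number
  private readonly random: () => number

  constructor(capacity: number, seed = 1) {
    this.capacity = capacity
    this.random = seededRandom(seed)
  }

  add(item: T): void {
    this.seen++

    // Fill phase: keep everything until the reservoir is full.
    if (this.items.length < this.capacity) {
      this.items.push(item)
      return
    }

    // Replacement phase: the new item takes a slot with probability capacity / seen.
    const slot = Math.floor(this.random() * this.seen)
    if (slot < this.capacity) this.items[slot] = item
  }

  sample(): readonly T[] {
    return this.items
  }
}
```

Memory stays fixed at the reservoir capacity no matter how many items are seen, and the same seed with the same input order always yields the same sample.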
Breaking changes
- Improved device to persistence engine API.
- Renamed DomainManager -> ChartManager, ZoomManager -> DomainManager.
- The batchIngestion API for lines has changed.
- MouseCapture IDs have been moved to the ChartContainer.
- coalesce now takes an options bag as a second parameter instead of a boolean. It can now have a customisable defaultValue and can optionally not wait for all keys before emitting events via the synchronizeInitial flag.
- The batch and buffer operator dataOnly arguments have been removed as they now emit Collections.
- The closestSpatially and closestTemporally arguments have changed, taking an options bag with additional parameters.
- Custom message processors can be specified with QueryableMessageIDProvider.setCustomMessageProcessor; more details are in the bulk events example.
General fixes and changes
- The DataSourcePrinter now accepts a styleAccessor to dynamically change its style based on incoming data. A formatter prop has been added to help delineate its role from that of the accessor prop.
- The Dropdown component now automatically disables its trigger button when the menu itself is disabled.
- TriggerDomains now support zooming, searching only within the zoom range for a trigger.
- The TimeAxis can now be aligned to match the behaviour of the TriggerDomain with the align prop.
- If a TriggerDomain finds no trigger, it is now cleared, as opposed to keeping the old data around.
- WebGL errors will now be caught and displayed in place of the chart.
- Overall improved type inference for DataFlow operators, including tuple inference for coalesce and interleave operators.
- QueryableMessageIDProvider no longer provides a batchTime argument, reducing the latency of updates.
- The interleave operator no longer accepts an options bag as a second parameter.
- All collection operators now have an options bag for options such as the accessor. The duration option has been removed. Pass a window collection operator for customised window sizing.
- All collection operators now have a window defined by the authoritative domain by default, instead of a 1 second window.
- The branch operator has been deprecated. If you're using this functionality, please contact us. We'll be writing a tutorial on making advanced operators from scratch and would like to hear your use cases.
- closestSpatially now supports indexing multiple events per timestamp.
- Fixed the continuityAccessor not working for LineCharts.
- useEventLogger has been updated to use a (data, time, tags) => data format accessor.
- Serial transport can avoid setting things like rts by passing a blank object to the onAttachmentSettings argument in the serial transport.
- Legends now hover exclusively, making externally driven hover behaviour easier to set up.
- The IPC infrastructure has been improved.
- The Button component now has onMouseDown / onMouseUp writers and callbacks.
- File save dialogs no longer get hidden behind the window in macOS.
- Button sizes in the template header have been made consistent.
- Updated to Yarn v4.
- Support for NodeJS 20.
- Support for TypeScript v5.
Update instructions
An automatic upgrade is available by running the following command in your template directory:
arc upgrade --template [email protected]
Feel free to contact support if you need any help upgrading.