Wishlist for a React Compiler

Michael

Recently I've been thinking about the prospects of a React Compiler and the optimisations it could bring. What follows is a collection of techniques I think should be explored in addition to the auto-memoisation performed by React Forget. Some have already been discussed in public, and some appear to be novel. I think they will work best in combination with one another.

Most of these are in pursuit of better runtime performance. There will likely be a trade-off between code size and runtime performance; ideally these optimisations can be chosen per component, letting developers tune that trade-off themselves.

As an overview, we're going to do three main things:

  1. Break up the component tree into distinct data dependency paths with granular render functions.
  2. Monomorphise and batch leaf node mutations.
  3. Optimise call propagation, bypassing static parts of the tree.

The end result should be the same React API, with fewer unnecessary re-renders, and much faster updates.

Granular render functions

React's component render functions must be pure functions of state, props (and context). React reserves the right to run these render functions at arbitrary times, potentially multiple times without actually committing the results to the DOM.

I think we can take that idea further and split a render function into multiple 'granular' pieces. Each 'grain' (named after wood grain, the pattern of fibres running through timber) represents an independent data dependency tree, running from state and props through to the rendered output.

As an example, suppose we have a Page component that takes no props, but receives state via a hook.

```jsx
function Page() {
  const { isLoggedIn } = useLoginToken()
  return (
    <>
      <Header isLoggedIn={isLoggedIn} />
      <Body>...</Body>
      <Footer />
    </>
  )
}
```

With some static dependency analysis, this can be split up into the following:

```jsx
function Page__grain_1() {
  const { isLoggedIn } = useLoginToken()
  return (
    <>
      <Header isLoggedIn={isLoggedIn} />
    </>
  )
}

function Page__grain_2() {
  return (
    <>
      <Body>...</Body>
      <Footer />
    </>
  )
}

function Page() {
  return (
    <>
      <Page__grain_1 />
      <Page__grain_2 />
    </>
  )
}
```

The Header and the hook that feeds it can be split out into their own grain, so that changes to the login state only affect that grain, rather than causing unnecessary re-renders of sibling components or of the Page component as a whole.

The body and footer are static and never need to re-render.
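The dependency analysis behind this split could be sketched as follows. This is a minimal illustration, not how React Forget or any real compiler works: it assumes the compiler has already reduced each child element to the list of state or prop names it reads (the `deps` field and `partitionIntoGrains` are hypothetical), and it uses union-find so that children sharing any dependency land in the same grain, while children with no dependencies form a single static grain.

```javascript
// Hypothetical input: each child of the returned fragment, with the names of
// the state/props values it reads. Children sharing any dependency are merged
// into one grain; children with no dependencies form one static grain.
function partitionIntoGrains(children) {
  const parent = new Map() // union-find over dependency names

  function find(x) {
    while (parent.get(x) !== x) {
      parent.set(x, parent.get(parent.get(x))) // path halving
      x = parent.get(x)
    }
    return x
  }

  function union(a, b) {
    const rootA = find(a)
    const rootB = find(b)
    if (rootA !== rootB) parent.set(rootA, rootB)
  }

  for (const child of children) {
    for (const dep of child.deps) {
      if (!parent.has(dep)) parent.set(dep, dep)
    }
    // Link every dependency of this child to its first one
    for (const dep of child.deps.slice(1)) union(child.deps[0], dep)
  }

  const grains = new Map() // root dependency (or "static") -> child names
  for (const child of children) {
    const key = child.deps.length === 0 ? "static" : find(child.deps[0])
    if (!grains.has(key)) grains.set(key, [])
    grains.get(key).push(child.name)
  }
  return [...grains.values()]
}

const pageChildren = [
  { name: "Header", deps: ["isLoggedIn"] },
  { name: "Body", deps: [] },
  { name: "Footer", deps: [] },
]
console.log(partitionIntoGrains(pageChildren)) // [ [ 'Header' ], [ 'Body', 'Footer' ] ]
```

A real pass would also need to preserve sibling order when regrouping children, and to trace dependencies through intermediate variables rather than receive them pre-computed.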

Stepping into the Header component:

```jsx
function Header({ isLoggedIn }) {
  return (
    <>
      <Link to="/">Home</Link>
      <Link to="/pricing">Pricing</Link>
      {isLoggedIn ? (
        <Link to="/profile">Profile</Link>
      ) : (
        <Link to="/signup">Sign Up</Link>
      )}
    </>
  )
}
```

The technique can be applied again:

```jsx
function Header__grain_1() {
  return (
    <>
      <Link to="/">Home</Link>
      <Link to="/pricing">Pricing</Link>
    </>
  )
}

function Header__grain_2({ isLoggedIn }) {
  return (
    <>
      {isLoggedIn ? (
        <Link to="/profile">Profile</Link>
      ) : (
        <Link to="/signup">Sign Up</Link>
      )}
    </>
  )
}

function Header({ isLoggedIn }) {
  return (
    <>
      <Header__grain_1 />
      <Header__grain_2 isLoggedIn={isLoggedIn} />
    </>
  )
}
```

The Header can be split into a static and a dynamic component.

Link is a leaf component with no separable data dependency trees, so the pass has no effect on it.

```jsx
function Link({ to, children }) {
  return <a href={to}>{children}</a>
}
```

This pass can be done on a per-component basis, without global knowledge of how the components are put together.

By separating components into these 'grains', state changes automatically bypass sibling components that are guaranteed to be unaffected by them. There are still unnecessary re-renders as props are drilled down, but these can be mitigated with a later technique.

With a compiler that has access to more global, inter-component context, these deep trees could be 'flattened' by inlining component calls, reducing the depth of the tree. There's an open issue in the React repo about this kind of component folding. React Server Components can minimise the downsides of these additional, separated trees, but static folding would also improve runtime performance on the server, and on the client where React Server Components can't run.
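Folding could be sketched like this. The element objects, the `h` helper and the `fold` function are all hypothetical stand-ins for what a compiler would do on the AST at build time; the sketch simply assumes `Link` has been proven to be a pure, hook-free function of its props, which is what makes inlining it safe.

```javascript
// Hypothetical element shape: { type, props, children }, where type is a
// host tag string ("a", "nav") or a component function.
function h(type, props, ...children) {
  return { type, props: props ?? {}, children }
}

// Assumed to be proven pure and hook-free, so a call can be inlined.
function Link({ to, children }) {
  return h("a", { href: to }, ...children)
}

// Inline every component in `inlineable`, bottom-up, so only host
// elements and text remain where those components used to be.
function fold(node, inlineable) {
  if (typeof node !== "object" || node === null) return node // text node
  const children = node.children.map((child) => fold(child, inlineable))
  if (inlineable.has(node.type)) {
    return fold(node.type({ ...node.props, children }), inlineable)
  }
  return { ...node, children }
}

const header = h(
  "nav",
  null,
  h(Link, { to: "/" }, "Home"),
  h(Link, { to: "/pricing" }, "Pricing"),
)
console.log(JSON.stringify(fold(header, new Set([Link]))))
```

The folded tree contains only `nav` and `a` elements, so there is no Link layer left for updates to propagate through.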

Monomorphised batch update functions

During a component update, the React reconciler computes a diff of each host instance (with prepareUpdate), then applies that diff to the instance (with commitUpdate). This next optimisation works at two levels: the first makes computing that diff cheaper, and the second builds diff and update functions specialised (monomorphised) to the component.
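For contrast, here is a simplified, hypothetical version of the generic work: without knowing which props can change, the diff has to walk and compare every prop, and the commit has to loop over whatever the payload contains. This is not the actual react-dom implementation, only an illustration of its shape; plain objects stand in for DOM nodes.

```javascript
// Generic diff: knows nothing about the component, so every prop is compared.
function genericPrepareUpdate(oldProps, newProps) {
  const payload = {}
  let changed = false
  const keys = new Set([...Object.keys(oldProps), ...Object.keys(newProps)])
  for (const key of keys) {
    if (key === "children") continue // children are reconciled separately
    if (!Object.is(oldProps[key], newProps[key])) {
      payload[key] = newProps[key]
      changed = true
    }
  }
  return changed ? payload : null // null means "nothing to commit"
}

// Generic commit: loops over whatever the payload happens to contain.
function genericCommitUpdate(instance, payload) {
  for (const [key, value] of Object.entries(payload)) {
    if (key === "style") Object.assign(instance.style, value)
    else instance[key] = value
  }
}

const div = { className: "background", style: { background: "red" } }
const payload = genericPrepareUpdate(
  { className: "background", style: { background: "red" } },
  { className: "background", style: { background: "blue" } },
)
if (payload) genericCommitUpdate(div, payload)
console.log(div.style.background) // blue
```

Note that a freshly created style object always looks "changed" to this diff, which is part of what the specialised functions below avoid.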

Let's look at the first level of this optimisation. Imagine the following component:

```jsx
import { useCallback, useState } from "react"

// calculateBackground and calculateMarkerStyle are the user's own helpers (not shown)
function ColorPicker({ defaultColor, setOnRelease }) {
  const [color, setColor] = useState(defaultColor) // call this slot 0
  const background = calculateBackground(color)
  const markerStyle = calculateMarkerStyle(color)
  const eventHandler = useCallback(event => {
    /* ... */
  }, [setColor, setOnRelease])
  return (
    <div className="background" style={{ background }}>
      <div
        className="marker"
        style={markerStyle}
        onMouseDown={eventHandler}
        onMouseUp={eventHandler}
      />
    </div>
  )
}
```

The children's rendered output can only change when the color state changes (setOnRelease aside, which only affects the event handler). We know their diffs only involve the style attribute, and for the background div, we know exactly which key of the style is affected.

At the first level, our compiler could produce functions that build the update payloads directly, tied to changes in this state slot:

```js
function ColorPicker__updater__instance_0__prepare_update(lastArg, arg) {
  // Only the `background` key of the outer div's style can change
  return {
    style: {
      background: arg,
    },
  }
}

function ColorPicker__updater__instance_1__prepare_update(lastArg, arg) {
  // The marker div's whole style object comes from calculateMarkerStyle
  return {
    style: arg,
  }
}

function ColorPicker__prepare_update__slot_0(instances, slotState, prevCached) {
  const background = calculateBackground(slotState)
  const markerStyle = calculateMarkerStyle(slotState)
  // These function calls are illustrative and not exactly how the glue would work
  instances[0].updateQueue.add(
    ColorPicker__updater__instance_0__prepare_update(prevCached[0], background),
  )
  instances[1].updateQueue.add(
    ColorPicker__updater__instance_1__prepare_update(
      prevCached[1],
      markerStyle,
    ),
  )
}
```

React can then queue updates without having to perform any actual diffs. Equality checks could also happen at this level, with the compiler inserting branches that check whether the values really changed before queuing the update. For the most part we don't need to worry about referential vs deep equality, since at this point in execution the values are almost certainly "close to" primitives that can be compared referentially.
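Those equality guards might look like this. `calculateBackground`, `calculateMarkerStyle`, the array-based `updateQueue` and the `prevCached` array are hypothetical stand-ins for the user's helpers and the reconciler glue. Flat objects like the marker style are compared key by key, each value by identity, which is all that "close to primitive" values need.

```javascript
// Hypothetical stand-ins for the user's helpers in ColorPicker.
function calculateBackground(color) {
  return `linear-gradient(${color}, white)`
}
function calculateMarkerStyle(color) {
  return { borderColor: color, left: "10px" }
}

// Primitives compare by identity; flat objects (like a style) compare
// key by key, each value by identity.
function shallowEqual(a, b) {
  if (Object.is(a, b)) return true
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) {
    return false
  }
  const keysA = Object.keys(a)
  if (keysA.length !== Object.keys(b).length) return false
  return keysA.every((key) => key in b && Object.is(a[key], b[key]))
}

// Slot-0 update with a guard before each push, so unchanged values never
// reach the update queues. Arrays stand in for the real glue.
function ColorPicker__prepare_update__slot_0(instances, slotState, prevCached) {
  const background = calculateBackground(slotState)
  const markerStyle = calculateMarkerStyle(slotState)
  if (!Object.is(prevCached[0], background)) {
    instances[0].updateQueue.push({ style: { background } })
    prevCached[0] = background
  }
  if (!shallowEqual(prevCached[1], markerStyle)) {
    instances[1].updateQueue.push({ style: markerStyle })
    prevCached[1] = markerStyle
  }
}

const instances = [{ updateQueue: [] }, { updateQueue: [] }]
const prevCached = []
ColorPicker__prepare_update__slot_0(instances, "red", prevCached)
ColorPicker__prepare_update__slot_0(instances, "red", prevCached) // skipped
ColorPicker__prepare_update__slot_0(instances, "blue", prevCached)
console.log(instances.map((instance) => instance.updateQueue.length)) // [ 2, 2 ]
```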

At the more advanced level, we can ask the reconciler to produce code for this exact kind of update. This would require more buy-in from the reconciler, but I don't think it's out of the question to build a function that takes the instance type and the props to be modified, and statically produces code that updates exactly those attributes.

```js
function ColorPicker__prepare_update__slot_0(instances, slotState, prevCached) {
  const markerStyle = calculateMarkerStyle(slotState)
  const background = calculateBackground(slotState)
  instances[0].updatePayload = background
  instances[1].updatePayload = markerStyle
}

function ColorPicker__commit_update__slot_0(instances) {
  // the background div: only one style key changes
  instances[0].style.background = instances[0].updatePayload
  // the marker div: an object can't be assigned to element.style directly,
  // so merge its keys in (assuming calculateMarkerStyle always returns the same keys)
  Object.assign(instances[1].style, instances[1].updatePayload)
}
```
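One way to picture the reconciler 'producing code' for a specific update is a small code generator. `compileStyleCommit` is a hypothetical helper, not a react-reconciler API: it takes the style keys the compiler found can change and builds a commit function with one straight-line assignment per key. A plain object stands in for the DOM element.

```javascript
// Hypothetical reconciler helper: given the style keys a component can
// change, generate a commit function with no loops or key lookups at runtime.
function compileStyleCommit(styleKeys) {
  const lines = styleKeys.map(
    (key) => `el.style[${JSON.stringify(key)}] = payload[${JSON.stringify(key)}];`,
  )
  return new Function("el", "payload", lines.join("\n"))
}

// Built once per element, from the compiler's analysis.
const commitMarkerStyle = compileStyleCommit(["borderColor", "left"])

// A plain object stands in for a DOM element here.
const marker = { style: {} }
commitMarkerStyle(marker, { borderColor: "red", left: "10px" })
console.log(marker.style) // { borderColor: 'red', left: '10px' }
```

Generating code with `new Function` at runtime is only for illustration, and is blocked under strict Content Security Policies; a compiler would more naturally emit the specialised function at build time.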

These imperative update functions can still be called by the scheduler at the appropriate times, so updates stay in sync and are batched together. For the most part we're just calling user code, but different pieces of it at different times, selected by static analysis.

Again this pass can be done on a per-component basis, without global knowledge of how the components are put together.

The same kind of dependency analysis can ensure that elements sharing state are updated together. Again, memoisation can happen at the update-preparation level, where only referential equality checks would be necessary.

It's also possible to create monomorphised mount and unmount functions for the initial creation and final teardown of component instances, but I think focusing on the update cycle will be more fruitful. There's an open issue suggesting a similar technique, but without the split between the reconciler's prepareUpdate and commitUpdate functions.

Optimised update call propagation

There has been significant discussion around signals lately, as a state primitive distinctly different from React's offerings. I think this discussion has been more 'solution' focused than 'problem' focused. React sometimes does unnecessary re-renders on state updates, and specifying memoisation points through useMemo hooks and similar techniques is additional developer work. Signals can 'fix' this problem by providing finer-grained change notification, and in frameworks like Preact they can perform imperative updates on change.

Signal-like state management solutions require you to specify your data flow graph separately from the component tree. I don't think this is a win in all cases. Top-down, tree-like data flow is much simpler to reason about than separate 'state graphs': it allows for local reasoning, prevents a class of bugs around infinite loops and circular dependencies, and reduces the potential for diamond-shaped dependencies.

With granular render functions, the component tree is reorganised to co-locate components that rely on the same state, which reduces unnecessary re-renders. By monomorphising update functions, those re-renders become cheaper. However, we still pay the cost of propagating calls top-down, prop-drilling through components that don't need to know about the update.

I think it should be possible to set up an internal, signal-like primitive in React that handles fine-grained change notification deep into the tree. Components could be statically tagged as state producers, consumers, 'passthrough' facilitators, any combination of those, or completely static. React already has us model our data flow; it just happens implicitly, through the shape of the component tree.

In the example below, Page__grain_1 is the source of changes, Header never actually needs to re-render since it only passes changes through, and Header__grain_2 is the actual recipient of those changes.

```jsx
import { useState } from "react"

function useLoginToken() {
  // this calls into React's internals, gets a slot
  const [token, setToken] = useState(null)
  // useEffect call to some singleton that manages the token
  return {
    token,
    isLoggedIn: token !== null,
  }
}

function Page__grain_1() {
  const { isLoggedIn } = useLoginToken()
  return (
    <>
      <Header isLoggedIn={isLoggedIn} />
    </>
  )
}

function Header({ isLoggedIn }) {
  return (
    <>
      <Header__grain_1 />
      <Header__grain_2 isLoggedIn={isLoggedIn} />
    </>
  )
}

function Header__grain_2({ isLoggedIn }) {
  return (
    <>
      {isLoggedIn ? (
        <Link to="/profile">Profile</Link>
      ) : (
        <Link to="/signup">Sign Up</Link>
      )}
    </>
  )
}
```

When Page__grain_1 is rendered, the setters of all useState hooks called from the component could be augmented to notify an emitter on change, and that emitter could be passed down to children. The Header component receives the emitter when it renders, but since it doesn't use any state directly, it doesn't subscribe; it merely passes the emitter down to its children. Finally, Header__grain_2 does use the state, so it subscribes to the emitter.
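The emitter flow could be sketched outside React like this. Everything here is hypothetical: components are plain functions, `createProducer` stands in for Page__grain_1 with an augmented setter, the state is simplified to the `isLoggedIn` boolean itself rather than the token, and render counts are tracked by hand to show that the passthrough Header renders once while Header__grain_2 re-renders on each real change.

```javascript
// Minimal emitter: listeners are called with each new value.
function createEmitter() {
  const listeners = new Set()
  return {
    subscribe(listener) {
      listeners.add(listener)
      return () => listeners.delete(listener) // unsubscribe
    },
    emit(value) {
      for (const listener of listeners) listener(value)
    },
  }
}

// Hand-kept counters so the effect on re-renders is visible.
const renders = { Header: 0, Header__grain_2: 0 }
const output = { href: null }

// Producer (stands in for Page__grain_1): a state slot whose setter is
// augmented to notify the emitter, but only when the value really changes.
function createProducer(initial) {
  let value = initial
  const emitter = createEmitter()
  function setValue(next) {
    if (Object.is(next, value)) return
    value = next
    emitter.emit(value)
  }
  return { emitter, setValue, getValue: () => value }
}

// Passthrough: renders once and hands the emitter on, never subscribing.
function Header(emitter, isLoggedIn) {
  renders.Header++
  Header__grain_2(emitter, isLoggedIn)
}

// Consumer: renders once, then re-renders itself on every emitted change.
function Header__grain_2(emitter, isLoggedIn) {
  function render(loggedIn) {
    renders.Header__grain_2++
    output.href = loggedIn ? "/profile" : "/signup"
  }
  render(isLoggedIn)
  emitter.subscribe(render)
}

const page = createProducer(false)
Header(page.emitter, page.getValue())
page.setValue(true)
page.setValue(true) // same value, so nothing is emitted
console.log(renders, output.href) // { Header: 1, Header__grain_2: 2 } /profile
```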

I think this technique would work better if custom hooks could be introspected by the compiler, which would require global context. Ideally data dependencies are statically traced all the way back to their source hooks or props, the more granular the better.

An alternative to this technique would be better component folding, as discussed earlier. If the tree is flattened, there are fewer 'passthrough' calls causing unnecessary re-renders.

As an aside, there's a nice blog post by lord outlining the nuances of computing data dependencies efficiently.

Minor additional optimisations

These are some additional passes that I believe could be included.

Branch unwrapping

With the previous optimisations in place, it might be advantageous to 'unwrap' branches in components under specific circumstances, such as when both branches render the same component with different props.

```jsx
// Turn this...
function Header__grain_2({ isLoggedIn }) {
  return (
    <>
      {isLoggedIn ? (
        <Link to="/profile">Profile</Link>
      ) : (
        <Link to="/signup">Sign Up</Link>
      )}
    </>
  )
}

// ...into this
function Header__grain_2({ isLoggedIn }) {
  return (
    <>
      <Link to={isLoggedIn ? "/profile" : "/signup"}>
        {isLoggedIn ? "Profile" : "Sign Up"}
      </Link>
    </>
  )
}
```

The latter is more amenable to the monomorphised update functions, but the former is easier for the developer to read. Ideally we write the readable version and have the performant version compiled out for us.
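The transform could be sketched over a tiny, hypothetical AST. `unwrapConditional` only handles the simple case from the example, a conditional whose branches are the same component type with the same prop keys and literal children, and it prints the result as JSX text rather than building a real Babel AST.

```javascript
// Hypothetical mini-AST: an element is { type, props, children } with literal
// prop values and text children; a conditional is { test, consequent, alternate }
// where `test` is an identifier name.
function unwrapConditional({ test, consequent: a, alternate: b }) {
  const sameShape =
    a.type === b.type &&
    a.children.length === b.children.length &&
    JSON.stringify(Object.keys(a.props)) === JSON.stringify(Object.keys(b.props))
  if (!sameShape) return null // leave the original branch alone

  // Select each value with a ternary, or keep it as-is when both sides agree.
  const choose = (x, y) =>
    x === y ? JSON.stringify(x) : `${test} ? ${JSON.stringify(x)} : ${JSON.stringify(y)}`

  const props = Object.keys(a.props)
    .map((key) => ` ${key}={${choose(a.props[key], b.props[key])}}`)
    .join("")
  const children = a.children.map((child, i) => `{${choose(child, b.children[i])}}`).join("")
  return `<${a.type}${props}>${children}</${a.type}>`
}

const unwrapped = unwrapConditional({
  test: "isLoggedIn",
  consequent: { type: "Link", props: { to: "/profile" }, children: ["Profile"] },
  alternate: { type: "Link", props: { to: "/signup" }, children: ["Sign Up"] },
})
console.log(unwrapped)
// <Link to={isLoggedIn ? "/profile" : "/signup"}>{isLoggedIn ? "Profile" : "Sign Up"}</Link>
```

Because both branches already render a Link in the same position, React would reconcile them in place either way, so the unwrapped form keeps the same behaviour while exposing the changing props to the update analysis.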

Hoisting

It would be nice if constant object literals could be hoisted out of components, so that they are referentially equal across renders and don't need to be re-created each time.

```jsx
function Component() {
  return <div style={{ background: "white" }}>Text</div>
}

// becomes

const Component__style__hoisted = { background: "white" }
function Component() {
  return <div style={Component__style__hoisted}>Text</div>
}
```

There's a GitHub issue about this kind of optimisation, and a Babel plugin that can already perform it; inclusion in an official compiler will hopefully attract more usage. JS semantics technically allow the hoisted object to be mutated (via instance.props.style), but I've never seen this in the real world, or frankly anywhere outside discussions of this exact technique.
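The effect of hoisting can be checked with a plain-object stand-in for `React.createElement` (the `createElement` below is hypothetical). Freezing the hoisted object is this sketch's own choice, not necessarily what the existing Babel plugin does; it turns the rare mutation described above into an error in strict-mode code rather than a silent shared-state bug.

```javascript
// Plain-object stand-in for React.createElement.
function createElement(type, props, ...children) {
  return { type, props: { ...props, children } }
}

// Before: a fresh style object is allocated on every render.
function Component() {
  return createElement("div", { style: { background: "white" } }, "Text")
}

// After: the literal lives at module scope and is shared by every render.
const Component__style__hoisted = Object.freeze({ background: "white" })
function Component__hoisted() {
  return createElement("div", { style: Component__style__hoisted }, "Text")
}

console.log(Component().props.style === Component().props.style) // false
console.log(Component__hoisted().props.style === Component__hoisted().props.style) // true
```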

A note on control flow

I want to write JS for control flow. I think React is particularly powerful with its ability to pass 'anything' as props and keep 'anything' as state, with no wrappers needed.

```jsx
function Component({ videos }) {
  return (
    <>
      <h2>{videos.length > 1 ? "Videos" : "Video"}</h2>
      {videos.map((video) => (
        <Video key={video.id} data={video.data} />
      ))}
    </>
  )
}
```

I think keeping this is important, rather than falling back to 'DSL-like' components for control flow, such as a <For /> component for iteration or an <If /> component for branches.

```jsx
function Component({ videos }) {
  return (
    <>
      <h2>
        <If predicate={videos.length > 1} truthy="Videos" falsy="Video" />
      </h2>
      <For
        iterable={videos}
        each={(video) => <Video key={video.id} data={video.data} />}
      />
    </>
  )
}
```

Final thoughts

I look forward to the release of a React Compiler, not just for the immediate benefits of auto-memoisation, but as a platform for building additional optimisations. I'm not sure how much of a performance improvement these techniques will yield, but if it's anything like the benchmarks from my experiments with fast text updating, I'm quite hopeful!