<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Of Rabbits and Holes</title><link>https://ergeysay.github.io/</link><description>Recent content on Of Rabbits and Holes</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Tue, 26 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://ergeysay.github.io/index.xml" rel="self" type="application/rss+xml"/><item><title>Safe Made Easy Pt.1: Single Ownership is (Not) Optional</title><link>https://ergeysay.github.io/safe-made-easy-pt1.html</link><pubDate>Tue, 26 May 2026 00:00:00 +0000</pubDate><guid>https://ergeysay.github.io/safe-made-easy-pt1.html</guid><description>&lt;ul>
&lt;li>&lt;a href="#intro">Intro&lt;/a>&lt;/li>
&lt;li>&lt;a href="#what-it-promises-and-what-it-doesnt">What it promises and what it doesn&amp;rsquo;t&lt;/a>&lt;/li>
&lt;li>&lt;a href="#motivating-example">Motivating example&lt;/a>&lt;/li>
&lt;li>&lt;a href="#the-proposal">The proposal&lt;/a>&lt;/li>
&lt;li>&lt;a href="#linear-drops">Linear drops&lt;/a>&lt;/li>
&lt;li>&lt;a href="#the-formal-background">The formal background&lt;/a>&lt;/li>
&lt;li>&lt;a href="#the-rules-so-far">The rules so far&lt;/a>&lt;/li>
&lt;li>&lt;a href="#conclusion">Conclusion&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="intro">Intro&lt;/h2>
&lt;p>This post introduces an approach to memory safety that I believe is more practical and more ergonomic than the available alternatives.&lt;/p>
&lt;p>It all started way back when, and was inspired by things I read and wrote:&lt;/p>
&lt;ul>
&lt;li>Attempts to bolt linear types on top of Rust &lt;a href="https://faultlore.com/blah/linear-rust/">(1)&lt;/a>, &lt;a href="https://blog.yoshuawuyts.com/linearity-and-control/">(2)&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://faultlore.com/blah/everyone-poops/">Leakpocalypse&lt;/a>&lt;/li>
&lt;li>Verdagon&amp;rsquo;s post on &lt;a href="https://verdagon.dev/blog/higher-raii-uses-linear-types">Vale design and higher RAII&lt;/a>&lt;/li>
&lt;li>A lot, and I mean A LOT, of TypeScript&lt;/li>
&lt;/ul>
&lt;p>Three years of development later, I believe I finally got it. The proposal is complete.&lt;/p></description><content:encoded><![CDATA[<ul>
<li><a href="#intro">Intro</a></li>
<li><a href="#what-it-promises-and-what-it-doesnt">What it promises and what it doesn&rsquo;t</a></li>
<li><a href="#motivating-example">Motivating example</a></li>
<li><a href="#the-proposal">The proposal</a></li>
<li><a href="#linear-drops">Linear drops</a></li>
<li><a href="#the-formal-background">The formal background</a></li>
<li><a href="#the-rules-so-far">The rules so far</a></li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
<h2 id="intro">Intro</h2>
<p>This post introduces an approach to memory safety that I believe is more practical and more ergonomic than the available alternatives.</p>
<p>It all started way back when, and was inspired by things I read and wrote:</p>
<ul>
<li>Attempts to bolt linear types on top of Rust <a href="https://faultlore.com/blah/linear-rust/">(1)</a>, <a href="https://blog.yoshuawuyts.com/linearity-and-control/">(2)</a></li>
<li><a href="https://faultlore.com/blah/everyone-poops/">Leakpocalypse</a></li>
<li>Verdagon&rsquo;s post on <a href="https://verdagon.dev/blog/higher-raii-uses-linear-types">Vale design and higher RAII</a></li>
<li>A lot, and I mean A LOT, of TypeScript</li>
</ul>
<p>Three years of development later, I believe I finally got it. The proposal is complete.</p>
<p>Moreover, I have implemented it in my own programming language I intend to release soon-ish, and I want to share the design decisions and the entire path from &ldquo;huh, why not&rdquo; to &ldquo;omg it&rsquo;s live&rdquo;.</p>
<p>So, TL;DR: linear types (which are dropped exactly once) + abstract interpretation + a bunch of tricks allow us to eliminate the same classes of bugs as Rust does (at least in non-concurrent environments) <em>plus</em> memory leaks, and we can extend the approach to <em>also</em> cover concurrent environments, all the while being <em>more</em> ergonomic and less restrictive.</p>
<p>Sounds fun? Let&rsquo;s dig in.</p>
<h2 id="what-it-promises-and-what-it-doesnt">What it promises and what it doesn&rsquo;t</h2>
<ul>
<li>
<p>It is <em>safe</em> - it completely eliminates entire classes of bugs, such as:</p>
<ul>
<li>Double-free</li>
<li>Use-after-free</li>
<li>Dangling pointers</li>
<li>Null pointer dereferences</li>
<li>Buffer overflows</li>
<li>Out-of-bounds accesses</li>
<li>Iterator invalidation</li>
<li>Uninitialized memory access</li>
<li>Memory leaks</li>
</ul>
<p>Single ownership enables linearity - each value is dropped exactly once - and prohibits ownership cycles. Together with the flow-sensitive type system built to enforce it, these eliminate most of the above. Buffer overflows and OOB accesses are covered separately, but the mechanics of the rest of the system make dealing with these easy and efficient.</p>
</li>
<li>
<p>It is <em>sound</em> - I will demonstrate over the course of this series that the claims hold for arbitrary inputs. There are no holes that can be used to break the guarantees provided from inside the system.</p>
</li>
<li>
<p>It is <strong>NOT</strong> simple - there is a fairly large number of primitives working together so that the whole system can uphold the safety guarantees promised.</p>
</li>
<li>
<p>It is <strong>NOT</strong> concerned with concurrency - though the &ldquo;fearless concurrency&rdquo; guarantees are a natural extension to the proposed system, it has not been implemented in a complete enough way to demonstrate the viability of the approach. I will expand on this in a future post once I get it up and running.</p>
</li>
<li>
<p>It is <strong>NOT</strong> claiming to be &ldquo;zero cost&rdquo;, though it keeps runtime overhead to a minimum - it introduces runtime checks (a single branch per indeterminate access) wherever the compiler cannot statically prove availability.</p>
</li>
</ul>
<h2 id="motivating-example">Motivating example</h2>
<p>Consider this pseudocode:</p>





<pre tabindex="0"><code>var x: T = new T;
if random() &gt; 0.5 {
    drop x;
}
print(x);</code></pre><p>What this code does is it <em>conditionally</em> consumes a value.</p>
<p>There are two ways this could go in a real language. C++ doesn&rsquo;t particularly care and will happily compile this code:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-c++" data-lang="c++"><span class="line"><span class="cl"><span class="cp">#include</span> <span class="cpf">&lt;cstdlib&gt;</span><span class="cp">
</span></span></span><span class="line"><span class="cl"><span class="cp">#include</span> <span class="cpf">&lt;cstdio&gt;</span><span class="cp">
</span></span></span><span class="line"><span class="cl"><span class="cp"></span><span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="kt">int</span> <span class="o">*</span><span class="n">i</span> <span class="o">=</span> <span class="k">new</span> <span class="kt">int</span><span class="p">(</span><span class="mi">42</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">((</span><span class="kt">double</span><span class="p">)</span><span class="n">rand</span><span class="p">()</span> <span class="o">/</span> <span class="n">RAND_MAX</span> <span class="o">&gt;</span> <span class="mf">0.5</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">delete</span> <span class="n">i</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">printf</span><span class="p">(</span><span class="s">&#34;i=%d</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">,</span> <span class="o">*</span><span class="n">i</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>Which will then proceed to invoke UB whenever the branch is taken - nominally in about 50% of cases, although <code>rand()</code> is never seeded here, so every run of this particular program takes the same path. A modern C++ developer would reach for <code>std::unique_ptr</code> and <code>std::optional</code> here - and they would help, partially. RAII via smart pointers eliminates the manual <code>delete</code>, and <code>optional</code> gives you a way to represent &ldquo;maybe moved.&rdquo; But <code>unique_ptr</code> only manages heap-allocated objects, and the type system does not <em>enforce</em> the optional check - <code>operator*</code> on an empty optional is undefined behavior, and even <code>.value()</code> only gives you a runtime exception instead of a compile-time error. It is still on you to remember.</p>
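<p>To make the last point concrete, here is a minimal sketch of the same program written with <code>std::optional</code>. The helper name <code>read_throws_after_conditional_reset</code> is made up for illustration, and <code>reset()</code> stands in for the drop. The read compiles no matter what happened to the value earlier; the emptiness check only happens at runtime:</p>

```cpp
#include <optional>

// Hypothetical helper mirroring the motivating example: conditionally
// "drop" the value, then read it. Returns true if the read threw.
bool read_throws_after_conditional_reset(bool consume) {
    std::optional<int> x = 42;
    if (consume) {
        x.reset();  // plays the role of `drop x`
    }
    try {
        // This compiles whether or not x may be empty; value() checks at
        // runtime and throws std::bad_optional_access on an empty optional.
        int v = x.value();
        (void)v;
        return false;
    } catch (const std::bad_optional_access&) {
        return true;
    }
}
```

<p>Replacing <code>x.value()</code> with <code>*x</code> would compile just as happily and be undefined behavior on the consuming path, with no exception to catch.</p>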
<p>In Rust, though, this code does not compile at all:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-rust" data-lang="rust"><span class="line"><span class="cl"><span class="k">fn</span> <span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="kd">let</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">Box</span>::<span class="n">new</span><span class="p">(</span><span class="mi">42</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">if</span><span class="w"> </span><span class="n">rand</span>::<span class="n">random</span>::<span class="o">&lt;</span><span class="kt">f64</span><span class="o">&gt;</span><span class="p">()</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mf">0.5</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nb">drop</span><span class="p">(</span><span class="n">x</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="fm">println!</span><span class="p">(</span><span class="s">&#34;</span><span class="si">{}</span><span class="s">&#34;</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">);</span><span class="w"> </span><span class="c1">// error[E0382]: borrow of moved value: `x`
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="p">}</span></span></span></code></pre></div><p>Rust takes a very different approach. The compiler tracks moves through control flow - it sees that <code>x</code> <em>might</em> have been moved in the <code>if</code> branch, and rejects the program outright. Rust&rsquo;s ownership model requires that a variable is statically known to be initialized at every point where it is used - a conditionally-moved value fails that requirement at the <code>println!</code>, so the program is rejected. You <em>can</em> wrap the value in <code>Option&lt;T&gt;</code> yourself and <code>.take()</code> it manually, but Rust won&rsquo;t do that for you - the burden is on the developer to restructure the code upfront.</p>
<p>So, what if there was a third way between these two?</p>
<h2 id="the-proposal">The proposal</h2>
<p>The proposed solution is straightforward:</p>





<pre tabindex="0"><code>var x: T = new T;
if rand() &gt; 0.5f {
    drop x;
}
// &lt;- At this point, typeof(x) is Option&lt;T&gt;</code></pre><p>The type of the value is now control-flow-dependent - the compiler evaluates it as it goes through the program, <em>widening</em> it each time diverging control-flow paths rejoin, to accommodate <em>both</em> possibilities. Then it becomes the developer&rsquo;s responsibility to <em>narrow</em> it down when they want to use it:</p>





<pre tabindex="0"><code>var x: T = new T;
if rand() &gt; 0.5f {
    drop x;
}   // x is _widened_ to an Option&lt;T&gt;
if x {
    // x is definitely available in this branch and can be used
} else {
    // x is definitely not available
}   // x is _widened_ to an Option&lt;T&gt; again</code></pre><p>One way to view this is to consider which information is available to the compiler at various points:</p>
<ul>
<li>The first conditional statement makes the compiler <em>lose</em> information on the availability of <code>x</code>, which the type system expresses by widening the type of <code>x</code> to <code>Option&lt;T&gt;</code></li>
<li>The second conditional statement gives information back to the compiler - in each branch of the statement, <code>x</code> has a definite availability</li>
<li>But after the second conditional statement we are back to the state where the information is not available</li>
</ul>
<p>Compared to the C++ approach, we now force the developer to consider the state space explicitly and avoid the crash, because the typechecker will catch all attempts to use an <code>Option&lt;T&gt;</code> where a <code>T</code> should be used, or to use a definitely non-available value.</p>
<p>Compared to the Rust approach, we gain flexibility at the cost of a runtime check - a single null/tag comparison at the point of refinement.</p>
<p>It should be noted that this is not a new idea. If anything, one of the most popular languages in the world, TypeScript, does exactly that. However, TypeScript compiles to JavaScript - a garbage-collected language with shared ownership, which does not concern itself with lifetimes, memory and resource management, or concurrency - all of which I need to cover.</p>
<p>This is just the tip of the iceberg, the very beginning of the system. When we cover more ground - aggregates, references, function calls, dynamic dispatch, lambdas and closures - it will grow to accommodate new requirements.</p>
<h2 id="linear-drops">Linear drops</h2>
<p>One more thing that I also wanted is that each value is guaranteed to be dropped exactly once. This is known as <em>linear typing</em>. When linear typing is discussed, the guarantee is usually formulated as &ldquo;used exactly once&rdquo;, but what constitutes a <em>use</em> can vary. In my case, use == drop.</p>
<p>With the proposed approach it becomes trivially simple:</p>
<ul>
<li>If a value is of an owned type <code>T</code>, it is dropped when it goes out of scope, when its owner goes out of scope, or when it is dropped manually - which transitions its type to <code>None</code></li>
<li>If a value has an indeterminate availability - i.e. it is of type <code>Option&lt;T&gt;</code> where <code>T</code> is an owned type - the compiler instead inserts a runtime check and a conditional drop at the end of the scope; such values can also be dropped manually</li>
</ul>
<p>This raises a question about the refined branch:</p>





<pre tabindex="0"><code>var x: T = new T;
if rand() &gt; 0.5f {
    drop x;
}   // x is _widened_ to an Option&lt;T&gt;
if x {
    // x is definitely available in this branch and can be used,
    // but what type is it?
} else {
    // x is None
}   // x is _widened_ to an Option&lt;T&gt; again</code></pre><p>Which type, exactly, will <code>x</code> have when it is refined to something available in the second conditional statement?</p>
<p>If it&rsquo;s refined to <code>T</code>, it will be dropped at the end of the scope of the <code>if</code> branch. That would be silly - a given value could only ever be refined once, used, then immediately dropped when the branch ends. Safe, but draconian.</p>
<p>Instead, we define a new type - <code>Some&lt;T&gt;</code> - whose only purpose is to <em>avoid</em> being dropped, or to serve as <em>proof of availability without taking ownership</em>.</p>
<p>This is one of the many types that are handled by the type checker in a special way; namely:</p>
<ul>
<li>It is not automatically dropped when it goes out of scope - that&rsquo;s the whole point</li>
<li>It cannot be constructed except by refining an <code>Option&lt;T&gt;</code> - constructing it from a <code>T</code> would move ownership in, and since <code>Some&lt;T&gt;</code> is not auto-dropped, the value would leak</li>
<li>It is <em>dependent</em> on the original <code>Option&lt;T&gt;</code> - if the value contained in the unwrapped <code>Option&lt;T&gt;</code> is somehow dropped, the refined view of that option should not be usable. I will cover dependencies and how they help us in a later post</li>
<li>Conversely, explicitly dropping the <code>Option&lt;T&gt;</code> invalidates the <code>Some&lt;T&gt;</code> - the dependency is bidirectional</li>
<li>It can be freely used in all the same places as <code>T</code>. The developer can explicitly drop it - this consumes the underlying value and sets the original <code>Option&lt;T&gt;</code> to <code>None</code></li>
</ul>
<p>As I said - definitely <em>not</em> simple. But not that complicated either.</p>
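<p>As a rough illustration of the rules above, here is how a <code>Some&lt;T&gt;</code>-like view could be approximated in C++ on top of <code>std::optional</code>. This is only a sketch under loud assumptions: <code>SomeView</code>, <code>refine</code>, <code>get</code> and <code>drop</code> are hypothetical names, and C++ cannot enforce the dependency between the view and its source at compile time, which is exactly the part the proposed type checker is responsible for.</p>

```cpp
#include <optional>

// Illustrative stand-in for Some<T>: a non-owning view of an engaged optional.
template <typename T>
class SomeView {
public:
    // Hypothetical refinement: the only way to obtain a view, and only
    // when the source option currently holds a value.
    static std::optional<SomeView> refine(std::optional<T>& source) {
        if (!source.has_value()) {
            return std::nullopt;
        }
        return SomeView(source);
    }

    // Usable where a T is expected. Nothing here stops a call after the
    // source has been emptied; the real system rejects that statically.
    T& get() const { return **source_; }

    // Explicit drop: consumes the underlying value and leaves the
    // source option empty (the equivalent of None).
    void drop() { source_->reset(); }

private:
    explicit SomeView(std::optional<T>& source) : source_(&source) {}

    std::optional<T>* source_;  // non-owning: destroying the view frees nothing
};
```

<p>Letting a <code>SomeView</code> go out of scope does nothing to the value, while calling <code>drop()</code> empties the option it was refined from, so a second <code>refine</code> on the same option yields nothing.</p>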
<h2 id="the-formal-background">The formal background</h2>
<p>The approach described above has roots in several well-established areas of programming language theory.</p>
<p><strong>Flow-sensitive typing</strong> allows the type of a variable to change as the program executes. Most type systems are flow-<em>insensitive</em> - a variable declared as <code>T</code> stays <code>T</code> for its entire scope. Flow-sensitive systems, such as TypeScript&rsquo;s control flow narrowing, track how types evolve along different execution paths. What we add is applying this to <em>ownership</em> - the availability of a value is part of its type, and that availability changes as the program flows through moves, drops, and conditional branches.</p>
<p><strong>Refinement types</strong> allow types to be narrowed by predicates. When we write <code>if x { ... }</code>, we are refining the type of <code>x</code> from <code>Option&lt;T&gt;</code> to <code>Some&lt;T&gt;</code> (or to <code>None</code> in the else branch). This is a direct application of refinement typing - the conditional acts as a proof that the value is available, and the type system reflects that proof.</p>
<p>There are several parts of the system that <em>link</em> values and their types during compilation - <code>Some&lt;T&gt;</code> depends on the <code>Option&lt;T&gt;</code> it was refined from, and as we will see in later posts, references depend on the values they point to. This is related to <strong>dependent typing</strong> in the limited sense that type validity depends on specific program values and ownership relationships. The system does not attempt full dependent typing in the Idris or Agda sense, but it does track value-to-type dependencies across function boundaries and through control flow.</p>
<p><strong>Abstract interpretation</strong> provides the unifying framework. What the compiler does is interpret the program abstractly in the <em>availability domain</em> - instead of computing actual values, it computes whether each variable is definitely available, definitely unavailable, or indeterminate. Branch joins widen the state, and conditionals narrow it. This is a standard abstract interpretation over a simple lattice: <code>T</code> (available) and <code>None</code> (unavailable) are the precise states, <code>Option&lt;T&gt;</code> is their join.</p>
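<p>The availability lattice is small enough to write down directly. The following is a minimal sketch, not the actual compiler code; the names <code>Availability</code>, <code>join</code> and <code>narrow</code> are assumptions made for illustration.</p>

```cpp
// Availability of a variable at a program point (illustrative names).
enum class Availability {
    Available,      // T
    Unavailable,    // None
    Indeterminate,  // Option<T>
};

// Join at a control-flow merge: agreeing states are kept, disagreeing
// states are widened to Indeterminate.
constexpr Availability join(Availability a, Availability b) {
    return a == b ? a : Availability::Indeterminate;
}

// Narrowing by `if x { ... } else { ... }`: the state of x inside a branch.
// Simplification: an already precise state is returned unchanged, even
// though the contradicting branch would really be unreachable.
constexpr Availability narrow(Availability before, bool then_branch) {
    if (before != Availability::Indeterminate) {
        return before;
    }
    return then_branch ? Availability::Available : Availability::Unavailable;
}
```

<p>In the motivating example, the branch that drops <code>x</code> ends in <code>Unavailable</code> and the fall-through path stays <code>Available</code>, so their join is <code>Indeterminate</code>. The subsequent <code>if x</code> narrows that back to a precise state inside each branch.</p>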
<p>It is worth noting that availability is not the only domain the compiler interprets in. Later posts will introduce additional domains - each with its own lattice - interpreted over the same control flow structure. Dependency tracking, reference validity, and ownership of aggregates all follow the same abstract-interpretation approach.</p>
<h2 id="the-rules-so-far">The rules so far</h2>
<p>I will maintain a running list of the rules, invariants, and behaviors of the system as we go. Each post will add to it. This list may seem ad-hoc and chaotic because <em>there is no single syntactic trick that everything falls out of</em> - there are several underlying principles playing together, which generate many small rules.</p>
<p>I warned you at the beginning that this is <em>not</em> simple.</p>
<p>Here&rsquo;s the thing, though: you have to uphold these rules for safety <em>anyway</em> - in C++ you do it in your head, in Rust you do it by fighting the borrow checker. All we do here is mechanically shift the responsibility from the developer to the compiler. Each rule is grounded in well-established theory - we just apply it in a way that doesn&rsquo;t require the developer to think about it.</p>
<ol start="0">
<li>Types are partitioned into <em>owned</em> and <em>plain</em>. Owned types require destruction; plain types (integers, booleans, floats) do not</li>
<li>Each owned value has exactly one owner</li>
<li>Each owned value is dropped exactly once - when it or its owner goes out of scope, when it is explicitly dropped, or conditionally if its availability is indeterminate</li>
<li>A conditionally-dropped owned value is widened to <code>Option&lt;T&gt;</code></li>
<li><code>Option&lt;T&gt;</code> is itself an owned type when <code>T</code> is owned</li>
<li><code>Option&lt;T&gt;</code> can be refined to <code>Some&lt;T&gt;</code> or <code>None</code> by a conditional check</li>
<li><code>Some&lt;T&gt;</code> is non-owning - refinement does not transfer ownership</li>
<li><code>Some&lt;T&gt;</code> obtained by refining <code>Option&lt;T&gt;</code> and the source <code>Option&lt;T&gt;</code> <em>depend</em> on each other</li>
<li>Refinement is not sticky - it expires when control flow rejoins</li>
</ol>
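<p>The &ldquo;dropped exactly once&rdquo; rule for a conditionally-dropped value can be approximated in C++ to show what the inserted runtime check amounts to. This is a sketch under assumptions: <code>Tracked</code> and <code>drops_for_path</code> are hypothetical, <code>std::optional</code> plays the role of the widened <code>Option&lt;T&gt;</code>, and the explicit check at the end of the scope is written by hand where the proposed compiler would insert it.</p>

```cpp
#include <optional>

// Counts how many times a value of this type has been destroyed.
struct Tracked {
    static inline int drops = 0;
    Tracked() = default;
    Tracked(const Tracked&) = delete;
    Tracked& operator=(const Tracked&) = delete;
    ~Tracked() { ++drops; }
};

// Runs the motivating example along one path and returns the number of
// drops that happened on it.
int drops_for_path(bool drop_early) {
    Tracked::drops = 0;
    {
        std::optional<Tracked> x;
        x.emplace();        // var x: T = new T;
        if (drop_early) {
            x.reset();      // drop x;
        }
        // End of scope: x has indeterminate availability here, so a single
        // branch decides whether a drop is still owed.
        if (x.has_value()) {
            x.reset();
        }
    }
    return Tracked::drops;
}
```

<p>Both <code>drops_for_path(true)</code> and <code>drops_for_path(false)</code> return 1: whichever path is taken, the value is destroyed once and only once.</p>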
<h2 id="conclusion">Conclusion</h2>
<p>So this is the end of the beginning, and there&rsquo;s much, much more left to cover.</p>
<p>In the next post I will generalize this approach to aggregates such as records and arrays, and introduce references to allow sharing values without transferring ownership.</p>
]]></content:encoded></item><item><title>Optimising interpreters: fusion</title><link>https://ergeysay.github.io/optimizing-interpreters-fusion.html</link><pubDate>Sun, 01 Sep 2024 00:00:00 +0000</pubDate><guid>https://ergeysay.github.io/optimizing-interpreters-fusion.html</guid><description>&lt;ul>
&lt;li>&lt;a href="#preface">Preface&lt;/a>&lt;/li>
&lt;li>&lt;a href="#the-model-task">The model task&lt;/a>&lt;/li>
&lt;li>&lt;a href="#worlds-simplest-interpreter">World&amp;rsquo;s simplest interpreter&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#base-node-and-constants">Base node and constants&lt;/a>&lt;/li>
&lt;li>&lt;a href="#arithmetic-comparison-and-conditional-execution">Arithmetic, comparison and conditional execution&lt;/a>&lt;/li>
&lt;li>&lt;a href="#functions-calls-return-statements-and-argument-access">Functions, calls, return statements and argument access&lt;/a>&lt;/li>
&lt;li>&lt;a href="#what-about-the-context">What about the context?&lt;/a>&lt;/li>
&lt;li>&lt;a href="#calling-it-done">Calling it done&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#basic-optimization">Basic optimization&lt;/a>&lt;/li>
&lt;li>&lt;a href="#going-deeper">Going deeper&lt;/a>&lt;/li>
&lt;li>&lt;a href="#simplifying-calls">Simplifying calls&lt;/a>&lt;/li>
&lt;li>&lt;a href="#what-about-clang">What about clang?&lt;/a>&lt;/li>
&lt;li>&lt;a href="#drawing-the-rest-of-the-eval">Drawing the rest of the eval&lt;/a>&lt;/li>
&lt;li>&lt;a href="#conclusion">Conclusion&lt;/a>&lt;/li>
&lt;li>&lt;a href="#appendix-methodology-of-benchmarks">Appendix: methodology of benchmarks&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="preface">Preface&lt;/h2>
&lt;p>Let me start with making an assumption that everyone has at some point in their life encountered a humble tree-walking interpreter.&lt;/p></description><content:encoded><![CDATA[<ul>
<li><a href="#preface">Preface</a></li>
<li><a href="#the-model-task">The model task</a></li>
<li><a href="#worlds-simplest-interpreter">World&rsquo;s simplest interpreter</a>
<ul>
<li><a href="#base-node-and-constants">Base node and constants</a></li>
<li><a href="#arithmetic-comparison-and-conditional-execution">Arithmetic, comparison and conditional execution</a></li>
<li><a href="#functions-calls-return-statements-and-argument-access">Functions, calls, return statements and argument access</a></li>
<li><a href="#what-about-the-context">What about the context?</a></li>
<li><a href="#calling-it-done">Calling it done</a></li>
</ul>
</li>
<li><a href="#basic-optimization">Basic optimization</a></li>
<li><a href="#going-deeper">Going deeper</a></li>
<li><a href="#simplifying-calls">Simplifying calls</a></li>
<li><a href="#what-about-clang">What about clang?</a></li>
<li><a href="#drawing-the-rest-of-the-eval">Drawing the rest of the eval</a></li>
<li><a href="#conclusion">Conclusion</a></li>
<li><a href="#appendix-methodology-of-benchmarks">Appendix: methodology of benchmarks</a></li>
</ul>
<h2 id="preface">Preface</h2>
<p>Let me start with making an assumption that everyone has at some point in their life encountered a humble tree-walking interpreter.</p>
<p>Between older implementations of popular programming languages, expression evaluators embedded in user-facing tools, and home-grown almost-complete-Lisp interpreters, these can still be found everywhere.</p>
<p>This type of interpreter is considered to be the simplest and the slowest one. Oftentimes, you may see people rewriting their interpreters to bytecode-based ones just to get a bit more performance.</p>
<p>However, recently a project named <a href="https://dascript.org/">daScript</a> (apparently <a href="https://borisbat.github.io/dascf-blog/2023/12/28/daslang-it-is/">renamed to Daslang</a>) came to my attention. It is a very fast programming language, boasting one of the fastest interpreters I have ever seen.</p>
<p>What is more interesting is that this interpreter is tree-based, and yet it still manages to outperform bytecode-based interpreters with ease.</p>
<p>I got nerd-sniped and set out to investigate why it is so fast, and I found that it uses a pretty interesting optimisation technique that its authors call <em>fusion</em>. As always, it turned out that the idea has already been explored and is also known as <a href="https://kar.kent.ac.uk/93936/">supernodes</a> or, in the context of bytecode interpreters, as <a href="https://www2.cs.arizona.edu/~collberg/Teaching/553/2011/Resources/superops.pdf">superoperators</a>.</p>
<p>In this post, I will explain how it works by optimising a tiny model tree-walking interpreter step-by-step. There will be a lot of benchmarks, and you can find the <a href="#appendix-methodology-of-benchmarks">methodology</a> at the end of this post.</p>
<p><strong>Disclaimer:</strong> this post borrows <em>heavily</em> from the daScript implementation. All ideas I will explain here originally belong to daScript authors, and any mistakes you may find are mine. That said, the implementation is completely new and does not include daScript source in any form.</p>
<h2 id="the-model-task">The model task</h2>
<p>We are going to use the Fibonacci function. Our humble hero looks like this:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="kt">uint32_t</span> <span class="nf">fib</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">i</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">&lt;</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">i</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">fib</span><span class="p">(</span><span class="n">i</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="n">fib</span><span class="p">(</span><span class="n">i</span> <span class="o">-</span> <span class="mi">2</span><span class="p">);</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>It may look simple, but it has a lot going on. Consider what an interpreter should implement to be able to evaluate it:</p>
<ul>
<li>Function calls</li>
<li>Recursive function calls</li>
<li>Argument accesses</li>
<li>Two different arithmetic operators</li>
<li>A comparison operator</li>
<li>Conditional control flow</li>
</ul>
<p>And this is what makes it a good model task: it is short and simple to implement in any language, and it is sufficiently complex at the same time to impose certain requirements on the environment it is being implemented in.</p>
<p>With this model in mind, let&rsquo;s consider what exactly we will be measuring. We only have the most basic evaluation primitives in play - there is no memory management, no type checks, just <em>raw evaluation</em>, so we will be measuring just that - performance of evaluation primitives. The benchmark will say nothing about the performance of each individual primitive but will be able to give us a rough impression of which approach performs better.</p>
<p>I put all implementations in a single source file; each implementation is guarded by its own preprocessor definition, which keeps things simple by avoiding command-line flag parsing.</p>
<p>Let&rsquo;s put the code above to the test and calculate <code>fib(42)</code>:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">&gt; cl_build.bat /DBASELINE oif.cpp
</span></span><span class="line"><span class="cl">&gt; hyperfine oif.exe
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Benchmark 1: oif.exe
</span></span><span class="line"><span class="cl">  Time <span class="o">(</span>mean ± σ<span class="o">)</span>:     803.2 ms ±   4.7 ms    <span class="o">[</span>User: 591.9 ms, System: 1.5 ms<span class="o">]</span>
</span></span><span class="line"><span class="cl">  Range <span class="o">(</span>min … max<span class="o">)</span>:   797.1 ms … 814.2 ms    <span class="m">10</span> runs</span></span></code></pre></div><h2 id="worlds-simplest-interpreter">World&rsquo;s simplest interpreter</h2>
<p>As I mentioned earlier, the venerable <em>tree-walking interpreter</em> is the simplest and arguably easiest to implement type of evaluator.</p>
<p>A quick recap on how it works:</p>
<ul>
<li>The interpreter operates on a <em>tree</em> consisting of <em>nodes</em>, which may or may not have child nodes</li>
<li><em>Evaluating</em> a node yields a result</li>
<li>If a node has any children, they have to be evaluated first before the parent node itself can be evaluated (thus the interpreter <em>walks</em> the <em>tree</em> recursively in depth-first order)</li>
<li>Evaluating the root node yields the result of evaluation of the entire program</li>
</ul>
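<p>The recap above fits in a few lines of C++. This is only a sketch with made-up node types (<code>Expr</code>, <code>Lit</code>, <code>Add</code>) and a hypothetical <code>eval_example</code> helper; they are not the node classes this post builds.</p>

```cpp
#include <cstdint>
#include <memory>
#include <utility>

// A node of the tree: evaluating it yields a result.
struct Expr {
    virtual ~Expr() = default;
    virtual std::uint32_t eval() const = 0;
};

// Leaf node without children: evaluates to a constant.
struct Lit : Expr {
    explicit Lit(std::uint32_t v) : value(v) {}
    std::uint32_t eval() const override { return value; }
    std::uint32_t value;
};

// Node with two children: both are evaluated before the parent can
// produce its own result, which makes the walk depth-first.
struct Add : Expr {
    Add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : lhs(std::move(l)), rhs(std::move(r)) {}
    std::uint32_t eval() const override { return lhs->eval() + rhs->eval(); }
    std::unique_ptr<Expr> lhs;
    std::unique_ptr<Expr> rhs;
};

// Builds the tree for (1 + 2) + 3 and evaluates its root node.
std::uint32_t eval_example() {
    auto tree = std::make_unique<Add>(
        std::make_unique<Add>(std::make_unique<Lit>(1), std::make_unique<Lit>(2)),
        std::make_unique<Lit>(3));
    return tree->eval();
}
```

<p>Evaluating the root walks down to the leaves first and combines their results on the way back up, so <code>eval_example()</code> returns 6.</p>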
<p>In the scope of this post, I will assume the following:</p>
<ul>
<li>There is no type checking during evaluation</li>
<li>There is no memory management</li>
<li>There are no other extraneous checks</li>
<li>There is only one supported type, an unsigned 32-bit integer (<code>uint32_t</code>)</li>
</ul>
<p>Let&rsquo;s imagine what our <code>fib()</code> function would look like if implemented in such an interpreter:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="kt">uint32_t</span> <span class="nf">fib</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">Context</span> <span class="n">ctx</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">Function</span><span class="o">*</span> <span class="n">function</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Function</span><span class="p">();</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">function</span><span class="o">-&gt;</span><span class="n">init</span><span class="p">({</span>
</span></span><span class="line"><span class="cl">        <span class="k">new</span> <span class="n">IfNode</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">            <span class="k">new</span> <span class="n">LessNode</span><span class="p">(</span><span class="k">new</span> <span class="n">ArgNode</span><span class="p">(),</span> <span class="k">new</span> <span class="n">ConstNode</span><span class="p">(</span><span class="mi">2</span><span class="p">)),</span>
</span></span><span class="line"><span class="cl">            <span class="k">new</span> <span class="n">ReturnNode</span><span class="p">(</span><span class="k">new</span> <span class="n">ArgNode</span><span class="p">())),</span>
</span></span><span class="line"><span class="cl">        <span class="k">new</span> <span class="n">ReturnNode</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">            <span class="k">new</span> <span class="n">AddNode</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">                <span class="k">new</span> <span class="n">CallNode</span><span class="p">(</span><span class="n">function</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                    <span class="k">new</span> <span class="n">SubNode</span><span class="p">(</span><span class="k">new</span> <span class="n">ArgNode</span><span class="p">(),</span> <span class="k">new</span> <span class="n">ConstNode</span><span class="p">(</span><span class="mi">1</span><span class="p">))),</span>
</span></span><span class="line"><span class="cl">                <span class="k">new</span> <span class="n">CallNode</span><span class="p">(</span><span class="n">function</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                    <span class="k">new</span> <span class="n">SubNode</span><span class="p">(</span><span class="k">new</span> <span class="n">ArgNode</span><span class="p">(),</span> <span class="k">new</span> <span class="n">ConstNode</span><span class="p">(</span><span class="mi">2</span><span class="p">)))))</span>
</span></span><span class="line"><span class="cl">    <span class="p">});</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">CallNode</span><span class="o">*</span> <span class="n">call</span> <span class="o">=</span> <span class="k">new</span> <span class="n">CallNode</span><span class="p">(</span><span class="n">function</span><span class="p">,</span> <span class="k">new</span> <span class="n">ConstNode</span><span class="p">(</span><span class="n">n</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="n">result</span> <span class="o">=</span> <span class="n">call</span><span class="o">-&gt;</span><span class="n">eval</span><span class="p">(</span><span class="o">&amp;</span><span class="n">ctx</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">delete</span> <span class="n">function</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="k">delete</span> <span class="n">call</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">result</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>This is an exact mapping from our C++ implementation to our imaginary tree-walking interpreter: it implements the same operations our C++ implementation does, and performs them in the same order.</p>
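<p>For comparison, here is the plain function that the tree above encodes. It is reconstructed from the node structure rather than copied from the earlier listing, and <code>fibPlain</code> is a name made up for this sketch. The comments name the nodes that correspond to each statement:</p>

```cpp
#include <cstdint>

// Hypothetical plain-C++ counterpart of the node tree above.
uint32_t fibPlain(uint32_t n) {
    // IfNode(LessNode(ArgNode, ConstNode(2)), ReturnNode(ArgNode))
    if (n < 2) {
        return n;
    }

    // ReturnNode(AddNode(CallNode(function, SubNode(ArgNode, ConstNode(1))),
    //                    CallNode(function, SubNode(ArgNode, ConstNode(2)))))
    return fibPlain(n - 1) + fibPlain(n - 2);
}
```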
<p>Now, let&rsquo;s try to actually make it work.</p>
<h3 id="base-node-and-constants">Base node and constants</h3>
<p>The very first thing we will do is to define the basic shape of a node:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="k">struct</span> <span class="nc">Node</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="k">virtual</span> <span class="o">~</span><span class="n">Node</span><span class="p">()</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">virtual</span> <span class="kt">uint32_t</span> <span class="nf">eval</span><span class="p">(</span><span class="n">Context</span><span class="o">*</span> <span class="n">ctx</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div><p>There&rsquo;s really not that much to it. It&rsquo;s just a class with a single virtual method <code>eval()</code> that accepts a mutable global interpreter state (also known as <em>context</em>) - more on that later.</p>
<p>The <code>ConstNode</code> is not that interesting either - it just evaluates to the value it holds:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="k">struct</span> <span class="nc">ConstNode</span> <span class="o">:</span> <span class="n">Node</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="n">value</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">ConstNode</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">value</span><span class="p">)</span> <span class="o">:</span> <span class="n">value</span><span class="p">(</span><span class="n">value</span><span class="p">)</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="nf">eval</span><span class="p">(</span><span class="n">Context</span><span class="o">*</span> <span class="n">ctx</span><span class="p">)</span> <span class="k">override</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">value</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div><h3 id="arithmetic-comparison-and-conditional-execution">Arithmetic, comparison and conditional execution</h3>
<p>These are a bit more complex, since each of them has two child nodes that must be recursively evaluated before the value of the node itself can be computed, but they are not terribly complicated either:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="k">struct</span> <span class="nc">AddNode</span> <span class="o">:</span> <span class="n">Node</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">Node</span><span class="o">*</span> <span class="n">lhs</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="n">Node</span><span class="o">*</span> <span class="n">rhs</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">AddNode</span><span class="p">(</span><span class="n">Node</span><span class="o">*</span> <span class="n">lhs</span><span class="p">,</span> <span class="n">Node</span><span class="o">*</span> <span class="n">rhs</span><span class="p">)</span> <span class="o">:</span> <span class="n">lhs</span><span class="p">(</span><span class="n">lhs</span><span class="p">),</span> <span class="n">rhs</span><span class="p">(</span><span class="n">rhs</span><span class="p">)</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="o">~</span><span class="n">AddNode</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">delete</span> <span class="n">lhs</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="k">delete</span> <span class="n">rhs</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="nf">eval</span><span class="p">(</span><span class="n">Context</span><span class="o">*</span> <span class="n">ctx</span><span class="p">)</span> <span class="k">override</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">lhs</span><span class="o">-&gt;</span><span class="n">eval</span><span class="p">(</span><span class="n">ctx</span><span class="p">)</span> <span class="o">+</span> <span class="n">rhs</span><span class="o">-&gt;</span><span class="n">eval</span><span class="p">(</span><span class="n">ctx</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">};</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">struct</span> <span class="nc">SubNode</span> <span class="o">:</span> <span class="n">Node</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">Node</span><span class="o">*</span> <span class="n">lhs</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="n">Node</span><span class="o">*</span> <span class="n">rhs</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">SubNode</span><span class="p">(</span><span class="n">Node</span><span class="o">*</span> <span class="n">lhs</span><span class="p">,</span> <span class="n">Node</span><span class="o">*</span> <span class="n">rhs</span><span class="p">)</span> <span class="o">:</span> <span class="n">lhs</span><span class="p">(</span><span class="n">lhs</span><span class="p">),</span> <span class="n">rhs</span><span class="p">(</span><span class="n">rhs</span><span class="p">)</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="o">~</span><span class="n">SubNode</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">delete</span> <span class="n">lhs</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="k">delete</span> <span class="n">rhs</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="nf">eval</span><span class="p">(</span><span class="n">Context</span><span class="o">*</span> <span class="n">ctx</span><span class="p">)</span> <span class="k">override</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">lhs</span><span class="o">-&gt;</span><span class="n">eval</span><span class="p">(</span><span class="n">ctx</span><span class="p">)</span> <span class="o">-</span> <span class="n">rhs</span><span class="o">-&gt;</span><span class="n">eval</span><span class="p">(</span><span class="n">ctx</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">};</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">struct</span> <span class="nc">LessNode</span> <span class="o">:</span> <span class="n">Node</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">Node</span><span class="o">*</span> <span class="n">lhs</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="n">Node</span><span class="o">*</span> <span class="n">rhs</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="o">~</span><span class="n">LessNode</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">delete</span> <span class="n">lhs</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="k">delete</span> <span class="n">rhs</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">LessNode</span><span class="p">(</span><span class="n">Node</span><span class="o">*</span> <span class="n">lhs</span><span class="p">,</span> <span class="n">Node</span><span class="o">*</span> <span class="n">rhs</span><span class="p">)</span> <span class="o">:</span> <span class="n">lhs</span><span class="p">(</span><span class="n">lhs</span><span class="p">),</span> <span class="n">rhs</span><span class="p">(</span><span class="n">rhs</span><span class="p">)</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="nf">eval</span><span class="p">(</span><span class="n">Context</span><span class="o">*</span> <span class="n">ctx</span><span class="p">)</span> <span class="k">override</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">lhs</span><span class="o">-&gt;</span><span class="n">eval</span><span class="p">(</span><span class="n">ctx</span><span class="p">)</span> <span class="o">&lt;</span> <span class="n">rhs</span><span class="o">-&gt;</span><span class="n">eval</span><span class="p">(</span><span class="n">ctx</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div><p>Now that we have a comparison node, it would make sense to implement a conditional execution node to be able to use it:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="k">struct</span> <span class="nc">IfNode</span> <span class="o">:</span> <span class="n">Node</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">Node</span><span class="o">*</span> <span class="n">condition</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="n">Node</span><span class="o">*</span> <span class="n">body</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">IfNode</span><span class="p">(</span><span class="n">Node</span><span class="o">*</span> <span class="n">condition</span><span class="p">,</span> <span class="n">Node</span><span class="o">*</span> <span class="n">body</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="o">:</span> <span class="n">condition</span><span class="p">(</span><span class="n">condition</span><span class="p">),</span> <span class="n">body</span><span class="p">(</span><span class="n">body</span><span class="p">)</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="o">~</span><span class="n">IfNode</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">delete</span> <span class="n">condition</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="k">delete</span> <span class="n">body</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="nf">eval</span><span class="p">(</span><span class="n">Context</span><span class="o">*</span> <span class="n">ctx</span><span class="p">)</span> <span class="k">override</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="n">condition</span><span class="o">-&gt;</span><span class="n">eval</span><span class="p">(</span><span class="n">ctx</span><span class="p">))</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="n">body</span><span class="o">-&gt;</span><span class="n">eval</span><span class="p">(</span><span class="n">ctx</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div><p>I omitted the implementation of the <code>else</code> branch, as our first implementation does not need it, but we will return to it a bit later.</p>
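<p>To see the comparison and conditional nodes working together, here is a stripped-down sketch. It is not the code from this post: the <code>Toy*</code> types are hypothetical, they take no context and do not own their children, and <code>ToyProbe</code> exists only to count how often the body runs. It illustrates two properties of the nodes above: the comparison yields 0 or 1, and the conditional evaluates its body only for a non-zero condition while itself always yielding 0.</p>

```cpp
#include <cstdint>

// Hypothetical, context-free variants of the nodes above (non-owning).
struct ToyNode {
    virtual ~ToyNode() {}
    virtual uint32_t eval() = 0;
};

struct ToyConst : ToyNode {
    uint32_t value;
    explicit ToyConst(uint32_t value) : value(value) {}
    uint32_t eval() override { return value; }
};

// Counts its own evaluations so a caller can observe whether a body ran.
struct ToyProbe : ToyNode {
    uint32_t hits = 0;
    uint32_t eval() override { return ++hits; }
};

struct ToyLess : ToyNode {
    ToyNode* lhs;
    ToyNode* rhs;
    ToyLess(ToyNode* lhs, ToyNode* rhs) : lhs(lhs), rhs(rhs) {}
    // The bool result converts to 0 or 1.
    uint32_t eval() override { return lhs->eval() < rhs->eval(); }
};

struct ToyIf : ToyNode {
    ToyNode* condition;
    ToyNode* body;
    ToyIf(ToyNode* condition, ToyNode* body) : condition(condition), body(body) {}
    uint32_t eval() override {
        if (condition->eval()) {
            body->eval();
        }
        return 0;  // the conditional itself has no meaningful value
    }
};

// Evaluates `if (a < b) probe` once and reports how many times the body ran.
uint32_t bodyRuns(uint32_t a, uint32_t b) {
    ToyConst lhs(a);
    ToyConst rhs(b);
    ToyProbe probe;
    ToyLess less(&lhs, &rhs);
    ToyIf branch(&less, &probe);
    branch.eval();
    return probe.hits;
}
```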
<h3 id="functions-calls-return-statements-and-argument-access">Functions, calls, return statements and argument access</h3>
<p>Next, let&rsquo;s implement function calls and related nodes. This is arguably the most complex part of the entire interpreter, but as you will see, it&rsquo;s actually pretty straightforward.</p>
<p>In our simple interpreter, a function is just a fixed-length list of nodes:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="k">struct</span> <span class="nc">Function</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">Node</span><span class="o">**</span> <span class="n">body</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="n">numNodes</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">Function</span><span class="p">()</span> <span class="o">:</span> <span class="n">body</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span> <span class="n">numNodes</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">void</span> <span class="nf">init</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">initializer_list</span><span class="o">&lt;</span><span class="n">Node</span><span class="o">*&gt;</span> <span class="n">body</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="n">numNodes</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint32_t</span><span class="p">)</span> <span class="n">body</span><span class="p">.</span><span class="n">size</span><span class="p">();</span>
</span></span><span class="line"><span class="cl">        <span class="k">this</span><span class="o">-&gt;</span><span class="n">body</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Node</span> <span class="o">*</span> <span class="p">[</span><span class="n">numNodes</span><span class="p">];</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="kt">uint32_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">for</span> <span class="p">(</span><span class="n">Node</span><span class="o">*</span> <span class="nl">statement</span> <span class="p">:</span> <span class="n">body</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="k">this</span><span class="o">-&gt;</span><span class="n">body</span><span class="p">[</span><span class="n">i</span><span class="o">++</span><span class="p">]</span> <span class="o">=</span> <span class="n">statement</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="o">~</span><span class="n">Function</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">for</span> <span class="p">(</span><span class="kt">uint32_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">numNodes</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="k">delete</span> <span class="n">body</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="k">delete</span><span class="p">[]</span> <span class="n">body</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div><p>As you can see, a function itself is not a node, and as such cannot be evaluated. Instead, to actually evaluate it we will use a different node representing a function <em>call</em>:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="k">struct</span> <span class="nc">CallNode</span> <span class="o">:</span> <span class="n">Node</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">Function</span><span class="o">*</span> <span class="n">function</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="n">Node</span><span class="o">*</span> <span class="n">arg</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">CallNode</span><span class="p">(</span><span class="n">Function</span><span class="o">*</span> <span class="n">function</span><span class="p">,</span> <span class="n">Node</span><span class="o">*</span> <span class="n">arg</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="o">:</span> <span class="n">function</span><span class="p">(</span><span class="n">function</span><span class="p">),</span> <span class="n">arg</span><span class="p">(</span><span class="n">arg</span><span class="p">)</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="o">~</span><span class="n">CallNode</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">delete</span> <span class="n">arg</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="nf">eval</span><span class="p">(</span><span class="n">Context</span><span class="o">*</span> <span class="n">ctx</span><span class="p">)</span> <span class="k">override</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stackTop</span> <span class="o">==</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">kStackSize</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stack</span><span class="p">[</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stackTop</span><span class="p">]</span> <span class="o">=</span> <span class="n">arg</span><span class="o">-&gt;</span><span class="n">eval</span><span class="p">(</span><span class="n">ctx</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stackTop</span> <span class="o">+=</span> <span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">for</span> <span class="p">(</span><span class="kt">uint32_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="n">end_i</span> <span class="o">=</span> <span class="n">function</span><span class="o">-&gt;</span><span class="n">numNodes</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">end_i</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="n">function</span><span class="o">-&gt;</span><span class="n">body</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">eval</span><span class="p">(</span><span class="n">ctx</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">            <span class="k">if</span> <span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stopForReturn</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                <span class="k">break</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stopForReturn</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stackTop</span> <span class="o">-=</span> <span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">returnValue</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div><p>Let&rsquo;s unpack what is happening here.</p>
<p>A <code>CallNode</code> represents a unary function call, that is, a call to a function that takes exactly one argument. This argument is evaluated and pushed onto the stack so that we can access it later when evaluating the function body. Using the stack also allows us to create recursive functions.</p>
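<p>The stack discipline can be tried out in isolation. The sketch below rests on assumptions: <code>ToyContext</code> is hypothetical and mirrors only the <code>stack</code>, <code>stackTop</code> and <code>kStackSize</code> members that <code>CallNode</code> uses, the stack size of 64 is arbitrary, and reading the current argument from the slot just below <code>stackTop</code> is a guess at how argument access could work. It shows why one slot per call is enough for recursion: every nested call gets its own slot, and the caller&rsquo;s argument is still intact when the callee returns.</p>

```cpp
#include <cstdint>

// Hypothetical context holding only the argument stack (size is arbitrary).
struct ToyContext {
    static constexpr uint32_t kStackSize = 64;
    uint32_t stack[kStackSize] = {};
    uint32_t stackTop = 0;
};

// Computes n + (n - 1) + ... + 1 recursively, passing the argument through
// the explicit stack the same way CallNode does.
uint32_t sumDown(ToyContext* ctx, uint32_t arg) {
    // Same overflow guard as in CallNode: give up instead of overflowing.
    if (ctx->stackTop == ToyContext::kStackSize - 1) {
        return 0;
    }

    // Push the argument for this call.
    ctx->stack[ctx->stackTop] = arg;
    ctx->stackTop += 1;

    // "Function body": read the argument of the current call (assumed layout).
    uint32_t n = ctx->stack[ctx->stackTop - 1];
    uint32_t result = 0;
    if (n != 0) {
        result = sumDown(ctx, n - 1);  // the nested call pushes its own slot
        // After the nested call has popped its slot, ours is on top again.
        result += ctx->stack[ctx->stackTop - 1];
    }

    // Pop the argument before returning to the caller.
    ctx->stackTop -= 1;
    return result;
}
```

<p>Calling <code>sumDown(&amp;ctx, 4)</code> on a fresh context returns 10 and leaves <code>stackTop</code> at 0 again.</p>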
<p>To evaluate a function, we just iterate over the nodes of the function&rsquo;s body and evaluate them sequentially. That would be pretty much it, except we also need to support the return statement and the ability to return values from the function.</p>
<p>In order to do that, we need two pieces of state:</p>
<ul>
<li>a flag which indicates that the return statement has been encountered and we need to stop the evaluation and return control to the caller, and</li>
<li>a field that holds the value that we should return.</li>
</ul>
<p>We reset the former before exiting to ensure that subsequent function calls will not return immediately, and we use the latter as the actual result of evaluating the call.</p>
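<p>The interplay of the two fields can be shown without any nodes at all. The following is a sketch under assumptions, not the code from this post: <code>ToyState</code>, <code>ToyStatement</code> and <code>runBody()</code> are hypothetical, and a statement is reduced to either a no-op or a return with a fixed value. It assumes that a return statement sets both the value and the flag, and it counts executed statements to show that nothing after the first return runs:</p>

```cpp
#include <cstdint>
#include <vector>

// Hypothetical state holding just the two fields discussed above.
struct ToyState {
    bool stopForReturn = false;
    uint32_t returnValue = 0;
};

enum class ToyKind { Nop, Return };

// A statement is either a no-op or "return value".
struct ToyStatement {
    ToyKind kind;
    uint32_t value;
};

// Runs the statements in order, stops at the first return, and counts how
// many statements were executed.
uint32_t runBody(ToyState* state, const std::vector<ToyStatement>& body,
                 uint32_t* executed) {
    for (const ToyStatement& statement : body) {
        *executed += 1;
        if (statement.kind == ToyKind::Return) {
            state->returnValue = statement.value;
            state->stopForReturn = true;
        }
        if (state->stopForReturn) {
            break;  // skip the rest of the body
        }
    }

    // Reset the flag so the next call does not stop after its first statement.
    state->stopForReturn = false;
    return state->returnValue;
}
```

<p>Without the reset, a second call that reuses the same state would stop after its first statement, even if that statement were not a return.</p>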
<p>Let&rsquo;s implement the <code>ReturnNode</code>:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="k">struct</span> <span class="nc">ReturnNode</span> <span class="o">:</span> <span class="n">Node</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">Node</span><span class="o">*</span> <span class="n">rhs</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">ReturnNode</span><span class="p">(</span><span class="n">Node</span><span class="o">*</span> <span class="n">rhs</span><span class="p">)</span> <span class="o">:</span> <span class="n">rhs</span><span class="p">(</span><span class="n">rhs</span><span class="p">)</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="o">~</span><span class="n">ReturnNode</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">delete</span> <span class="n">rhs</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="nf">eval</span><span class="p">(</span><span class="n">Context</span><span class="o">*</span> <span class="n">ctx</span><span class="p">)</span> <span class="k">override</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">returnValue</span> <span class="o">=</span> <span class="n">rhs</span><span class="o">-&gt;</span><span class="n">eval</span><span class="p">(</span><span class="n">ctx</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stopForReturn</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="c1">// Since we pass the result in the ctx-&gt;returnValue field,
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>        <span class="c1">// we don&#39;t need to return anything here
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>        <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div><p>It evaluates the right-hand-side node, puts the result into the context, and signals that a return statement has been encountered.</p>
<p>Last, but not least, we need to implement a way to access function arguments:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="k">struct</span> <span class="nc">ArgNode</span> <span class="o">:</span> <span class="n">Node</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">ArgNode</span><span class="p">()</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="nf">eval</span><span class="p">(</span><span class="n">Context</span><span class="o">*</span> <span class="n">ctx</span><span class="p">)</span> <span class="k">override</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stack</span><span class="p">[</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stackTop</span> <span class="o">-</span> <span class="mi">1</span><span class="p">];</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div><p><code>ArgNode</code> just returns the topmost value on the stack. This is enough since we only deal with unary functions; it would be just a tad more complicated if we were to support multiple arguments.</p>
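<p>To give a rough idea of that extra complication, here is a minimal sketch of a multi-argument variant. It is not part of the interpreter in this post: <code>frameBase</code> and <code>IndexedArgNode</code> are hypothetical names, and the simplified <code>Context</code> below assumes that a call records where its arguments start on the stack.</p>

```cpp
#include <cassert>
#include <cstdint>

// Simplified, hypothetical context: only the fields this sketch needs.
struct Context {
    uint32_t stack[16] = {};
    uint32_t stackTop = 0;   // index one past the last pushed value
    uint32_t frameBase = 0;  // assumption: index of the current call's first argument
};

struct Node {
    virtual ~Node() {}
    virtual uint32_t eval(Context* ctx) = 0;
};

// Hypothetical node: reads the argument at a fixed index within the current frame.
struct IndexedArgNode : Node {
    uint32_t index;

    explicit IndexedArgNode(uint32_t index) : index(index) {}

    uint32_t eval(Context* ctx) override {
        return ctx->stack[ctx->frameBase + index];
    }
};

// Simulates the call f(10, 20) and reads back the argument at `index`.
uint32_t readArgOfDemoCall(uint32_t index) {
    Context ctx;
    ctx.frameBase = ctx.stackTop;    // the frame starts at the first argument
    ctx.stack[ctx.stackTop++] = 10;  // argument 0
    ctx.stack[ctx.stackTop++] = 20;  // argument 1

    IndexedArgNode arg(index);
    return arg.eval(&ctx);
}
```

<p>With a single argument, <code>frameBase + 0</code> is the same slot as <code>stackTop - 1</code>, which is why the unary version can skip the extra bookkeeping.</p>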
<h3 id="what-about-the-context">What about the context?</h3>
<p>Now that we know which state we need to be available for all nodes, we can put it all in a simple, neat struct.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="k">struct</span> <span class="nc">Context</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="k">const</span> <span class="kt">uint32_t</span> <span class="n">kStackSize</span> <span class="o">=</span> <span class="mi">4096</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="kt">bool</span> <span class="n">stopForReturn</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="n">returnValue</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span><span class="o">*</span> <span class="n">stack</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="n">stackTop</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">Context</span><span class="p">()</span> 
</span></span><span class="line"><span class="cl">        <span class="o">:</span> <span class="n">stopForReturn</span><span class="p">(</span><span class="nb">false</span><span class="p">),</span> <span class="n">returnValue</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span> <span class="n">stack</span><span class="p">(</span><span class="k">new</span> <span class="kt">uint32_t</span><span class="p">[</span><span class="n">kStackSize</span><span class="p">]),</span> <span class="n">stackTop</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="o">~</span><span class="n">Context</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">delete</span><span class="p">[]</span> <span class="n">stack</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div><h3 id="calling-it-done">Calling it done</h3>
<p>Let&rsquo;s take a look at our function once again:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="kt">uint32_t</span> <span class="nf">fib</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">Context</span> <span class="n">ctx</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">Function</span><span class="o">*</span> <span class="n">function</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Function</span><span class="p">();</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">function</span><span class="o">-&gt;</span><span class="n">init</span><span class="p">({</span>
</span></span><span class="line"><span class="cl">        <span class="k">new</span> <span class="n">IfNode</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">            <span class="k">new</span> <span class="n">LessNode</span><span class="p">(</span><span class="k">new</span> <span class="n">ArgNode</span><span class="p">(),</span> <span class="k">new</span> <span class="n">ConstNode</span><span class="p">(</span><span class="mi">2</span><span class="p">)),</span>
</span></span><span class="line"><span class="cl">            <span class="k">new</span> <span class="n">ReturnNode</span><span class="p">(</span><span class="k">new</span> <span class="n">ArgNode</span><span class="p">())),</span>
</span></span><span class="line"><span class="cl">        <span class="k">new</span> <span class="n">ReturnNode</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">            <span class="k">new</span> <span class="n">AddNode</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">                <span class="k">new</span> <span class="n">CallNode</span><span class="p">(</span><span class="n">function</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                    <span class="k">new</span> <span class="n">SubNode</span><span class="p">(</span><span class="k">new</span> <span class="n">ArgNode</span><span class="p">(),</span> <span class="k">new</span> <span class="n">ConstNode</span><span class="p">(</span><span class="mi">1</span><span class="p">))),</span>
</span></span><span class="line"><span class="cl">                <span class="k">new</span> <span class="n">CallNode</span><span class="p">(</span><span class="n">function</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                    <span class="k">new</span> <span class="n">SubNode</span><span class="p">(</span><span class="k">new</span> <span class="n">ArgNode</span><span class="p">(),</span> <span class="k">new</span> <span class="n">ConstNode</span><span class="p">(</span><span class="mi">2</span><span class="p">)))))</span>
</span></span><span class="line"><span class="cl">    <span class="p">});</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">CallNode</span><span class="o">*</span> <span class="n">call</span> <span class="o">=</span> <span class="k">new</span> <span class="n">CallNode</span><span class="p">(</span><span class="n">function</span><span class="p">,</span> <span class="k">new</span> <span class="n">ConstNode</span><span class="p">(</span><span class="n">n</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="n">result</span> <span class="o">=</span> <span class="n">call</span><span class="o">-&gt;</span><span class="n">eval</span><span class="p">(</span><span class="o">&amp;</span><span class="n">ctx</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">delete</span> <span class="n">function</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="k">delete</span> <span class="n">call</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">result</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>Now that we know how exactly each node is implemented, it all starts to make sense. There is only one thing to point out: the instantiation and initialization of the function are split into two parts in order for us to be able to refer to the function from inside of its body.</p>
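<p>The ordering constraint can be shown in isolation. The sketch below uses hypothetical, stripped-down stand-ins (a <code>Function</code> that only stores its body, and a made-up <code>CountNode</code> that takes its argument directly instead of reading it from a context), not the types defined earlier in this post.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Simplified node interface: the argument is passed directly for brevity.
struct Node {
    virtual ~Node() {}
    virtual uint32_t eval(uint32_t n) = 0;
};

// Hypothetical stand-in: the body is attached after construction.
struct Function {
    std::vector<Node*> body;

    void init(std::initializer_list<Node*> nodes) { body.assign(nodes); }

    ~Function() {
        for (Node* node : body) delete node;
    }
};

// Made-up recursive node: count(n) = 0 if n == 0, else 1 + count(n - 1).
struct CountNode : Node {
    Function* self;  // non-owning pointer back to the enclosing function

    explicit CountNode(Function* self) : self(self) {}

    uint32_t eval(uint32_t n) override {
        if (n == 0) return 0;
        return 1 + self->body[0]->eval(n - 1);
    }
};

uint32_t countViaRecursion(uint32_t n) {
    Function* function = new Function();          // step 1: the pointer now exists
    function->init({ new CountNode(function) });  // step 2: the body can refer to it

    uint32_t result = function->body[0]->eval(n);
    delete function;
    return result;
}
```

<p>If the body had to be passed to the constructor, there would be no <code>function</code> pointer yet to hand to the node that calls it.</p>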
<p><strong>Congratulations!</strong> We have just implemented the simplest possible interpreter that can actually compute the value of the Fibonacci function.</p>
<p>It is more or less the same as a tree-walking interpreter straight from a CS course or one of the &ldquo;building a Lisp&rdquo; books:</p>
<ul>
<li>We have some kind of a data structure representing an AST node</li>
<li>We have polymorphic dispatch that allows us to evaluate different kinds of nodes in different ways</li>
<li>We can recursively evaluate an AST to compute a single value, the result of evaluation</li>
<li>Our implementation is powerful enough to evaluate recursive functions</li>
</ul>
<p>Now, let&rsquo;s see how it performs:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">&gt; cl_build.bat /DSIMPLEST oif.cpp
</span></span><span class="line"><span class="cl">&gt; hyperfine oif.exe
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Benchmark 1: oif.exe
</span></span><span class="line"><span class="cl">  Time <span class="o">(</span>mean ± σ<span class="o">)</span>:     13.570 s ±  1.474 s    <span class="o">[</span>User: 10.661 s, System: 0.025 s<span class="o">]</span>
</span></span><span class="line"><span class="cl">  Range <span class="o">(</span>min … max<span class="o">)</span>:   11.847 s … 16.103 s    <span class="m">10</span> runs</span></span></code></pre></div><p>&hellip;Huh. It&rsquo;s an <em>order of magnitude</em> slower than the baseline implementation. As expected, even the simplest tree-walking interpreter is quite slow compared to equivalent C++ code.</p>
<p>But don&rsquo;t worry, it will get better.</p>
<h2 id="basic-optimization">Basic optimization</h2>
<p>While this code looks like an absolutely minimal implementation, there are still ways to cut down the work it does at run time. One way to do this would be to merge, or <em>fuse</em>, our small nodes into larger and more specialised nodes to reduce overhead introduced by virtual function calls, among other things.</p>
<p>Good candidates to fuse would be nodes that take constant arguments. In our case, that would be <code>LessNode</code> and <code>SubNode</code>. Let&rsquo;s implement fused versions:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="k">struct</span> <span class="nc">LessConstNode</span> <span class="o">:</span> <span class="n">Node</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">Node</span><span class="o">*</span> <span class="n">lhs</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="n">constant</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">LessConstNode</span><span class="p">(</span><span class="n">Node</span><span class="o">*</span> <span class="n">lhs</span><span class="p">,</span> <span class="kt">uint32_t</span> <span class="n">constant</span><span class="p">)</span> 
</span></span><span class="line"><span class="cl">        <span class="o">:</span> <span class="n">lhs</span><span class="p">(</span><span class="n">lhs</span><span class="p">),</span> <span class="n">constant</span><span class="p">(</span><span class="n">constant</span><span class="p">)</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="o">~</span><span class="n">LessConstNode</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">delete</span> <span class="n">lhs</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="nf">eval</span><span class="p">(</span><span class="n">Context</span><span class="o">*</span> <span class="n">ctx</span><span class="p">)</span> <span class="k">override</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">lhs</span><span class="o">-&gt;</span><span class="n">eval</span><span class="p">(</span><span class="n">ctx</span><span class="p">)</span> <span class="o">&lt;</span> <span class="n">constant</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">};</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">struct</span> <span class="nc">SubConstNode</span> <span class="o">:</span> <span class="n">Node</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">Node</span><span class="o">*</span> <span class="n">lhs</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="n">constant</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">SubConstNode</span><span class="p">(</span><span class="n">Node</span><span class="o">*</span> <span class="n">lhs</span><span class="p">,</span> <span class="kt">uint32_t</span> <span class="n">constant</span><span class="p">)</span> 
</span></span><span class="line"><span class="cl">        <span class="o">:</span> <span class="n">lhs</span><span class="p">(</span><span class="n">lhs</span><span class="p">),</span> <span class="n">constant</span><span class="p">(</span><span class="n">constant</span><span class="p">)</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="o">~</span><span class="n">SubConstNode</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">delete</span> <span class="n">lhs</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="nf">eval</span><span class="p">(</span><span class="n">Context</span><span class="o">*</span> <span class="n">ctx</span><span class="p">)</span> <span class="k">override</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">lhs</span><span class="o">-&gt;</span><span class="n">eval</span><span class="p">(</span><span class="n">ctx</span><span class="p">)</span> <span class="o">-</span> <span class="n">constant</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div><p>And update our function to use these nodes:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="kt">uint32_t</span> <span class="nf">fib</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">Context</span> <span class="n">ctx</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">Function</span><span class="o">*</span> <span class="n">function</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Function</span><span class="p">();</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">function</span><span class="o">-&gt;</span><span class="n">init</span><span class="p">({</span>
</span></span><span class="line"><span class="cl">        <span class="k">new</span> <span class="n">IfNode</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">            <span class="k">new</span> <span class="n">LessConstNode</span><span class="p">(</span><span class="k">new</span> <span class="n">ArgNode</span><span class="p">(),</span> <span class="mi">2</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">            <span class="k">new</span> <span class="n">ReturnNode</span><span class="p">(</span><span class="k">new</span> <span class="n">ArgNode</span><span class="p">())),</span>
</span></span><span class="line"><span class="cl">        <span class="k">new</span> <span class="n">ReturnNode</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">            <span class="k">new</span> <span class="n">AddNode</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">                <span class="k">new</span> <span class="n">CallNode</span><span class="p">(</span><span class="n">function</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                    <span class="k">new</span> <span class="n">SubConstNode</span><span class="p">(</span><span class="k">new</span> <span class="n">ArgNode</span><span class="p">(),</span> <span class="mi">1</span><span class="p">)),</span>
</span></span><span class="line"><span class="cl">                <span class="k">new</span> <span class="n">CallNode</span><span class="p">(</span><span class="n">function</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                    <span class="k">new</span> <span class="n">SubConstNode</span><span class="p">(</span><span class="k">new</span> <span class="n">ArgNode</span><span class="p">(),</span> <span class="mi">2</span><span class="p">))))</span>
</span></span><span class="line"><span class="cl">        <span class="p">});</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">CallNode</span><span class="o">*</span> <span class="n">call</span> <span class="o">=</span> <span class="k">new</span> <span class="n">CallNode</span><span class="p">(</span><span class="n">function</span><span class="p">,</span> <span class="k">new</span> <span class="n">ConstNode</span><span class="p">(</span><span class="n">n</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="n">result</span> <span class="o">=</span> <span class="n">call</span><span class="o">-&gt;</span><span class="n">eval</span><span class="p">(</span><span class="o">&amp;</span><span class="n">ctx</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">delete</span> <span class="n">function</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="k">delete</span> <span class="n">call</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">result</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>That&rsquo;s it, on to benchmarking:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">&gt; cl_build.bat /DSIMPLE_FUSION oif.cpp
</span></span><span class="line"><span class="cl">&gt; hyperfine oif.exe
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Benchmark 1: oif.exe
</span></span><span class="line"><span class="cl">  Time <span class="o">(</span>mean ± σ<span class="o">)</span>:      7.754 s ±  0.052 s    <span class="o">[</span>User: 5.189 s, System: 0.009 s<span class="o">]</span>
</span></span><span class="line"><span class="cl">  Range <span class="o">(</span>min … max<span class="o">)</span>:    7.703 s …  7.874 s    <span class="m">10</span> runs</span></span></code></pre></div><p>Implementing this simple optimization makes our interpreter roughly <strong>1.75 times faster</strong> than our initial implementation (mean time down from 13.570 s to 7.754 s).</p>
<h2 id="going-deeper">Going deeper</h2>
<p>Continuing this line of thought, let&rsquo;s take it further. What else can we merge or remove?</p>
<p>We can introduce new, even more specialised nodes - not just the <em>&ldquo;less-than-constant&rdquo;</em> node, but the <em>&ldquo;argument-less-than-constant&rdquo;</em> node, and a similar one for subtraction:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="k">struct</span> <span class="nc">LessArgConstNode</span> <span class="o">:</span> <span class="n">Node</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="n">constant</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">LessArgConstNode</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">constant</span><span class="p">)</span> <span class="o">:</span> <span class="n">constant</span><span class="p">(</span><span class="n">constant</span><span class="p">)</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="nf">eval</span><span class="p">(</span><span class="n">Context</span><span class="o">*</span> <span class="n">ctx</span><span class="p">)</span> <span class="k">override</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stackTop</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stack</span><span class="p">[</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stackTop</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span> <span class="o">&lt;</span> <span class="n">constant</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">};</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">struct</span> <span class="nc">SubArgConstNode</span> <span class="o">:</span> <span class="n">Node</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="n">constant</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">SubArgConstNode</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">constant</span><span class="p">)</span> <span class="o">:</span> <span class="n">constant</span><span class="p">(</span><span class="n">constant</span><span class="p">)</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="nf">eval</span><span class="p">(</span><span class="n">Context</span><span class="o">*</span> <span class="n">ctx</span><span class="p">)</span> <span class="k">override</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stackTop</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stack</span><span class="p">[</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stackTop</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span> <span class="o">-</span> <span class="n">constant</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div><p>And this is where we should stop and reconsider what we are doing.</p>
<p>As you can see, we copied the entire implementation of the <code>ArgNode</code> into these new node types, just like we did before for <code>ConstNode</code>. It may look innocent at this point; however, this is quite error-prone and will quickly become tedious as more fused node types are introduced.</p>
<p>Actually, there is a way to remove duplication. We will introduce a new method, <code>compute(Context* ctx)</code>, and make sure it is always inlined. We will also express our <code>eval(Context* ctx)</code> method in terms of <code>compute(Context* ctx)</code>. This way we will always have a single implementation for a single node type, which can then be embedded in any other node.</p>
<p>Strictly speaking, we don&rsquo;t want to do this for all nodes (and it barely makes sense to do this for <code>ConstNode</code>), but I will still do it this way for the sake of consistency.</p>
<p>One more thing to mention before we proceed is that <code>compute(Context* ctx)</code> is non-virtual. This means that we can only use it when we know the exact type of the node on which we call it. In the case of more generic nodes, we should still use <code>eval(Context* ctx)</code>.</p>
<p>This is what updated versions of the nodes look like:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="cp">#if defined(__clang__)
</span></span></span><span class="line"><span class="cl"><span class="cp">#define FORCEINLINE __attribute__((always_inline))
</span></span></span><span class="line"><span class="cl"><span class="cp">#elif defined(_MSC_VER) </span><span class="c1">// clang defines _MSC_VER on Windows for some reason
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="cp">#define FORCEINLINE __forceinline 
</span></span></span><span class="line"><span class="cl"><span class="cp">#endif
</span></span></span><span class="line"><span class="cl"><span class="cp"></span>
</span></span><span class="line"><span class="cl"><span class="k">struct</span> <span class="nc">ConstNode</span> <span class="o">:</span> <span class="n">Node</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="n">value</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">ConstNode</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">value</span><span class="p">)</span> <span class="o">:</span> <span class="n">value</span><span class="p">(</span><span class="n">value</span><span class="p">)</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="nf">eval</span><span class="p">(</span><span class="n">Context</span><span class="o">*</span> <span class="n">ctx</span><span class="p">)</span> <span class="k">override</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">compute</span><span class="p">(</span><span class="n">ctx</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="n">FORCEINLINE</span> <span class="nf">compute</span><span class="p">(</span><span class="n">Context</span><span class="o">*</span> <span class="n">ctx</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">value</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">};</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">struct</span> <span class="nc">ArgNode</span> <span class="o">:</span> <span class="n">Node</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">ArgNode</span><span class="p">()</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="nf">eval</span><span class="p">(</span><span class="n">Context</span><span class="o">*</span> <span class="n">ctx</span><span class="p">)</span> <span class="k">override</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">compute</span><span class="p">(</span><span class="n">ctx</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="n">FORCEINLINE</span> <span class="nf">compute</span><span class="p">(</span><span class="n">Context</span><span class="o">*</span> <span class="n">ctx</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stackTop</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stack</span><span class="p">[</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stackTop</span> <span class="o">-</span> <span class="mi">1</span><span class="p">];</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">};</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">struct</span> <span class="nc">LessArgConstNode</span> <span class="o">:</span> <span class="n">Node</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">ArgNode</span><span class="o">*</span> <span class="n">lhs</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="n">ConstNode</span><span class="o">*</span> <span class="n">rhs</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">LessArgConstNode</span><span class="p">(</span><span class="n">ArgNode</span><span class="o">*</span> <span class="n">lhs</span><span class="p">,</span> <span class="n">ConstNode</span><span class="o">*</span> <span class="n">rhs</span><span class="p">)</span> <span class="o">:</span> <span class="n">lhs</span><span class="p">(</span><span class="n">lhs</span><span class="p">),</span> <span class="n">rhs</span><span class="p">(</span><span class="n">rhs</span><span class="p">)</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="nf">eval</span><span class="p">(</span><span class="n">Context</span><span class="o">*</span> <span class="n">ctx</span><span class="p">)</span> <span class="k">override</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">compute</span><span class="p">(</span><span class="n">ctx</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="n">FORCEINLINE</span> <span class="nf">compute</span><span class="p">(</span><span class="n">Context</span><span class="o">*</span> <span class="n">ctx</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">lhs</span><span class="o">-&gt;</span><span class="n">compute</span><span class="p">(</span><span class="n">ctx</span><span class="p">)</span> <span class="o">&lt;</span> <span class="n">rhs</span><span class="o">-&gt;</span><span class="n">compute</span><span class="p">(</span><span class="n">ctx</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">};</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">struct</span> <span class="nc">SubArgConstNode</span> <span class="o">:</span> <span class="n">Node</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">ArgNode</span><span class="o">*</span> <span class="n">lhs</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="n">ConstNode</span><span class="o">*</span> <span class="n">rhs</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">SubArgConstNode</span><span class="p">(</span><span class="n">ArgNode</span><span class="o">*</span> <span class="n">lhs</span><span class="p">,</span> <span class="n">ConstNode</span><span class="o">*</span> <span class="n">rhs</span><span class="p">)</span> <span class="o">:</span> <span class="n">lhs</span><span class="p">(</span><span class="n">lhs</span><span class="p">),</span> <span class="n">rhs</span><span class="p">(</span><span class="n">rhs</span><span class="p">)</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="nf">eval</span><span class="p">(</span><span class="n">Context</span><span class="o">*</span> <span class="n">ctx</span><span class="p">)</span> <span class="k">override</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">compute</span><span class="p">(</span><span class="n">ctx</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="n">FORCEINLINE</span> <span class="nf">compute</span><span class="p">(</span><span class="n">Context</span><span class="o">*</span> <span class="n">ctx</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">lhs</span><span class="o">-&gt;</span><span class="n">compute</span><span class="p">(</span><span class="n">ctx</span><span class="p">)</span> <span class="o">-</span> <span class="n">rhs</span><span class="o">-&gt;</span><span class="n">compute</span><span class="p">(</span><span class="n">ctx</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div><p>With these our Fibonacci function becomes:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="kt">uint32_t</span> <span class="nf">fib</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">Context</span> <span class="n">ctx</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">Function</span><span class="o">*</span> <span class="n">function</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Function</span><span class="p">();</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">function</span><span class="o">-&gt;</span><span class="n">init</span><span class="p">({</span>
</span></span><span class="line"><span class="cl">        <span class="k">new</span> <span class="n">IfNode</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">            <span class="k">new</span> <span class="n">LessArgConstNode</span><span class="p">(</span><span class="k">new</span> <span class="n">ArgNode</span><span class="p">(),</span> <span class="k">new</span> <span class="n">ConstNode</span><span class="p">(</span><span class="mi">2</span><span class="p">)),</span>
</span></span><span class="line"><span class="cl">            <span class="k">new</span> <span class="n">ReturnNode</span><span class="p">(</span><span class="k">new</span> <span class="n">ArgNode</span><span class="p">())),</span>
</span></span><span class="line"><span class="cl">        <span class="k">new</span> <span class="n">ReturnNode</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">            <span class="k">new</span> <span class="n">AddNode</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">                <span class="k">new</span> <span class="n">CallNode</span><span class="p">(</span><span class="n">function</span><span class="p">,</span> <span class="k">new</span> <span class="n">SubArgConstNode</span><span class="p">(</span><span class="k">new</span> <span class="n">ArgNode</span><span class="p">(),</span> <span class="k">new</span> <span class="n">ConstNode</span><span class="p">(</span><span class="mi">1</span><span class="p">))),</span>
</span></span><span class="line"><span class="cl">                <span class="k">new</span> <span class="n">CallNode</span><span class="p">(</span><span class="n">function</span><span class="p">,</span> <span class="k">new</span> <span class="n">SubArgConstNode</span><span class="p">(</span><span class="k">new</span> <span class="n">ArgNode</span><span class="p">(),</span> <span class="k">new</span> <span class="n">ConstNode</span><span class="p">(</span><span class="mi">2</span><span class="p">)))))</span>
</span></span><span class="line"><span class="cl">        <span class="p">});</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">CallNode</span><span class="o">*</span> <span class="n">call</span> <span class="o">=</span> <span class="k">new</span> <span class="n">CallNode</span><span class="p">(</span><span class="n">function</span><span class="p">,</span> <span class="k">new</span> <span class="n">ConstNode</span><span class="p">(</span><span class="n">n</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="n">result</span> <span class="o">=</span> <span class="n">call</span><span class="o">-&gt;</span><span class="n">eval</span><span class="p">(</span><span class="o">&amp;</span><span class="n">ctx</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">delete</span> <span class="n">function</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="k">delete</span> <span class="n">call</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">result</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>What&rsquo;s great about this is that not only did we get rid of the code duplication and associated pitfalls, but we also gained the ability to easily generate new fused node types (for example, with macros) and we paid no additional performance cost for this.</p>
<p>And we also gained a bit of performance:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">&gt; cl_build.bat /DBETTER_FUSION oif.cpp
</span></span><span class="line"><span class="cl">&gt; hyperfine oif.exe
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Benchmark 1: oif.exe
</span></span><span class="line"><span class="cl">  Time <span class="o">(</span>mean ± σ<span class="o">)</span>:      6.413 s ±  0.024 s    <span class="o">[</span>User: 4.452 s, System: 0.008 s<span class="o">]</span>
</span></span><span class="line"><span class="cl">  Range <span class="o">(</span>min … max<span class="o">)</span>:    6.364 s …  6.452 s    <span class="m">10</span> runs</span></span></code></pre></div><p>This shaves off another second and a half, nice. But we can still do better.</p>
<h2 id="simplifying-calls">Simplifying calls</h2>
<p>Let&rsquo;s take a look at our Fibonacci function again and figure out what else we can do.</p>
<p>Going from the top of the function, the first thing we see is a <code>Function</code> wrapper. We only need it to evaluate several nodes sequentially and, more importantly, to refer to the function from its own body in order to perform recursive calls. However, all the evaluation logic belongs to <code>CallNode</code>, meaning that we can probably do without this wrapper.</p>
<p>In order to remove it, we will need to do the following:</p>
<ul>
<li>Update <code>CallNode</code> so that it accepts any <code>Node</code> as the callable instead of a <code>Function</code> pointer</li>
<li>Add support for <code>else</code> branches to <code>IfNode</code></li>
</ul>
<p>While we are at it, we can also remove one <code>ReturnNode</code> instance by allowing the <code>IfNode</code> to return the evaluation result of whichever branch it takes.</p>
<p>I will implement new versions of the nodes as separate classes so that we will be able to see the old and new implementations side-by-side:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="k">struct</span> <span class="nc">CallAnyNode</span> <span class="o">:</span> <span class="n">Node</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">Node</span><span class="o">*</span> <span class="n">function</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="n">Node</span><span class="o">*</span> <span class="n">arg</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">CallAnyNode</span><span class="p">(</span><span class="n">Node</span><span class="o">*</span> <span class="n">function</span><span class="p">,</span> <span class="n">Node</span><span class="o">*</span> <span class="n">arg</span><span class="p">)</span> 
</span></span><span class="line"><span class="cl">        <span class="o">:</span> <span class="n">function</span><span class="p">(</span><span class="n">function</span><span class="p">),</span> <span class="n">arg</span><span class="p">(</span><span class="n">arg</span><span class="p">)</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="o">~</span><span class="n">CallAnyNode</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">delete</span> <span class="n">arg</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="nf">eval</span><span class="p">(</span><span class="n">Context</span><span class="o">*</span> <span class="n">ctx</span><span class="p">)</span> <span class="k">override</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stackTop</span> <span class="o">==</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">kStackSize</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stack</span><span class="p">[</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stackTop</span><span class="p">]</span> <span class="o">=</span> <span class="n">arg</span><span class="o">-&gt;</span><span class="n">eval</span><span class="p">(</span><span class="n">ctx</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stackTop</span> <span class="o">+=</span> <span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="kt">uint32_t</span> <span class="n">result</span> <span class="o">=</span> <span class="n">function</span><span class="o">-&gt;</span><span class="n">eval</span><span class="p">(</span><span class="n">ctx</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stopForReturn</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">stackTop</span> <span class="o">-=</span> <span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">result</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">};</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">struct</span> <span class="nc">IfElseNode</span> <span class="o">:</span> <span class="n">Node</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">Node</span><span class="o">*</span> <span class="n">condition</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="n">Node</span><span class="o">*</span> <span class="n">ifBody</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="n">Node</span><span class="o">*</span> <span class="n">elseBody</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">IfElseNode</span><span class="p">(</span><span class="n">Node</span><span class="o">*</span> <span class="n">condition</span><span class="p">,</span> <span class="n">Node</span><span class="o">*</span> <span class="n">ifBody</span><span class="p">,</span> <span class="n">Node</span><span class="o">*</span> <span class="n">elseBody</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="o">:</span> <span class="n">condition</span><span class="p">(</span><span class="n">condition</span><span class="p">),</span> <span class="n">ifBody</span><span class="p">(</span><span class="n">ifBody</span><span class="p">),</span> <span class="n">elseBody</span><span class="p">(</span><span class="n">elseBody</span><span class="p">)</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="o">~</span><span class="n">IfElseNode</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">delete</span> <span class="n">condition</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="k">delete</span> <span class="n">ifBody</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">        <span class="k">delete</span> <span class="n">elseBody</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="nf">eval</span><span class="p">(</span><span class="n">Context</span><span class="o">*</span> <span class="n">ctx</span><span class="p">)</span> <span class="k">override</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="p">(</span><span class="n">condition</span><span class="o">-&gt;</span><span class="n">eval</span><span class="p">(</span><span class="n">ctx</span><span class="p">))</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="n">ifBody</span><span class="o">-&gt;</span><span class="n">eval</span><span class="p">(</span><span class="n">ctx</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="n">elseBody</span><span class="o">-&gt;</span><span class="n">eval</span><span class="p">(</span><span class="n">ctx</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div><p>With these, our function now looks like this:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="kt">uint32_t</span> <span class="nf">fib</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="k">using</span> <span class="n">BetterFusion</span><span class="o">::</span><span class="n">ArgNode</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="k">using</span> <span class="n">BetterFusion</span><span class="o">::</span><span class="n">ConstNode</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="n">Context</span> <span class="n">ctx</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">IfElseNode</span> <span class="n">function</span><span class="p">(</span><span class="k">nullptr</span><span class="p">,</span> <span class="k">nullptr</span><span class="p">,</span> <span class="k">nullptr</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">function</span><span class="p">.</span><span class="n">condition</span> <span class="o">=</span> <span class="k">new</span> <span class="n">LessArgConstNode</span><span class="p">(</span><span class="k">new</span> <span class="n">ArgNode</span><span class="p">(),</span> <span class="k">new</span> <span class="n">ConstNode</span><span class="p">(</span><span class="mi">2</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">    <span class="n">function</span><span class="p">.</span><span class="n">ifBody</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ArgNode</span><span class="p">();</span>
</span></span><span class="line"><span class="cl">    <span class="n">function</span><span class="p">.</span><span class="n">elseBody</span> <span class="o">=</span> <span class="k">new</span> <span class="n">AddNode</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="k">new</span> <span class="n">CallAnyNode</span><span class="p">(</span><span class="o">&amp;</span><span class="n">function</span><span class="p">,</span> <span class="k">new</span> <span class="n">SubArgConstNode</span><span class="p">(</span><span class="k">new</span> <span class="n">ArgNode</span><span class="p">(),</span> <span class="k">new</span> <span class="n">ConstNode</span><span class="p">(</span><span class="mi">1</span><span class="p">))),</span>
</span></span><span class="line"><span class="cl">        <span class="k">new</span> <span class="n">CallAnyNode</span><span class="p">(</span><span class="o">&amp;</span><span class="n">function</span><span class="p">,</span> <span class="k">new</span> <span class="n">SubArgConstNode</span><span class="p">(</span><span class="k">new</span> <span class="n">ArgNode</span><span class="p">(),</span> <span class="k">new</span> <span class="n">ConstNode</span><span class="p">(</span><span class="mi">2</span><span class="p">))));</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">CallAnyNode</span> <span class="n">call</span><span class="p">(</span><span class="o">&amp;</span><span class="n">function</span><span class="p">,</span> <span class="k">new</span> <span class="n">ConstNode</span><span class="p">(</span><span class="n">n</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kt">uint32_t</span> <span class="n">result</span> <span class="o">=</span> <span class="n">call</span><span class="p">.</span><span class="n">eval</span><span class="p">(</span><span class="o">&amp;</span><span class="n">ctx</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">result</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>While our first interpreter was a one-to-one mapping from the C++ source code, this one resembles the actual <em>compiled</em> code, or at least what it probably looks like at one of the compilation stages. This should give you an intuition for what is actually happening here: we are manually doing the job of an optimising compiler and observing the effects of its optimisations first-hand.</p>
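<p>To make the correspondence concrete, here is a plain C++ function that the tree above mirrors node for node. It is written for comparison only: the name <code>fibNative</code> is hypothetical, and the baseline measured in this post may be written differently.</p>

```cpp
#include <cstdint>

// Each construct maps onto one node of the tree built above:
//   n < 2                  -> LessArgConstNode(ArgNode, ConstNode(2))
//   cond ? a : b           -> IfElseNode(condition, ifBody, elseBody)
//   n - 1 and n - 2        -> SubArgConstNode(ArgNode, ConstNode(1 or 2))
//   fibNative(...)         -> CallAnyNode(&function, ...)
//   lhs + rhs              -> AddNode
uint32_t fibNative(uint32_t n) {
    return n < 2 ? n : fibNative(n - 1) + fibNative(n - 2);
}
```

<p>There is no explicit return statement and no function wrapper left in the tree, just as there is none in this single expression.</p>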
<p>What about the timings?</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">&gt; cl_build.bat /DSIMPLIFY_CALLS oif.cpp
</span></span><span class="line"><span class="cl">&gt; hyperfine oif.exe
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Benchmark 1: oif.exe
</span></span><span class="line"><span class="cl">  Time <span class="o">(</span>mean ± σ<span class="o">)</span>:      4.528 s ±  0.014 s    <span class="o">[</span>User: 3.470 s, System: 0.006 s<span class="o">]</span>
</span></span><span class="line"><span class="cl">  Range <span class="o">(</span>min … max<span class="o">)</span>:    4.498 s …  4.551 s    <span class="m">10</span> runs</span></span></code></pre></div><p>And another two seconds gone.</p>
<p>This will be the last version of the interpreter within the scope of this post. We went from 13.5s to 4.5s - the final version is <strong>3 times faster</strong> than our initial implementation and <em>only</em> 5.64 times slower than the baseline C++ version.</p>
<p>To put things in perspective, let&rsquo;s add some <em>real</em> interpreter results:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">&gt; node --version
</span></span><span class="line"><span class="cl">v18.18.0
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">&gt; hyperfine <span class="s2">&#34;node fib.js&#34;</span>
</span></span><span class="line"><span class="cl">Benchmark 1: node fib.js
</span></span><span class="line"><span class="cl">  Time <span class="o">(</span>mean ± σ<span class="o">)</span>:      1.746 s ±  0.050 s    <span class="o">[</span>User: 0.823 s, System: 0.007 s<span class="o">]</span>
</span></span><span class="line"><span class="cl">  Range <span class="o">(</span>min … max<span class="o">)</span>:    1.694 s …  1.854 s    <span class="m">10</span> runs
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">&gt; python --version
</span></span><span class="line"><span class="cl">Python 3.10.4
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">&gt; hyperfine <span class="s2">&#34;python fib.py&#34;</span>
</span></span><span class="line"><span class="cl">Benchmark 1: python fib.py
</span></span><span class="line"><span class="cl">  Time <span class="o">(</span>mean ± σ<span class="o">)</span>:     35.415 s ±  0.697 s    <span class="o">[</span>User: 18.959 s, System: 0.020 s<span class="o">]</span>
</span></span><span class="line"><span class="cl">  Range <span class="o">(</span>min … max<span class="o">)</span>:   34.892 s … 37.312 s    <span class="m">10</span> runs
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">&gt; ruby --version
</span></span><span class="line"><span class="cl">ruby 3.2.2 <span class="o">(</span>2023-03-30 revision e51014f9c0<span class="o">)</span> <span class="o">[</span>x64-mingw-ucrt<span class="o">]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">&gt; hyperfine <span class="s2">&#34;ruby fib.rb&#34;</span>
</span></span><span class="line"><span class="cl">Benchmark 1: ruby fib.rb
</span></span><span class="line"><span class="cl">  Time <span class="o">(</span>mean ± σ<span class="o">)</span>:     12.796 s ±  0.201 s    <span class="o">[</span>User: 6.513 s, System: 0.021 s<span class="o">]</span>
</span></span><span class="line"><span class="cl">  Range <span class="o">(</span>min … max<span class="o">)</span>:   12.637 s … 13.317 s    <span class="m">10</span> runs
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">&gt; luajit -v
</span></span><span class="line"><span class="cl">LuaJIT 2.0.4 -- Copyright <span class="o">(</span>C<span class="o">)</span> 2005-2015 Mike Pall. http://luajit.org/
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">&gt; hyperfine <span class="s2">&#34;luajit fib.lua&#34;</span>
</span></span><span class="line"><span class="cl">Benchmark 1: luajit fib.lua
</span></span><span class="line"><span class="cl">  Time <span class="o">(</span>mean ± σ<span class="o">)</span>:      1.097 s ±  0.011 s    <span class="o">[</span>User: 0.713 s, System: 0.001 s<span class="o">]</span>
</span></span><span class="line"><span class="cl">  Range <span class="o">(</span>min … max<span class="o">)</span>:    1.082 s …  1.110 s    <span class="m">10</span> runs
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">&gt; hyperfine <span class="s2">&#34;luajit -joff fib.lua&#34;</span>
</span></span><span class="line"><span class="cl">Benchmark 1: luajit -joff fib.lua
</span></span><span class="line"><span class="cl">  Time <span class="o">(</span>mean ± σ<span class="o">)</span>:      7.966 s ±  0.029 s    <span class="o">[</span>User: 4.146 s, System: 0.002 s<span class="o">]</span>
</span></span><span class="line"><span class="cl">  Range <span class="o">(</span>min … max<span class="o">)</span>:    7.915 s …  8.005 s    <span class="m">10</span> runs</span></span></code></pre></div><p>Some thoughts on these results:</p>
<ul>
<li>Both JITs leave everything else in the dust, with LuaJIT 2 being about 1.6x faster than V8</li>
<li>As expected, <em>real</em> interpreters are much, much slower than our toy interpreter</li>
<li>But I did not expect Ruby to be almost 3 times faster than Python!</li>
<li>As a pleasant surprise, our toy interpreter manages to hold its own against LuaJIT in interpreter mode</li>
</ul>
<h2 id="what-about-clang">What about clang?</h2>
<p>Honestly, I rarely use clang, purely out of habit, so imagine my surprise when I saw this:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">&gt; clang++ --version
</span></span><span class="line"><span class="cl">clang version 17.0.6
</span></span><span class="line"><span class="cl">Target: x86_64-pc-windows-msvc
</span></span><span class="line"><span class="cl">Thread model: posix
</span></span><span class="line"><span class="cl">InstalledDir: D:<span class="se">\T</span>ools<span class="se">\L</span>LVM<span class="se">\b</span>in
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">&gt; clang++ -O3 -ffast-math -DBASELINE oif.cpp -o oif-clang.exe
</span></span><span class="line"><span class="cl">&gt; hyperfine oif-clang.exe
</span></span><span class="line"><span class="cl">Benchmark 1: oif-clang.exe
</span></span><span class="line"><span class="cl">  Time <span class="o">(</span>mean ± σ<span class="o">)</span>:     503.4 ms ±   6.8 ms    <span class="o">[</span>User: 315.3 ms, System: 3.0 ms<span class="o">]</span>
</span></span><span class="line"><span class="cl">  Range <span class="o">(</span>min … max<span class="o">)</span>:   494.9 ms … 514.0 ms    <span class="m">10</span> runs
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">&gt; clang++ -Ofast -ffast-math -DSIMPLEST oif.cpp -o oif-clang.exe
</span></span><span class="line"><span class="cl">&gt; hyperfine oif-clang.exe
</span></span><span class="line"><span class="cl">Benchmark 1: oif-clang.exe
</span></span><span class="line"><span class="cl">  Time <span class="o">(</span>mean ± σ<span class="o">)</span>:      8.604 s ±  0.072 s    <span class="o">[</span>User: 4.393 s, System: 0.003 s<span class="o">]</span>
</span></span><span class="line"><span class="cl">  Range <span class="o">(</span>min … max<span class="o">)</span>:    8.476 s …  8.744 s    <span class="m">10</span> runs
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">&gt; clang++ -O3 -ffast-math -DSIMPLE_FUSION oif.cpp -o oif-clang.exe
</span></span><span class="line"><span class="cl">&gt; hyperfine oif-clang.exe
</span></span><span class="line"><span class="cl">Benchmark 1: oif-clang.exe
</span></span><span class="line"><span class="cl">  Time <span class="o">(</span>mean ± σ<span class="o">)</span>:      7.050 s ±  0.045 s    <span class="o">[</span>User: 3.069 s, System: 0.005 s<span class="o">]</span>
</span></span><span class="line"><span class="cl">  Range <span class="o">(</span>min … max<span class="o">)</span>:    6.991 s …  7.147 s    <span class="m">10</span> runs
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">&gt; clang++ -O3 -ffast-math -DBETTER_FUSION oif.cpp -o oif-clang.exe
</span></span><span class="line"><span class="cl">&gt; hyperfine oif-clang.exe
</span></span><span class="line"><span class="cl">Benchmark 1: oif-clang.exe
</span></span><span class="line"><span class="cl">  Time <span class="o">(</span>mean ± σ<span class="o">)</span>:      5.434 s ±  0.015 s    <span class="o">[</span>User: 2.588 s, System: 0.004 s<span class="o">]</span>
</span></span><span class="line"><span class="cl">  Range <span class="o">(</span>min … max<span class="o">)</span>:    5.418 s …  5.468 s    <span class="m">10</span> runs
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">&gt; clang++ -O3 -ffast-math -DSIMPLIFY_CALLS oif.cpp -o oif-clang.exe
</span></span><span class="line"><span class="cl">&gt; hyperfine oif-clang.exe
</span></span><span class="line"><span class="cl">Benchmark 1: oif-clang.exe
</span></span><span class="line"><span class="cl">  Time <span class="o">(</span>mean ± σ<span class="o">)</span>:      5.012 s ±  0.033 s    <span class="o">[</span>User: 2.391 s, System: 0.003 s<span class="o">]</span>
</span></span><span class="line"><span class="cl">  Range <span class="o">(</span>min … max<span class="o">)</span>:    4.968 s …  5.074 s    <span class="m">10</span> runs</span></span></code></pre></div><p>I am not sure what to make of this, except that I certainly did not expect these results. The baseline and the first, unoptimised version of the interpreter are faster than with MSVC, but all of the optimisations have far less of an effect. I <em>definitely</em> need to dig into this, but this post is long enough as it is.</p>
<h2 id="drawing-the-rest-of-the-eval">Drawing the rest of the eval</h2>
<p>I handwaved away a lot of the important parts of a <em>real</em> interpreter at the very beginning of this post, so let&rsquo;s take a step back.</p>
<p>First and foremost, the optimised representation is more like a bytecode - you probably should not attempt to parse directly into it. Instead, you will need an optimising pass (or rather <em>optimising passes</em>) that will transform your AST into this representation. Just like an optimising compiler would do.</p>
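<p>To make the idea of such a pass a bit more concrete, here is a minimal sketch that rewrites a single AST pattern into a fused node. Every name in it (<code>Expr</code>, <code>Kind</code>, <code>fuse</code>, <code>SubArgConst</code>) is hypothetical and not taken from the interpreter's source, and it uses plain <code>unsigned</code> so that it needs no headers. A real pass would handle many more patterns and would most likely produce a separate node hierarchy rather than mutate the tree in place.</p>

```cpp
// Hypothetical AST: every name below is invented for this sketch.
enum class Kind { Const, Arg, Sub, SubArgConst };

struct Expr {
    Kind kind;
    unsigned value;  // payload for Const and SubArgConst
    Expr* lhs;       // operands, only used by Sub
    Expr* rhs;
};

// Bottom-up pass: fuse the children first, then try to collapse
// Sub(Arg, Const) into a single SubArgConst node in place.
void fuse(Expr* e) {
    if (e == nullptr) return;
    fuse(e->lhs);
    fuse(e->rhs);
    if (e->kind == Kind::Sub && e->lhs->kind == Kind::Arg && e->rhs->kind == Kind::Const) {
        e->kind = Kind::SubArgConst;
        e->value = e->rhs->value;
        e->lhs = nullptr;
        e->rhs = nullptr;
    }
}

// Evaluator for the tiny AST; arg is the single function argument.
unsigned eval(const Expr* e, unsigned arg) {
    switch (e->kind) {
        case Kind::Const:       return e->value;
        case Kind::Arg:         return arg;
        case Kind::Sub:         return eval(e->lhs, arg) - eval(e->rhs, arg);
        case Kind::SubArgConst: return arg - e->value;  // one dispatch instead of three
    }
    return 0;  // unreachable for well-formed trees
}
```

<p>Running <code>fuse</code> on the tree for <code>arg - 2</code> leaves a single <code>SubArgConst</code> node that evaluates to the same result as the original three-node tree.</p>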
<p>You also want to be able to work with different types of data, and you want to minimise conversions between these types. <em>Anything</em> not strictly related to evaluation - any type checks, conversions, and allocations - will inevitably affect performance.</p>
<p>But you will also need to perform type-checking somewhere, just not on the hot path. This suggests that this approach is best suited for <em>statically typed</em> languages; however, you could probably do something like a tracing JIT, where you figure out the types for a subtree dynamically at runtime, generate an optimised representation on the fly, and use guards to make sure that the types have not changed during execution.</p>
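<p>The guard idea can be sketched roughly as follows. This is not a tracing JIT, only an illustration of a single guard in front of a specialised subtree, and all names (<code>Value</code>, <code>Tag</code>, <code>SumSpecialisedForInt</code>) are assumptions made up for the example. The node computes <code>a + b + a</code>: the specialised path checks the types once and then runs without any further tag checks, while a failed guard falls back to the generic path.</p>

```cpp
// Hypothetical dynamically typed value; every name here is invented for this sketch.
enum class Tag { Int, Float };

struct Value {
    Tag tag;
    union {
        long i;
        double f;
    };
};

Value make_int(long v)     { Value r; r.tag = Tag::Int;   r.i = v; return r; }
Value make_float(double v) { Value r; r.tag = Tag::Float; r.f = v; return r; }

double as_double(Value v) { return v.tag == Tag::Int ? double(v.i) : v.f; }

// Generic add: inspects the tags on every single operation.
Value add_generic(Value a, Value b) {
    if (a.tag == Tag::Int && b.tag == Tag::Int) return make_int(a.i + b.i);
    return make_float(as_double(a) + as_double(b));
}

// Computes a + b + a, specialised under the assumption that both are integers.
struct SumSpecialisedForInt {
    unsigned guard_failures = 0;

    Value eval(Value a, Value b) {
        // The guard: one type check for the whole subtree.
        if (a.tag != Tag::Int || b.tag != Tag::Int) {
            ++guard_failures;  // assumption violated: take the generic path
            return add_generic(add_generic(a, b), a);
        }
        return make_int(a.i + b.i + a.i);  // hot path: plain integer arithmetic
    }
};
```

<p>In a real implementation, a counter like <code>guard_failures</code> could be used to decide when to throw the specialised subtree away and generate a new one.</p>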
<p>Next, in a real interpreter, you would probably like to call functions with more than one argument. And you <em>will</em> need a function wrapper for functions that are not as simple as the Fibonacci function. Passing arguments will also take time.</p>
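<p>As a rough sketch of where that time goes, here is one possible shape for a call node with several arguments. The node types and the fixed <code>kMaxArgs</code> limit are assumptions made for this example, not the layout used by the interpreter in this post, and plain <code>unsigned</code> stands in for <code>uint32_t</code> to keep the snippet free of headers.</p>

```cpp
// Hypothetical node types; none of these names come from the interpreter's source.
struct Context {
    const unsigned* args;  // arguments of the function being evaluated
};

struct Node {
    virtual unsigned eval(Context* ctx) = 0;
    virtual ~Node() = default;
};

struct Const : Node {
    unsigned value;
    explicit Const(unsigned v) : value(v) {}
    unsigned eval(Context*) override { return value; }
};

struct Arg : Node {
    unsigned index;
    explicit Arg(unsigned i) : index(i) {}
    unsigned eval(Context* ctx) override { return ctx->args[index]; }
};

struct Add : Node {
    Node* lhs;
    Node* rhs;
    Add(Node* l, Node* r) : lhs(l), rhs(r) {}
    unsigned eval(Context* ctx) override { return lhs->eval(ctx) + rhs->eval(ctx); }
};

// Call with a fixed maximum arity: evaluate each argument in the caller's
// context, copy the results into a fresh frame, then evaluate the body.
struct Call : Node {
    static constexpr unsigned kMaxArgs = 4;
    Node* body;
    Node* arg_exprs[kMaxArgs];
    unsigned arg_count;

    unsigned eval(Context* ctx) override {
        unsigned frame[kMaxArgs] = {};
        for (unsigned i = 0; i != arg_count; ++i)
            frame[i] = arg_exprs[i]->eval(ctx);  // this copying is the extra cost
        Context callee{frame};
        return body->eval(&callee);
    }
};
```

<p>Every call now pays for a loop and a frame on top of the body itself, which is exactly the kind of overhead that fused, fixed-arity call nodes try to avoid.</p>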
<p>All in all, this post only covers a very tiny, very specific part of what an interpreter should do, but I hope it manages to explain how tree-walking interpreters can be made much faster and that it will still be useful to someone.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Even something as simple as a tree-walking interpreter can sometimes be made even simpler, with surprising results.</p>
<p>My thanks go to Anton Yudintsev and Boris Batkin for daScript, the implementation of which served as inspiration for this post. I should note that there is a whole bunch of tricks the daScript interpreter uses besides fusion, which I hope to explore in detail in posts to come.</p>
<p>You may find the entire source code for the interpreter <a href="https://github.com/ergeysay/optimising-interpreters-fusion">here</a>.</p>
<p>Thank you for staying with me until the very end! See you, and stay tuned!</p>
<h2 id="appendix-methodology-of-benchmarks">Appendix: methodology of benchmarks</h2>
<p>I won&rsquo;t go into much detail on the perils of benchmarking in 2024, but know that there <a href="https://github.com/google/benchmark/blob/1e96bb0ab5e758861f5bbbd4edbd0a8d9a2a7cae/docs/reducing_variance.md">are</a> <a href="https://github.com/sharkdp/hyperfine/issues/239">many</a>. Contrary to popular belief, implementing a benchmark that measures what you think it is measuring is a feat in and of itself and is a proper rabbit hole - far deeper than I wanted to take you today. Perhaps I will write another post on this topic sometime.</p>
<p>In the scope of this post, however, I have taken it easy. I used <a href="https://github.com/sharkdp/hyperfine">hyperfine</a> and tried to make sure that the benchmark workload is heavy enough to outweigh the start-up costs of any interpreter we use.</p>
<p>In this setup, the precise number of milliseconds doesn&rsquo;t really matter, just the <em>relative</em> performance compared to a set baseline.</p>
<p>The code was compiled using MSVC 2022 cl 19.38.33130 with the following command line flags:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">/permissive- /GS0 /GL /W3 /Gy /Zc:wchar_t /Zi /Gm- /O2 /sdl /Zc:inline /fp:fast /D <span class="s2">&#34;NDEBUG&#34;</span> /WX- /Zc:forScope /Gd /Oi /MD /FC /EHsc /nologo /Ot /Fp</span></span></code></pre></div><p>I wrapped this thing in a .bat file for convenience, which you can find with the rest of the source.</p>
]]></content:encoded></item><item><title>About</title><link>https://ergeysay.github.io/about/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://ergeysay.github.io/about/</guid><description>&lt;p>Hi there! I am Sergei Kharitonov, a web developer by day and a hobbyist game developer by night.&lt;/p>
&lt;p>Oftentimes I find myself falling into various rabbit holes, and I decided to document my journeys for those who follow.&lt;/p>
&lt;p>Contact: &lt;a href="mailto:hello@ergeysay.me">hello@ergeysay.me&lt;/a>&lt;/p></description><content:encoded><![CDATA[<p>Hi there! I am Sergei Kharitonov, a web developer by day and a hobbyist game developer by night.</p>
<p>Oftentimes I find myself falling into various rabbit holes, and I decided to document my journeys for those who follow.</p>
<p>Contact: <a href="mailto:hello@ergeysay.me">hello@ergeysay.me</a></p>
]]></content:encoded></item></channel></rss>