# Trickery to Tame Big WebAssembly Binaries


In light of the extremely popular [PyScript](https://pyscript.net)
project (it's great, check it out!), I thought I'd touch on
a somewhat glaring issue...

![](https://i.imgur.com/lcudcUd.png)

8MB of assets for a "hello world" example.
(You can check that out [here](https://pyscript.net/examples/hello_world.html).)

### Big Binaries and Where To Find Them

It should come as no surprise that compiling a large project
into an extremely simple ISA blows up the binary size.
CPython
[is around 350,000 lines of C](https://tenthousandmeters.com/blog/python-behind-the-scenes-3-stepping-through-the-cpython-source-code/)
and getting it compiled to only ~6MB of wasm is nothing short
of impressive.

But, sending 6MB over the wire before *anything* useful can happen isn't
really an ideal experience.

<center>
<video playsinline loop muted autoplay controls style="max-width:100%" src="https://i.imgur.com/JB2FWiK.mp4"></video>
</center>

Many other useful projects will face this issue,
so a solution would be quite nice to have.

What can we do?
In this post I go over some techniques I've been playing with,
for my own edification. :)

### Splitting up Files

The first approach to explore is manually splitting up files and
compiling them separately:

```cpp
int a_fn(int x) {
  return x + 13;
}
```


```cpp
int b_fn(int x) {
  return x * 7;
}
```


![](https://i.imgur.com/MFNo5FD.png)

This is by far the easiest, conceptually, but it can
be quite a hassle.  Not all files can be cleanly split
because they may have dependencies.
In the worst (yet quite common) case, every file has some function
that depends on some functionality in another file.

```cpp
int a_fn(int x) {
  x = b_fn(x); // damn :(
  return x + 13;
}
```

<center><i>our problem case</i></center>

### Inspecting the wasm

Let's take a peek under the hood to check out
what that dependency looks like.

WebAssembly has a restrictive and easily analyzed ISA,
which makes this pretty easy.
To inspect it, we'll use `wasm-objdump`.
The disassembled code for the binary looks
a bit like this:

```bash
$ wasm-objdump -d combined.wasm
```

```wasm
func[1] <a_fn(int)>:
 local.get 0
 call 2 <b_fn(int)> <--- Our dependency!
 i32.const 13
 i32.add
 end
func[2] <b_fn(int)>:
 local.get 0
 i32.const 7
 i32.mul
 end
```

Let's pretend these functions are *much* bigger and discuss two
possible outcomes after we send the binary to the user:

1. The user calls `int a_fn(int)`
2. The user only calls `int b_fn(int)`

In case 1, we've done all we can and there's nothing to worry about.
The issue is case 2.  We just sent all of the `int a_fn(int)` section
over the wire as well, potentially causing many milliseconds of hang
time!  Let's avoid that *at all costs*.

### Imports

One way to split up binaries even when there are dependencies
is to just force it.  The flag `--allow-undefined` will make this
possible (for `wasm-ld`).  Other compilation stacks will likely have
similar flags.  Note that this doesn't involve changing the source
code, just the way we compile it!

```cpp
// b.cpp
int b_fn(int x) {
  return x * 7;
}
```

```cpp
// a.cpp
int b_fn(int x);

int a_fn(int x) {
  x = b_fn(x);
  return x + 13;
}
```

```bash
clang++ --target=wasm32 -nostdlib -O3 -c a.cpp -o /tmp/a.o
clang++ --target=wasm32 -nostdlib -O3 -c b.cpp -o /tmp/b.o
wasm-ld --no-entry --export-all --lto-O3 --allow-undefined --import-memory /tmp/a.o -o a.wasm
wasm-ld --no-entry --export-all --lto-O3 --allow-undefined --import-memory /tmp/b.o -o b.wasm
```

The `a.wasm` binary ends up with an
[import section](https://webassembly.github.io/spec/core/binary/modules.html#binary-importsec):

```bash
$ wasm-objdump -x -j Import a.wasm

a.wasm:	file format wasm 0x1

Section Details:

Import[2]:
 - memory[0] pages: initial=2 <- env.memory
 - func[0] sig=0 <b_fn(int)> <- env._Z4b_fni
```

But its code section looks exactly as expected!

```bash
$ wasm-objdump -d a.wasm

a.wasm:	file format wasm 0x1

Code Disassembly:

func[2] <a_fn(int)>:
 local.get 0
 call 0 <b_fn(int)>
 i32.const 13
 i32.add
 end
```


Ok, but wouldn't this just break if we ran it?  Yes.

We'll need to specify the implementation for `int b_fn(int)`.
Something like this:

```javascript
const memory = new WebAssembly.Memory({initial:10, maximum:1000});
const b_imports = {
  env: {
    memory: memory
  }
};
const { instance: b_instance } = await WebAssembly.instantiateStreaming(fetch('b.wasm'), b_imports);

const a_imports = {
  env: {
    memory: memory,
    b_fn: b_instance.exports.b_fn
  }
};
const { instance: a_instance } = await WebAssembly.instantiateStreaming(fetch('a.wasm'), a_imports);
```



And now, if we don't really need `int a_fn(int)`,
we can just skip that last bit of code and save a bunch of
bandwidth.  Woo!

(Note that the memory between these modules is shared!
That means more complex heap-based computation is fine.)
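To make the import mechanism concrete, here's a sketch you can run directly in Node with no toolchain at all. The byte array is a hand-assembled module equivalent to `a.wasm` above (simplified to skip the memory import and C++ name mangling); it is my own illustration, not output from the build commands.

```javascript
// A hand-assembled module that imports env.b_fn and exports
// a_fn(x) = b_fn(x) + 13 -- the same shape as a.wasm above.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f, // type: (i32) -> i32
  0x02, 0x0c, 0x01, 0x03, 0x65, 0x6e, 0x76, 0x04, // import "env" ...
  0x62, 0x5f, 0x66, 0x6e, 0x00, 0x00,             // ... "b_fn" (func, type 0)
  0x03, 0x02, 0x01, 0x00,                         // one function of type 0
  0x07, 0x08, 0x01, 0x04, 0x61, 0x5f, 0x66, 0x6e, // export "a_fn"
  0x00, 0x01,                                     // (func index 1)
  0x0a, 0x0b, 0x01, 0x09, 0x00,                   // code section
  0x20, 0x00,                                     // local.get 0
  0x10, 0x00,                                     // call 0 <b_fn>
  0x41, 0x0d,                                     // i32.const 13
  0x6a,                                           // i32.add
  0x0b,                                           // end
]);

// Satisfy the import with a plain JavaScript function.
const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes), {
  env: { b_fn: (x) => x * 7 },
});

console.log(instance.exports.a_fn(10)); // 10 * 7 + 13 = 83
```

The import is just a JavaScript reference, which is exactly what makes the lazy-loading trick later in this post possible.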

### Generating imports

Manually tracking call sites and then generating
the correct imports as we did above is quite arduous.
We can automate this of course.

I ended up creating a data structure to answer three questions:

1. Given a module, which functions does it need to import?
2. Given a function, which module does it live in?
3. Given a function, which functions (in different files) does it depend on?

The first two are easily answered by `wasm-objdump`, but the last
requires some care.
I won't go into details in this post, but it's an interesting
little problem.
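To give a flavor of it, here's a rough sketch: cross-function call edges can be scraped from `wasm-objdump -d` output by pairing `func[...]` headers with the `call` instructions beneath them. The parsing below assumes the dump format shown earlier and is only an illustration; the linked script is the real implementation.

```javascript
// Sketch: build a per-function call-dependency map from
// `wasm-objdump -d` style text.
function callDeps(disassembly) {
  const deps = {};
  let current = null;
  for (const line of disassembly.split("\n")) {
    // e.g. "func[1] <a_fn(int)>:"
    const header = line.match(/func\[\d+\] <([A-Za-z0-9_]+)/);
    if (header) {
      current = header[1];
      deps[current] = [];
      continue;
    }
    // e.g. " call 2 <b_fn(int)>"
    const call = line.match(/call \d+ <([A-Za-z0-9_]+)/);
    if (call && current) {
      deps[current].push(call[1]);
    }
  }
  return deps;
}

const dump = [
  "func[1] <a_fn(int)>:",
  " local.get 0",
  " call 2 <b_fn(int)>",
  " i32.const 13",
  " i32.add",
  " end",
  "func[2] <b_fn(int)>:",
  " local.get 0",
  " i32.const 7",
  " i32.mul",
  " end",
].join("\n");

console.log(callDeps(dump)); // { a_fn: [ 'b_fn' ], b_fn: [] }
```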

I wrote a [Python script to do this](https://github.com/bwasti/web_assembly_experiments/blob/main/lazy_load/process.py)
and the output is JSON:
```json
{
  "module_imports": {
    "a.wasm": [
      "b_fn"
    ],
    "b.wasm": []
  },
  "func_locations": {
    "a_fn": "a.wasm",
    "b_fn": "b.wasm"
  },
  "func_import_deps": {
    "a_fn": [
      "b_fn"
    ]
  }
}
```


Ok great, we have the ability to load `b.wasm`
without loading `a.wasm`.
But no one wants to figure out which modules to load in JavaScript.
We want the necessary WebAssembly to be loaded **automatically**
and **on-demand** without changing the C/C++ that we're compiling.

Here's the trick:

1. WebAssembly imports take JavaScript references,
and these references can be updated *later*.

2. We're going to initialize our data
structure to reference `null` for all function imports.

3. We're going to wrap every function call with a
check to see if it (and its dependencies) have been loaded already.
If it *hasn't* been loaded, we load it and repopulate the `null`
with a legitimate function.

This way, users can call whatever functions they want
and only the relevant wasm will be pulled over the network.
In the worst case, the user calls *every* function and we end up
with the same amount of wasm loaded as when we started.

### Code Stuff

First we'll create a `Loader` class that takes in the JSON
file we generated above.

```javascript
class Loader {
  constructor(json_file) {
    this.json_file = json_file;
  }

  async init() {
    this.json = await (await fetch(this.json_file)).json();
    this.module_imports = this.json.module_imports;
    this.func_locations = this.json.func_locations;
    this.funcs = {};
    for (let func of Object.keys(this.func_locations)) {
      this.funcs[func] = new Func(func, this);
    }
    this.func_import_deps = this.json.func_import_deps;
  }

  // TODO
}
```


It also initializes a bunch of `Func` objects, which
will store references to the actual WebAssembly functions.

```javascript
class Func {
  constructor(fn, loader) {
    this.fn = fn;
    this.loader = loader;
    this.func = null;
  }

  async call(...args) {
    if (this.func === null) {
      await this.loader.load(this.fn);
    }
    return this.func(...args);
  }
}
```

This sets up a structure where every function is invoked
through its `call` method, e.g. `await loader.funcs['a_fn'].call(10)`.

Why all the awaits?  Well we're converting our
WebAssembly functions into lazily loaded functions
that *may* hit the network to resolve dependencies on the first call.

If it does, we'll have to wait for the result to come back.
Otherwise, we just call the function immediately.
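That flow can be sketched without any real WebAssembly or network. Everything below is a stand-in: `fake_modules` plays the role of `fetch` plus instantiation, and `LazyFunc` mirrors the `Func` class above.

```javascript
// Stand-in sketch of the lazy-call flow: a function reference starts
// as null and is repopulated by its loader on the first call.
class LazyFunc {
  constructor(fn, loader) {
    this.fn = fn;
    this.loader = loader;
    this.func = null; // repopulated on first call
  }
  async call(...args) {
    if (this.func === null) {
      await this.loader.load(this.fn); // may "hit the network"
    }
    return this.func(...args);
  }
}

class FakeLoader {
  constructor(fake_modules) {
    this.fake_modules = fake_modules; // name -> async () => implementation
    this.loads = 0; // count simulated network hits
    this.funcs = {};
    for (const name of Object.keys(fake_modules)) {
      this.funcs[name] = new LazyFunc(name, this);
    }
  }
  async load(fn) {
    this.loads += 1;
    this.funcs[fn].func = await this.fake_modules[fn]();
  }
}

const loader = new FakeLoader({
  a_fn: async () => (x) => x + 13,
});

(async () => {
  console.log(await loader.funcs["a_fn"].call(10)); // 23 (loads the "module")
  console.log(await loader.funcs["a_fn"].call(10)); // 23 (no load this time)
})();
```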

Finally, we need to implement the `load(fn)` function
that we hit in the worst case.  Over in the `Loader` class
we can add these two methods:

```javascript
// in Loader class, dedented for clarity
async load(fn) {
  if (this.funcs[fn].func !== null) {
    return;
  }
  // if we have deps, load them first
  if (fn in this.func_import_deps) {
    for (let dep of this.func_import_deps[fn]) {
      if (this.funcs[dep].func !== null) {
        continue;
      }
      // recurse :^)
      await this.load(dep);
    }
  }
  // bring our module into memory
  await this.load_module(this.func_locations[fn]);
}

async load_module(wasm_fn) {
  const imports = { env: { memory: memory } };
  if (wasm_fn in this.module_imports) {
    for (let imp of this.module_imports[wasm_fn]) {
      // this might actually be null! that's okay
      // it'll be updated when needed.
      imports.env[imp] = this.funcs[imp].func;
    }
  }
  const m = await WebAssembly.instantiateStreaming(fetch(wasm_fn), imports);
  const exports = m.instance.exports;
  for (let e of Object.keys(exports)) {
    if (e in this.funcs) {
      // this is the key bit of magic
      this.funcs[e].func = exports[e];
    }
  }
}
```



And that's pretty much it!
We've now got a nice way to load chunks of a library
at the granularity of individual compilation units
without changing any of the source code.

### Results

I wasn't happy with just the toy case above
so I [generated a thousand files](https://github.com/bwasti/web_assembly_experiments/blob/main/lazy_load/generate.py)
with varying dependencies
on each other to test out this approach.

The result was an 8.7MB fully merged single binary.
Using the methods above, it becomes 1000 separate files, each about 8-9KB in size.

Below is a video of the results.
First we use the `Loader` written above, and then
we load the entire wasm.
For the three functions we call,
this ends up being 5x faster! (0.25 seconds vs 1.35 seconds)

The performance of individual functions takes a reasonable hit,
but we're still within ~1.2x of the original performance.
This isn't bad for a 5x speedup in terms of initial load, though!
It'd also be possible to eagerly load the rest of the
module in the background while providing a snappy first load.

<center>
<video playsinline loop muted autoplay controls style="max-width:100%" src="https://i.imgur.com/g9TLfH2.mp4"></video>
</center>

And here are the network requests over time:

<center>
<video playsinline loop muted autoplay controls style="max-width:100%" src="https://i.imgur.com/djT7hfy.mp4"></video>
</center>

It's cool to see that even though we only called 3 functions,
they depended on a couple others and those all got loaded
automatically for us.