# Profiling JavaScript/TypeScript in Bun 0.1.11

*by [@bwasti](https://twitter.com/bwasti)*

****

Bun exposed the JSC profiler in the latest release,
and it's become a key tool for performance
improvements in [my work on a fast JS tensor library](https://github.com/facebookresearch/shumai).

*tl;dr*

```javascript
import { startSamplingProfiler } from 'bun:jsc'
startSamplingProfiler('samples')
```
and then look inside the freshly created `samples/` directory! 😁
Some other features are reviewed below.

## Profiling vs Benchmarking

If you have two self-contained heavy workloads,
it's easy to determine which takes up more time:

```javascript
function sqrt_work(iters, x) {
  for (let i = 0; i < iters; ++i) {
    x = Math.sqrt(x)
  }
  return x
}

function log_work(iters, x) {
  for (let i = 0; i < iters; ++i) {
    x = Math.log(x)
  }
  return x
}

const bench = (f) => {
  const t0 = performance.now()
  f()
  return performance.now() - t0
}

let iters = 1000000
console.log(bench(()=>sqrt_work(iters, 10)))
console.log(bench(()=>log_work(iters, 10)))
```
Square root is a more expensive operation:

```
bwasti@bwasti-mbp code % bun bench.js
8.130084
4.056957999999998
```
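One caveat: one-shot timings like this are noisy, since JIT warm-up alone can swing them. A minimal sketch of a steadier variant (reusing `sqrt_work`, `log_work`, and `iters` from above) averages over several runs:

```javascript
// A sturdier benchmark: average over several runs to smooth out
// JIT warm-up and scheduling noise.
const bench_avg = (f, runs = 10) => {
  let total = 0
  for (let i = 0; i < runs; ++i) {
    const t0 = performance.now()
    f()
    total += performance.now() - t0
  }
  return total / runs
}

console.log(bench_avg(() => sqrt_work(iters, 10)))
console.log(bench_avg(() => log_work(iters, 10)))
```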

But what if one function is called significantly more often than another?
When it comes to overall performance, it
doesn't matter that `sqrt_work` is 2x slower if `log_work` is called 10x
more frequently!

Benchmarking simply cannot capture this.  But a profiler can!

```javascript
import { startSamplingProfiler } from 'bun:jsc'

startSamplingProfiler('samples')

// sqrt_work, log_work, and iters as defined above
for (let i = 0; i < iters; ++i) {
  if (Math.random() < 0.1) {
    sqrt_work(iters, 10)
  } else {
    log_work(iters, 10)
  }
}
```
This will write the sampled data into a file in the directory `samples`
(which it will make for you, if it doesn't exist).  We can `cat` this file:

```
bwasti@bwasti-mbp code % cat samples/*


Sampling rate: 1000.000000 microseconds. Total samples: 339
Top functions as <numSamples  'functionName#hash:sourceID'>
   198    'log_work#<nil>:2'
    76    '(module)#<nil>:2'
    64    'sqrt_work#<nil>:2'
     1    'exit#<nil>:4294967295'

[truncated the output for brevity]
```
We clearly see that `log_work` is where our code spends most of its time.
Time to optimize!

## Going Deeper

### Increase Frequency

Sampling every millisecond can be a bit infrequent.  Luckily, we can
amp that number up (or down) by passing the sampling interval in
microseconds as a second parameter to `startSamplingProfiler`:

```javascript
startSamplingProfiler('samples', 100)
```

```
bwasti@bwasti-mbp code % cat samples/*


Sampling rate: 100.000000 microseconds. Total samples: 3006
Top functions as <numSamples  'functionName#hash:sourceID'>
  1853    'log_work#<nil>:2'
   628    '(module)#<nil>:2'
   515    'sqrt_work#<nil>:2'
     5    'log#<nil>:4294967295'
     3    'exit#<nil>:4294967295'
     1    'exit#<nil>:2'
     1    'sqrt#<nil>:4294967295'
```
Impressively,
even sampling every 100 microseconds doesn't hurt performance much on my M1.
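
If you want to gauge the overhead on your own machine, one rough approach (a sketch reusing the `bench` helper and `log_work` from earlier) is to time the same workload before and after enabling the profiler:

```javascript
import { startSamplingProfiler } from 'bun:jsc'

// Warm up first so JIT compilation doesn't skew the comparison.
log_work(iters, 10)

const baseline = bench(() => log_work(iters, 10))
startSamplingProfiler('samples', 100)
const profiled = bench(() => log_work(iters, 10))

console.log(`baseline: ${baseline}ms, profiled: ${profiled}ms`)
```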

### Prevent Inlining

Sometimes, JSC optimizes *too much* and it's hard to
understand what's going on under the hood.
In the case of small functions, JSC will often inline them
for performance.  This will ruin our profiling results:

```javascript
import { startSamplingProfiler } from 'bun:jsc'

function foo(x) {
  return x + 0
}
function bar(x) {
  return x * 1
}

function other_work(iters, x) {
  for (let i = 0; i < iters; ++i) {
    x = foo(x)
    x = bar(x)
  }
  return x
}

startSamplingProfiler('samples', 100)
other_work(iters, 10) // iters as defined earlier
```

Running this, we see that `foo` and `bar` take up a collective...
0.4% of compute?

```
bwasti@bwasti-mbp code % cat samples/*


Sampling rate: 100.000000 microseconds. Total samples: 6630
Top functions as <numSamples  'functionName#hash:sourceID'>
  6599    'other_work#<nil>:2'
    16    'foo#<nil>:2'
    12    'bar#<nil>:2'
     1    'promiseReactionJob#<nil>:3'
     1    'evaluate#<nil>:4294967295'
     1    'resolveWithoutPromise#<nil>:3'
```

This is obviously incorrect.
`foo` and `bar` were explicitly called only on
the first few runs (before the JIT took over and inlined them).
The fix? We can tell the JIT not to do this!

```javascript
import { noInline, startSamplingProfiler } from 'bun:jsc'
noInline(foo)
noInline(bar)

startSamplingProfiler('samples', 100)
other_work(iters, 10)
```

Now the results seem a lot more reasonable:

```
bwasti@bwasti-mbp code % cat samples/*


Sampling rate: 100.000000 microseconds. Total samples: 29065
Top functions as <numSamples  'functionName#hash:sourceID'>
 15927    'other_work#<nil>:2'
  6788    'foo#<nil>:2'
  6348    'bar#<nil>:2'
```

### Runtime Sample Inspection

Writing to a file is great and all, but sometimes
it's useful to get profiling results *during execution*.
We can do that with `samplingProfilerStackTraces()`!

```javascript
import { samplingProfilerStackTraces } from 'bun:jsc'

// foo and bar as defined in the previous section
function monitor_work(iters, x) {
  for (let i = 0; i < iters; ++i) {
    x = foo(x)
    x = bar(x)
    if (i && i % 1000000 === 0) {
      console.log(samplingProfilerStackTraces())
    }
  }
  return x
}
```

The output of `samplingProfilerStackTraces()` is a list of stacks.
A stack is represented as a list of strings.  If a series of
functions is invoked like this:

```
main() // calls f0
 -> f0() // calls f1
  -> f1() // calls f2
   -> f2() // a profile was collected here!
```
The resultant stack would be
```javascript
["f2", "f1", "f0", "main", ...]
```
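Since each sample is just an array of frame names, it's easy to roll your own live summary. Here's a minimal sketch (assuming the shape described above, innermost frame first) that tallies the leaf frame of every collected stack, roughly mirroring the "Top functions" table from the file output:

```javascript
import { samplingProfilerStackTraces } from 'bun:jsc'

// Count how often each function appears as the innermost frame,
// then sort descending -- a runtime "top functions" view.
function topFunctions() {
  const counts = new Map()
  for (const stack of samplingProfilerStackTraces()) {
    const leaf = stack[0]
    counts.set(leaf, (counts.get(leaf) ?? 0) + 1)
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1])
}
```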

## That's it!

Be sure to check out [bun](https://bun.sh) if you haven't already :)