## Converting Pictures to Emoji-art with `tinygrad`

My [girlfriend](https://twitter.com/s_han_non_lin/status/1372937642946535428)
made an [emoji pixel-art tool](http://emojraw.glitch.me) this weekend.
It's extremely fun to use, so I thought I'd try my hand
at removing the fun and getting a computer to do all the drawing for me.

I trained a tiny neural network to convert 36x36 images into
emoji and then ran it on patches of real images.

You can find all the code here:
https://github.com/bwasti/pic2emoji


Base Image           | Emoji'd Image
:-------------------------:|:-------------------------:
[![](https://i.imgur.com/9SpMCEI.png)](https://i.imgur.com/9SpMCEI.png) |  [![](https://i.imgur.com/8hBz4HP.png)](https://i.imgur.com/8hBz4HP.png)


### Design

The idea is to train an emoji classifier on noisy 36x36
emoji screenshots.
My theory was that by adding sufficient noise to the
training data, the resultant classifier
would be general enough to handle non-emoji images.
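At inference time, the plan is just to run that classifier over every 36x36 patch of a bigger image and paste the winning emoji back in its place. Here's a rough sketch of that loop (the `classify` and `render_emoji` helpers are hypothetical stand-ins, not the actual repo code):

```
import numpy as np

def emojify(image, classify, render_emoji, patch=36):
  # image: (3, H, W) array; classify maps a 36x36 patch to an emoji index,
  # render_emoji maps that index back to a 36x36 RGB tile
  _, H, W = image.shape
  H, W = H - H % patch, W - W % patch  # crop to a multiple of the patch size
  out = np.zeros((3, H, W), dtype=np.float32)
  for y in range(0, H, patch):
    for x in range(0, W, patch):
      idx = classify(image[:, y:y+patch, x:x+patch])
      out[:, y:y+patch, x:x+patch] = render_emoji(idx)
  return out
```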

### Dataset

Given the above design, the dataset can easily be generated
from a set of reference emoji.
Luckily for me, the folks over at Emojipedia
have put together the entire collection as a set
of `png` files on a [single page](https://emojipedia.org/apple/).
I went ahead and downloaded all ~3200 of them.

There are a fair number of skin-tone related emoji
that I filtered out, bringing the final number down to ~2000.
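If you want to do the same filtering, the skin-tone variants all carry one of the Fitzpatrick modifier codepoints (U+1F3FB through U+1F3FF). A rough sketch, assuming the codepoint shows up in the downloaded filename (which it may not, depending on how you grab them):

```
import os

SKIN_TONE_MODIFIERS = ['1f3fb', '1f3fc', '1f3fd', '1f3fe', '1f3ff']

def is_base_emoji(filename):
  # keep only emoji whose filename doesn't mention a skin-tone modifier
  name = filename.lower()
  return not any(tone in name for tone in SKIN_TONE_MODIFIERS)

files = [f for f in os.listdir('emoji_set') if f.endswith('.png') and is_base_emoji(f)]
```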


#### Generating the data

The $(x, y)$ pairs used to train the network
are a noisy version of the emoji and a one-hot encoded
index into the emoji list.

To generate noisy emoji (the $x$'s), I added:

- Blurring
- Shifting
- Hue adjustment

I could probably have added more noise, but I was
(\*cough\* *lazy* \*cough\*) worried that too much pollution of 
the original images might ruin the information learned by the network.

For blurring, I used a box blur:
```
import numpy as np

def blur(a, k):
  # separable box blur: run a length-k kernel of ones along the height and
  # width axes of the NCHW batch, then normalize by k^2
  kernel = np.ones(k, dtype=np.float32)
  a = np.apply_along_axis(lambda x: np.convolve(x, kernel, mode='same'), 2, a)
  a = np.apply_along_axis(lambda x: np.convolve(x, kernel, mode='same'), 3, a)
  a /= (k ** 2)
  return a
```

Shifting and color adjustment were a matter of
manipulating the spatial and channel dimensions respectively:

```
# from https://stackoverflow.com/questions/27087139/shifting-an-image-in-numpy
def shift(a, ox, oy):
  # spatially shift an NCHW batch by (ox, oy) pixels, zero-filling the exposed edge
  non = lambda s: s if s<0 else None
  mom = lambda s: max(0,s)
  out = np.zeros_like(a)
  out[:,:, mom(oy):non(oy), mom(ox):non(ox)] = a[:,:, mom(-oy):non(-oy), mom(-ox):non(-ox)]
  return out

def color(a, k, pct):
  # scale a single color channel k by (1 + pct), clipped to the valid pixel range
  a = np.copy(a)
  a[:,k,:,:] *= (1 + pct)
  a = a.clip(0, 255)
  return a
```

I then perturbed each image with a random combination of the three:

```
def perturb(d):
  # blur radius no more than 14% of image width
  b = np.random.randint(0, 5)
  if b:
    d = blur(d, b)

  c = np.random.randint(0, 4)
  # change color no more than 14%
  pct = np.random.rand(1)[0] / 7
  if c: # 1,2,3
    d = color(d, c - 1, pct)

  # shift no more than 14% each direction
  sx = np.random.randint(-5, 5)
  sy = np.random.randint(-5, 5)
  d = shift(d, sx, sy)

  return d
```
This produced the following:

![](https://i.imgur.com/0fjeY75.png)

Instead of saving the perturbed outputs,
I left the training loop to generate them on the fly.
This isn't really considered best practice, since
there's no well-defined hold-out set for automatic validation.
(\*cough\* *lazy* \*cough\*)
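If you did want that hold-out set, a minimal sketch would be to freeze a perturbed slice of the emoji up front and only ever evaluate on it (this assumes the same `files`/`perturb` globals used in the training loop below, plus the era's `Tensor.data` numpy accessor):

```
# reserve ~10% of the emoji and perturb them once
# (the training sampler would also need to skip these indices)
val_idx = np.random.choice(files.shape[0], size=files.shape[0] // 10, replace=False)
val_x, val_y = perturb(files[val_idx]), val_idx  # each emoji's label is its index

def validate(model):
  preds = np.argmax(model.forward(Tensor(val_x)).data, axis=1)
  return (preds == val_y).mean()
```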


#### Modeling

For modeling, I used [`tinygrad`](https://github.com/geohot/tinygrad).
It's close to PyTorch (which doesn't have M1 support yet)
and is really easy to get started with.

I ended up stealing the [TinyConvNet](https://github.com/geohot/tinygrad/blob/master/examples/train_efficientnet.py#L13)
included in the examples:

```
class TinyConvNet:
  def __init__(self, classes):
    conv = 3                          # 3x3 kernels
    inter_chan, out_chan = 8, 16
    self.c1 = Tensor.uniform(inter_chan,3,conv,conv)
    self.c2 = Tensor.uniform(out_chan,inter_chan,conv,conv)
    # after two conv+pool stages a 36x36 input is down to 7x7
    self.l1 = Tensor.uniform(out_chan*7*7, classes)

  def forward(self, x):
    x = x.conv2d(self.c1).relu().max_pool2d()
    x = x.conv2d(self.c2).relu().max_pool2d()
    x = x.reshape(shape=[x.shape[0], -1])  # flatten to (batch, out_chan*7*7)
    return x.dot(self.l1).logsoftmax()
```
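As a quick sanity check on that `out_chan*7*7` flatten (assuming the 2021-era `tinygrad` API, with `Tensor` coming from `tinygrad.tensor`):

```
import numpy as np
from tinygrad.tensor import Tensor

model = TinyConvNet(classes=50)
x = Tensor(np.random.rand(4, 3, 36, 36).astype(np.float32))
print(model.forward(x).shape)  # (4, 50): 36 -> 34 -> 17 -> 15 -> 7 through the conv/pool stages
```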

It runs in about 2.2 MFLOPs per patch, which is pretty tiny.
For an emoji-rendered image to look decent,
you need about 5,000 36x36 patches,
which puts the total at ~11 GFLOPs per image.
Good enough for me!
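Spelling that arithmetic out:

$$2.2\ \text{MFLOPs per patch} \times 5{,}000\ \text{patches} \approx 11\ \text{GFLOPs per image}$$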

#### Training

I also took the training loop from the `tinygrad` examples.
It includes a cute implementation of cross-entropy loss for
one-hot encoded vectors.

```
def train(model):
  optim = optimizer.SGD([model.c1, model.c2, model.l1], lr=0.001)
  iters = 5 * classes
  batch = 32
  for _ in range(iters):
    samp = np.random.randint(0, files.shape[0], size=(batch))
    img = Tensor(perturb(files[samp]))
    out = model.forward(img)
    y = np.zeros((batch, classes), dtype=np.float32)
    
    # cross entropy loss
    y[range(y.shape[0]),samp] = -classes
    y = Tensor(y)
    loss = out.mul(y).mean()
    
    optim.zero_grad()
    loss.backward()
    optim.step()
```
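To spell out why that counts as cross-entropy: `out` holds log-probabilities, and each target row is all zeros except a $-\text{classes}$ at the correct index, so

$$\text{loss} = \frac{1}{B \cdot C}\sum_{b,c} y_{bc}\,\log p_{bc} = \frac{1}{B}\sum_{b} -\log p_{b,\,\text{label}(b)}$$

i.e. the $-C$ scaling exactly cancels the extra $1/C$ picked up by averaging over all $B \times C$ entries, leaving the usual mean negative log-likelihood.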

The batch size and number of iterations came from manually tuning
the hyperparameters a bit.

It takes about 10 minutes to train on 1,000 emoji on my laptop.
With a set of 200 (which is easily enough), that drops to about 2 minutes.


#### Try it!

I tried to make it easy to play with:

```
git clone https://github.com/bwasti/pic2emoji.git
pip install tinygrad
```

and if you want GPU training, `pip install pyopencl`.

The repo comes pre-loaded with a random selection of 50 emoji.
If you'd like to add more, just ensure they're of size 72x72
(or you'll have to change the code).
You can find more emoji here: https://emojipedia.org/apple/

To train on a folder, run
```
$ python train.py emoji_set
```
and then infer with

```
$ python train.py emoji_set mario.png
```

The model works pretty well on custom sets of emoji.
For example, I trained it only on emoji found in the r/wallstreetbets
subreddit:

![](https://i.redd.it/r86di2skcfo61.png)

Thanks for reading!