Accelerating PyTorch models with JAX
Accelerate your PyTorch models by converting them to JAX for faster inference.
⚠️ If you are running this notebook in Colab, you will have to install Ivy and some dependencies manually. You can do so by running the cell below ⬇️

!pip install -q ivy
!pip install -q transformers
!pip install -q dm-haiku

If you want to run the notebook locally but don't have Ivy installed just yet, you can check out the Get Started section of the docs.
Make sure you run this demo with GPU enabled!
Let’s now import Ivy and the libraries we’ll use in this example:
import jax
import ivy
import torch
import requests
import numpy as np
from PIL import Image
from transformers import AutoModel, AutoFeatureExtractor
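Since the benchmark below relies on the GPU, a quick optional sanity check (not part of the original demo) confirms that both PyTorch and JAX can see an accelerator:

# Optional sanity check: make sure both frameworks detect the GPU.
print(torch.cuda.is_available())  # expected: True
print(jax.devices())              # expected: a list containing a CUDA/GPU device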
Now we can load a ResNet model and its corresponding feature extractor from the Hugging Face transformers library:
"jax_enable_x64", False)
jax.config.update(
= "ResNet"
arch_name = "microsoft/resnet-50"
checkpoint_name
= AutoFeatureExtractor.from_pretrained(checkpoint_name)
feature_extractor = AutoModel.from_pretrained(checkpoint_name) model
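Since we only run the model for inference in this demo, it can also help to put it in eval mode so that layers such as batch norm and dropout behave deterministically; this is an optional step on top of the original demo:

# Optional: switch the model to inference behaviour (affects batch norm / dropout).
model.eval()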
We will also need a sample image to pass during tracing, so let’s use the feature extractor to get the corresponding torch tensors.
= "http://images.cocodataset.org/val2017/000000039769.jpg"
url = Image.open(requests.get(url, stream=True).raw)
image = feature_extractor(
inputs =image, return_tensors="pt"
images )
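If you want to peek at what the feature extractor produced, it returns a dict-like batch; for ResNet-50 preprocessing this is typically a single pixel_values tensor (the exact shape noted below is an assumption based on the default image size):

# Optional: inspect the preprocessed batch returned by the feature extractor.
print(inputs.keys())                 # typically dict_keys(['pixel_values'])
print(inputs["pixel_values"].shape)  # e.g. torch.Size([1, 3, 224, 224])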
And finally, let’s transpile the model to haiku!
transpiled_graph = ivy.transpile(model, to="haiku", kwargs=inputs)
After transpiling our model, we can see what the improvement in runtime efficiency looks like. To do this, let's compile the original PyTorch model using torch.compile:
inputs = feature_extractor(
    images=image, return_tensors="pt"
).to("cuda")

model.to("cuda")

def _f(**kwargs):
    return model(**kwargs)

comp_model = torch.compile(_f)
_ = comp_model(**inputs)
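Keep in mind that CUDA kernels launch asynchronously and the first call to a compiled model triggers compilation, which is why the cell above ends with a warm-up call. If you want stricter measurements than the %%timeit cells below, a hedged sketch with explicit synchronization looks like this:

import time

# Optional: a stricter timing pattern that waits for the GPU to finish.
torch.cuda.synchronize()                 # flush any pending GPU work
start = time.perf_counter()
_ = comp_model(**inputs)
torch.cuda.synchronize()                 # wait for this forward pass to complete
print(f"{(time.perf_counter() - start) * 1e3:.2f} ms")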
Let's now do the equivalent transformation for our new Haiku model by using JAX's just-in-time compilation:
inputs_jax = feature_extractor(
    images=image, return_tensors="jax"
)

import haiku as hk

def _forward(**kwargs):
    module = transpiled_graph()
    return module(**kwargs).last_hidden_state

rng_key = jax.random.PRNGKey(42)
jax_forward = hk.transform(_forward)
params = jax_forward.init(rng=rng_key, **inputs_jax)
jit_apply = jax.jit(jax_forward.apply)
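JAX also dispatches work asynchronously, and jax.jit compiles on the first call, so it's worth warming the function up (and blocking on the result) before timing; a small optional sketch:

# Optional: trigger compilation once so it isn't counted in the benchmark,
# and block until the result is actually ready.
warmup = jit_apply(params, None, **inputs_jax)
warmup.block_until_ready()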
Now that we have both models optimized, let’s see how their runtime speeds compare to each other!
%%timeit
_ = comp_model(**inputs)
9.67 ms ± 2.28 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
out = jit_apply(params, None, **inputs_jax)
4.09 ms ± 9.48 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
As expected, we have made the model significantly faster with just one line of code, getting a ~2x increase in its execution speed! 🚀
Finally, as a sanity check, let's load a different image and make sure that the results are the same for both models:
= "http://images.cocodataset.org/train2017/000000283921.jpg"
url = Image.open(requests.get(url, stream=True).raw)
image = feature_extractor(
inputs =image, return_tensors="pt"
images"cuda")
).to(= feature_extractor(
inputs_jax =image, return_tensors="jax"
images
)= comp_model(**inputs)
out_torch = jit_apply(params, None, **inputs_jax)
out_jax
=1e-4) np.allclose(out_torch.last_hidden_state.detach().cpu().numpy(), out_jax, atol
True
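For a more informative comparison than a single boolean, you can also look at the largest element-wise difference between the two outputs (an optional extra check):

# Optional: report the maximum absolute deviation between the two outputs.
diff = np.abs(out_torch.last_hidden_state.detach().cpu().numpy() - np.asarray(out_jax))
print(diff.max())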
That’s pretty much it! The results from both models are the same, but we have achieved a solid speed up by using Ivy’s transpiler to convert the model to JAX!