Introducing the key role that Bazel plays in our build ecosystem and how our Machine Learning engineers take models to production fast.

How We Used Bazel to Streamline Our AI Development

Dhanshree Arora

Sep 15, 2020

We love microservices! We pack our predictive models into neat little Flask apps that integrate with the back end and give you the experience of SpotDraft. It all sounds wonderful, until it doesn’t work. Any Data Scientist or Machine Learning engineer knows all too well the nightmare that is managing dependencies. And well-tested code: do we really do that? The Data Scientist’s utopia lies in spending 90% of the time working with data and optimizing models, and the remaining 10% packaging code and deploying it to production, but this does not always work out in smaller teams and young startups.

Ours is one such startup, with an engineering team of fewer than 15 people. At this size, DevOps resources cannot be siloed. The machine learning team is a very small subset of engineering, and with our product depending heavily on front-end and back-end development, the team often gets little DevOps love. This is the story of how we used Bazel and created a machine learning dream team without a DevOps engineer.

Okay, But What is Bazel?

It wouldn’t be surprising if at this point you’re wondering what Bazel is. After all, in Machine Learning land the search trends favour the likes of PyTorch and TensorFlow (also part of our cool stack, by the way).

In a nutshell, Bazel is the open source version of Blaze, Google’s internal build automation tool. BUILD files are written in Starlark (a dialect of Python) and describe how to build projects in Python, .NET, C++, and much more. Since we primarily use Python, I’ll talk about Bazel’s py_* rules and how we leverage them within the SpotDraft codebase.

Everything in a BUILD file is either a file or a rule; collectively, these are called targets. A rule target specifies its source files and its dependencies, which can include other targets. Bazel has built-in rules for Python, the holy trinity of py_library, py_binary, and py_test, which look like this:

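py_library(
    name = "foo",
    srcs = ["foo.py"],
)

py_binary(
    name = "bar",
    srcs = ["bar.py"],
    # Other targets belong in deps, not in srcs.
    deps = [":foo"],
)

py_test(
    name = "test_foo",
    # srcs takes a list of labels, even for a single file.
    srcs = ["test_foo.py"],
    deps = [":foo"],
)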
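
To make these targets concrete, here is a minimal sketch of what the Python sources behind them might contain. The function name greet and the test case are hypothetical, chosen purely for illustration. Both files are shown in one block for brevity; in a real workspace foo.py and test_foo.py would be separate files, and the test would import greet from foo.

```python
# --- foo.py: the source of the py_library target ":foo" ---
# `greet` is a hypothetical function used only for illustration.
def greet(name: str) -> str:
    """Return a greeting for the given name."""
    return f"Hello, {name}!"


# --- test_foo.py: the source of the py_test target ":test_foo" ---
# In a separate file this would start with: from foo import greet
def check_greet_includes_name() -> None:
    assert greet("SpotDraft") == "Hello, SpotDraft!"


if __name__ == "__main__":
    # py_test runs the file as a program; an uncaught AssertionError
    # makes it exit non-zero, which Bazel reports as a failed test.
    check_greet_includes_name()
```

Because ":test_foo" lists ":foo" in deps, Bazel knows to re-run the test whenever foo.py changes.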

Ship Fast (And Never Write A Dockerfile Again!)

We haven’t written a Dockerfile since we integrated Bazel, yet all our microservices run as Docker containers. Bazel supports Docker through the rules_docker rule set, so creating and publishing Docker images takes just a few lines of py3_image, container_image, and container_push rules.

py3_image(
    name = "foo_base",
    srcs = ["foo.py"],
    main = "foo.py",
    srcs_version = "PY3ONLY",
    # other deps
    deps = [...],
)

container_image(
    name = "foo_image",
    # container_image layers on top of another image via base, not srcs.
    base = ":foo_base",
    ports = ["5000"],
)

container_push(
    name = "push_foo_image",
    format = "Docker",
    image = ":foo_image",
    registry = "<your-registry>",
    repository = "<your-repository>",
    tag = "latest",
)

This is just the beginning. In our opinion, Bazel really shines when it comes to the next use cases we discuss within the context of our large codebase.

Multiple Languages Within A Single Repository

We recently integrated most of our Python-based code with an upstream service written in C#, and naturally fell prey to the pitfalls. Any breaking change in the upstream service cost us several hours trying to figure out where the bug came from. Sure, we could have written integration tests from the get-go, but there are two problems with that: the added overhead of network requests and the latencies involved, and tracking down the latest staging version of the upstream service.

Why do that when you can easily bundle code from several different languages in the same repository? This is exactly what we did with the upstream service. The ease of defining dependencies in Bazel makes it a breeze to work with multiple languages. Guess who did not have to spend any more time trying to trace bugs!

Another advantage of this approach is that you never have to worry about setting up the environment to work with a different language because, as we have seen, Bazel can easily do that for you.

Make Cross-Cutting Changes With Confidence

No one likes writing code that’s already been written, and not reinventing the wheel is highly encouraged at SpotDraft. However, what happens when you change a piece of code written eons ago to fit your new requirements, and find a few sprints later that it breaks something else? You tested your code, so surely you couldn’t be in the wrong. Turns out, that’s not the case.

The importance of testing cross-cutting changes cannot be stressed enough, but how do you find out which parts of the larger codebase are affected by your change? This brings me to the last piece of our Bazel workflow: the rdeps query function.

The rdeps function computes reverse dependencies. Say, in our example above, you change something in //package1:foo and only test //package1:bar. However, there is also a //package2:baz that depends on //package1:foo, and your change introduces a bug there. Instead of searching for all references to package1.foo in your code, which is only as reliable as your editor’s search, it’s far simpler to run the command bazel query "rdeps(//..., //package1:foo)". The first argument is the universe of targets to search and the second is the target you changed. The output includes the changed target itself:

//package1:foo
//package1:bar
//package2:baz

And voila! Now you know all the tests you need to run.
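
To illustrate what a reverse dependency lookup does conceptually, here is a minimal sketch over a toy dependency graph. This is not how Bazel implements rdeps; the DEPS dictionary, the extra //package3:qux target, and the reverse_deps helper are assumptions made up for this example. As with Bazel's rdeps, the result includes the queried target itself.

```python
from collections import deque

# Hypothetical dependency graph: each target maps to its direct dependencies.
DEPS = {
    "//package1:foo": [],
    "//package1:bar": ["//package1:foo"],
    "//package2:baz": ["//package1:foo"],
    "//package3:qux": [],  # unrelated target, should not show up
}


def reverse_deps(deps, target):
    """Return the target plus everything that transitively depends on it."""
    # Invert the edges: for every dependency, record who depends on it.
    dependents = {}
    for name, direct in deps.items():
        for dep in direct:
            dependents.setdefault(dep, []).append(name)

    # Breadth-first walk over the inverted edges, starting at the target.
    seen = {target}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen)


print(reverse_deps(DEPS, "//package1:foo"))
# ['//package1:bar', '//package1:foo', '//package2:baz']
```

The unrelated //package3:qux never appears, which is exactly the point: only targets that can be affected by the change need to be rebuilt and retested.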

Putting it All Together - The Dream CI Pipeline

To recap, these are the steps involved in the Bazel workflow.

1. Write BUILD files, and update requirements as necessary.
2. Run bazel build or bazel run to ensure your code builds.
3. Get all the targets that depend on your change using bazel query with rdeps.
4. Finally, test all of those targets with bazel test.
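
The steps above can be sketched as a small Python driver. This is an illustrative sketch, not the CI script we actually run: the function names are hypothetical, it assumes bazel is on the PATH, and it uses the tests() query function to keep only the test targets among the reverse dependencies.

```python
import subprocess
from typing import List


def affected_tests_query(changed_target: str, universe: str = "//...") -> str:
    """Build a query expression for the test targets that depend on changed_target."""
    return f"tests(rdeps({universe}, {changed_target}))"


def bazel_build_command(target: str) -> List[str]:
    return ["bazel", "build", target]


def bazel_query_command(changed_target: str) -> List[str]:
    return ["bazel", "query", affected_tests_query(changed_target)]


def bazel_test_command(test_targets: List[str]) -> List[str]:
    return ["bazel", "test", *test_targets]


def run_pipeline(changed_target: str) -> int:
    """Build the changed target, find the affected tests, and run them.

    Returns the exit code of the last bazel invocation.
    """
    build = subprocess.run(bazel_build_command(changed_target))
    if build.returncode != 0:
        return build.returncode

    query = subprocess.run(
        bazel_query_command(changed_target),
        capture_output=True,
        text=True,
        check=True,
    )
    targets = [line.strip() for line in query.stdout.splitlines() if line.strip()]
    if not targets:
        return 0  # nothing depends on the change, so there is nothing to test

    return subprocess.run(bazel_test_command(targets)).returncode
```

Calling run_pipeline("//package1:foo") from a CI job would build the changed target and then run only the tests that can be affected by it, rather than the whole test suite.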

We took inspiration from this open source CI script provided in the Bazel repository to execute this flow in GitLab’s CI environment, which is what we use at SpotDraft.

Friendly Disclaimer: User Discretion Advised!

This wouldn’t be a true retelling of our Bazel story if I did not talk about the pains we felt along the way while adopting Bazel.

Build errors are hard to debug: The learning curve is steep, and the lack of widespread adoption within the tech community means limited search results on Stack Overflow. On a good day, well-tested code goes out fast and everyone’s happy; on a bad day, Bazel just refuses to work for you. However, once you get used to the debugging suggestions from Bazel itself, the ride becomes a little easier.

Bazel versioning is difficult: Everyone on our team was using a different version of Bazel and everything was fine, until we started using Protocol Buffers. Some of us could not compile these protobufs, and after two engineers spent a day debugging we realised the issue was version related. We then moved to Bazelisk, an open source wrapper that makes sure the right Bazel version is installed before running it.

Sandboxes add an extra layer of complexity: Bazel is a local build system. To make builds truly hermetic and reproducible across machines, the build process is sandboxed. This sandboxing is OS-specific, and debugging inside it is especially messy. This became glaringly obvious when a build failed on my machine (macOS) but succeeded on CI (Ubuntu base). Unfortunately there’s no easy workaround, and the debugging process can be taxing. The lesson is to try the build on a different machine when it refuses to work on yours.

We think setting up Bazel and learning the BUILD file way is a good initial investment with long-term returns. Which build tools does your team use? If you have been using Bazel, what pains have you felt? Reach out to us, we’d love to hear from you!
