ZML is a production inference stack, purpose-built to decouple AI workloads from proprietary hardware.
Any model, any hardware, one codebase, peak performance.
Models compile directly for NVIDIA, AMD, TPU, and Trainium targets, so the same code reaches peak performance on any of these accelerators. No rewriting.
It is built using the Zig language, MLIR, and Bazel.
We use Bazel to build ZML and its dependencies. The only prerequisite is
Bazel itself, which we recommend installing through bazelisk, a version manager
for Bazel.
Install bazelisk (recommended). On macOS:
brew install bazelisk
On Linux:
curl -L -o /usr/local/bin/bazel 'https://github.com/bazelbuild/bazelisk/releases/download/v1.28.0/bazelisk-linux-amd64'
chmod +x /usr/local/bin/bazel
We have implemented a variety of example models in ZML. See our reference implementations in the examples folder.
MNIST is the classic handwritten digit recognition task. The model classifies
a handwritten digit that has been converted to a 28x28-pixel monochrome image.
Bazel will download a pre-trained model and the test dataset. The program will
load the model, compile it, and classify a randomly picked example from the
test dataset.
On the command line:
bazel run //examples/mnist
The Llama models are gated: access requires approval from Meta on Hugging Face, which can take a few hours to be granted.
Ensure you are authenticated with the Hugging Face CLI:
hf auth login
Alternatively, set the HF_TOKEN environment variable.
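For non-interactive environments, exporting the token directly can be more convenient; a minimal sketch, where hf_your_token_here is a placeholder for your own access token from the Hugging Face settings page:

```shell
# Placeholder token shown for illustration; substitute your own
# Hugging Face access token (they start with "hf_").
export HF_TOKEN=hf_your_token_here
```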
Now, you can run the model like so:
bazel run //examples/llm -- --model=hf://meta-llama/Llama-3.2-1B-Instruct --prompt="What is the capital of France?"
For a larger Llama 3.2 model, you can also try Llama-3.2-3B-Instruct. Like the 1B model above, it requires approval from Meta on Hugging Face.
bazel run //examples/llm -- --model=hf://meta-llama/Llama-3.1-8B-Instruct --prompt="What is the capital of France?"
You can also try Llama-3.1-70B-Instruct if you have enough memory.
You can compile models for accelerator runtimes by appending one or more of the following arguments when compiling or running a model:
- NVIDIA CUDA: --@zml//platforms:cuda=true
- AMD ROCm: --@zml//platforms:rocm=true
- Google TPU: --@zml//platforms:tpu=true
- AWS Trainium/Inferentia 2: --@zml//platforms:neuron=true
- Disable CPU: --@zml//platforms:cpu=false
The latter, disabling compilation for CPU, cuts down compilation time.
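If you always target the same accelerator, you can avoid retyping these flags by putting them in a .bazelrc file in your workspace; a sketch, assuming an NVIDIA-only setup:

```
# .bazelrc: always build for CUDA and skip the CPU target.
build --@zml//platforms:cuda=true
build --@zml//platforms:cpu=false
```

Bazel reads .bazelrc automatically on every invocation, so subsequent bazel run commands pick these flags up without further arguments.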
For example, to run the Llama 3.1 8B model from above on a host with an NVIDIA GPU:
bazel run //examples/llm --@zml//platforms:cuda=true -- --model=hf://meta-llama/Llama-3.1-8B-Instruct
And on a host with an AMD GPU:
bazel run //examples/llm --@zml//platforms:rocm=true -- --model=hf://meta-llama/Llama-3.1-8B-Instruct
Same goes for all supported platforms.
To run ZML's unit tests:
bazel test //zml:test
For a taste of the API, here is the MNIST model implementation from the examples folder:

const Mnist = struct {
    fc1: Layer,
    fc2: Layer,

    const Layer = struct {
        weight: zml.Tensor,
        bias: zml.Tensor,

        pub fn init(store: zml.io.TensorStore.View) Layer {
            return .{
                .weight = store.createTensor("weight", .{ .d_out, .d }, null),
                .bias = store.createTensor("bias", .{.d_out}, null),
            };
        }

        pub fn forward(self: Layer, input: zml.Tensor) zml.Tensor {
            return self.weight.dot(input, .d).add(self.bias).relu().withTags(.{.d});
        }
    };

    pub fn init(store: zml.io.TensorStore.View) Mnist {
        return .{
            .fc1 = .init(store.withPrefix("fc1")),
            .fc2 = .init(store.withPrefix("fc2")),
        };
    }

    pub fn load(
        self: *const Mnist,
        allocator: std.mem.Allocator,
        io: std.Io,
        platform: *const zml.Platform,
        store: *const zml.io.TensorStore,
        shardings: []const zml.sharding.Sharding,
    ) !zml.Bufferized(Mnist) {
        return zml.io.load(Mnist, self, allocator, io, platform, store, .{
            .shardings = shardings,
            .parallelism = 1,
            .dma_chunks = 1,
            .dma_chunk_size = 16 * 1024 * 1024,
        });
    }

    pub fn unloadBuffers(self: *zml.Bufferized(Mnist)) void {
        self.fc1.weight.deinit();
        self.fc1.bias.deinit();
        self.fc2.weight.deinit();
        self.fc2.bias.deinit();
    }

    /// Just two linear layers with a ReLU activation.
    pub fn forward(self: Mnist, input: zml.Tensor) zml.Tensor {
        var x = input.flatten().convert(.f32).withTags(.{.d});
        const layers: []const Layer = &.{ self.fc1, self.fc2 };
        for (layers) |layer| {
            x = layer.forward(x);
        }
        return x.argMax(0).indices.convert(.u8);
    }
};

You might want to check out more examples, or read through the documentation directly on GitHub.
ZML is licensed under the Apache 2.0 license.