OffRAC lets clients invoke FPGA-hosted accelerators directly over the network with no host CPU involved. Incoming requests are reassembled on the wire and dispatched to virtualized compute units (CNN inference, logit, normalization, top-k selection, and echo), enabling FaaS-style multi-tenant acceleration at 100 Gbps with ~10.5 µs end-to-end latency.
This project uses the fpgasystems/Vitis_with_100Gbps_TCP-IP network stack.
- Xilinx Vitis 2020.1
- Xilinx Vivado 2020.1
- Target platform: xilinx_u280_xdma_201920_3
- CMake ≥ 3.0
Run once from the repository root to synthesize the network stack HLS IPs and install them into the IP repository:
mkdir build
cd build
cmake .. -DFDEV_NAME=u280 -DTCP_STACK_EN=1 -DTCP_STACK_RX_DDR_BYPASS_EN=1
make installip

This populates build/fpga-network-stack/iprepo/ with all required IP cores, including myproject_1 (the CNN IP generated with hls4ml).
The Xilinx IP cores used by offrac_krnl (AXI4-Stream FIFOs and Floating Point operators)
cannot be distributed in this repository due to Xilinx licensing. Regenerate them with the
provided script before building:
cd kernel/user_krnl/offrac_krnl/src/hdl/offrac
vivado -mode batch -source gen_ip.tcl

See kernel/user_krnl/offrac_krnl/src/hdl/offrac/README.md for the full list of IPs and their parameters.
From the repository root, run a full hardware build:
make all TARGET=hw DEVICE=/opt/xilinx/platforms/xilinx_u280_xdma_201920_3/xilinx_u280_xdma_201920_3.xpfm USER_KRNL=offrac_krnl USER_KRNL_MODE=rtl NETH=4

This produces build_dir.hw.<xsa>/network.xclbin.
To rebuild only the host executable without re-running synthesis:
make compile USER_KRNL=offrac_krnl USER_KRNL_MODE=rtl NETH=4

The host binary is written to host/host.
./host/host <path_to.xclbin>

kernel/user_krnl/offrac_krnl/src/hdl/offrac/template_workload.v is a minimal template for integrating your own accelerator into OffRAC. Copy and rename it, then fill in the three marked sections:

- Workload ID: assign a new ID constant in the module parameters (e.g. TEMPLATE = 16'b1000, workload bytes 08 00). The gating logic at the top of the module forwards AXI-Stream beats to the module only when the 16-bit workload field in the request header matches that ID.
- Packet parser (optional, commented out): if your accelerator expects a different data width than the 512-bit network bus, insert a packet_parser stage to re-pack the beats before passing them to the compute core.
- Accelerator instance (commented out): instantiate your HLS or RTL core and connect its AXI-Stream in_r/out_r ports. The template uses a 15-cycle delay counter as a stand-in for this stage.

Once the module is complete, instantiate it inside pkt_logic.v.
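
The mapping from a workload ID constant to its wire bytes (16'b1000 becomes 08 00) follows from the little-endian encoding of the 16-bit field. A minimal Python sketch of that conversion is shown here; the helper name workload_id_bytes is hypothetical and not part of the repository.

```python
def workload_id_bytes(workload_id: int) -> bytes:
    """Encode a 16-bit workload ID as it appears on the wire (little-endian)."""
    if not 0 <= workload_id <= 0xFFFF:
        raise ValueError("workload ID must fit in 16 bits")
    return workload_id.to_bytes(2, "little")


# 16'b1000 in the Verilog template is the integer 8.
print(workload_id_bytes(0b1000).hex(" "))  # 08 00
```

This can help when choosing a new ID: the two bytes printed are what a client has to place at offsets 62–63 of the request header.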
Every invocation is a raw TCP byte stream. All multi-byte integers are little-endian.

| Offset | Size | Value | Description |
|---|---|---|---|
| 0–55 | 56 B | 0xFF × 56 | Reserved / padding |
| 56–59 | 4 B | uint32 | Total message size in bytes (header + all data lines) |
| 60–61 | 2 B | 0xFFFF | Top-k configuration, K |
| 62–63 | 2 B | uint16 | Workload ID (see table below) |


| Bytes (hex, wire order) | Workload | Response size |
|---|---|---|
| 00 00 | Echo | same as input data |
| 02 00 | CNN | 64 B (for a 24 KB input) |
| 03 00 | Logit | input data - 64 |
| 05 00 | Normalization | input data - 64 |

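
The 64-byte header layout above can be packed and unpacked with Python's struct module. The following is a minimal sketch, not code from the repository: build_header and parse_header are hypothetical helper names, and the example assumes a single 64-byte data line follows the header.

```python
import struct

HEADER_SIZE = 64
# 56 padding bytes, uint32 total size, uint16 top-k field, uint16 workload ID,
# all little-endian with no alignment padding ("<").
HEADER_FMT = "<56sIHH"


def build_header(data_len: int, workload_id: int, topk: int = 0xFFFF) -> bytes:
    """Pack a request header; total size covers the header plus all data bytes."""
    total = HEADER_SIZE + data_len
    return struct.pack(HEADER_FMT, b"\xff" * 56, total, topk, workload_id)


def parse_header(header: bytes) -> dict:
    """Unpack the three meaningful fields from the first 64 bytes."""
    _pad, total, topk, workload_id = struct.unpack(HEADER_FMT, header[:HEADER_SIZE])
    return {"total_size": total, "topk": topk, "workload_id": workload_id}


# Assumed example: a Logit request (ID 03 00) carrying 64 bytes of data.
hdr = build_header(data_len=64, workload_id=0x0003)
print(len(hdr), hdr[56:].hex(" "))  # 64 80 00 00 00 ff ff 03 00
print(parse_header(hdr))
```

The explicit "<" prefix matters here: without it, struct would use native alignment and byte order, which may not match the wire format.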
The example below uses nc to test OffRAC with a top-k invocation.
Replace <FPGA_IP> and <PORT> with the board's IP and the target port (2888 by default).
Send one line of 16 four-byte integers (each set to 1 in the command below) and receive 64 bytes of top-k results. The exchange can be monitored with Wireshark.
echo "ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff80000000ffff010001000000010000000100000001000000010000000100000001000000010000000100000001000000010000000100000001000000010000000100000001000000" | xxd -r -p | nc <FPGA_IP> <PORT>

Breakdown of the hex string:
ff×56 → 56 bytes padding
80 00 00 00 → total size = 128 bytes (LE uint32)
ff ff → top-16 configuration
01 00 → workload ID: top-k
xx xx xx xx → value 1 (first 4-byte LE integer)
xx xx xx xx → value 16 (last 4-byte LE integer)
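
The same request can be assembled and sent from a script instead of a hand-written hex string. The sketch below is illustrative only: build_topk_request, recv_exact, and invoke_topk are hypothetical names, and decoding the 64-byte response as 16 little-endian uint32 values is an assumption rather than something the protocol description states.

```python
import socket
import struct


def build_topk_request(values, k_field=0xFFFF):
    """64-byte header (workload ID 01 00, top-k) followed by uint32 LE values."""
    data = struct.pack(f"<{len(values)}I", *values)
    header = b"\xff" * 56 + struct.pack("<IHH", 64 + len(data), k_field, 0x0001)
    return header + data


def recv_exact(sock, n):
    """Read exactly n bytes; TCP may deliver the response in several chunks."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full response arrived")
        buf.extend(chunk)
    return bytes(buf)


def invoke_topk(host, port, values, response_size=64):
    """Send one top-k request and return the response as uint32 values (assumed)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(build_topk_request(values))
        raw = recv_exact(sock, response_size)
    return struct.unpack(f"<{response_size // 4}I", raw)


if __name__ == "__main__":
    req = build_topk_request([1] * 16)
    print(len(req), req[56:64].hex(" "))  # 128 80 00 00 00 ff ff 01 00
    # Requires a programmed board; substitute the real address first:
    # print(invoke_topk("<FPGA_IP>", 2888, [1] * 16))
```

Reading with recv_exact rather than a single recv call avoids truncated results when the response arrives in more than one TCP segment.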
If you use this work, please cite:
@misc{yang2025offrac,
title = {OffRAC: Offloading Through Remote Accelerator Calls},
author = {Ziyi Yang and Krishnan B. Iyer and Yixi Chen and Ran Shu and Zsolt Istv{\'a}n and Marco Canini and Suhaib A. Fahmy},
year = {2025},
eprint = {2504.04404},
archivePrefix = {arXiv},
primaryClass = {cs.NI},
url = {https://arxiv.org/abs/2504.04404}
}