OffRAC: Offloading Through Remote Accelerator Calls

OffRAC lets clients invoke FPGA-hosted accelerators directly over the network with no host CPU involved. Incoming requests are reassembled on the wire and dispatched to virtualized compute units (CNN inference, logit, normalization, top-k selection, and echo), enabling FaaS-style multi-tenant acceleration at 100 Gbps with ~10.5 µs end-to-end latency.

Network Stack

This project uses the fpgasystems/Vitis_with_100Gbps_TCP-IP network stack.

Prerequisites

Xilinx Vitis 2020.1
Xilinx Vivado 2020.1
Target platform: xilinx_u280_xdma_201920_3
CMake ≥ 3.0

1. Configure and Build the TCP Stack IPs

Run once from the repository root to synthesize the network stack HLS IPs and install them into the IP repository:

mkdir build
cd build
cmake .. -DFDEV_NAME=u280 -DTCP_STACK_EN=1 -DTCP_STACK_RX_DDR_BYPASS_EN=1
make installip

This populates build/fpga-network-stack/iprepo/ with all required IP cores, including myproject_1 (the CNN IP generated with hls4ml).

2. Generate Xilinx IP XCI Files

The Xilinx IP cores used by offrac_krnl (AXI4-Stream FIFOs and Floating Point operators) cannot be distributed in this repository due to Xilinx licensing. Regenerate them with the provided script before building:

cd kernel/user_krnl/offrac_krnl/src/hdl/offrac
vivado -mode batch -source gen_ip.tcl

See kernel/user_krnl/offrac_krnl/src/hdl/offrac/README.md for the full list of IPs and their parameters.

3. Build the offrac Kernel

From the repository root, run a full hardware build:

make all TARGET=hw DEVICE=/opt/xilinx/platforms/xilinx_u280_xdma_201920_3/xilinx_u280_xdma_201920_3.xpfm USER_KRNL=offrac_krnl USER_KRNL_MODE=rtl NETH=4

This produces build_dir.hw.<xsa>/network.xclbin.

4. Recompile Host Code Only

To rebuild only the host executable without re-running synthesis:

make compile USER_KRNL=offrac_krnl USER_KRNL_MODE=rtl NETH=4

The host binary is written to host/host.

5. Program

Run the host binary with the path to the generated xclbin:

./host/host <path_to.xclbin>

Adding a Custom Accelerator

kernel/user_krnl/offrac_krnl/src/hdl/offrac/template_workload.v is a minimal template for integrating your own accelerator into OffRAC. Copy and rename it, then fill in the three marked sections:

Workload ID — assign a new ID constant in the module parameters (e.g. TEMPLATE = 16'b1000, workload byte 08 00). The gating logic at the top of the module forwards AXI-Stream beats here only when the 16-bit workload field in the request header matches that ID.
Packet parser (optional, commented out) — if your accelerator expects a different data width than the 512-bit network bus, insert a packet_parser stage to re-pack the beats before passing them to the compute core.
Accelerator instance (commented out) — instantiate your HLS or RTL core and connect its AXI-Stream in_r / out_r ports. The template uses a 15-cycle delay counter as a stand-in for this stage.

Once the module is complete, instantiate it inside pkt_logic.v.
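The relationship between the Verilog ID constant and the workload bytes a client sends can be checked in software. The sketch below is only a model, not part of the repository: `TEMPLATE_WORKLOAD_ID`, `workload_id_to_wire`, and `workload_matches` are hypothetical names, and the comparison merely mirrors the gating behavior described above.

```python
import struct

# Hypothetical constant mirroring the template's example ID (16'b1000).
TEMPLATE_WORKLOAD_ID = 0b1000


def workload_id_to_wire(workload_id: int) -> bytes:
    """Encode a 16-bit workload ID as the little-endian bytes sent on the wire."""
    return struct.pack("<H", workload_id)


def workload_matches(header_field: bytes, workload_id: int) -> bool:
    """Software model of the gating check: does the 2-byte workload field
    from the request header equal this module's ID?"""
    (received_id,) = struct.unpack("<H", header_field)
    return received_id == workload_id
```

With the template's example ID, `workload_id_to_wire(TEMPLATE_WORKLOAD_ID)` gives the bytes `08 00`, which is what a client would place in the workload field to reach the new module.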

Payload Format

Every invocation is a raw TCP byte stream. All multi-byte integers are little-endian.

Header (64 bytes, sent once per connection)

Offset	Size	Value	Description
0–55	56 B	`0xFF` × 56	Reserved / padding
56–59	4 B	`uint32`	Total message size in bytes (header + all data lines)
60–61	2 B	`0xFFFF`	Top-k configuration (K)
62–63	2 B	`uint16`	Workload ID (see table below)
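As an illustration of the layout in the table, the header can be packed with Python's `struct` module. This is a minimal sketch rather than code from the repository: `build_header` and `HEADER_SIZE` are hypothetical names, and the default `k_config=0xFFFF` simply copies the value shown in the table.

```python
import struct

HEADER_SIZE = 64  # bytes, per the table above


def build_header(total_size: int, workload_id: int, k_config: int = 0xFFFF) -> bytes:
    """Pack the 64-byte request header.

    Layout: 56 bytes of 0xFF padding, then a little-endian uint32 total
    size, a uint16 top-k configuration, and a uint16 workload ID.
    """
    header = b"\xff" * 56 + struct.pack("<IHH", total_size, k_config, workload_id)
    assert len(header) == HEADER_SIZE
    return header
```

The `<` prefix in the format string selects little-endian byte order without alignment padding, so the three fields occupy exactly bytes 56 to 63.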

Workload IDs

Bytes (hex, wire order)	Workload	Response size
`00 00`	Echo	Same as input data
`01 00`	Top-K	64 B for the single-line Quick Test input
`02 00`	CNN	64 B (for a 24 KB input)
`03 00`	Logit	Input data size minus 64 B
`05 00`	Normalization	Input data size minus 64 B
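When debugging a capture, it can help to decode a header back into its fields. The sketch below assumes the header layout described above; `WORKLOAD_NAMES` and `parse_header` are hypothetical helpers, and the Top-K entry (ID 1) is taken from the Quick Test section rather than from a confirmed list of IDs.

```python
import struct

# IDs as little-endian uint16 values. The Top-K entry comes from the
# Quick Test example (wire bytes 01 00).
WORKLOAD_NAMES = {
    0: "Echo",
    1: "Top-K",
    2: "CNN",
    3: "Logit",
    5: "Normalization",
}


def parse_header(header: bytes):
    """Return (total_size, k_config, workload_name) from a 64-byte header."""
    if len(header) != 64:
        raise ValueError("header must be exactly 64 bytes")
    # The three fields start at offset 56, after the 0xFF padding.
    total_size, k_config, workload_id = struct.unpack_from("<IHH", header, 56)
    return total_size, k_config, WORKLOAD_NAMES.get(workload_id, "unknown")
```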

Quick Test with `nc`

The example below uses nc to exercise OffRAC with a top-k invocation. Replace [FPGA_IP] and [PORT] with the board's IP address and the target port (2888 by default).

Top-K (workload bytes `01 00`)

Send one 64-byte data line of 16 four-byte integers (all set to 1 in the command below); receive 64 bytes of top-k results. The exchange can be monitored with Wireshark.

echo "ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff80000000ffff010001000000010000000100000001000000010000000100000001000000010000000100000001000000010000000100000001000000010000000100000001000000" | xxd -r -p | nc [FPGA_IP] [PORT]

Breakdown of the hex string:

ff×56          → 56 bytes padding
80 00 00 00    → total size = 128 bytes (LE uint32)
ff ff          → top-16 configuration
01 00          → workload ID: top-k
xx xx xx xx    → data value 1 (4-byte LE integer)
...            → data values 2 to 15
xx xx xx xx    → data value 16
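The same request can be assembled and sent from a script instead of `echo | xxd | nc`. The following is a sketch under stated assumptions: `build_topk_message` and `send_request` are hypothetical helpers, the data values are treated as unsigned 32-bit integers, and the layout of the 64-byte response is not specified here, so it is returned as raw bytes.

```python
import socket
import struct


def build_topk_message(values) -> bytes:
    """Build a 128-byte top-k request: 64-byte header plus one 64-byte data line."""
    data = struct.pack("<16I", *values)  # raises struct.error unless 16 values
    total_size = 64 + len(data)          # header + data, as in the breakdown
    header = b"\xff" * 56 + struct.pack("<IHH", total_size, 0xFFFF, 0x0001)
    return header + data


def send_request(host: str, port: int, message: bytes,
                 response_len: int = 64, timeout: float = 5.0) -> bytes:
    """Send one request over TCP and read up to response_len bytes back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(message)
        received = bytearray()
        while len(received) < response_len:
            chunk = sock.recv(response_len - len(received))
            if not chunk:  # peer closed the connection early
                break
            received.extend(chunk)
        return bytes(received)
```

For instance, `send_request("[FPGA_IP]", 2888, build_topk_message([1] * 16))` sends the same bytes as the `nc` command, with the placeholder replaced by the board's address.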

Citation

If you use this work, please cite:

@misc{yang2025offrac,
  title         = {OffRAC: Offloading Through Remote Accelerator Calls},
  author        = {Ziyi Yang and Krishnan B. Iyer and Yixi Chen and Ran Shu and Zsolt Istv{\'a}n and Marco Canini and Suhaib A. Fahmy},
  year          = {2025},
  eprint        = {2504.04404},
  archivePrefix = {arXiv},
  primaryClass  = {cs.NI},
  url           = {https://arxiv.org/abs/2504.04404}
}
