accl-kaust/offrac

OffRAC: Offloading Through Remote Accelerator Calls

OffRAC lets clients invoke FPGA-hosted accelerators directly over the network with no host CPU involved. Incoming requests are reassembled on the wire and dispatched to virtualized compute units (CNN inference, logit, normalization, top-k selection, and echo), enabling FaaS-style multi-tenant acceleration at 100 Gbps with ~10.5 µs end-to-end latency.

Network Stack

This project uses the fpgasystems/Vitis_with_100Gbps_TCP-IP network stack.

Prerequisites

  • Xilinx Vitis 2020.1
  • Xilinx Vivado 2020.1
  • Target platform: xilinx_u280_xdma_201920_3
  • CMake ≥ 3.0

1. Configure and Build the TCP Stack IPs

Run once from the repository root to synthesize the network stack HLS IPs and install them into the IP repository:

mkdir build
cd build
cmake .. -DFDEV_NAME=u280 -DTCP_STACK_EN=1 -DTCP_STACK_RX_DDR_BYPASS_EN=1
make installip

This populates build/fpga-network-stack/iprepo/ with all required IP cores, including myproject_1 (the CNN IP generated with HLS4ML).

2. Generate Xilinx IP XCI Files

The Xilinx IP cores used by offrac_krnl (AXI4-Stream FIFOs and Floating Point operators) cannot be distributed in this repository due to Xilinx licensing. Regenerate them with the provided script before building:

cd kernel/user_krnl/offrac_krnl/src/hdl/offrac
vivado -mode batch -source gen_ip.tcl

See kernel/user_krnl/offrac_krnl/src/hdl/offrac/README.md for the full list of IPs and their parameters.

3. Build the offrac Kernel

From the repository root, run a full hardware build:

make all TARGET=hw DEVICE=/opt/xilinx/platforms/xilinx_u280_xdma_201920_3/xilinx_u280_xdma_201920_3.xpfm USER_KRNL=offrac_krnl USER_KRNL_MODE=rtl NETH=4

This produces build_dir.hw.<xsa>/network.xclbin.

4. Recompile Host Code Only

To rebuild only the host executable without re-running synthesis:

make compile USER_KRNL=offrac_krnl USER_KRNL_MODE=rtl NETH=4

The host binary is written to host/host.

5. Program

To program the FPGA, run the host executable with the path to the generated xclbin:

./host/host <path_to.xclbin>

Adding a Custom Accelerator

kernel/user_krnl/offrac_krnl/src/hdl/offrac/template_workload.v is a minimal template for integrating your own accelerator into OffRAC. Copy and rename it, then fill in the three marked sections:

  1. Workload ID: assign a new ID constant in the module parameters (e.g. TEMPLATE = 16'b1000, which appears on the wire as workload bytes 08 00). The gating logic at the top of the module forwards AXI-Stream beats to the accelerator only when the 16-bit workload field in the request header matches that ID.

  2. Packet parser (optional, commented out): if your accelerator expects a data width other than the 512-bit network bus, insert a packet_parser stage to re-pack the beats before passing them to the compute core.

  3. Accelerator instance (commented out): instantiate your HLS or RTL core and connect its AXI-Stream in_r / out_r ports. The template uses a 15-cycle delay counter as a stand-in for this stage.

Once the module is complete, instantiate it inside pkt_logic.v.
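
When picking a new workload ID, it can help to check which two bytes a client has to send for it. The Python sketch below is illustrative only and is not part of the repository; the helper name workload_wire_bytes is hypothetical. It relies on the little-endian rule stated under Payload Format.

```python
def workload_wire_bytes(workload_id: int) -> bytes:
    """Return the 2-byte little-endian encoding of a 16-bit workload ID.

    Hypothetical helper for illustration; not part of the OffRAC sources.
    """
    if not 0 <= workload_id <= 0xFFFF:
        raise ValueError("workload ID must fit in 16 bits")
    return workload_id.to_bytes(2, "little")


# 16'b1000 in the Verilog template is the integer 8.
print(workload_wire_bytes(0b1000).hex(" "))
```

For the template value 16'b1000 this prints 08 00, matching the wire bytes quoted in step 1.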


Payload Format

Every invocation is a raw TCP byte stream. All multi-byte integers are little-endian.

Header (64 bytes, sent once per connection)

Offset   Size   Value       Description
0–55     56 B   0xFF × 56   Reserved / padding
56–59    4 B    uint32      Total message size in bytes (header + all data lines)
60–61    2 B    0xFFFF      Top-k configuration, K
62–63    2 B    uint16      Workload ID (see table below)
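
The header layout above can be assembled with Python's struct module. This is a minimal sketch, not code from the repository: the names build_header, HEADER_SIZE, and PADDING are hypothetical, and the default top-k field of 0xFFFF is simply the value listed in the table.

```python
import struct

HEADER_SIZE = 64            # bytes, per the table above
PADDING = b"\xff" * 56      # offsets 0-55


def build_header(data_len: int, workload_id: int, topk_config: int = 0xFFFF) -> bytes:
    """Build the 64-byte header for a request carrying data_len bytes of data.

    Hypothetical helper; the field order follows the header table:
    uint32 total size, 2-byte top-k field, uint16 workload ID, all little-endian.
    """
    total_size = HEADER_SIZE + data_len
    # "<" selects little-endian with no alignment padding: 4 + 2 + 2 = 8 bytes.
    return PADDING + struct.pack("<IHH", total_size, topk_config, workload_id)


header = build_header(data_len=64, workload_id=0x0000)  # echo, one 64-byte data line
print(len(header), header[56:].hex(" "))
```

The last eight bytes printed are the size, top-k, and workload fields in wire order, which makes the result easy to compare against a packet capture.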

Workload IDs

Bytes (hex, wire order)   Workload        Response size
00 00                     Echo            Same as input data
01 00                     Top-K           64 B in the nc example below
02 00                     CNN             64 B (for a 24 KB input)
03 00                     Logit           Input data size - 64 B
05 00                     Normalization   Input data size - 64 B
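
When inspecting captured traffic, the header fields can be decoded by reversing the layout. The sketch below is an assumption-laden illustration, not repository code: parse_header and WORKLOADS are hypothetical names, and the Top-K entry (01 00) is taken from the nc example in this README.

```python
import struct

# Workload IDs as integers; on the wire they are little-endian (0x0002 -> 02 00).
WORKLOADS = {
    0x0000: "Echo",
    0x0001: "Top-K",
    0x0002: "CNN",
    0x0003: "Logit",
    0x0005: "Normalization",
}


def parse_header(header: bytes) -> dict:
    """Decode the size, top-k, and workload fields of a 64-byte header.

    Hypothetical helper for illustration only.
    """
    if len(header) < 64:
        raise ValueError("header must be at least 64 bytes")
    total_size, topk_config, workload_id = struct.unpack_from("<IHH", header, 56)
    return {
        "total_size": total_size,
        "data_size": total_size - 64,
        "topk_config": topk_config,
        "workload": WORKLOADS.get(workload_id, f"unknown (0x{workload_id:04x})"),
    }


example = b"\xff" * 56 + bytes.fromhex("80000000ffff0100")
print(parse_header(example))
```

Unknown IDs are reported rather than rejected, so the same helper still works after a custom accelerator with a new ID is added.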

Quick Test with nc

The example below uses nc to test OffRAC with a top-k invocation. Replace [FPGA_IP] and [PORT] with the board's IP address and the target port (2888 by default).

Top-K (workload bytes 01 00)

Send one line of 16 four-byte integers (each set to 1 in the command below) and receive 64 bytes of top-k results. The exchange can be monitored with Wireshark.

echo "ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff80000000ffff010001000000010000000100000001000000010000000100000001000000010000000100000001000000010000000100000001000000010000000100000001000000" | xxd -r -p | nc [FPGA_IP] [PORT]

Breakdown of the hex string:

ff × 56        → 56 bytes of padding
80 00 00 00    → total size = 128 bytes (LE uint32)
ff ff          → top-16 configuration
01 00          → workload ID: top-k
xx xx xx xx    → value 1
...
xx xx xx xx    → value 16
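
The same request can be built and sent from Python instead of nc. This is a hedged sketch rather than a client shipped with the repository: build_topk_request and send_request are hypothetical names, the values are assumed to be unsigned 32-bit integers, and the 64-byte response length is taken from the description above.

```python
import socket
import struct

HEADER_SIZE = 64
TOPK_WORKLOAD_ID = 0x0001   # wire bytes 01 00
TOPK_CONFIG = 0xFFFF        # value used in the nc example


def build_topk_request(values):
    """Pack the header followed by the values as little-endian uint32 (assumed)."""
    data = struct.pack(f"<{len(values)}I", *values)
    header = b"\xff" * 56 + struct.pack(
        "<IHH", HEADER_SIZE + len(data), TOPK_CONFIG, TOPK_WORKLOAD_ID
    )
    return header + data


def send_request(host, port, payload, response_len=64, timeout=5.0):
    """Send payload over TCP and read up to response_len bytes of reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        chunks, remaining = [], response_len
        while remaining > 0:
            chunk = sock.recv(remaining)
            if not chunk:       # peer closed the connection early
                break
            chunks.append(chunk)
            remaining -= len(chunk)
    return b"".join(chunks)


request = build_topk_request([1] * 16)
print(len(request), request[56:64].hex(" "))
```

With a board reachable on the network, a call such as send_request(FPGA_IP, 2888, request), where FPGA_IP is a string holding the board's address, would return the raw response bytes. The block itself only builds the request, so it can be run without hardware to check the byte layout.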

Citation

If you use this work, please cite:

@misc{yang2025offrac,
  title         = {OffRAC: Offloading Through Remote Accelerator Calls},
  author        = {Ziyi Yang and Krishnan B. Iyer and Yixi Chen and Ran Shu and Zsolt Istv{\'a}n and Marco Canini and Suhaib A. Fahmy},
  year          = {2025},
  eprint        = {2504.04404},
  archivePrefix = {arXiv},
  primaryClass  = {cs.NI},
  url           = {https://arxiv.org/abs/2504.04404}
}
