kroggen/qwen3.5-c
Qwen3.5 in C

Inference of Qwen3.5 models in pure C, for learning purposes

No PyTorch required. Safetensors loading is handled by safetensors-cpp

Inspired by llama2.c and mamba.c

For those interested in (or who learn best by) seeing the actual operations on the weights and state at a lower level

Qwen 3.5 combines multi-head attention and linear attention (GatedDeltaNet) layers

For fast inference, use other implementations such as qwen3.5-triton

Fast Start

pip install huggingface_hub transformers
python prepare.py Qwen/Qwen3.5-0.8B   # download + create tokenizer
make fast
./qwen35 Qwen3.5-0.8B

If more than one model with the same name exists in the local cache, pass the full name of the model:

./qwen35 Qwen/Qwen3.5-0.8B

Or pass the path to the folder containing the model:

./qwen35 ./Qwen3.5-0.8B

Models

Use Qwen3.5 dense models from Qwen's Hugging Face organization, or finetunes of them

Examples:

  • Qwen/Qwen3.5-0.8B
  • Qwen/Qwen3.5-2B
  • Qwen/Qwen3.5-4B
  • Qwen/Qwen3.5-9B

Many of these repos on the Hub are vision-language models; prepare.py uses text_config when present and exports only the text transformer.

Not supported: MoE checkpoints (e.g. Qwen3.5-35B-A3B, 122B-A10B, 397B-A17B), FP8 / GPTQ / other non-float weight formats.

Build

make          # reference
make fast     # -Ofast -march=native
make omp      # OpenMP (set OMP_NUM_THREADS when running)
make debug
make clean

Run

./qwen35 <model> [options]

# <model> can be a model name (after prepare.py) or a local directory
./qwen35 Qwen3.5-0.8B -i "Hello!"
./qwen35 ./Qwen3.5-0.8B -y "You are a helpful assistant."

License

WTFPL (Do What The Fuck You Want To Public License)
