zml/tokenizer: add IREE tokenizer #416

Draft
Finistere wants to merge 7 commits into zml/v2 from brabier/iree-tokenizer

@Finistere
Contributor

No description provided.

@Finistere Finistere changed the title from "add IREE to zml" to "zml/tokenizer: add IREE tokenizer" on Mar 18, 2026
@Finistere Finistere force-pushed the brabier/iree-tokenizer branch 6 times, most recently from 0859bb9 to cf3c839 on March 19, 2026 16:42
Finistere and others added 7 commits March 24, 2026 18:16
Use a tensor parallelism strategy. Throughput goes from 86 tok/s to 119 tok/s for Qwen
3.5 on 2x5090.
```
4cpus: 
    total: 2m56s
    ci-cpu: 2m14s
    ci-neuron: 2m07s
    ci-tpu: 1m58s
8cpus:
    total: 2m26s
    ci-cpu: 1m37s
    ci-neuron: 1m47s
    ci-tpu: 1m36s
16cpus (m8i-flex):
    total: 2m07s
    ci-cpu: 1m24s
    ci-neuron: 1m26s
    ci-tpu: 1m19s
16cpus (c8i-flex):
    total: 1m58s
    ci-cpu: 1m17s
    ci-neuron: 1m15s
    ci-tpu: 1m17s
32cpus:
    total: 1m48s
    ci-cpu: 1m05s
    ci-neuron: 1m06s
    ci-tpu: 1m02s
```
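
To make the totals above easier to compare, here is a minimal sketch (not part of this PR) that converts each `total` to seconds and computes the speedup relative to the 4-CPU runner. The names `TOTALS`, `to_seconds` and `speedups` are made up for this illustration, and the dictionary keys are shorthand labels for the rows above.

```python
import re

# "total" wall-clock times copied from the timings above.
# Keys are shorthand labels chosen for this sketch.
TOTALS = {
    "4cpus": "2m56s",
    "8cpus": "2m26s",
    "16cpus, first entry": "2m07s",
    "16cpus, c8i-flex": "1m58s",
    "32cpus": "1m48s",
}


def to_seconds(text: str) -> int:
    """Parse a duration such as '2m56s' into a number of seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)


def speedups(totals: dict[str, str], baseline: str = "4cpus") -> dict[str, float]:
    """Return baseline_time / time for every entry (1.0 for the baseline)."""
    base = to_seconds(totals[baseline])
    return {name: base / to_seconds(value) for name, value in totals.items()}


if __name__ == "__main__":
    for name, ratio in speedups(TOTALS).items():
        print(f"{name:<22} {to_seconds(TOTALS[name]):>4}s  {ratio:.2f}x")
```

With these numbers, going from 4 to 32 CPUs takes the total from 176 s to 108 s, roughly a 1.6x speedup for 8x the cores.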

RunsOn tends to prefer m8i instances (memory-optimized) over c8i
ones for some reason, so I forced the family with `c8+c7`. Maybe we could
just use `c8`, but I'm not sure it's worth being that selective: RunsOn
selects the best there is.
@Finistere Finistere force-pushed the brabier/iree-tokenizer branch from cf3c839 to e554054 on March 25, 2026 14:56
3 participants