Pull requests: huggingface/tokenizers
Pull requests list
BPE trainer: inherit continuing_subword_prefix/end_of_word_suffix from the model
#2108
opened Jun 13, 2026 by
discobot
Fix Unigram trainer prune loss to use per-piece alternative count
#2106
opened Jun 13, 2026 by
NahButch
Return an error instead of panicking on out-of-range BPE merges
#2104
opened Jun 12, 2026 by
NahButch
ci: cache HF test data and authenticate Hub downloads to avoid rate limits
#2102
opened Jun 12, 2026 by
SBrandeis
Contributor
Fix empty Encoding.overflowing when truncation is enabled
#2098
opened Jun 12, 2026 by
discobot
chore(deps-dev): bump shell-quote from 1.8.3 to 1.8.4 in /tokenizers/examples/unstable_wasm/www
dependencies
Pull requests that update a dependency file
javascript
Pull requests that update Javascript code
#2096
opened Jun 10, 2026 by
dependabot
Bot
chore(deps): bump shell-quote from 1.8.1 to 1.8.4 in /bindings/node
dependencies
Pull requests that update a dependency file
javascript
Pull requests that update Javascript code
#2093
opened Jun 9, 2026 by
dependabot
Bot
Reduce BPE merge update allocations and add file progress logs
#2092
opened Jun 8, 2026 by
voidful
fix(typing): correct encode() input typing (PreTokenizedInputSequence tuple + stub Any)
#2089
opened Jun 7, 2026 by
Anai-Guo
fix(bpe): widen pair_counts to i64 + add overflow regression test (#2058)
#2087
opened Jun 6, 2026 by
pjdurden
ByteLevel: single-pass byte-level transform (apply_byte_map)
#2086
opened Jun 5, 2026 by
dmatth1
Avoid walking sparse vocab holes during serialization
#2085
opened Jun 4, 2026 by
dfgvaetyj3456356-hash
Validate BPE prefix merges without unchecked UTF-8
#2082
opened Jun 1, 2026 by
dfgvaetyj3456356-hash
Complete incomplete Viterbi lattice tests
#2081
opened Jun 1, 2026 by
eunseo9311
Contributor
Add group capture support for Replace normalizer in Rust and Python
#2080
opened May 30, 2026 by
ander-db
Batch encode: coarsen rayon tasks with with_min_len
#2077
opened May 28, 2026 by
sebpop
Contributor
bindings & bench: use mimalloc as global allocator on tested targets
#2073
opened May 26, 2026 by
sebpop
Contributor
chore: enable Dependabot weekly GitHub Actions bumps
dependabot
#2071
opened May 26, 2026 by
hf-dependantbot-rollout
Bot
Fix Unigram trainer prune loss to use per-piece alternative count
#2070
opened May 24, 2026 by
hunter-heidenreich
chore(deps): bump qs and express in /tokenizers/examples/unstable_wasm/www
dependencies
Pull requests that update a dependency file
javascript
Pull requests that update Javascript code
#2067
opened May 22, 2026 by
dependabot
Bot