Bump stanza from 1.10.1 to 1.12.2 by dependabot[bot] · Pull Request #4887 · HyphaApp/hypha

dependabot · 2026-06-20T15:28:23Z

Bumps stanza from 1.10.1 to 1.12.2.

Release notes

v1.12.2 - weights_only security fix

This is the same as v1.12.1, with the following important update: all models now enforce weights_only=True when loading. This may make some old models unloadable (all models distributed with Stanza are already patched). If that happens, please load and then resave your model with an older version of Stanza, such as v1.12.1.

Security / PyTorch compatibility

Enforce weights_only=True when loading the lemma classifier, addressing part of the security advisory GHSA-v5jw-96jm-7h2c. This should already be the default in later versions of PyTorch, but is now explicitly enforced. #1584

The code path that allowed weights_only=False has been removed for all models. Attempting to load a legacy model now throws an exception instead. If that happens, please load your model with an older version (such as 1.12.1) and resave it before proceeding. #1587
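The idea behind weights_only=True can be illustrated with a standard-library analogy. The sketch below is not PyTorch's or Stanza's code. It follows the "restricting globals" pattern from the Python pickle documentation, and the allowlist, the function names, and the sample payloads are all assumptions made for illustration. It shows why a checkpoint containing only plain containers loads, while one that references an arbitrary Python class is rejected.

```python
import datetime
import io
import pickle
from collections import OrderedDict

# Hypothetical allowlist for this sketch only; PyTorch's real allowlist differs.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Resolve a global only if it was explicitly allowlisted.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is not allowed")


def restricted_loads(data: bytes):
    """Unpickle bytes while refusing any global outside ALLOWED_GLOBALS."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


def try_load(data: bytes):
    """Return the loaded object, or a 'rejected: ...' message on refusal."""
    try:
        return restricted_loads(data)
    except pickle.UnpicklingError as err:
        return f"rejected: {err}"


# A "weights only" payload: containers and numbers, nothing else.
safe_payload = pickle.dumps(OrderedDict(weight=[0.1, 0.2]))

# A "legacy" payload: it also stores an object of a class not on the allowlist.
legacy_payload = pickle.dumps({"saved_on": datetime.date(2026, 6, 20)})

print(try_load(safe_payload))
print(try_load(legacy_payload))
```

Under this analogy, the fix for a rejected file matches the advice in the release notes: load it once with a loader that still accepts it, then save it again in a form that contains only allowlisted data.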

Tokenizer improvements

Add control characters to the set of characters treated as whitespace when tokenizing, fixing a bug where certain Unicode control characters (such as "region end" markers) were incorrectly attached to words. #1573 Addresses #1257

Add tokenizer augmentation that occasionally replaces commas with en-dashes or em-dashes, so that models trained on datasets that lack those characters learn to treat them similarly to commas. #1573

Add regression tests for Spanish tokenization errors reported in #1257 and tests for the whitespace/control-character handling and tokenizer augmentations. #1573
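The two tokenizer changes above can be sketched in a few lines of Python. This is a minimal illustration, not Stanza's implementation: the function names are hypothetical, the use of Unicode category Cc to detect control characters is an assumption, and the real augmentation may handle spacing and sampling differently.

```python
import random
import unicodedata

EN_DASH = "\u2013"
EM_DASH = "\u2014"


def is_separator(ch: str) -> bool:
    # Assumption: "control characters" means Unicode general category Cc.
    return ch.isspace() or unicodedata.category(ch) == "Cc"


def split_tokens(text: str) -> list[str]:
    """Split on whitespace and control characters, dropping the separators."""
    tokens, current = [], []
    for ch in text:
        if is_separator(ch):
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def augment_commas(text: str, prob: float, rng: random.Random) -> str:
    """Replace each comma with an en dash or em dash with probability prob."""
    out = []
    for ch in text:
        if ch == "," and rng.random() < prob:
            out.append(rng.choice((EN_DASH, EM_DASH)))
        else:
            out.append(ch)
    return "".join(out)


# U+0087 is a C1 control character, used here purely as a sample input.
print(split_tokens("hola\x87mundo adios"))
print(augment_commas("uno, dos, tres", 0.5, random.Random(0)))
```

Passing a seeded random.Random keeps the augmentation reproducible, which is convenient when writing regression tests around it.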

Lemmatizer improvements

Enforce weights_only=True when loading the lemma classifier, avoiding a possible security risk. #1584

The lemma classifier for ja_gsd is now also attached to ja_combined. #1584

Train and attach two lemma classifiers to en_combined — both 's and her can be reliably classified from the available data. #1584

Add end-to-end unit tests for run_lemma.py, including training a lemmatizer and attaching multiple lemma classifiers. #1586

Spanish model improvements

Add a silver dataset covering como_VERB in Spanish to the combined Spanish training data, addressing #1440. Also adds a utility to print a confusion matrix of tagging results filtered by a word regex (e.g. --upos_word_regex "^(?i:como)$"), making it easier to isolate the effects of annotation changes. #1579 stanfordnlp/handparsed-treebank@d0c29a3

Add silver training sentences covering unknown Spanish VERB lemmas to the combined Spanish lemmatizer, addressing #1255. Also includes a script to check lemmatizer results for a batch of word/POS combinations. #1580 stanfordnlp/handparsed-treebank@11327ef
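A word-filtered confusion matrix like the one described for --upos_word_regex can be approximated as follows. This is a sketch under stated assumptions, not the Stanza utility itself: the function name, the (word, gold, predicted) row format, and the toy rows are invented for illustration. Only the regex "^(?i:como)$" comes from the release notes.

```python
import re
from collections import Counter


def filtered_confusion(rows, word_regex):
    """Count (gold, predicted) tag pairs for words matching word_regex.

    rows is an iterable of (word, gold_upos, predicted_upos) tuples,
    a hypothetical format chosen for this sketch.
    """
    pattern = re.compile(word_regex)
    counts = Counter()
    for word, gold, pred in rows:
        if pattern.search(word):
            counts[(gold, pred)] += 1
    return counts


# Toy rows, made up for illustration; they are not real tagger output.
toy_rows = [
    ("Como", "VERB", "SCONJ"),
    ("como", "VERB", "VERB"),
    ("como", "SCONJ", "SCONJ"),
    ("comer", "VERB", "VERB"),
]

matrix = filtered_confusion(toy_rows, r"^(?i:como)$")
for (gold, pred), n in sorted(matrix.items()):
    print(f"gold={gold} pred={pred} count={n}")
```

The scoped flag (?i:...) makes only the enclosed group case-insensitive, so "Como" and "como" both match while "comer" is excluded by the anchors.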

Italian model improvements

Rebuild Italian models with additional training data to fix incorrect lemmatization of common words, including "violino" (previously mapped to "violare") #1563 stanfordnlp/handparsed-treebank@9c46db1 and "diversi" (previously split and mapped to "dire"), which was resolved by retraining with the more accurate models. #1564

English model improvements

The long-standing issue of "can" being tagged as a modal verb (MD) rather than a noun (NN) in noun phrases like "trash can" and "soda can" is now resolved with the combined English models. #408

New / updated models

Odia (Oriya) now uses the ODTB package as the default. Mixed POS and depparse training data is constructed from the Odia dataset combined with related Indic languages present in MuRIL-Large, following the approach used for Sindhi. The Odia NER model is now also connected to the default package. #1583

Demo and visualization improvements

Rewrite stanza-parseviewer.js to use a proper constituency parse visualizer instead of a repurposed dependency parse visualizer, fixing the broken vertical striping. Also adds a table of morphological features to the visualization. #1581 Addresses #1358

Various small improvements to the web demo: route all responses to /; templatize stanza-brat.html so the version number is sourced from _version.py; move the logo to the demo directory for easier serving; add favicon support to the pipeline demo; guard against empty POST requests. #1582

... (truncated)

Commits

8e745c8 Turns out the correct lemma test was actually kind of useless, since the tag ...
afe1c4f this should also always be right with the lemma_classifier - there's is extre...
73797ab Minor tweak to the lemma classifier pipeline test - verify that it is getting...
966e591 Push the download of the lemma_classifier from the unit test into the setup s...
1e01937 Fix a doc bug - the PretrainedWordVocab is the part that normalizes spaces
d345df1 Document the location of languages.html
890095d Add a bunch more languages from https://universaldependencies.org/languages.h...
5d8a514 Update the short_name_to_treebank module (using build...)
ced3db9 Nenets uses yrk, not nrk, in the UD datasets
1685f02 Add several langcodes from UD 2.18 that aren't already known by Stanza
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the Security Alerts page.

Bumps [stanza](https://github.com/stanfordnlp/stanza) from 1.10.1 to 1.12.2.
- [Release notes](https://github.com/stanfordnlp/stanza/releases)
- [Commits](stanfordnlp/stanza@v1.10.1...v1.12.2)

---
updated-dependencies:
- dependency-name: stanza
  dependency-version: 1.12.2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

dependabot bot added the dependencies (Pull requests that update a dependency file) and python:uv (Pull requests that update python:uv code) labels on Jun 20, 2026

dependabot[bot] wants to merge 1 commit into main from dependabot/uv/stanza-1.12.2
