Add ModelSEED-retrained dGPredictor as an additive reaction-energy source #264
Open
freiburgermsu wants to merge 1 commit into
…urce

Records the dGPredictor group-contribution model retrained on the ModelSEED compound structures as its own per-method entry, "dGPredictor-ModelSEED", in each reaction's `thermodynamics` dict. Purely additive: it sits next to the Group contribution / eQuilibrator / (original KEGG-based) dGPredictor records, and the original "dGPredictor" entry is left untouched. The canonical deltag / deltagerr / reversibility are not changed, and no .tsv or compound files change.

- New staged predictions: Biochemistry/Thermodynamics/dGPredictor/modelseed_retrained_dG.json (31,924 reactions, kJ/mol).
- New writer: Scripts/Thermodynamics/Update_Reaction_dGPredictor_ModelSEED_Energies.py (kJ->kcal /4.184; operator via reversibility_from_energy).
- 31,924 reactions gain a dGPredictor-ModelSEED record (incl. ~11,400 the original KEGG-based dGPredictor could not reach); 24,088 reactions unchanged.
- Verified: every modified reaction differs from dev ONLY by the added dGPredictor-ModelSEED key; added values equal dG_mean/4.184; the writer is idempotent.
- Docs: sources.yaml, Scripts/Thermodynamics/README.md, Rerun_Thermodynamics.sh.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
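The per-reaction step the writer performs can be sketched roughly as follows. It divides by 4.184 to convert kJ/mol to kcal/mol and attaches a direction operator. This is a minimal illustration under stated assumptions, not the script's actual code. `direction_from_energy` is a hypothetical stand-in for the shared `reversibility_from_energy()`, whose real rule is not described in this PR. The `'>'`/`'<'`/`'='` symbols are assumed to follow ModelSEED's reversibility convention.

```python
KJ_PER_KCAL = 4.184  # 1 kcal = 4.184 kJ (thermochemical calorie)
METHOD_KEY = "dGPredictor-ModelSEED"


def direction_from_energy(energy_kcal, error_kcal):
    """Hypothetical stand-in for the shared reversibility_from_energy().

    Assumed rule, NOT taken from the repository: '>' (forward) if the whole
    energy +/- error interval lies below zero, '<' (reverse) if it lies above
    zero, and '=' (reversible) otherwise.
    """
    if energy_kcal + error_kcal < 0:
        return ">"
    if energy_kcal - error_kcal > 0:
        return "<"
    return "="


def add_modelseed_records(reactions, predictions):
    """Write one [energy, error, operator] record (kcal/mol) per prediction.

    reactions:   {rxn_id: reaction_dict}
    predictions: {rxn_id: {"dG_mean": kJ/mol, "dG_uncer": kJ/mol}}
    Only thermodynamics[METHOD_KEY] is assigned; canonical fields such as
    deltag, deltagerr and reversibility are never touched.
    """
    for rxn_id, pred in predictions.items():
        reaction = reactions.get(rxn_id)
        if reaction is None:
            continue  # prediction for a reaction not in this database snapshot
        energy = pred["dG_mean"] / KJ_PER_KCAL
        error = pred["dG_uncer"] / KJ_PER_KCAL
        # Creates an empty 'thermodynamics' dict if the reaction has none yet.
        thermo = reaction.setdefault("thermodynamics", {})
        # Plain assignment: re-running with the same inputs yields the same data.
        thermo[METHOD_KEY] = [energy, error, direction_from_energy(energy, error)]
    return reactions
```

Each run assigns the same computed list to the same key. A second run therefore leaves the data unchanged, which is the idempotence property the commit message reports. Existing records such as `Group contribution` are never read or rewritten.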
Summary
Adds the ModelSEED-retrained dGPredictor as its own additive per-method reaction-energy source, `dGPredictor-ModelSEED`, alongside the existing `Group contribution` / `eQuilibrator` / `dGPredictor` records. The change is purely additive:

- the original KEGG-based `dGPredictor` record is left untouched;
- there are no changes to the canonical `deltag` / `deltagerr` / `reversibility`;
- no `.tsv` or compound files change (`thermodynamics` is a JSON-only field).

This continues the additive per-source philosophy of #263.

What this is
`dGPredictor` (Wang et al. 2021), as shipped under `Biochemistry/Thermodynamics/dGPredictor/`, was trained on KEGG compound structures. `dGPredictor-ModelSEED` is the same model retrained on the ModelSEED compound structures:

- every ModelSEED compound with a complete structure is re-decomposed into atom-centered fragments (radius 1 and 2), which expands the group vocabulary;
- the BayesianRidge model is refit on the same 4,001 experimental measurements, remapped into ModelSEED ID space.

It predicts dG for 31,924 reactions (pH 7, I = 0.25 M, 298.15 K). This includes ~11,400 reactions the original KEGG-based model could not reach because they involve compounds with no KEGG cross-reference.

Each reaction now carries both dGPredictor estimates side by side, e.g. `rxn00001`. New-coverage reactions (e.g. `rxn00013`) carry a `dGPredictor-ModelSEED` record where there is no original `dGPredictor` record, and their canonical `deltag` is still left untouched.

How
- `Biochemistry/Thermodynamics/dGPredictor/modelseed_retrained_dG.json`: `{rxn: {dG_mean, dG_uncer}}` in kJ/mol, 31,924 reactions.
- `Scripts/Thermodynamics/Update_Reaction_dGPredictor_ModelSEED_Energies.py`: stores `dGPredictor-ModelSEED [energy, error, operator]`, converting kJ to kcal by dividing by 4.184. The operator is this estimate's own thermodynamic direction, computed with the shared `reversibility_from_energy()`. The script is added to `Rerun_Thermodynamics.sh`.

Data changed
- 31,924 reactions gain a `dGPredictor-ModelSEED` record; 24,088 reactions are unchanged.
- Verified that every modified reaction differs from `dev` only by the added `dGPredictor-ModelSEED` key. The check used deep per-reaction JSON equality across all 56,012 reactions and found 0 other-field changes.
- Every added value equals `dG_mean/4.184`.
- Re-running the writer is idempotent (byte-identical output).
- There are zero changes to:
  - the canonical `deltag` / `deltagerr` / `reversibility` / `notes`;
  - the original `dGPredictor` record;
  - any other field;
  - any `.tsv`.

Docs updated in `Scripts/Thermodynamics/README.md` and `Biochemistry/Structures/sources.yaml`.

🤖 Generated with Claude Code