PDF extraction that checks its own work. #2 reading order accuracy — zero AI, zero GPU, zero cost.
Updated Jun 24, 2026 - Python
Extract structured data from local or remote LLMs
A schema-driven framework for LLM structured extraction enhanced by multi-stage RL training (SFT→DPO→GRPO), with interpretable reward design and end-to-end reproducibility.
Auditable LLM extraction for Java: structured output with source citations, PDF bounding boxes, confidence, provenance, and audit JSON.
Claude Code Skill for structured information extraction from code/docs/logs. 6-step Python pipeline (source grounding, dedup, confidence scoring, entity resolution, relation inference, KG injection). Zero dependencies, no API keys. Replaces LangExtract.
Collection of purpose-built MCP servers for AI agent workflows.
A simple LLM library
Send your low-confidence document extractions. A human reviews them against the PDF and returns a typed Pydantic/Zod response. Managed document verification for AI agents. PDF + handwritten OCR. Client-side fragmentation: full document never leaves your machine. $0.80/page + $5 free credit. Express 30-min SLA. Built on open source awaithumans.
news-summizr extracts structured summaries from headlines, labeling key points such as announcement, products, and region for quick insight.
A package for structured, reliable extraction of key insights from user-provided texts about cultural topics. It accepts a text input, such as an article or discussion prompt
Turn tutorial videos into structured specs — Pine Script, recipes, code walkthroughs
Automated research paper analysis: PDF → JSON with evidence extraction using LLMs (DeepSeek, Gemma). Extracts methods, results, datasets, and claims with precise evidence grounding.
Structured CV extraction with strict JSON schema and anti-hallucination guarantees.
Schema-driven evaluation for LLM JSON extraction: JSON evaluation, structured extraction, benchmark.
Local-first eval harness for unstructured-document extraction — compare LLMs, OCR/IDP tools, and strategies (cascade/ensemble/verify) on the same cohort.
Automated prompt optimization using mentor-agent architecture. Generate and refine prompts from labeled data.
Parses SEC EDGAR Form 10-K annual reports into standardized JSON, automatically identifying the content and status of every Item
AI-powered travel agency assistant: a stateful LangGraph agent on Telegram that captures preferences through natural conversation, generates personalized itineraries via Groq/Llama 3.3, auto-manages leads in Excel, and remembers returning users. Built with LangChain, FastAPI, and python-telegram-bot.
This repository implements a fully offline local SLM benchmark using Ollama, FastAPI, and Instructor. It runs three local models on your machine, evaluates structured JSON extraction quality, and compares inference latency and success across models.
ReAct-based intelligent analysis Agent with 4-layer architecture (Skill-Agent-LLMService-Tool), dual tool-calling modes (Native FC / Prompt-based), triple execution engine (Offline/Fast/Agent), incremental reflection with convergence detection, Skill template system, SSE streaming, Prometheus monitoring, and SFT trajectory export.