structured-extraction

Here are 42 public repositories matching this topic...

NameetP / pdfmux

PDF extraction that checks its own work. #2 reading order accuracy — zero AI, zero GPU, zero cost.

python pdf ocr mcp self-healing structured-extraction rag pdf-to-json pdf-extraction ai-agent llm document-parsing pdf-to-markdown docling opendataloader

Updated Jun 24, 2026
Python

jndiogo / sibila

Extract structured data from local or remote LLM models

python ai openai structured-data gpt structured-extraction dataclasses local-models pydantic large-language-models llamacpp llm-inference local-ai gguf structured-generation

Updated Jun 21, 2024
Python

xinyuran / RLRefine

A schema-driven framework for LLM structured extraction enhanced by multi-stage RL training (SFT→DPO→GRPO), with interpretable reward design and end-to-end reproducibility.

nlp reinforcement-learning chinese-nlp lora structured-extraction rlhf reward-model qwen grpo

Updated May 9, 2026
Python

doctruthhq / DocTruth

Auditable LLM extraction for Java: structured output with source citations, PDF bounding boxes, confidence, provenance, and audit JSON.

Updated Jun 21, 2026
Java

sputnicyoji / Structured-Extractor

Claude Code Skill for structured information extraction from code/docs/logs. 6-step Python pipeline (source grounding, dedup, confidence scoring, entity resolution, relation inference, KG injection). Zero dependencies, no API keys. Replaces LangExtract.

python nlp knowledge-graph structured-extraction claude-code post-processing-pipeline

Updated Feb 9, 2026
Python

adi2355 / MCP-Server-Collection

Collection of purpose-built MCP servers for AI agent workflows.

python typescript mcp web-scraping data-extraction jsonpath ai-agents structured-extraction llm deepseek firecrawl model-context-protocol mcp-server codebase-analysis agent-workflows

Updated Apr 7, 2026
HTML

vikyw89 / llmtext

A simple LLM library

python agent async asynchronous openai gpt structured-extraction tool-use instructor large-language-models llm chatgpt prompt-optimization agentic-ai

Updated May 19, 2026
Python

awaithumans / awaitverify-managed-document-verification-pdf-ocr-extraction

Send your low-confidence document extractions. A human reviews them against the PDF and returns a typed Pydantic/Zod response. Managed document verification for AI agents. PDF + handwritten OCR. Client-side fragmentation: full document never leaves your machine. $0.80/page + $5 free credit. Express 30-min SLA. Built on open source awaithumans.

Updated Jun 2, 2026

chigwell / news-summizr

news-summizr extracts structured summaries from headlines, labeling key points like announcement, products, region for quick insight.

pattern-matching data-analysis structured-extraction reporting-tools news-summary key-information-extraction workflow-integration headline-analysis retry-mechanisms reliable-output concise-summarization labeled-summaries

Updated Dec 21, 2025
Python

chigwell / summaryxtract

A package designed to facilitate structured, reliable extraction of key insights from user-provided texts about cultural topics. It accepts a text input, such as an article or discussion prompt

Updated Dec 21, 2025
Python

vigneshc94 / tutorscribe

Turn tutorial videos into structured specs — Pine Script, recipes, code walkthroughs

tutorial video whisper claude tradingview structured-extraction pine-script anthropic

Updated Apr 28, 2026
Python

BhaveshBytess / Research-Paper-Analyzer

Automated research paper analysis: PDF → JSON with evidence extraction using LLMs (DeepSeek, Gemma). Extracts methods, results, datasets, and claims with precise evidence grounding.

nlp machine-learning pdf-parsing scientific-papers structured-extraction academic-research evidence-extraction streamlit-app llm research-paper-analysis

Updated Nov 14, 2025
Python

Mathos34 / cv-extract-json

Structured CV extraction with strict JSON schema and anti-hallucination guarantees.

nlp transformers pytorch information-extraction ner structured-extraction pydantic

Updated Jun 16, 2026
Python

FAIRmat-NFDI / extract-eval

schema-driven evaluation for LLM JSON extraction, json evaluation, structured-extraction, benchmark

benchmark structured-extraction schema-driven-evaluation-for-llm-json-extraction extracted-json-evaluation

Updated Jun 11, 2026
Python

jonmoubayed / ezpz-evals

Local-first eval harness for unstructured-document extraction — compare LLMs, OCR/IDP tools, and strategies (cascade/ensemble/verify) on the same cohort.

python benchmark ocr information-extraction gemini openai idp structured-extraction document-extraction llm anthropic evals ollama llm-evaluation

Updated Jun 26, 2026
Python

ademakdogan / prompt-optimizer

Automated prompt optimization using mentor-agent architecture. Generate and refine prompts from labeled data.

openai structured-extraction ai-automation llm prompt-engineering prompt-versioning prompt-optimization llm-optimization

Updated Feb 2, 2026
Python

LLMSystems / SEC-10-K-Structured-Extraction

Parses SEC EDGAR Form 10-K annual reports into standardized JSON, automatically identifying the content and status of every Item

information-extraction financial-data xbrl sec-edgar annual-reports structured-extraction rule-based-parsing form-10k

Updated Jun 10, 2026
Python

Adarsh-Menon / Zarik-Travel-Agency-Support

AI-powered travel agency assistant: a LangGraph stateful agent on Telegram that captures preferences through natural conversation, generates personalized itineraries via Groq/Llama 3.3, auto-manages leads in Excel, and remembers returning users. Built with LangChain, FastAPI, and python-telegram-bot.

Updated May 1, 2026
Python

Aaron-Garvin / Local-SLM-Ollama

This repository implements a fully offline local SLM benchmark using Ollama, FastAPI, and Instructor. It runs three local models on your machine, evaluates structured JSON extraction quality, and compares inference latency and success across models.

benchmark mistral structured-extraction fastapi ollama llama3 local-slm

Updated Jun 16, 2026
Python

xinyuran / review-extract-agent

ReAct-based intelligent analysis Agent with 4-layer architecture (Skill-Agent-LLMService-Tool), dual tool-calling modes (Native FC / Prompt-based), triple execution engine (Offline/Fast/Agent), incremental reflection with convergence detection, Skill template system, SSE streaming, Prometheus monitoring, and SFT trajectory export.

sentiment-analysis chinese-nlp keyword-extraction structured-extraction self-reflection pydantic fastapi rlhf vllm llm-agents react-agent tool-calling

Updated May 28, 2026
Python
