A local-first research pipeline that retrieves academic papers from multiple scholarly APIs, ranks and clusters them with embeddings, synthesizes cross-paper insights, and exports reports in several formats. Uses Ollama by default for fully local LLM inference, with optional OpenAI and Anthropic providers.
Built with Python 3.13, pydantic-ai, sentence-transformers, and async I/O.
Documentation: https://ndevu12.github.io/Research_Assistant_Model/ (architecture, configuration, API, operations, and known issues).
- Multi-stage pipeline: query understanding → expansion → retrieval → deduplication → ranking → clustering → synthesis → gap analysis → citation export → report generation
- Local-first LLM: Ollama with resource-aware model auto-selection (`llama3.1:8b` or `llama3.2:3b`)
- Cloud LLM support: OpenAI and Anthropic via the `src/models/` provider abstraction
- Multi-source retrieval: OpenAlex, Semantic Scholar (arXiv, CrossRef, and others configurable)
- Embedding-backed stages: sentence-transformers (`bge-small-en-v1.5`) for dedup, ranking, and clustering
- Structured output: Pydantic models throughout; JSON, Markdown, HTML, and print-ready PDF (HTML)
- Citation export: BibTeX, APA, MLA, Chicago
- Session memory: optional SQLite-backed interactive sessions
- Auto-setup: installs dependencies, Ollama, and pulls the configured local model on first run
- Python 3.13+
- Pipenv for dependency management
- Internet connection (API retrieval; optional for fully offline LLM after model download)
Local LLM RAM (approximate):
| Model | RAM | Disk |
|---|---|---|
| `llama3.2:3b` | 4–6 GB | ~2.5 GB |
| `llama3.1:8b` | 8–10 GB | ~5 GB |
Cloud providers require only an API key; no Ollama install is needed.
pip install pipenv
pipenv install
cp .env.example .env # optional; edit as needed
pipenv run python -m src "transformer attention mechanisms"

On first run with the default Ollama provider, the assistant will:
- Check Python and embedding dependencies
- Install or start Ollama if needed
- Resolve your target model (`auto`, env override, or config)
- Pull the model if it is not already installed
- Run the research pipeline
Use Pipenv for all commands (pipenv run python -m src). Running plain python -m src may miss dependencies such as sentence-transformers.
While a query runs, the CLI streams live progress to stderr: pipeline stage checkmarks, sub-activities (e.g. "Analyzing paper 2/5"), and AI token previews during LLM calls. Disable with --no-progress or RA_PIPELINE__STREAM_PROGRESS=false.
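The decision of whether to stream progress can be sketched as below. This is a minimal illustration, not the project's code: the function name `progress_enabled`, its parameters, and the set of accepted "false" spellings are assumptions. Only the `--no-progress` flag, the `RA_PIPELINE__STREAM_PROGRESS` variable, and the TTY-only behavior come from this README.

```python
import os
import sys
from typing import Mapping, Optional

# Assumed spellings that count as "disabled"; the README only shows "false".
_FALSE_VALUES = {"0", "false", "no", "off"}


def progress_enabled(
    no_progress_flag: bool,
    env: Optional[Mapping[str, str]] = None,
    is_tty: Optional[bool] = None,
) -> bool:
    """Return True when live progress should be written to stderr (hypothetical helper)."""
    env = os.environ if env is None else env
    if is_tty is None:
        is_tty = sys.stderr.isatty()
    # The CLI flag and a non-interactive stderr both switch streaming off.
    if no_progress_flag or not is_tty:
        return False
    raw = env.get("RA_PIPELINE__STREAM_PROGRESS", "true")
    return raw.strip().lower() not in _FALSE_VALUES
```

Passing `env` and `is_tty` explicitly keeps the check easy to test; in real use both would fall back to the process environment and `sys.stderr`.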
# Interactive mode
pipenv run python -m src
# Single query (markdown output)
pipenv run python -m src "your research query"
# HTML report saved to file
pipenv run python -m src --format html -o reports/report.html "your query"
# Print-ready PDF (HTML: open in browser → Print → Save as PDF)
pipenv run python -m src --format pdf -o reports/report.pdf.html "your query"
# JSON output with citation exports
pipenv run python -m src --format json --export bibtex,apa "your query"
# Session memory in batch mode
pipenv run python -m src --session "your query"

pipenv run python -m setups.health_check
pipenv run python -m setups.manager # auto-select model
pipenv run python -m setups.manager --model llama3.1:8b

See setups/README.md for setup details.
Configuration is layered (highest precedence first):
- Environment variables (`RA_*` prefix, nested with `__`)
- Project `.env` file
- YAML files in `config/` (`default.yaml`, `models.yaml`, `ranking.yaml`, `providers.yaml`)
- Code defaults
Copy .env.example to .env to get started.
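The layering described above can be sketched in a few lines. This is a simplified model under stated assumptions, not the project's loader (which, given the `pydantic-settings` dependency, likely works differently): the helper names `nest_env`, `deep_merge`, and `resolve_settings` are hypothetical, and values are left as strings rather than being type-validated.

```python
from typing import Any, Mapping


def nest_env(env: Mapping[str, str], prefix: str = "RA_") -> dict[str, Any]:
    """Map RA_LLM__MODEL=auto to {"llm": {"model": "auto"}} (hypothetical helper)."""
    nested: dict[str, Any] = {}
    for key, value in env.items():
        if not key.startswith(prefix):
            continue  # ignore unrelated variables such as PATH
        parts = key[len(prefix):].lower().split("__")
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


def deep_merge(base: Mapping[str, Any], override: Mapping[str, Any]) -> dict[str, Any]:
    """Return base updated with override, merging nested dicts key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_settings(defaults, yaml_cfg, dotenv, shell_env) -> dict[str, Any]:
    """Apply layers from lowest to highest precedence."""
    settings = deep_merge(defaults, yaml_cfg)            # YAML over code defaults
    settings = deep_merge(settings, nest_env(dotenv))    # .env over YAML
    return deep_merge(settings, nest_env(shell_env))     # shell env wins
```

For example, with `RA_LLM__MODEL=llama3.1:8b` in `.env` and `RA_LLM__PROVIDER=openai` exported in the shell, the resolved `llm` section takes the provider from the shell and the model from `.env`, while any untouched keys keep their YAML or default values.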
| Variable | Required | Description |
|---|---|---|
| `S2_API_KEY` | No | Semantic Scholar API key (higher rate limits) |
| `RA_CROSSREF_MAILTO` | If CrossRef enabled | Email for CrossRef polite pool |
| `CROSSREF_MAILTO` | If CrossRef enabled | Alias for CrossRef mailto |
All providers use the RA_LLM__* namespace. API keys can also be set via provider-specific env vars (see below) or the unified RA_LLM__API_KEY.
| Variable | Default | Description |
|---|---|---|
| `RA_LLM__PROVIDER` | `ollama` | `ollama`, `openai`, or `anthropic` |
| `RA_LLM__MODEL` | `auto` | Model name, or `auto` for resource-based selection (Ollama only) |
| `RA_LLM__BASE_URL` | provider-specific | API base URL |
| `RA_LLM__API_KEY` | - | Unified API key override for any provider |
| `RA_LLM__TEMPERATURE` | `0.2` | Sampling temperature |
| `RA_LLM__TIMEOUT_SECONDS` | `120` | Request timeout |
| Variable | Default | Description |
|---|---|---|
| `RA_LLM__PROVIDER` | `ollama` | Use local Ollama server |
| `RA_LLM__MODEL` | `auto` | `auto`, `llama3.1:8b`, `llama3.2:3b`, etc. (see `config/ollama_models.yaml`) |
| `RA_LLM__BASE_URL` | `http://localhost:11434/v1` | Ollama OpenAI-compatible endpoint |
| `RA_LLM__API_KEY` | `ollama` | Placeholder key (Ollama ignores it) |
| `OLLAMA_API_KEY` | - | Alternative to `RA_LLM__API_KEY` |
Model selection: Set RA_LLM__MODEL=auto to pick the best model for your RAM/disk. Override with a specific model name in .env (e.g. RA_LLM__MODEL=llama3.1:8b). Supported models are listed in config/ollama_models.yaml. Setup pulls missing models automatically on startup.
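As an illustration of resource-based selection, the sketch below picks the largest model that fits the available RAM and disk. The thresholds are assumptions taken from the lower bounds of the RAM/disk table earlier in this README; the real rules live in `config/ollama_models.yaml` and may differ. `ModelSpec` and `pick_model` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelSpec:
    name: str
    min_ram_gb: float  # assumed minimum RAM (lower bound of the documented range)
    disk_gb: float     # approximate download size


# Ordered from most to least capable, so the first fit is the best fit.
CANDIDATES: tuple[ModelSpec, ...] = (
    ModelSpec("llama3.1:8b", min_ram_gb=8.0, disk_gb=5.0),
    ModelSpec("llama3.2:3b", min_ram_gb=4.0, disk_gb=2.5),
)


def pick_model(
    ram_gb: float,
    free_disk_gb: float,
    candidates: Sequence[ModelSpec] = CANDIDATES,
) -> Optional[str]:
    """Return the first candidate that fits, or None if nothing fits."""
    for spec in candidates:
        if ram_gb >= spec.min_ram_gb and free_disk_gb >= spec.disk_gb:
            return spec.name
    return None
```

Under these assumed thresholds, a machine with 16 GB of RAM but only 3 GB of free disk would fall back to `llama3.2:3b`, since the 8B model does not fit on disk.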
| Variable | Required | Description |
|---|---|---|
| `RA_LLM__PROVIDER` | Yes | Set to `openai` |
| `RA_LLM__MODEL` | Yes | e.g. `gpt-4o-mini` |
| `OPENAI_API_KEY` | Yes* | OpenAI API key |
| `RA_LLM__API_KEY` | Yes* | Alternative unified key |
| `RA_LLM__BASE_URL` | No | Custom endpoint (defaults to `https://api.openai.com/v1`; use for LM Studio and other OpenAI-compatible servers) |
* One of OPENAI_API_KEY or RA_LLM__API_KEY is required.
RA_LLM__PROVIDER=openai
RA_LLM__MODEL=gpt-4o-mini
OPENAI_API_KEY=sk-...
RA_SYNTHESIS__LLM_ENABLED=true

| Variable | Required | Description |
|---|---|---|
| `RA_LLM__PROVIDER` | Yes | Set to `anthropic` |
| `RA_LLM__MODEL` | Yes | e.g. `claude-3-5-haiku-latest` |
| `ANTHROPIC_API_KEY` | Yes* | Anthropic API key |
| `RA_LLM__API_KEY` | Yes* | Alternative unified key |
| `RA_LLM__BASE_URL` | No | Custom Anthropic-compatible endpoint |
* One of ANTHROPIC_API_KEY or RA_LLM__API_KEY is required.
RA_LLM__PROVIDER=anthropic
RA_LLM__MODEL=claude-3-5-haiku-latest
ANTHROPIC_API_KEY=sk-ant-...
RA_SYNTHESIS__LLM_ENABLED=true

| Variable | Default | Description |
|---|---|---|
| `RA_SYNTHESIS__LLM_ENABLED` | `false` | Enable LLM-based synthesis (recommended for 8B+ local or cloud models) |
| `RA_RANKING__TOP_K` | `25` | Papers kept after ranking |
| `RA_PIPELINE__DEBUG` | `false` | Verbose pipeline logging |
| `RA_PIPELINE__STREAM_PROGRESS` | `true` | Live stage/LLM progress on stderr (TTY only) |
| `RA_DEBUG` | - | Alias for debug mode (`1`, `true`, `yes`) |
| `RA_CONFIG_DIR` | - | Override path to `config/` directory |
Provider implementations live in src/models/ (ollama.py, openai.py, anthropic.py). Each resolves API keys via RA_LLM__API_KEY first, then the provider-specific env var.
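The key lookup order can be sketched as follows. This is an illustration only: `resolve_api_key` and `PROVIDER_KEY_VARS` are hypothetical names, and the fallback to the `ollama` placeholder is an assumption based on the default shown in the Ollama table. The actual logic lives in the provider modules under `src/models/`.

```python
import os
from typing import Mapping, Optional

# Provider-specific variables named in this README.
PROVIDER_KEY_VARS = {
    "ollama": "OLLAMA_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def resolve_api_key(provider: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key for a provider: unified key first, then provider-specific."""
    env = os.environ if env is None else env
    unified = env.get("RA_LLM__API_KEY")
    if unified:
        return unified
    specific = env.get(PROVIDER_KEY_VARS[provider])
    if specific:
        return specific
    if provider == "ollama":
        return "ollama"  # assumed placeholder; Ollama ignores the key
    raise ValueError(
        f"No API key for {provider!r}: set RA_LLM__API_KEY or {PROVIDER_KEY_VARS[provider]}"
    )
```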
Research_Assistant_Model/
├── config/                     # YAML configuration (merged at runtime)
│   ├── default.yaml            # Base settings
│   ├── models.yaml             # LLM provider overrides
│   ├── ollama_models.yaml      # Supported local models + RAM/disk requirements
│   ├── ranking.yaml            # Ranking weights and top-k
│   └── providers.yaml          # Retrieval provider toggles
│
├── src/                        # Application source
│   ├── __main__.py             # CLI entry point (`python -m src`)
│   ├── config/                 # AppSettings, model auto-selection
│   ├── core/                   # Pipeline engine, stage recovery, metrics
│   ├── retrieval/              # API clients, providers, deduplication
│   │   ├── providers/          # OpenAlex, Semantic Scholar, arXiv, …
│   │   ├── orchestrator.py     # Pipeline facade
│   │   └── models.py           # RetrievedPaper, ResearchReport, …
│   ├── research/               # Query expansion, ranking, clustering
│   ├── analysis/               # Synthesis, gap analysis
│   ├── embeddings/             # sentence-transformers + disk cache
│   ├── models/                 # LLM providers
│   │   ├── ollama.py           # Local Ollama (default)
│   │   ├── openai.py           # OpenAI / compatible endpoints
│   │   ├── anthropic.py        # Anthropic Claude
│   │   └── factory.py          # AgentFactory + provider registry
│   ├── reporting/              # Markdown, HTML, JSON renderers
│   ├── export/                 # BibTeX, APA, MLA, Chicago
│   ├── memory/                 # SQLite session store
│   └── utils/                  # Logging, retry, response handling
│
├── setups/                     # Install & health-check scripts
│   ├── manager.py              # Full setup orchestrator
│   ├── ollama.py               # Ollama install + model pull
│   └── health_check.py         # Validate deps, Ollama, model
│
├── tests/                      # Unit and integration tests
├── reports/                    # Generated reports (gitignored)
├── data/                       # Embeddings cache, SQLite DB (gitignored)
├── logs/                       # Structured logs (gitignored)
├── .env                        # Local secrets (gitignored; see .env.example)
├── Pipfile / Pipfile.lock
└── README.md
┌─────────────────────────────────────────────────────────────────────────────┐
│                              User Query / CLI                               │
└──────────────────────────────────────┬──────────────────────────────────────┘
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                       QUERY UNDERSTANDING & EXPANSION                       │
│   Parse intent · generate search variants · optional LLM query expansion    │
└──────────────────────────────────────┬──────────────────────────────────────┘
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                          PARALLEL PAPER RETRIEVAL                           │
│     ┌────────────┐  ┌──────────────────┐  ┌─────────┐  ┌──────────┐         │
│     │  OpenAlex  │  │ Semantic Scholar │  │  arXiv  │  │ CrossRef │  …      │
│     └────────────┘  └──────────────────┘  └─────────┘  └──────────┘         │
└──────────────────────────────────────┬──────────────────────────────────────┘
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│               EMBEDDING-BACKED PROCESSING (bge-small-en-v1.5)               │
│          Deduplication → Ranking → Relevance Scoring → Clustering           │
└──────────────────────────────────────┬──────────────────────────────────────┘
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                               ANALYSIS LAYER                                │
│        Synthesis (heuristic or LLM) → Gap Analysis → Citation Export        │
└──────────────────────────────────────┬──────────────────────────────────────┘
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              REPORT GENERATION                              │
│           Markdown · JSON · HTML · PDF-ready HTML · BibTeX/APA/…            │
└─────────────────────────────────────────────────────────────────────────────┘
The analysis stages call one backend selected via RA_LLM__PROVIDER:
                  ┌──────────────────────────────────────┐
                  │        src/models/ (factory)         │
                  │  AgentFactory · pydantic-ai agents   │
                  └──────────────────┬───────────────────┘
                                     │
           ┌─────────────────────────┼─────────────────────────┐
           │                         │                         │
           ▼                         ▼                         ▼
┌────────────────────┐    ┌────────────────────┐    ┌────────────────────┐
│       Ollama       │    │       OpenAI       │    │     Anthropic      │
│  (local default)   │    │  gpt-4o-mini, …    │    │   claude-3-5-…     │
│                    │    │                    │    │                    │
│ RA_LLM__MODEL=auto │    │ OPENAI_API_KEY     │    │ ANTHROPIC_API_KEY  │
│ ollama_models.yaml │    │ RA_LLM__API_KEY    │    │ RA_LLM__API_KEY    │
└────────────────────┘    └────────────────────┘    └────────────────────┘
When RA_LLM__PROVIDER=ollama, startup runs this automatically if anything is missing:
pipenv run python -m src
          │
          ▼
┌───────────────────┐     no      ┌────────────────────────────┐
│ Ollama installed? │────────────►│ Install Ollama (setups/)   │
└─────────┬─────────┘             └─────────────┬──────────────┘
          │ yes                                 │
          ▼                                     ▼
┌───────────────────┐     no      ┌────────────────────────────┐
│ Ollama running?   │────────────►│ Start ollama serve         │
└─────────┬─────────┘             └─────────────┬──────────────┘
          │ yes                                 │
          ▼                                     ▼
┌───────────────────┐     no      ┌────────────────────────────┐
│ Model installed?  │────────────►│ ollama pull <resolved>     │
│ (from .env/auto)  │             │ e.g. llama3.1:8b / 3b      │
└─────────┬─────────┘             └─────────────┬──────────────┘
          │ yes                                 │
          └──────────────────┬──────────────────┘
                             ▼
                   Run research pipeline
Cloud providers (openai, anthropic) skip Ollama setup and use API keys directly.
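The startup checks above can be read as a small planning function. The sketch below is one interpretation of the flow, not the code in `setups/`: `plan_setup` is a hypothetical name, the returned strings are descriptive labels, and it assumes that a fresh Ollama install always needs the server started and the model pulled.

```python
def plan_setup(
    provider: str,
    ollama_installed: bool,
    ollama_running: bool,
    model_installed: bool,
    model: str,
) -> list[str]:
    """Return the ordered setup steps needed before the pipeline runs (hypothetical)."""
    # Cloud providers skip Ollama setup entirely.
    if provider != "ollama":
        return ["run pipeline"]

    steps: list[str] = []
    if not ollama_installed:
        steps.append("install ollama")
        # Assumption: a fresh install has no running server and no models yet.
        ollama_running = False
        model_installed = False
    if not ollama_running:
        steps.append("ollama serve")
    if not model_installed:
        steps.append(f"ollama pull {model}")
    steps.append("run pipeline")
    return steps
```

Returning a plan instead of executing commands keeps the decision logic testable without touching the system; a real orchestrator would run each step and re-check state afterwards.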
Detailed pipeline flow (Mermaid; renders on GitHub)
flowchart TD
Q[User Query] --> QU[Query Understanding]
QU --> QE[Query Expansion]
QE --> R[Parallel Retrieval]
R --> OA[OpenAlex]
R --> SS[Semantic Scholar]
R --> AX[arXiv / CrossRef / …]
OA --> DEDUP[Deduplication]
SS --> DEDUP
AX --> DEDUP
DEDUP --> RANK[Ranking]
RANK --> REL[Relevance Scoring]
REL --> CLU[Clustering]
CLU --> SYN[Synthesis]
SYN --> GAP[Gap Analysis]
GAP --> CIT[Citation Export]
CIT --> REP[Report Generation]
REP --> MD[Markdown]
REP --> JSON[JSON]
REP --> HTML[HTML / PDF-ready]
subgraph LLM["LLM backend (RA_LLM__PROVIDER)"]
OLL[Ollama]
OAI[OpenAI]
ANT[Anthropic]
end
QE -.-> LLM
SYN -.-> LLM
GAP -.-> LLM
Highest ─────────────────────────────────────────────────────────► Lowest
┌──────────────┐   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│  RA_* env    │ → │  .env        │ → │ config/*.yaml│ → │  defaults    │
│  (shell)     │   │  file        │   │  (merged)    │   │  (in code)   │
└──────────────┘   └──────────────┘   └──────────────┘   └──────────────┘
Reports include query summary, ranked papers, synthesis themes, gap analysis, and citations.
Example (markdown excerpt):
# Research Report: transformer attention mechanisms
## Executive Summary
Cross-paper synthesis highlights scaled dot-product attention, multi-head variants,
and efficiency techniques for long-context models.
## Thematic Clusters
### 1. Core Attention Architectures
2023 | NeurIPS
DOI: https://doi.org/10.xxxx/xxxxx
Key findings:
- Multi-head attention improves representational capacity
- FlashAttention reduces memory bandwidth bottlenecks
## Research Gaps
- Limited benchmarks on edge-device deployment
- Under-explored sparse attention for retrieval-augmented pipelines

Export formats:
| `--format` | Output |
|---|---|
| `markdown` | Terminal / stdout (default) |
| `json` | Structured EnhancedResearchReport JSON |
| `html` | Styled HTML report |
| `pdf` | Print-ready HTML (open → Print → Save as PDF) |
Use --export bibtex,apa,mla,chicago alongside any format for citation files.
pipenv run python -m setups.health_check
pipenv run python -m setups.manager

- Model not installed: startup auto-pulls the resolved model; or run `ollama pull <model>` manually
- Wrong model: set `RA_LLM__MODEL` in `.env` or use `--model` with setup
- Ollama not running: run `ollama serve` or re-run setup

- Set `RA_LLM__PROVIDER` to `openai` or `anthropic` and provide the API key
- Ollama setup is skipped automatically for non-Ollama providers
- Enable `RA_SYNTHESIS__LLM_ENABLED=true` for LLM-based synthesis
Always use Pipenv:
pipenv install
pipenv run python -m src

tail -f logs/combined_*.log

pipenv install --dev
pipenv run pytest
pipenv shell
python -m src

Within the package (relative imports):
# In src/retrieval/openalex.py
from .models import RetrievedPaper
# In src/analysis/synthesis.py
from ..models import AgentFactory, AgentRole
from ..retrieval.models import RankedPaper

From external scripts (absolute imports):
from src.retrieval.orchestrator import run_research_helper
from src.models import AgentFactory, create_llm_provider
from setups import run_setup, print_report

| Package | Role |
|---|---|
| `pydantic-ai` | LLM agents with structured outputs |
| `aiohttp` | Async HTTP for scholarly APIs |
| `sentence-transformers` | Embeddings for dedup, ranking, clustering |
| `pydantic` / `pydantic-settings` | Schemas and configuration |
| `hdbscan` | Thematic paper clustering |