The SD Search API enables search across different datasets.
Supported configurations:
- Bigpicture image search
- PostgreSQL: database for search metadata
- OpenSearch: search indexes built from the search metadata
- Snowstorm: SNOMED CT ontology server
OpenSearch indexes:
- Bigpicture:
bp-image-index.json
Install uv, then create the virtualenv and install all dependencies:
uv sync --dev
Activate the pre-commit hook to run tox before every commit:
uv run pre-commit install
Run the linter, type checker, and unit tests with tox:
tox -e ruff
tox -e mypy
tox -e pytest
Integration tests require Postgres and OpenSearch to be running. Start them with Docker Compose:
docker compose --env-file tests/integration/.env --profile dev up --build
Then run:
uv run pytest tests/integration/
Environment variables are defined in tests/integration/.env.
Snowstorm is a SNOMED CT terminology server used by the SD Search API to resolve SNOMED CT terms to concepts.
- A Snowstorm instance is available at https://snowstorm.rahtiapp.fi.
- A SNOMED browser instance is available at https://snomed-browser.rahtiapp.fi/.
This is only needed when importing a new SNOMED CT release into the shared instance. The full procedure is described in https://github.com/IHTSDO/snowstorm/blob/master/docs/loading-snomed.md.
First check that the Snowstorm service is healthy:
curl https://snowstorm.rahtiapp.fi/actuator/health
Expected output:
{"status":"UP","groups":["liveness","readiness"]}
Create an import job:
curl -i --location 'https://snowstorm.rahtiapp.fi/imports' \
--header 'Content-Type: application/json' \
--data '{"type":"SNAPSHOT","branchPath":"MAIN","createCodeSystemVersion":true}'
Example output:
HTTP/1.1 201
location: https://snowstorm.rahtiapp.fi/imports/<ID>
Get the import ID (e.g. f0801e81-3740-48bd-bc3e-848c7aa7468e) from the response location header and define the IMPORT_ID environment variable:
export IMPORT_ID=<ID>
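When the import is scripted rather than run by hand, the ID can be taken from the location header programmatically. The following is a minimal sketch, assuming the header always has the form shown above (the ID as the last path segment); the function name `import_id_from_location` is hypothetical and not part of this repository.

```python
from urllib.parse import urlparse


def import_id_from_location(location: str) -> str:
    """Return the last path segment of an import job's location header value."""
    # Assumption: the header looks like https://<host>/imports/<ID>.
    path = urlparse(location).path.rstrip("/")
    import_id = path.rsplit("/", 1)[-1]
    if not import_id or import_id == "imports":
        raise ValueError(f"No import ID found in location header: {location!r}")
    return import_id
```

The returned value is what would be exported as IMPORT_ID in the step above.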
Upload the SNOMED release file (e.g. SnomedCT_InternationalRF2_PRODUCTION_20260601T120000Z.zip):
curl --location -X POST "https://snowstorm.rahtiapp.fi/imports/${IMPORT_ID}/archive" \
-F "file=@<SNOMED release file>"
The upload and import can take several hours. Poll the import status until the status is COMPLETED
or the import job is no longer available:
curl --location "https://snowstorm.rahtiapp.fi/imports/${IMPORT_ID}"
Example output while running:
{
"status" : "RUNNING",
"type" : "SNAPSHOT",
"branchPath" : "MAIN",
"internalRelease" : false,
"moduleIds" : [ ],
"createCodeSystemVersion" : true
}
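Because the import can run for hours, the polling may be easier to automate than to repeat by hand. The sketch below is one possible approach, not a script that ships with this repository. It assumes that a job that is "no longer available" answers with HTTP 404 and that FAILED is also a terminal status; both are assumptions, as are the function names and the default interval.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

# Assumption: FAILED is a terminal status alongside COMPLETED.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def fetch_import_status(base_url: str, import_id: str) -> Optional[str]:
    """Return the job status, or None if the job is gone (assumed to be HTTP 404)."""
    try:
        with urllib.request.urlopen(f"{base_url}/imports/{import_id}") as response:
            return json.load(response)["status"]
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return None
        raise


def wait_for_import(
    fetch: Callable[[], Optional[str]],
    interval_seconds: float = 60.0,
    max_polls: int = 600,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch() until it reports a terminal status or that the job is gone."""
    for _ in range(max_polls):
        status = fetch()
        if status is None or status in TERMINAL_STATUSES:
            return status
        sleep(interval_seconds)
    raise TimeoutError(f"Import still running after {max_polls} polls")
```

A call such as `wait_for_import(lambda: fetch_import_status("https://snowstorm.rahtiapp.fi", import_id))` would block until the job finishes. Passing `fetch` and `sleep` as parameters keeps the loop testable without a running Snowstorm instance.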
You can also monitor the import progress from the logs:
oc logs -f deployment/snowstorm
Once finished, verify that the import has been completed.
Check the imported versions:
curl -s https://snowstorm.rahtiapp.fi/codesystems/SNOMEDCT/versions | jq '.items[] | {version, branchPath}'
Example output:
{
"version": "2026-06-01",
"branchPath": "MAIN/2026-06-01"
}
Check the MAIN branch:
curl -s https://snowstorm.rahtiapp.fi/branches/MAIN
Example output:
{
"path" : "MAIN",
"state" : "UP_TO_DATE",
"containsContent" : true,
"locked" : false,
"creation" : "2026-06-11T05:12:34.688Z",
"base" : "2026-06-11T05:12:34.688Z",
"head" : "2026-06-11T05:52:38.457Z",
"creationTimestamp" : 1781154754688,
"baseTimestamp" : 1781154754688,
"headTimestamp" : 1781157158457,
...
}
Get the number of active concepts:
curl -s "https://snowstorm.rahtiapp.fi/MAIN/concepts?limit=1&active=true" | jq '{total}'
Example output:
{
"total": 532824
}
Get a concept:
curl -s "https://snowstorm.rahtiapp.fi/MAIN/concepts/337915000" | jq '{conceptId, active, fsn: .fsn.term}'
Example output:
{
"conceptId": "337915000",
"active": true,
"fsn": "Homo sapiens (organism)"
}
Load a single dataset directory (default):
uv run python scripts/admin.py Bigpicture load /path/to/dataset/ --load
Load from a parent directory containing multiple dataset subdirectories:
uv run python scripts/admin.py Bigpicture load /path/to/datasets/ --multi-dir --load
Omit --load to parse the XML files without loading them into the database.
To also sync to OpenSearch immediately after loading, add --sync:
uv run python scripts/admin.py Bigpicture load /path/to/datasets/ --multi-dir --load --sync
After a new SNOMED CT release, update the stored preferred terms to match the new release. The preferred-terms cache is shared across deployments, so this command is not tied to a specific one:
uv run python scripts/admin.py snomed refresh
The OpenSearch index mapping (search_api/opensearch/bigpicture/bp-image-index.json) is
generated from the filtered and non-filtered field definitions, so that field names
and types stay in sync with them.
After changing the field definitions, regenerate and commit the file:
uv run python scripts/admin.py Bigpicture generate-index
A unit test fails if this file differs from a freshly generated one.
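The idea behind such a drift check can be illustrated as follows. This is only a sketch of the general technique, not the project's actual test: the function name `mapping_is_current` is hypothetical, and the freshly generated mapping is assumed to be available as a Python dict.

```python
import json
from pathlib import Path


def mapping_is_current(committed_path: Path, generated: dict) -> bool:
    """Return True if the committed mapping file matches the generated mapping."""
    committed = json.loads(committed_path.read_text(encoding="utf-8"))
    # Comparing parsed JSON ignores whitespace and key-order differences.
    return committed == generated
```

Comparing parsed objects rather than raw text means a pure formatting change in the file does not count as drift; a byte-for-byte comparison would be the stricter alternative.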
generate-index only writes the mapping to a local file; it does not create the index in
OpenSearch. A new OpenSearch instance needs the index created from that mapping before the
first --sync. If documents are synced into an index that doesn't exist yet, OpenSearch
silently auto-creates it with a dynamic mapping (e.g. keyword fields become text, and
nested fields become plain objects), which breaks aggregations and nested queries in ways
that only surface later, disconnected from the actual cause.
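To catch an auto-created index early, the field types in the live index can be compared against the generated mapping. The sketch below assumes both sides are given as `properties` dicts (the part of an OpenSearch mapping that lists fields); how those dicts are obtained is left out, and the field names in the usage note are invented for illustration.

```python
def mapping_mismatches(expected: dict, actual: dict, prefix: str = "") -> list[str]:
    """List fields whose type in `actual` differs from the type in `expected`."""
    problems: list[str] = []
    for name, spec in expected.items():
        path = f"{prefix}{name}"
        found = actual.get(name)
        if found is None:
            problems.append(f"{path}: missing")
            continue
        # A field without an explicit "type" is treated as a plain object.
        want = spec.get("type", "object")
        got = found.get("type", "object")
        if want != got:
            problems.append(f"{path}: expected {want}, found {got}")
        # Recurse into sub-fields of object and nested fields.
        problems.extend(
            mapping_mismatches(
                spec.get("properties", {}), found.get("properties", {}), f"{path}."
            )
        )
    return problems
```

For example, with a hypothetical expected field `sex` of type keyword and a live index where it was auto-created as text, the function reports `sex: expected keyword, found text`. An empty list means every expected field has the expected type.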
Create the index explicitly:
uv run python scripts/admin.py --env-file <env> Bigpicture create-index
This fails loudly if the index already exists, rather than silently leaving a stale mapping in place. If an index was already auto-created with the wrong mapping, OpenSearch cannot change an existing field's type in place, so the index must be deleted and recreated, and previously synced documents must be resynced:
curl -X DELETE https://<opensearch-host>:9200/bp-image-index -u <user>:<password>
uv run python scripts/admin.py --env-file <env> Bigpicture create-index
# Reset sync state so the next --sync repopulates the recreated index:
# UPDATE document SET synced_at = NULL;
uv run python scripts/admin.py --env-file <env> Bigpicture load <dir> --load --sync
The experimental Bigpicture LLM search endpoint uses a small local Ollama model. Install and start it before running the API:
brew install ollama
ollama pull qwen2.5:14b
ollama serve
The /ai/query endpoint accepts a query for the LLM search. The LLM translates
the query text into Beacon V2 filters and returns structured results.
Example:
curl -X POST "http://localhost:8000/ai/query" \
-H "Content-Type: application/json" \
-d '{"query": "images for human females"}'
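The same request can be sent from Python. This is a minimal sketch using only the standard library; the function names are hypothetical, and it assumes the endpoint answers with a JSON body, which the text above does not specify in detail.

```python
import json
import urllib.request


def build_ai_query_request(
    query: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build the POST request for the /ai/query endpoint."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ai/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_ai_query(query: str, base_url: str = "http://localhost:8000"):
    """Send the query and return the parsed response (assumed to be JSON)."""
    with urllib.request.urlopen(build_ai_query_request(query, base_url)) as response:
        return json.load(response)
```

Calling `run_ai_query("images for human females")` requires the API and the Ollama model to be running locally, as described above.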