Quadtrix.cpp

Language models in simple, dependency-free C++, with no need for 245MB of PyTorch or 107MB of CPython to understand how a transformer actually works. The native path is a from-scratch decoder-only GPT: tensors, embeddings, multi-head causal self-attention, layer norm, cross-entropy, and an analytical backward pass with AdamW, all in main.cpp and include/. No autograd, no framework; every gradient is derived and written out. Technical notes: docs

Alongside it sits a parallel PyTorch implementation in engine/main.py and engine/inference.py, so you can train and generate the same architecture with torch + tiktoken when you want speed instead of transparency. A FastAPI middleware layer in backend/ and a React/TypeScript web UI in frontend/ let you chat with either backend in the browser. There's also an experimental integrated-GPU path in iGPU/.

The point of this repo is the C++ core. The PyTorch, FastAPI, and frontend layers exist to make the model usable, but if you're here to learn how a GPT is actually built and trained without a framework doing the work for you, include/backward.h is where to start reading.

quick start (C++, train + chat)

The fastest way to see the whole pipeline — tokenize, train, checkpoint, generate — using the bundled character-level corpus:

g++ -std=c++17 -O2 -I. -Iinclude -o quadtrix.exe main.cpp
./quadtrix.exe data/input.txt

This trains from scratch on data/input.txt and writes the best checkpoint to best_model.bin. Once you have a checkpoint, generate or chat with it:

./quadtrix.exe data/input.txt --generate
./quadtrix.exe data/input.txt --chat --chat-tokens 300

debugging tip: compile with -g instead of -O2 if you want to step through include/backward.h or include/gpt.h in a debugger; the manual backward pass is much easier to follow one breakpoint at a time.

runtime arguments

quadtrix.exe [data_path] [--generate] [--chat] [--chat-tokens N]

Argument	Description
`data_path`	Plain-text corpus used to build the tokenizer and train/validation split
`--generate`	Load weights and continuously generate text
`--chat`	Load weights and start interactive terminal chat
`--chat-tokens N`	Max generated tokens per chat response

Env var	Default	Description
`GPT_DATA_PATH`	`data/input.txt`	Override the default training corpus
`GPT_MODEL_PATH`	`best_model.bin`	Override the checkpoint path
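
As a rough illustration of how the two environment variables above could be resolved against their defaults, here is a minimal sketch. The helper name env_or and the struct Paths are hypothetical and not taken from main.cpp, and treating an empty variable the same as an unset one is an assumption.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper (not from main.cpp): return the value of an
// environment variable, or the fallback when it is unset or empty.
std::string env_or(const char* name, const std::string& fallback) {
    assert(name != nullptr);
    const char* value = std::getenv(name);
    return (value != nullptr && value[0] != '\0') ? std::string(value) : fallback;
}

// Hypothetical container for the two paths documented in the table above.
struct Paths {
    std::string data_path;
    std::string model_path;
};

// Resolve both paths, falling back to the documented defaults.
Paths resolve_paths() {
    return Paths{env_or("GPT_DATA_PATH", "data/input.txt"),
                 env_or("GPT_MODEL_PATH", "best_model.bin")};
}
```

How a positional data_path argument interacts with GPT_DATA_PATH is not stated here, so the sketch deliberately leaves command-line precedence out.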

what's actually implemented in C++

No third-party runtime dependency — it builds from main.cpp, config/config.h, and include/*.h alone.

Character-level tokenizer built directly from the input corpus
Train/validation split via DataLoader
Token + positional embeddings
Multi-head causal self-attention with explicit QKV projections
Pre-layer-norm residual transformer blocks
Feed-forward MLP with ReLU
Cross-entropy loss
Fully analytical backward pass — every gradient (attention, layer norm, MLP, embeddings) is derived and coded in include/backward.h, not autograd
AdamW optimizer (first/second moment estimates, weight decay)
Checkpoint save/load
Autoregressive generation and terminal chat mode
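
To give a flavor of what "derived and coded, not autograd" means, here is a minimal sketch of the simplest piece of such a backward pass: cross-entropy over softmax for a single position, with its closed-form gradient (probabilities minus the one-hot target). The function names and the use of std::vector are illustrative assumptions and do not mirror the actual code in include/backward.h.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax over one row of logits.
std::vector<float> softmax(const std::vector<float>& logits) {
    assert(!logits.empty());
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - max_logit);
        sum += probs[i];
    }
    for (float& p : probs) p /= sum;
    return probs;
}

// Cross-entropy loss for a single position: -log p[target].
float cross_entropy(const std::vector<float>& logits, std::size_t target) {
    assert(target < logits.size());
    return -std::log(softmax(logits)[target]);
}

// Analytical gradient of that loss with respect to the logits:
// dL/dz_i = p_i - 1[i == target].
std::vector<float> cross_entropy_grad(const std::vector<float>& logits,
                                      std::size_t target) {
    assert(target < logits.size());
    std::vector<float> grad = softmax(logits);
    grad[target] -= 1.0f;
    return grad;
}
```

A handy sanity check for any hand-written gradient like this is to compare it against a central finite difference of the loss; the two should agree to a few decimal places in float.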

Hyperparameters live in config/config.h and require a rebuild to take effect:

static const int BATCH_SIZE   = 4;
static const int BLOCK_SIZE   = 64;
static const int N_EMBD       = 128;
static const int N_HEAD       = 4;
static const int N_LAYER      = 4;
static const float DROPOUT    = 0.2f;
static const float LEARNING_RATE = 3e-4f;
static const int MAX_ITERS    = 3000;
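
To get a feel for how these values set the model size, here is a back-of-the-envelope parameter count. It is a sketch under stated assumptions, not a readout of the real code: the layer layout (bias-free Q/K/V, biased output projection, 4x-wide MLP, two layer norms per block, untied output head) and the function name estimate_params are assumptions, and the vocabulary size depends on the corpus.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the config/config.h excerpt above.
static const int BLOCK_SIZE = 64;
static const int N_EMBD     = 128;
static const int N_LAYER    = 4;

// Rough parameter count for a GPT-style decoder. Assumed layout (not
// confirmed by the source): bias-free Q/K/V projections, a biased output
// projection, a 4x-wide MLP with biases, two layer norms per block, a final
// layer norm, and an untied output head with bias.
std::int64_t estimate_params(std::int64_t vocab_size) {
    assert(vocab_size > 0);
    const std::int64_t d = N_EMBD;
    const std::int64_t embeddings = vocab_size * d + BLOCK_SIZE * d;
    const std::int64_t attention  = 3 * d * d + (d * d + d);
    const std::int64_t mlp        = (d * 4 * d + 4 * d) + (4 * d * d + d);
    const std::int64_t layer_norm = 2 * d;  // gain and bias
    const std::int64_t block      = attention + mlp + 2 * layer_norm;
    const std::int64_t head       = layer_norm + (d * vocab_size + vocab_size);
    return embeddings + N_LAYER * block + head;
}
```

With a hypothetical 65-character vocabulary, estimate_params(65) comes to 816,705, roughly 0.82M. The transformer blocks dominate the total, so at this scale N_EMBD and N_LAYER matter far more than the vocabulary size.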

For an optimized native build:

g++ -std=c++17 -O3 -march=native -I. -Iinclude -o quadtrix.exe main.cpp

the PyTorch reference path

engine/main.py trains the same architectural idea with torch, torch.nn, and GPT-2 BPE tokenization via tiktoken. It is useful when you want to scale past what the C++ loops can comfortably train on a CPU.

python engine/main.py

It looks for engine/input.txt by default; point it elsewhere with QUADTRIX_TRAIN_DATA if needed. Run inference against a saved checkpoint:

python engine/inference.py --checkpoint engine/best_model.pt --prompt "Once upon a time" --max-new-tokens 100

web chat (FastAPI + React)

To chat with either backend from a browser instead of the terminal, bring up the API and the frontend in two terminals:

# terminal 1 — backend
cd backend && uvicorn main:app --host 127.0.0.1 --port 3001

# terminal 2 — frontend
cd frontend && npm run dev

Then open http://localhost:5173 and select a backend. The PyTorch path works out of the box once a .pt checkpoint exists; the C++ backend option expects a compatible HTTP service at CPP_SERVER_URL exposing /health and /generate, which main.cpp does not currently serve on its own — use the PyTorch backend for the web UI unless you've built that bridge.

results so far

Run	Params	Val loss	Time	Notes
C++ CPU baseline	0.82M	1.31	39.4 min	small data, fragmented output
C++ CPU extended	0.83M	1.64	76.2 min	3,000 iters, char-level, 28.3M train tokens
T4	10.82M	0.72	61.3 min	coherent paragraphs, strong convergence
T4 optimized	1.99M	0.93	6.1 min	fast, stable, basic coherence

See run.md and the leaderboard in the full docs for more configurations.

how this differs from similar projects

Project	Focus	Language	Autograd
nanoGPT / minGPT	Minimal, educational GPT training	Python	PyTorch
llama2.c	Inference-only	C	None
Quadtrix.cpp	Training and inference, manual backward pass, web UI	C++ / Python / TypeScript	Manual (C++) + PyTorch

I'd like the C++ core (main.cpp, include/, config/) to stay dependency-free and to remain the part of this repo that explains transformer internals directly. The PyTorch engine, FastAPI middleware, and React frontend are welcome to grow more features, integrations, and UI polish. If you build a port to another language or framework, I'm happy to link to it from a notable-forks section; just open an issue or PR.

references

Vaswani et al., "Attention Is All You Need", 2017
Radford et al., "Language Models are Unsupervised Multitask Learners" (GPT-2), 2019
nanoGPT and minGPT as educational reference points

license

MIT
