Language models in simple, dependency-free C++, with no need for 245MB of PyTorch or 107MB of CPython to understand how a transformer actually works. The native path is a from-scratch decoder-only GPT: tensors, embeddings, multi-head causal self-attention, layer norm, cross-entropy, and an analytical backward pass with AdamW, all in main.cpp and include/. No autograd, no framework: every gradient is derived and written out. Technical notes: docs.
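To illustrate what "derived and written out" means, here is a minimal sketch of the gradient at the loss. It assumes a single position with a plain std::vector of logits, and the function name cross_entropy_with_grad is hypothetical rather than the repo's API. For softmax followed by cross-entropy, the gradient with respect to the logits simplifies to softmax(logits) minus the one-hot target, so no autograd graph is needed:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: cross-entropy for one position, -log softmax(logits)[target].
// Writes dL/dlogits into grad, which equals softmax(logits) - one_hot(target).
float cross_entropy_with_grad(const std::vector<float>& logits, std::size_t target,
                              std::vector<float>& grad) {
    assert(!logits.empty() && target < logits.size());
    // Subtract the max logit for numerical stability; softmax is shift-invariant.
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    grad.resize(logits.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        grad[i] = std::exp(logits[i] - max_logit);
        sum += grad[i];
    }
    for (float& p : grad) p /= sum;        // grad now holds the softmax probabilities
    const float loss = -std::log(grad[target]);
    grad[target] -= 1.0f;                  // softmax - one_hot(target)
    return loss;
}
```

When the loss is averaged over a batch of N positions, each position's gradient would also be scaled by 1/N.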
Alongside it sits a parallel PyTorch implementation in engine/main.py and engine/inference.py, so you can train and generate the same architecture with torch + tiktoken when you want speed instead of transparency. A FastAPI middleware layer in backend/ and a React/TypeScript web UI in frontend/ let you chat with either backend in the browser. There's also an experimental integrated-GPU path in iGPU/.
The point of this repo is the C++ core. The PyTorch, FastAPI, and frontend layers exist to make the model usable, but if you're here to learn how a GPT is actually built and trained without a framework doing the work for you, include/backward.h is where to start reading.
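For a sense of the kind of hand-derived gradient backward.h is built from, here is a minimal sketch of layer norm forward and backward for a single row. It is an illustration under assumptions, not code taken from include/backward.h: the LayerNormCache struct and both function names are hypothetical, it uses the standard biased variance with eps = 1e-5, and it caches the normalized input and reciprocal standard deviation so the backward pass can reuse them. With dxhat = dy * gamma, the input gradient has the closed form dx = rstd * (dxhat - mean(dxhat) - xhat * mean(dxhat * xhat)):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Values saved by the forward pass for reuse in the backward pass (hypothetical struct).
struct LayerNormCache {
    std::vector<float> xhat;  // normalized input: (x - mean) * rstd
    float rstd = 0.0f;        // 1 / sqrt(var + eps)
};

// Forward for one row: y = gamma * xhat + beta.
std::vector<float> layernorm_forward(const std::vector<float>& x,
                                     const std::vector<float>& gamma,
                                     const std::vector<float>& beta,
                                     LayerNormCache& cache, float eps = 1e-5f) {
    const std::size_t n = x.size();
    assert(n > 0 && gamma.size() == n && beta.size() == n);
    float mean = 0.0f;
    for (float v : x) mean += v;
    mean /= static_cast<float>(n);
    float var = 0.0f;
    for (float v : x) var += (v - mean) * (v - mean);
    var /= static_cast<float>(n);  // biased variance
    cache.rstd = 1.0f / std::sqrt(var + eps);
    cache.xhat.resize(n);
    std::vector<float> y(n);
    for (std::size_t i = 0; i < n; ++i) {
        cache.xhat[i] = (x[i] - mean) * cache.rstd;
        y[i] = gamma[i] * cache.xhat[i] + beta[i];
    }
    return y;
}

// Backward for one row: returns dL/dx and accumulates dL/dgamma and dL/dbeta.
std::vector<float> layernorm_backward(const std::vector<float>& dy,
                                      const std::vector<float>& gamma,
                                      const LayerNormCache& cache,
                                      std::vector<float>& dgamma,
                                      std::vector<float>& dbeta) {
    const std::size_t n = dy.size();
    assert(gamma.size() == n && cache.xhat.size() == n && dgamma.size() == n && dbeta.size() == n);
    float mean_dxhat = 0.0f, mean_dxhat_xhat = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        const float dxhat = dy[i] * gamma[i];
        mean_dxhat += dxhat;
        mean_dxhat_xhat += dxhat * cache.xhat[i];
        dgamma[i] += dy[i] * cache.xhat[i];
        dbeta[i] += dy[i];
    }
    mean_dxhat /= static_cast<float>(n);
    mean_dxhat_xhat /= static_cast<float>(n);
    std::vector<float> dx(n);
    for (std::size_t i = 0; i < n; ++i) {
        const float dxhat = dy[i] * gamma[i];
        dx[i] = cache.rstd * (dxhat - mean_dxhat - cache.xhat[i] * mean_dxhat_xhat);
    }
    return dx;
}
```

A central finite-difference check on each input element is a cheap way to validate hand-written gradients like this one.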
The fastest way to see the whole pipeline — tokenize, train, checkpoint, generate — using the bundled character-level corpus:
g++ -std=c++17 -O2 -I. -Iinclude -o quadtrix.exe main.cpp
./quadtrix.exe data/input.txt
This trains from scratch on data/input.txt and writes the best checkpoint to best_model.bin. Once you have a checkpoint, generate or chat with it:
./quadtrix.exe data/input.txt --generate
./quadtrix.exe data/input.txt --chat --chat-tokens 300
Debugging tip: compile with -g instead of -O2 if you want to step through include/backward.h or include/gpt.h in a debugger; the manual backward pass is much easier to follow one breakpoint at a time.
quadtrix.exe [data_path] [--generate] [--chat] [--chat-tokens N]

| Argument | Description |
|---|---|
| data_path | Plain-text corpus used to build the tokenizer and the train/validation split |
| --generate | Load weights and continuously generate text |
| --chat | Load weights and start an interactive terminal chat |
| --chat-tokens N | Maximum number of generated tokens per chat response |
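A minimal sketch of parsing that command line, assuming flags may appear in any order and that any non-flag argument is the corpus path. CliOptions and parse_args are hypothetical names, and the -1 sentinel for an absent --chat-tokens is an illustrative choice, not main.cpp's actual default:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical options struct mirroring the usage line; not main.cpp's own type.
struct CliOptions {
    std::string data_path = "data/input.txt";  // documented default corpus
    bool generate = false;
    bool chat = false;
    int chat_tokens = -1;  // -1 means "--chat-tokens not given" (illustrative sentinel)
};

// Parse arguments excluding argv[0]; flags may appear in any order.
CliOptions parse_args(const std::vector<std::string>& args) {
    CliOptions opt;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& a = args[i];
        if (a == "--generate") {
            opt.generate = true;
        } else if (a == "--chat") {
            opt.chat = true;
        } else if (a == "--chat-tokens") {
            if (i + 1 >= args.size()) throw std::invalid_argument("--chat-tokens needs a value");
            opt.chat_tokens = std::stoi(args[++i]);  // consume N
        } else if (a.rfind("--", 0) == 0) {
            throw std::invalid_argument("unknown flag: " + a);
        } else {
            opt.data_path = a;  // a non-flag argument is treated as the corpus path
        }
    }
    return opt;
}
```

From main, this could be called as parse_args(std::vector<std::string>(argv + 1, argv + argc)).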
| Env var | Default | Description |
|---|---|---|
| GPT_DATA_PATH | data/input.txt | Override the default training corpus |
| GPT_MODEL_PATH | best_model.bin | Override the checkpoint path |
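A minimal sketch of the override-with-fallback pattern these two variables suggest, using std::getenv from the standard library. The helper name env_or_default is hypothetical, and the sketch does not show how main.cpp orders GPT_DATA_PATH against a positional data_path argument:

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: the variable's value if it is set and non-empty, else the fallback.
std::string env_or_default(const char* name, const std::string& fallback) {
    assert(name != nullptr);
    const char* value = std::getenv(name);
    if (value != nullptr && value[0] != '\0') return value;
    return fallback;
}

// Illustrative usage with the documented defaults:
//   const std::string data_path  = env_or_default("GPT_DATA_PATH", "data/input.txt");
//   const std::string model_path = env_or_default("GPT_MODEL_PATH", "best_model.bin");
```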
No third-party runtime dependency — it builds from main.cpp, config/config.h, and include/*.h alone.
- Character-level tokenizer built directly from the input corpus
- Train/validation split via DataLoader
- Token + positional embeddings
- Multi-head causal self-attention with explicit QKV projections
- Pre-layer-norm residual transformer blocks
- Feed-forward MLP with ReLU
- Cross-entropy loss
- Fully analytical backward pass: every gradient (attention, layer norm, MLP, embeddings) is derived and coded in include/backward.h, not autograd
- AdamW optimizer (first/second moment estimates, weight decay)
- Checkpoint save/load
- Autoregressive generation and terminal chat mode
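The AdamW item in the list above can be sketched as a per-tensor update. This is a minimal illustration assuming flat float parameter and gradient arrays: the AdamWState struct is hypothetical, lr defaults to the 3e-4 LEARNING_RATE from config/config.h, and beta1 = 0.9, beta2 = 0.999, eps = 1e-8, and weight_decay = 0.01 are common defaults assumed here, not values taken from the repo. The decay is decoupled: it shrinks the parameter directly instead of being added to the gradient, which is what distinguishes AdamW from Adam with L2 regularization:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-tensor optimizer state.
struct AdamWState {
    std::vector<float> m, v;  // first and second moment estimates
    int t = 0;                // step counter used for bias correction
};

// One AdamW step over a flat parameter array.
void adamw_step(std::vector<float>& params, const std::vector<float>& grads, AdamWState& s,
                float lr = 3e-4f, float beta1 = 0.9f, float beta2 = 0.999f,
                float eps = 1e-8f, float weight_decay = 0.01f) {
    assert(params.size() == grads.size());
    if (s.m.empty()) {
        s.m.assign(params.size(), 0.0f);
        s.v.assign(params.size(), 0.0f);
    }
    ++s.t;
    const float bc1 = 1.0f - std::pow(beta1, static_cast<float>(s.t));
    const float bc2 = 1.0f - std::pow(beta2, static_cast<float>(s.t));
    for (std::size_t i = 0; i < params.size(); ++i) {
        s.m[i] = beta1 * s.m[i] + (1.0f - beta1) * grads[i];
        s.v[i] = beta2 * s.v[i] + (1.0f - beta2) * grads[i] * grads[i];
        const float m_hat = s.m[i] / bc1;  // bias-corrected first moment
        const float v_hat = s.v[i] / bc2;  // bias-corrected second moment
        // Decoupled weight decay: applied to the parameter, not folded into the gradient.
        params[i] -= lr * (m_hat / (std::sqrt(v_hat) + eps) + weight_decay * params[i]);
    }
}
```

On the first step the bias correction makes the update roughly lr times the sign of the gradient, regardless of its magnitude.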
Hyperparameters live in config/config.h and require a rebuild to take effect:
static const int BATCH_SIZE = 4;
static const int BLOCK_SIZE = 64;
static const int N_EMBD = 128;
static const int N_HEAD = 4;
static const int N_LAYER = 4;
static const float DROPOUT = 0.2f;
static const float LEARNING_RATE = 3e-4f;
static const int MAX_ITERS = 3000;
For an optimized native build:
g++ -std=c++17 -O3 -march=native -I. -Iinclude -o quadtrix.exe main.cpp
engine/main.py trains the same architectural idea with torch, torch.nn, and GPT-2 BPE tokenization via tiktoken, which is useful when you want to scale past what C++ loops can comfortably train on a CPU.
python engine/main.py
It looks for engine/input.txt by default; point it elsewhere with QUADTRIX_TRAIN_DATA if needed. Run inference against a saved checkpoint:
python engine/inference.py --checkpoint engine/best_model.pt --prompt "Once upon a time" --max-new-tokens 100
To chat with either backend from a browser instead of the terminal, bring up the API and the frontend in two terminals:
# terminal 1 — backend
cd backend && uvicorn main:app --host 127.0.0.1 --port 3001
# terminal 2 — frontend
cd frontend && npm run dev
Then open http://localhost:5173 and select a backend. The PyTorch path works out of the box once a .pt checkpoint exists. The C++ backend option expects a compatible HTTP service at CPP_SERVER_URL exposing /health and /generate, which main.cpp does not currently serve on its own, so use the PyTorch backend for the web UI unless you've built that bridge.
| Run | Params | Val loss | Time | Notes |
|---|---|---|---|---|
| C++ CPU baseline | 0.82M | 1.31 | 39.4 min | small data, fragmented output |
| C++ CPU extended | 0.83M | 1.64 | 76.2 min | 3,000 iters, char-level, 28.3M train tokens |
| T4 | 10.82M | 0.72 | 61.3 min | coherent paragraphs, strong convergence |
| T4 optimized | 1.99M | 0.93 | 6.1 min | fast, stable, basic coherence |
See run.md and the leaderboard in the full docs for more configurations.
| Project | Focus | Language | Autograd |
|---|---|---|---|
| nanoGPT / minGPT | Minimal, educational GPT training | Python | PyTorch |
| llama2.c | Inference-only | C | None |
| Quadtrix.cpp | Training and inference, manual backward pass, web UI | C++ / Python / TypeScript | Manual (C++) + PyTorch |
I'd like the C++ core (main.cpp, include/, config/) to stay dependency-free and to remain the part of this repo that explains transformer internals directly. The PyTorch engine, FastAPI middleware, and React frontend are welcome to grow more features, integrations, and UI polish. If you build a port to another language or framework, I'm happy to link to it from a notable-forks section; just open an issue or PR.
- Vaswani et al., "Attention Is All You Need", 2017
- Radford et al., "Language Models are Unsupervised Multitask Learners" (GPT-2), 2019
- Karpathy, nanoGPT and minGPT, as educational reference points
MIT