Eamon2009/Quadtrix.cpp

Quadtrix.cpp

Language models in simple, dependency-free C++, with no need for 245MB of PyTorch or 107MB of CPython to understand how a transformer actually works. The native path is a from-scratch decoder-only GPT: tensors, embeddings, multi-head causal self-attention, layer norm, cross-entropy, and an analytical backward pass with AdamW, all in main.cpp and include/. No autograd, no framework: every gradient is derived and written out. Technical notes: docs

Alongside it sits a parallel PyTorch implementation in engine/main.py and engine/inference.py, so you can train and generate the same architecture with torch + tiktoken when you want speed instead of transparency. A FastAPI middleware layer in backend/ and a React/TypeScript web UI in frontend/ let you chat with either backend in the browser. There's also an experimental integrated-GPU path in iGPU/.

The point of this repo is the C++ core. The PyTorch, FastAPI, and frontend layers exist to make the model usable, but if you're here to learn how a GPT is actually built and trained without a framework doing the work for you, include/backward.h is where to start reading.

quick start (C++, train + chat)

The fastest way to see the whole pipeline — tokenize, train, checkpoint, generate — using the bundled character-level corpus:

g++ -std=c++17 -O2 -I. -Iinclude -o quadtrix.exe main.cpp
./quadtrix.exe data/input.txt

This trains from scratch on data/input.txt and writes the best checkpoint to best_model.bin. Once you have a checkpoint, generate or chat with it:

./quadtrix.exe data/input.txt --generate
./quadtrix.exe data/input.txt --chat --chat-tokens 300

debugging tip: compile with -g instead of -O2 if you want to step through include/backward.h or include/gpt.h in a debugger; the manual backward pass is much easier to follow one breakpoint at a time.

runtime arguments

quadtrix.exe [data_path] [--generate] [--chat] [--chat-tokens N]

| Argument | Description |
| --- | --- |
| data_path | Plain-text corpus used to build the tokenizer and train/validation split |
| --generate | Load weights and continuously generate text |
| --chat | Load weights and start interactive terminal chat |
| --chat-tokens N | Max generated tokens per chat response |

| Env var | Default | Description |
| --- | --- | --- |
| GPT_DATA_PATH | data/input.txt | Override the default training corpus |
| GPT_MODEL_PATH | best_model.bin | Override the checkpoint path |

what's actually implemented in C++

No third-party runtime dependency — it builds from main.cpp, config/config.h, and include/*.h alone.

  • Character-level tokenizer built directly from the input corpus
  • Train/validation split via DataLoader
  • Token + positional embeddings
  • Multi-head causal self-attention with explicit QKV projections
  • Pre-layer-norm residual transformer blocks
  • Feed-forward MLP with ReLU
  • Cross-entropy loss
  • Fully analytical backward pass — every gradient (attention, layer norm, MLP, embeddings) is derived and coded in include/backward.h, not autograd
  • AdamW optimizer (first/second moment estimates, weight decay)
  • Checkpoint save/load
  • Autoregressive generation and terminal chat mode

Hyperparameters live in config/config.h and require a rebuild to take effect:

static const int BATCH_SIZE   = 4;
static const int BLOCK_SIZE   = 64;
static const int N_EMBD       = 128;
static const int N_HEAD       = 4;
static const int N_LAYER      = 4;
static const float DROPOUT    = 0.2f;
static const float LEARNING_RATE = 3e-4f;
static const int MAX_ITERS    = 3000;

For an optimized native build:

g++ -std=c++17 -O3 -march=native -I. -Iinclude -o quadtrix.exe main.cpp

the PyTorch reference path

engine/main.py trains the same architectural idea with torch, torch.nn, and GPT-2 BPE tokenization via tiktoken. It is useful when you want to scale past what C++ loops can comfortably train on CPU.

python engine/main.py

It looks for engine/input.txt by default; point it elsewhere with QUADTRIX_TRAIN_DATA if needed. Run inference against a saved checkpoint:

python engine/inference.py --checkpoint engine/best_model.pt --prompt "Once upon a time" --max-new-tokens 100

web chat (FastAPI + React)

To chat with either backend from a browser instead of the terminal, bring up the API and the frontend in two terminals:

# terminal 1 — backend
cd backend && uvicorn main:app --host 127.0.0.1 --port 3001

# terminal 2 — frontend
cd frontend && npm run dev

Then open http://localhost:5173 and select a backend. The PyTorch path works out of the box once a .pt checkpoint exists; the C++ backend option expects a compatible HTTP service at CPP_SERVER_URL exposing /health and /generate, which main.cpp does not currently serve on its own — use the PyTorch backend for the web UI unless you've built that bridge.

results so far

| Run | Params | Val loss | Time | Notes |
| --- | --- | --- | --- | --- |
| C++ CPU baseline | 0.82M | 1.31 | 39.4 min | small data, fragmented output |
| C++ CPU extended | 0.83M | 1.64 | 76.2 min | 3,000 iters, char-level, 28.3M train tokens |
| T4 | 10.82M | 0.72 | 61.3 min | coherent paragraphs, strong convergence |
| T4 optimized | 1.99M | 0.93 | 6.1 min | fast, stable, basic coherence |

See run.md and the leaderboard in the full docs for more configurations.

how this differs from similar projects

| Project | Focus | Language | Autograd |
| --- | --- | --- | --- |
| nanoGPT / minGPT | Minimal, educational GPT training | Python | PyTorch |
| llama2.c | Inference-only | C | None |
| Quadtrix.cpp | Training and inference, manual backward pass, web UI | C++ / Python / TypeScript | Manual (C++) + PyTorch |

I'd like the C++ core (main.cpp, include/, config/) to stay dependency-free and to remain the part of this repo that explains transformer internals directly. The PyTorch engine, FastAPI middleware, and React frontend are welcome to grow more features, integrations, and UI polish. If you build a port to another language or framework, I'm happy to link to it from a notable-forks section; just open an issue or PR.

references

  • Vaswani et al., "Attention Is All You Need", 2017
  • Radford et al., GPT-2 technical work, 2019
  • nanoGPT and minGPT as educational reference points

license

MIT
