Nils Matteson
Engineer working on LLM-inference infrastructure. CS master’s student and founder.
I work on the systems layer underneath AI: GPU and CUDA inference, distributed systems, and applied ML. Right now I am in Madison, and I am moving to San Jose in the Bay Area in fall 2026. The way I work is a loop: build the thing, break it, figure out why it broke, rebuild it better.
thaw
thaw snapshots and restores live LLM inference state, so a vLLM session forks in a median of 0.88s instead of a roughly 340s cold boot on an H100. It is open source and on PyPI as thaw-vllm. More on thaw.sh, in the deep write-up, and in the repo.
Elsewhere on the site
- Work. thaw first and in full, then Matteson Systems, the DoIT Bedrock eval, and selected projects: a Kafka-style distributed log, a transit-ETA model with conformal coverage, and a few others.
- Writing. Engineering write-ups, including the speculative-decoding engine I built that ended up slower than the baseline, and the four bugs along the way.
- About. Boise to Madison to the Bay, the schools, and the human side.
- Agents. The facts and the receipts in plain prose, for anyone pointing an LLM at this site.
Open to an SWE or MLE internship in summer 2027 and a full-time role in 2028, especially on GPU inference, distributed systems, and ML infrastructure teams. Currently working full-time on thaw.