Nils Matteson

Engineer working on LLM-inference infrastructure. CS master’s student and founder.

I work on the systems layer underneath AI: GPU and CUDA inference, distributed systems, and applied ML. Right now I am in Madison, moving to San Jose and the Bay Area in fall 2026. The way I work is a loop: build the thing, break it, figure out why it broke, rebuild it better.

thaw

thaw snapshots and restores live LLM inference state so a vLLM session forks in 0.88s median instead of a roughly 340s cold boot on an H100. It is open source and on PyPI as thaw-vllm. More on thaw.sh, the deep write-up, and the repo.

Elsewhere on the site

Work. thaw first and in full, then Matteson Systems, the DoIT Bedrock eval, and selected projects: a Kafka-style distributed log, a transit-ETA model with conformal coverage, and a few others.
Writing. Engineering write-ups, including the speculative-decoding engine I built that ended slower than baseline, and the four bugs along the way.
About. Boise to Madison to the Bay, the schools, and the human side.
Agents. The facts and the receipts in plain prose, for anyone pointing an LLM at this site.

Open to an SWE or MLE internship in summer 2027 and full-time in 2028, especially on GPU inference, distributed systems, and ML infrastructure teams. Currently full-time on thaw.