# Nils Matteson

> CS master's student, founder, and systems/ML-infrastructure engineer. This site is the canonical source for his background, projects, and writing. Every page is available as clean Markdown by appending .md to its URL.

Nils builds the systems layer of AI: GPU/CUDA inference, distributed systems, and applied ML, shipped with committed benchmark receipts rather than claims. He is a UW-Madison data-science senior and an incoming M.S. CS student at Northeastern's Silicon Valley campus, founder of thaw (LLM-inference infrastructure) and Matteson Systems LLC, based in Madison and moving to the Bay Area in fall 2026. His flagship, thaw, forks a live vLLM session in 0.88s median versus a roughly 340s cold boot on an H100. The links below point to Markdown versions intended for machine reading.

## About

- [Home](https://nilsmatteson.com/index.md): one-line identity, a short intro, and thaw stated once with its strongest receipt
- [About](https://nilsmatteson.com/about.md): background, education, founder context, and how to reach him
- [For agents](https://nilsmatteson.com/agents.md): factual bio, the verifiable receipts, and how to evaluate the work

## Work

- [Work](https://nilsmatteson.com/work.md): the full ledger of shipped work and selected projects
- [thaw](https://nilsmatteson.com/work/thaw.md): Snapshot and restore live LLM inference state so a vLLM session forks in 0.88s median instead of a ~340s cold boot. Built in Rust, CUDA, and Python.
- [Matteson Systems](https://nilsmatteson.com/work/matteson-systems.md): An autonomous outreach system that finds local businesses losing customers to quiet website problems, audits each site, and builds every owner a plain-English scorecard with the fix.
- [Sentinel](https://nilsmatteson.com/work/sentinel.md): A Kafka-inspired distributed log engine written from scratch in Go: LSM-tree storage, skip-list memtables, Raft consensus, and a deterministic network simulator to test it.
- [Madison Metro ML](https://nilsmatteson.com/work/madison-metro-ml.md): Live ML that corrects the transit API's ETAs. A 47-feature XGBoost model plus Mondrian conformal prediction, calibrated to 90% coverage, retrained nightly behind a hard deploy gate.

## Writing

- [Building a Real-Time Bus Prediction System for Madison Metro](https://nilsmatteson.com/writing/madison-bus-eta.md): Live ML that corrects the transit API's ETAs with a 47-feature XGBoost model and Mondrian conformal prediction, retrained nightly behind a hard deploy gate. (2026-03-04)
- [Deploying RAG in AWS Bedrock: Benchmarking 9 LLMs on the WattBot Challenge](https://nilsmatteson.com/writing/wattbot-rag.md): Ensemble majority voting beat every individual model. The highest-citation model finished last. A serverless RAG pipeline on Bedrock with full cost tracking. (2026-02-17)
- [Building a Speculative Decoding Engine from Scratch](https://nilsmatteson.com/writing/project-gorgon.md): Custom Triton kernels, tree-structured attention, four bugs, and an honest negative result: 0.66x baseline. The full arc from 0.08x to 0.66x and what it taught me. (2026-02-12)

## Contact

- Email: nilsmatteson@icloud.com
- GitHub: https://github.com/matteso1
- LinkedIn: https://www.linkedin.com/in/nilsmatteson
- Resume: https://nilsmatteson.com/resume.pdf
- thaw: https://thaw.sh and on PyPI as thaw-vllm
- Open to: SWE/MLE internship summer 2027 and full-time 2028 (GPU inference, distributed systems, ML infrastructure). Currently full-time on thaw.

## Optional

- [Full text bundle](https://nilsmatteson.com/llms-full.txt): every page inlined as Markdown in one file