The workbook.

build logs and field notes

8 posts · ~1×/week

What I build, what I break, and what I learn along the way. On-prem AI on a DGX Spark, agents, MCP servers and evaluations. Not a from-scratch introduction, but notes for once you're already building with it.


★ Featured

23-06-26 On-prem AI 7 min

Gemma-4 v23 on the DGX Spark

New vLLM v0.23.0 runs for Gemma-4 on the DGX Spark: BF16, NVFP4 and MTP compared across decode, TTFT, tails and practical local-agent limits.



001 23-06-26

Gemma-4 v23 on the DGX Spark

New vLLM v0.23.0 runs for Gemma-4 on the DGX Spark: BF16, NVFP4 and MTP compared across decode, TTFT, tails and practical local-agent limits.

On-prem AI 7 min

002 22-05-26

The three numbers behind a fast DGX Spark

Decode, prefill and queueing: three numbers decide whether a DGX Spark feels fast under a real workload, and those three are exactly what most reviews skip.

On-prem AI 5 min

003 05-05-26

Why this blog and arena exist

I looked for concrete numbers on local AI on the DGX Spark and never found them. So I measure them myself, building the blog and arena as an open workbench.

Reflections 7 min

004 03-05-26

Gemma-4 on the DGX Spark: NVFP4 vs BF16

Nine identical benchmarks, two precisions. NVFP4 runs 22 to 92 percent faster per token, and peak-hour capacity grows 69 percent on the Spark.

On-prem AI 14 min

005 03-05-26

Nemotron-3 on the DGX Spark: BF16 vs FP8 vs NVFP4

One model, three precisions, the same Spark. What happens to memory budget, decode speed and tail latency when you go from 16-bit to 8-bit to 4-bit.

On-prem AI 18 min

006 01-05-26

Gemma-4 on the DGX Spark: the price of context

Nine benchmarks of Gemma-4-26B-A4B-it on the DGX Spark with llama-benchy and vLLM. Decode holds up; prefill and queueing decide how it feels.

On-prem AI 27 min

007 01-05-26

I put a 24/7 assistant on a Raspberry Pi

A build log about OpenClaw on a Raspberry Pi 5: Slack as the interface, GPT-5.5 as the model, and the Pi as an always-on agent layer next to the DGX Spark.

On-prem AI 8 min

008 01-05-26

What quantization turned out to be

A practical look back at quantization on the DGX Spark: what BF16, FP8 and NVFP4 do to memory, speed and tail latency, after three rounds with vLLM.

On-prem AI 8 min

