What we're learning, in writing.
Long-form notes from the studio — production lessons, dead ends, surprising wins, and the occasional opinion. Written by hand, never by autocomplete. New post every other week, more or less.
The trust ceiling: why AI in regulated work fails at the moment that matters most
Some workloads sit on the far side of a line that better models alone can't cross. The architecture decisions that get a system above the trust ceiling are mostly not about better prompts — they are about what the system refuses to do, what it cites, what it owns versus what it routes to a human, and what it never computes. Trust is engineered, not inherited from the model.
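The note goes deeper, but the shape of the gate fits in a few lines. A minimal sketch in Python, with illustrative names throughout: the `Action` enum, the 0.75 floor, and the `regulated_topic` flag are ours for this example, not lifted from any real system.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Action(Enum):
    ANSWER_WITH_CITATIONS = auto()  # the system owns the answer
    ROUTE_TO_HUMAN = auto()         # the system owns the handoff
    REFUSE = auto()                 # the system owns the refusal

@dataclass
class Draft:
    text: str
    citations: list[str]   # sources the answer is grounded in
    groundedness: float    # 0..1, how well the text is supported
    regulated_topic: bool  # flagged by deterministic rules, not by the model

# Illustrative threshold; a real system would calibrate this per corpus.
GROUNDEDNESS_FLOOR = 0.75

def decide(draft: Draft) -> Action:
    """The trust ceiling lives here, not in the prompt."""
    if not draft.citations:
        return Action.REFUSE              # no sources, no answer
    if draft.groundedness < GROUNDEDNESS_FLOOR:
        return Action.REFUSE
    if draft.regulated_topic:
        return Action.ROUTE_TO_HUMAN      # cited draft, but a human signs off
    return Action.ANSWER_WITH_CITATIONS
```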
Determinism at the floor. AI at the judgment layer.
Most AI products we get asked to inherit have one thing in common: an LLM call where there should be a function. AI is most valuable where the input is genuinely ambiguous; everywhere else, deterministic code is faster, cheaper, and debuggable. The skill is in knowing the boundary — and refactoring AI out of systems that didn't need it is one of the more common improvements we make.
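A toy version of the boundary, where `llm_extract` is a stand-in for a real model call, not an actual API:

```python
import re

def parse_quantity(text: str) -> int | None:
    """Deterministic first: fast, free, and debuggable."""
    m = re.fullmatch(r"\s*(\d+)\s*(?:x|pcs|units?)?\s*", text, re.IGNORECASE)
    if m:
        return int(m.group(1))
    return None

def llm_extract(text: str, field: str) -> int:
    """Stand-in for a model call; a real system would prompt an LLM here."""
    raise NotImplementedError("only genuinely ambiguous inputs reach this point")

def extract_quantity(text: str) -> int:
    qty = parse_quantity(text)
    if qty is not None:
        return qty
    # Only the ambiguous residue ("a couple, maybe three?") earns a model call.
    return llm_extract(text, field="quantity")
```

Everything the regex can handle never touches a model; the LLM sees only the inputs a function can't decide.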
Subject-agnostic engines, specific first users
"Build a generic system, find market" sounds clean and produces a tepid product. Build for one specific user, then abstract upward — slower-sounding, but the engine you end up with is sharper than anything market-research could have spec'd. The first specific instance is not a stepping stone to the engine; it's the proof of the engine.
Build for one specific person before you build for a market
A real human's actual problem is a sharper spec than any persona document. Four projects in, the studio's pattern has stopped being an accident: every system we've shipped started life as one named person's mess. Building for a real user forces editorial restraint — a persona will never close the tab on you, but a real first user will.
Loyalty by architecture, not by feature
Most loyalty programs are features — points, discounts, a stamps card. Features can be removed in the next sprint. Architecture choices can't. A different way to think about retention for vertical SaaS, and why we built our auto-services product around per-shop instances with QR onboarding instead of a marketplace.
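To make "architecture, not feature" concrete, here is a hypothetical sketch of per-shop provisioning; every name and field below is illustrative, not the product's real schema.

```python
import uuid

def provision_shop(name: str, base_domain: str = "example-autoservices.app") -> dict:
    """Hypothetical per-shop provisioning; field names are for illustration only."""
    shop_id = uuid.uuid4().hex[:8]
    url = f"https://{shop_id}.{base_domain}"
    return {
        "shop": name,
        "instance_url": url,             # the shop's own instance, not a listing
        "onboarding_qr": f"{url}/join",  # printed at the counter; customers scan in
        "data_isolated": True,           # churn means migrating an instance,
    }                                    # not toggling a feature flag
```

A points feature can be deleted in a sprint; an instance a shop's customers have already scanned into cannot.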
Why money beats badges for kids who won't sit through a lecture
Gamification became the default motivation engine for ed-tech. For some learners it works. For the kids who can't focus on conventional study — too deep in games, too distracted, too unmoved by lectures — gamification is just more lectures with stickers. What works is transactional reward.
AI's most underrated job is helping families remember
Millions of dormant digital archives sit on phones — screenshots saved and forgotten, photos accumulated without curation. The sheer volume makes them unreadable. AI on local hardware is the first technology that can make them useful again without sending decades of personal data to a vendor. The hardest problem isn't recognition; it's restraint.
Codified is not implemented
Writing a rule down is not the same as enforcing it. Long-running builds drift; sessions inherit assumptions; a simplification two months later silently undoes a hard-won lesson. The fix is mechanical — pre-flight scripts, sentinel values, smoke gates that refuse to launch when an invariant has drifted.
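A minimal pre-flight gate, sketched in Python; the sentinel string, file paths, and invariants are invented for illustration.

```python
import json
import sys

# Illustrative invariants; the real list is project-specific.
SENTINEL = "DO-NOT-REMOVE:retry-backoff-2024-03"

def preflight() -> list[str]:
    failures = []
    # 1. Sentinel check: the hard-won lesson is still in the code.
    if SENTINEL not in open("services/ingest.py").read():
        failures.append("retry backoff was simplified away again")
    # 2. Config invariant: a value that drifted once before.
    cfg = json.load(open("config.json"))
    if cfg.get("max_batch") and cfg["max_batch"] > 32:
        failures.append("max_batch drifted above the GPU-safe ceiling")
    return failures

if __name__ == "__main__":
    problems = preflight()
    if problems:
        print("PREFLIGHT FAILED:", *problems, sep="\n  - ")
        sys.exit(1)  # the gate refuses to launch; the rule enforces itself
```

The rule stops being a sentence in a doc and becomes an exit code.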
Refusal is a feature, not a fallback
Most AI products are tuned to always answer. For high-stakes domains — legal, medical, financial advisory, regulated work — the system that knows when to refuse is the only one practitioners can build a workflow around. The evolution from threshold-based refusal to corpus-intrinsic groundedness, and why local-first hardware makes it tractable.
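For reference, the threshold-based baseline the note evolves away from looks roughly like this; the scores and the 0.6 floor are illustrative.

```python
def answer_or_refuse(question: str, retrieved: list[tuple[str, float]],
                     floor: float = 0.6) -> str:
    """Threshold-based refusal: the naive baseline.

    `retrieved` is (passage, similarity score); numbers are illustrative.
    """
    support = [passage for passage, score in retrieved if score >= floor]
    if not support:
        # Refusing is the feature: in legal or medical work,
        # a wrong answer costs more than no answer.
        return "I can't answer that from the sources I have."
    return generate_answer(question, support)

def generate_answer(question: str, passages: list[str]) -> str:
    raise NotImplementedError("model call goes here")
```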
How we work with AI assistants — the fictional-council methodology
Building with AI assistants produces a particular failure mode: first-draft inertia. The first plan ships because it's plausible and nobody pushed back hard enough. Here's the deliberate-friction methodology we use — simulated peer review by fictional archetypes — to keep AI-led builds honest.
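Mechanically, the loop can be as small as this sketch; the archetypes and prompts here are invented stand-ins, not the actual council.

```python
# Invented archetypes; the real council is described in the note.
COUNCIL = {
    "the skeptic":  "Find the assumption most likely to be wrong.",
    "the operator": "What breaks at 3 a.m. when this is in production?",
    "the newcomer": "What would confuse someone reading this plan cold?",
}

def council_review(plan: str, ask_model) -> dict[str, str]:
    """Run the first draft past every archetype before it ships.

    `ask_model` is any callable(prompt) -> str; bring your own LLM client.
    """
    return {
        role: ask_model(f"You are {role}. {brief}\n\nPlan:\n{plan}")
        for role, brief in COUNCIL.items()
    }
```

The point is the friction, not the personas: the first plausible plan has to survive three hostile readings before anyone types a line of it.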
Building production AI on used crypto-mining cards
The previous era's mining rigs are this era's AI rigs. We built a multi-GPU local inference rig out of cards bought during the mining boom, now obsolete by GPU standards but still doing real work. Why heterogeneous fleets need explicit role pinning, why throughput is not latency, and the operational rules nobody tells you about.
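Role pinning in practice can be as blunt as setting `CUDA_VISIBLE_DEVICES` per service; the fleet layout below is illustrative.

```python
import os
import subprocess

# Illustrative fleet: heterogeneous cards must not be load-balanced blindly.
ROLES = {
    "embeddings": {"gpus": "0,1", "cmd": ["python", "serve_embed.py"]},   # two small cards
    "generation": {"gpus": "2",   "cmd": ["python", "serve_llm.py"]},     # the big card
    "reranking":  {"gpus": "3",   "cmd": ["python", "serve_rerank.py"]},  # the slow card
}

def launch(role: str) -> subprocess.Popen:
    """Pin each service to its cards so a scheduler can't mix them up."""
    spec = ROLES[role]
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": spec["gpus"]}
    return subprocess.Popen(spec["cmd"], env=env)
```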
On chunking long-form documents — and why structure beats size
Generic chunkers fail on documents with hard structural rules — statutes, contracts, regulatory filings. A "Provided that..." clause is meaningless without the parent section it modifies. The fix isn't a fancier embedder; it's a chunker that understands what the document is before it slices anything. The two-pass approach that worked.
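A toy two-pass chunker, with a simplified heading regex standing in for real structural parsing:

```python
import re

SECTION_RE = re.compile(r"^(Section \d+[A-Z]?\..*)$", re.MULTILINE)

def two_pass_chunk(doc: str, max_chars: int = 1500) -> list[str]:
    """Pass 1: learn the structure. Pass 2: slice within it."""
    # Pass 1: split on section headings so no chunk straddles a boundary.
    parts = SECTION_RE.split(doc)
    sections = [(parts[i], parts[i + 1]) for i in range(1, len(parts) - 1, 2)]

    chunks = []
    for heading, body in sections:
        # Pass 2: size-limit within the section, but every chunk carries its
        # parent heading, so a "Provided that..." proviso keeps the section
        # it modifies.
        for start in range(0, len(body), max_chars):
            chunks.append(f"{heading}\n{body[start:start + max_chars]}")
    return chunks
```

Nothing here is clever; the fix is simply refusing to slice before the structure is known.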
Picking an inference runtime for production
A popular open-source inference server is great for desktop chat. It is not great for ingesting a national-scale corpus at production throughput. We learned this the slow way. This is the short version of the story, including the "what we tried vs. what failed" table so you don't have to redo it.
Don't trust the cloud: when local-first AI is the right answer
Most AI consultancies build you a wrapper around an OpenAI key. Six months in, you're locked in, your data lives in their logs, and your costs scale with every user you add. Here's when local-first is the right call, when it isn't, and the cost math that surprised us when we worked it out properly.
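The shape of the cost math is easy to sketch; every number below is a placeholder assumption, not a figure from the note.

```python
# All numbers are hypothetical placeholders; substitute your own.
API_COST_PER_1M_TOKENS = 10.00       # USD, blended input/output
TOKENS_PER_USER_PER_MONTH = 500_000  # assumed usage
RIG_COST = 8_000.00                  # used GPUs + chassis, one-time
RIG_POWER_PER_MONTH = 120.00         # electricity, assumed

def monthly_cost(users: int, months_amortized: int = 24) -> tuple[float, float]:
    api = users * TOKENS_PER_USER_PER_MONTH / 1e6 * API_COST_PER_1M_TOKENS
    local = RIG_COST / months_amortized + RIG_POWER_PER_MONTH
    return api, local  # the API bill scales with every user; the rig does not

for n in (10, 100, 1000):
    api, local = monthly_cost(n)
    print(f"{n:>5} users: API ${api:>9,.2f}/mo  vs  local ${local:,.2f}/mo")
```

Run it and the crossover point falls out; where it lands depends entirely on your token volumes and hardware, which is exactly why it's worth working out properly.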
Get the next note in your inbox.
Every other week. No tracking pixels. No "sponsored by." No AI-generated filler. Unsubscribe anytime in one click.