By Scott Armbruster

When 'Show Me' Accidentally Means 'Do It'


Everything I do each day gets logged — personal projects, client work, business operations, the full picture. Each morning a pipeline reads those session logs and turns them into this post. Nothing is hypothetical. After 23 years in tech, this is what building with AI looks like when you actually do it every day: real friction, real fixes, patterns you only see from the inside.

Today’s Highlights

  • Two words, “job status,” accidentally fired every scheduled task in the system at once
  • Ran five completely unrelated client projects concurrently without a single context leak
  • The curation problem I keep circling: why “relevant” content keeps being useless to actual readers

The Most Dangerous Verb Is “Do”

Someone typed “job status” into a conversational interface today. Two words. The intent was obvious: show me what’s running. Instead, the system interpreted “job” as a trigger and fired every scheduled task. Content pipelines, domain checks, morning dispatches, the full catalog. All at once.

Nobody lost data. Nothing broke permanently. But it’s a category of bug that gets more common as AI systems get more capable. The system was too eager to act. It heard a noun associated with actions and skipped straight past observation into execution.

This isn’t new. Separating reads from writes is the foundation of the CQRS patterns from the 2010s. But AI makes the problem worse because natural language is inherently ambiguous. “Run my reports” could mean “show me my reports” or “execute the report generation pipeline.” In a traditional UI, those are different buttons. In a conversational interface, they’re the same sentence.

So now, before wiring any action to a conversational trigger, I ask three things. Is this reversible? If not, it requires explicit confirmation. Does the noun map to exactly one verb? If “job” could mean “show jobs” or “run jobs,” that’s an ambiguity problem that needs structural resolution, not better prompting. And what happens if this fires accidentally at 3am? If the answer is “nothing good,” the default should be observation, not action.
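
Here’s roughly what that checklist could look like as code, applied once when an action is registered. Every name below is hypothetical; it’s the shape of the gate, not an implementation from any real system.

```typescript
// Hypothetical shape for a conversational action; none of these names come
// from a real framework. They just make the checklist concrete.
type ActionSpec = {
  noun: string;                    // e.g. "job"
  verb: "show" | "run" | "delete";
  reversible: boolean;
  nounIsAmbiguous: boolean;        // could the noun map to more than one verb?
};

type GateResult =
  | { allowed: true; requiresConfirmation: boolean }
  | { allowed: false; reason: string };

// The three questions, applied at registration time, not at runtime.
function gate(action: ActionSpec): GateResult {
  // Does the noun map to exactly one verb? If not, fix the structure,
  // not the prompt: refuse to wire anything beyond observation.
  if (action.nounIsAmbiguous && action.verb !== "show") {
    return {
      allowed: false,
      reason: `"${action.noun}" maps to more than one verb; split the route`,
    };
  }
  // Is this reversible, and what happens if it fires accidentally at 3am?
  // Anything irreversible or mutating gets explicit confirmation.
  const needsConfirmation = !action.reversible || action.verb !== "show";
  return { allowed: true, requiresConfirmation: needsConfirmation };
}
```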

The fix was straightforward: the status command now routes through a read-only path that can’t trigger execution. Conversational AI defaults to helpfulness, and helpfulness often means doing things. You have to explicitly architect restraint.
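
The structural version of that restraint, roughly: queries and commands live in separate routing tables, and the query table has no handle on the scheduler at all. A minimal sketch with made-up handler names, assuming a dispatcher you control:

```typescript
type QueryHandler = () => Promise<string>;
type CommandHandler = () => Promise<void>;

// Two separate tables. The query table holds no reference to the scheduler,
// so a status request physically cannot trigger execution.
const queries = new Map<string, QueryHandler>();
const commands = new Map<string, CommandHandler>();

async function listScheduledJobs(): Promise<string> {
  return "3 jobs scheduled, 0 running"; // stand-in for a real read
}

queries.set("job status", listScheduledJobs);                   // observation
commands.set("run jobs", async () => { /* fire the pipeline */ }); // action

async function dispatch(utterance: string, confirmed = false): Promise<string> {
  const query = queries.get(utterance);
  if (query) return query(); // the read-only path is the default

  const command = commands.get(utterance);
  if (!command) return "Nothing matched.";
  if (!confirmed) return "This will execute tasks. Confirm to proceed.";
  await command();
  return "Done.";
}
```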

Five Contexts, Zero Bleed

Today’s work touched five completely separate client contexts. Editorial curation across business and policy verticals. A healthcare-adjacent project publishing niche content. A SaaS-focused operation doing tool research and reviews. A startup context covering productivity workflows. An enterprise engagement spanning travel and self-help content.

Roughly four hundred interactions across contexts that share absolutely nothing. Different audiences, different quality standards, different definitions of relevance.

What makes this work isn’t discipline or some superhuman ability to context-switch. Each context runs in complete isolation. Separate brand guides loaded fresh. Separate evaluation criteria. Separate research phases that don’t carry assumptions from the previous project. When the healthcare context finishes, nothing about it persists into the SaaS context. No shared state. No residual framing.

The failure mode is letting contexts bleed. Using the same session, the same conversation thread, the same ambient assumptions across different projects. The AI happily obliges, bringing the tone of your morning client into your afternoon one. You don’t notice until someone points out that your healthcare content reads like a SaaS landing page.

Every distinct audience or client gets a separate runtime environment. Not a different prompt in the same session. A different session entirely. Load the rules fresh. Give it no memory of what came before. The overhead of spinning up context from scratch is real, maybe thirty seconds per project. The cost of context contamination is invisible until it’s embarrassing.
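
What that looks like in practice, sketched with hypothetical names (this isn’t any vendor’s SDK, just the shape of the isolation):

```typescript
// Each client context is a fresh session: its own brand guide as the
// system prompt, an empty history, no state shared with any other session.
type Session = {
  client: string;
  systemPrompt: string; // brand guide loaded fresh, every time
  history: string[];    // starts empty: no residual framing
};

function loadBrandGuide(client: string): string {
  // In practice this would read a per-client file; stubbed for the sketch.
  return `You are writing for ${client}. Follow its voice and quality bar.`;
}

function freshSession(client: string): Session {
  return { client, systemPrompt: loadBrandGuide(client), history: [] };
}

// Five contexts, five sessions. When one finishes, nothing persists.
const sessions = ["editorial", "healthcare", "saas", "startup", "enterprise"]
  .map(freshSession);
```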

Curation Requires a Theory of the Reader

Part of today involved editorial curation across four very different verticals. Not generating content. Selecting and synthesizing from large pools of source material. Choosing which five stories out of thirty actually matter for a specific audience.

The thing that keeps nagging me about curation at scale: the quality bar isn’t a single number. It’s a function of the audience. A story about regulatory changes in small business lending is critical for one vertical and noise for another, even though both audiences are “business readers.” The difference is intent. One audience reads to make operational decisions this week. The other reads to understand market trends this quarter.

Every curation context needs what I’ve started calling a theory of the reader. Not just “who are they” but “what are they about to do with this information?” A reader who’s about to make a hiring decision needs different signals than a reader building a quarterly strategy deck. Same topic. Different selections. Different emphasis. Different omissions.
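
If I had to write that down as a structure, it might look something like this. The fields are assumptions, not a real schema; the point is that intent becomes explicit instead of inferred.

```typescript
// Two audiences on the same topic, two theories, two different feeds.
type ReaderTheory = {
  audience: string;                        // who they are
  intent: "decide" | "monitor" | "learn";  // what they'll do with the info
  horizon: "this-week" | "this-quarter";   // when it has to matter
  omit: string[];                          // topics that are noise for them
};

const lendingOperators: ReaderTheory = {
  audience: "small-business operators",
  intent: "decide",
  horizon: "this-week",
  omit: ["macro trend commentary"],
};

const marketWatchers: ReaderTheory = {
  audience: "business analysts",
  intent: "learn",
  horizon: "this-quarter",
  omit: ["operational how-tos"],
};
```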

Most curation systems I’ve seen, including ones I’ve built, optimize for relevance. Semantic similarity. Keyword overlap. But relevance without intent-awareness produces feeds that are interesting without being useful. And “interesting but not useful” is the content equivalent of empty calories.
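
The difference shows up at the selection step. A sketch, assuming a relevance score already exists from whatever similarity measure the system uses:

```typescript
type Story = {
  title: string;
  relevance: number;     // semantic similarity, keyword overlap, etc.
  servesIntent: boolean; // does it help what the reader is about to do?
};

// Relevance alone ranks "interesting"; the intent filter keeps "useful".
function curate(pool: Story[], take: number): Story[] {
  return pool
    .filter((s) => s.servesIntent)             // hard gate on intent first
    .sort((a, b) => b.relevance - a.relevance) // then rank by relevance
    .slice(0, take);                           // five out of thirty
}
```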

I don’t have a clean solution for encoding reader intent into automated curation criteria. Right now it’s baked into each brand guide as a set of editorial priorities, but that feels like a workaround, not an architecture. The criteria are static descriptions of a dynamic relationship between reader and information. The moment a new regulation drops or a market moves, the static criteria lag behind by exactly one cycle. Maybe that’s fine. Maybe one cycle of lag is the cost of automation. But I keep wondering whether there’s a feedback loop I haven’t built yet that would close that gap in real time, and whether closing it would actually matter or just add complexity for its own sake.