
Get Out of the Model's Way

Jan 13, 2026

When something doesn't work, the instinct is to add more. More guardrails. More tools. More structure. With LLMs, this instinct is often wrong.

Paradoxically, AI engineers are building elaborate systems to constrain models that are now smarter than the constraints themselves. We're doing the model's thinking for it… and the model can think better than our scaffolding allows.

Andrew Qu, the engineer behind Vercel's internal text-to-SQL agent, recently shared this exact story. He deleted 80% of their tools and gave the model access to bash and their raw semantic layer files. Accuracy jumped, response time dropped 3.5x, and token usage fell by 37%.

"We simplified our multi-phase, many-tool text-to-SQL agent down to a file system agent with bash. The simplification led to significantly improved performance and re-iterated the bitter lesson of not trying to fight against the model's ability.” - Andrew Qu (Vercel)

Every tool you add is a choice you are forcing the model to make. Models are smart, and sometimes less is more.

Don’t Fight Gravity

We used to build complicated systems to manage the "probabilistic" nature of models:

  • Complex DAGs with routing between layers, different prompts, different tools
  • Deterministic checks and guardrails everywhere
  • Custom abstractions and specialized tool calls
  • Heavy prompt engineering to constrain reasoning

Every edge case meant another patch. Every model update meant recalibrating all your constraints. We were fighting gravity. We were forcing models into boxes they'd already outgrown.

The Vercel agent mentioned above had this exact setup. Multiple specialized tools: GetEntityJoins, LoadCatalog, RecallContext, LoadEntityDetails, SearchCatalog, ClarifyIntent, SearchSchema, GenerateAnalysisPlan, FinalizeQueryPlan, SyntaxValidator... the list goes on.

All that scaffolding was solving problems the model could handle on its own. They assumed Claude would get lost in complex schemas, make bad joins, or hallucinate table names. So they built guardrails for every turn.

I've seen this pattern everywhere. One of our customers had hundreds of nodes in their agent DAG. They claimed complexity was their moat. Suffice it to say, they've since rebuilt their entire stack.

Work Downhill, Not Uphill

Models are trained on human data. It’s easier to lean into what they already know than to try to teach them something new.

If you're trying to teach a model a custom format or bespoke abstraction, you're working uphill. The model has way less training data on your special thing, so it's going to do a worse job. Work downhill instead.

Standard languages beat custom DSLs. LLMs are amazing at writing Python, TypeScript, and SQL. But they have never heard of your proprietary query language.

This is increasingly obvious to the top AI teams. They started with bespoke XML languages designed for agents, then simplified to markdown results and tool calling. Now most have simplified again, and agents just write TypeScript directly.

Models are so good at regular coding languages. It's often harder to prompt or fine-tune the model than it is to just change your codebase.

Don't reinvent the wheel. If you can use Python or SQL, adapt your application to it, not vice versa.

This is somewhat controversial. Tool calling began as a custom abstraction. You could argue MCP is bespoke as well. However, now that model labs are training models around these paradigms, it’s good AI engineering to use them.
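
To make that concrete, here is a minimal sketch of what "working downhill" looks like as a tool surface: a single tool that accepts plain SQL, defined in the JSON-schema shape most tool-calling APIs accept (field names vary by provider, e.g. input_schema vs. parameters). The tool name, handler, and warehouse client are hypothetical, not Vercel's actual implementation; the point is the surface area.

```ts
// A minimal sketch, not any particular product's schema.
// One "downhill" tool: the model writes plain SQL it has seen
// millions of times in training, instead of a proprietary DSL.
const runSqlTool = {
  name: "run_sql",
  description:
    "Execute a read-only SQL query against the analytics warehouse " +
    "and return the rows as JSON.",
  // JSON Schema: the shape most tool-calling APIs expect.
  input_schema: {
    type: "object",
    properties: {
      query: {
        type: "string",
        description: "A single SELECT statement in standard SQL.",
      },
    },
    required: ["query"],
  },
} as const;

// Hypothetical handler: wire this to your warehouse client of choice.
async function handleRunSql(input: { query: string }): Promise<string> {
  // e.g. return JSON.stringify(await warehouse.query(input.query));
  throw new Error("connect your database client here");
}
```

Contrast that with the GetEntityJoins / SearchSchema / FinalizeQueryPlan surface above: every extra tool is another choice the model has to get right.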

Bash & File Systems: The Universal Tool Call

There's an important distinction between two types of problems you might use AI to solve:

  • Exploratory problems don't have a defined workflow. You want the agent to try things, explore, check results, try other things. These are the problems where you want to get out of the model's way.
  • Defined workflow problems have a specific structure: an email template, report format, or mass email engine with strict design specs. Constraints make sense and you don't want ambiguity.

For exploratory problems, the answer is increasingly clear: give the model a file system and bash access.

The Vercel team has even open-sourced tooling to make this easy and secure: bash-tool and just-bash, for example, let you spin up filesystem agents without reinventing sandboxing.

I wrote about this recently: both Claude Code and OpenAI's Codex use remarkably simple architectures. No complex DAGs. Just a two-layer while loop where the model can call bash commands.
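
As a rough illustration (not Claude Code's or Codex's actual source), the whole control flow fits in two nested loops. `callModel` here is a hypothetical helper around whichever LLM API you use; it reads the transcript and returns either a bash command to run or a final answer.

```ts
import { execSync } from "node:child_process";
import * as readline from "node:readline/promises";

type Step =
  | { kind: "bash"; command: string }
  | { kind: "answer"; text: string };

// Hypothetical helper: wrap your model client of choice so it returns
// either a bash command to run or a final answer for the user.
async function callModel(transcript: string[]): Promise<Step> {
  throw new Error("wire up your LLM client here");
}

async function main() {
  const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
  const transcript: string[] = [];

  // Outer loop: one iteration per user message.
  while (true) {
    const user = await rl.question("> ");
    transcript.push(`USER: ${user}`);

    // Inner loop: let the model run bash until it is ready to answer.
    while (true) {
      const step = await callModel(transcript);
      if (step.kind === "answer") {
        console.log(step.text);
        transcript.push(`ASSISTANT: ${step.text}`);
        break;
      }
      let output: string;
      try {
        output = execSync(step.command, { encoding: "utf8", timeout: 60_000 });
      } catch (err) {
        output = `ERROR: ${(err as Error).message}`;
      }
      transcript.push(`$ ${step.command}`, output);
    }
  }
}

main();
```

That's essentially the whole architecture: a transcript, a shell, and whatever the model decides to do with them.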

The file system becomes your context manager. grep, cat, and ls are 50 years old, heavily represented in the training data, and still do exactly what we need.

I often see Claude Code creating and running Python scripts solely as a way to test its own code. It writes a one-off Python file, runs it, checks the output, and moves on. No specialized "test runner" tool needed.

Code is the interface. Code is the handshake. Code is the ground truth.

It’s common to hear people ask: will there be a new high-level programming language invented now that LLMs write our standard code? No. Code is infinitely extendable, a deterministic source of truth, and thus the universal tool call.

AGI-Pilled Development

Build assuming models will keep getting dramatically better.

Today, you might need extra guardrails to ship something that works. But avoid them where you can: every new model is still a step-function improvement, and everyone upgrades to the newest release without thinking twice.

Start with the simplest possible architecture: model + file system + goal. Add complexity only when you've proven it's necessary.

Clever scaffolding has a short half-life. You will be re-writing it every few months with every new model.

Often, the best thing you can do is get out of the model's way. Stop making choices for it. Stop constraining its reasoning. Stop building tools to protect it from complexity it can handle.

Behave like a human. Work downhill. Let the model cook.
