Get Out of the Model's Way

When something doesn't work, the instinct is to add more. More guardrails. More tools. More structure. With LLMs, this instinct is often wrong. Paradoxically, AI engineers are building elaborate systems to constrain models that are now smarter than the constraints themselves. We're doing the model'

Watch my AI Engineering talk: How Claude Code Works

A few weeks ago, I gave a talk at the legendary AI Engineering Summit. It’s titled “How Claude Code Works.” Claude Code completely changed how our engineering org functions. It really feels like a “moment” in this space. Importantly, it represents a new standard for building autonomous agents.

Every agent should be a VM

There is no doubt that OpenAI's Codex CLI and Anthropic's Claude Code agents are order-of-magnitude shifts in what we can expect from coding agents. I recently did a deep dive and wrote some articles on how Claude Code works and how OpenAI Codex works behind the scenes.

Bringing the Fundamentals to AI Engineering

AI engineering is a new discipline, but that doesn't mean we should throw out everything we know about engineering. The same fundamentals apply: de-scope ruthlessly, think in functions, and don't build what you don't need. Too many are skipping the fundamentals. Marketing Outpaced Reality

PromptLayer Bakery Demo

We recently built a demo website called Artificial Indulgence to showcase how PromptLayer works in a real-world application. It's a fake bakery site, but everything you see is powered by live prompts managed through PromptLayer. Let me walk you through how it all works. The Setup The bakery

Orchestrating Agents at Scale (OpenAI DevDay Talk)

Building dynamic UIs for complex agentic workflows used to take months of custom engineering—now it takes minutes. At OpenAI's DevDay 2025, the company unveiled AgentKit, a revolutionary toolkit that transforms how developers build, deploy, and optimize AI agents. This isn't just another incremental update; companies

How OpenAI's Deep Research Works

OpenAI's Deep Research can accomplish in 30 minutes what takes human researchers 6-8 hours—and it's powered by a specialized reasoning model that autonomously browses the web, reads dozens of sources, and produces fully cited reports. Deep Research represents a new category of agentic AI that

OpenAI RL Fine-Tuning: What you need to know and when you should use it

OpenAI dropped Reinforcement Fine-Tuning (RFT) in late 2024, bringing academic RL techniques to everyday developers—one AI researcher called it "RL for the masses." Unlike traditional fine-tuning where models memorize examples, RFT trains models through trial-and-error with rewards, creating a breakthrough moment for developers trying to squeeze better
