Full transcript (Instant)

Introducing Codex | OpenAI

OpenAI just ended the "copilot" era and launched the "coworker" era—Codex isn't a chatbot that helps you type, but an autonomous agent that clones your repo, runs your tests, and commits finished code

openai.com

Gist

1.

OpenAI's new "Codex" agent isn't just a coding assistant; it's a cloud-based software engineer that works in parallel, writes features, fixes bugs, and proposes pull requests—all while running its own tests in an isolated sandbox. This marks a shift from AI as a tool to AI as a team member.

Logic

2.

Codex operates as an autonomous, cloud-based software engineer

  • Codex is a "cloud-based software engineering agent" that performs tasks like writing features, fixing bugs, and proposing pull requests
  • Each task runs in its own isolated cloud sandbox, preloaded with the repository, allowing parallel execution
  • It can read and edit files, run commands such as tests and linters, and commit changes on completion, citing terminal logs and test outputs as verifiable evidence

3.

Trained on real-world tasks, Codex aligns with human coding preferences

  • Powered by codex-1, a version of o3 optimized for software engineering and trained with reinforcement learning on real-world coding tasks
  • Generates code that mirrors human style and pull-request preferences, adheres to instructions, and iteratively runs tests until they pass
  • Internal benchmarks show strong performance even without explicit AGENTS.md configuration files
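The "iteratively runs tests until they pass" behavior can be sketched as a bounded retry loop. Here `run_tests` and `propose_fix` are hypothetical stand-ins for the agent's real tooling, which the source does not describe:

```python
def iterate_until_green(run_tests, propose_fix, max_attempts=5):
    """Re-run the test suite, applying a proposed fix after each failure.

    run_tests() -> (passed: bool, log: str)
    propose_fix(log) applies a revision based on the failure log.
    Both callables are illustrative stand-ins, not Codex internals.
    """
    for attempt in range(1, max_attempts + 1):
        passed, log = run_tests()
        if passed:
            return True, attempt
        propose_fix(log)  # revise the code using the failure log as context
    return False, max_attempts
```

Bounding the attempts matters: an agent that cannot make the suite pass should report the failure (as the source says Codex does) rather than loop forever.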

4.

AGENTS.md files allow human developers to guide AI behavior

  • Similar to README.md, these text files inform Codex on codebase navigation, testing commands, and project standards
  • Instructions in AGENTS.md files apply to the directory tree they're in, with nested files taking precedence
  • This mechanism allows human developers to "program" the AI's behavior and integrate it into existing workflows
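A hypothetical AGENTS.md illustrating the kind of guidance described above; the source documents the mechanism, not these specific contents, so every path and command here is an assumption:

```markdown
# AGENTS.md (repo root)

## Navigation
- Application code lives in `src/`; tests mirror it under `tests/`.

## Testing
- Run `pytest -q` before committing; all tests must pass.

## Style
- Follow the linter config in `setup.cfg`; run `flake8 src/` before proposing a change.
```

A more deeply nested AGENTS.md (say, in `src/legacy/`) could override any of these rules for just that module.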

5.

Safety and transparency are prioritized through verifiable outputs and refusal of malicious tasks

  • Released as a "research preview," Codex provides citations of terminal logs and test outputs for users to verify its actions
  • It explicitly communicates uncertainties and test failures; all generated code still requires manual review and validation
  • Codex is trained to identify and refuse requests for malicious software development, operating within a secure, isolated container without internet access during execution

Counter-Argument

6.

Codex is a glorified autocomplete, not a true "engineer"

  • The agent "lacks features like image inputs for frontend work" and the ability to "course-correct the agent while it's working"
  • "Delegating to a remote agent takes longer than interactive editing," suggesting it's not yet a seamless collaborator
  • Its primary use cases are "repetitive, well-scoped tasks, like refactoring, renaming, and writing tests," which are low-level, not strategic engineering

Steelman

7.

The "engineer" isn't the agent, it's the human-AI system

  • Codex's value isn't in replacing engineers, but in enabling "asynchronous collaboration with colleagues" and "offloading longer tasks"
  • The ability to "program" the agent via AGENTS.md files means humans define the "engineering" principles, not the AI
  • This isn't about AI becoming human; it's about humans becoming super-engineers by delegating the tedious, context-switching work to an always-on, parallelized assistant


Full transcript (Deep)


Gist

1.

OpenAI just ended the "copilot" era and launched the "coworker" era—Codex isn't a chatbot that helps you type, but an autonomous agent that clones your repo, runs your tests, and commits finished code while you work on something else.

Logic

2.

It doesn't chat with you; it works alongside you in a sandbox

  • Unlike ChatGPT, Codex spins up an isolated cloud environment preloaded with your specific repository and dependencies
  • It executes actual terminal commands—running linters, type checkers, and test harnesses—to verify its own work before reporting back
  • The output isn't a text block of code to copy-paste, but a committed git patch ready for review

3.

You don't prompt-engineer it; you onboard it like a human hire

  • Control relies on AGENTS.md files—a new documentation standard that tells the AI how to navigate your specific codebase
  • Instead of fighting with context windows, you place instructions exactly where they belong: inside the repository structure itself
  • The system respects hierarchy: deeply nested AGENTS.md files override general instructions, allowing for granular control over specific modules
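The precedence rule above can be sketched as pure path logic: collect every AGENTS.md on the walk from the repo root down to a file's own directory, ordered so deeper (more specific) files come last and override earlier ones. This is a sketch of the stated rule, not Codex's actual resolver:

```python
from pathlib import PurePosixPath

def applicable_agents_files(file_path, dirs_with_agents_md):
    """Return AGENTS.md paths governing file_path, least to most specific.

    dirs_with_agents_md: directory paths (strings, relative to the repo
    root, with "." for the root) that contain an AGENTS.md file.
    Later entries in the result override earlier ones.
    """
    directory = PurePosixPath(file_path).parent
    chain = []
    # Walk from the repo root down to the file's own directory.
    for ancestor in [*reversed(directory.parents), directory]:
        if str(ancestor) in dirs_with_agents_md:
            chain.append(str(PurePosixPath(ancestor) / "AGENTS.md"))
    return chain
```

For a file at `src/api/handlers.py` with AGENTS.md files at the root and in `src/api/`, the root file's rules apply first and the nested file's rules win on conflict.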

4.

Trust is built on forensic evidence, not conversational confidence

  • Every claim of "fixed" is backed by citations linking to specific terminal logs and test outputs
  • If a test fails, the agent explicitly communicates the failure rather than hallucinating a success
  • Users review a "paper trail" of the agent's actions, treating the AI less like a magic box and more like a junior engineer submitting a pull request

Counter-Argument

5.

Autonomous agents introduce the "Lazy Reviewer" catastrophe

  • When AI generates code instantly, human review becomes the new bottleneck—and humans are notoriously bad at vigilance over time
  • A "clean patch" that passes tests can still contain subtle architectural rot or security backdoors that tired reviewers will miss
  • By lowering the friction to commit code, we risk flooding repositories with high-volume, low-quality "slop" that technically works but makes maintenance impossible

Steelman

6.

The definition of "Software Engineer" is about to invert

  • We are moving from a world where code is scarce and expensive to one where it is infinite and cheap
  • The primary skill set shifts from syntax generation (writing the loop) to specification and verification (defining what the loop must do)
  • AGENTS.md proves that the future of coding is writing documentation so clear that a machine cannot misunderstand it—English is becoming the new compiler

