Choosing AI Programming Tools: Understanding Module Complexity

This article explores the complexities of selecting AI programming tools like Claude Code and Codex based on module stability in mid-sized SaaS projects.

Project Stages Are Module Attributes

A real scenario.

A mid-sized SaaS team has been iterating on their main repository for two years. The codebase has about 300,000 lines, with clear modules and a well-structured CI/CD pipeline, representing a typical “1-N” phase—stable growth.

Six months ago, they started an AI customer service module. The structure is still being adjusted, APIs are changing back and forth, and even naming conventions are not fully established. This is a typical “0-1” phase—exploring from scratch.

This team is currently selecting AI programming tools. On the table are Claude Code and Codex.

The answer you will hear most often is:

“Look at the stage of the main repository. If the main part is 1-N, choose Codex; if it is 0-1, choose Claude Code.”

This response has gained popularity in the Chinese tech community—logically sound, well-balanced, and clear in decision-making. Claude Code is seen as the “governance faction,” while Codex is the “execution faction”; CLAUDE.md is for “modeling,” and AGENTS.md is for “expansion.” It sounds like everything is resolved.

My judgment is: the question was wrong from the start.

The answer is not what is lazy here; the question is.

Acknowledging the Precision of the Contrast

First, let’s acknowledge the accuracy of the contrast.

Summarizing the core differences between Claude Code and Codex as “governance vs execution”—this contrast is indeed precise. The design philosophies of the two tools at the instruction mechanism level are completely opposite.

The core of Claude Code is a single file: CLAUDE.md. It is placed in the project root directory and is fully loaded at the beginning of each session. It comes with a complete set of supporting structures: CLAUDE.local.md (personal preferences, automatically gitignored) and a .claude/ subdirectory holding settings.json, rules/, agents/, skills/, and hooks/. All of these are designed as a whole: a “project constitution” that is shared by the team, committed to git, and accumulated over time.

The core of Codex is a different file: AGENTS.md. It does not live “exclusively” in the root directory; it can be placed in any subdirectory and is discovered layer by layer along the path from the project root down to the current working directory (cwd). Codex also allows AGENTS.override.md (personal overrides), uses project_doc_max_bytes to cap how many bytes of instruction text it reads, and uses project_doc_fallback_filenames to specify alternative filenames. The entire mechanism is designed for “layered injection”: wherever you are in the directory tree, you read the rules that apply at that level.
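To make “layered injection” concrete, here is a minimal Python sketch of directory-by-directory discovery. It is a simplified model for illustration, not Codex’s actual implementation: the function name collect_instructions, the override-first order within a directory, the 32 KiB default, and the “stop when the budget is exhausted” behavior are all assumptions of this sketch.

```python
from pathlib import Path

# Filenames checked inside each directory, in priority order.
# Override-first ordering is an assumption of this sketch.
CANDIDATES = ("AGENTS.override.md", "AGENTS.md")


def collect_instructions(project_root: Path, cwd: Path,
                         max_bytes: int = 32 * 1024) -> list[Path]:
    """Return instruction files between project_root and cwd, outermost first."""
    project_root = project_root.resolve()
    cwd = cwd.resolve()

    # Build the chain of directories from the project root down to cwd.
    # relative_to raises ValueError if cwd is not inside project_root.
    chain = [project_root]
    for part in cwd.relative_to(project_root).parts:
        chain.append(chain[-1] / part)

    selected: list[Path] = []
    used = 0
    for directory in chain:
        for name in CANDIDATES:
            candidate = directory / name
            if not candidate.is_file():
                continue
            size = candidate.stat().st_size
            if size == 0:
                continue  # skip empty files, try the next candidate name
            if used + size > max_bytes:
                return selected  # hypothetical budget rule: stop adding files
            selected.append(candidate)
            used += size
            break  # at most one instruction file per directory
    return selected
```

Calling collect_instructions(repo, repo / "services" / "billing") in this model returns the root file first and the deepest file last, which is the sense in which rules “follow the directory.”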

When these two mechanisms are compared side by side, the differences become apparent:

  • CLAUDE.md assumes your project has a set of “global consensus.” It encourages you to centralize, unify, and make rules accessible to everyone.
  • AGENTS.md assumes your project does not have a complete global consensus, or that such consensus should not govern everything. It encourages you to disperse rules according to directories.

One is “constitutional,” the other is “federal.”

“Governance vs execution” is a superficial statement. At a deeper level, it is “consensus-driven vs container-driven.” Anthropic believes in the former, while OpenAI believes in the latter.

Up to this point, the original contrast holds.

However, a good contrast has a side effect—it can lead people to mistakenly think this is a binary choice.

In Real Projects: Stages Are Module Attributes

The vast majority of articles attempting to make choices for you ultimately fall into a simplified binary classification.

“Are you a 0-1 project? Choose Claude Code.”

“Are you a 1-N project? Choose Codex.”

The premise of this reasoning is that a project can only be in one stage at any given time.

This premise is almost never true in reality.

Return to the SaaS team mentioned earlier. The main repository is in the 1-N phase, while the new module is in the 0-1 phase. This structure is not an exception; it is the norm for most mid-sized teams. Established projects of three to five years typically look like “stable main body + continuous exploration of new modules.” Purely 0-1 projects exist only in the first six months of a startup; purely 1-N projects exist only at the end of their lifecycle. For most of a project’s lifespan, both phases coexist.

Looking deeper, the term “phase” itself may be misused.

An e-commerce main repository iterating for two years is stable and is in the 1-N phase. However, the “marketing campaign module” is rebuilt every three months, each time starting from 0-1.

A three-year content platform is overall in the 1-N phase. But this year, it suddenly needs to undergo AI restructuring, keeping the old content review pipeline intact (extremely 1-N), while the newly added AI review flow starts from scratch (completely 0-1). Two pipelines run in parallel within the same repository.

Open-source projects illustrate this even more clearly. Is React 1-N or 0-1? The main branch is certainly 1-N, but the new features in the experimental subtree are always in 0-1.

Thus, the concept of “project phase” actually refers not to the project itself, but to a specific module or feature line within the project.

Choosing tools based on “project phase” forces you to pick a single standard for the entire project, while there are at least 5-10 modules operating on different standards within the project.

What happens?

Choosing one side based on the original logic will inevitably disadvantage the other side.

For teams that chose Claude Code: the new module works smoothly. The team needs to form a consensus quickly, and CLAUDE.md consolidates directory agreements, naming conventions, and API specifications in one place. Claude reads it every session, and the generated code follows the agreed rules. The old module suffers, however. After two years of accumulation, CLAUDE.md reaches 3,000 lines and becomes unwieldy. Every session, Claude has to read all 3,000 lines, which burns tokens and, worse, fills the context with noise that drowns out the rules that actually matter. Newcomers find reading this CLAUDE.md more exhausting than reading the code.

For teams that chose Codex: the old module works smoothly. The stable directory structure maps onto a stable set of AGENTS.md files, and rules grow alongside the code, simple and direct. The new module suffers, however. The directory structure of the AI customer service module is still changing; today a directory is called agents/, tomorrow it might be renamed to pipelines/. AGENTS.md has to follow every restructuring, and every restructuring means rewriting the rules. Even more awkwardly, the AI often creates new folders for code on its own initiative, because the existing AGENTS.md files only cover existing directories and new locations have no rules constraining them.

These pains are not due to the tools being difficult to use; rather, the tools’ assumptions do not match the project’s real structure.

Project phases are not project attributes; they are module attributes.

The Real Selection Dimension is “Rule Stability”

Do not ask, “Is this project 0-1 or 1-N?”

Instead, ask a different question: “Is the engineering rule of this module stable?”

This dimension is more grounded, more actionable, and closer to your daily experience.

What does “rule stability” mean?

  • The interface has been finalized (no major changes for six months);
  • The file structure does not undergo frequent restructuring (naming conventions and directory divisions have been established);
  • Newcomers can get started based on existing agreements (no need to explain to a veteran every time, “Why is this written this way?”);
  • Patterns that repeatedly appear in the code can be solidified in a sentence (“Interfaces must be Pydantic, error handling follows raise_for_status style”).

These signals combined indicate that your module has entered a stage where “rules can be decentralized.”

In such modules, Codex will work very smoothly. AGENTS.md follows the directory, rules grow with the code, and AI makes local modifications without needing to pull the global context every time.

What does “rule instability” mean?

  • This week it is still called service.py, but next week it might be split into service/handler.py and service/processor.py;
  • The team is still discussing “what goes into model, what goes into schema”;
  • You give the AI a task, and before writing code, you must explain three segments of background (“Our project is different from a typical Python project; our model is not an ORM model…”);
  • Every time a new feature is written, team reviews often raise the question, “Which category does this belong to?”

In such modules, Claude Code is more suitable. CLAUDE.md consolidates the rules that the team is still forming consensus around, showing the AI the entire document every session to avoid it writing randomly in areas without consensus.

The judgment can be further simplified—here’s a “3-minute self-check checklist”:

  1. Has there been a significant adjustment to the file structure of this module in the last three months?
  2. Have you written an introductory document for a newcomer on this module? Is it finished?
  3. Can you explain the core agreements of this module in three lines of rules?
  4. The last time AI wrote incorrect code, was it because it didn’t read the global rules, or because it didn’t read the local rules of this directory?

If 1 is “yes,” 2 is “not finished,” 3 is “no,” and 4 is “global rules”—this module is still in the 0-1 phase, use Claude Code.

If 1 is “no,” 2 is “finished and doesn’t need updating,” 3 is “yes,” and 4 is “local rules”—this module has entered the 1-N phase, use Codex.

It is completely normal for some modules in the same project to match the first pattern and others to match the second.

The straightforward interpretation: your project is not in a single phase, and both tools should coexist.
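The self-check can be written down as a small function and run per module. The sketch below is illustrative only: the SelfCheck fields and the recommend function are hypothetical names, and the “mixed signals” result for partial matches is an assumption, since the checklist above only defines the two clean patterns.

```python
from dataclasses import dataclass


@dataclass
class SelfCheck:
    structure_changed_recently: bool  # Q1: major restructuring in the last three months?
    onboarding_doc_finished: bool     # Q2: is the newcomer document finished?
    fits_in_three_rules: bool         # Q3: can the core agreements fit in three lines?
    last_failure_was_global: bool     # Q4: True = AI missed global rules, False = local rules


def recommend(check: SelfCheck) -> str:
    """Map the four answers to a tool, following the two patterns in the checklist."""
    unstable_votes = sum([
        check.structure_changed_recently,
        not check.onboarding_doc_finished,
        not check.fits_in_three_rules,
        check.last_failure_was_global,
    ])
    if unstable_votes == 4:
        return "Claude Code"   # rules still forming: the 0-1 pattern
    if unstable_votes == 0:
        return "Codex"         # rules stable: the 1-N pattern
    return "mixed signals"     # assumption: partial matches need a human decision


# The SaaS team from the opening scenario, one check per module.
modules = {
    "main_repository": SelfCheck(False, True, True, False),
    "ai_customer_service": SelfCheck(True, False, False, True),
}
for name, check in modules.items():
    print(f"{name}: {recommend(check)}")
```

Running the check per module rather than per repository is the whole point: the same project yields different answers for different directories.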

What Happens When You Choose the Wrong Tool—Practical Signal Checklist

What the original framing lacks is the flip side: what going wrong looks like.

“When did you choose wrong?” and “how to recognize from daily use that you chose wrong”—no one tells you these.

I will break down the two types of wrong choices for you, along with specific signals. Next time you encounter them, you can react immediately.

Signal 1: Forcing a 1-N Module to Use Claude Code

The most common early signal is the expansion of the CLAUDE.md file itself.

At first, it may only have 200 lines—technology stack, directory agreements, a few review checklists, very comfortable.

Six months later, you open it, and it has grown to 1,500 lines. New rules are added one by one, and old ones are not deleted.

A year later, it is 4,000 lines. You can no longer remember what is in it, yet every session still loads the whole file and carries it along with every prompt.

Team members start complaining, “Why does Claude keep forgetting our rules?” But Claude has not forgotten anything; CLAUDE.md now contains 50 conflicting rules, and Claude ends up following whichever one wins out, often the one it read most recently.

The real problem behind this signal is: your project has matured to the point where a single file can no longer carry its “global consensus.” Rules need to be dispersed by module, with each directory managing its own affairs. Continuing to rely on CLAUDE.md alone is like trying to run a layered governance structure with nothing but the founding constitution.
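One way to catch this signal before it reaches 4,000 lines is to measure the file regularly, for example in CI. The following is a minimal sketch under stated assumptions: rule_file_report is a hypothetical helper, the default budget of 500 lines is an arbitrary placeholder rather than a recommended threshold, and repeated lines are only a rough proxy for rules that were added without old ones being removed.

```python
from collections import Counter
from pathlib import Path


def rule_file_report(path: Path, line_budget: int = 500) -> dict:
    """Summarize the size of a rule file and list lines that appear more than once."""
    lines = path.read_text(encoding="utf-8").splitlines()

    # Normalize non-blank lines so trivial case/whitespace differences still match.
    normalized = [line.strip().lower() for line in lines if line.strip()]
    counts = Counter(normalized)
    repeated = sorted(text for text, n in counts.items() if n > 1)

    return {
        "total_lines": len(lines),
        "repeated_lines": repeated,
        "over_budget": len(lines) > line_budget,  # line_budget is a hypothetical limit
    }
```

A team could call rule_file_report(Path("CLAUDE.md")) in a scheduled job and treat over_budget as a prompt to review and trim the document, not as an automatic failure.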

Signal 2: Forcing a 0-1 Module to Use Codex

The most common early signal is AGENTS.md following the directory restructuring.

At first, you followed Codex’s recommendation, placing an AGENTS.md in each subdirectory, which looked tidy.

Two weeks later, you rename agents/ to workers/, and then you have to change agents/AGENTS.md to workers/AGENTS.md. The git diff shows a sea of red and green.

A month later, you become impatient and start folding the rules into the root directory’s AGENTS.md. Codex does pick up the root file when it walks from the project root down to the cwd, so the rules still reach subdirectories, but you are now maintaining one global document inside a mechanism designed for layered injection, and its total size is bounded by project_doc_max_bytes.

You find that AI often autonomously creates new folders because it sees that the AGENTS.md does not cover the newly added locations.

The real problem behind this signal is: your module has not yet reached the stage where “rules can be decentralized to directories.” The structure itself is still changing, and following the structure with rules is self-defeating. What is needed in this stage is a directive system that is “visible to the entire project”—consolidating the changing agreements in one place and showing the AI the entire document every time.
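The “rules chasing the directory structure” signal can be measured from version history. The sketch below counts how often an AGENTS.md file was moved, given the text output of a command such as git log --name-status -M --since="3 months ago". The helper name count_rule_file_moves is hypothetical; the only format it relies on is git’s tab-separated rename record (R<score>, old path, new path).

```python
def count_rule_file_moves(name_status_output: str, filename: str = "AGENTS.md") -> int:
    """Count rename records where a rule file moved from one directory to another."""
    moves = 0
    for line in name_status_output.splitlines():
        fields = line.split("\t")
        # Rename records have three tab-separated fields: R<score>, old path, new path.
        # Commit headers and other status lines do not match and are ignored.
        if len(fields) != 3 or not fields[0].startswith("R"):
            continue
        old_path, new_path = fields[1], fields[2]
        old_name = old_path.rsplit("/", 1)[-1]
        new_name = new_path.rsplit("/", 1)[-1]
        if old_name == filename and new_name == filename:
            moves += 1
    return moves


# Example input shaped like `git log --name-status` output (illustrative data only).
sample = (
    "R100\tagents/AGENTS.md\tworkers/AGENTS.md\n"
    "M\tworkers/main.py\n"
    "R087\tworkers/AGENTS.md\tpipelines/AGENTS.md\n"
    "R100\tdocs/a.md\tdocs/b.md\n"
)
print(count_rule_file_moves(sample))
```

If the count for a module keeps climbing month after month, its structure is still in motion and directory-level rules are probably premature there.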

Commonalities of These Two Failures

It is not that the tools are difficult to use.

It is that your engineering discipline has not kept pace with the assumptions of the tools you selected.

Claude Code assumes you can maintain a “project constitution”—this is an engineering discipline requirement, and whether you can maintain it depends on whether your team has someone to regularly review and trim this document. Codex assumes you can maintain “local rules by directory”—this is also an engineering discipline requirement, and whether you can maintain it depends on the stability of your directory structure.

Choosing the wrong tool essentially means you picked a set of assumptions that demands more engineering discipline than your team can supply.

This is not just about tool selection.

Looking deeper, choosing Claude Code or Codex is essentially choosing between two engineering governance philosophies.

Claude Code is the constitutional faction. It believes that global consensus can form, can be solidified, and can be adhered to by everyone using the same document. This is Anthropic’s engineering philosophy—high leverage at a single point, consensus-driven, long-term memory. The underlying assumption is: a good team should be able to write a “project specification” that everyone recognizes.

Codex is the federal faction. It assumes that global consensus cannot form, or even if it does, it should not govern everything. Each directory manages its own, each module has its own rules, and personal overrides take precedence over team defaults. This is OpenAI’s engineering philosophy—layered injection, container-driven, and on-demand locality. The underlying assumption is: a good team should break governance down to a manageable granularity.

There is no right or wrong in these two philosophies, only matching degrees.

Signs of matching your team:

  • If your team has people willing to turn a two-hour debate into a written document, and others can then maintain that document, the constitutional route of Claude Code suits you.
  • If your team is more used to “writing rules in comments and READMEs while coding” and struggles to set aside time for a centralized discussion of global rules, the federal route of Codex suits you.

This is a cultural issue, not a technical capability issue.

Looking even deeper, both tools actually push the team to “grow up,” but in different directions.

Claude Code pushes you to form consensus. If you use Claude Code for a while and find that CLAUDE.md only contains hollow statements like “write good code, follow best practices,” it indicates that your team has not truly formed a consensus that can be written down. At this point, the style of the code the AI generates will always be erratic, which is Claude Code’s way of telling you that “your engineering governance has not matured.”

Codex pushes you to stabilize structure. If you use Codex for a while and find that AGENTS.md is always following directory restructuring, and rules are never stable, it indicates that your engineering structure itself has not been finalized. At this point, AI autonomously creating folders is Codex’s way of telling you that “your engineering structure has not been finalized.”

Tools will not solve governance issues for you. They merely present this issue to you, allowing you to see it.

So the Real Question Is

When you open Claude Code or Codex, you think you are selecting an AI programming tool.

You are actually declaring how much engineering discipline your team really has.

  • CLAUDE.md is asking you: “Can your team form a consensus? If so, can it be solidified into a document for everyone to share?”
  • AGENTS.md is asking you: “Is your team’s directory structure stable? If so, can rules be decentralized to the directories?”

The answers you have to these two questions will determine which tool can help you.

Returning to the SaaS team mentioned earlier. The main repository is in the 1-N phase, while the new module is in the 0-1 phase. Their correct approach is not to choose one of the two tools but to use both simultaneously—

Using Codex for the old module, allowing rules to grow with the stable directory.

Using Claude Code for the new module, allowing the team to form consensus while coding.

If the team is unwilling to accept the engineering cost of running both tools in parallel, that is a separate decision: pick the one tool that fits best, make do, and expect friction in the modules it does not match.

However, there is no need to pretend that this friction is something the tool can solve.

Software projects are never just 0-1 then 1-N. They consist of N modules, each at different stages.

The real question to ask yourself is not “What stage is my project in?” but a more challenging question:

Which modules can I govern constitutionally, and which modules must I govern federally?

Once this question is answered well, the tools will choose themselves.

If this question is not answered well, whichever tool you choose will feel cumbersome.
