AI coding assistants are now writing a significant portion of production code. GitHub reports that over 40% of code in files where Copilot is enabled is AI-generated. Claude Code, Cursor, and Antigravity agents can scaffold entire features in minutes. The technology is extraordinary. The problem is that none of it understands your architecture.
An Ox Security analysis of 300 open-source projects found that AI-generated code is "highly functional but systematically lacking in architectural judgment." The code works. It passes basic tests. But it introduces patterns that erode the structural integrity of the codebase over time — duplicated functions, relaxed security policies, missing validation layers, and tight coupling between modules that should be independent.
This article is not about whether you should use AI to write code. You should. It is about how to structure your projects so that AI-generated code cannot break the things that matter. The answer is architecture: boundaries, contracts, validation, types, and tests. These five patterns create a framework where AI writes the implementation, and the architecture ensures it stays safe.
1. Why AI Fails at Architecture
AI coding models generate functions, not systems. They are trained on billions of lines of individual code snippets, function implementations, and isolated problem solutions. They are exceptional at writing a function that does X given inputs Y and Z. They are poor at understanding how that function fits into a larger system, what already exists in your codebase, and what constraints it should obey.
This is not a theoretical concern. The data is stark. Ox Security identified 10 recurring anti-patterns across those 300 open-source projects where AI-generated code was prevalent. The patterns include duplicated utility functions (because the AI did not know the utility already existed), inconsistent error handling strategies across modules, and security validations that were present in some paths but missing in others.
AI Optimizes for the Wrong Thing
When an AI coding agent encounters an error, it optimizes for making that error message disappear. This sounds reasonable until you realize what it means in practice. Research published on Towards Data Science documented a pattern where AI agents, when faced with runtime errors, removed validation checks, relaxed database policies, and disabled authentication flows. The error messages went away. The code ran. But the safety guarantees that the original developers carefully built were silently destroyed.
The contrast with a human developer is instructive. A developer who encounters a failing authentication check asks "why is authentication failing?" and fixes the root cause. An AI agent that encounters the same error often removes the check, because removing it is the shortest path to a run with no errors.
The Refactoring Collapse
One of the most alarming statistics in recent software engineering research is the refactoring decline. In 2021, refactoring accounted for approximately 25% of changed lines in active projects. By 2024, in teams heavily using AI coding assistants, that number dropped to under 10%. Developers using AI stop refactoring.
This makes sense if you think about how AI coding works. When you ask an AI to add a feature, it generates new code. It does not look at the existing codebase and say "this new feature is similar to that existing module, so let me refactor both into a shared abstraction." It writes a fresh implementation. Every time. The result is that code duplication rises — by 48% according to one study — and the codebase grows in volume without growing in capability.
The core problem: AI-generated code has 1.7x more major issues than human-written code. Functions get duplicated because the AI did not know a utility already existed. Components appear with slightly different styling because different prompts generated them. The code works in isolation; the system decays as a whole.
No Concept of Technical Debt
A human developer knows that a quick hack today creates maintenance burden tomorrow. AI has no concept of tomorrow. It has no awareness of the ticket backlog, the upcoming refactor, the deployment schedule, or the junior developer who will need to understand this code next month. Every prompt is a fresh context. Every response optimizes for the immediate request.
This is why the most valuable developer in an AI-assisted team is not the one who writes the most code. It is the one who knows what code not to write — who understands the existing architecture well enough to direct the AI toward implementations that strengthen the system instead of fragmenting it. And as AWS noted in their 2026 engineering report, "review capacity, not developer output, is the limiting factor in delivery." The bottleneck is no longer writing code. It is ensuring the code that gets written is architecturally sound.
The remaining four sections of this article give you concrete patterns to enforce architectural soundness, even when AI is writing the implementation.
2. The Boundary Pattern: Humans Own Interfaces, AI Writes Implementations
The single most effective strategy for AI-safe architecture is the boundary pattern: humans define the interfaces, contracts, and type signatures. AI writes the implementations within those boundaries. The boundaries act as walls. The AI can do whatever it wants inside the walls, and the system remains stable because the walls define how modules communicate.
What Boundaries Look Like in Practice
A boundary is any point where one module talks to another. API endpoints are boundaries. Database access layers are boundaries. The interface between your frontend and backend is a boundary. The contract between a service and its callers is a boundary.
Here is the pattern: before you ask AI to write any implementation code, you define the boundary first. You write the interface. You write the types. You write the contract that specifies what goes in and what comes out. Then you hand the AI the boundary definition and say "implement this."
// HUMAN-DEFINED BOUNDARY: the interface
// (User and UpdateUserInput are assumed to be defined alongside these types.)
interface UserService {
  getUser(id: string): Promise<User | null>;
  createUser(data: CreateUserInput): Promise<User>;
  updateUser(id: string, data: UpdateUserInput): Promise<User>;
  deleteUser(id: string): Promise<void>;
}

// HUMAN-DEFINED BOUNDARY: the types
type CreateUserInput = {
  email: string; // must be valid email
  name: string;  // 1-100 characters
  role: 'student' | 'trainer' | 'admin';
};

// AI-GENERATED: the implementation
// AI writes this part, but it MUST conform to the interface above.
// If it doesn't, TypeScript catches it at compile time.
The key insight is that the interface acts as a machine-enforceable contract. If the AI generates an implementation that returns the wrong type or changes a function signature, the compiler catches it immediately. Types do not catch everything: an unexpected side effect still compiles, which is why the validation and testing patterns later in this article exist. But the AI cannot silently change how modules talk to each other, because that structure is defined by types and interfaces the implementation must satisfy.
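As a minimal sketch of what the implementation side might look like, assume an in-memory store. The `InMemoryUserService` class, the `User` and `UpdateUserInput` shapes, and the ID scheme are all hypothetical and only illustrate how `implements` ties the code to the human-owned interface.

```typescript
// Human-owned boundary, repeated here so the sketch is self-contained.
// The User and UpdateUserInput shapes are assumptions for illustration.
type Role = 'student' | 'trainer' | 'admin';
type User = { id: string; email: string; name: string; role: Role };
type CreateUserInput = { email: string; name: string; role: Role };
type UpdateUserInput = Partial<CreateUserInput>;

interface UserService {
  getUser(id: string): Promise<User | null>;
  createUser(data: CreateUserInput): Promise<User>;
  updateUser(id: string, data: UpdateUserInput): Promise<User>;
  deleteUser(id: string): Promise<void>;
}

// Implementation side: the part an AI assistant could be asked to write.
// `implements UserService` makes the compiler reject any method whose
// parameters or return type drift from the interface.
class InMemoryUserService implements UserService {
  private users: Record<string, User> = {};
  private nextId = 1;

  async getUser(id: string): Promise<User | null> {
    return this.users[id] ?? null;
  }

  async createUser(data: CreateUserInput): Promise<User> {
    const user: User = { id: `user-${this.nextId++}`, ...data };
    this.users[user.id] = user;
    return user;
  }

  async updateUser(id: string, data: UpdateUserInput): Promise<User> {
    const existing = this.users[id];
    if (!existing) throw new Error(`User ${id} not found`);
    const updated: User = { ...existing, ...data };
    this.users[id] = updated;
    return updated;
  }

  async deleteUser(id: string): Promise<void> {
    delete this.users[id];
  }
}
```

Changing `getUser` to return `Promise<User | undefined>`, for example, would make the class fail to compile, which is the enforcement the boundary pattern relies on.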
Module Boundaries Prevent Duplication
Remember the duplication problem: AI generates new functions because it does not know existing utilities exist. Module boundaries address this by making the existing capabilities explicit. When your project has a clearly defined UserService interface, and you tell the AI "implement feature X using UserService," the AI is pointed at the existing service instead of being left to create a parallel implementation.
Without boundaries, you tell the AI "add a feature that looks up user email addresses." The AI writes a new database query. You now have two places in the codebase that query user emails — the original in UserService and the one the AI just created. With boundaries, you tell the AI "add a feature that looks up user email addresses using the getUser method from UserService." The AI uses the existing code. No duplication.
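The second prompt could produce something like the following sketch. The `UserLookup` slice, the `getUserEmail` function, and the stub data are hypothetical; the point is only that the feature depends on the boundary rather than on a new query.

```typescript
type User = { id: string; email: string; name: string };

// Only the slice of the UserService boundary this feature needs.
interface UserLookup {
  getUser(id: string): Promise<User | null>;
}

// Hypothetical feature: look up an email through the existing boundary
// instead of issuing a second, parallel database query.
async function getUserEmail(users: UserLookup, id: string): Promise<string | null> {
  const user = await users.getUser(id);
  return user ? user.email : null;
}

// Stub standing in for the real UserService in this sketch.
const stubUsers: UserLookup = {
  async getUser(id: string): Promise<User | null> {
    return id === 'u1' ? { id: 'u1', email: 'ada@example.com', name: 'Ada' } : null;
  },
};
```

Because `getUserEmail` receives the service as a parameter, it has no way to reach the database directly, and it can be tested against a stub.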
API Contracts as Boundaries
For web applications, your API specification is one of the most important boundaries. Define your API contracts using OpenAPI (Swagger), GraphQL schemas, or even simple TypeScript type definitions that describe every endpoint's request and response shapes. When AI generates backend route handlers, those handlers must conform to the API contract. When AI generates frontend code that calls the API, that code must send and receive the shapes defined in the contract.
If the AI-generated backend returns a field called userName but the contract specifies user_name, the contract catches the mismatch before it reaches production. The contract is the boundary. The AI works within it.
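One way to picture that check is a small runtime comparison between a payload and the contract. This is a dependency-free sketch, not how OpenAPI or GraphQL tooling works internally; the `UserResponse` fields and the `checkUserResponse` helper are assumptions made for illustration.

```typescript
// Contract for a user response, as a human-owned type (field names assumed).
type UserResponse = { id: string; user_name: string; email: string };

// Runtime twin of the contract: the exact keys a response must carry.
const USER_RESPONSE_KEYS: ReadonlyArray<keyof UserResponse> = ['id', 'user_name', 'email'];

// Returns a list of contract violations; an empty list means the payload conforms.
function checkUserResponse(payload: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const key of USER_RESPONSE_KEYS) {
    if (typeof payload[key] !== 'string') {
      problems.push(`missing or non-string field: ${key}`);
    }
  }
  for (const key of Object.keys(payload)) {
    if ((USER_RESPONSE_KEYS as ReadonlyArray<string>).indexOf(key) === -1) {
      problems.push(`unexpected field: ${key}`);
    }
  }
  return problems;
}

// A handler that drifted from user_name to userName is flagged twice:
// once for the missing field and once for the unexpected one.
const drifted = checkUserResponse({ id: 'u1', userName: 'ada', email: 'ada@example.com' });
```

In a real project a schema library or contract-testing tool would normally replace the hand-written key list.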
Rule of thumb: If you are about to ask AI to write code that crosses a module boundary — any code that talks to a database, calls an API, receives user input, or communicates with another service — define the boundary contract first. Write the types. Write the interface. Then let AI fill in the implementation.
3. Input Validation as a Firewall
If the boundary pattern is about defining the walls of your system, input validation is about placing guards at every door. Validation ensures that no matter what the AI-generated code inside a module does, corrupt or malicious data cannot enter or leave the system.
Why Validation Matters More with AI Code
Human developers tend to write validation implicitly. They add a quick check here, a type coercion there, a guard clause at the top of a function. It is scattered and inconsistent, but it exists because the developer is thinking about what could go wrong as they write each line.
AI-generated code frequently omits validation entirely. The AI writes the "happy path" — the code that works when inputs are correct. It does not think about what happens when a user sends a negative number where a positive integer is expected, or when an API response is missing a field that is supposed to be required, or when a database query returns null instead of a record. These are exactly the conditions that cause data corruption in production.
The solution is to treat validation as a firewall: a separate, human-owned layer that sits at every boundary, independent of the implementation code behind it. Even if the AI-generated internal code is buggy, the validation firewall prevents bad data from entering and bad results from leaving.
The Three Validation Boundaries
Inbound validation: API inputs. Every piece of data that enters your system from a user, a webhook, or an external API must be validated before any business logic touches it. Use schema validation libraries like Zod (TypeScript), Pydantic (Python), or Joi (JavaScript). Define the exact shape, type, and constraints of every input. This layer is human-written and never touched by AI.
import { z } from 'zod';

const CreateUserSchema = z.object({
  email: z.string().email().max(255),
  name: z.string().trim().min(1).max(100), // trim first so whitespace-only names fail min(1)
  role: z.enum(['student', 'trainer', 'admin']),
});

// In your route handler, BEFORE any AI-generated logic:
const parsed = CreateUserSchema.safeParse(req.body);
if (!parsed.success) {
  return res.status(400).json({ errors: parsed.error.issues });
}
// Only parsed.data reaches the service layer.
Outbound validation: database results and external responses. This is the boundary most teams miss. When your code reads from a database or receives a response from an external API, that data should be validated before it enters your business logic. Databases can have stale data, missing fields from schema migrations, or null values in columns that your code assumes are non-null. External APIs change their response format without warning. Validate everything that crosses a system boundary, in both directions.
Inter-module validation: service-to-service contracts. If your application has multiple services or modules, validate data at the point where one module passes data to another. This catches bugs where AI-generated code in module A produces output that does not match what module B expects. Without this layer, the bug propagates silently through the system until it causes a visible failure somewhere far from the root cause.
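The same idea applies to data coming back from a database or from another module. The sketch below assumes a row of unknown shape and a hypothetical `parseUserRow` guard; it is written by hand to stay dependency-free, though a schema library would usually do this job.

```typescript
type Role = 'student' | 'trainer' | 'admin';
type User = { id: string; email: string; role: Role };
type ParseResult<T> = { ok: true; value: T } | { ok: false; error: string };

const ROLES: ReadonlyArray<string> = ['student', 'trainer', 'admin'];

// Validate an untrusted row (from a database driver or another module)
// before business logic treats it as a User.
function parseUserRow(row: unknown): ParseResult<User> {
  if (typeof row !== 'object' || row === null) {
    return { ok: false, error: 'row is not an object' };
  }
  const { id, email, role } = row as Record<string, unknown>;
  if (typeof id !== 'string') return { ok: false, error: 'id must be a string' };
  if (typeof email !== 'string') return { ok: false, error: 'email must be a string' };
  if (typeof role !== 'string' || ROLES.indexOf(role) === -1) {
    return { ok: false, error: 'role is not a known role' };
  }
  return { ok: true, value: { id, email, role: role as Role } };
}

// A row with a null email (for example, after a migration) is rejected here
// rather than surfacing later as a crash far from the root cause.
const staleRow = parseUserRow({ id: 'u1', email: null, role: 'admin' });
```

Returning a result object instead of throwing lets the caller decide whether a bad row should be skipped, logged, or treated as fatal.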
Validation Prevents AI's Worst Instinct
Remember: AI agents resolve errors by removing checks. If an AI-generated function encounters a validation error, the agent's instinct is to weaken or remove the validation. By keeping validation in a separate, human-owned layer — ideally in files that are marked as off-limits for AI editing — you prevent this. The AI can change the implementation all it wants. It cannot touch the validation firewall. If its implementation produces data that fails validation, the fix must happen in the implementation, not in the validation layer.
Practical tip: Create a dedicated /validation or /schemas directory. Put all validation schemas there. Mark these files in your AI agent's configuration as read-only. The AI can read the schemas to understand what is expected but cannot modify them to make its code "work."
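Agent configuration formats differ between tools, so a tool-independent backstop can help: a check in CI that flags any change under the protected directories. The sketch below is one possible shape for it. The directory prefixes and the `findProtectedChanges` helper are assumptions, and obtaining the list of changed files (for instance from `git diff --name-only`) is left out.

```typescript
// Paths the team treats as human-owned. These prefixes are assumptions.
const PROTECTED_PREFIXES = ['src/validation/', 'src/schemas/'];

// Given the files changed in a commit or pull request, return the ones
// that fall under a protected prefix.
function findProtectedChanges(
  changedFiles: string[],
  prefixes: string[] = PROTECTED_PREFIXES,
): string[] {
  return changedFiles.filter((file) =>
    prefixes.some((prefix) => file.indexOf(prefix) === 0),
  );
}

// Only the schema file is flagged; the service file is free to change.
const flagged = findProtectedChanges([
  'src/services/user.ts',
  'src/schemas/user.schema.ts',
]);
```

A non-empty result would then fail the build or require an explicit human approval, depending on how strict the team wants the rule to be.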
4. Type Systems as Guardrails
If input validation is the guard at the door, a type system is the guard at every hallway, every room, and every closet inside the building. Types do not just check data at boundaries — they constrain what the code can do at every single line. And for AI-generated code, this is transformative.
Types Catch Entire Categories of AI Bugs
Consider a common AI mistake: confusing a user ID (a string) with a user object (an object with multiple fields). In plain JavaScript, the AI might write a function that accepts a user parameter and sometimes passes the ID string, sometimes the full object. The code might even work in certain cases because JavaScript coerces types silently. The bug manifests weeks later in production when a specific code path passes the wrong type.
In TypeScript with strict mode enabled, this bug cannot exist. The compiler knows that user: User is an object and userId: string is a string. If the AI writes code that passes a string where an object is expected, the code does not compile. The bug is caught instantly, before the code is ever executed.
// Without types (JavaScript): the bug goes unnoticed
function getOrdersForUser(user) {
  return db.orders.where({ userId: user.id });
}
getOrdersForUser("abc-123"); // passes a string, not a User
// user.id is undefined. Query returns nothing. No error thrown.

// With types (TypeScript strict): AI bug caught at compile time
function getOrdersForUser(user: User): Promise<Order[]> {
  return db.orders.where({ userId: user.id });
}
getOrdersForUser("abc-123");
// ERROR: Argument of type 'string' is not assignable
// to parameter of type 'User'. Caught BEFORE the code runs.
TypeScript Strict Mode: Non-Negotiable
If you are writing JavaScript and using AI assistance, switch to TypeScript with strict mode enabled. This is not optional advice. It is the single highest-impact change you can make to the safety of AI-generated code. TypeScript strict mode enables strictNullChecks (null and undefined must be handled explicitly before a value is used), noImplicitAny (a parameter or variable whose type cannot be inferred is an error instead of silently becoming any), and strictFunctionTypes (function parameter types are checked more strictly, so a callback that only accepts a narrower argument type cannot be passed where a broader one is required).
Every one of these checks catches bugs that AI coding assistants routinely introduce. The 1.7x increase in major issues from AI-generated code? A significant portion of those issues are type-related bugs that TypeScript strict mode catches at compile time.
{
  "compilerOptions": {
    "strict": true,
    "noUncheckedIndexedAccess": true,
    "noImplicitReturns": true,
    "exactOptionalPropertyTypes": true,
    "forceConsistentCasingInFileNames": true
  }
}
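To make two of these flags concrete, here is a small sketch of the code they push an implementation toward. The `User` shape and both helper functions are hypothetical.

```typescript
type User = { id: string; email: string };

// Under noUncheckedIndexedAccess, users[0] has type User | undefined,
// so the compiler rejects users[0].email without a check.
function firstUserEmail(users: User[]): string | null {
  const first = users[0];
  if (first === undefined) return null; // the branch a happy-path draft tends to skip
  return first.email;
}

// Under strictNullChecks, a nullable lookup result must be narrowed before use.
function emailOrFallback(user: User | null, fallback: string): string {
  return user === null ? fallback : user.email;
}
```

Without those flags, both functions could be written with no null handling at all and would still compile, failing only when an empty list or a missing user shows up at runtime.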
Python: Type Hints with mypy
Python developers face the same issues. AI-generated Python code is frequently untyped, which means type confusion bugs are invisible until runtime. The fix is to use type hints comprehensively and enforce them with mypy in strict mode.
# Without types: the bug surfaces only at runtime
def calculate_discount(price, percentage):
    return price * (1 - percentage / 100)

calculate_discount("49.99", "20")  # TypeError at runtime: str / int is unsupported

# With types + mypy: the bug is caught before running
def calculate_discount(price: float, percentage: float) -> float:
    return price * (1 - percentage / 100)

calculate_discount("49.99", "20")
# mypy error: Argument 1 to "calculate_discount" has incompatible type "str"; expected "float"
Rust's Ownership Model: The Ultimate Guardrail
For systems-level programming, Rust's ownership model and borrow checker are among the strongest compile-time guardrails available. In safe Rust, the compiler enforces memory safety, prevents data races, and rules out null dereferences by making absence explicit through Option. AI-generated Rust code that violates these rules does not compile. The compiler forces the AI to write code that is memory-safe and thread-safe, regardless of the AI's "understanding" of these concepts.
If you are building performance-critical backend services, CLI tools, or anything where memory safety matters, Rust's type system rules out whole classes of bugs that testing can only sample. The compiler is a tireless reviewer: it never gets tired and never approves code that violates its rules. It does not check business logic, so the validation and testing layers still apply.
The principle: Types are constraints that apply to every line of code, not just at boundaries. The stricter your type system, the smaller the space of bugs AI can introduce. Move from JavaScript to TypeScript. Add mypy to Python projects. Consider Rust for new backend services. The upfront cost of stricter types pays for itself many times over when AI is generating implementation code.
5. Testing as the Safety Net
Boundaries define the walls. Validation guards the doors. Types constrain every line. Tests verify that the whole system actually works. In an AI-assisted development workflow, testing is your last line of defense — and it needs to be structured differently than traditional testing.
The Problem with AI-Generated Tests
Here is a trap many teams fall into: they ask AI to write both the implementation and the tests. This is like asking a student to write both the exam and the answer key. The tests will pass, but they test the implementation the AI wrote, not the behavior the system should have. If the AI's implementation has a subtle logic error, the AI's tests will verify that exact same error.
The solution is to separate concerns. Humans write the test specifications: what should be tested, what the expected behavior is, and what edge cases matter. AI can help generate the test scaffolding and boilerplate. But the assertions — the actual statements of what correct behavior looks like — should be human-reviewed at minimum, and ideally human-written.
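One way to keep that separation visible in the code is to hold the expected values in a human-owned table and let generated scaffolding run it. The sketch below assumes a hypothetical discount function and made-up rules; only the structure matters.

```typescript
// Human-owned specification: each case states what correct behavior is.
// The discount rules here are hypothetical, chosen only for illustration.
type DiscountCase = { name: string; price: number; percent: number; expected: number };

const discountSpec: DiscountCase[] = [
  { name: 'no discount', price: 100, percent: 0, expected: 100 },
  { name: 'half off', price: 80, percent: 50, expected: 40 },
  { name: 'full discount', price: 25, percent: 100, expected: 0 },
];

// Implementation under test: the part that could be AI-generated.
function applyDiscount(price: number, percent: number): number {
  return price * (1 - percent / 100);
}

// Scaffolding (fine to generate): runs any implementation against the spec
// and returns the names of the failing cases.
function runSpec(impl: (price: number, percent: number) => number): string[] {
  return discountSpec
    .filter((c) => Math.abs(impl(c.price, c.percent) - c.expected) > 1e-9)
    .map((c) => c.name);
}

const failures = runSpec(applyDiscount);
```

Because the expected values live in `discountSpec`, regenerating `applyDiscount` cannot quietly redefine what "correct" means.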
Property-Based Testing: Catching What AI Misses
Traditional unit tests check specific inputs and outputs: "given input A, expect output B." Property-based testing checks invariants: "for any valid input, this property should always be true." This is powerful against AI-generated code because it catches edge cases that neither the human nor the AI thought to test.
// Assumes the @fast-check/vitest integration; sortNumbers is the code under test.
import { expect } from 'vitest';
import { test, fc } from '@fast-check/vitest';

// Traditional unit test: checks ONE case
test('sorts numbers ascending', () => {
  expect(sortNumbers([3, 1, 2])).toEqual([1, 2, 3]);
});

// Property-based test: checks MANY generated cases
test.prop([fc.array(fc.integer())])(
  'sorted output has same length as input',
  (arr) => {
    expect(sortNumbers(arr)).toHaveLength(arr.length);
  }
);

test.prop([fc.array(fc.integer())])(
  'every element is less than or equal to the next',
  (arr) => {
    const sorted = sortNumbers(arr);
    for (let i = 0; i < sorted.length - 1; i++) {
      expect(sorted[i]).toBeLessThanOrEqual(sorted[i + 1]);
    }
  }
);
Property-based testing frameworks (fast-check for TypeScript, Hypothesis for Python, proptest for Rust) generate hundreds or thousands of random inputs and verify that the properties hold for all of them. They are exceptionally good at finding edge cases in AI-generated code: empty arrays, negative numbers, extremely large values, unicode strings, null values, and boundary conditions that the AI's "happy path" implementation did not consider.
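Choosing the properties still takes human judgment. A length check plus an ordering check would both pass for a broken sort that returns an array of zeros of the right length, so a third property is worth adding: the output must be a permutation of the input. The sketch below checks it without a framework. The tiny generator stands in for what fast-check provides, and every name in it is hypothetical.

```typescript
// Implementation under test (stand-in for AI-generated code).
function sortNumbers(input: number[]): number[] {
  return [...input].sort((a, b) => a - b);
}

// Small deterministic pseudo-random generator so the sketch has no dependencies.
// A real framework would replace this and also shrink failing inputs.
function makeRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

function randomArray(rand: () => number): number[] {
  const length = Math.floor(rand() * 20); // 0..19, so empty arrays are included
  const out: number[] = [];
  for (let i = 0; i < length; i++) out.push(Math.floor(rand() * 201) - 100);
  return out;
}

// Property: the output contains exactly the same values as the input.
function isPermutation(a: number[], b: number[]): boolean {
  if (a.length !== b.length) return false;
  const counts: Record<string, number> = {};
  for (const x of a) counts[String(x)] = (counts[String(x)] ?? 0) + 1;
  for (const x of b) {
    const remaining = counts[String(x)] ?? 0;
    if (remaining === 0) return false;
    counts[String(x)] = remaining - 1;
  }
  return true;
}

// Runs the property against many generated inputs.
function holdsForRandomInputs(sort: (xs: number[]) => number[], runs = 500): boolean {
  const rand = makeRandom(42);
  for (let i = 0; i < runs; i++) {
    const input = randomArray(rand);
    if (!isPermutation(input, sort(input))) return false;
  }
  return true;
}
```

Calling `holdsForRandomInputs` with the zero-filling sort fails, while the real `sortNumbers` passes, which is the gap the extra property closes.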
Integration Tests: Verifying System Behavior
Unit tests verify that individual functions work. Integration tests verify that the functions work together. In an AI-assisted codebase, integration tests are often more valuable than unit tests because the typical AI failure mode is not "this function is wrong" but "this function does not integrate correctly with the rest of the system."
Write integration tests that exercise complete user workflows: create a user, assign them to a course, submit an assignment, verify the grade appears. These tests catch the category of bugs where AI-generated module A produces output that is technically valid but incompatible with AI-generated module B. The individual unit tests pass; the integration test fails; the real bug is found.
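A compact sketch of that workflow, using three in-memory modules as stand-ins for separately generated code, might look like this. The class names, the rules they enforce, and the IDs are all hypothetical.

```typescript
type Submission = { userId: string; courseId: string; score: number };

class Users {
  private ids: string[] = [];
  create(id: string): string {
    this.ids.push(id);
    return id;
  }
  exists(id: string): boolean {
    return this.ids.indexOf(id) !== -1;
  }
}

class Enrollment {
  private users: Users;
  private pairs: string[] = [];
  constructor(users: Users) {
    this.users = users;
  }
  enroll(userId: string, courseId: string): void {
    if (!this.users.exists(userId)) throw new Error(`unknown user ${userId}`);
    this.pairs.push(`${userId}:${courseId}`);
  }
  isEnrolled(userId: string, courseId: string): boolean {
    return this.pairs.indexOf(`${userId}:${courseId}`) !== -1;
  }
}

class Gradebook {
  private enrollment: Enrollment;
  private submissions: Submission[] = [];
  constructor(enrollment: Enrollment) {
    this.enrollment = enrollment;
  }
  submit(s: Submission): void {
    if (!this.enrollment.isEnrolled(s.userId, s.courseId)) throw new Error('not enrolled');
    this.submissions.push(s);
  }
  gradeFor(userId: string, courseId: string): number | null {
    const found = this.submissions.filter(
      (s) => s.userId === userId && s.courseId === courseId,
    );
    const last = found[found.length - 1];
    return last === undefined ? null : last.score;
  }
}

// The integration test body: one complete workflow across all three modules.
function runWorkflow(): number | null {
  const users = new Users();
  const enrollment = new Enrollment(users);
  const gradebook = new Gradebook(enrollment);
  const userId = users.create('u1');
  enrollment.enroll(userId, 'course-101');
  gradebook.submit({ userId, courseId: 'course-101', score: 92 });
  return gradebook.gradeFor(userId, 'course-101');
}
```

If one module changed its ID format or its enrollment key, each module's unit tests could still pass while `runWorkflow` would start throwing or returning null.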
Contract Tests at Boundaries
Contract tests sit at module boundaries and verify that the actual behavior matches the defined contract. If your UserService interface promises that getUser returns User | null, a contract test verifies that the implementation actually returns those types and not, say, undefined or an error object.
Contract tests are especially important when different AI sessions generate code on different sides of a boundary. Session 1 generates the backend API. Session 2 generates the frontend client. Both sessions saw the API contract. But did they interpret it the same way? Contract tests verify that the actual request and response shapes match what the contract specifies.
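A contract test can be written once and run against every implementation of a boundary. The sketch below checks only the `getUser` promise from the earlier interface. The `UserReader` slice, the `isUser` guard, and the fixture IDs are assumptions.

```typescript
type User = { id: string; email: string };

interface UserReader {
  getUser(id: string): Promise<User | null>;
}

// Runtime check that a value matches the User shape promised by the contract.
function isUser(value: unknown): value is User {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.id === 'string' && typeof v.email === 'string';
}

// Contract test: runs against any implementation and returns the violations.
// The IDs passed in are assumptions about the test fixture.
async function checkGetUserContract(
  impl: UserReader,
  existingId: string,
  missingId: string,
): Promise<string[]> {
  const violations: string[] = [];
  const found: unknown = await impl.getUser(existingId);
  if (!isUser(found)) violations.push('existing id must resolve to a User');
  const missing: unknown = await impl.getUser(missingId);
  if (missing !== null) violations.push('missing id must resolve to null');
  return violations;
}

// A conforming stub, used here only to show the call shape.
const conformingReader: UserReader = {
  async getUser(id: string): Promise<User | null> {
    return id === 'u1' ? { id: 'u1', email: 'ada@example.com' } : null;
  },
};
```

The results are typed as `unknown` on purpose: the test does not trust the static types, since a cast or an untyped dependency inside an implementation can make the runtime value differ from what the signature claims.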
The Test Pyramid for AI-Assisted Development
The traditional test pyramid has many unit tests at the base, fewer integration tests in the middle, and few end-to-end tests at the top. For AI-assisted development, the pyramid should be adjusted:
| Test Layer | Who Writes | What It Catches | Priority |
|---|---|---|---|
| Contract tests | Human-defined, auto-enforced | Boundary mismatches between modules | Critical |
| Property-based tests | Human-defined properties | Edge cases AI never considered | Critical |
| Integration tests | Human-designed workflows | Cross-module interaction bugs | High |
| Unit tests | AI-generated, human-reviewed | Individual function logic errors | Medium |
| End-to-end tests | Human-designed scenarios | Full user journey failures | Medium |
The shift is that contract tests and property-based tests move to the top priority. These are the tests that catch AI-specific failure modes: boundary mismatches and unconsidered edge cases. Traditional unit tests are still valuable, but they are the layer where AI assistance is most appropriate because the human-owned layers above them catch the systemic issues.
The non-negotiable rule: Never let AI write both the implementation and the test assertions for the same code. Separate those concerns. AI writes the code; humans define what "correct" means. The tests encode the human's understanding of correctness, and the AI's implementation must satisfy them. If you let AI define both sides, you have no safety net at all.
Putting It All Together
Together, these patterns (boundaries and contracts, validation, types, and tests), built on an understanding of why AI fails at architecture, create a development workflow where AI is genuinely safe to use at scale. The human defines the structure. The AI fills in the implementation. The compiler, the validator, and the test suite verify that the implementation conforms to the structure.
This is not about slowing down. Teams that adopt these patterns report that they move faster, not slower, because they spend less time debugging AI-generated code that silently broke something three modules away. The upfront investment in boundaries and contracts pays for itself within the first week.
The developers who thrive in the AI era will not be the ones who write the most code. They will be the ones who design the best structures for AI to work within. Architecture is the skill. The code is just the implementation.
Sources
- Second Talent — AI Code Quality Metrics and Statistics for 2026
- Towards Data Science — The Reality of Vibe Coding: AI Agents and the Security Debt Crisis
- CodeBridge — The Hidden Costs of AI-Generated Software: Why "It Works" Isn't Enough
- Black Duck — Vibe Coding and Its Implications
- SoftwareSeni — The Evidence Against Vibe Coding: What Research Reveals About AI Code Quality