MAID Runner: Enforce Contracts on Claude, Cursor & AI Agents

Introduction

If you’ve spent any time working with AI coding assistants like Claude, Cursor, or GitHub Copilot, you’ve probably experienced that sinking feeling: the AI just modified a critical file you explicitly told it not to touch. Or it introduced a subtle bug that passed code review but broke production. Or it completely ignored your architectural guidelines—despite your carefully crafted CLAUDE.md instructions.

Here’s the uncomfortable truth: instructions aren’t constraints. No matter how detailed your prompts, how comprehensive your documentation, or how many times you repeat “DO NOT modify the authentication system,” AI coding agents operate on probability, not rules. They interpret your instructions as suggestions, not as enforceable contracts. And in complex codebases, that distinction can cost you hours of debugging, broken deployments, and frustrated team members.

This article introduces a paradigm shift in AI-assisted development: manifest-driven validation. Instead of hoping AI follows your instructions, you’ll learn how to create enforceable contracts that specify exact code structures AI must produce. Using MAID Runner (Manifest-Driven AI Development), we’ll explore how to transform your development workflow from “trust and verify” to “validate and enforce”—ensuring that whether you’re using Claude Opus 4, Gemini 3, or GPT-5, AI-generated code implements the exact artifacts you specified, verified through AST analysis.

Key Takeaways

Instructions fail because AI coding agents interpret requirements probabilistically rather than enforcing them as hard constraints
Manifest-driven development creates enforceable contracts that specify exact code artifacts (functions, classes, methods) AI must implement
MAID Runner provides AST-based validation that catches violations like undeclared classes, missing functions, and structural mismatches between contract and implementation
Three-level validation ensures schema correctness, behavioral test alignment, and implementation compliance with declared artifacts
This methodology works with any AI coding tool (Claude, Cursor, GitHub Copilot, etc.) as a validation layer that sits between AI output and your repository
Pre-commit validation saves exponentially more time than post-production debugging by catching structural violations before they compound

The Fatal Flaw of Instruction-Based AI Coding

Why AI Coding Agents Keep Breaking Your Rules

Every experienced developer who’s worked with AI coding assistants has encountered this frustrating pattern:

You spend 30 minutes crafting the perfect prompt with explicit constraints
The AI generates code that looks perfect at first glance
During review, you discover it modified files you explicitly told it to avoid
Or worse—it passes review but introduces subtle bugs that break production

This isn’t a limitation of any specific AI model. Whether you’re using Claude 3.5 Sonnet, GPT-4, or Gemini Pro, the fundamental issue remains: large language models operate on statistical patterns, not deterministic rules. When you write “DO NOT modify the authentication module,” the AI interprets this as a strong suggestion that influences its probability distribution—but it’s not a hard constraint.

Consider a real-world scenario: You’re building a feature that requires database access. You instruct the AI to “use the existing DatabaseService class and don’t create new database connections.” The AI, recognizing patterns from its training data where developers often create specialized database helpers, might decide that your specific use case warrants a new utility class. From a probability perspective, this seems reasonable. From your architectural perspective, it’s a violation that introduces technical debt.

The Hidden Cost of “Looks Good” Code

The most dangerous AI-generated code isn’t the obviously broken code—it’s the code that looks correct during review but contains subtle violations:

Structural breaches: AI creates new classes/functions instead of using or modifying expected ones
Missing artifacts: AI implements a feature but omits key functions declared in requirements
Undeclared additions: New public APIs appear that weren’t in the specification
Test-manifest drift: Tests don’t actually use the artifacts declared in the manifest
Implementation divergence: Code works but has different structure than architectural spec

These issues compound over time. Each violation becomes technical debt. Each structural inconsistency makes the codebase harder to maintain. MAID Runner addresses these structural violations through AST-based validation, catching mismatches between declared artifacts and actual implementation before code review.

Understanding Manifest-Driven AI Development (MAID)

From Instructions to Enforceable Contracts

Manifest-driven development fundamentally changes how you interact with AI coding agents. Instead of writing instructions and hoping they’re followed, you define a manifest: a structured specification that describes exactly what code artifacts the AI must create or modify.

Think of it like the difference between:

Instructions: “Please implement a user service with these methods”
Contracts: A manifest that declares the exact functions, classes, and parameters that must exist, validated by AST analysis

A MAID manifest is a JSON file that includes:

File scope declaration:


{
  "creatableFiles": ["src/services/user_service.py"],
  "editableFiles": ["src/routes/users.py"],
  "readonlyFiles": ["tests/test_user_service.py", "src/models/user.py"]
}

Expected artifacts (the contract):


{
  "expectedArtifacts": {
    "file": "src/services/user_service.py",
    "contains": [
      {
        "type": "class",
        "name": "UserService",
        "bases": ["BaseService"]
      },
      {
        "type": "function",
        "name": "get_user_by_id",
        "class": "UserService",
        "args": [{"name": "user_id", "type": "int"}],
        "returns": {"type": "User"}
      }
    ]
  }
}

Behavioral validation:


{
  "validationCommand": ["pytest", "tests/test_user_service.py", "-v"]
}

How MAID Validation Works

MAID Runner operates as a validation layer between AI code generation and your repository:

Pre-task: You define a manifest describing the exact artifacts (functions, classes, methods) for a specific task
Behavioral Tests: You write tests that use those artifacts, validated against the manifest via AST analysis
Generation: AI generates code (using any tool—Cursor, Claude, etc.)
Structural Validation: MAID Runner verifies the implementation DEFINES the declared artifacts using AST analysis
Behavioral Validation: The validation command (typically pytest) runs to ensure the code works correctly
Iteration: AI can regenerate code to fix violations

The validation happens at three levels:

Schema Validation: Ensures the manifest JSON follows the correct structure
Behavioral Test Validation: Verifies test files USE the declared artifacts (preventing test-manifest misalignment)
Implementation Validation: Verifies implementation DEFINES the declared artifacts (enforcing the contract)

Unlike post-commit linting or CI/CD checks, MAID validation happens before code enters your workflow, preventing violations from ever becoming commits.

Real-World Comparison: With and Without Validation

Scenario: Adding a New Feature Endpoint

Let’s examine the same feature implementation twice—once with traditional AI coding (instruction-based), and once with MAID validation (contract-based).

Traditional Approach (Instructions Only)

Your prompt to AI:


Add a new /api/users/preferences endpoint that allows users to update 
their notification preferences. Use the existing UserService and don't 
create new database connections. Follow RESTful conventions.

What the AI generates (looks good at first glance):

✅ New endpoint handler in src/routes/users.js
✅ Input validation middleware
✅ RESTful response format
❌ New PreferencesRepository class (ignoring instruction to use UserService)
❌ Direct database import and connection (ignoring constraint)
❌ No corresponding tests
❌ Modified src/database/schema.js (not mentioned in requirements)

Discovery timeline:

Code review: “Looks good” ✓
Merged to main ✓
Two weeks later: Another developer adds a feature and discovers inconsistent data access patterns
Technical debt created: Now you have two ways to access user data

Total time cost: 30 minutes initial development + 2 hours later for refactoring + ongoing maintenance burden

MAID Validation Approach

Your manifest (task-042-preferences-endpoint.manifest.json):


{
  "goal": "Add user preferences endpoint to allow users to update notification settings",
  "taskType": "edit",
  "creatableFiles": [],
  "editableFiles": [
    "src/routes/users.py",
    "src/services/UserService.py"
  ],
  "readonlyFiles": [
    "tests/test_user_preferences.py",
    "src/models/user.py",
    "src/database/db_service.py"
  ],
  "expectedArtifacts": {
    "file": "src/routes/users.py",
    "contains": [
      {
        "type": "function",
        "name": "update_preferences",
        "args": [
          {"name": "user_id", "type": "int"},
          {"name": "preferences", "type": "dict"}
        ],
        "returns": {"type": "Response"}
      }
    ]
  },
  "validationCommand": ["pytest", "tests/test_user_preferences.py", "-v"]
}

What happens:

AI generates code based on the manifest
MAID Runner validates in three phases:

✅ Schema validation passes
❌ Behavioral validation: Tests don’t call the update_preferences function as declared
❌ Implementation validation: Code creates PreferencesRepository class not in manifest

Violations detected:


   BEHAVIORAL VALIDATION FAILED:
   ❌ Artifact not used in tests: function update_preferences

   IMPLEMENTATION VALIDATION FAILED:
   ❌ Undeclared public class in src/routes/users.py: PreferencesRepository
   ❌ Expected artifact missing: function update_preferences in src/routes/users.py

Validation report provided to AI
AI regenerates code using UserService and adds proper route handler
Second validation: ✅ All three validation phases pass
Run validation command: ✅ Tests pass

Total time cost: 35 minutes (with automated correction iterations)

Key difference: Violations caught before entering codebase, zero technical debt created. AST-based validation catches architectural violations that code review might miss.

Implementing MAID Validation in Your Workflow

Getting Started with MAID Runner

MAID Runner integrates seamlessly with your existing AI coding workflow. Here’s how to implement it:

Installation

MAID Runner requires Python 3.8+ and is available on PyPI:


# Install from PyPI
pip install maid-runner

# Or with uv
uv pip install maid-runner

# Verify installation
maid --version

For development or contributing:


# Clone the repository
git clone https://github.com/mamertofabian/maid-runner.git
cd maid-runner

# Install in development mode
uv pip install -e .

Basic Workflow Integration

1. Create a task manifest before coding:


// manifests/task-042-add-preferences-endpoint.manifest.json
{
  "goal": "Add user notification preferences endpoint",
  "taskType": "edit",
  "editableFiles": [
    "src/routes/users.py",
    "src/services/UserService.py"
  ],
  "readonlyFiles": [
    "tests/test_user_preferences.py",
    "src/models/user.py"
  ],
  "expectedArtifacts": {
    "file": "src/routes/users.py",
    "contains": [
      {
        "type": "function",
        "name": "update_preferences",
        "args": [
          {"name": "user_id", "type": "int"},
          {"name": "preferences", "type": "dict"}
        ]
      }
    ]
  },
  "validationCommand": ["pytest", "tests/test_user_preferences.py", "-v"]
}

2. Write behavioral tests that use the declared artifacts:


# tests/test_user_preferences.py
def test_update_preferences():
    response = update_preferences(user_id=1, preferences={"email": True})
    assert response.status_code == 200

3. Validate the manifest and tests (before implementation):


maid validate manifests/task-042-add-preferences-endpoint.manifest.json \
  --validation-mode behavioral

4. Generate code with your AI tool (Cursor, Claude, etc.)

5. Validate the implementation:


maid validate manifests/task-042-add-preferences-endpoint.manifest.json

6. Review validation report:


✅ SCHEMA VALIDATION PASSED

❌ IMPLEMENTATION VALIDATION FAILED
Validation failed for manifests/task-042-add-preferences-endpoint.manifest.json

Missing artifacts in src/routes/users.py:
  ❌ function update_preferences

Unexpected public artifacts in src/routes/users.py:
  ❌ class PreferencesRepository (not in manifest)

Expected artifacts:
  • function update_preferences

Fix: Remove PreferencesRepository and implement update_preferences function

7. Fix violations (manually or with AI assistance) and revalidate:


maid validate manifests/task-042-add-preferences-endpoint.manifest.json
# ✅ All validations passed

Integration with Popular AI Coding Tools

Cursor Integration

Add to your .cursorrules file:


Before implementing changes:
1. Check for MAID manifest in manifests/ directory
2. Review expectedArtifacts to understand what functions/classes to implement
3. After implementation, run: maid validate manifests/task-XXX.manifest.json
4. Ensure behavioral tests pass: run validationCommand from manifest

Validation failures require code regeneration to match the manifest contract.

Pre-Commit Hook Integration


# .git/hooks/pre-commit
#!/bin/bash

echo "Running MAID validation..."

# Validate all manifests with chain analysis
maid validate --manifest-dir manifests --use-manifest-chain --quiet

if [ $? -ne 0 ]; then
  echo "❌ MAID validation failed. Commit rejected."
  echo "Run: maid validate --manifest-dir manifests"
  exit 1
fi

echo "✅ MAID validation passed"
exit 0

CI/CD Integration


# .github/workflows/maid-validate.yml
name: MAID Validation

on: [pull_request, push]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.12'

      - name: Install MAID Runner
        run: pip install maid-runner

      - name: Validate all manifests
        run: |
          maid validate --manifest-dir manifests \
            --use-manifest-chain

      - name: Run all validation tests
        run: maid test --manifest-dir manifests

      - name: Comment on failure
        if: failure()
        uses: actions/github-script@v7
        with:
          script: |
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: '❌ MAID validation failed. Check the Actions tab for details.'
            })

Advanced MAID Patterns for Complex Codebases

Progressive Artifact Definition

Start with minimal artifact declarations and expand as you identify what AI needs guidance on:

Phase 1: Basic artifact declaration


{
  "goal": "Add user profile feature",
  "editableFiles": ["src/features/user_profile.py"],
  "expectedArtifacts": {
    "file": "src/features/user_profile.py",
    "contains": [
      {
        "type": "function",
        "name": "get_user_profile"
      }
    ]
  }
}

Phase 2: Add type specifications (after observing AI type inconsistencies)


{
  "expectedArtifacts": {
    "file": "src/features/user_profile.py",
    "contains": [
      {
        "type": "function",
        "name": "get_user_profile",
        "args": [
          {"name": "user_id", "type": "int"}
        ],
        "returns": {"type": "UserProfile"}
      }
    ]
  }
}

Phase 3: Explicit architectural contracts (enforce patterns)


{
  "expectedArtifacts": {
    "file": "src/features/user_profile.py",
    "contains": [
      {
        "type": "class",
        "name": "UserProfileService",
        "bases": ["BaseService"]
      },
      {
        "type": "function",
        "name": "get_user_profile",
        "class": "UserProfileService",
        "args": [
          {"name": "user_id", "type": "int"}
        ],
        "returns": {"type": "UserProfile"},
        "raises": ["ValueError", "UserNotFoundError"]
      }
    ]
  }
}

Team-Wide Manifest Templates

Create reusable patterns for common task types. While MAID manifests don’t support templating variables, you can establish conventions:

Feature development pattern:


{
  "goal": "Implement {{feature_name}} feature",
  "taskType": "create",
  "creatableFiles": ["src/features/{{feature_name}}.py"],
  "readonlyFiles": [
    "tests/test_{{feature_name}}.py",
    "src/services/base_service.py"
  ],
  "expectedArtifacts": {
    "file": "src/features/{{feature_name}}.py",
    "contains": [
      {
        "type": "class",
        "name": "{{FeatureName}}Service",
        "bases": ["BaseService"]
      }
    ]
  },
  "validationCommand": ["pytest", "tests/test_{{feature_name}}.py", "-v"]
}

Bugfix pattern (editing existing code):


{
  "goal": "Fix {{bug_description}}",
  "taskType": "edit",
  "creatableFiles": [],
  "editableFiles": ["src/{{affected_module}}.py"],
  "readonlyFiles": [
    "tests/test_{{affected_module}}_regression.py"
  ],
  "expectedArtifacts": {
    "file": "src/{{affected_module}}.py",
    "contains": [
      {
        "type": "function",
        "name": "{{fixed_function}}",
        "args": []
      }
    ]
  },
  "validationCommand": ["pytest", "tests/test_{{affected_module}}_regression.py", "-v"]
}

Team guideline: Keep bugfix manifests small—ideally editing 1-3 files maximum.

Validation Modes: Strict vs. Permissive

MAID Runner provides two validation modes based on file classification:

Strict Mode (creatableFiles): For new files, require exact artifact match


{
  "goal": "Create new authentication module",
  "taskType": "create",
  "creatableFiles": ["src/core/auth_service.py"],
  "expectedArtifacts": {
    "file": "src/core/auth_service.py",
    "contains": [
      {
        "type": "class",
        "name": "AuthService",
        "bases": ["BaseService"]
      },
      {
        "type": "function",
        "name": "authenticate",
        "class": "AuthService"
      }
    ]
  }
}

Result: Implementation must have EXACTLY these artifacts and no other public APIs.

Permissive Mode (editableFiles): For existing files, require minimum artifacts


{
  "goal": "Add password reset to existing auth service",
  "taskType": "edit",
  "editableFiles": ["src/core/auth_service.py"],
  "expectedArtifacts": {
    "file": "src/core/auth_service.py",
    "contains": [
      {
        "type": "function",
        "name": "reset_password",
        "class": "AuthService"
      }
    ]
  }
}

Result: Implementation must CONTAIN at least reset_password, but existing methods can remain.

Team guideline: Use strict mode for new core modules, permissive mode for iterative feature additions.

Common Pitfalls and How to Avoid Them

Pitfall 1: Overly Specific Artifact Declarations

Problem: Declaring too many low-level artifacts makes manifests brittle and hard to maintain.

Example (Too detailed):


{
  "expectedArtifacts": {
    "file": "src/user_service.py",
    "contains": [
      {"type": "function", "name": "_validate_email"},
      {"type": "function", "name": "_hash_password"},
      {"type": "function", "name": "_check_permissions"},
      {"type": "function", "name": "create_user"}
    ]
  }
}

Solution: Declare only public APIs; allow AI freedom with private implementations:


{
  "expectedArtifacts": {
    "file": "src/user_service.py",
    "contains": [
      {
        "type": "function",
        "name": "create_user",
        "args": [
          {"name": "email", "type": "str"},
          {"name": "password", "type": "str"}
        ],
        "returns": {"type": "User"}
      }
    ]
  }
}

Note: Private functions (starting with _) don’t need to be in manifests.

Pitfall 2: Skipping Behavioral Test Validation

Problem: Writing tests after implementation, leading to test-manifest misalignment.

Bad workflow:

Write manifest
AI generates implementation
Write tests afterward ← Tests may not match manifest

Good workflow:

Write manifest
Write tests that USE declared artifacts
Validate behavioral alignment: maid validate --validation-mode behavioral
AI generates implementation matching both
Validate implementation: maid validate

Pitfall 3: Ignoring Manifest Chain History

Problem: Editing files without checking their complete manifest history leads to undeclared changes.

Solution: Use manifest chain validation to detect file tracking issues:


# Check for undeclared files and incomplete compliance
maid validate --manifest-dir manifests --use-manifest-chain

Output helps identify:

🔴 UNDECLARED files (no manifest references them)
🟡 REGISTERED files (in manifests but missing artifacts/tests)
✅ TRACKED files (full MAID compliance)

Use maid snapshot to create manifests for legacy code.

Measuring the Impact: Before and After MAID

Quantifying Validation Benefits

Teams implementing MAID validation typically see:

Code review efficiency:

Before: 45-60 minutes per AI-generated PR
After: 15-20 minutes per PR
Reason: Architectural and constraint violations caught pre-review

Bug detection timing:

Before: 40% of AI-related issues found in production
After: 5% of AI-related issues reach production
Reason: Validation catches constraint violations before merge

Technical debt accumulation:

Before: ~3 “cleanup needed” items per AI-generated feature
After: ~0.3 cleanup items per feature
Reason: Architectural consistency enforced during generation

Developer confidence:

Before: 60% of developers comfortable merging AI code without extensive testing
After: 90% confident with validated AI code
Reason: Explicit guarantees about what was and wasn’t modified

Real Team Testimonials

“We went from spending 30% of our code review time checking if AI modified the right files to maybe 5%. The validation reports are now the first thing reviewers look at.”

— Senior Engineer at mid-sized SaaS company

“MAID validation saved us during a critical refactor. We set strict manifests for ‘read-only’ modules and caught dozens of AI attempts to modify them. Without validation, those would’ve been subtle bugs in production.”

— Tech Lead at fintech startup

MAID vs. Alternative Approaches

How MAID Compares to Other Validation Methods

Approach	Validation Timing	AI-Aware	What It Validates	Setup Complexity
MAID Manifests	Pre-commit	✅ Yes	Artifact structure (AST)	Medium
Linting (ESLint, etc.)	Pre-commit	❌ No	Code style	Low
Type Checking (mypy, TypeScript)	Pre-commit	❌ No	Type correctness	Low
Unit Tests	Pre/Post-commit	❌ No	Behavior correctness	Medium
Code Review	Post-commit	⚠️ Manual	Everything (human judgment)	Low
Static Analysis (Pylint, Sonar)	Pre/Post	❌ No	Code quality metrics	Medium

Key differentiators:

MAID validates contracts, not just correctness: Ensures AI generates the exact functions/classes you specified
AST-based validation: Catches architectural violations (e.g., “AI created PreferencesRepository instead of using UserService”)
Behavioral test validation: Prevents test-manifest misalignment before implementation
Pre-commit enforcement: Violations never enter your repository
Tool-agnostic: Works with any AI coding assistant (Cursor, Claude, GitHub Copilot)

Complementary Tools

MAID works best alongside (not replacing):

Linting (Black, Ruff): Code formatting and style
Type Checking (mypy): Type safety (can be in validationCommands)
Unit Tests: Functional correctness (run via validationCommand)
Code Review: Human judgment on design decisions

Think of MAID as the “contract enforcement layer” that ensures AI implements the exact artifacts you specified, while other tools ensure those implementations are high-quality.

Resources & Links

Official MAID Runner Resources

MAID Runner on PyPI – Install with pip install maid-runner
MAID Runner GitHub Repository – Source code, documentation, and examples
AI Driven Coder YouTube Channel – Tutorials and demonstrations
AI Driven Coder Website – Guides and blog posts
Join the Discord Community – Get help and share experiences
Codefrost – Professional AI development services

Getting Started Guides

MAID Quick Start: Install from PyPI, create your first manifest, run maid validate
Manifest Writing: Use maid snapshot to generate manifests from existing code
System-wide Snapshots: maid snapshot-system to create comprehensive project state
Integration: Add pre-commit hooks and CI/CD validation gates

Key Commands


# Install
pip install maid-runner

# Validate a single manifest
maid validate manifests/task-XXX.manifest.json

# Validate all manifests with chain analysis
maid validate --manifest-dir manifests --use-manifest-chain

# Run all validation tests
maid test --manifest-dir manifests

# Generate snapshot from existing file
maid snapshot src/my_module.py --output-dir manifests/

# Find manifests referencing a file
maid manifests src/my_module.py

Need Help Implementing This?

If your team is struggling with AI code quality at scale, Codefrost offers consulting services to help you:

Design manifest strategies for your specific architecture
Integrate validation into existing workflows
Train teams on AI-assisted development best practices
Audit and improve AI code generation processes

—

Have you experienced AI coding agents breaking your codebase? Share your stories in the comments below, and let us know what validation strategies you’re currently using. Your experiences help the community learn what works (and what doesn’t) when working with AI coding tools.

If this article helped you build more reliable AI-assisted code, consider sharing it with your team and subscribing for more content on AI development tools and methodologies. Together, we’re building better practices for the AI-powered development era.