AI Agent Case Study

Qualitative comparison of 6 AI-agent implementations against the manual (human) gold standard. All implement the same feature: a team management page at /ai/team backed by an OpenAPI-based backend.

Implementations compared: Manual, Claude Max, GPT, Claude AWS, Kimi, MiniMax, Mistral

Overview

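| | Manual | Claude Max | GPT | Claude AWS | Kimi | MiniMax | Mistral |
|---|---|---|---|---|---|---|---|
| Commits | 36 | 40 | 14 | 18 | 20 | 30 | 16 |
| Lines changed | +2072/−857 | +6182/−18 | +1690/−14 | +2592/−12 | +3313/−28 | +3061/−24 | +2014/−10 |
| Files changed | 37 | 44 | 12 | 30 | 34 | 33 | 22 |
| TeamPage.tsx lines | 867 | 389 | 1045 | 610 | 414 | 708 | 414 |
| Test files (.test) | 8 | 18 | 12 | 15 | 14 | 21 | 10 |
| Story files | 13 | 19 | 13 | 13 | 17 | 13 | 13 |
| E2E specs | 2 | 2 | 2 | 6 | 3 | 3 | 3 |
| Escape hatches | 16 | 34 | 25 | 25 | 36 | 28 | 25 |
| AI cost | | $4 | $10 | $20 | $7 | $6 | $30 |
| Dev time | 14h | 2–3h | 8–11h | 5–8h | 7–10h | 12–17h | 10–14h |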

1. Features Implemented

All experiments implement the core CRUD: list members, invite (create user), change role, remove user. Differences emerge in scope and detail.

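| Feature | Manual | Claude Max | GPT | Claude AWS | Kimi | MiniMax | Mistral |
|---|---|---|---|---|---|---|---|
| List members | Y | Y | Y | Y | Y | Y | Y |
| Invite member | Y | Y | Y | Y | Y | Y | Y |
| Change role | Y | Y | Y | Y | Y | Y | Y |
| Remove member | Y | Y | Y | Y | Y | Y | Y |
| Current user indicator | Y | Y | Y | Y | TODO | Y | Y |
| Form validation | zod | zod | regex | regex | regex | regex | regex |
| Loading skeletons | Y | Y | Y | Y | Y | Y | Y |
| Empty state | Y | Y | Y | Y | Y | Y | Y |
| Toast notifications | Y | Y | Y | Y | Y | Y | Y |

Rows with entries for only some implementations (values in column order, blank cells omitted):
Self-removal prevention: Y, Y, Y
Last admin protection: Y
Manage user limits: Y
Delete user account: Y
Role permission warnings: Y, Y, Y
Missing API doc: Y, Y, Y, Y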

Observations

2. Test Structure

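| Aspect | Manual | Claude Max | GPT | Claude AWS | Kimi | MiniMax | Mistral |
|---|---|---|---|---|---|---|---|
| Resolver unit tests | Y (Effect) | Y (Effect) | Y (Effect) | Y (Effect) | Y (Effect) | Y (Effect) | Y (Effect) |
| Storybook play tests | Y (25+) | Y (extensive) | Y (3) | Y | Y (13) | Y | Y |
| E2E (Playwright) | 2 specs | 2 specs | 2 specs | 6 specs | 3 specs | 3 specs | 3 specs |
| Test isolation | Effect Ref | Effect Ref | mock | Effect Ref | Effect Ref | Effect Ref | Effect Ref |

Rows with entries for only some implementations (values in column order, blank cells omitted):
Component unit tests: Y (hook), Y (optimistic), Y (visual), Y (matrix), Y
Deferred promise pattern: Y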

Observations

3. Library Use

All experiments share the same base stack: React 19, TanStack Query, Effect, Waku, @base-ui/react, Tailwind CSS, lucide-react, @amazeelabs/codegen-operation-ids.

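| Library Choice | Manual | Claude Max | GPT | Claude AWS | Kimi | MiniMax | Mistral |
|---|---|---|---|---|---|---|---|
| Form handling | react-hook-form + zod | useState | useState | useState | useState | useState | useState |
| Form validation | @hookform + zod | zod (manual) | regex | regex | regex | regex | regex |
| Optimistic UI | RQ cache | custom hook | local state | RQ cache | RQ cache | local state | local state |
| State management | RQ cache | RQ cache | local state | RQ cache | RQ cache | local + record | local + record |

Rows with entries for only some implementations (values in column order, blank cells omitted):
Additional deps: custom atoms, zustand (unused)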

Observations

4. Typing Completeness

All experiments use the same strict TypeScript config: strict: true, noUncheckedIndexedAccess: true, exactOptionalPropertyTypes: true.

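| Aspect | Manual | Claude Max | GPT | Claude AWS | Kimi | MiniMax | Mistral |
|---|---|---|---|---|---|---|---|
| any in user code | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| @ts-ignore | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| @ts-expect-error | 0 | 2 | 0 | 0 | 0 | 2 | 0 |
| Type assertions | 0 | 0 | 1 | 3 | 2 | 0 | 0 |
| Branded types | Y | Y | Y | Y | Y | Y | Y |
| Zod validation | Y | form only | | | | | |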
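
The two less common flags in the shared config change what compiles, which is part of why assertion and escape-hatch counts differ between implementations. The following is a minimal sketch of code that satisfies both flags; the `Invite` type and both helper functions are hypothetical and do not come from any of the compared codebases.

```typescript
// Hypothetical shape, used only for illustration.
type Invite = { email: string; note?: string };

// noUncheckedIndexedAccess: invites[0] has type Invite | undefined,
// so the element must be checked (here via optional chaining).
function firstEmail(invites: Invite[]): string | undefined {
  return invites[0]?.email;
}

// exactOptionalPropertyTypes: `note?: string` does not accept an explicit
// undefined, so the key is omitted instead of being set to undefined.
function buildInvite(email: string, note: string | undefined): Invite {
  return note === undefined ? { email } : { email, note };
}
```

Code written this way needs neither a type assertion nor a suppression comment to pass the strict config.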

Observations

5. Escape Hatches

Total eslint-disable / @ts-ignore / @ts-expect-error counts in non-generated source files:

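| | Manual | Claude Max | GPT | Claude AWS | Kimi | MiniMax | Mistral |
|---|---|---|---|---|---|---|---|
| Total | 16 | 34 | 25 | 25 | 36 | 28 | 25 |
| In components/resolvers | 3 | 3 | 3 | 3 | 13 | 11 | 3 |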
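
The counts above are tallies of suppression comments. The study does not describe its tooling, so the following is only a sketch of one way to produce such a tally with a plain text scan; `countEscapeHatches` and the pattern are illustrative names, and a real count would also need to skip generated files.

```typescript
// Matches the three suppression markers named in the text.
// "eslint-disable" also covers eslint-disable-line and eslint-disable-next-line.
const ESCAPE_HATCH = /eslint-disable|@ts-ignore|@ts-expect-error/g;

// Counts suppression markers in the text of one source file.
function countEscapeHatches(source: string): number {
  return (source.match(ESCAPE_HATCH) ?? []).length;
}
```

Summing the result over the component and resolver files only would give the second row of the table.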

Observations

6. Code Structure & Organisation

Component Decomposition

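| | Manual | Claude Max | GPT | Claude AWS | Kimi | MiniMax | Mistral |
|---|---|---|---|---|---|---|---|
| Approach | Monolithic | Organism | Monolithic | Molecule | Molecule | Mono + Atom | Molecule |
| TeamPage.tsx | 867 lines | 389 lines | 1045 lines | 610 lines | 414 lines | 708 lines | 414 lines |
| Separate modals | 0 | 4 | 0 | 1 | 3 | 1 | 1 |
| Separate row component | 0 | 1 | 0 | 1 | 1 | 0 | 1 |
| Custom atoms | 0 | 0 | 0 | 3 | 0 | 0 | 0 |
| Custom hooks | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| Component files changed | 21 | 39 | 8 | 20 | 22 | 26 | 20 |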

Observations

Resolver Organization

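| | Manual | Claude Max | GPT | Claude AWS | Kimi | MiniMax | Mistral |
|---|---|---|---|---|---|---|---|
| File structure | Single file | Per mutation | Single + tests | 3 files | Per mutation | 2 files | Per mutation |
| Transform layer | Y | Y | Y | Y | Y | Y | Y |
| Mock state | Effect Ref | Effect Ref | mock executors | Effect Ref | Effect Ref | Effect Ref | Effect Ref |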

Optimistic UI Architecture

Two distinct patterns emerged:

Pattern A: React Query Cache Manipulation

onMutate  → cancelQueries → snapshot → setQueryData → return context
onError   → rollback from context
onSettled → invalidateQueries

Canonical React Query optimistic update pattern. Keeps server state as source of truth.

Used by: Manual, Claude Max, Claude AWS, Kimi
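
The Pattern A lifecycle can be sketched without any framework. The code below is not the TanStack Query API: `MemberCache` is a hypothetical stand-in for the query cache, the server call is modelled as a synchronous function that throws on failure, and query cancellation is omitted. It only illustrates the order of snapshot, optimistic write, rollback and invalidation.

```typescript
type Member = { id: string; email: string; role: "admin" | "member" };

// Hypothetical stand-in for a query cache holding the member list.
class MemberCache {
  private data: Member[] = [];
  invalidations = 0;
  get(): Member[] { return this.data; }
  set(next: Member[]): void { this.data = next; }
  // A real cache would trigger a refetch from the server here.
  invalidate(): void { this.invalidations += 1; }
}

// Runs one invite through onMutate, then onError on failure, then onSettled.
function inviteOptimistically(
  cache: MemberCache,
  optimistic: Member,
  send: () => void, // assumed to throw when the request fails
): boolean {
  // onMutate: snapshot the cache, then write the optimistic entry.
  const snapshot = cache.get();
  cache.set([...snapshot, optimistic]);
  try {
    send();
    return true;
  } catch {
    // onError: roll back to the snapshot taken in onMutate.
    cache.set(snapshot);
    return false;
  } finally {
    // onSettled: always resynchronise with the server.
    cache.invalidate();
  }
}
```

Because invalidation runs on both paths, the cache converges on the server's data whether or not the optimistic guess was right.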

Pattern B: Local State Management

setMembers(prev => [...prev, optimisticMember])
mutation.mutateAsync()
onSuccess → replace temp member
onError   → restore from closure

Uses useState as source of truth. Simpler but bypasses React Query's cache invalidation, risking stale data.

Used by: GPT, MiniMax, Mistral
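
For contrast, Pattern B can be sketched the same way. `createState` is a hypothetical stand-in for a `useState` pair, and the server call is again modelled as a synchronous function that returns the created member or throws; none of this is taken from the compared implementations.

```typescript
type Member = { id: string; email: string };

interface State<T> {
  get(): T;
  set(update: (prev: T) => T): void;
}

// Hypothetical stand-in for useState: holds a value and applies updaters.
function createState<T>(initial: T): State<T> {
  let value = initial;
  return {
    get: () => value,
    set: (update) => { value = update(value); },
  };
}

function inviteWithLocalState(
  members: State<Member[]>,
  email: string,
  send: (email: string) => Member, // assumed to throw on failure
): void {
  const previous = members.get(); // kept in the closure for rollback
  const temp: Member = { id: `temp-${email}`, email };
  members.set((prev) => [...prev, temp]);
  try {
    const created = send(email);
    // onSuccess: replace the temporary member with the server's version.
    members.set((prev) => prev.map((m) => (m.id === temp.id ? created : m)));
  } catch {
    // onError: restore the list captured before the optimistic write.
    members.set(() => previous);
  }
}
```

Nothing in this flow refetches from the server, so changes made elsewhere stay invisible until the list is reloaded, which is the stale-data risk noted above.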

Observations

7. Type Inference vs Manual Types

The codebase provides generated types from OpenAPI codegen (types.gen.ts, effect-schema.gen.ts, effect-service-interface.gen.ts) and GraphQL codegen (graphql.ts). Idiomatic usage infers types from these sources rather than manually re-declaring them.

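| Aspect | Manual | Claude Max | GPT | Claude AWS | Kimi | MiniMax | Mistral |
|---|---|---|---|---|---|---|---|
| Resolver type inference | S.Schema.Type | S.Schema.Type | S.Schema.Type | S.Schema.Type | S.Schema.Type | S.Schema.Type | S.Schema.Type |
| Component types from codegen | Y | Partial | N | Partial | N | N | N |
| Manual type/interface in UI | 0 | 1 | 2 | 1 | 1 | 0 (inline) | 2 |
| Role type source | Imported | Transform fn | Manual Record | Manual union | Manual union | Raw strings | Manual union |
| Type/schema drift risk | None | Low | High | Medium | High | High | High |

Rows with entries for only some implementations (values in column order, blank cells omitted):
Inline object return types: Y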
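
The drift risk in the last row comes from re-declaring a type that already exists in generated code. The sketch below shows the inferred alternative in plain TypeScript. `GENERATED_ROLES` stands in for a codegen output, and the role names and helper functions are hypothetical; the real generated files use Effect schemas, which are not reproduced here.

```typescript
// Stand-in for a generated file; the role names are hypothetical.
const GENERATED_ROLES = ["admin", "member"] as const;

// Derived from the generated value, so it changes when the source changes.
type Role = (typeof GENERATED_ROLES)[number];

// Record<Role, ...> forces one entry per generated role; adding a role
// upstream makes this object fail to compile until it is updated.
const ROLE_LABELS: Record<Role, string> = {
  admin: "Administrator",
  member: "Member",
};

// Narrows an arbitrary string (for example from a form) to a Role.
function isRole(value: string): value is Role {
  return (GENERATED_ROLES as readonly string[]).includes(value);
}

function roleLabel(value: string): string {
  return isRole(value) ? ROLE_LABELS[value] : "Unknown role";
}
```

A hand-written union such as `type Role = "admin" | "member"` would compile just as well today, but it would silently go out of date when the schema gains or renames a role.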

Observations

8. Cost & Total Cost Effectiveness

The manual implementation took 14 hours of net development time. Each AI agent produced a first draft requiring additional human effort (prompting rounds + manual adjustments) to reach parity with the manual baseline.

Raw AI Cost

| | Manual | Claude Max | GPT | Claude AWS | Kimi | MiniMax | Mistral |
|---|---|---|---|---|---|---|---|
| AI cost | | $4 | $10 | $20 | $7 | $6 | $30 |

Estimated Remediation to Reach Manual Parity

Each AI output has specific gaps versus the manual baseline. Estimates assume a senior developer familiar with the stack.

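| Gap category | Claude Max | GPT | Claude AWS | Kimi | MiniMax | Mistral |
|---|---|---|---|---|---|---|
| Optimistic UI → RQ cache pattern | | 2–3h | | | 3–4h | 3–4h |
| Replace hardcoded setTimeout → API | | | | | 2–3h | 2–3h |
| Fix manual types → codegen imports | 0.5h | 1–2h | 0.5h | 1–2h | 1h | 1–2h |
| Implement isCurrentUser (not TODO) | | | | 1h | | |
| Total remediation | 2–3h | 8–11h | 5–8h | 7–10h | 12–17h | 10–14h |

Rows with entries for only some agents (values in column order, blank cells omitted):
Replace useState forms → RHF + zod: 2–3h, 2–3h, 2–3h, 2–3h, 2–3h
Add self-removal prevention: 1h, 1h, 1h, 1h
Add role permission warnings: 1–2h, 1–2h, 1–2h, 1–2h
Remove scope creep / unused deps: 0.5h, 1h, 0.5h
Decompose monolith: 2–3h
Clean up escape hatches: 1h, 1–2h, 1–2h
Fix test quality issues: 1–2h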

Total Cost of Ownership

The figures below assume a senior developer rate of $100/h (loaded cost). Additional AI prompting costs are estimated at ~50% of the original AI spend per remediation round.

| | Manual | Claude Max | GPT | Claude AWS | Kimi | MiniMax | Mistral |
|---|---|---|---|---|---|---|---|
| Developer time | 14h | 2–3h | 8–11h | 5–8h | 7–10h | 12–17h | 10–14h |
| Developer cost @$100/h | $1,400 | $200–300 | $800–1,100 | $500–800 | $700–1,000 | $1,200–1,700 | $1,000–1,400 |
| AI cost (initial) | | $4 | $10 | $20 | $7 | $6 | $30 |
| AI cost (remediation) | | ~$2 | ~$5 | ~$10 | ~$4 | ~$3 | ~$15 |
| Total estimated | $1,400 | $206–306 | $815–1,115 | $530–830 | $711–1,011 | $1,209–1,709 | $1,045–1,445 |
| vs Manual baseline | | 78–85% savings | 20–42% savings | 41–62% savings | 28–49% savings | −22% to 14% | −3% to 25% |

Total estimated cost (midpoint of range): Manual $1,400; Claude Max ~$256; GPT ~$965; Claude AWS ~$680; Kimi ~$861; MiniMax ~$1,459; Mistral ~$1,245
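
The arithmetic behind these totals can be written out as a short sketch. It assumes one remediation round at 50% of the initial AI spend, rounded to the nearest dollar, which reproduces the figures reported here; the function and type names are illustrative.

```typescript
type Range = { low: number; high: number };

const RATE = 100; // USD per developer hour, as stated in the text
const MANUAL_COST = 14 * RATE; // 14h manual baseline

// Developer cost plus initial AI spend plus one remediation round (~50%).
function totalCost(devHours: Range, aiInitial: number): Range {
  const aiRemediation = Math.round(aiInitial * 0.5);
  const ai = aiInitial + aiRemediation;
  return { low: devHours.low * RATE + ai, high: devHours.high * RATE + ai };
}

// Savings in percent; the cheapest outcome gives the highest saving.
function savingsVsManual(total: Range): Range {
  const pct = (cost: number): number => Math.round((1 - cost / MANUAL_COST) * 100);
  return { low: pct(total.high), high: pct(total.low) };
}

// Claude Max: 2–3h of developer time and $4 of initial AI spend.
const claudeMax = totalCost({ low: 2, high: 3 }, 4); // { low: 206, high: 306 }
const claudeMaxSavings = savingsVsManual(claudeMax); // { low: 78, high: 85 }
```

At these rates the AI spend contributes less than 5% of any total, so the ranking is driven almost entirely by remediation hours.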

Observations

Summary Assessment

Closest to Manual Quality

1. Claude Max: Best decomposition, most tests, proper RQ optimistic updates, custom hook abstraction. Over-delivered on scope and file count.
2. Claude AWS: Good structure, strongest E2E coverage (6 specs), proper React Query patterns. Created unnecessary custom atoms.
3. Kimi: Good decomposition, proper patterns, but left isCurrentUser unimplemented (TODO).

Furthest from Manual Quality

1. MiniMax / Mistral: Hardcoded delays instead of real API integration. Local state instead of RQ cache. More escape hatches in feature code.
2. GPT: Most compact changeset (12 files) but largest monolith (1045 lines). Good test patterns but poor decomposition and local state optimistic UI.

Key Differentiator: Form Handling

Key Differentiator: Optimistic UI Pattern