Qualitative comparison of 6 AI-agent implementations against the manual (human) gold standard. All implement the same feature: a team management page at /ai/team backed by an OpenAPI-based backend.

| | Manual | Claude Max | GPT | Claude AWS | Kimi | MiniMax | Mistral |
|---|---|---|---|---|---|---|---|
| Commits | 36 | 40 | 14 | 18 | 20 | 30 | 16 |
| Lines changed | +2072/−857 | +6182/−18 | +1690/−14 | +2592/−12 | +3313/−28 | +3061/−24 | +2014/−10 |
| Files changed | 37 | 44 | 12 | 30 | 34 | 33 | 22 |
| TeamPage.tsx lines | 867 | 389 | 1045 | 610 | 414 | 708 | 414 |
| Test files (.test) | 8 | 18 | 12 | 15 | 14 | 21 | 10 |
| Story files | 13 | 19 | 13 | 13 | 17 | 13 | 13 |
| E2E specs | 2 | 2 | 2 | 6 | 3 | 3 | 3 |
| Escape hatches | 16 | 34 | 25 | 25 | 36 | 28 | 25 |
| AI cost | – | $4 | $10 | $20 | $7 | $6 | $30 |
| Dev time | 14h | 2–3h | 8–11h | 5–8h | 7–10h | 12–17h | 10–14h |
All experiments implement the core CRUD: list members, invite (create user), change role, remove user. Differences emerge in scope and detail.
| Feature | Manual | Claude Max | GPT | Claude AWS | Kimi | MiniMax | Mistral |
|---|---|---|---|---|---|---|---|
| List members | Y | Y | Y | Y | Y | Y | Y |
| Invite member | Y | Y | Y | Y | Y | Y | Y |
| Change role | Y | Y | Y | Y | Y | Y | Y |
| Remove member | Y | Y | Y | Y | Y | Y | Y |
| Current user indicator | Y | Y | Y | Y | TODO | Y | Y |
| Self-removal prevention | Y | Y | Y | – | – | – | – |
| Last admin protection | – | – | Y | – | – | – | – |
| Manage user limits | – | Y | – | – | – | – | – |
| Delete user account | – | – | – | – | – | Y | – |
| Role permission warnings | Y | Y | Y | – | – | – | – |
| Form validation | zod | zod | regex | regex | regex | regex | regex |
| Missing API doc | Y | Y | Y | – | – | – | Y |
| Loading skeletons | Y | Y | Y | Y | Y | Y | Y |
| Empty state | Y | Y | Y | Y | Y | Y | Y |
| Toast notifications | Y | Y | Y | Y | Y | Y | Y |
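
The "Self-removal prevention" and "Last admin protection" rows in the table above both come down to a guard evaluated before the remove mutation fires. A minimal sketch of such a guard follows; `Member`, `RemovalCheck`, and `canRemoveMember` are hypothetical names and are not taken from any of the compared implementations. Only the role values mirror the generated schema's casing.

```typescript
// Hypothetical types and names, for illustration only.
type Role = "Admin" | "KeyCreator" | "ReadOnly";

interface Member {
  id: number;
  role: Role;
}

type RemovalCheck =
  | { allowed: true }
  | { allowed: false; reason: "self-removal" | "last-admin" };

// Decide whether the user `currentUserId` may remove `targetId` from `members`.
function canRemoveMember(
  members: readonly Member[],
  currentUserId: number,
  targetId: number,
): RemovalCheck {
  // Self-removal prevention: a user cannot remove their own membership.
  if (targetId === currentUserId) {
    return { allowed: false, reason: "self-removal" };
  }
  // Last admin protection: the team must keep at least one admin.
  const target = members.find((m) => m.id === targetId);
  const adminCount = members.filter((m) => m.role === "Admin").length;
  if (target?.role === "Admin" && adminCount <= 1) {
    return { allowed: false, reason: "last-admin" };
  }
  return { allowed: true };
}
```

Returning a tagged result rather than a boolean lets the UI show a specific message for each blocked case. The same check would also need to run on the server, since a client-side guard alone can be bypassed.
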

Kimi leaves `isCurrentUser` as a TODO with a hardcoded `false`, so the current-user indicator, a key feature, is missing.

| Aspect | Manual | Claude Max | GPT | Claude AWS | Kimi | MiniMax | Mistral |
|---|---|---|---|---|---|---|---|
| Resolver unit tests | Y (Effect) | Y (Effect) | Y (Effect) | Y (Effect) | Y (Effect) | Y (Effect) | Y (Effect) |
| Component unit tests | – | Y (hook) | Y (optimistic) | Y (visual) | – | Y (matrix) | Y |
| Storybook play tests | Y (25+) | Y (extensive) | Y (3) | Y | Y (13) | Y | Y |
| E2E (Playwright) | 2 specs | 2 specs | 2 specs | 6 specs | 3 specs | 3 specs | 3 specs |
| Deferred promise pattern | – | – | Y | – | – | – | – |
| Test isolation | Effect Ref | Effect Ref | mock | Effect Ref | Effect Ref | Effect Ref | Effect Ref |
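
The "Deferred promise pattern" row refers to a testing technique in which the test holds the `resolve`/`reject` handles of a promise, so it can assert on the in-flight state before letting the mutation settle. The report does not show GPT's actual tests; the sketch below is a generic illustration, and `createDeferred`, `SaveState`, and `startSave` are hypothetical names.

```typescript
// Hypothetical helper: a promise whose settlement is controlled from outside.
interface Deferred<T> {
  promise: Promise<T>;
  resolve: (value: T) => void;
  reject: (reason: unknown) => void;
}

function createDeferred<T>(): Deferred<T> {
  let resolve!: (value: T) => void;
  let reject!: (reason: unknown) => void;
  const promise = new Promise<T>((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}

// Minimal stand-in for a component's save flow: the state flips to "saving"
// synchronously and only settles once the promise does.
type SaveState = "idle" | "saving" | "saved" | "failed";

function startSave(
  save: () => Promise<void>,
  setState: (state: SaveState) => void,
): Promise<void> {
  setState("saving");
  return save().then(
    () => setState("saved"),
    () => setState("failed"),
  );
}
```

In a test, `startSave(() => deferred.promise, ...)` leaves the flow in `"saving"` until the test calls `deferred.resolve(undefined)` or `deferred.reject(...)`, which makes the intermediate state observable without timers.
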

Claude Max adds unit tests for its `useOptimisticMutation` hook (10+ tests) and also has the most story files (19).

All experiments share the same base stack: React 19, TanStack Query, Effect, Waku, `@base-ui/react`, Tailwind CSS, `lucide-react`, `@amazeelabs/codegen-operation-ids`.

| Library Choice | Manual | Claude Max | GPT | Claude AWS | Kimi | MiniMax | Mistral |
|---|---|---|---|---|---|---|---|
| Form handling | react-hook-form + zod | useState | useState | useState | useState | useState | useState |
| Form validation | @hookform + zod | zod (manual) | regex | regex | regex | regex | regex |
| Optimistic UI | RQ cache | custom hook | local state | RQ cache | RQ cache | local state | local state |
| State management | RQ cache | RQ cache | local state | RQ cache | RQ cache | local + record | local + record |
| Additional deps | – | – | – | custom atoms | – | zustand (unused) | – |
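
The gap between the "regex" and "zod" cells in the table above is mostly about what the validation step returns: a single boolean versus typed data plus field-level errors. The sketch below shows that difference without any library. It does not use `zod` or `react-hook-form`; `InviteForm`, `validateInvite`, and the error messages are hypothetical.

```typescript
// Hypothetical form shape and rules, for illustration only.
interface InviteForm {
  email: string;
  role: string;
}

const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
const ROLES = ["Admin", "KeyCreator", "ReadOnly"] as const;
type Role = (typeof ROLES)[number];

// Regex-only approach: one boolean, no information about which field failed.
function isValidEmail(email: string): boolean {
  return EMAIL_PATTERN.test(email);
}

// Schema-style approach: every field is checked, errors are keyed by field,
// and the success branch carries narrowed, type-safe data.
type FieldErrors = Partial<Record<keyof InviteForm, string>>;
type ValidationResult =
  | { ok: true; data: { email: string; role: Role } }
  | { ok: false; errors: FieldErrors };

function isRole(value: string): value is Role {
  return ROLES.some((r) => r === value);
}

function validateInvite(form: InviteForm): ValidationResult {
  const errors: FieldErrors = {};
  const email = form.email.trim();
  const role = form.role;
  if (!isValidEmail(email)) errors.email = "Enter a valid email address";
  if (!isRole(role)) errors.role = "Pick one of the available roles";
  if (errors.email !== undefined || !isRole(role)) {
    return { ok: false, errors };
  }
  return { ok: true, data: { email, role } };
}
```

A schema library produces a result of roughly this shape from a declarative schema, which is what removes the hand-written checks per field.
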

- Manual: `react-hook-form` with `zod` for schema-validated forms. This provides proper form state management (dirty tracking, field-level errors, reset behavior), unlike the raw `useState` approach used by all AI agents.
- Claude Max: a custom `useOptimisticMutation` hook (120 lines) with generics that abstracts the optimistic update pattern. Well-engineered, but only used in this one feature.
- MiniMax and Mistral: hardcoded `setTimeout(1000)` delays simulate mutations instead of actual API integration in some flows.

All experiments use the same strict TypeScript config: `strict: true`, `noUncheckedIndexedAccess: true`, `exactOptionalPropertyTypes: true`.

| Aspect | Manual | Claude Max | GPT | Claude AWS | Kimi | MiniMax | Mistral |
|---|---|---|---|---|---|---|---|
| `any` in user code | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| `@ts-ignore` | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| `@ts-expect-error` | 0 | 2 | 0 | 0 | 0 | 2 | 0 |
| Type assertions | 0 | 0 | 1 | 3 | 2 | 0 | 0 |
| Branded types | Y | Y | Y | Y | Y | Y | Y |
| Zod validation | Y | form only | – | – | – | – | – |
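
To make the "Type assertions" row concrete: under the strict flags listed earlier, the usual alternatives to an `as` cast are an explicit `undefined` check for indexed access and a type guard for `unknown` values. The sketch below illustrates both, plus `satisfies` (TypeScript 4.9+). `Member`, `RollbackContext`, and the functions are hypothetical; the report does not show the actual mutation context types.

```typescript
// Hypothetical shapes, for illustration only.
interface Member {
  id: number;
  email: string;
}

interface RollbackContext {
  previous: Member[];
}

// Type guard: narrows `unknown` without a type assertion (`as`).
// It only checks the outer shape, not each array element.
function isRollbackContext(value: unknown): value is RollbackContext {
  return (
    typeof value === "object" &&
    value !== null &&
    "previous" in value &&
    Array.isArray(value.previous)
  );
}

// With noUncheckedIndexedAccess, `members[0]` is `Member | undefined`,
// so the empty case has to be handled explicitly.
function firstEmail(members: readonly Member[]): string {
  const first = members[0];
  return first === undefined ? "(no members)" : first.email;
}

// Restore the snapshot only if the context really has the expected shape.
function rollback(context: unknown, current: Member[]): Member[] {
  return isRollbackContext(context) ? context.previous : current;
}

// `satisfies` checks the literal against the type without changing its inferred type.
const emptyContext = { previous: [] } satisfies RollbackContext;
```

The guard is more verbose than `context as RollbackContext`, but a malformed context then falls back to the current data instead of failing at runtime.
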

- `unknown` types with type guards for the React Query mutation context (a defensible but verbose pattern).
- The `satisfies` operator for optimistic context typing, a more modern TypeScript pattern.

Total `eslint-disable` / `@ts-ignore` / `@ts-expect-error` counts in non-generated source files:

| | Manual | Claude Max | GPT | Claude AWS | Kimi | MiniMax | Mistral |
|---|---|---|---|---|---|---|---|
| Total | 16 | 34 | 25 | 25 | 36 | 28 | 25 |
| In components/resolvers | 3 | 3 | 3 | 3 | 13 | 11 | 3 |

- `@typescript-eslint/no-unnecessary-condition` and `no-unused-vars` suppressions.
- Suppressions in `query.tsx` for external library constraints.

| | Manual | Claude Max | GPT | Claude AWS | Kimi | MiniMax | Mistral |
|---|---|---|---|---|---|---|---|
| Approach | Monolithic | Organism | Monolithic | Molecule | Molecule | Mono + Atom | Molecule |
| TeamPage.tsx | 867 lines | 389 lines | 1045 lines | 610 lines | 414 lines | 708 lines | 414 lines |
| Separate modals | 0 | 4 | 0 | 1 | 3 | 1 | 1 |
| Separate row component | 0 | 1 | 0 | 1 | 1 | 0 | 1 |
| Custom atoms | 0 | 0 | 0 | 3 | 0 | 0 | 0 |
| Custom hooks | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| Component files changed | 21 | 39 | 8 | 20 | 22 | 26 | 20 |

| | Manual | Claude Max | GPT | Claude AWS | Kimi | MiniMax | Mistral |
|---|---|---|---|---|---|---|---|
| File structure | Single file | Per mutation | Single + tests | 3 files | Per mutation | 2 files | Per mutation |
| Transform layer | Y | Y | Y | Y | Y | Y | Y |
| Mock state | Effect Ref | Effect Ref | mock executors | Effect Ref | Effect Ref | Effect Ref | Effect Ref |
Two distinct patterns emerged:

- `onMutate` → `cancelQueries` → snapshot → `setQueryData` → return context
- `onError` → rollback from context
- `onSettled` → `invalidateQueries`

Canonical React Query optimistic update pattern. Keeps server state as source of truth.
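
The cache-based flow can be modeled without React Query to show what each callback is responsible for. In the sketch below, `MemberCache`, `TEAM_KEY`, and the three functions are hypothetical stand-ins that reuse the callback names; they are not the React Query API, the `cancelQueries` step is omitted, and `onSettled` replaces the cached list with server data as a stand-in for invalidate-and-refetch.

```typescript
// Dependency-free model of the flow; not React Query itself.
interface Member {
  id: number;
  email: string;
}

type MemberCache = Map<string, Member[]>;

interface MutationContext {
  previous: Member[] | undefined;
}

const TEAM_KEY = "team-members"; // hypothetical query key

// onMutate: snapshot the cached list, then write the optimistic entry.
function onMutate(cache: MemberCache, optimistic: Member): MutationContext {
  const previous = cache.get(TEAM_KEY);
  cache.set(TEAM_KEY, [...(previous ?? []), optimistic]);
  return { previous };
}

// onError: roll back to the snapshot captured in onMutate.
function onError(cache: MemberCache, context: MutationContext): void {
  if (context.previous === undefined) {
    cache.delete(TEAM_KEY);
  } else {
    cache.set(TEAM_KEY, context.previous);
  }
}

// onSettled: whatever happened, the server's answer replaces the cached value.
function onSettled(cache: MemberCache, serverMembers: readonly Member[]): void {
  cache.set(TEAM_KEY, [...serverMembers]);
}
```

Because the optimistic entry lives in the same cache every reader uses, the final refetch corrects any divergence. That is the sense in which server state stays the source of truth.
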

- `setMembers(prev => [...prev, optimisticMember])`
- `mutation.mutateAsync()`
- `onSuccess` → replace temp member
- `onError` → restore from closure

Uses useState as source of truth. Simpler but bypasses React Query's cache invalidation, risking stale data.
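
For comparison, the local-state flow can be sketched as plain functions, with a state setter standing in for the `useState` updater. `startInvite`, `SetMembers`, the `pending` flag, and the temporary-id scheme are hypothetical and not taken from any of the implementations.

```typescript
// Hypothetical names throughout; `SetMembers` mimics a useState updater.
interface Member {
  id: string;
  email: string;
  pending: boolean;
}

type SetMembers = (update: (prev: Member[]) => Member[]) => void;

function startInvite(current: Member[], setMembers: SetMembers, email: string) {
  const snapshot = current; // captured by the closures below
  const tempId = `temp-${email}`;
  // Show the new member immediately under a temporary id.
  setMembers((prev) => [...prev, { id: tempId, email, pending: true }]);
  return {
    // Success: swap the temporary entry for the server's version.
    onSuccess(saved: Member): void {
      setMembers((prev) => prev.map((m) => (m.id === tempId ? saved : m)));
    },
    // Failure: restore the list as it was when the invite started.
    onError(): void {
      setMembers(() => snapshot);
    },
  };
}
```

The rollback restores whatever the list looked like when the invite started, so any change that arrived in between is lost. Nothing in this flow refetches from the server either, which is where the stale-data risk comes from.
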

- Placeholder IDs (`id: '-'`) and loading indicators on optimistic items.
- Claude Max: a `useOptimisticMutation` hook with generic types.

The codebase provides generated types from OpenAPI codegen (`types.gen.ts`, `effect-schema.gen.ts`, `effect-service-interface.gen.ts`) and GraphQL codegen (`graphql.ts`). Idiomatic usage infers types from these sources rather than manually re-declaring them.

| Aspect | Manual | Claude Max | GPT | Claude AWS | Kimi | MiniMax | Mistral |
|---|---|---|---|---|---|---|---|
| Resolver type inference | S.Schema.Type | S.Schema.Type | S.Schema.Type | S.Schema.Type | S.Schema.Type | S.Schema.Type | S.Schema.Type |
| Component types from codegen | Y | Partial | N | Partial | N | N | N |
| Manual type/interface in UI | 0 | 1 | 2 | 1 | 1 | 0 (inline) | 2 |
| Role type source | Imported | Transform fn | Manual Record | Manual union | Manual union | Raw strings | Manual union |
| Inline object return types | – | – | – | – | – | – | Y |
| Type/schema drift risk | None | Low | High | Medium | High | High | High |
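
As an illustration of how the drift risk in the table above could be reduced, UI types can be derived from generated ones with TypeScript's utility types instead of being re-declared. In the sketch below, `GeneratedUser` and `GeneratedRole` are stand-ins whose shapes are assumptions (the real generated files are not shown in this report), and `ElevatedRole`, `ROLE_LABELS`, `toTeamMember`, and `isElevated` are hypothetical.

```typescript
// Stand-ins for generated code; the real shapes are assumptions.
type GeneratedRole = "Admin" | "KeyCreator" | "ReadOnly";

interface GeneratedUser {
  id: number;
  email: string;
  role: GeneratedRole;
  createdAt: string;
}

// Derived instead of re-declared: a rename or retype upstream breaks the build here.
type TeamMember = Pick<GeneratedUser, "id" | "email" | "role">;
type InviteInput = Omit<GeneratedUser, "id" | "createdAt">;
type ElevatedRole = Extract<GeneratedRole, "Admin" | "KeyCreator">; // hypothetical grouping

// Record<GeneratedRole, ...> forces exactly one label per role, so a role
// added upstream is a compile error until it gets a label.
const ROLE_LABELS: Record<GeneratedRole, string> = {
  Admin: "Admin",
  KeyCreator: "Key creator",
  ReadOnly: "Read only",
};

function toTeamMember(user: GeneratedUser): TeamMember {
  const { id, email, role } = user;
  return { id, email, role };
}

// ReturnType keeps a row type in sync with the transform's actual output.
type MemberRow = ReturnType<typeof toTeamMember>;

function isElevated(role: GeneratedRole): role is ElevatedRole {
  return role === "Admin" || role === "KeyCreator";
}
```

With this setup, a manually written `'ADMIN' | 'KEY_CREATOR' | 'READ_ONLY'` union has no place to live: role values flow from the generated type, so a casing mismatch becomes a type error rather than a runtime bug.
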

- Manual: resolvers infer types via `S.Schema.Type<typeof Schema>`, with return types imported from `graphql.ts`. Components import `Role` and `User` directly from generated code.
- Claude Max: declares a `TeamMember` interface in the component layer that duplicates fields from the generated `User` type. Low drift risk, since the transform layer bridges the gap.
- GPT: declares `MemberRole` and `TeamMember` inside `TeamPage.tsx` with different shapes than the generated types (e.g., `id: string` vs generated `id: number`). Also duplicates role mapping logic already in the transforms.
- Claude AWS: declares `MemberRole` as a union type but then uses `role: string | null` in its component props, undermining the type.
- Kimi: declares `MemberRole` as `'ADMIN' | 'KEY_CREATOR' | 'READ_ONLY'` in UPPER_SNAKE_CASE, which doesn't match the generated schema's `'Admin' | 'KeyCreator' | 'ReadOnly'` casing. A naming mismatch that could cause runtime bugs.
- MiniMax: uses raw role strings, including a `'MEMBER'` default that doesn't exist in the schema.
- Mistral: writes inline object return types in `transforms.ts` (e.g., `): { id: number; email: string; ... }`) instead of importing the generated type. These are not reusable and drift from the source schema.
- None of the implementations use utility types (`Pick`, `Omit`, `Extract`, `ReturnType`) to derive types from generated ones. Types are all either imported directly or re-declared manually.

The manual implementation took 14 hours of net development time. Each AI agent produced a first draft requiring additional human effort (prompting rounds + manual adjustments) to reach parity with the manual baseline.

| | Manual | Claude Max | GPT | Claude AWS | Kimi | MiniMax | Mistral |
|---|---|---|---|---|---|---|---|
| AI cost | – | $4 | $10 | $20 | $7 | $6 | $30 |
Each AI output has specific gaps versus the manual baseline. Estimates assume a senior developer familiar with the stack.

| Gap category | Claude Max | GPT | Claude AWS | Kimi | MiniMax | Mistral |
|---|---|---|---|---|---|---|
| Replace useState forms → RHF + zod | – | 2–3h | 2–3h | 2–3h | 2–3h | 2–3h |
| Optimistic UI → RQ cache pattern | – | 2–3h | – | – | 3–4h | 3–4h |
| Replace hardcoded setTimeout → API | – | – | – | – | 2–3h | 2–3h |
| Fix manual types → codegen imports | 0.5h | 1–2h | 0.5h | 1–2h | 1h | 1–2h |
| Add self-removal prevention | – | – | 1h | 1h | 1h | 1h |
| Add role permission warnings | – | – | 1–2h | 1–2h | 1–2h | 1–2h |
| Implement isCurrentUser (not TODO) | – | – | – | 1h | – | – |
| Remove scope creep / unused deps | 0.5h | – | 1h | – | 0.5h | – |
| Decompose monolith | – | 2–3h | – | – | – | – |
| Clean up escape hatches | 1h | – | – | 1–2h | 1–2h | – |
| Fix test quality issues | – | – | – | – | 1–2h | – |
| Total remediation | 2–3h | 8–11h | 5–8h | 7–10h | 12–17h | 10–14h |

These figures assume a senior developer rate of $100/h (loaded cost). Additional AI prompting costs are estimated at ~50% of the original AI spend per remediation round; the table assumes one round.

| | Manual | Claude Max | GPT | Claude AWS | Kimi | MiniMax | Mistral |
|---|---|---|---|---|---|---|---|
| Developer time | 14h | 2–3h | 8–11h | 5–8h | 7–10h | 12–17h | 10–14h |
| Developer cost @$100/h | $1,400 | $200–300 | $800–1,100 | $500–800 | $700–1,000 | $1,200–1,700 | $1,000–1,400 |
| AI cost (initial) | – | $4 | $10 | $20 | $7 | $6 | $30 |
| AI cost (remediation) | – | ~$2 | ~$5 | ~$10 | ~$4 | ~$3 | ~$15 |
| Total estimated | $1,400 | $206–306 | $815–1,115 | $530–830 | $711–1,011 | $1,209–1,709 | $1,045–1,445 |
| vs Manual | baseline | 78–85% savings | 20–42% savings | 41–62% savings | 28–49% savings | −22% to 14% | −3% to 25% |
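
The totals and savings in the table above follow from two small formulas: total = developer hours × $100 + initial AI cost + remediation AI cost, and savings = 1 − total / $1,400. The sketch below reproduces that arithmetic; `CostRange`, `totalCost`, and `savingsPercent` are names introduced here for illustration.

```typescript
interface CostRange {
  low: number;
  high: number;
}

const HOURLY_RATE = 100; // USD per hour, the loaded rate assumed in this report
const MANUAL_TOTAL = 14 * HOURLY_RATE; // 14h manual baseline = $1,400

// Developer time at the hourly rate plus both AI cost components.
function totalCost(hours: CostRange, aiInitial: number, aiRemediation: number): CostRange {
  const ai = aiInitial + aiRemediation;
  return {
    low: hours.low * HOURLY_RATE + ai,
    high: hours.high * HOURLY_RATE + ai,
  };
}

// Savings versus the manual baseline in whole percent; negative means more expensive.
function savingsPercent(total: CostRange): CostRange {
  const pct = (cost: number): number => Math.round((1 - cost / MANUAL_TOTAL) * 100);
  // The cheapest total yields the highest saving, so the bounds swap.
  return { low: pct(total.high), high: pct(total.low) };
}
```

For Claude Max, `totalCost({ low: 2, high: 3 }, 4, 2)` gives $206–306 and `savingsPercent` gives 78–85%. For MiniMax, 12–17h with $6 + $3 of AI cost gives $1,209–1,709, or −22% to 14%, matching the table.
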

- Hardcoded `setTimeout` delays and local-state optimistic UI require near-complete architectural rework of mutation handling.
- Kimi: `isCurrentUser` is unimplemented (TODO).
- `react-hook-form` + `zod` provides schema-validated input, form state management (dirty, touched, isSubmitting), proper reset behavior, and type-safe form data extraction.
- `package.json`. All fell back to manual `useState` + `onChange` handlers.