Sculpting Code: Collaboration with the genie

2025-12-22

In the last 6 months, Claude Code (or more recently OpenCode) has become a reliable coworker of mine. After going through a couple of iterations, I'm pretty happy with the collaboration workflow that the genie and I have established. Augmented Development finally allows me to implement 30-year-old best practices like test-driven development and constant refactoring, and in the right hands it can be an incredible tool.

From Dopamine Hits to Actual Work#

The last few years have been wild. In seemingly no time we went from "cute, it suggested the right thing" to "I'm not writing code by hand anymore", and my (and, if you are reading this, probably also your) line of work changed drastically.

First there was Copilot and its shadow-text autocompletion. I liked it, but I think it was more about the dopamine rush every time it got something right and less about actual productivity. Also, my brain is trained to use autocompletion as a validation step: if the LSP plugin does not suggest what I expect, there is probably a mistake somewhere. And suddenly it always had something to say! My defect rate went through the roof and I switched it off again.

Then came Cursor with its "next edit suggestion". My senses couldn't deal with that much file and location jumping. That was the moment I decided that I didn't want AI to interfere with my work directly. I am the master, the machine the servant.

I have been a fan of Raycast for a long time, and their AI offering allowed me to experiment a lot. I created chat presets for different areas that would quickly scaffold repetitive code for me. Even one that scaffolds React components based on screen designs. Primitive by today's standards but a mind-blowing time saver back then[1].

Cursor's first agent mode got me really interested. I caught the scent of an LLM working autonomously while I was in meetings. I remember telling a colleague that I would pay double if they just removed that crappy editor. I experimented with Aider, but the ergonomics weren't quite there. Anthropic apparently had nothing better to do than spy on me, because shortly after they delivered in the form of Claude Code, which I have been using extensively since its release.
Only recently have I started switching to OpenCode. Not because I'm unhappy with Claude, but because I have put a lot of work into creating subagents and prompts, and I try to live by the "no vendor lock-in if it pays the bills" standard.

The Workflow That Actually Ships#

Setup#

So how do I get productive beyond slot-machine coding? Step one has nothing to do with AI. I set up a really tight development workflow that deterministically covers as much ground as possible: type checkers, linters, unit tests and end-to-end tests for all the different technologies I work with. Currently that means PHPUnit, PHPCS, PHPCBF, Vitest, TypeScript, ESLint, Prettier, pytest, black, Playwright and GraphQL for the language boundaries. Then I create subagents for each space that have the right tools and documentation in their context (a sketch of one follows below the list). For example:

  • drupal-developer: PHP, Drupal, GraphQL schemas
  • react-developer: TypeScript, React, ESLint, Storybook, TanStack, Shadcn
  • effect-export: effect.ts's llm.txt
  • e2e-test-engineer: TypeScript, Playwright
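
In Claude Code, such a subagent is just a markdown file with a bit of frontmatter, stored under .claude/agents/. A minimal sketch of what one of these definitions could look like (the prompt body and tool list here are illustrative; my real ones carry a lot more project documentation):

```markdown
---
name: react-developer
description: Implements and refactors React components and frontend logic
tools: Read, Edit, Bash
---

You are the frontend developer on this project.
Stack: TypeScript, React, TanStack, Storybook, Shadcn.
Before reporting back, run `npx eslint --fix` and `npx tsc --noEmit`
on every file you touched.
```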

Based on that I set up two fundamental scripts:

  • precommit: Fixes any autofixable issues and then runs all "quick" tests that do not require a full system (see the sketch below). If the script does not pass, the output is automatically passed to the agent, which immediately starts fixing.
  • test: Runs the full (long running) test suite, including all integration tests. Mostly used in CI runners.
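
The scripts themselves are not fancy. A stripped-down sketch of what precommit can look like for a stack like mine (tool paths, flags and test groups are illustrative and vary per project):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Autofix everything that can be fixed deterministically.
vendor/bin/phpcbf || true   # phpcbf exits non-zero when it applied fixes
npx prettier --write .
npx eslint --fix .
black .

# Quick checks that do not require a running system.
vendor/bin/phpcs
npx tsc --noEmit
npx vitest run
pytest -m "not slow"                            # assumes slow tests are marked
vendor/bin/phpunit --exclude-group integration  # assumes a group for slow tests
```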

Planning#

So much for the preparation. Now the actual work starts with a /plan-issue command I have set up in my workspace. It takes a GitHub issue number as an argument and passes it to the planning agent, which reads the issue's content and starts planning and distributing work to the subagents. The plan it comes up with is iterated on until I feel it has enough detail and enough decisions locked in for the agent to make the most of it. Then the development process kicks off and I switch to another task, since this is going to take some time. A desktop notification pings me when the agent is done or needs my feedback.
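
In Claude Code, such a command is again just a prompt file, in this case .claude/commands/plan-issue.md, with $ARGUMENTS receiving the issue number. A condensed, illustrative sketch:

```markdown
Fetch GitHub issue #$ARGUMENTS with `gh issue view $ARGUMENTS`.

1. Summarize the requirements in your own words.
2. Draft an implementation plan and split it into tasks for the
   matching subagents (drupal-developer, react-developer, ...).
3. List all open questions and decisions you need from me.

Do not write any code until I have approved the plan.
```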

Exploratory testing#

When the agent thinks it is done, I start some exploratory testing. In this phase, I do not look at code but walk through the application and just manually test it. I collect all my feedback in a file, send the genie off for another round, and we repeat that until I'm satisfied with the result from a user perspective. Depending on the project and team I might hand this off to other humans for review as well.

Automated testing#

When exploratory testing is done, I focus on the automated test cases. I essentially look at the output of the different test runners and do a sanity check based on the test descriptions. I focus on missing, redundant or unnecessary test cases, but not really on the test implementation at that point. Only when the agent fails to get those tests to green do I step in and assist. Once everything is eventually green, the automated testing is concluded.

Review#

At this point I have a fully implemented feature with test coverage. A great starting point for a ruthless review and for refactoring like there's no tomorrow (or like there is one, since that is the point of refactoring). I walk through each changed line and leave inline comments for every change request. I commit all of them and start a dedicated /address-review agent command that reads the last review diff from git history, turns it into a todo list of issues to be addressed, and distributes the work to the appropriate subagents. Once again, this process repeats until I'm happy with the code quality. As a final step, I run the precommit command, which reliably eliminates all formatting, linting and trivial typing issues.
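
Since the review comments live in a plain commit, the prompt behind /address-review does not need to be clever either. Another condensed, illustrative sketch:

```markdown
Run `git show HEAD` to read the last commit, which contains only my
inline review comments.

1. Turn every review comment into a todo item.
2. Distribute the todos to the appropriate subagents.
3. Remove each review comment once the change is implemented.
4. Run the precommit script before reporting back.
```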

Handoff#

At the end of this process I have a well-tested and working implementation of my initial issue. It is really hard to tell how much time it saves me, because the process is an ebb and flow of interaction with the agent. But it plays well into my life as a staff engineer with a lot of meetings and a home-office father with a lot of small distractions. And even under these circumstances, it allows me to deliver better software than ever, for two unexpected reasons:

  1. No premature optimization: Since I'm not even looking at code before I have a working product, I'm much less prone to over-engineering and spending time optimizing something that might get thrown away. Which is directly connected to the second reason ...
  2. Fearless trashing: When I hit a dead end, I am much more likely to scrap the result and start again. I have a hard time telling a coworker to go back to the start, and when I think up a nice solution to a problem I also tend to get very attached to it. But I have no reservations about sending the random number generator back to the start.

Sculpting, Not Building#

Software engineering has changed dramatically. Instead of working from the ground up, it has become a top-down process. The real work starts with a finished product. To me it even feels like sculpting instead of building. As Michelangelo put it:

"All I did was chip away everything that didn't look like the figure"

Generating a lot of code is not the issue; the hard part now is finding the correct form within all the possible outcomes.

This also means that we will need new tools. Our editors and IDEs are optimized for writing code, not reviewing it. The review process has been an afterthought, tacked onto our version control provider. But reviewing has become the main part of the work, and it is not necessarily a shared, collaborative process any more, which changes the constraints.

  • Reviewing my agent's code can remain local. It's faster and does not create noise for everyone.
  • I want the full context while reviewing. Quick access to type definitions, referenced code and the ability to execute unit tests with a keypress.
  • I do not want my reviews to be locked into my VCS provider. They are the most important step now.

To solve this for myself, I came up with a little Neovim plugin called review.nvim! The concept is simple:

  1. After the agent work is finished, I commit all of it and hit <leader>rs to start a review. This will create a .review file at the root of the repository with a list of all files that have been changed (see the sketch after this list). From within the file I can use gd (the standard "go to definition" command) to jump to the change.
  2. I review the changed file and leave my review as code comments.
  3. I type <leader>rc to mark the file as reviewed.
  4. <leader>ru brings up a quick picker of all unreviewed files, and I can quickly repeat the process.
  5. When the list is empty, I commit my review and kick off the agent with instructions to fix all comments in the last commit.
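
The .review file itself is deliberately unspectacular: conceptually it is just the list of changed file paths, one per line, something like this (the paths are made up, and the exact on-disk format is whatever the plugin writes):

```
src/components/ProductTeaser.tsx
src/hooks/useCheckout.ts
tests/e2e/checkout.spec.ts
```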

I use this mainly for working with my coding agent right now, but there is no reason this would not work for reviews between humans. And writing this down has already given me a lot of ideas for how it could be improved!

Future#

The GitHubs, Cursors, Anthropics and CodeRabbits out there are trying to convince us that the future is in pure cloud development. There is a ticket with the requirements somewhere, and we just set off an agent that does the implementation, another one does the review, and we review the review. I did try that, but for several reasons it did not quite click.

The most important one is cost, because I can't use my Claude Code "Max" plan, which is a steal compared to the raw token consumption. That will probably change in the future, most likely because they simply can't afford to swallow all those inference costs forever, but at that point I might have a small gamer rig[2] that does the same with the power generated by the solar panels on my roof.

The second reason is iteration speed. We are still far away from agents reliably "one-shotting" problems, and there is a lot of back and forth during planning and exploratory testing that just works better when everything runs locally. Although Cursor bought Graphite for a reason, and they will probably tackle that problem in a way that makes them money.

But that brings me straight to my last reason, and that one is not very likely to change. 25 years ago I was a "Macromedia Flash Expert", and we all know how that played out. I do not want the most important part of my work to be tied to and controlled by a single vendor. And that's the hill I'm going to die on. 🫡