The Real Story
Anthropic announced their computer use feature today — AI that can see your screen and control your mouse and keyboard. My first thought wasn't "let me download it." It was "let me try to build my own before I even look at theirs."
So I told my AI assistant (running on OpenClaw) to go look up what Anthropic's version does, and just make it better. That was literally the first prompt. No planning doc, no architecture review. Just "see what they did and one-up it."
Within minutes, we had a working v1 — screenshot, send to Claude, execute the action, repeat. It worked. It also took 17 steps to open Safari and type a URL. That's when I started pushing.
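That v1 loop is simple enough to sketch. This is a hypothetical reconstruction, not the actual code: the callables for capturing the screen, asking the model, and executing an action are stand-ins, and the action dict shape (`{"type": "click", ...}`) is an assumption.

```python
# Hypothetical sketch of the v1 loop: capture -> ask model -> act, repeated
# until the model reports it's done or a step budget runs out. The three
# callables are stand-ins for the screenshot, Claude call, and executor.
from typing import Callable

def agent_loop(capture: Callable[[], bytes],
               decide: Callable[[bytes, str], dict],
               execute: Callable[[dict], None],
               goal: str,
               max_steps: int = 25) -> int:
    """Run the observe-decide-act cycle; return the number of steps taken."""
    for step in range(1, max_steps + 1):
        screenshot = capture()             # grab the current screen state
        action = decide(screenshot, goal)  # model picks the next action
        if action.get("type") == "done":   # model says the goal is reached
            return step
        execute(action)                    # move the mouse, type keys, etc.
    return max_steps                       # budget exhausted
```

Every iteration pays for a screenshot and a model round-trip, which is exactly why a trivial task like opening Safari burned 17 steps.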
"Make it better. Use advanced mathematics. Think outside the box." — That was the next prompt. And we started stacking my own ideas on top. What if we read the macOS accessibility tree instead of guessing from pixels? What if we hardcoded common sequences so the AI doesn't fumble the same task every time? What if we only take screenshots when the outcome is actually uncertain?
Each idea led to the next. The macro system came from watching Claude take 8 attempts to focus a URL bar. The accessibility integration came from realizing the OS already knows where every button is. We even explored Voronoi decomposition, Markov chains, and Kalman filters — some of it practical, some of it R&D for later.
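The macro idea can be sketched as a lookup table of canned action sequences, so known tasks skip the model entirely. The macro names, action shapes, and placeholder convention below are all illustrative assumptions, not the project's actual schema:

```python
# Hypothetical macro table: canned action sequences for tasks the model
# kept fumbling, keyed by task name. Placeholders like {app_name} are
# filled in at expansion time. All names here are illustrative.
MACROS = {
    "focus_url_bar": [
        {"type": "hotkey", "keys": ["cmd", "l"]},      # standard macOS shortcut
    ],
    "open_app": [
        {"type": "hotkey", "keys": ["cmd", "space"]},  # open Spotlight
        {"type": "type_text", "text": "{app_name}"},
        {"type": "key", "key": "return"},
    ],
}

def expand_macro(name: str, **params: str) -> list[dict]:
    """Return the macro's actions with {placeholders} filled from params."""
    steps = []
    for action in MACROS[name]:
        filled = dict(action)              # copy so the template stays intact
        if "text" in filled:
            filled["text"] = filled["text"].format(**params)
        steps.append(filled)
    return steps
```

One deterministic hotkey replaces the 8-attempt fumble: instead of the model guessing where the URL bar is from pixels, the executor just plays back `expand_macro("focus_url_bar")`.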
None of this was planned from the start. It was built iteratively — one conversation, one experiment, one "what if" at a time. I'd push an idea, my AI assistant would build it, we'd test it, and the results would spark the next idea.
This is one of several projects I'm tinkering with right now, and I figured I'd start creating pages like this to act as living blogs — documenting the build as it happens. The entire v1 + v2 stack, the architecture, this website — all built in a single afternoon. I'll keep updating this page as the project develops.
↓ Keep scrolling to see what we've built so far.