AI coding assistants powered by large language models (LLMs) now feel like junior pair programmers rather than autocomplete on steroids. After months of “vibe coding” with six popular tools, I’m sold on the concept, but every one of them still demands the patience you reserve for a bright but distractible intern. Here’s what each assistant gets right, where it falls short, and why I’ve started building the tool I really want.
ChatGPT: The generalist that runs out of room
OpenAI’s ChatGPT is where most developers start because it understands almost any prompt. In the macOS app, you can even send an open file and get a unified diff back, an upgrade from early cut-and-paste gymnastics. But the moment your change spans several files or touches a language ChatGPT can’t execute, you’re back to copy-pasting or juggling a detached “canvas” window. Long code blocks sometimes stall, and multi-turn refactors often hit token limits. There are now local plugins and extensions that are supposed to ease this, but I’ve not had much luck yet.
Strengths: Broad model quality, single-file diffs.
Limits: No real project context, execution detached from your own environment, occasional size limits.
GitHub Copilot: Inline speed, narrow field of view
The killer feature of GitHub Copilot is friction-free inline completion inside both Visual Studio and Visual Studio Code. Type a comment, press Tab, get a plausible snippet. Copilot Chat can rewrite multiple files, but the workflow is still tuned for single-file suggestions. Cross-file refactors or deep architectural changes remain awkward. While there is a brand new release I’ve barely tested, I’ve come to the conclusion that GitHub Copilot is where the laggards of AI-assisted development will live, not those doing it day to day.
Strengths: Seamless autocomplete, native IDE support.
Limits: Struggles with cross-file edits, context is mostly whatever’s open.
Cursor: Inline diff done right, at a price
Cursor proves how powerful inline diff review can be. Type a prompt, and it writes code, often crossing dozens of files. Being a VS Code fork, though, it loses built-in C# debugging due to licensing issues. It also enforces hard-coded limits (25 tool calls) you can’t override. Once the conversation grows, latency spikes, and you risk colliding with Cursor’s rate limits. There are frequent outages and slowdowns, sometimes bad enough that I VPN into Germany to finish a task. By the way, I dumped $500 into it this month.
Strengths: Best-in-class diff workflow, improving stability, cheapest way to try Claude 4.
Limits: Closed fork, hard caps, opaque latency.
Windsurf (formerly Codeium): Fast, generous, sometimes chaotic
Windsurf feels like Cursor with a turbocharger. The same OpenAI models return responses two to three times faster, and the free tier is unusually generous. Speed aside, multi-file edits are erratic: Windsurf sometimes wanders into unrelated files even after agreeing to a well-scoped plan. It also thrashes at times, with long runs of repetitive file scans and tool calls. I sandbox those runs on a throwaway branch and cherry-pick what works. I’m not sure I’ll use Windsurf once it’s no longer free. By the way, Anthropic just pulled the rug out from under Windsurf on Claude access.
Strengths: Exceptionally low latency, large free quota, cheapest way to use o4-mini and other OpenAI models.
Limits: Unpredictable edits, roadmap may shift after the OpenAI acquisition.
RooCode: Agentic power, all-or-crawl workflow
RooCode layers an orchestrator (“Boomerang Tasks”) over its Cline legacy, splitting big requests into subtasks you approve one by one. It ships a diff view, but it’s a modal panel, not the inline experience Cursor and Windsurf provide. Roo has only two speeds: Go Fast (hands-off, great for throw-away prototypes) and Crawl (approve every micro-step). There’s no middle-ground “walk” mode, so real-world development feels either too automated or too granular. Config changes don’t always reload without a restart. Roo is also not the tool you want for AI-assisted debugging.
Strengths: Powerful task orchestration, VS Code plugin rather than fork.
Limits: Modal diff view, no balanced workflow speed, sporadic config glitches.
Claude Code: Because I’m too cool for an IDE
If you’re in a Discord channel chatting with the in crowd, or just lurking, they aren’t talking about any of these tools. They’re talking about CLI tools. Claude Code is fun and does a good job of knocking out Python scripts and other things you might want to try. However, as your project gets bigger and debugging becomes a larger share of the work, you’re going to want an IDE. It isn’t that you can’t use one; it’s that at that point, why use a CLI tool for generating and changing code instead of something that integrates into your IDE?
Strengths: The most stable way to use Claude.
Limits: I’m not cool enough to debug everything at the command line like I did in the 1990s.
What today’s AI coding assistants still miss
- Plugin, not fork: Forks break debugger extensions and slow upstream updates.
- Controlled forgetting and task-aware recall: Most tools “forget” to stay under context limits, but the pruning is blind, often chopping off the high-level why that guided the whole session. We need selective, user-editable forgetting (pin critical goals, expire trivia) and smart recall that surfaces the proper slice of history when a sub-task resumes (see the sketch after this list).
- Fine-grained control: Pick any model (local or cloud), set per-model rate limits, decide exactly what gets stored in memory and when.
- Inline diff as table stakes: Line-by-line review is mandatory.
- Stability first: Crashes and silent time-outs erase trust faster than bad suggestions.
- Open source matters again: With open LLMs, you can realistically contribute fixes. Accepted pull requests prove it.
- Deterministic guardrails: The model can be stochastic, but everything around it—config files, rate limits, memory rules—must behave predictably.
- Optimizing debugging: I don’t want to undersell how much more productive I’ve become since I started vibe coding. But the makers keep optimizing generation, which is already pretty good, while I spend a lot of time debugging and fixing my tests, and that part is sometimes slower than doing it myself.
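To make the controlled-forgetting idea concrete, here’s a minimal sketch in Python of what I have in mind. Everything in it, the names, the fields, the pruning policy, is hypothetical rather than a description of how any tool above actually works: entries carry tags for the sub-task they belong to, pinned entries survive every pruning pass, and trivia expires on a TTL.

```python
from dataclasses import dataclass, field
from time import time


@dataclass
class MemoryEntry:
    text: str                        # the remembered snippet (goal, decision, file summary, ...)
    tags: set[str]                   # which sub-tasks this entry is relevant to
    pinned: bool = False             # pinned entries survive every pruning pass
    expires_at: float | None = None  # trivia gets a TTL; None means never expire
    created_at: float = field(default_factory=time)


class SessionMemory:
    """Hypothetical context store with user-editable forgetting and task-aware recall."""

    def __init__(self) -> None:
        self.entries: list[MemoryEntry] = []

    def remember(self, text: str, *, tags: set[str],
                 pinned: bool = False, ttl: float | None = None) -> None:
        expires_at = time() + ttl if ttl is not None else None
        self.entries.append(MemoryEntry(text, tags, pinned, expires_at))

    def prune(self, budget: int) -> None:
        """Drop expired trivia first, then the oldest unpinned entries,
        until the store fits the budget. Pinned goals are never dropped."""
        now = time()
        self.entries = [e for e in self.entries
                        if e.pinned or e.expires_at is None or e.expires_at > now]
        overflow = len(self.entries) - budget
        if overflow > 0:
            unpinned = sorted((e for e in self.entries if not e.pinned),
                              key=lambda e: e.created_at)
            for entry in unpinned[:overflow]:
                self.entries.remove(entry)

    def recall(self, task_tag: str) -> list[str]:
        """Surface pinned goals plus whatever history is tagged for this sub-task."""
        return [e.text for e in self.entries if e.pinned or task_tag in e.tags]


# The high-level "why" stays pinned; a one-off stack trace expires in ten minutes.
memory = SessionMemory()
memory.remember("Goal: migrate the auth module to OAuth2 without breaking CLI login",
                tags={"auth"}, pinned=True)
memory.remember("Flaky stack trace from test run #42", tags={"tests"}, ttl=600)
memory.prune(budget=50)
print(memory.recall("auth"))  # the pinned goal always comes back
```

The point isn’t this exact data structure; it’s that pruning and recall become deterministic rules I can inspect and edit, instead of a black box deciding what the model gets to remember.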
No single assistant ticks all eight boxes. The result is a mountain of custom rules and workarounds just to keep projects moving, which is why I’m writing my own tool.
Bottom line
The first wave of AI coding assistants proves the concept but also shows the cost of letting a black-box model drive your IDE. GitHub Copilot nailed autocomplete; Cursor nailed inline diff; Windsurf nailed latency; RooCode nailed orchestration. Combine those strengths with deterministic guardrails, accurate memory control, and plugin-based freedom, and we’ll have an assistant that starts to feel like a mid-level engineer instead of a gifted intern.
Until then, I’ll keep one hand on git revert and the other on the keyboard, building the assistant I actually want.