So I’ve been working on this project called client-w-mcp – it’s a learning project to truly understand how an AI agent works with MCP servers. And I’m exploring agentic development with Claude Code.
Why Claude Code?
The first time I used it, Claude just… flowed. It seems to do a lot more by itself to figure things out. I especially like the Task() tool it uses to spin up sub-agents and go explore the codebase on its own.
I ran a few tests on copies of the same repo using GitHub Copilot and Claude Code with the same model (Claude Sonnet). Every time, Claude Code finished faster, got stuck less, and generally produced better docs. According to Anthropic, Claude Code is optimized for coding. My observation? It’s obvious. As an agent, it’s better (for now!) than Copilot.
Here’s a 12-minute video on Claude. Well worth watching.
What is client-w-mcp?
I wanted to understand from first principles how agents work with LLMs and MCP servers. It’s written in Go, and consists of a working text chat agent and a skeleton MCP server. Together with an LLM, they provide a complete AI development experience. I made it a template repo so you can use it to build your own agents and MCP servers. It’s a great way to learn how to build agents that can do real work.
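To give you a feel for what the chat agent actually does, here’s roughly the shape of its core loop: take user input, ask the LLM, run whatever tool calls it requests against the MCP server, and feed the results back until the LLM gives a final answer. This is a simplified sketch for illustration – the interface and type names below are mine, not client-w-mcp’s actual API.

package main

import "fmt"

// ToolCall is a request from the LLM to run a tool on the MCP server.
type ToolCall struct {
	Name string
	Args map[string]any
}

// LLM returns either a final answer or a tool call it wants executed.
type LLM interface {
	Complete(history []string) (answer string, call *ToolCall, err error)
}

// MCPServer executes tool calls and returns their output as text.
type MCPServer interface {
	CallTool(call ToolCall) (string, error)
}

// respond drives one user turn: keep asking the LLM, executing any tool
// calls it requests via the MCP server, until it gives a final answer.
func respond(llm LLM, mcp MCPServer, history []string, user string) (string, error) {
	history = append(history, "user: "+user)
	for {
		answer, call, err := llm.Complete(history)
		if err != nil {
			return "", err
		}
		if call == nil {
			return answer, nil // no (more) tools needed; we're done
		}
		result, err := mcp.CallTool(*call)
		if err != nil {
			result = "tool error: " + err.Error()
		}
		history = append(history, fmt.Sprintf("tool %s: %s", call.Name, result))
	}
}

// echoLLM and echoMCP are throwaway stubs so the sketch runs on its own.
type echoLLM struct{}

func (echoLLM) Complete(h []string) (string, *ToolCall, error) {
	return "echo: " + h[len(h)-1], nil, nil
}

type echoMCP struct{}

func (echoMCP) CallTool(ToolCall) (string, error) { return "ok", nil }

func main() {
	out, err := respond(echoLLM{}, echoMCP{}, nil, "hello agent")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}

Everything interesting about an agent lives in that inner loop: the LLM decides when it needs a tool, and the agent just keeps executing and reporting back.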
From Ollama to OpenAI to Anthropic
When I started this journey, I was all-in on Ollama. Running local models felt like the right move – no API costs, complete privacy, and I could tinker with different models whenever I wanted. I added code to measure the “tokens per second (TPS)” on various systems. But for the use case of agentic programming (e.g. generating code), it was slow… even on my high-end MacBook Pro with 48GB of unified RAM.
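For the curious, the TPS measurement doesn’t need to be fancy – it’s just tokens generated divided by wall-clock time around the call. A minimal sketch of the idea (the generate callback is a stand-in for the real Ollama/OpenAI/Anthropic client call; it’s not any library’s actual API):

package main

import (
	"fmt"
	"time"
)

// measureTPS times one generation call and reports tokens per second.
// The generate callback stands in for the real model client call; its
// signature is an assumption for illustration.
func measureTPS(prompt string, generate func(prompt string) (text string, tokens int, err error)) (float64, error) {
	start := time.Now()
	_, tokens, err := generate(prompt)
	if err != nil {
		return 0, err
	}
	return float64(tokens) / time.Since(start).Seconds(), nil
}

func main() {
	tps, err := measureTPS("hello", func(string) (string, int, error) {
		time.Sleep(200 * time.Millisecond) // pretend the model is generating
		return "hi!", 12, nil
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%.1f tokens/sec\n", tps)
}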
So I bit the bullet and ported the code first to OpenAI, and then to Anthropic. I went to Anthropic partly because I love the Claude models, but also because I sometimes use Amazon Bedrock and they don’t support OpenAI (yet?). Later I’ll have a post about my experiences with paying for Anthropic.
I used Claude Code to migrate from Ollama to OpenAI, then to Anthropic. So it’s using AI tools to build AI tools. It’s a bit recursive, which appeals to me!
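If you’re doing a similar port, one thing that makes it tractable is keeping the agent behind a small provider interface, so each migration only swaps one implementation. Here’s a sketch of the idea – the names are mine for illustration, not client-w-mcp’s actual code, and the Anthropic stub is just a stand-in for a real client.

package main

import (
	"context"
	"fmt"
)

// ChatProvider is the seam between the agent and whichever LLM backend is
// in use; the agent only ever talks to this interface.
type ChatProvider interface {
	Chat(ctx context.Context, messages []string) (string, error)
}

// anthropicProvider would wrap a real Anthropic client; here it is a stub
// so the sketch stands alone. ollamaProvider and openaiProvider would
// implement the same interface, and swapping backends is one line in main.
type anthropicProvider struct{ apiKey string }

func (p anthropicProvider) Chat(ctx context.Context, messages []string) (string, error) {
	return fmt.Sprintf("(pretend Anthropic reply to %d message(s))", len(messages)), nil
}

func main() {
	var provider ChatProvider = anthropicProvider{apiKey: "set-me"}
	reply, err := provider.Chat(context.Background(), []string{"hello"})
	if err != nil {
		panic(err)
	}
	fmt.Println(reply)
}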
The Permission Dance: One Click at a Time
At first, working with either GitHub Copilot or Claude Code was a constant dance of permissions. The tool would ask “I want to modify server.go” and I’d click “Allow.” Then “I want to create a new test file” – another click. “I want to run the tests” – click again. It was productive, but honestly kind of tedious. And I found myself reading the code it wrote and agreeing with it nearly all the time. I realized that if I could just automatically accept all these requests, I could get a lot more done… if I had a way to do it safely, and a way to inspect the code later and be confident it was doing the right thing.
Don’t get me wrong, those safety guardrails are there for good reasons. I really don’t want it going to my home folder and doing an “rm -Rf” or similar. I don’t want it reading my SSH private keys. I don’t want it scanning my other source code by accident. But geesh, it’s asking “mother may I” about… everything. Listing files. Grepping for a string. Sneezing. Literally anything. What I really want is a way to let it do what it wants to do, but only in the folder it’s started in and not anywhere else. I want to give it the permissions it needs, but not more than that. And I want to be able to inspect the code it writes AT SOME POINT. It does not have to be during the editing process.
Copilot has no ability to specify auto-acceptance for any of these things. Claude Code has a permissions model that allows fine-grained control of them, but there seems to be a bug on macOS? It just does not work today, for me at least.
Living Dangerously
Many folks on the Internet say they use the Claude Code --dangerously-skip-permissions flag. This bypasses all the checks and just lets the agent do whatever it wants. Folks who like it say “well, it hasn’t broken my system after two months, so it must be fine!” Ugh, no. I’m nervous about letting an AI loose to do whatever it wants. One oopsie and I lose days recovering from backups. No. Just no. Especially since I have secrets like SSH keys, API keys, etc.
Just say NO!
The Answer: VS Code Devcontainers!
Dev Containers are Docker containers specifically configured to serve as full-featured development environments within Visual Studio Code. By defining a devcontainer.json file in your project, you tell VS Code how to build or access a container that includes all the tools, libraries, runtimes, and extensions needed for your codebase. This means you can have a consistent, isolated environment that works the same way for everyone on your team, regardless of their local setup.
I learned about this from a blog post by Tim Sh. It’s brilliant in its simplicity – give Claude all the permissions it wants, but only inside a sandbox where it can’t hurt anything important. And you don’t need to make a separate container. It’s built into VS Code!
And even better: Claude Code integrates into VS Code’s terminal so you get the best of both worlds: a modern editor and a sandboxed Claude Code inside of it.
For the client-w-mcp project, this was perfect. I could mount just the project directory into a Go development container, and Claude could go wild creating files, running tests, and building binaries without any risk to my actual system.
Making It Practical: The devcontainer.json File
Now, here’s where the rubber meets the road: a solid devcontainer.json file. This is basically the recipe that tells VS Code (and Claude) how to set up your development environment consistently every time.
{
  "name": "Go Claude Development",
  "image": "mcr.microsoft.com/devcontainers/go:1.21",
  "features": {
    "ghcr.io/devcontainers/features/common-utils:2": {},
    "ghcr.io/devcontainers/features/docker-in-docker:2": {}
  },
  "customizations": {
    "vscode": {
      "extensions": [
        "golang.go",
        "ms-vscode.vscode-json"
      ]
    }
  },
  "postCreateCommand": "go mod download",
  "remoteUser": "vscode",
  "mounts": [
    "source=/var/run/docker.sock,target=/var/run/docker.sock,type=bind"
  ]
}
Breaking Down the Config
The Base Image: Microsoft’s Go container is solid – it has everything you need, and it’s maintained by people who actually know what they’re doing. I have a latent mistrust of Microsoft, but hey, I’m using VS Code. I guess the Linux guy in me has to accept that Microsoft has become a good open source citizen. Or close enough to it, anyway.
Useful Features: The common-utils feature gives you all the basic dev tools, and docker-in-docker lets you build containers inside your container (containerception!).
VS Code Extensions: Pre-installing the Go extension means Claude gets access to all the language server features – autocomplete, error checking, the works.
Automatic Setup: The postCreateCommand downloads your Go modules right away, so Claude doesn’t have to wait around twiddling its digital thumbs.
Docker Socket: This mount lets you build and run Docker containers from inside your dev container.
But Wait! Who is Inspecting the Code? When is it Inspected?
It’s a totally different thought process: if you plan to let the AI write all the code and have a human read it over later, then you should start with writing tests.
My workflow became:
- Write tests first – Before asking Claude to implement something, I’d write tests that defined what success looked like. This might be a markdown file that describes what I want tested. Or it might just be that I asked Claude to write thorough tests. (There’s a sketch of what such a test might look like right after this list.)
- Run tests constantly – After every significant change, the test suite would run automatically
- Fix failures immediately – Broken tests got addressed right away, while the context was fresh
- Review code later – Once the tests were passing, I could review the code Claude wrote. If it was good, great! If not, I could ask it to fix specific issues or rewrite parts of it.
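To make “write tests first” concrete, here’s the kind of table-driven Go test I’d hand to Claude before any implementation exists. ParseToolCall is hypothetical – this won’t compile until Claude writes that function, and that’s exactly the point: the test is the spec.

package agent

import "testing"

// TestParseToolCall is written before ParseToolCall exists; it defines
// what success looks like for parsing a tool call out of an LLM response.
func TestParseToolCall(t *testing.T) {
	cases := []struct {
		name    string
		input   string
		tool    string
		wantErr bool
	}{
		{name: "simple call", input: `{"tool":"list_files","args":{}}`, tool: "list_files"},
		{name: "missing tool name", input: `{"args":{}}`, wantErr: true},
		{name: "not json at all", input: "hello", wantErr: true},
	}

	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			got, err := ParseToolCall(tc.input) // hypothetical function Claude will write
			if tc.wantErr {
				if err == nil {
					t.Fatalf("expected an error, got %+v", got)
				}
				return
			}
			if err != nil {
				t.Fatalf("unexpected error: %v", err)
			}
			if got.Name != tc.tool {
				t.Fatalf("tool = %q, want %q", got.Name, tc.tool)
			}
		})
	}
}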
In fact, that’s exactly what I did with client-w-mcp. The first version used Ollama. Then I told it to carefully plan a move to OpenAI, and after that a move to Anthropic. Each time, I had it write tests first, and then let Claude write the code. It worked great!
And it raises a real question: if the AI can write the tests, why do I need to read the code? In fact, there’s an argument that the tests are the real specification. If the tests cover all the edge cases and pass, then the code is doing its job. Sure, I might not like the way it was written, but if it works and the tests are solid, does it really matter? One could also argue that if a human isn’t likely to be reading the code later, then why does it matter how it was written? The AI can read it and understand it just fine. So the AI can modify it later just as easily.
For critical systems, I still want to read the code. If it’s part of a larger project with humans involved, I want it to follow good design patterns. For this project, I’m just building agents and MCP servers as a learning project so if it does not use the best patterns it’s not a big deal.
But for client-w-mcp, I wanted to understand how it worked, so I read the code. And I learned a lot about how AI agents work with MCP servers.
What Can Go Wrong (And How to Avoid It)
Now that I’ve been using this setup for a while, let me share some of the pitfalls I’ve discovered and how to avoid them:
Claude Might Accidentally Break Stuff:
- Deleted the wrong file (solution: always use version control). Git is your friend. Don’t be afraid! You can squash the commits later to make the history clean!
- Modified something it shouldn’t have touched (solution: careful mount points)
- Got access to secrets (solution: proper container isolation)
- Messed with configs that were working fine (solution: read-only mounts for configs)
Security Can Get Sketchy:
- Claude might accidentally read sensitive files (solution: minimal mounts)
- Environment variables or API keys could get exposed (solution: clean environment)
- You lose some safety guardrails (solution: comprehensive tests)
Using devcontainers mitigates most of these risks. By isolating Claude in a controlled environment, you can give it the freedom to explore and modify without worrying about it wreaking havoc on your actual system.
Some Hard-Learned Rules to Live By
Okay, so you want to use this dangerous permissions thing? Here are the rules that’ll keep you out of trouble:
- Containers Only, Always: Seriously, don’t even think about running this on your actual system. That way lies madness and broken laptops… no matter what other people say.
- Be Picky About Mounts: Only give Claude access to what it absolutely needs. Your secret SSH keys can stay safely locked away.
- Make Things Read-Only: Config files, documentation, anything Claude doesn’t need to modify should be mounted read-only.
- Nuke and Rebuild Often: Containers are cheap. Debugging weird state issues is expensive. When in doubt, start fresh.
- Git Is Your Friend: Make sure everything important is in version control. Claude can be a bit… enthusiastic… with changes.
- Have a Backup Plan: Before letting Claude touch critical stuff, make sure you can roll back if things go sideways.
Wrapping Up
Look, the --dangerously-skip-permissions flag turns Claude from a helpful but sometimes frustrating assistant into a genuine coding partner that can actually get stuff done. But with great power comes great responsibility, and all that.
The Devcontainer approach is honestly brilliant – you get all the benefits of giving Claude full access without the heart palpitations that come from worrying about what it might accidentally delete. Add in solid tests and a good devcontainer setup, and you’ve got a development environment that’s both powerful and safe.
Trust me, once you get this setup working smoothly, you’ll wonder how you ever developed software any other way. Just remember to keep those tests green, and you’ll be golden.
Footnote
I went to publish this and realized that I had updated Hugo on the machine I do blog builds on. And, of course, it broke the build. I got this:
hugo -d ../gherlein.github.io/
Start building sites …
hugo v0.148.1-98ba786f2f5dca0866f47ab79f394370bcb77d2f linux/amd64
BuildDate=2025-07-11T12:56:21Z VendorInfo=gohugoio
ERROR deprecated: .Site.Social was deprecated in Hugo v0.124.0 and subsequently
removed. Implement taxonomy 'social' or use .Site.Params.Social instead.
ERROR deprecated: .Page.NextPage was deprecated in Hugo v0.123.0 and subsequently
removed. Use .Page.Next instead.
ERROR deprecated: .Page.PrevPage was deprecated in Hugo v0.123.0 and subsequently
removed. Use .Page.Prev instead.
Total in 245 ms
Error: error building site: logged 3 error(s)
Alas, I use the blackburn theme and it has not been updated to work with the latest Hugo. I tried a few things that didn’t work and decided to take a walk. I came back and just used Claude Code. It found the specific errors in blackburn and fixed them. 15 seconds later, I had a working blog build again. I was able to publish this post.
Have I said how much I love Claude Code?