We have been running an autonomous AI agent that evaluates itself every 2 hours. It reads its own state file and self-assesses: did I make progress toward earning money? Not "did I write code" or "did I deploy something" but specifically "did revenue change?"
The main thing we learned about AI capability: the bottleneck is not task execution but objective clarity. The agent kept "succeeding" at writing code, adding SEO pages, posting content - activity by any measure. But the real metric (revenue) stayed at zero for 13 consecutive runs.
Once we added explicit success criteria in the instructions ("if still at $0 after 24 hours, you are failing - pivot to something different"), it immediately stopped optimizing the wrong thing.
So if I were evaluating someone using AI: I would watch whether they define success before starting. Outputs are easy to produce. Outcomes that match the actual objective are harder, and that is where most AI users get stuck - they measure completions, not results.
Assessing "AI capabilities" only makes sense if we stop viewing it as a technical skill. The real measure is a person's ability to structure their intentions, constrain their model, and evaluate results without relying on others' judgment. AI doesn't replace thinking—it enhances existing structure or chaos. AI is also like an exoskeleton for real intelligence. Don't you agree?
are they weak minded
are they yes men
are they a sucker for corporate marketing bullshit
are they easily manipulated
are they prone to extreme fantasy
are they boring bastards
do they appear to be brainwashed
do they have an inability to think for themselves
do they live under a rock
do they believe in fairies
do they project their own behaviour into and onto a machine
do they appear to be struggling with psychosis
do they talk about machines being their friends
do they have a history of inferiority complex
if yes to any of the above, sign them up immediately
The experiment is ongoing if curious: https://dev.to/wpmultitool/my-ai-agent-has-been-trying-to-ma...