Testing on Jack Monte — AI Engineer

https://jackmonte.com/tags/testing/

A Green Eval Was Lying to Me
https://jackmonte.com/posts/a-green-eval-was-lying/
Wed, 10 Jun 2026 00:00:00 +0000

Adding strict typing to my eval harness exposed an intermittent judge failure. Fixing that exposed a second bug a passing, zero-error evaluation had been hiding the whole time. A green eval is not a correct eval.