One thing I wish more AI teams understood: your evaluation dataset IS your product spec.

If your evals don't test for it, you're not building for it. Make your test cases as intentional as your feature list.