test_cases
Test cases for evaluating agent performance.

eval_sets
Collections of test cases for batch evaluation.

eval_runs
Evaluation execution records.

Evaluation Workflow
- Create test_cases with inputs and their expected outputs
- Group test cases into eval_sets for organized testing
- Run eval_runs to evaluate agent performance
- Review results and iterate on agent improvements
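
The workflow above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the actual schema or API: the class names (TestCase, EvalSet, EvalRun), their fields, the run_eval helper, and the echo_agent stand-in are all hypothetical, and exact string matching is assumed as the pass criterion.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TestCase:
    # Hypothetical fields; the real test_cases columns are not shown here.
    name: str
    input_text: str
    expected_output: str


@dataclass
class EvalSet:
    # A named collection of test cases, evaluated together as a batch.
    name: str
    test_cases: List[TestCase] = field(default_factory=list)


@dataclass
class EvalRun:
    # Record of one execution of an eval set: test case name -> passed?
    eval_set_name: str
    results: Dict[str, bool] = field(default_factory=dict)

    @property
    def pass_rate(self) -> float:
        # Fraction of test cases that passed; 0.0 for an empty run.
        if not self.results:
            return 0.0
        return sum(self.results.values()) / len(self.results)


def run_eval(eval_set: EvalSet, agent: Callable[[str], str]) -> EvalRun:
    # Assumed pass criterion: the agent output equals the expected output exactly.
    run = EvalRun(eval_set_name=eval_set.name)
    for case in eval_set.test_cases:
        run.results[case.name] = agent(case.input_text) == case.expected_output
    return run


def echo_agent(prompt: str) -> str:
    # Stand-in agent for illustration only: upper-cases its input.
    return prompt.upper()


# Steps 1 and 2: create test cases and group them into an eval set.
smoke = EvalSet(
    name="smoke",
    test_cases=[
        TestCase("greeting", "hello", "HELLO"),
        TestCase("farewell", "bye", "Goodbye"),
    ],
)

# Step 3: execute the eval set against the agent.
run = run_eval(smoke, echo_agent)

# Step 4: review which cases failed before iterating on the agent.
failed = [name for name, passed in run.results.items() if not passed]
```

Keeping the run record separate from the eval set means the same set can be executed again after each agent change and the resulting pass rates compared across runs.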