Mindrift is looking for an Evaluation Scenario Writer - AI Agent Testing Specialist to join our effort to ethically shape AI. In this role, you will design realistic and structured evaluation scenarios for LLM-based agents, simulating human-performed tasks and defining gold-standard behavior for comparison.
What You'll Do
- Design structured test scenarios based on real-world tasks.
- Define the golden path and acceptable agent behavior.
- Annotate task steps, expected outputs, and edge cases.
- Work with developers to test scenarios and improve clarity.
- Review agent outputs and adapt tests accordingly.
What We're Looking For
- A Bachelor's and/or Master's degree in Computer Science, Software Engineering, Data Science, Artificial Intelligence, Computational Linguistics, Information Systems, or a related field.
- A background in QA, software testing, data analysis, or NLP annotation.
- A good understanding of test design principles such as reproducibility, coverage, and edge cases.
- Strong written communication skills in English.
- Comfort with structured formats such as JSON and YAML for describing scenarios.
- Ability to define expected agent behaviors and scoring logic.
- Basic experience with Python and JS.
- Curiosity and openness to working with AI-generated content, agent logs, and prompt-based behavior.
- Readiness to learn new methods, switch between tasks quickly, and work with complex guidelines.
- A laptop, reliable internet connection, available time, and enthusiasm for the challenge.
Nice to Have
- Experience writing manual or automated test cases.
- Familiarity with LLM capabilities and typical failure modes.
- Understanding of scoring metrics such as precision, recall, and coverage, as well as reward functions.
Technical Stack
- Python
- JS
- JSON
- YAML
Benefits & Compensation
- Contribute on your own schedule, from anywhere in the world.
- Get paid for your expertise, with rates of up to $52/hour depending on your skills, experience, and project needs.
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
Work Mode
This is a global, fully remote freelance position.
Mindrift believes in using the power of collective human intelligence to ethically shape the future of AI.





