Mindrift is looking for an Evaluation Scenario Writer - AI Agent Testing Specialist to design structured test scenarios that evaluate the performance of LLM-based agents. You will create realistic simulations of human-performed tasks and define the gold-standard behavior against which agent actions are measured.
What You'll Do
- Design structured test scenarios based on real-world tasks.
- Define the golden path and acceptable agent behavior.
- Annotate task steps, expected outputs, and edge cases.
- Work with developers to test your scenarios and improve clarity.
- Review agent outputs and adapt tests accordingly.
What We're Looking For
- Bachelor's or Master's degree in Computer Science, Software Engineering, Data Science / Data Analytics, Artificial Intelligence / Machine Learning, Computational Linguistics / Natural Language Processing (NLP), Information Systems, or a related field.
- 3+ years of relevant experience.
- English proficiency at Advanced (C1) level or above.
- Ready to learn new methods and able to switch quickly between tasks and topics.
- Able to work with guidelines that are sometimes challenging and complex.
- Have a laptop, a reliable internet connection, and available time.
Benefits & Compensation
- Take part in a part-time, remote, freelance project that fits around your primary professional or academic commitments.
- Work on advanced AI projects and gain valuable experience that enhances your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
Work Mode
This is a global, remote opportunity.
