About the Role
Responsibilities
- Design and implement comprehensive testing strategies for GenAI features, including conversational AI, agentic systems, and LLM-powered workflows
- Develop automated test suites for prompt testing, including regression tests that detect unintended changes in model behaviour
- Create evaluation frameworks to measure GenAI quality across multiple dimensions (accuracy, relevance, safety, consistency, latency)
- Build and maintain test datasets and golden examples that represent diverse user scenarios and edge cases
- Implement monitoring and alerting systems to detect quality degradation in production GenAI features
- Perform adversarial testing to identify potential failures, hallucinations, biases, or security vulnerabilities in AI systems
- Collaborate with engineers to define acceptance criteria and quality gates for AI feature releases
- Develop tools and frameworks that make it easy for engineers to test their GenAI implementations
- Conduct user acceptance testing and gather feedback on AI feature performance from internal users
- Document testing procedures, known issues, and quality metrics in clear, accessible formats
- Partner with Product and Design teams to ensure AI features meet user experience standards
- Stay current with GenAI testing methodologies, tools, and industry best practices
Requirements
- QA or test engineering experience, preferably with AI/ML systems
- Strong understanding of GenAI technologies including LLMs, prompt engineering, and AI application patterns
- Experience with test automation frameworks and scripting (Python, JavaScript, Selenium, Pytest)
- Knowledge of software testing methodologies (functional, integration, regression, performance, security testing)
- Ability to design test cases and evaluation criteria for non-deterministic systems
- Strong analytical and problem-solving skills with attention to detail
- Experience with API testing tools (Postman, REST Assured) and backend testing
- Familiarity with CI/CD pipelines and automated testing integration
- Excellent communication skills for documenting issues and collaborating with cross-functional teams
Nice to Have
- Experience testing conversational AI, chatbots, or agentic systems
- Knowledge of ML model evaluation metrics and techniques
- Familiarity with LLM evaluation frameworks (LangSmith, PromptFoo, Ragas)
- Experience with performance testing and load testing AI APIs
- Understanding of responsible AI principles, including fairness, transparency, and safety testing
- Background in enterprise software or SaaS QA
- Experience with test management tools (TestRail, Zephyr, Jira)
- Knowledge of security testing methodologies for AI systems
- Scripting experience with Python, including working with LLM APIs
Benefits
- Define quality practices for GenAI applications
- Work on cutting-edge AI technologies and help ensure they're reliable and trustworthy
- Shape quality standards that will influence how AI features are built and tested across the organization