Ensure the reliability, safety, and quality of virtual agents and generative-AI workflows across customer service channels. Design and execute test strategies for LLM- and agent-based systems by combining traditional QA practices with expertise in conversational AI and agentic automation.
Responsibilities
- Design and execute end-to-end test plans for virtual customer service agents across voice, chat, and email channels.
- Develop and maintain test suites that validate the autonomous workflows of agentic AI systems, such as booking modifications, billing tasks, and damage handling.
- Define and track AI quality metrics, including resolution rate, deflection rate, customer satisfaction (CSAT) scores, and safety compliance.
- Generate test data and synthetic scenarios to evaluate edge cases, ambiguous user inputs, and multilingual interactions.
- Assess agent behavior by analyzing reasoning paths, tool-selection decisions, and memory usage.
- Build and enhance automated testing frameworks for conversational interactions, APIs, and system integrations.
- Support LLMOps and CI/CD pipelines that automatically detect regressions in AI models and workflows.
- Test for bias, fairness, and safety in line with company policies and regulatory standards.
- Work with cross-functional teams to help define requirements and acceptance criteria.
- Document defects, test results, and actionable insights, and communicate them clearly to technical and non-technical stakeholders.
Requirements
- 2 to 3 years of professional experience in software quality assurance or test engineering.
- Solid knowledge of testing methodologies including functional, regression, integration, and performance testing.
- Practical experience testing AI or machine learning systems such as LLM-powered chatbots or recommendation engines.
- Familiarity with multi-agent systems and agentic frameworks such as LangChain and AWS AgentCore.
- Proficiency in Python scripting.
- Experience testing RESTful APIs.
- Understanding of LLMOps and MLOps practices including model versioning, monitoring, and rollback procedures.
- Strong analytical, documentation, and communication skills.
- Fluency in English.
Nice to Have
- Knowledge of German or other languages.
Tech Stack
Python, REST API, LangChain, AWS AgentCore, LLMOps, MLOps, CI/CD, Generative AI, Conversational AI, Agentic AI
Benefits
- 28 days of paid vacation per year
- One additional day off for your birthday
- One paid volunteer day per year
- Hybrid working model
- Flexible working hours
- No formal dress code
- Access to discounts on car rental, car sharing, ride-hailing, and SIXT+ services
- Partner discount programs
- Opportunities to participate in training initiatives
- Attendance at external industry conferences
- Internal developer and technology presentations
- Private health insurance coverage
- Coverflex employee benefits platform
Work Arrangement
Hybrid working model with flexible working hours
About the Company
- Commitment to a top-tier customer experience
- Focus on exceptional customer service
- Encouragement of an entrepreneurial mindset
- Long-term corporate stability
- Strategic foresight in business planning
Additional Information
- English language fluency is required.
- Proficiency in German or additional languages is an advantage.
- Work-life balance and flexibility are supported by a relaxed policy with no formal dress code.
- Training and development opportunities include attendance at external conferences and internal tech talks.
- Health and well-being are supported through private health insurance.
- Employee benefits include access to the Coverflex advantage system.


