As the AI Test Architect, you will shape the future of quality assurance by designing and deploying a next-generation 'Quality Intelligence' system powered by generative AI. This role is central to ensuring the integrity of AI-driven features across a large-scale, cloud-native SaaS platform used by hundreds of thousands of professionals worldwide.
Key Responsibilities
- Design and implement a unified Quality Intelligence platform that leverages generative AI to forecast defect-prone areas, optimize test coverage, auto-generate test cases, and enable self-correcting test execution.
- Define and lead the adoption of an enterprise-wide AI-first testing strategy, including methods for evaluating non-deterministic outputs, monitoring model drift, and detecting hallucinations throughout the development lifecycle.
- Establish ethical and compliance standards for AI testing, aligned with evolving regulatory expectations.
- Develop rigorous evaluation frameworks for internal AI agents and generative features, including red teaming, adversarial testing, and benchmarks focused on bias, prompt injection, jailbreaking, and goal alignment.
- Build statistically sound evaluation pipelines using tools such as Langfuse, LangSmith, DeepEval, RAGAS, or Arize Phoenix, incorporating LLM-as-judge patterns and human-in-the-loop validation (a minimal evaluation sketch follows this list).
- Create test harnesses for agentic behaviors, tool use, planning logic, multi-agent simulations, and runtime observability.
- Integrate AI-powered testing into GitHub-based CI/CD workflows, enabling predictive flakiness detection, automated quality gates, and AI-generated test suites (a gate sketch follows this list).
- Design self-healing test frameworks by combining AI plugins with Playwright or Cypress, reducing maintenance overhead as UIs and models evolve (a locator-fallback sketch follows this list).
- Lead synthetic data generation, curate reference datasets, and implement AI-driven data masking to support high-fidelity, privacy-compliant testing at scale (a masking sketch follows this list).
- Collaborate with product, data science, ML engineering, and security teams to embed quality controls into AI feature development from inception.
- Train and mentor QA teams to adopt AI-augmented testing practices through workshops, documentation, and community initiatives.
- Champion AI quality standards across the organization, including dashboards that track DORA metrics alongside AI-specific indicators such as hallucination rate and red-team success rate.
- Implement telemetry systems for AI quality, including drift detection, faithfulness scoring, and compliance monitoring, integrated with platforms like Langfuse.
- Establish feedback mechanisms for model refinement, A/B testing safeguards, and proactive risk management in production environments.
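To make the evaluation-pipeline responsibility concrete, here is a minimal sketch of an LLM-as-judge check written against DeepEval's pytest-style API; the question, answer, and threshold are illustrative placeholders rather than anything specified in this posting.

    from deepeval import assert_test
    from deepeval.test_case import LLMTestCase
    from deepeval.metrics import AnswerRelevancyMetric

    def test_answer_relevancy():
        # One generated answer, scored by a judge model against the input question.
        case = LLMTestCase(
            input="What is the refund window for annual plans?",    # illustrative
            actual_output="Refunds are available within 30 days.",  # illustrative
        )
        # The metric prompts a judge LLM and fails the test below the threshold.
        assert_test(case, [AnswerRelevancyMetric(threshold=0.7)])

In a production pipeline the same cases would also be logged to a tracing platform such as Langfuse so scores can be trended over time.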
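An automated quality gate can be as simple as a script that fails the pipeline when an AI-specific metric exceeds its budget. In this hedged sketch, eval_results.json, its schema, and the 2% budget are all assumptions made for illustration.

    import json
    import sys

    THRESHOLD = 0.02  # assumed hallucination-rate budget

    # eval_results.json is a hypothetical artifact from an upstream eval job,
    # one record per test case with a boolean "hallucinated" field.
    with open("eval_results.json") as f:
        results = json.load(f)

    rate = sum(r["hallucinated"] for r in results) / len(results)
    print(f"hallucination rate: {rate:.2%}")
    sys.exit(0 if rate <= THRESHOLD else 1)  # non-zero exit blocks the merge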
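Self-healing execution can be approximated as an ordered locator fallback: in a real framework an AI plugin would propose the candidate selectors from the live DOM, whereas here they are hard-coded. A minimal Playwright-for-Python sketch, where resilient_click and its selector list are hypothetical names:

    from playwright.sync_api import Page, TimeoutError as PlaywrightTimeout

    def resilient_click(page: Page, selectors: list[str], timeout_ms: int = 3000) -> str:
        # Try each candidate locator in order; return the one that worked so the
        # framework can record which selector "healed" the step.
        for selector in selectors:
            try:
                page.locator(selector).click(timeout=timeout_ms)
                return selector
            except PlaywrightTimeout:
                continue
        raise AssertionError(f"No candidate locator matched: {selectors}")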
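For the masking responsibility, a deterministic sketch using the Faker library shows the basic shape: known PII fields are swapped for realistic synthetic values while everything else passes through. mask_record and the field names are illustrative.

    from faker import Faker

    fake = Faker()
    PII_FIELDS = {"name": fake.name, "email": fake.email, "phone": fake.phone_number}

    def mask_record(record: dict) -> dict:
        # Replace each known PII field with a synthetic value of the same kind;
        # non-PII keys are returned unchanged.
        return {k: PII_FIELDS[k]() if k in PII_FIELDS else v for k, v in record.items()}

An AI-driven version would classify fields as PII automatically instead of relying on a fixed field list.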
Required Qualifications
- Minimum of 8 years of experience in Quality Engineering or Test Architecture within cloud-native SaaS environments.
- At least 2 years focused on testing AI, ML, or LLM-based systems.
- Strong technical foundation in AWS, including serverless architectures, microservices, and infrastructure-as-code using Terraform or CloudFormation.
- Hands-on experience with GitHub Actions and the broader GitHub CI/CD ecosystem.
- Proven ability to architect and test LLM-powered applications using LangChain, LangGraph, LangSmith, or similar frameworks.
- Expertise in modern test automation tools such as Playwright or Cypress, with practical experience integrating AI-based self-healing capabilities.
- Proficiency in JavaScript/TypeScript and/or Python.
- Firm grasp of core AI concepts including transformers, embeddings, RAG architectures, and evaluation trade-offs.
- Experience with LLM evaluation platforms and capabilities such as Amazon Bedrock (Evaluations, Prompt Management, Guardrails), DeepEval, RAGAS, Arize Phoenix, or Langfuse.
- Track record of technical leadership and cross-functional influence.
Preferred Qualifications
- Background in offensive security red teaming using tools such as Cobalt Strike, Sliver, or Nmap.
- Familiarity with adversarial testing methodologies and security-focused evaluation techniques.
Technology Environment
AWS, Terraform, CloudFormation, GitHub Actions, LangChain, LangGraph, LangSmith, Playwright, Cypress, JavaScript, TypeScript, Python, Amazon Bedrock (Evaluations, Prompt Management, Guardrails), DeepEval, RAGAS, Arize Phoenix, Langfuse, Cobalt Strike, Sliver, Nmap.
Work Mode
This is a fully remote position open to candidates based in Colombia.