Responsibilities
- Define clear success criteria for AI features where no established industry benchmarks exist, determining what counts as accurate or acceptable performance.
- Create and implement comprehensive testing approaches for AI tools, using both exploratory and adversarial methods to expose potential failure points.
- Build evaluation systems that measure key aspects of AI performance, including accuracy, consistency, hallucination rate, edge-case handling, and system stability.
- Create and maintain the datasets, validation workflows, and experimental setups used to test AI capabilities.
- Analyze recurring behaviors in AI systems to build a deeper understanding of their capabilities, limitations, and areas prone to risk.
- Serve as the team's primary quality advocate, ensuring that development work stays aligned with reliable measures of performance.
Work Arrangement
Hybrid
Team
Team exploring and building AI-powered products that don’t have industry playbooks yet.