Responsibilities
- Create and implement comprehensive testing strategies for AI agents handling clinical documentation, appointment scheduling, billing, messaging, and workflow automation.
- Develop end-to-end testing frameworks that assess model performance under varying conditions, such as prompt design, input sources, and available tools.
- Collaborate with medical professionals to establish accurate benchmarks for evaluating AI outputs in high-impact clinical areas, and work with experts in non-clinical fields as needed.
- Run and reproduce experiments across different models, settings, and interaction patterns to identify the most effective configurations.
- Implement and maintain continuous monitoring systems that sample post-deployment agent behavior to track compliance and performance.
- Interpret evaluation results and communicate key insights and trade-offs clearly to product, engineering, and external technical audiences.
- Lead the development and upkeep of internal evaluation tools and systems to ensure fast, reliable, and repeatable testing processes.
- Detect performance gaps in AI models and propose candidate improvements, such as fine-tuning or enhanced retrieval methods.
Benefits
- Competitive salary and equity compensation
- Health insurance coverage
- Stipend for home office setup
- 401k retirement plan
- 12 weeks of paid parental leave
- Flexible and unlimited paid time off
Compensation
Competitive Salary & Equity Package
Work Arrangement
Remote (Worldwide)
Team
We operate as a remote, distributed team with an emphasis on asynchronous collaboration and individual autonomy.
Other
- We are a mostly remote, distributed team.
- We encourage people to work when and where they perform best.
- We place a high value on strong written communication, time management, and personal accountability.
- We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, assessing responses, and flagging potential inconsistencies or verification signals in application materials based on the information available.
- These tools assist our recruitment team but do not replace human judgment.
- All final hiring decisions are made by humans.
- If you would like more information about how your data is processed, please contact us.