H Company is hiring a Member of Technical Staff to join our Evaluations team. In this role, you will develop agent-specific benchmarks, build tools that simplify benchmark creation, and collaborate closely with product teams to ensure our agents are thoroughly evaluated before they reach production.
What You'll Do
- Collaborate with other teams to gather requirements for building new evaluation environments.
- Create benchmarks that accurately reflect the real-world problems agents will face.
- Define metrics that align with specific business use cases.
- Draw insights from benchmark results to identify critical improvements for our LLMs and agents.
What We're Looking For
- MS or PhD in Computer Science, Machine Learning, or a related field.
- Proficiency in Python.
- Significant experience in web development.
- A collaborative mindset and the ability to thrive in dynamic, multidisciplinary teams.
- Strong communication and presentation skills.
- An eagerness to explore and tackle new challenges.
Nice to Have
- Participation in open source projects.
- Experience in agent development.
- Experience in software architecture for large projects.
Technical Stack
- Python
Team & Environment
The Evaluations team interacts with all teams involved in product development. Our culture is built on openness, learning, collaboration, and a holistic, humanist, humble mindset. You'll join a fun, dynamic, multicultural, and highly collaborative environment.
Benefits & Compensation
- Competitive salary.
- Opportunities for professional growth, continuous learning, and career development.
Work Mode
This is a hybrid position. The role is open to candidates based in France, the UK, or the US.
H Company is an equal opportunity employer.



