Responsibilities
- Interact with models to identify where function-calling and tool-use behaviour can be improved
- Gather internal and external feedback on tool-calling behaviour to scope areas for improvement
- Design and implement evals, data guidelines, data generation, and synthetic tool environments and APIs
- Identify and fix edge-case behaviours, such as malformed arguments, hallucinated functions, and incorrect tool selection, through rigorous testing
- Develop robust evaluation pipelines for the function-calling capabilities of our model candidates
- Work collaboratively with AI Scientists
Requirements
- You have a deep understanding of at least one of the following: 1) API design, structured outputs, and schema specification (e.g. JSON Schema), 2) engineering and code behaviour, or 3) LLM agents at work, including reasoning, planning, and multi-step tool use
- You have prior knowledge of training and optimising model behaviour
- You are an expert at building robust evaluations
- You thrive in dynamic and technically complex environments
- You have a track record of delivering innovative, out-of-the-box solutions to address real-world constraints
Benefits
- Competitive salary and equity (stock options)
- Health insurance
- Transportation allowance
- Sport allowance
- Meal vouchers
- Generous parental leave policy
- Visa sponsorship
Work Arrangement
Remote (Worldwide): France, USA, UK, Germany, Singapore