Responsibilities
- Design and deploy high-performance agentic systems that use Fastino’s optimized model architectures to outperform traditional LLMs on standard benchmarks.
- Bridge the gap between research and production by collaborating with engineering teams to turn novel architectural breakthroughs into scalable, low-latency solutions for enterprise customers.
- Drive rapid, iterative prototyping of AI features, refining model performance and task accuracy based on real-world telemetry so that specialized models meet rigorous developer standards.
- Own the stability and throughput of inference pipelines, proactively resolving scalability bottlenecks so that models perform consistently and reliably under heavy production load.
- Architect large-scale data and fine-tuning strategies to continuously improve the precision and domain-specific reliability of the Fastino models.
Requirements
- 2+ years of hands-on experience in AI/ML engineering roles.
- Demonstrated proficiency with LLMs and a track record of applying AI/ML techniques to solve complex, unstructured problems.
- Comfort working across the stack, from prompt engineering and vector DB tuning to Kubernetes deployment and API design.
Nice to Have
- Experience building microservices that handle high-concurrency agentic workloads.
- Familiarity with GLiNER or other information extraction architectures.
Work Arrangement
Remote (Worldwide)


