What You'll Do
Own the development and optimization of the inference infrastructure for on-device AI, ensuring models run efficiently and consistently across diverse hardware. You'll focus on runtime quality, tuning system behavior for fast startup, a low memory footprint, and balanced throughput and latency during extended use.
Work directly with machine learning models using frameworks such as llama.cpp, ggml, and ONNX Runtime, deploying them to edge environments with a strong emphasis on performance and reliability. Partner with research teams to bridge the gap between experimental models and production-ready implementations, helping refine models for real-world deployment.
Integrate advanced AI capabilities into existing software products, ensuring seamless performance and user privacy by design.
Requirements
- Strong proficiency in C++ with a focus on systems-level programming and runtime efficiency
- Hands-on experience deploying machine learning models to edge or resource-constrained devices
- Familiarity with inference frameworks such as llama.cpp, ggml, and ONNX Runtime
- Excellent written and verbal communication skills in English
- Ability to collaborate across disciplines, especially with research and product teams
Benefits
- Work 100% remotely from anywhere in the world
- Collaborate with a lean, high-impact team at the forefront of fintech innovation
- Contribute to a transparent, globally distributed organization committed to technological empowerment
- Be part of a mission-driven effort advancing blockchain-based financial systems