BJAK is looking for an MLOps Engineer to build and scale impactful AI solutions. You will focus on efficiently running and managing open-source models, ensuring high performance and stability, and implementing scalable model serving solutions. This is a global position with a hybrid work arrangement.
What You'll Do
- Run and manage open-source models efficiently, optimizing for cost and reliability
- Ensure high performance and stability across GPU, CPU, and memory resources
- Monitor and troubleshoot model inference to maintain low latency and high throughput
- Collaborate with engineers to implement scalable and reliable model serving solutions
What We're Looking For
- Experience with model serving platforms such as vLLM or HuggingFace TGI
- Proficiency in GPU orchestration using tools like Kubernetes, Ray, Modal, RunPod, or LambdaLabs
- Ability to monitor latency and costs, and to scale systems efficiently as traffic demands change
- Experience setting up inference endpoints for backend engineers
Technical Stack
- vLLM, HuggingFace TGI, Kubernetes, Ray, Modal, RunPod, LambdaLabs
Team & Environment
You'll join a flat structure and work closely with regional teams across product, engineering, operations, infrastructure, and data.
Benefits & Compensation
- Health, dental & vision insurance
- Global travel insurance (for you & your dependents)
- Unlimited, flexible time off
- Housing rental subsidies
- Quality company cafeteria
- Overtime meals
- Top-of-market compensation and performance-based bonuses
Work Mode
This is a hybrid position open to candidates in Malaysia, Thailand, Taiwan, and Japan.
BJAK is a dense, high-performance team focused on high-quality work and global impact. We behave like owners and value speed, clarity, and relentless ownership.




