BJAK is seeking an MLOps Engineer to build and scale AI solutions. In this role, you will fine-tune state-of-the-art models, design evaluation frameworks, and bring AI features into production, ensuring they are intelligent, safe, trustworthy, and impactful at scale.
What You'll Do
- Run and manage open-source models efficiently, optimizing for cost and reliability.
- Ensure high performance and stability across GPU, CPU, and memory resources.
- Monitor and troubleshoot model inference to maintain low latency and high throughput.
- Collaborate with engineers to implement scalable and reliable model serving solutions.
What We're Looking For
- Experience with model serving platforms such as vLLM or HuggingFace TGI.
- Proficiency in GPU orchestration using tools like Kubernetes, Ray, Modal, RunPod, or LambdaLabs.
- Ability to monitor latency and costs, and to scale systems efficiently as traffic demands change.
- Experience setting up inference endpoints for backend engineers.
Technical Stack
- vLLM, HuggingFace TGI
- Kubernetes, Ray, Modal, RunPod, LambdaLabs
Team & Environment
Work closely with regional teams across product, engineering, operations, infrastructure, and data.
Benefits & Compensation
- Housing rental subsidies.
- Quality company cafeteria.
- Overtime meals.
- Health, dental & vision insurance.
- Global travel insurance (for you & your dependents).
- Unlimited, flexible time off.
Work Mode
This is a hybrid role based in Malaysia (HQ).
BJAK is a high-performance team focused on high-quality work and global impact. We behave like owners; we value speed, clarity, and relentless ownership; and we seek individuals who are hungry to grow and care deeply about excellence.



