BJAK is looking for an MLOps Engineer focused on building and scaling impactful AI solutions. In this role, you will run and optimize state-of-the-art open-source models, ensuring they are safe, trustworthy, and performant at scale. You will collaborate closely with cross-functional teams across product, engineering, operations, infrastructure, and data.
What You'll Do
- Run and manage open-source models efficiently, optimizing for cost and reliability.
- Ensure high performance and stability across GPU, CPU, and memory resources.
- Monitor and troubleshoot model inference to maintain low latency and high throughput.
- Collaborate with engineers to implement scalable and reliable model serving solutions.
What We're Looking For
- Experience with model serving platforms such as vLLM or HuggingFace TGI.
- Proficiency in GPU orchestration using tools like Kubernetes, Ray, Modal, RunPod, or LambdaLabs.
- Ability to monitor latency and costs, and to scale systems efficiently as traffic demands change.
- Experience setting up inference endpoints for backend engineers.
Technical Stack
- vLLM, HuggingFace TGI
- Kubernetes, Ray, Modal, RunPod, LambdaLabs
Team & Environment
You will work in a flat structure, collaborating closely with regional teams across product, engineering, operations, infrastructure, and data.
Benefits & Compensation
- Health, dental & vision insurance.
- Global travel insurance (for you & your dependents).
- Unlimited, flexible time off.
- Housing rental subsidies.
- Quality company cafeteria.
- Overtime meals.
Work Mode
This is a hybrid, global position. The company is headquartered in Malaysia.
BJAK values speed, clarity, and relentless ownership. Our high-density, high-performance team focuses on high-quality work and global impact.





