Quora is looking for a Senior Software Engineer - Machine Learning Platform to build and maintain our company-wide machine learning infrastructure. You'll solve technical problems at the intersection of Machine Learning, Distributed Systems, and High Performance Computing, with a primary focus on ML infrastructure (80%) and some model deployment (20%).
What You'll Do
- Design, develop, and maintain the core infrastructure that powers Quora's machine learning platform, ensuring high availability, scalability, and performance.
- Build scalable and reliable distributed systems for serving machine learning models.
- Optimize infrastructure performance across the ML platform, identifying and resolving bottlenecks for large-scale machine learning workloads.
- Collaborate with machine learning engineers to understand their infrastructure needs and provide solutions that enable efficient model building and deployment.
- Contribute to the design and implementation of our next-generation machine learning infrastructure, focusing on scalability, reliability, and cost-effectiveness.
- Develop services on top of open source technologies like Kubernetes, TensorFlow, and PyTorch.
- Own business-critical infrastructure, help resolve production issues, and participate in the team-wide on-call rotation.
- Collaborate with ML engineers who use the platform to help them be more impactful.
What We're Looking For
- Availability for meetings and impromptu communication during Quora's 'coordination hours' (Mon-Fri: 9am-3pm Pacific Time).
- Experience with building and owning end-to-end machine learning or data science-related systems.
- Experience instrumenting ML workloads for performance monitoring and efficiency.
- Experience with high-performance, large-scale distributed systems.
- 4+ years of industry experience in Machine Learning, Infrastructure or related fields.
- 4+ years of experience writing production code in Python, C++, or a similar language.
- BS or MS in Computer Science, Engineering or a related technical field.
Nice to Have
- Strong communication and interpersonal skills; experience working with ML teams is a plus.
- Experience working with Kubernetes, Docker, Terraform, or other forms of containerized infrastructure.
- Hands-on experience with AWS technologies like EC2, EBS, S3, EKS.
Technical Stack
- Kubernetes
- TensorFlow
- PyTorch
- Docker
- Terraform
- AWS EC2, EBS, S3, EKS
Team & Environment
You will be part of a small team focused on the ML development platform.
Benefits & Compensation
- Medical, dental, and vision coverage
- Equity refreshers
- Remote work reimbursement
- Paid time off
- Employee assistance programs
- Compensation:
  - US: $155,656 - $225,160 USD
  - Canada (Toronto/Vancouver): $199,399 - $230,748 CAD
  - Canada (other): $186,105 - $215,365 CAD
  - Equity included
Work Mode
This is a remote position open to candidates in multiple countries around the world.
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.




