About the Role
The role involves maintaining system stability, improving scalability, and driving operational excellence by collaborating with engineering teams to implement resilient and efficient solutions.
Responsibilities
- Design, implement, and manage scalable and fault-tolerant systems
- Monitor production environments to detect and resolve performance issues
- Develop automation tools to streamline deployment and operational workflows
- Respond to and resolve critical incidents with minimal downtime
- Collaborate with software engineers to improve application reliability
- Establish and enforce best practices for system observability
- Optimize system performance and resource utilization
- Support CI/CD pipelines and infrastructure-as-code (IaC) practices
- Lead root cause analysis for major incidents
- Implement disaster recovery and business continuity strategies
- Ensure compliance with security and regulatory standards
- Evaluate and integrate new technologies to enhance reliability
- Document system architecture and operational procedures
- Mentor junior engineers and share operational knowledge
- Participate in on-call rotations for incident response
Nice to Have
- Master’s degree in a technical field
- Experience in regulated industries such as healthcare or pharmaceuticals
- Knowledge of compliance frameworks like HIPAA or SOC 2
- Familiarity with service mesh technologies
- Experience with large-scale data processing systems
- Contributions to open-source projects
- Certifications in cloud or systems engineering
Compensation
Competitive salary and a comprehensive benefits package, including equity and bonuses
Work Arrangement
Hybrid work model with flexibility for remote and office-based work
Team
Part of the infrastructure and operations team focused on reliability, scalability, and system performance
Why This Role Matters
This position is critical to the stability and performance of core services used by millions. The engineer will directly influence system uptime, security, and scalability while working with a modern cloud-native technology stack.
Our Technology Stack
We use AWS for cloud infrastructure, Kubernetes for container orchestration, Prometheus and Grafana for monitoring, and GitLab for CI/CD. Our systems are built on a microservices architecture and emphasize automation and observability.
This position may offer visa sponsorship for qualified candidates.