About the Role
In this role, you will own production systems, troubleshoot complex issues, and work across development and operations to maintain high availability and performance.
Compensation
Competitive salary and benefits package
Work Arrangement
Hybrid work model with flexible scheduling
Team
Collaborative engineering team focused on scalable systems
Responsibilities
- Monitor and maintain system uptime and performance
- Respond to production incidents with urgency and precision
- Collaborate with development teams to improve code deployability
- Implement automation tools for operational efficiency
- Conduct root cause analysis for critical outages
- Optimize infrastructure for scalability and cost efficiency
- Support deployment pipelines and continuous integration
- Enforce best practices in system monitoring and alerting
- Document system configurations and runbooks
- Participate in on-call rotations
Requirements
- Bachelor's degree in computer science or related field
- Minimum of 5 years of experience in production or systems engineering
- Experience with cloud platforms such as AWS or GCP
- Strong scripting skills in Python or similar languages
- Familiarity with containerization and orchestration tools
- Proven ability to debug distributed systems
- Knowledge of networking and Linux internals
- Experience with configuration management tools
- Understanding of CI/CD workflows
- Solid grasp of security and compliance standards
Preferred Qualifications
- Master's degree in a technical discipline
- Experience in high-traffic environments
- Background in site reliability engineering
- Contributions to open-source projects
- Certifications in cloud or systems architecture
What We Offer
- Comprehensive health and wellness benefits
- Retirement savings plan with company contribution
- Paid time off and flexible holidays
- Professional development stipend
- Remote work support allowance
- Inclusive and diverse workplace culture
Available for qualified candidates