About the Role
In this role, you will lead the architecture and deployment of scalable systems, ensuring high availability and performance across distributed environments.
Responsibilities
- Design and manage large-scale infrastructure systems
- Collaborate with development teams to optimize system performance
- Implement automation tools for deployment and monitoring
- Ensure system reliability, security, and scalability
- Troubleshoot complex technical issues across environments
- Lead incident response and post-mortem analysis
- Develop and maintain technical documentation
- Evaluate and integrate new technologies
- Support continuous integration and delivery pipelines
- Work closely with security teams to enforce best practices
- Monitor system health and performance metrics
- Drive improvements in system architecture
- Participate in capacity planning and forecasting
- Mentor junior engineers and share expertise
- Ensure compliance with operational standards
- Optimize cloud resource utilization
- Manage configuration management systems
- Support disaster recovery planning
- Coordinate with product teams on technical requirements
- Maintain system uptime and minimize service disruptions
Nice to Have
- Advanced degree in engineering or computer science
- Experience with Kubernetes in production environments
- Knowledge of service mesh technologies
- Background in site reliability engineering
- Familiarity with database administration
- Experience with multi-region cloud deployments
- Contributions to open-source projects
- Certifications in cloud or systems engineering
- Experience in agile development teams
- Leadership in technical architecture decisions
Compensation
Competitive salary with equity and a comprehensive benefits package
Work Arrangement
Hybrid work model with flexibility for remote and on-site collaboration
Team
Collaborative engineering team focused on innovation, system reliability, and rapid development cycles
Technology Stack
- AWS as the primary cloud infrastructure platform
- Container orchestration via Kubernetes
- Automation through Terraform and Ansible
- Monitoring with Prometheus and Grafana
- Logging pipeline built on Fluentd and Elasticsearch
Onboarding Process
- Structured onboarding program lasting four weeks
- Pairing with a senior engineer for the first month
- Access to internal knowledge base and training modules
- Weekly check-ins with team lead during ramp-up
- Introduction to key stakeholders and team members
Available for qualified candidates