Responsibilities
- Design and implement scalable core infrastructure systems
- Automate infrastructure provisioning and management to reduce manual effort
- Establish secure network connectivity across services, VPCs, accounts, and regions
- Diagnose and resolve complex technical issues across distributed systems
- Develop robust, production-ready code, tools, and frameworks
- Work with engineering teams to promote consistent infrastructure practices

Core Areas of Ownership
- Build and operate logging, metrics, and tracing with the ELK stack
- Build and operate observability systems using Datadog
- Enhance system reliability, alerting, and incident response
- Define and enforce resource tagging standards
- Implement data classification policies
- Ensure cost transparency and auditability
- Develop self-service infrastructure workflows
- Improve developer experience through automation and tooling
- Contribute to the evolution of internal platform systems