Honeywell is looking for a Lead Site Reliability Engineer to join our Aerospace Technologies team. You will serve as a subject matter expert, ensuring the reliability, availability, and performance of our systems and services. You'll collaborate closely with development and operations teams to implement best practices in reliability engineering, automation, and monitoring, driving key infrastructure improvements.
What You'll Do
- Define and manage service SLOs/SLIs, track error budgets, and drive reliability roadmaps.
- Proactively identify reliability bottlenecks and lead remediation and preventative actions.
- Establish CI/CD best practices and standards across the organization.
- Implement and scale metrics, logs, and traces across services.
- Build actionable dashboards and low-noise alerts, backed by runbooks for on-call.
- Own on-call rotations, triage, and coordination; drive post-incident reviews and blameless RCA.
- Automate rollback/roll-forward, health checks, and verification steps.
- Conduct load and resilience testing; manage capacity planning and cost optimization.
- Tune databases, queues, and network settings for throughput and latency.
- Reduce toil with automation and self-service tooling; standardize deployment and recovery procedures.
- Build reliability guardrails such as chaos experiments, circuit breakers, and rate limiting.
- Operate and harden Kubernetes clusters, container runtimes, and service meshes.
- Manage infrastructure using Infrastructure as Code (IaC), secrets management, and policy-as-code.
- Implement DevSecOps practices: vulnerability management, dependency scanning, and IAM hardening.
- Partner with developers, QA, and product on design reviews, release strategies, and production readiness.
- Document standards and provide enablement sessions to elevate reliability practices.
- Create comprehensive documentation and self-service guides.
- Administer, maintain, and optimize relational and/or NoSQL databases.
- Collaborate with security teams to enforce database access controls.
What We're Looking For
- Bachelor’s degree from an accredited institution in a technical discipline such as science, technology, engineering, or mathematics.
- 4–8+ years in SRE, Platform, DevOps, or Operations roles with ownership of production systems at scale.
- Hands-on experience with AWS, Azure, and/or GCP, with a strong grasp of managed services trade-offs.
- Experience with Docker, Kubernetes, and Helm/Kustomize; familiarity with service meshes.
- Proficiency in Python, Go, and Bash for automation, tooling, and APIs.
- Strong SQL skills and deep experience with at least one major database.
- Deep knowledge of database internals, replication, and indexing.
- Understanding of networking fundamentals such as DNS, load balancing, and TCP/IP.
- Expertise in designing and developing reusable CI/CD pipeline templates.
- Proficiency with at least two CI/CD platforms.
- Infrastructure as Code skills with Terraform, ARM templates, or CloudFormation.
- Experience troubleshooting build and deployment issues across multiple technology stacks.
- Strong Git and version control workflow knowledge.
- Experience with automated testing frameworks.
- Scripting skills in PowerShell, Bash, or Python.
- Must be a U.S. Person as defined by export control regulations.
Nice to Have
- Advanced degree in Computer Science, Engineering, or a related field.
- Experience with additional programming languages like Java, Node.js, or Go.
- Knowledge of frontend frameworks such as React, Angular, or Vue.js.
- GitOps implementation experience with ArgoCD or Flux.
- Experience with service mesh technologies such as Istio or Linkerd.
- Advanced deployment strategies including blue-green, canary, and feature flags.
- Database CI/CD and migration automation experience.
- Experience integrating security scanning tools.
- Expertise with monitoring and observability tools.
- Configuration management with Ansible, Chef, or Puppet.
- Multi-cloud or hybrid cloud deployment experience.
- Experience building internal developer platforms.
- Cloud or Kubernetes certifications.
Technical Stack
- Azure, Docker, AKS/EKS, Helm, Istio
- OpenTelemetry, Prometheus/Grafana, Azure Monitor/Log Analytics, Dynatrace, Elastic
- GitHub Actions, Azure DevOps Pipelines
- Terraform/Terragrunt, Bicep, Vault/Azure Key Vault, SSM, Dependabot, Cosign, OPA
Team & Environment
You will report to an SRE Engineering Manager. Our culture empowers leaders to develop and support their teams, driving strong performance and fostering an inclusive environment.
Benefits & Compensation
- Employer-subsidized Medical, Dental, Vision, and Life Insurance
- Short-Term and Long-Term Disability
- 401(k) match
- Flexible Spending Accounts and Health Savings Accounts
- Employee Assistance Program (EAP)
- Educational Assistance
- Parental Leave
- Paid Time Off for vacation, personal business, sick time, and parental leave
- 12 Paid Holidays
Work Mode
This role follows a hybrid work model and is based in Phoenix, AZ.
Honeywell is an equal opportunity employer.