Responsibilities
- Oversee deployment, monitoring, and lifecycle management of thousands of remote edge devices integrated with cloud services.
- Ensure reliable delivery of over-the-air (OTA) updates across a large-scale distributed edge network.
- Configure and operate NATS JetStream with Leaf Nodes to enable robust communication between edge locations and the central cloud.
- Implement and maintain distributed tracing and metrics collection using OpenTelemetry for system-wide health monitoring.
- Design fault-tolerant architectures that remain stable during high-volume reconnection events across the device fleet.
- Manage secrets and certificates securely, and enforce mTLS for trusted communication between edge nodes and control systems.
- Lead incident response and conduct root cause analysis for system-wide operational disruptions.
- Develop operational workflows that scale efficiently as the fleet grows, without increasing maintenance overhead.