About the Role
In this role, you will keep customer systems highly available and performant, working closely with customers and engineering teams to resolve complex technical issues and prevent them from recurring.
Responsibilities
- Diagnose and resolve technical issues impacting customer systems
- Collaborate with support and engineering teams to escalate and resolve incidents
- Monitor system performance and proactively identify reliability risks
- Document root causes and implement corrective actions
- Support post-incident reviews and follow-up improvements
- Work directly with customers to understand operational challenges
- Improve system resilience through automation and tooling
- Participate in the on-call incident response rotation
- Analyze logs and metrics to detect patterns in system failures
- Assist in refining incident management processes
- Drive initiatives to reduce repeat incidents
- Provide technical guidance during service outages
- Ensure customer environments follow reliability best practices
- Develop runbooks and operational procedures
- Support integration of monitoring solutions
- Evaluate system design for fault tolerance
- Assist in capacity planning and performance tuning
- Communicate technical status updates to stakeholders
- Participate in system architecture reviews
- Promote observability across platforms
- Assist in testing disaster recovery procedures
- Track reliability metrics and report on improvements
- Advocate for customer needs within engineering teams
- Maintain up-to-date knowledge of IoT networking
- Ensure compliance with operational standards
Nice to Have
- Experience with large-scale distributed systems
- Background in telecommunications or IoT infrastructure
- Knowledge of Kubernetes or Docker
- Certifications in cloud or systems administration
- Experience with CI/CD pipelines
- Familiarity with Terraform or infrastructure as code
- Previous work in a customer-facing technical role
- Understanding of data privacy regulations
- Experience with time-series databases
- Contributions to open-source projects
Compensation
Competitive salary with benefits
Work Arrangement
Fully remote
Team
Cross-functional team supporting global IoT connectivity solutions
Why This Role Matters
- This position plays a key role in ensuring customers achieve consistent, reliable service from complex IoT systems.
- By bridging technical teams and customer needs, the role directly impacts product trust and long-term success.
What to Expect
- You will work remotely with flexible hours, collaborating across global teams.
- Expect a mix of reactive incident support and proactive system improvements.
- Regular communication with customers and internal engineers is essential.