About the Role
The role involves proactive monitoring, incident response, and system maintenance to support continuous operations and rapid resolution of technical issues.
Responsibilities
- Monitor network and system performance using centralized tools
- Respond to alerts and initiate incident resolution procedures
- Perform root cause analysis for recurring system issues
- Maintain documentation for configurations and procedures
- Deploy and configure server infrastructure as needed
- Support escalation workflows during major incidents
- Collaborate with engineering teams to improve system resilience
- Implement and verify backup and recovery processes
- Apply security patches and system updates on schedule
- Troubleshoot connectivity and service delivery problems
- Ensure compliance with internal operational standards
- Participate in on-call rotations for after-hours support
- Optimize monitoring dashboards and alert thresholds
- Track and report on system uptime and incident metrics
- Assist in capacity planning for infrastructure growth
Nice to Have
- A CompTIA certification, CCNA, or equivalent
- Experience with containerization or orchestration tools
- Familiarity with configuration management systems
- Background in high-availability environments
- Exposure to DevOps practices and CI/CD pipelines
Compensation
Competitive salary and benefits package
Work Arrangement
Hybrid work model with on-site and remote options
Team
You will be part of a 24/7 Network Operations Center (NOC) team responsible for system reliability
On-Call Expectations
- Team members rotate through on-call duties to handle incidents outside business hours
- Response time targets are defined by severity level
- Tools and runbooks are provided to support rapid diagnosis
Growth Opportunities
- Engineers are encouraged to pursue advanced training
- Internal mobility is supported across technical teams
Available for qualified candidates