Responsibilities
- Develop and sustain high-performance services that ingest and process metrics, logs, and hardware telemetry from large-scale compute clusters
- Design and implement internal web applications, tools, and APIs for configuring data pipelines, monitoring system health, and enabling AI workflows
- Help evolve the Data Platform by aligning its architecture with current industry standards while meeting the organization's specific requirements
- Oversee service deployment and lifecycle management using Kubernetes and cloud infrastructure
- Collaborate with stakeholders to optimize data models, identify performance bottlenecks, and implement efficient storage solutions that reduce operating costs
- Maintain platform reliability through rigorous on-call protocols and proactive monitoring of interdependent data systems


