About the Role
You will lead the development and deployment of a new, self-hosted data lake platform, building robust data infrastructure to support analytics, storage, and processing at scale.
Responsibilities
- Architect and deploy a scalable data lake solution on self-hosted infrastructure
- Design data ingestion pipelines for structured and unstructured sources
- Ensure data reliability, consistency, and accessibility across systems
- Optimize storage and query performance for large-scale datasets
- Implement data partitioning, indexing, and lifecycle management
- Integrate security controls and access policies for data governance
- Collaborate with data scientists and analysts to understand requirements
- Monitor system performance and troubleshoot infrastructure issues
- Automate deployment and configuration using infrastructure-as-code tools
- Maintain documentation for architecture, processes, and configurations
- Evaluate and integrate open-source data technologies
- Support disaster recovery and backup strategies
- Ensure compliance with data privacy and regulatory standards
- Scale infrastructure to meet growing data volume demands
- Coordinate with cross-functional teams on integration needs
- Implement monitoring and alerting for data pipeline health
- Design for fault tolerance and high availability
- Contribute to capacity planning and resource forecasting
- Stay current with advancements in data storage and processing
- Promote best practices in data engineering and infrastructure design
Nice to Have
- Experience with Apache Hadoop or similar frameworks
- Knowledge of object storage systems like MinIO or Ceph
- Familiarity with data cataloging and metadata management
- Experience in gaming or high-throughput data environments
- Contributions to open-source data projects
- Understanding of GDPR or similar data regulations
- Background in site reliability engineering
- Certifications in cloud or data infrastructure
- Exposure to real-time data processing systems
- Prior work in self-hosted, high-availability environments
Compensation
Competitive salary and benefits package
Work Arrangement
Hybrid remote work available
Team
Collaborative engineering team focused on scalable data systems
Why This Role Matters
This position is central to establishing a modern data foundation for the organization. You will shape how data is stored, accessed, and used across teams, directly influencing analytics capabilities and long-term scalability.
Tech Stack Highlights
We use open-source data technologies running on self-hosted infrastructure. Tools include Apache Spark, Airflow, Prometheus, and Terraform, with storage built on scalable object storage solutions.