Responsibilities
- Manage daily operations of data platforms, including capacity planning, system stability, upgrades, deployments, and disaster recovery to ensure low latency and high uptime.
- Design and oversee ingestion from multiple sources such as exchanges and internal or external systems, including protocol decoding and resilient retry logic.
- Build rule-based and statistical data validation mechanisms that check completeness, uniqueness, and time consistency, detect anomalies, and handle errors.
- Develop automated processes for data correction, reconciliation, and historical data reprocessing.
- Implement monitoring and alerting systems to maintain reliable, production-ready data assets.
- Design and sustain end-to-end ETL and ELT pipelines with support for scheduling, caching, data partitioning, modeling, schema evolution, and lineage for both batch and streaming workloads.
- Enforce data security through access controls, encryption, audit logging, and data classification to meet regulatory and internal compliance standards, including handling of personally identifiable information.
- Use Infrastructure-as-Code, data versioning, automated testing, and CI/CD practices to improve deployment reliability and reduce the risks of manual intervention.
- Support the development of GenAI and large language model-driven data applications for enterprise analytics, data reconciliation, and internal efficiency improvements.
- Collaborate with analytics and product teams to integrate and operationalize AI-powered data solutions.
Requirements
- Bachelor’s degree in Computer Science, Engineering, Information Technology, or a related discipline.
- Minimum of five years of professional experience in data engineering, data platform development, or AI/ML systems architecture.
- Extensive experience with cloud-based data platforms such as Snowflake, Databricks, BigQuery, or Redshift.
- Proficiency in SQL, workflow orchestration tools such as Airflow, streaming technologies such as Kafka, and modern data pipeline design principles.
- Solid grasp of data warehouse lifecycle management and dimensional modeling techniques.
- Proven ability to debug, optimize performance, and resolve problems systematically.
Nice to Have
- Practical experience developing foundational data structures for BI and supporting GenAI or large language model systems.
- Experience with GitLab and CI/CD pipeline configuration.
- Familiarity with data governance, data lineage tracking, privacy controls, and security frameworks.
Compensation
Competitive salary and benefits package
Work Arrangement
Virtual/remote position
Team
Part of the data platform and AI engineering team supporting insurance-focused data products