Responsibilities
- Create and maintain data pipelines for batch and streaming data ingestion, transformation, and delivery
- Construct ETL and ELT workflows to consolidate data from varied sources into centralized platforms
- Write transformation logic using Apache Spark, PySpark, SparkSQL, and SQL
- Apply change data capture (CDC) techniques for real-time and near-real-time data replication
- Develop streaming pipelines to support operational analytics and real-time use cases
- Optimize pipeline performance, resource usage, and operating costs
- Support a federated pipeline model allowing business units to manage their own data domains
- Help build self-service data infrastructure that simplifies pipeline development for domain teams
- Establish standardized deployment methods for pipelines that support team autonomy
- Assist domain teams in developing data products that are accessible, interoperable, and aligned with enterprise policies
- Enable cross-domain data processing while maintaining consistency through federated governance
- Help define data contracts and standards for seamless data exchange across domains
- Help balance independent domain ownership with enterprise governance requirements
- Build reusable pipeline templates and Infrastructure as Code patterns for common data solutions
- Design blueprints for ingestion, transformation, validation, and serving that teams can adapt
- Develop standard approaches for batch, streaming, CDC, and API-based integrations
- Contribute to a library of architectural patterns including medallion architecture and dimensional modeling
- Document best practices and reference designs to guide compliant pipeline development
- Produce starter kits and accelerators to speed up data product delivery
- Write implementation guides and cookbooks that operationalize enterprise standards
- Support business units in adopting templates while allowing domain-specific customization
- Combine data from multiple internal and external sources into unified datasets
- Build reusable integration patterns and connectors for enterprise data systems
- Use Auto Loader, COPY INTO, and similar tools for automated data ingestion
- Develop API-based integrations and file-driven data processing workflows
Work Arrangement
Remote (City/Region)
