Responsibilities
- Design and manage ETL and ELT workflows using Databricks with PySpark and Delta Lake.
- Enhance data pipeline efficiency, cost-effectiveness, and scalability on Google Cloud Platform.
- Develop both batch and real-time data processing systems using Spark Streaming and related tools.
- Build data solutions leveraging BigQuery, Cloud Storage, Dataflow, Cloud Composer, and Vertex AI.
- Apply cloud security and operational standards, including IAM policies, monitoring, and cost controls.
- Create and maintain data models such as dimensional schemas and data vault architectures.
- Establish data quality processes, validation checks, and automated testing frameworks.
- Handle data versioning, governance, and lineage tracking using Unity Catalog or GCP Data Catalog.
- Work with cross-functional teams to translate business requirements into technical data designs.
- Offer technical direction and promote engineering best practices across projects.
- Contribute to documentation, system diagrams, and internal knowledge resources.
