Responsibilities
- Design and manage ETL workflows using Python (PySpark) in Azure Synapse Analytics Notebooks or Pipelines to support reliable data ingestion, transformation, and loading.
- Leverage data warehousing knowledge, including star schema design with fact and dimension tables, to build optimized storage models in a Massively Parallel Processing (MPP) dedicated SQL pool.
- Pull data from diverse sources such as REST APIs, SQL database tables, and CSV files.
- Apply in-depth experience with Azure Synapse Analytics to develop and tune data notebooks and pipelines for high performance and scalability.
- Support the adoption of Data Fabric components such as data lakes, lakehouses, Delta Lake tables, and data catalogs to improve data organization and accessibility.
- Partner with data architects to develop data models and schemas that reflect business needs.
- Establish data validation rules and quality controls to ensure accuracy and consistency across datasets.
- Detect and fix performance issues in ETL processes to meet service level agreements.
- Monitor ETL job execution, troubleshoot failures, and apply fixes to maintain pipeline stability.
- Keep detailed records of ETL logic, data movement, and transformation rules.
- Collaborate with cross-functional teams to define data requirements and support data-driven projects.
- Enforce data security practices and adhere to data governance policies and privacy regulations.
Work Arrangement
Remote, Latin America