Responsibilities
- Support the creation and upkeep of data pipelines, data ingestion workflows, and transformation processes.
- Develop and manage SQL queries, Python scripts, and Spark-based jobs for data processing and analysis.
- Help diagnose and resolve issues related to pipeline failures, data quality, and operational incidents.
- Collaborate with senior engineers on schema mappings, transformation logic, and data validation rules.
- Ensure datasets comply with defined schemas, data contracts, and quality benchmarks.
- Assist in managing metadata, documenting datasets, and tracking data lineage.
- Maintain data classification labels in alignment with organizational policies.
- Automate routine data management and operational tasks to enhance efficiency and consistency.
- Support monitoring, alerting, and incident response for data workflows and pipelines.
- Participate in testing, including unit testing, transformation validation, and data quality verification.
- Adhere to established coding standards, engineering practices, and team development methodologies.
- Apply security, privacy, and compliance protocols when working with sensitive or regulated data.
- Engage with Data Governance, Security, and Compliance teams as needed.
- Contribute to initiatives that improve data reliability, trustworthiness, and operational performance.
Requirements
- Bachelor's degree in Computer Science, Computer Engineering, Information Systems, Data Science, Software Engineering, or a related field.
- Basic to intermediate proficiency in English.
- Up to two years of experience in Data Engineering, Software Engineering, Data Analytics, or similar domains.
- Working knowledge of SQL and Python programming.
- Understanding of ETL/ELT methodologies and data transformation techniques.
- Familiarity with relational databases and data warehousing principles.
- Basic experience with Spark, Databricks, or distributed data processing systems.
- Experience using Git and standard version control workflows.
- Introductory knowledge of cloud platforms such as AWS, Azure, or Google Cloud.
- Knowledge of automation strategies and scripting to improve operational efficiency.
- Basic understanding of data quality principles and validation methods.
- Awareness of data governance concepts, including metadata management, ownership, stewardship, and documentation.
- Familiarity with data classification levels such as Public, Internal, Confidential, and Restricted.
- Understanding of data lineage and traceability practices.
- Knowledge of security best practices, including access controls, secrets management, and least-privilege access.
- Strong analytical thinking, problem-solving ability, and communication skills.
- Demonstrated willingness to learn new technologies and work collaboratively across teams.
Nice to Have
- Experience with Databricks, dbt, or comparable data transformation tools.
- Familiarity with CI/CD platforms such as GitHub Actions or Azure DevOps.
- Exposure to APIs, JSON, event-driven architectures, or messaging systems.
- Experience with vulnerability or secret scanning and secure software development practices.
- Knowledge of privacy regulations such as LGPD, GDPR, or equivalent frameworks.