Responsibilities

Pipeline Development & Data Integration
Build, maintain, and optimize ETL/ELT pipelines using Python, SQL, or Scala
Orchestrate workflows using Airflow, Prefect, Dagster, or similar orchestration tools
Ingest structured and unstructured data from APIs, SaaS platforms, databases, files, and streaming systems
Develop scalable connectors and automated ingestion workflows
Data Warehousing & Modeling
Manage and optimize cloud data warehouses such as Snowflake, BigQuery, or Redshift
Design scalable schemas using star and snowflake modeling techniques
Implement partitioning, clustering, indexing, and performance optimization strategies
Build clean, analytics-ready datasets for business intelligence and reporting use cases
Data Quality, Governance & Reliability
Implement validation checks, anomaly detection, logging, and monitoring to ensure data integrity
Enforce naming conventions, lineage tracking, documentation standards, and data tests using tools such as dbt or Great Expectations
Maintain audit-ready data processes and ensure compliance with GDPR, HIPAA, or other industry-specific requirements
Monitor pipeline health and proactively resolve failures or inconsistencies
Streaming & Real-Time Data Processing
Build and manage real-time data pipelines using Kafka, Kinesis, Pub/Sub, or similar platforms
Support low-latency ingestion and event-driven architectures for time-sensitive applications
Monitor streaming infrastructure and optimize throughput and reliability
Collaboration & Analytics Enablement
Partner closely with analysts, data scientists, and business stakeholders to deliver reliable datasets
Support dashboard and reporting initiatives in tools such as Tableau, Looker, or Power BI
Translate business requirements into scalable data solutions and models
Maintain clear technical documentation for pipelines, schemas, and workflows
Infrastructure, DevOps & Automation
Containerize data services using Docker and manage deployments through Kubernetes when applicable
Automate deployments using CI/CD pipelines such as GitHub Actions, Jenkins, or GitLab CI
Manage cloud infrastructure using Terraform, CloudFormation, or similar Infrastructure-as-Code tools
Continuously optimize performance, scalability, reliability, and cloud costs

Requirements

3+ years of experience in Data Engineering, Back-End Engineering, or Data Infrastructure roles
Strong proficiency in Python and SQL
Experience with at least one modern data warehouse (Snowflake, Redshift, BigQuery)
Hands-on experience with orchestration tools such as Airflow or Prefect
Strong understanding of ETL/ELT pipelines, data modeling, and data transformation workflows
Familiarity with cloud platforms such as AWS, GCP, or Azure

Nice to Have

Experience with dbt for data modeling and transformation management
Streaming and event-driven data pipeline experience (Kafka, Kinesis, Pub/Sub)
Experience with cloud-native data services such as AWS Glue, GCP Dataflow, or Azure Data Factory
Familiarity with Docker, Kubernetes, Terraform, or CI/CD workflows
Background in regulated or compliance-heavy industries such as healthcare, fintech, or enterprise SaaS
Experience optimizing warehouse costs and query performance at scale

Additional Information

Working hours: U.S. client business hours, with flexibility for pipeline monitoring, deployments, and data refresh cycles

Pavago is hiring a Data Engineer