ATPCO is hiring a Principal Data Engineer to build and optimize data pipelines, manage storage and processing systems, and ensure the availability, scalability, and reliability of our data platform. In this role, you will collaborate closely with cross-functional teams to understand data requirements and deliver efficient, high-quality solutions.
What You'll Do
- Partner with data scientists, analysts, and stakeholders to translate business and ML/AI use cases into scalable data architectures, including designing modern schemas, data models, and pipelines.
- Design, develop, and maintain scalable and efficient data pipelines and ETL processes to ingest, process, and transform large volumes of data from various sources.
- Build and optimize data storage and processing systems, including data warehouses, data lakes, and big data platforms, using AWS services.
- Implement and manage real-time data streaming architectures using technologies such as Amazon Kinesis or Apache Kafka.
- Ensure solutions enable secure, efficient, real-time data analysis and reporting, applying infrastructure-as-code and best practices for automation, monitoring, cost optimization, and compliance.
- Perform data profiling, data cleansing, and data transformation tasks to prepare data for analysis and reporting.
- Implement data security and privacy measures to protect sensitive data using AWS security services.
- Design and implement data architectures following Data Mesh principles within the AWS environment.
- Provide technical guidance and mentorship to junior data engineers, reviewing their work and ensuring adherence to best practices.
What We're Looking For
- Strong programming skills in languages like Python, Java, or Scala, with experience in data manipulation and transformation frameworks.
- Proven experience as a data engineer, including designing and building large-scale data processing systems.
- Strong understanding of data modeling concepts and data management principles.
- In-depth knowledge of SQL and experience working with relational and non-relational databases.
- Knowledge of Data Mesh principles and experience designing and implementing data architectures following these concepts within the AWS ecosystem.
- Experience with real-time data streaming architectures built on Amazon Kinesis or Apache Kafka.
- Familiarity with AWS cloud services, such as Amazon SageMaker Unified Studio, AWS Glue, AWS Lambda, Amazon EMR, Amazon S3, and Amazon Redshift, and their data-related features and capabilities.
- Familiarity with AWS security services and features for data security and privacy.
- Bachelor's or Master's degree in Computer Science, Information Systems, or a related field.
Nice to Have
- Experience designing, implementing, and managing scalable machine learning pipelines and MLOps frameworks for production AI/ML solutions in cloud environments (e.g., AWS, Azure, GCP), including model deployment, monitoring, and operational automation.
Technical Stack
- Languages: Python, Java, Scala, SQL
- AWS Services: Amazon SageMaker Unified Studio, AWS Glue, AWS Lambda, Amazon EMR, Amazon S3, Amazon Redshift, Amazon Kinesis
- Other Technologies: Apache Kafka
Team & Environment
You will collaborate with cross-functional teams, including data scientists, software engineers, and business stakeholders.
Benefits & Compensation
- Compensation Range: $145,290 – $185,000
- Remote-First Culture
- “Leave Your Way” PTO
- 401(k) with Generous Employer Match
- Comprehensive Benefits (medical, dental, vision, & mental health)
- Global Tuition and Gym Reimbursement
- Standby Flight Program
Work Mode
This is a global position.
We consider qualified applicants for employment without regard to race, gender, age, color, religion, national origin, citizenship status, marital status, disability, sexual orientation, protected military/veteran status, gender identity or expression, genetic information, medical condition, or any other legally protected factor.