Remote (Global) Full-time

People Data Labs is hiring a Data Engineer

About the Role

People Data Labs is looking for a Data Engineer to build the core infrastructure for ingesting, transforming, and loading exponentially increasing data volumes. You will architect systems to solve complex data problems at massive scale within our data engineering team.

What You'll Do

  • Build infrastructure for ingestion, transformation, and loading of an exponentially increasing volume of data from a variety of sources using Spark, SQL, AWS, and Databricks
  • Build an organic entity resolution framework capable of correctly merging hundreds of billions of individual entities into clean, consumable datasets
  • Develop CI/CD pipelines and anomaly detection systems capable of continuously improving the quality of data we're pushing into production
  • Dream up solutions to largely novel data engineering and data science problems

What We're Looking For

  • 4-6+ years of industry experience with clear examples of strategic technical problem-solving and implementation
  • Strong software development fundamentals
  • Experience with Python
  • Expertise with Apache Spark (Java, Scala, and/or Python-based)
  • Experience with SQL
  • Experience building scalable data processing systems (e.g., cleaning, transformation) from the ground up
  • Experience with developer-oriented data pipeline and workflow orchestration tools (e.g., Airflow (preferred), dbt, Dagster, or similar)
  • Knowledge of modern data design and storage patterns (e.g., incremental updating, partitioning and segmentation, rebuilds and backfills)
  • Experience working in Databricks (including Delta Live Tables, data lakehouse patterns, etc.)
  • Experience with cloud computing services (AWS (preferred), GCP, Azure, or similar)
  • Experience with data warehousing (e.g., Databricks, Snowflake, Redshift, BigQuery, or similar)
  • Understanding of modern data storage formats and tools (e.g., Parquet, ORC, Avro, Delta Lake)
  • Ability to balance high ownership and autonomy with strong collaboration
  • Ability to work effectively remotely (proactively managing blockers, reaching out and asking questions, and participating in team activities)
  • Strong written communication skills on Slack/chat and in documents
  • Experience writing data design docs (pipeline design, dataflow, schema design)
  • Ability to scope and break down projects, and to communicate progress and blockers effectively to your manager, team, and stakeholders

Nice to Have

  • Degree in a quantitative discipline such as computer science, mathematics, statistics, or engineering
  • Experience working with entity data (entity resolution / record linkage)
  • Experience working with data acquisition / data integration
  • Expertise with Python and the Python data stack (e.g., NumPy, pandas)
  • Experience with streaming platforms (e.g., Kafka)
  • Experience evaluating data quality and maintaining consistently high data standards across new feature releases (e.g., consistency, accuracy, validity, completeness)

Technical Stack

  • Apache Spark, SQL, Python
  • AWS, Databricks, Delta Lake
  • Airflow, dbt, Dagster
  • Kafka, NumPy, pandas

Team & Environment

You will join our Data Engineering Team, collaborating with colleagues who balance extreme ownership with a 'one-team, one-dream' mindset.

Benefits & Compensation

  • Compensation range: $160K-$180K
  • Stock
  • Competitive salaries
  • Unlimited paid time off
  • Medical, dental, & vision insurance
  • Health, fitness, and office stipends
  • The permanent ability to work wherever and however you want

Work Mode

This is a fully remote position.

People Data Labs does not discriminate on the basis of race, sex, color, religion, age, national origin, marital status, disability, veteran status, genetic information, sexual orientation, gender identity, or any other reason prohibited by law in the provision of employment opportunities and benefits.

Required Skills
Spark, SQL, AWS, Databricks, Python, Apache Spark, Airflow, dbt, Dagster, Delta Lake, Data Engineering, Data Infrastructure, ETL, Data Modeling, Scalable Systems
About company
People Data Labs

People Data Labs (PDL) is the provider of people and company data. We do the heavy lifting of data collection and standardization so our customers can focus on building and scaling innovative, compliant data solutions. Our sole focus is on building the best data available by integrating thousands of compliantly sourced datasets into a single, developer-friendly source of truth.

Job Details
Category: Data
Posted 4 months ago