About the Role
The role involves building and optimizing data extraction workflows, ensuring data accuracy and reliability, and supporting analytics initiatives through robust pipeline architecture.
Responsibilities
- Develop and manage automated web scraping frameworks for diverse online sources
- Design scalable ETL pipelines to process unstructured and semi-structured data
- Ensure data integrity and consistency across ingestion and transformation stages
- Monitor and troubleshoot data workflows for performance and reliability
- Collaborate with data analysts and scientists to understand data requirements
- Optimize data storage solutions for efficient querying and access
- Implement error handling and retry mechanisms in data collection systems (see the sketch after this list)
- Maintain documentation for data pipelines and source configurations
- Evaluate new data sources for integration potential
- Apply data validation techniques to ensure quality standards
- Support compliance with website terms of service and data usage policies
- Work with security teams to ensure ethical data collection practices
- Improve data processing efficiency through automation and tooling
- Respond to data quality incidents with root cause analysis
- Participate in code reviews and system design discussions
- Integrate third-party APIs into existing data workflows
- Scale infrastructure to handle increasing data volume and velocity
- Use version control for pipeline development and deployment
- Stay current with changes in website structures and anti-bot measures
- Contribute to data governance and metadata management practices
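To give candidates a concrete sense of the "error handling and retry mechanisms" responsibility above, here is a minimal illustrative sketch in Python, not our production code: transient network failures and 5xx responses are retried with exponential backoff, while 4xx client errors surface immediately. The function name, timeout, and backoff parameters are all hypothetical.

```python
import logging
import time

import requests

logger = logging.getLogger(__name__)

def fetch_with_retries(url: str, max_attempts: int = 4, base_delay: float = 1.0) -> requests.Response:
    """Fetch a URL, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            response = requests.get(url, timeout=10)
            if response.status_code < 500:
                # 4xx client errors are not retried: raise_for_status()
                # surfaces them immediately; success returns the response.
                response.raise_for_status()
                return response
            error = requests.HTTPError(f"server error {response.status_code}")
        except (requests.ConnectionError, requests.Timeout) as exc:
            error = exc
        if attempt == max_attempts:
            raise error
        delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
        logger.warning("attempt %d for %s failed (%s); retrying in %.1fs",
                       attempt, url, error, delay)
        time.sleep(delay)
```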
Nice to Have
- Master’s degree in a technical discipline
- Experience with large-scale distributed data processing tools like Spark
- Background in natural language processing or text extraction
- Knowledge of browser automation tools such as Puppeteer or Selenium (an illustrative example follows this list)
- Experience with proxy rotation and IP management for scraping
- Familiarity with CAPTCHA-solving techniques and tools
- Contributions to open-source data engineering projects
- Published work or projects involving public web data analysis
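For the browser automation item above, a minimal Selenium sketch might look like the following, assuming Selenium 4+ with Chrome installed. The target URL and CSS selector are placeholders for illustration only; the point is waiting explicitly for JavaScript-rendered content rather than sleeping blindly.

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("--headless=new")  # run without a visible browser window

driver = webdriver.Chrome(options=options)
try:
    # Hypothetical target URL and selector, for illustration only.
    driver.get("https://example.com/listings")
    # Wait up to 10s for the JavaScript-rendered elements to appear.
    items = WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, "div.listing"))
    )
    for item in items:
        print(item.text)
finally:
    driver.quit()  # always release the browser session
```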
Compensation
Competitive salary with performance-based bonuses
Work Arrangement
Hybrid remote with office availability in major cities
Team
Collaborative data engineering team within a growing technology division
Technology Stack
- Primary languages: Python, SQL
- Frameworks: Scrapy, BeautifulSoup, Selenium
- Cloud: AWS (S3, EC2, Lambda, CloudWatch)
- Orchestration: Apache Airflow (see the pipeline sketch after this list)
- Databases: PostgreSQL, MongoDB
- Containerization: Docker, Kubernetes
- Monitoring: Prometheus, Grafana
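As a sense of how these pieces fit together, here is a minimal, hypothetical Airflow DAG (assuming Airflow 2.4+ for the `schedule` argument) wiring a scrape step into validation and loading. The task bodies are stubs; a real pipeline would call the team's scraping and warehouse modules.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder callables; real pipelines would invoke the scraping,
# validation, and loading modules here.
def scrape(**context):
    print("extract raw pages")

def validate(**context):
    print("apply data quality checks")

def load(**context):
    print("write to PostgreSQL / S3")

with DAG(
    dag_id="example_scrape_pipeline",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    scrape_task = PythonOperator(task_id="scrape", python_callable=scrape)
    validate_task = PythonOperator(task_id="validate", python_callable=validate)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Declare ordering: a failed scrape blocks validation and loading.
    scrape_task >> validate_task >> load_task
```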
Data Ethics Policy
- All data collection must comply with website terms of service
- Respect for robots.txt and crawl-delay directives is mandatory (see the check sketched after this list)
- No personal data collection without explicit consent
- Regular audits of data sources for compliance
- Transparency in data usage and retention practices
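To make the robots.txt requirement concrete, Python's standard library can express the check directly. The user agent string below is hypothetical, and a production crawler would cache the parsed file per host rather than re-fetching it for every URL.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleDataBot"  # hypothetical crawler name

def allowed_to_fetch(url: str):
    """Return (allowed, crawl_delay_seconds) for a URL per robots.txt."""
    parts = urlparse(url)
    parser = RobotFileParser()
    parser.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    parser.read()  # fetch and parse the site's robots.txt
    allowed = parser.can_fetch(USER_AGENT, url)
    # Honor an explicit Crawl-delay; fall back to 1s when unspecified.
    delay = parser.crawl_delay(USER_AGENT) or 1.0
    return allowed, delay

allowed, delay = allowed_to_fetch("https://example.com/products")
if allowed:
    print(f"fetch permitted; wait at least {delay}s between requests")
else:
    print("robots.txt disallows this URL for our user agent")
```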