About the Role

You will build and operate automated systems that extract public procurement data from European government sources, maintain data quality, and improve system reliability and performance over time.

Responsibilities

Develop and maintain web scraping pipelines for European public procurement portals
Ensure compliance with legal and ethical data collection standards
Optimize data extraction processes for speed and reliability
Monitor system performance and troubleshoot data inconsistencies
Collaborate with data engineers to integrate scraped data into databases
Adapt scrapers to handle website structure changes
Implement anti-detection techniques to avoid IP blocking
Maintain documentation for all scraping workflows
Scale infrastructure to support increasing data volume
Work autonomously with minimal supervision

Preferred Qualifications

Experience with procurement or government data systems
Background in data normalization and cleaning
Knowledge of EU language patterns and regional variations
Prior work with large-scale data monitoring
Familiarity with containerization and orchestration tools
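
To give a flavour of the normalization work mentioned above, here is a minimal sketch of parsing contract values published with different regional number formats. The function name `parse_amount` and the grouping heuristic are illustrative assumptions, not a description of our production code.

```python
import re
from decimal import Decimal


def parse_amount(raw: str) -> Decimal:
    """Parse a monetary string such as '1.234.567,89 €' into a Decimal.

    Assumed heuristic (hypothetical, for illustration only):
    - if both '.' and ',' occur, the one appearing last is the decimal mark;
    - if only one kind occurs, it is a decimal mark only when it appears
      once and is not followed by exactly three digits.
    """
    # Drop currency symbols, spaces, apostrophes and any other noise.
    cleaned = re.sub(r"[^\d.,-]", "", raw)
    last_dot = cleaned.rfind(".")
    last_comma = cleaned.rfind(",")

    decimal_sep = None
    if last_dot != -1 and last_comma != -1:
        decimal_sep = "." if last_dot > last_comma else ","
    elif last_dot != -1 or last_comma != -1:
        sep = "." if last_dot != -1 else ","
        digits_after = len(cleaned) - cleaned.rfind(sep) - 1
        if cleaned.count(sep) == 1 and digits_after != 3:
            decimal_sep = sep

    if decimal_sep is None:
        # Every separator is a grouping mark.
        normalized = cleaned.replace(".", "").replace(",", "")
    else:
        grouping = "," if decimal_sep == "." else "."
        normalized = cleaned.replace(grouping, "").replace(decimal_sep, ".")
    return Decimal(normalized)
```

A value such as "12.500" is ambiguous by nature; the sketch treats it as twelve thousand five hundred, which is the kind of judgment call this role makes per source.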

Compensation

Competitive salary with performance-based bonuses

Work Arrangement

Fully remote with flexible hours

Team

Small, agile team focused on data extraction and analysis

Technology Stack

We primarily use Python with libraries such as Scrapy, Selenium, and Playwright
Data is stored in PostgreSQL and processed using Pandas
Infrastructure runs on AWS with Docker and Kubernetes for orchestration
Monitoring is handled through Prometheus and Grafana

Data Challenges

We deal with highly variable website formats across 30+ European jurisdictions
Many sites use JavaScript-heavy frontends requiring headless browser solutions
Frequent layout changes require resilient selector strategies
Some portals implement CAPTCHAs or login walls
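
As an illustration of what a resilient selector strategy can look like, the sketch below tries an ordered list of fallback XPath expressions and returns the first non-empty match. It uses only the Python standard library so that it runs as-is; the helper `first_match`, the selector list, and the sample markup are hypothetical, and real portal HTML would normally be parsed with a tolerant HTML parser rather than `xml.etree`.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence


def first_match(root: ET.Element, xpaths: Sequence[str]) -> Optional[str]:
    """Return the text of the first XPath that yields non-empty content."""
    for xpath in xpaths:
        node = root.find(xpath)
        if node is not None:
            text = "".join(node.itertext()).strip()
            if text:
                return text
    return None


# Ordered from most specific to most generic (hypothetical selectors).
TITLE_XPATHS = [
    ".//h1[@id='notice-title']",
    ".//h1[@class='title']",
    ".//h1",
]

# Two made-up versions of the same notice page, before and after a redesign.
OLD_LAYOUT = "<html><body><h1 id='notice-title'>Road maintenance services</h1></body></html>"
NEW_LAYOUT = "<html><body><div><h1 class='title'>Road maintenance services</h1></div></body></html>"

old_title = first_match(ET.fromstring(OLD_LAYOUT), TITLE_XPATHS)
new_title = first_match(ET.fromstring(NEW_LAYOUT), TITLE_XPATHS)
```

Both layouts yield the same title, and a page matching none of the selectors returns `None`, which a pipeline can surface as an alert instead of silently storing empty fields.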

Not applicable — fully remote role open globally

Nesso Labs is hiring a Web Scraping Engineer — European Public Procurement