About the role
The role involves building and managing automated systems to extract public procurement data from various European government sources, maintaining data quality, and improving system reliability and performance over time.
Responsibilities
- Develop and maintain web scraping pipelines for European public procurement portals
- Ensure compliance with legal and ethical data collection standards
- Optimize data extraction processes for speed and reliability
- Monitor system performance and troubleshoot data inconsistencies
- Collaborate with data engineers to integrate scraped data into databases
- Adapt scrapers to handle website structure changes
- Implement anti-detection techniques to avoid IP blocking
- Maintain documentation for all scraping workflows
- Scale infrastructure to support increasing data volume
- Work with minimal supervision in an autonomous environment
Preferred qualifications
- Experience with procurement or government data systems
- Background in data normalization and cleaning
- Knowledge of EU language patterns and regional variations
- Prior work with large-scale data monitoring
- Familiarity with containerization and orchestration tools
Compensation
Competitive salary with performance-based bonuses
Work arrangement
Fully remote with flexible hours
Team
Small, agile team focused on data extraction and analysis
Technology Stack
- We primarily use Python with libraries such as Scrapy, Selenium, and Playwright
- Data is stored in PostgreSQL and processed using Pandas
- Infrastructure runs on AWS with Docker and Kubernetes for orchestration
- Monitoring is handled through Prometheus and Grafana
Data Challenges
- We deal with highly variable website formats across 30+ European jurisdictions
- Many sites use JavaScript-heavy frontends requiring headless browser solutions
- Frequent layout changes require resilient selector strategies
- Some portals implement CAPTCHAs or login walls
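As a rough illustration of what a "resilient selector strategy" can look like in practice, the sketch below tries an ordered chain of selectors and returns the first non-empty match. It is a minimal example under stated assumptions, not a description of the team's actual pipeline: the selector paths, the `first_match` helper, and the sample markup are all hypothetical, and it uses the standard library's `xml.etree.ElementTree` on well-formed markup rather than Scrapy or Playwright selectors so that it runs without extra dependencies.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence

# Hypothetical selector chain for a tender title, ordered from most to
# least specific. Real portals would each need their own chain.
TITLE_SELECTORS = (
    ".//h1[@class='tender-title']",
    ".//div[@id='notice']/h2",
    ".//h1",
)


def first_match(root: ET.Element, selectors: Sequence[str]) -> Optional[str]:
    """Return the stripped text of the first selector that matches, else None."""
    for selector in selectors:
        node = root.find(selector)
        if node is not None:
            text = "".join(node.itertext()).strip()
            if text:
                return text
    return None


# Illustrative markup (an assumption): the second layout dropped the
# title class, so the first selector misses and the second one is used.
OLD_LAYOUT = "<html><body><h1 class='tender-title'>Road maintenance 2024</h1></body></html>"
NEW_LAYOUT = "<html><body><div id='notice'><h2>Road maintenance 2024</h2></div></body></html>"

for markup in (OLD_LAYOUT, NEW_LAYOUT):
    print(first_match(ET.fromstring(markup), TITLE_SELECTORS))
```

Returning `None` when every selector misses, instead of raising, lets the caller log the failure and flag the portal for review, which fits the monitoring responsibilities listed above.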
Not applicable: fully remote role open globally