Responsibilities
- Infrastructure Strategy & Architecture: Architect, build, and maintain the core infrastructure for a large-scale asynchronous data extraction system.
- Advanced Resilience Engineering: Design, implement, and continuously optimize sophisticated anti-blocking strategies, IP rotation, fingerprint management, and anti-bot bypass techniques to ensure high reliability and consistent uptime against modern web blocking.
- Operational Excellence & Monitoring: Implement robust monitoring, alerting, and logging systems to proactively debug, troubleshoot, and continuously improve scraper performance, reliability, and data quality across the platform.
- Core Development: Develop, test, and deploy highly robust and fault-tolerant web scraping components using advanced Python tools (Scrapy, Playwright, Selenium, Requests, etc.).
- Integration & Pipelines: Manage and automate high-volume data ingestion pipelines and seamless integrations with internal and external REST APIs.
- DevOps & Automation: Drive DevOps best practices, including managing infrastructure with Docker and maintaining CI/CD pipelines (Nomad knowledge is a plus).
- Collaboration & Mentorship: Partner with other engineers to set standards, enhance core infrastructure tooling, and mentor junior team members.
Requirements
- Proven, hands-on professional experience in high-volume web scraping and data extraction using Python.
- Deep, practical knowledge of anti-bot solutions, including CAPTCHA solving, browser fingerprinting, and effective proxy/IP management strategies.
- Solid understanding of HTML parsing, browser automation techniques, and asynchronous programming.
- Proficiency with leading web scraping frameworks (e.g., Playwright, Scrapy, or Selenium).
- Strong knowledge of REST APIs, HTTP protocols, and effective proxy management.
- Familiarity with both SQL and NoSQL databases for efficient data storage and processing.
- Experience with Docker, Linux environments, and version control (Git).
- Fluent in English (written and spoken).
- Self-driven, pragmatic, and capable of taking full ownership of critical, high-impact infrastructure projects.
Nice to Have
- Experience with Python async libraries (e.g., asyncio).
- Understanding of data quality validation and pipeline monitoring tools.
Benefits
- Impact & Ownership: A high degree of freedom and the opportunity to have a meaningful, measurable impact on a growing scale-up business.
- Flexibility: Our client is a remote-first company and actively supports remote work.
- Growth: A competitive compensation package and dedicated support for your personal & professional development (ongoing training & coaching).
- Team & Atmosphere: A great work atmosphere within a small, talented, and international team.
- Office (Optional): A modern office located on the campus of Wildau Tech University, easily accessible by public transport (just outside Berlin).