Build and manage scalable data infrastructure that powers real-time insights for a US client. In this role, you'll focus on engineering robust streaming pipelines that handle high-volume machine log data, ensuring reliability, performance, and accuracy across distributed systems.
What You'll Do
- Develop and maintain real-time data processing workflows using Spark Streaming or Azure Functions
- Consume and transform machine-generated logs from Kafka topics
- Optimize data transformation logic to support efficient downstream usage
- Push processed results into Redis for low-latency access
- Archive structured data into a data lake for analytics and reporting
- Write clean, maintainable Python code and contribute to data processing frameworks
- Participate in code reviews, CI/CD implementation, and version control workflows
- Work closely with analysts and engineers to refine data requirements and deliver solutions
Requirements
- Proven experience building and managing real-time data pipelines
- Hands-on expertise with Spark Streaming or Azure Functions
- Solid background in processing log data using Kafka
- Strong Python programming skills applied to data transformation
- Experience writing data to Redis and data lake storage systems
- Understanding of software development best practices, including source control and automated pipelines
- Ability to collaborate effectively with distributed teams across functions
Technical Environment
Work with a modern cloud-native stack: Spark Streaming, Azure Functions, Kafka, Redis, a data lake, Python, and CI/CD tooling.
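As a rough illustration of how these pieces typically fit together (not the client's actual implementation), the hypothetical PySpark Structured Streaming sketch below consumes machine logs from a Kafka topic, parses them, pushes the latest readings to Redis for low-latency access, and archives each micro-batch to a data lake in Parquet. The topic name, broker and Redis hosts, log schema, and storage path are all assumptions for illustration; the job also assumes the Spark Kafka connector package and the `redis` Python client are available.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType
import redis

spark = SparkSession.builder.appName("machine-log-pipeline").getOrCreate()

# Hypothetical schema for machine-generated log events.
log_schema = StructType([
    StructField("machine_id", StringType()),
    StructField("metric", StringType()),
    StructField("value", DoubleType()),
    StructField("ts", TimestampType()),
])

# Consume raw log events from a Kafka topic (names assumed for illustration).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "machine-logs")
       .load())

# Kafka delivers bytes; cast to string and parse JSON into typed columns.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), log_schema).alias("e"))
          .select("e.*"))

def write_batch(batch_df, batch_id):
    """Fan each micro-batch out to both sinks: Redis and the data lake."""
    r = redis.Redis(host="redis-host", port=6379)
    # Fine for small batches; at scale, prefer foreachPartition on executors.
    for row in batch_df.collect():
        r.hset(f"machine:{row.machine_id}", row.metric, row.value)
    # Append the structured batch to the data lake (path assumed).
    batch_df.write.mode("append").parquet("/datalake/machine-logs/")

query = (events.writeStream
         .foreachBatch(write_batch)
         .option("checkpointLocation", "/tmp/checkpoints/machine-logs")
         .start())

query.awaitTermination()
```

Using `foreachBatch` lets a single streaming query feed both the low-latency Redis store and the analytical data lake from the same micro-batch, while the checkpoint location preserves exactly-once bookkeeping across restarts.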
Work Mode
This is a remote position. Candidates must be based in Vietnam and available for shift work that aligns with US client hours.