What You’ll Do
Data Pipeline Development & Management:
Design, implement, and maintain scalable and reliable data pipelines to ingest, transform and load structured, unstructured, and real‑time data feeds from diverse sources.
Manage data pipelines for analytics and operational use, ensuring data integrity, timeliness, and accuracy across systems.
Implement data quality tools and validation frameworks within transformation pipelines.
Data Processing & Optimization:
Build efficient high‑performance systems by leveraging techniques such as data denormalization, partitioning, caching and parallel processing.
Develop stream‑processing applications using Apache Kafka and optimize performance for large‑scale datasets.
Enable data enrichment and correlation across primary, secondary and tertiary sources.
Cloud Infrastructure and Platform Engineering:
Develop and deploy data workflows on AWS or GCP using services such as S3, Redshift, Pub/Sub or BigQuery.
Containerize data processing tasks using Docker, orchestrate with Kubernetes and ensure production‑grade deployment.
Collaborate with platform teams to ensure scalability, resilience and observability of data pipelines.
Database Engineering:
Write and optimize complex SQL queries on relational (Redshift, PostgreSQL) and NoSQL (MongoDB) databases.
Work with the ELK stack (Elasticsearch, Logstash, Kibana) for search, logging and real‑time analytics.
Support Lakehouse architectures and hybrid data storage models for unified access and processing.
Data Governance & Stewardship:
Implement robust data governance, access control and stewardship policies aligned with compliance and security best practices.
Establish metadata management, data lineage and auditability across pipelines and environments.
Machine Learning & Advanced Analytics Enablement:
Collaborate with data scientists to prepare and serve features for ML models.
Maintain awareness of ML pipeline integration and ensure data readiness for experimentation and deployment.
Documentation & Continuous Improvement:
Maintain thorough documentation, including technical specifications, data flow diagrams and operational procedures.
Continuously evaluate and improve the data engineering stack by adopting new technologies and automation strategies.
What You’ll Bring
- 8+ years of experience in data engineering within a production environment.
- Advanced knowledge of Python and Linux shell scripting for data manipulation and automation.
- Strong expertise in SQL/NoSQL databases such as PostgreSQL and MongoDB.
- Experience building stream processing systems using Apache Kafka.
- Proficiency with Docker and Kubernetes in deploying containerized data workflows.
- Good understanding of cloud services (AWS or Azure).
- Hands‑on experience with the ELK stack (Elasticsearch, Logstash, Kibana) for scalable search and logging.
- Familiarity with AI models supporting data management.
- Experience working with Lakehouse systems, data denormalization, and data labeling practices.
- Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or a related field.
- Demonstrated success in designing, scaling, and operating data systems in cloud‑native and distributed environments.
- Proven ability to work collaboratively with cross‑functional teams including product managers, data scientists, and DevOps.
Preferred Experience
- Working knowledge of data quality tools, lineage tracking, and data observability solutions.
- Experience in data correlation, enrichment from external sources, and managing data integrity at scale.
- Understanding of data governance frameworks and enterprise compliance protocols.
- Exposure to CI/CD pipelines for data deployments and infrastructure‑as‑code.