Jr. Data Engineer
Listed on 2026-02-16
IT/Tech
Data Engineer, Data Analyst
CODOXO IS NOT ABLE TO OFFER SPONSORSHIP OR TO ACCOMMODATE CANDIDATES WHO ARE CURRENTLY SPONSORED OR WILL REQUIRE SPONSORSHIP IN THE FUTURE
This is a full-time role with Codoxo, NOT C2C.
Of the $3.8T the United States spends on healthcare annually, about a third is estimated to be lost to waste, fraud, and abuse. Codoxo is the premier provider of artificial intelligence-driven solutions and services that help healthcare companies and agencies proactively detect and reduce risks from fraud, waste, and abuse and ensure payment integrity. Codoxo helps clients manage costs across network management, clinical care, provider coding and billing, payment integrity, and special investigation units.
Our software-as-a-service applications are built on our proven Forensic AI Engine, which uses patented AI-based technology to identify problems and suspicious behavior far faster and earlier than traditional techniques.
We are venture-backed by some of the top investors in the country, have strong financials, and remain one of the fastest-growing healthcare AI companies in the industry.
Position Summary
The Junior Data Engineer supports the design, development, and maintenance of scalable data pipelines that power analytics, reporting, and machine learning initiatives. Working under the guidance of senior engineers, this role contributes to building reliable ETL workflows, optimizing database performance, and integrating structured and unstructured data sources.
This position partners closely with data scientists, analysts, and cross-functional stakeholders to ensure timely, accurate, and secure data delivery. By strengthening foundational data infrastructure, the Junior Data Engineer helps advance analytics maturity, enable AI initiatives, and promote data-driven decision-making across the organization. The role consistently leverages AI tools to enhance productivity, code quality, and solution effectiveness.
Key Responsibilities
- Assist in designing, building, and maintaining scalable ETL/ELT data pipelines.
- Develop and optimize batch and streaming workflows using tools such as AWS Glue, Spark, and Airflow (a minimal sketch of such a job follows this list).
- Support data integration across multiple structured and unstructured data sources.
- Write clean, efficient, and maintainable code in Python and SQL.
- Monitor, troubleshoot, and improve pipeline reliability and performance.
- Optimize database performance, particularly in PostgreSQL and cloud-based environments.
- Maintain and support AWS-based infrastructure (EC2, S3, Glue, etc.).
- Implement data validation, quality checks, and monitoring processes.
- Ensure compliance with data governance, security, and regulatory standards.
- Collaborate with data scientists and analysts to translate data requirements into scalable engineering solutions.
- Document data flows, architecture decisions, and technical processes.
- Use AI-assisted development tools to improve speed, testing coverage, and code quality.
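To make the day-to-day work concrete, here is a minimal, hypothetical sketch of the kind of batch ETL job with a built-in quality gate described above; the S3 paths, column names, and 95% retention threshold are illustrative assumptions, not Codoxo specifics.

```python
# Hypothetical example only: paths, columns, and thresholds are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("claims_daily_load").getOrCreate()

# Extract: read raw claim records landed in S3 (bucket name is illustrative).
raw = spark.read.parquet("s3://example-raw-bucket/claims/2026-02-16/")

# Transform: normalize types and drop rows missing a primary identifier.
clean = (
    raw.withColumn("billed_amount", F.col("billed_amount").cast("decimal(12,2)"))
       .filter(F.col("claim_id").isNotNull())
)

# Quality gate: fail the run rather than load a suspiciously thin partition.
total, kept = raw.count(), clean.count()
if total == 0 or kept / total < 0.95:  # retention threshold is an assumed policy
    raise ValueError(f"Quality check failed: kept {kept} of {total} rows")

# Load: write the curated partition consumed by analysts and downstream models.
clean.write.mode("overwrite").parquet("s3://example-curated-bucket/claims/2026-02-16/")
spark.stop()
```

In production, a job like this would typically run under an orchestrator; a corresponding Airflow sketch appears at the end of this posting.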
Required Qualifications
- Bachelor’s degree in Computer Science, Data Engineering, Information Systems, or a related technical field (or equivalent practical experience).
- 0–2 years of experience in data engineering, software engineering, or related technical roles (internships included).
- Proficiency in Python, PySpark, and SQL.
- Familiarity with ETL/ELT concepts and data pipeline architecture.
- Experience working with relational databases such as PostgreSQL (a brief tuning sketch follows this list).
- Basic understanding of cloud computing concepts, preferably AWS.
- Exposure to distributed data processing frameworks such as Spark.
- Experience working in Linux environments, including basic shell scripting.
- Strong analytical and problem-solving skills.
- Ability to work collaboratively in a team environment under mentorship.
- Strong written and verbal communication skills.
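As a rough illustration of the PostgreSQL work referenced in the list above, the following hypothetical sketch inspects a slow query's plan and then adds an index; the table, column, and connection string are assumptions made for the example.

```python
# Hypothetical example only: table, column, and DSN are assumptions.
import psycopg2

conn = psycopg2.connect("dbname=analytics user=etl_user")  # assumed connection string
with conn, conn.cursor() as cur:
    # Inspect the plan for a slow lookup before changing anything.
    cur.execute(
        "EXPLAIN ANALYZE SELECT * FROM claims WHERE provider_id = %s", ("P123",)
    )
    for (plan_line,) in cur.fetchall():
        print(plan_line)

    # If the plan shows a sequential scan, indexing the filter column is a
    # common first remedy (run once; not something to repeat in hot paths).
    cur.execute("CREATE INDEX IF NOT EXISTS idx_claims_provider ON claims (provider_id)")
conn.close()
```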
Preferred Qualifications
- Experience working with medical claims data (strongly preferred).
- Hands-on experience with AWS services such as EC2, S3, Glue, and IAM.
- Experience with workflow orchestration tools such as Apache Airflow (a brief DAG sketch follows this list).
- Exposure to data warehousing concepts and dimensional modeling.
- Familiarity with CI/CD pipelines and version control (e.g., Git).
- Understanding of data security, governance, and compliance best practices.
- Experience supporting…
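Finally, a hedged sketch of how Apache Airflow might orchestrate the daily pipeline sketched earlier; the DAG id, schedule, and placeholder callables are assumptions (the `schedule` argument requires Airflow 2.4+; earlier versions use `schedule_interval`).

```python
# Hypothetical example only: DAG id, schedule, and callables are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_and_load():
    # Placeholder for triggering the Spark/Glue job sketched earlier.
    print("running daily claims load")

def run_quality_checks():
    print("validating row counts and null rates")

with DAG(
    dag_id="claims_daily_pipeline",
    start_date=datetime(2026, 1, 1),
    schedule="@daily",   # Airflow 2.4+ spelling; older versions use schedule_interval
    catchup=False,
) as dag:
    load = PythonOperator(task_id="extract_and_load", python_callable=extract_and_load)
    checks = PythonOperator(task_id="quality_checks", python_callable=run_quality_checks)
    load >> checks  # quality checks run only after a successful load
```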