Senior Software Engineer, Data Processing

Job in New City, Rockland County, New York, 10956, USA
Listing for: Protege
Full Time position
Listed on 2026-06-17
Job specializations:
  • Software Development
  • Database Engineering
Salary/Wage Range or Industry Benchmark: 125,000 - 150,000 USD yearly
Job Description & How to Apply Below

About the Role

Protege is hiring a Senior Software Engineer to own the data processing layer at ingestion — the part of the platform that takes large-scale source data and turns it into clean, structured, enriched, validated, AI‑ready datasets. This is a hands‑on, backend‑ and data‑heavy role with end‑to‑end ownership of the pipelines that move and process data at volume.

Protege connects organizations that hold high‑value data with the AI builders who need it. The value of that exchange depends on what happens at ingestion: raw, varied, high‑volume source data has to be processed reliably, securely, and at scale before it's useful to anyone.

You’ll work across imaging, audio, video, and other data modalities, spanning healthcare, media, and other industries and data partners. You’ll partner closely with product, Data Lab, and partner engineering teams to build robust ingestion and processing systems for structured and unstructured data at massive scale, from millions to billions of records, files, and other source objects. This role is ideal for engineers who are energized by messy data at scale, want deep ownership of critical infrastructure, and like turning ambiguity into reliable systems.

What You’ll Do
Ingestion & Processing Systems
  • Design, build, and operate the ingestion systems that process large volumes of multimodal data into usable, well‑structured datasets.

  • Own the ingestion path end to end, from how data lands to how it is validated, processed, tracked, and made available downstream.

  • Build modality‑specific processing steps for real‑world source data, such as medical imaging processing, audio and video metadata extraction, quality validation, and notes processing.

  • Build parsers, validators, and normalization logic that can systematically handle messy, non‑standard, and high‑variance source formats.

  • Turn repeated one‑off data handling work into reusable processing patterns, internal tooling, and platform capabilities.

Scale, Performance & Reliability
  • Build for high volume and high throughput, optimizing systems for reliability, cost, and speed.

  • Work across distributed and parallel compute systems to process workloads that do not fit well on a single machine.

  • Choose the right execution model for the workload, including batch processing, distributed execution, and modern compute patterns for unstructured data and inference‑heavy processing.

  • Diagnose and resolve bottlenecks across ingestion and processing systems, and keep performance from degrading as volume and modality complexity grow.

Data Quality, Security & Compliance
  • Build validation and quality checks that catch bad, incomplete, or malformed data before it propagates downstream.

  • Handle sensitive and regulated data, including PHI, with the security and care the domain demands, including de‑identification where required.

  • Track provenance, metadata, and usage constraints through the ingestion path so downstream use remains compliant and auditable.

  • Raise the quality bar for observability, debuggability, and operational reliability across the ingestion layer.

Cross‑Functional Partnership
  • Partner with product and Data Lab to support new modalities, new partner requirements, and non‑standard source data.

  • Work directly with partner engineering teams when needed to translate source‑system realities into robust ingestion and processing design.

  • Surface recurring patterns that are worth standardizing into reusable transforms, validators, and internal tooling.

  • Help shape how Protege handles new data types as the platform expands into more complex data environments.

What Success Looks Like
30 days: Ramp
  • Get productive in the codebase and ship your first improvements to existing pipelines.

  • Build a working map of the ingestion and processing stack, the major data flows, and how we handle each modality.

  • Meet the engineering, product, and Data Lab teams to understand how the function operates across the company.

60 days: Take Ownership
  • Own a processing pipeline or modality end to end, from ingestion through delivery of AI‑ready output.

  • Develop depth in how we handle one or two data types at scale.

  • Start raising the bar on data quality, observability, and…

Position Requirements
10+ years of work experience