Software Engineer ML and Data Skills Poland + Remote – PLN NET B2B Softwa
Rochester, Monroe County, New York, 14602, USA
Listed on 2026-06-27
IT/Tech
AI Engineer (Applied/Software), Machine Learning/ ML Engineer, Data Scientist
Software Engineer with ML and Data Skills
Virtus Lab is a leading European software consulting and engineering company. Our mission is to craft clean code and practical solutions with precision and purpose. We foster a dynamic culture rooted in strong engineering, a sense of ownership, and transparency, empowering professionals to make a substantial impact in the software industry.
About the role
Productionizing and scaling an ML-driven data quality system across the organization. The scope of services involves building and tuning anomaly-detection and clustering pipelines, pairing classic ML with LLM reasoning to flag and explain issues, collaborating with data producers to fix root causes, and creating and maintaining validator models that turn detected anomalies into better future data.
Technology proficiency:
- Python – Expert
- Airflow – Advanced
- Spark (Dataproc) – Advanced
- Scikit-learn – Advanced
- BigQuery – Regular
- Snowflake – Regular
- Trino/Starburst with Iceberg – Regular
- AWS / GCP – Regular
- GitHub Actions – Regular
- Jenkins – Basic
- Terraform – Basic
- Docker – Basic
Anomalsky
Our client is a NASDAQ-listed B2B data company powering Go-To-Market strategies with a 360-degree view of every customer, a view whose value depends on the quality of billions of person and company records. Anomalsky is the ML system built to catch what traditional observability misses: row-level semantic anomalies (e.g., , title, ). It has three layers: an ML layer (embeddings + unsupervised clustering) that flags suspicious records at scale, an LLM layer that removes false positives and explains each cluster, and an optional human-in-the-loop step that lets domain experts resolve whole clusters. The MVP has already driven ~40k crucial record corrections in production.
What’s next: the MVP is landing on GCP now. Once it’s operational, the mission is to scale Anomalsky across the entire organization, embedding it into Acquisition pipelines and building a real-time variant that scans data before it reaches customers.
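To make the three-layer idea concrete, here is a minimal Python sketch of how such a flow could fit together. It is not Anomalsky's actual implementation: the character-trigram `embed`, the greedy `cluster`, the `build_review_queue` helper, the `fake_llm` stand-in for a real LLM call, and the sample titles are all hypothetical and chosen only so the example runs on the standard library. A real system would presumably use learned embeddings and a proper clustering library.

```python
from collections import Counter
from dataclasses import dataclass
from math import sqrt
from typing import Callable


def embed(text: str, n: int = 3) -> Counter:
    """Toy embedding: character n-gram counts (stand-in for a learned model)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(vectors: list[Counter], min_sim: float) -> list[list[int]]:
    """Greedy clustering: a record joins the first cluster whose seed is similar enough."""
    clusters: list[list[int]] = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= min_sim:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


@dataclass
class ReviewItem:
    record_ids: list[int]  # every record in the suspicious cluster
    explanation: str       # reviewer's explanation, shown to the domain expert


def build_review_queue(
    values: list[str],
    llm_review: Callable[[list[str]], tuple[bool, str]],
    max_cluster_size: int = 2,
    min_sim: float = 0.5,
) -> list[ReviewItem]:
    """ML layer flags small clusters; the reviewer drops false positives."""
    vectors = [embed(v) for v in values]
    queue = []
    for members in cluster(vectors, min_sim):
        if len(members) > max_cluster_size:
            continue  # common pattern, treated as normal
        is_anomaly, explanation = llm_review([values[i] for i in members])
        if is_anomaly:
            queue.append(ReviewItem(members, explanation))
    return queue


def fake_llm(values: list[str]) -> tuple[bool, str]:
    """Hypothetical stand-in for an LLM call; a real one would prompt a model."""
    bad = any("@" in v for v in values)
    return bad, ("email address in title field" if bad else "")


# Made-up sample data for illustration only.
titles = [
    "Software Engineer",
    "Senior Software Engineer",
    "Software Engineer II",
    "Staff Software Engineer",
    "jane.doe@example.com",
    "john.doe@example.com",
    "Chief Happiness Officer",
]
queue = build_review_queue(titles, fake_llm)
for item in queue:
    print(item.record_ids, item.explanation)
```

In this sketch the rare but valid title is flagged by the clustering step and then dropped by the reviewer, while the two email-like values reach the expert as a single cluster that can be resolved in one decision.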
Scope of cooperation covers:
- Productionizing Anomalsky on GCP and scaling it to operational, organization-wide use.
- Evolving the ML / LLM / human-in-the-loop design and the feedback loop that turns expert reviews into reusable knowledge.
- Prototyping the low-latency real-time variant.
- Integrating Anomalsky into existing workflows, starting with Acquisition.
Requirements:
- Strong Python and production ML skills, with a proven track record of shipping models into real production pipelines.
- Hands-on experience using classic ML to surface data quality issues at scale: unsupervised anomaly detection (kNN, Isolation Forest, autoencoders) and clustering on messy real-world tabular data.
- Practical experience pairing classic ML with LLMs: using models to flag suspicious records and LLMs for reasoning, false-positive filtering, and the final verification of anomalies.
- Solid data engineering background across the modern stack (Airflow, Spark/Dataproc, BigQuery, Snowflake, Iceberg/Trino) and the production toolchain (GCP, Docker, Terraform, CI, MLflow).
- Pragmatic, product-oriented approach focused on incremental value delivery and seamless integration into existing workflows.
- Professional fluency in English, enabling smooth technical and business discussions in an international environment.
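As a rough illustration of the kNN-style unsupervised anomaly detection named in the list above, the sketch below scores each row by its mean distance to its k nearest neighbours. The `knn_scores` helper and the sample rows are hypothetical. In practice one would more likely reach for a library such as scikit-learn and scale the features first; this version uses only the standard library so it runs anywhere.

```python
from math import dist


def knn_scores(rows: list[tuple[float, ...]], k: int = 2) -> list[float]:
    """Anomaly score per row: mean Euclidean distance to its k nearest neighbours."""
    if not 1 <= k < len(rows):
        raise ValueError("k must be between 1 and len(rows) - 1")
    scores = []
    for i, row in enumerate(rows):
        # Brute-force distances to every other row; fine for a sketch, not for billions of records.
        distances = sorted(dist(row, other) for j, other in enumerate(rows) if j != i)
        scores.append(sum(distances[:k]) / k)
    return scores


# Made-up numeric features, assumed to be already scaled to comparable ranges.
rows = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (1.1, 1.2), (8.0, 9.0)]
scores = knn_scores(rows, k=2)
most_suspicious = scores.index(max(scores))
print(most_suspicious, rows[most_suspicious])
```

Rows that sit far from their neighbours receive the highest scores. Where to put the threshold for flagging is a judgment call that depends on the data and on how many false positives the downstream review can absorb.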
Perks:
- Building tech community
- Home office reimbursement
- Training Package
- Virtusity / in-house training
- Access to the above perks is optional and completely voluntary for B2B contractors