Machine Learning Systems Engineer | Zurich area, Zürich, Kanton Zürich, Switzerland | IT/Tech

Location: Zürich

Location: TN Switzerland · Canton of Lucerne, Switzerland

Employer: Tether Operations Limited

About the job

We are developing a highly scalable media intelligence platform that processes, analyzes, and structures large volumes of multimedia content across text, image, video, and audio. As a Senior Applied ML Engineer, you will architect and build the core backend systems that power media ingestion, processing workflows, metadata generation, AI-based analysis, semantic search, and retrieval across large media libraries.

Responsibilities

Backend Architecture & System Ownership: architect, build, and operate scalable backend services for a media intelligence platform, with a focus on clean, maintainable, and production-ready systems. Own critical backend components end-to-end, from system design and API contracts through implementation, deployment, monitoring, and iteration. Drive architectural decisions across APIs, processing pipelines, distributed compute, storage, search, observability, cloud infrastructure, and model-serving workflows. Design data models and storage patterns for media assets, generated metadata, embeddings, processing jobs, model outputs, search indexes, and audit trails.

Design high-throughput media ingestion and processing pipelines for large volumes of video, audio, image, and text content. Build distributed, event-driven media-processing workflows using queues and pub/sub systems such as Amazon SQS, Apache Kafka, Google Cloud Pub/Sub, or equivalent technologies. Implement reliable asynchronous processing patterns, including retries, idempotency, dead-letter queues, back-pressure handling, and fault-tolerant job execution.

AI/ML Integration & Model Workflows: lead the development and optimization of metadata extraction, content analysis, scene detection, transcription, embedding generation, and multimodal AI inference workflows. Integrate and optimize AI/ML services within backend workflows, including model APIs, embedding pipelines, OCR, speech-to-text, scene analysis, multimodal inference, batching, caching, and fallback strategies. Collaborate with ML engineers, data scientists, or external model providers to benchmark models, compare quality/latency trade-offs, and safely roll out model upgrades.

Model Serving & Performance Optimization: optimize inference workflows for latency, throughput, reliability, and cost across both real-time and batch-processing paths. Work with model-serving systems such as vLLM, NVIDIA Triton Inference Server, TGI, Amazon SageMaker, Vertex AI, or custom inference services to improve batching, concurrency, warm-up behavior, timeout handling, autoscaling, and GPU utilization. Evaluate and apply practical model optimization techniques such as quantization, model distillation, batching, caching, prompt optimization, and routing to smaller or cheaper models where appropriate.

Design and maintain vector search and indexing systems using technologies such as Pinecone, Weaviate, Qdrant, Elasticsearch vector search, FAISS, pgvector, or similar tools. Build retrieval workflows that support semantic search, similarity matching, duplicate detection, media discovery, and structured metadata search. Monitor model and system performance in production, including API latency, queue depth, processing time, model error rates, GPU utilization, confidence distributions, drift signals, and cost per processed item.

Infrastructure, Reliability & Observability: deploy and operate systems on AWS, GCP, Azure, or equivalent cloud platforms, including compute, storage, networking, queues, model-serving infrastructure, and monitoring systems. Ensure system reliability through logging, metrics, tracing, alerting, dashboards, operational runbooks, and incident-response best practices.

Collaboration & Engineering Leadership: collaborate with product, design, data, and ML teams to deliver media-rich, AI-powered product features. Mentor junior and mid-level engineers, review designs, support technical planning, and raise engineering quality across the team. Participate in code reviews, write documentation, and continuously improve engineering practices. Ensure code quality through testing, peer review, clear documentation, and maintainable implementation patterns.

Education & Experience

5-7+ years of backend engineering experience, ideally building scalable distributed systems, media platforms, data pipelines, or high-throughput backend services.
Prior experience owning major backend modules end-to-end, including architecture, implementation, deployment, monitoring, and production operations.
3+ years of experience integrating AI/ML inference systems into backend workflows, including model APIs, embedding pipelines, OCR, speech-to-text, scene detection, or multimodal model outputs.
Hands-on experience creating AI-powered processing pipelines for image, video, audio, or text analysis.
Practical experience with production model optimization, especially for image, video, embedding, or multimodal models, including batching, caching,…