Principal Engineer Machine Learning; MLOps DLP Detection
Listed on 2026-01-07
IT/Tech
AI Engineer, Machine Learning/ ML Engineer, Cloud Computing
Company Description
Our Mission
At Palo Alto Networks everything starts and ends with our mission:
Being the cybersecurity partner of choice, protecting our digital way of life.
Our vision is a world where each day is safer and more secure than the one before. We are a company built on the foundation of challenging and disrupting the way things are done, and we’re looking for innovators who are as committed to shaping the future of cybersecurity as we are.
We believe collaboration thrives in person. That’s why most of our teams work from the office full time, with flexibility when it’s needed. This model supports real-time problem-solving, stronger relationships, and the kind of precision that drives great outcomes.
Job Description
Your Career
We are looking for a Principal MLOps Engineer to lead the design, development, and operation of production-grade machine learning infrastructure. In this role, you will architect robust pipelines, deploy and monitor ML models, and ensure reliability, reproducibility, and governance across our AI/ML ecosystem. You will work at the intersection of ML, DevOps, and cloud systems, enabling our teams to accelerate experimentation while ensuring secure, efficient, and compliant deployments.
This role is located at our dynamic Santa Clara, California headquarters campus, with in-office work 3 days a week. This is not a remote role.
Your Impact
- End-to-End ML Architecture and Delivery Ownership: Architect, design, and lead the implementation of the entire ML lifecycle, including model development and deployment workflows that seamlessly transition models from initial experimentation/development to complex cloud and hybrid production environments.
- Operationalize Models at Scale: Develop and maintain highly automated, resilient systems that enable the continuous training, rigorous testing, deployment, real-time monitoring, and robust rollback of machine learning models in production, ensuring performance meets massive scale demands.
- Ensure Reliability and Governance: Establish and enforce state-of-the-art practices for model versioning, reproducibility, auditing, lineage tracking, and compliance across the entire model inventory.
- Drive Advanced Observability & Monitoring: Develop comprehensive, real-time monitoring, alerting, and logging solutions focused on deep operational health, model performance analysis (e.g., drift detection), and business metric impact.
- Champion Automation & Efficiency: Act as the primary driver for efficiency, pioneering best practices in Infrastructure-as-Code (IaC), sophisticated container orchestration, and continuous delivery (CD) to reduce operational toil.
- Collaborate and Lead Cross-Functionally: Partner closely with Security Teams and Product Engineering to define requirements and deliver robust, secure, and production-ready AI systems.
- Lead MLOps Innovation: Continuously evaluate, prototype, and introduce cutting‑edge tools, frameworks, and practices that fundamentally elevate the scalability, reliability, and security posture of our production ML operations.
- Optimize Infrastructure & Cost: Strategically manage and optimize ML infrastructure resources to drive down operational costs, improve efficiency, and reduce model bootstrapping times.
- 8 years of software/DevOps/ML engineering experience, with at least 3 years focused specifically on advanced MLOps, ML Platform, or production ML infrastructure, and 5 years of experience building ML models
- Deep expertise in building scalable, production-grade systems, with strong programming skills in Python, Go, or Java.
- Expertise in leveraging cloud platforms (AWS, Google Cloud Platform, Azure) and container orchestration (Kubernetes, Docker) for ML workloads.
- Proven hands‑on experience in the ML Infrastructure lifecycle, including:
- Model Serving: (TensorFlow Serving, TorchServe, Triton Inference Server/TIS).
- Workflow Orchestration: (Airflow, Kubeflow, MLflow, Ray, Vertex AI, SageMaker).
- Mandatory Experience with Advanced Inferencing Techniques: Demonstrable ability to utilize advanced hardware/software acceleration and optimization techniques, such as TensorRT (TRT), Triton…