Platform Operations Manager
Listed on 2026-06-24
IT/Tech
Cloud Computing: Infrastructure & Operations, Systems Engineer, SRE/Site Reliability, AWS
Leidos is excited to present an opportunity for a TS/SCI‑cleared Platform Operations Manager to join a high‑impact team driving the design, development, and deployment of a modern technology stack supporting the DOMEX Data Discovery Platform (D3P) Modernization Program. This role directly supports our customer’s mission to centralize and standardize the Tasking, Collection, Processing, Exploitation, and Dissemination (TCPED) of Open Source Intelligence (OSINT) across the Defense Intelligence Enterprise.
The majority of work is performed on‑site at our customer location in Bethesda, MD, with some flexibility for remote tasks.
- Ensure availability, reliability, and performance of a full‑stack, containerized microservices platform.
- Help cultivate a strong DevSecOps culture and collaborate with systems engineering, architecture, development, security, operations, and integration teams.
- Partner with multidisciplinary teams to lead efforts in areas including:
  - System Reliability & Performance – Ensure uptime, performance, and capacity planning for a large‑scale big data production platform built on a microservice architecture running Kubernetes, Elasticsearch, PostgreSQL, and Kafka, with technologies such as Java, Python, React, and low‑code tools like Appian.
  - Monitoring & Observability – Use monitoring tools to proactively detect and resolve issues.
  - Incident Response – Lead triage, troubleshooting, root‑cause analysis, and post‑incident reviews.
  - SLIs & SLOs – Define and track reliability metrics.
  - Management Oversight – Lead a team of system administrators supporting a help desk, set technical standards, and mentor staff.
  - Technical Leadership – Partner with systems engineers to design solutions, contribute to documentation, and support architectural alignment.
  - SAFe Agile – Participate in release planning, scrums, design sessions, bug triage, and cross‑team coordination.
- BS in Engineering, Computer Science, Systems Engineering, or related field (or equivalent experience).
- 15+ years of relevant experience (or 13+ years with a Master’s degree; alternative experience may substitute).
- Active TS/SCI clearance, with the ability to maintain that clearance and to obtain and maintain a polygraph.
- At least one DoD 8570.01‑M IAT Level II or higher certification (e.g., Security+ CE, CySA+, CCNA Security, SSCP, or CISSP (or Associate)).
- Ability to obtain Privileged User Account (PUA) certification.
- Experience with Kubernetes, GitLab pipelines, Linux, and containerized environments.
- Experience supporting enterprise‑scale production systems.
- Experience with cloud services (preferably AWS) and cloud infrastructure.
- Familiarity with Elasticsearch, PostgreSQL, Logstash, Kibana, and Keycloak.
- Demonstrated success in cross‑functional coordination and execution.
- Team leadership and line management experience.
- Strong communication skills and the ability to perform under pressure during incidents.
- Experience with Agile methodologies.
- Development experience (Bash, PowerShell, Salt, Python, Groovy, Java, etc.).
- Experience with Appian or other low‑code platforms.
- Experience with technologies such as Kafka, AMQP/JMS, Prometheus/Grafana, GPU‑based Kubernetes, Salt automation, Nexus, or GraphQL.
- Knowledge of security best practices (authN/Z, secrets management, data protection).
- Infrastructure‑as‑code experience (CloudFormation, Terraform, Pulumi).
- AWS cloud certifications.
All qualified applicants will receive consideration for employment without regard to sex, race, ethnicity, age, national origin, citizenship, religion, physical or mental disability, medical condition, genetic information, pregnancy, family structure, marital status, ancestry, domestic partner status, sexual orientation, gender identity or expression, veteran or military status, or any other basis prohibited by law. Leidos will also consider for employment qualified applicants with criminal histories consistent with relevant laws.