
Cloud Reliability & Support Engineer

Job in Chantilly, Fairfax County, Virginia, 22021, USA
Listing for: Peraton
Full Time position
Listed on 2025-12-26
Job specializations:
  • IT/Tech
    Systems Engineer, IT Support, Cybersecurity, Cloud Computing
Salary/Wage Range or Industry Benchmark: 95,000 - 130,000 USD yearly
Job Description & How to Apply Below

Required qualifications:

  • This position requires the candidate to possess a minimum of Top-Secret clearance with the ability to obtain TS/SCI. The candidate must maintain the clearance.
  • Associate’s degree and 10+ years of experience in a systems engineering related field; or a bachelor’s degree in computer science, computer engineering, or a related field and 8+ years of experience in a systems engineering related field; or a master’s degree in computer science, cloud computing, or a related field and 6+ years of experience in a systems engineering related field. An additional four (4) years of relevant experience will be considered in lieu of a degree.
  • Meet DoD 8140 foundational requirements for a System Developer with a proficiency of advanced.
  • 4+ years of hands‑on experience in a cloud operations, system reliability engineer (SRE), or highly technical Level‑3 support role within a Linux/Private Cloud environment.
  • Deep‑level expertise with RHEL/CentOS administration, networking, and system diagnostics.
  • Strong understanding of Red Hat OpenStack service interaction (Nova, Neutron).
  • Proficiency with key observability tools and log analysis on Linux systems (e.g., systemd‑journald, specialized OpenStack logs).
  • Expert skill in diagnosing resource contention and failure patterns in distributed systems on a Linux operating system.
  • Proficiency in Linux systems administration, cloud computing, and virtualization, with a strong understanding of both public and private cloud environments.
  • Strong communication and organizational skills for coordinating with customers and tenants.
Desired qualifications:
  • Certifications: Red Hat Certified Engineer (RHCE) or equivalent is highly preferred.
  • Strong skills in scripting languages such as Python (specifically for OpenStack SDK interaction).
  • Hands‑on experience with container technologies (Docker, Kubernetes) and demonstrable experience with OpenShift Container Platform.
  • A solid grasp of enterprise networking, firewalls, and security best practices.
  • Strong analytical and conceptual thinking skills to troubleshoot complex issues and optimize performance.
  • Ability to learn independently, adapt to an evolving environment, and stay current with industry trends.
Benefits:

Peraton offers enhanced benefits to employees working on this critical National Security program. These include heavily subsidized benefits coverage for you and your dependents, 25 days of PTO accrued annually up to a generous PTO cap, and eligibility to participate in an attractive bonus plan.

Peraton is seeking a Cloud Reliability & Support Engineer in our Chantilly, VA office in support of our Department of Defense (DoD) customer as part of a highly talented, highly motivated and high-performing team. As the program’s expert in Level 3 Anomaly Resolution and operational excellence, you will apply your deep expertise in RHEL and RHOSP internals to in‑project troubleshooting, ensuring tenant applications fully utilize the cloud’s resiliency features.

Your focus is on stability: identifying the root causes of system anomalies within the tenant's provisioned environment. Join us and be part of the next generation of innovators as we blaze a trail forward for our profession and company.

What you'll do:
  • Anomaly Resolution & Deep Troubleshooting
    • Serve as the primary technical resource for complex, escalated incidents that are contained within the tenant's RHOSP project/resources.
    • RHEL/OS Deep Dive: Expertly troubleshoot issues on tenant RHEL instances, including kernel panics, package conflicts, file system errors, and performance degradation (CPU, memory, I/O).
    • RHOSP Resource Triage: Diagnose issues related to the tenant's consumption of OpenStack services (e.g., Nova instance failures, Neutron port issues, Cinder volume attachment problems).
    • Utilize monitoring tools to perform deep‑dive analysis and isolate the root cause of service disruptions within the OpenStack data plane.
    • Root Cause Analysis (RCA): Own the technical execution and documentation of RCAs, focusing on issues rooted in RHEL/RHOSP misconfiguration or resource limitations.
  • Maintain a partnership with Red Hat, the vendor, to stay up to date with the latest advancements in Red Hat products and industry best practices and to keep the infrastructure effective and innovative.