Software Engineer - Site Reliability Engineer, Security Clearance Job, Chantilly area, Virginia, USA, IT/Tech

Position: Software Engineer - Site Reliability Engineer with Security Clearance
Position: Software Engineer
Capability: Site Reliability Engineering

Company Overview
Noctua Technology, Inc. is a software engineering and consulting corporation focused on data engineering, machine learning, and cloud technologies. We specialize in delivering premier quality software engineering solutions to Public Sector and Commercial customers across the US.

Department Overview
The Site Reliability Engineering discipline at Noctua Technology, Inc. is a strategic force driving digital transformation.

We treat operations as a software engineering challenge, focusing on the seamless integration, scalability, and long-term reliability of cloud-native systems. Our SREs don’t just manage infrastructure; they build it using Infrastructure as Code (IaC), monitor it through advanced observability stacks, and protect it by engineering for failure. We work closely with clients to bridge the gap between development and operations.

Job Summary
We are seeking a motivated Site Reliability Engineer (SRE) to join our dynamic team. As a key contributor, you will apply software engineering principles to operations, focusing on the reliability, scalability, and performance of production systems. You will play a crucial role in reducing toil through automation, defining and monitoring Service Level Objectives (SLOs), and implementing best practices for system stability and incident response.

This role requires working with modern cloud technologies to ensure the high availability and efficiency of applications and infrastructure.

Security Clearance Requirement:
Applicants must be US citizens and eligible to obtain and maintain an active Secret security clearance or above.

Key Responsibilities
● Site Reliability Engineering
○ Define, measure, and report on Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to ensure system reliability and uptime.
○ Develop and deploy Infrastructure as Code (IaC) using Terraform, CloudFormation, or similar tools, with an emphasis on repeatability and change management.
○ Implement and manage containerized and serverless architectures using Docker, Kubernetes, and cloud-native services, focusing on performance and error budgets.
○ Build and maintain reliable and self-healing CI/CD pipelines to automate deployments and improve development workflows.

● Toil Reduction and Incident Management
○ Implement and refine comprehensive monitoring, alerting, and logging to detect and address performance and availability issues proactively.
○ Eliminate toil by extensively automating operational tasks, including provisioning, patching, and deployments, using scripting and configuration management tools such as Python, Bash, or Ansible.
○ Conduct post-incident reviews (blameless postmortems) to drive continuous improvement in system reliability and operational processes.

● Testing and Service Resiliency
○ Implement cloud security best practices, including identity and access management (IAM), encryption, and compliance controls.
○ Proactively identify and address system weaknesses and ensure performance under stress.
○ Support disaster recovery and high availability strategies through backup and failover planning.

● Collaboration and Knowledge Sharing
○ Collaborate with development teams to improve the operability and production readiness of applications from design through deployment.
○ Create and maintain documentation for cloud architectures, deployment processes, and best practices.
○ Contribute to internal knowledge-sharing initiatives, ensuring continuous learning within the team.

● Stakeholder Communication
○ Provide technical guidance and support to clients and internal teams on cloud infrastructure and reliability best practices, with a focus on defining Service Level Agreements (SLAs).
○ Act on client feedback to refine and enhance cloud solutions.
○ Conduct training and knowledge-sharing sessions to help clients manage their cloud environments effectively.

● Continuous Learning and Innovation
○ Stay updated on the latest developments in cloud infrastructure and technology trends.
○ Drive innovation by proposing and implementing new techniques and technologies.

Qualifications
● 1-5…

