Sr Site Reliability Engineer
Listed on 2026-06-18
IT/Tech
Systems Engineer, Cloud Computing: Infrastructure & Operations, SRE/Site Reliability
Overview
Archer is an aerospace company based in San Jose, California building an all-electric vertical takeoff and landing aircraft with a mission to advance sustainable air mobility. We design, manufacture, and operate an all-electric aircraft capable of carrying four passengers with minimal noise. We are seeking a highly experienced and passionate Sr. Staff Site Reliability Engineer (SRE) to join our team. In this role, you will be responsible for the reliability, scalability, performance, and security of our core systems and services, designing, implementing, and maintaining robust infrastructure and automation solutions.
Responsibilities
- Implement and maintain the infrastructure and pipeline required for an internal LLM-powered chat service, potentially leveraging platforms like OpenRouter or similar alternatives.
- Implement and maintain highly available, scalable, and secure cloud-native infrastructure on Amazon Elastic Kubernetes Service (EKS).
- Develop and implement comprehensive observability strategies, including monitoring, logging, and alerting, to ensure the health and performance of our systems.
- Architect and optimize data pipelines to ensure efficient and reliable data flow across various platforms.
- Drive the continuous improvement of CI/CD pipelines, promoting best practices for automated testing, deployment, and release management.
- Champion cloud-first strategies, leveraging the full capabilities of cloud platforms for infrastructure, services, and operations.
- Implement and enforce robust security practices across our infrastructure, applications, and data.
- Design and maintain Docker-based containerization solutions for our applications.
- Develop and maintain automation scripts and tools using Python, Bash, and PowerShell.
- Collaborate with development teams to ensure reliability is built into the software development lifecycle from inception.
- Troubleshoot complex production issues across various layers of the stack, identifying root causes and implementing preventative measures.
- Participate in on-call rotations to support production systems.
Qualifications
- 12+ years of experience in Site Reliability Engineering, DevOps, or a similar role with a strong focus on operational excellence.
- Deep expertise in Amazon EKS, including cluster provisioning, management, and troubleshooting.
- Extensive experience with observability tools and practices, including Prometheus, Grafana, ELK stack, or similar.
- Proven track record in designing and implementing robust data pipelines (e.g., Kafka, Airflow, Spark).
- Strong background in CI/CD methodologies and tools (e.g., Jenkins, GitLab CI, ArgoCD).
- Expert-level knowledge of cloud platforms (AWS preferred), including infrastructure-as-code principles.
- Comprehensive understanding of security best practices for cloud environments, applications, and data.
- Proficiency in Docker for containerization and orchestration.
- Advanced scripting and programming skills in Python, Bash, and PowerShell.
- Solid understanding of networking concepts, distributed systems, and operating systems.
- Excellent problem-solving, analytical, and communication skills.
- Ability to work independently and as part of a highly collaborative team.
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
Preferred Qualifications
- Experience with other Kubernetes distributions or cloud providers.
- Familiarity with compliance frameworks (e.g., SOC 2, HIPAA, GDPR).
- Certifications in AWS, Kubernetes, or other relevant technologies.
Archer is committed to equal opportunity employment and diversity in the workplace. All aspects of employment are decided on the basis of merit, qualifications, and business needs. We do not discriminate based upon race, color, religion, sex, sexual orientation, age, national origin, disability status, protected veteran status, gender identity or any other characteristic protected by federal, state, or local laws.