Site Reliability Engineer
Listed on 2025-12-07
IT/Tech
Systems Engineer, Cloud Computing
We are looking for a Senior Site Reliability Engineer (SRE) to join our Infrastructure Team and play a key role in ensuring the reliability, scalability, and performance of Invicti’s infrastructure and services. This position is based in Austin and will operate with a high degree of independence. Familiarity with Invicti’s products, architecture, and operational practices is highly valued, as the engineer will be expected to work in isolation while maintaining strong alignment with global teams.
In this role, you will design, implement, and maintain resilient systems that support Invicti’s enterprise-scale applications. You will collaborate closely with development, DevOps, and security teams to optimize performance, streamline infrastructure provisioning through GitOps, and drive automation that enhances overall system reliability and observability.
What you will be doing:
- System Reliability & Uptime: Ensure Invicti’s services are highly available and resilient by designing, implementing, and maintaining scalable infrastructure solutions.
- Incident Response: Lead and coordinate incident management, ensuring minimal downtime and clear communication across teams during production events.
- Automation & Infrastructure as Code: Design, implement, and maintain infrastructure provisioning pipelines using Terraform and GitOps practices, ensuring consistent and auditable deployments.
- Infrastructure Ownership: Manage cloud-based infrastructure (primarily AWS) efficiently and securely, optimizing cost and performance.
- Performance Optimization: Identify bottlenecks and improve service performance through tuning, scaling, and observability enhancements.
- Observability & Monitoring: Continuously improve product observability and monitoring; proactively contribute by creating or updating dashboards, alerts, and metrics. Be willing to push merge requests directly into product code to enhance observability rather than waiting for development teams to implement changes.
- Security and Compliance: Collaborate with security teams to maintain infrastructure compliance and implement best practices.
- Operational Autonomy: Act as the key SRE contact for the U.S. time zone, independently managing incidents, optimizations, and infrastructure changes.
- Documentation & Collaboration: Maintain accurate infrastructure documentation and contribute to cross-team knowledge sharing.
- Strong experience with cloud platforms (AWS preferred) and core cloud services (EC2, ECS/EKS, RDS, IAM, etc.).
- Proficiency with Infrastructure as Code (IaC) tools such as Terraform or CloudFormation, with experience integrating GitOps workflows for infrastructure provisioning.
- Hands‑on experience with monitoring and observability tools such as Prometheus, Grafana, ELK Stack, Datadog, or Dynatrace.
- Strong understanding of automation pipelines for IaC and Terraform runs (not application CI/CD), emphasizing GitOps‑based workflows and controlled approval processes.
- Solid understanding of networking concepts, security best practices, and identity management in distributed environments.
- Experience with both Linux and Windows systems administration and performance troubleshooting.
- Proficiency in scripting languages such as Python, Bash, or Go for automation and tooling.
- Strong familiarity with Git for version control, collaboration, and change management across infrastructure repositories.
- Prior experience with Invicti’s solutions (AppSec, DAST scanning platforms, or related SaaS infrastructure).
- Experience working in globally distributed and remote‑first teams.
- Proven ability to independently troubleshoot, optimize, and manage production systems.
- Demonstrated history of driving observability, monitoring, and automation improvements across products and infrastructure.
- Strong problem‑solving and analytical mindset.
- Excellent communication and collaboration skills across distributed teams.
- High degree of ownership, accountability, and initiative.
- Adaptable and comfortable working asynchronously across time zones.
- Detail‑oriented, with a focus on delivering…