Compliance Engineering, DevOps, Vice President, Dallas
Listed on 2026-05-27
IT/Tech
SRE/Site Reliability, Systems Engineer
What We Do
We are Compliance Engineering, a global team of more than 300 engineers and scientists who work on the most complex, mission‑critical problems. We build and operate a suite of platforms and applications that prevent, detect, and mitigate regulatory and reputational risk across the firm. We have access to the latest technology and to massive amounts of structured and unstructured data, and we leverage modern frameworks to build responsive and intuitive front‑end and Big Data applications.
The firm is making a significant investment to uplift and rebuild the Compliance application portfolio.
SRE at Goldman Sachs combines software and systems engineering to build, run, and maintain high‑performance, distributed, fault‑tolerant systems. As an SRE, you will fill a mission‑critical role ensuring that our systems are healthy, monitored, automated, and designed to scale. You will collaborate with engineering teams to continually improve our production services, facilitate fast delivery of new services, and reduce downtime.
SRE utilizes automation, tools, and solid engineering principles to optimize existing systems, build infrastructure, and eliminate operational work. We are looking for passionate, curious, driven engineers who thrive on solving operational problems and improving efficiency, and who would like to apply their skills to Compliance Engineering SRE.
- Proactively manage our production services by measuring and monitoring availability, capacity and overall system health.
- Shape software before go‑live through activities such as system design consulting, capacity planning and launch reviews.
- Scale and evolve systems by pushing for changes that improve capacity and reliability.
- Practice sustainable incident management in a blameless postmortem culture.
- Identify and build improvements to system behavior, controls and monitoring tools.
- Define and maintain Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to quantify and manage service reliability.
- Engineer solutions to reduce toil through advanced automation and self‑healing system capabilities.
- Conduct capacity modeling and performance tuning to ensure systems meet future demand.
- Experience in one or more of the following: Java, Python, and Perl.
- Strong communication skills and the ability to clearly express ideas and arguments.
- Solid analytical and problem‑solving skills with appreciation of technical risk.
- Experience with automated testing and SDLC concepts, developing applications in a Linux environment, and sound knowledge of algorithms, data structures and software design.
- Systematic problem‑solving approach and a sense of ownership and drive.
- Ability to debug and optimize code and to automate routine tasks.
- Proficiency with observability stacks, including distributed tracing, logging, and metrics (e.g., Prometheus, Grafana, ELK, or OpenTelemetry).
- Deep understanding of containerization and orchestration technologies, specifically Docker and Kubernetes (K8s).
- Knowledge of networking protocols and load‑balancing strategies in a distributed systems environment.
- Bachelor’s degree in Computer Science or a related technical field that involves programming.
- Experience in some of the following is desired: SRE experience, relational databases, Hadoop and big data technologies, knowledge of the financial industry or compliance/risk functions.
- Experience with Chaos Engineering principles and fault‑injection testing to verify system resilience.
- Familiarity with cloud‑native architecture and managed services (AWS, Azure, or GCP).
- Understanding of Error Budgeting and its application in balancing feature velocity with system stability.
- Hands‑on experience with Infrastructure as Code (IaC) frameworks such as Terraform, Ansible, or CloudFormation.
- Passionate about solving operational problems and constant improvement via automation.
- Highly motivated, proactive and capable of working under pressure without compromising development processes.
- Strong, committed and reliable team player and strong communicator, able to take direction but also willing…