Systems Engineer (Performance Engineer)
Listed on 2026-01-02
IT/Tech
Systems Engineer, Cloud Computing
Overview
Systems Engineer (Performance Engineer) role at AutoZone.
As a Systems Engineer specializing in Performance Engineering and Site Reliability Engineering (SRE), you will drive the reliability, scalability, and performance of critical enterprise systems. You will create and implement performance test plans to evaluate system operations and detect performance bottlenecks, applying SRE practices to ensure robust system operations, automation, and continuous improvement. Using testing tools, you will analyze CPU usage, memory usage, and other performance metrics, and you will develop monitoring profiles for the underlying infrastructure.
You will work with technical stakeholders to interpret test results and identify possible system bottlenecks.
Responsibilities
- Develop and implement performance test plans to evaluate system operations and detect bottlenecks.
- Analyze CPU, memory, and other performance metrics using industry-standard tools.
- Identify, track, and communicate performance issues, memory leaks, and bottlenecks to stakeholders.
- Collaborate with engineers, architects, and business teams to define performance SLAs and monitoring strategies.
- Lead and mentor performance testing teams, conduct chaos testing, and replicate production issues in test environments.
- Define and enforce resilience and reliability best practices.
- Plan and manage deliverables for performance diagnostics, capacity planning, architecture design, tuning, and monitoring.
- Conduct system security, performance, and stress testing; analyze results and recommend improvements.
- Identify areas for performance and process improvement, and define roadmaps for enhancement.
- Perform web/mobile application and network penetration testing, including exploiting vulnerabilities and documenting findings.
- Provide technical assistance to improve system performance, capacity, reliability, and scalability.
- Collaborate on service-level objectives (SLOs), error budgets, and reliability metrics.
- Implement and refine observability solutions (metrics, logs, traces) to proactively detect and resolve issues.
- Champion best practices for system reliability, scalability, and disaster recovery.
- Work closely with development and operations teams to integrate reliability engineering into the software lifecycle.
Qualifications
- Bachelor's degree in Computer Science, Information Science, or related field.
- 5+ years of experience architecting performance test automation solutions and applying SRE practices.
- Experience performance testing web applications and middleware.
- Experience implementing at least one chaos testing tool and designing and executing chaos experiments at different layers of an application on cloud infrastructure.
- Proven ability to create automated test scripts and test scenarios, and to analyze results, using LoadRunner, JMeter, and BlazeMeter.
- Experience in performance testing and tuning of complex large-scale enterprise applications in the Retail industry.
- Strong troubleshooting, problem-solving, and reasoning skills, with the ability to identify system bottlenecks.
- Strong programming skills (Python required; Java a plus).
- Experience with Python, JMeter, code profiling, and monitoring/observability tools.
- Data mining experience using custom shell scripts and complex Splunk queries for troubleshooting and test bed setup.
- Experience reporting test results to all levels of an organization and building monitoring dashboards.
- Knowledge of databases, indexing, and SQL optimization techniques in Oracle.
- Proficiency in monitoring/observability tools (Dynatrace required).
- Understanding of factors influencing software performance across multiple layers including database, network, CPU utilization, JVM tuning, memory analysis, thread management, and query performance.
- Solid understanding of APIs and experience creating web services and measuring their performance.
- Knowledge of UNIX, Linux, Windows, Java, MS SQL, C/C++, Python, Go, GoScript, Oracle, and related technologies; familiarity with Apigee, Ping Identity, Kafka, TCP/IP, networking, and LAN monitoring.
- Experience with cloud platforms (AWS, Azure, GCP) and infrastructure automation…