SRE Architect Job, Dallas area, Texas, USA, IT/Tech

Responsibilities

Provide SRE and production support with an emphasis on observability to proactively identify issues and drive incident response.
Act as incident commander to diagnose complex issues and actively drive incident calls with technical teams, product SMEs, and Tier 2 SREs.

Qualifications

Bachelor’s degree or foreign equivalent required from an accredited institution. Will also consider three years of progressive experience in the specialty in lieu of every year of education.
At least 10 years of Information Technology experience.
SRE mindset in production support with proactive issue identification using observability tools.
Skilled in using monitoring and observability tools to track system performance.
Experience with Splunk (including Splunk APM and Splunk O11y) and AppDynamics; experience with DB, Network, Linux/Unix, and Kubernetes; and experience using APM, NMON, and Wireshark for analysis.
Experience in production support activities including proactive issue identification leveraging observability tools and correlating inputs from dashboards and tools to drive resolution.
Able to identify probable failure points through analysis of logs, observability dashboards, recent application changes, infra and network changes.
Basic troubleshooting across the stack (Application, Database, Infra including container platforms, and Network).
Experience in setting up observability dashboards based on Splunk logs.

Preferred Qualifications

Production support expertise with SRE observability experience, including proactive issue identification using observability tools and tracking system performance.
Experience in production support activities involving correlating inputs from dashboards and tools to drive resolution.
Ability to swiftly identify probable failure points through analysis of multiple inputs (logs, observability dashboards, recent changes, infra, network changes).
Strong troubleshooting across all layers of the tech stack (Application, Database, Infra including container platforms, and Network).
Experience in setting up observability dashboards based on Splunk logs.

Communication

Excellent communicator, capable of leading and triaging proactively identified issues and incidents on calls where leadership may be present.
Leadership in triage calls to direct actions for the team.
Automation – experience in Toil identification and automation.

Technical expertise

Analysis of issues via Splunk (including Splunk APM and Splunk O11y), AppDynamics, Grafana, RED metrics, and ThousandEyes.
Debugging issues in VMs, load balancers, firewalls, API gateways, DB, network, Linux/Unix.
Debugging in containerization (Docker, Kubernetes), AWS, PCF, Azure.
Analysis of issues using APM, NMON, and Wireshark.
Database performance monitoring and analysis.
Experience in UEM and synthetic monitoring setup.
Experience in heap dump analysis, memory leak analysis, and resource optimization.
