SRE Architect
Job in Dallas, Dallas County, Texas, 75215, USA
Listed on 2025-12-02
Listing for: NeerInfo Solutions
Full Time position
Job specializations:
- IT/Tech
IT Support, Cybersecurity, Systems Engineer, Network Security
Job Description & How to Apply Below
Responsibilities
- Provide SRE and production support with an emphasis on observability to proactively identify issues and drive incident response.
- Act as incident commander to diagnose complex issues and actively drive incident calls with technical teams, product SMEs, and Tier 2 SREs.
Qualifications
- Bachelor’s degree or foreign equivalent required from an accredited institution. Will also consider three years of progressive experience in the specialty in lieu of every year of education.
- At least 10 years of Information Technology experience.
- SRE mindset in production support with proactive issue identification using observability tools.
- Skilled in using monitoring and observability tools to track system performance.
- Experience with Splunk (including Splunk APM and Splunk O11y) and AppDynamics; experience with databases, networking, Linux/Unix, and Kubernetes; and experience using APM, NMON, and Wireshark for analysis.
- Experience in production support activities including proactive issue identification leveraging observability tools and correlating inputs from dashboards and tools to drive resolution.
- Able to identify probable failure points through analysis of logs, observability dashboards, recent application changes, infra and network changes.
- Basic troubleshooting across the stack (Application, Database, Infra including container platforms, and Network).
- Experience in setting up observability dashboards based on Splunk logs.
- Production support expertise with SRE observability experience, including proactive issue identification using observability tools and tracking system performance.
- Experience in production support activities involving correlating inputs from dashboards and tools to drive resolution.
- Ability to swiftly identify probable failure points through analysis of multiple inputs (logs, observability dashboards, recent changes, infra, network changes).
- Strong troubleshooting across all layers of the tech stack (Application, Database, Infra including container platforms, and Network).
- Excellent communicator, capable of leading triage of proactively identified issues and incidents on calls where leadership may be present.
- Leadership in triage calls to direct actions for the team.
- Automation: experience in identifying toil and automating it away.
- Analysis of issues via Splunk (including Splunk APM and Splunk O11y), AppDynamics, Grafana, RED metrics, and ThousandEyes.
- Debugging issues in VMs, load balancers, firewalls, API gateways, DB, network, Linux/Unix.
- Debugging in containerization (Docker, Kubernetes), AWS, PCF, Azure.
- Analysis of issues using APM, NMON, and Wireshark.
- Database performance monitoring and analysis.
- Experience in UEM and synthetic monitoring setup.
- Experience in heap dump analysis, memory leak analysis, and resource optimization.