Senior Operations Reliability Engineer – Messaging Platforms
Listed on 2026-06-03
Location: UK
Level: P3 – Professional Track
Overview
Maintain and improve the reliability, security, and operational maturity of enterprise messaging services. Focus on Microsoft Exchange Online, Exchange On‑Premises, and Mimecast, with responsibilities for incident resolution, platform stability, observability improvements, and automation validation.
Responsibilities
General Reliability Operations
- Resolve complex messaging‑related incidents through advanced troubleshooting and cross‑team coordination.
- Serve as a senior escalation point for the team.
- Monitor observability and AIOps platforms to detect anomalies, service degradation, and emerging risks.
- Perform advanced incident triage and event correlation to identify root cause and reduce duplicate or misrouted alerts.
- Validate automated remediation workflows and ensure reliability before broader production rollout.
- Identify recurring operational tasks and translate them into automation or scripting opportunities.
- Improve alert signal quality by tuning thresholds, suppression logic, and correlation rules.
- Lead post‑incident reviews, identifying systemic fixes and reliability improvements.
- Ensure messaging telemetry, event data, and service mappings align with monitoring and CMDB standards.
- Partner with Cloud, IAM, Security, Network, and ServiceNow teams to improve messaging reliability and governance posture.
- Support the availability, performance, and reliability of Microsoft Exchange Online and Exchange On‑Premises environments.
- Troubleshoot complex mail flow, transport rules, DNS, SMTP, hybrid configuration, and authentication‑related issues.
- Administer and optimize Mimecast policies for spam filtering, malware protection, phishing mitigation, and data protection.
- Analyze messaging logs and telemetry to detect trends, abnormal patterns, and systemic weaknesses.
- Support and validate platform changes, upgrades, and configuration updates following change‑management practices.
- Participate in resilience validation exercises, including failover, hybrid validation, and mail continuity testing.
- Collaborate with Security and Compliance teams on investigations, audit support, and messaging‑related security incidents.
- Mentor junior engineers and provide knowledge‑sharing on messaging operations and troubleshooting patterns.
- Develop and enhance PowerShell scripts to automate repetitive messaging administration tasks.
- Contribute to ServiceNow workflow improvements related to messaging incidents and requests.
- Integrate messaging platform health signals and vendor advisories into AIOps and monitoring systems.
- Tune alerting and anomaly detection models to reduce noise and improve actionable signal quality.
- Support predictive detection efforts by refining telemetry and logging standards.
- Track improvements in MTTR, alert reduction, automation coverage, and service availability.
- On‑Call Support: Participation in a shared, rotational on‑call schedule is required.
Contact reasonab for assistance with application process accommodation.
Equal Opportunity Employer
Genesys is an equal opportunity employer committed to fairness in the workplace. We evaluate qualified applicants without regard to race, color, age, religion, sex, sexual orientation, gender identity or expression, marital status, domestic partner status, national origin, genetics, disability, military and veteran status, and other protected characteristics.