Description:
Design and implement cloud-native database infrastructure using Terraform/Ansible to provision managed DB instances across multiple clouds (RDS/Azure DB/Cloud SQL) and self-managed clusters
Automate configuration management, security hardening, and patching of database instances across all environments. Automate workflows to reduce manual effort and improve reliability
Develop internal tools and scripts (Python/Bash) to enable production support teams to manage their own database instances and environments safely. Develop scripts for routine operational tasks such as backups and health checks
Integrate advanced observability platforms (Dynatrace, CloudWatch) with AIOps tools to establish SLOs and train models for anomaly detection and proactive forecasting of database degradation (such as predicting slow queries or imminent connection pool exhaustion)
Design, deploy, and govern AI-powered agents (using Azure Copilot/AWS Bedrock) to achieve autonomous self-healing capabilities and automated resource management
Implement advanced monitoring (CloudWatch, Dynatrace) for key database metrics (SLIs/SLOs) such as latency, throughput, error rates, and connection pools. Develop and train predictive ML models to analyze historical telemetry and forecast potential system outages or performance bottlenecks, and configure proactive monitoring and alerting for critical services
Respond to alerts and create self-healing actions based on alerts
Design and implement cross-region/multi-AZ replication, automated failover strategies, and point-in-time recovery (PITR) procedures for mission-critical databases. Disaster recovery planning and DR drills
Execute backup strategies and validate recovery procedures using Rubrik, and perform restores as needed
Work closely with application operations/production support teams to troubleshoot issues on the database layer (performance, locks, schema) and the platform layer (multi-cloud/middleware/network, resource limits) to find root causes
Lead incident response and root cause analysis (RCA) for database outages, performance degradations, and data integrity issues. Collaborate with DBAs and application teams for root cause analysis
Implement AI tools to perform real-time Root Cause Analysis (RCA), correlate complex event data (logs, metrics) and auto-generate runbooks
Define and automate scaling strategies (read replicas, sharding, auto-scaling) based on predicted load and business growth. Provide input for capacity planning and resource optimization
Implement cost management policies, including rightsizing instances, managing storage tiers, and defining lifecycle rules for backups and snapshots
Proactively analyze query performance, index usage, and database configuration, making and automating changes to optimize throughput and reduce latency. Support DBA teams in performance tuning initiatives
Implement robust secrets management solutions (AWS Secrets Manager, HashiCorp Vault) for database credentials, ensuring applications retrieve secrets securely at runtime
Ensure database environments meet regulatory requirements (PCI, HIPAA, GDPR) through encryption-at-rest and in-transit, audit logging, and automated compliance checks
Define and enforce least-privilege access policies (IAM roles, service accounts) for databases
Implement encryption and data masking policies as directed
Manage security and compliance by utilizing AI agents to detect configuration drift and auto-generate compliant updates for IAM, network, and security policies
Apply patches and perform upgrades in coordination with DBA teams
Validate post-upgrade functionality and compliance
Requirements
8+ years of experience in Oracle/DB2/MSSQL/Snowflake/PostgreSQL and MySQL administration, with a strong focus on AIOps integration
5+ years of experience in public cloud operations (AWS, Azure, GCP)
Deep, demonstrable expertise designing and operationalizing solutions leveraging AWS Bedrock/Agent Frameworks and Azure Copilot for DB Operations
Expertise in Infrastructure as Code (Terraform, CloudFormation), Ansible, and CI/CD pipelines, including supervising AI-generated infrastructure artifacts
Expertise integrating observability platforms into AI/ML platforms for predictive analysis and anomaly detection - Advanced (7+ Years)
Hands-on experience with Informatica PowerCenter/Power BI/Cognos/Sapiens/Alteryx/IDMC/ILM/SAS/Business Objects/Glue/SPSS/ODI is a plus - Advanced (7+ Years)
Proficiency in scripting languages (Python, Bash) - Advanced (7+ Years)
Benefits
Competitive compensation and benefits package:
Competitive salary and performance-based bonuses
Comprehensive benefits package
Career development and training opportunities
Flexible work arrangements (remote and/or office-based)
Dynamic and inclusive work culture within a globally renowned group
Private Health Insurance
Pension Plan
Paid Time Off
Training & Development
Note:
Benefits differ based on employee level.
About Capgemini
Capgemini is a global leader in partnering with companies to…