Genesys empowers organizations of all sizes to improve loyalty and business outcomes by creating the best experiences for their customers and employees. Through Genesys Cloud, the AI-powered Experience Orchestration platform, organizations can accelerate growth by delivering empathetic, personalized experiences at scale to drive customer loyalty, workforce engagement, efficiency and operational improvements.

We employ more than 6,000 people across the globe who embrace empathy and cultivate collaboration to succeed. While we offer the great benefits and perks of larger tech companies, our employees have the independence to make a larger impact on the company and take ownership of their work. Join the team and help create the future of customer experience.

Overview

As a Senior Operations Reliability Engineer with a specialization in Cloud Infrastructure, you will play a key role in maintaining and improving the reliability, stability, and operational maturity of enterprise cloud and compute environments. This role focuses primarily on AWS infrastructure, with supporting responsibility for Azure and Windows-based systems.

You will lead incident detection, advanced troubleshooting, patching and vulnerability remediation validation, and proactive reliability improvements across AWS services and Windows/Linux compute platforms. In addition to hands-on operational support, you will actively contribute to automation initiatives, AIOps tuning, and the continuous improvement of monitoring, correlation, and signal quality.

This role blends advanced cloud operations with reliability engineering practices, including event correlation, automation validation, telemetry refinement, and support for emerging self-healing capabilities. You will collaborate across Cloud Engineering, Security, IAM, Network, and ServiceNow teams to strengthen operational standards and accelerate automation maturity across the platform.

Responsibilities

General Reliability Operations

Resolve complex cloud and OS-related incidents through advanced troubleshooting, coordinating cross-functional teams when necessary.

Monitor observability, AIOps, and event management platforms to detect anomalies, performance degradation, and emerging risks across AWS and compute systems.

Perform advanced incident triage and event correlation to determine root cause and reduce misrouted or duplicate incidents.

Lead validation of automated remediation workflows and ensure reliability of automation before production adoption.

Identify recurring manual operational tasks and translate them into automation requirements or lightweight scripted solutions.

Contribute structured operational insights, telemetry improvements, and signal refinement recommendations to reduce alert noise.

Lead post-incident reviews, including root cause documentation and reliability improvement actions.

Ensure cloud and OS telemetry aligns with monitoring, governance, and CMDB standards to support accurate correlation and impact analysis.

Partner with Cloud Engineering, Security, IAM, and Network teams to mature reliability practices and reduce operational risk.

Cloud & Windows Infrastructure Responsibilities

Troubleshoot advanced AWS operational issues, including EC2 performance anomalies, networking misconfigurations, IAM policy conflicts, storage degradation, and service dependency failures.

Support Azure VM and cloud service troubleshooting where applicable, ensuring cross-cloud awareness.

Perform deep OS-level diagnostics and remediation primarily within Windows Server environments, with supporting responsibilities across Linux systems.

Analyze telemetry from AWS CloudWatch, system logs, and vulnerability management platforms to detect trends and systemic weaknesses.

Own validation and oversight of patching and vulnerability remediation workflows, primarily for Windows systems with a supporting role for Linux systems, ensuring compliance and reducing drift.

Improve tagging compliance, IAM access hygiene, backup validation, and governance posture through operational enforcement and automation.

Validate and support resilience testing (backup restores, failover simulations, DR exercises).

Contribute to infrastructure-as-code (Terraform) enhancements.

Develop scripts (PowerShell, Python, CLI-based automation) to improve repeatability and reduce manual effort.

Participate in readiness planning for new AWS services, infrastructure changes, or architectural updates, ensuring monitoring and operational support models are in place.

Provide mentorship and technical guidance to junior reliability engineers.

Automation & AIOps Contributions

Actively tune alert thresholds, suppression logic, and event correlation rules within AIOps and monitoring platforms.

Partner with teams to refine automated remediation logic and validate reliability before rollout.

Improve cloud signal quality by ensuring accurate metrics, logs, and dependency mapping across AWS services.

Contribute operational feedback to enhance predictive alerting and early detection models.

Track and support improvements in MTTR, alert noise reduction, patch compliance, and automation coverage.

Requirements

Bachelor’s degree in IT or related field, or equivalent experience.

5+ years of experience in cloud infrastructure, systems engineering, or infrastructure operations roles.

Strong hands-on experience with AWS services (EC2, VPC, IAM, EBS, S3, CloudWatch, networking).

Familiarity with Azure cloud environments.

Solid experience administering Windows Server; working knowledge of Linux systems.

Experience with patch management, vulnerability remediation, and system hardening.

Strong understanding of cloud governance principles (tagging, IAM access control, backups, cost awareness, compliance).

Experience working with monitoring, observability, and event management platforms.

Ability to write and modify automation scripts (PowerShell, Python, CLI tools, YAML/JSON).

Strong troubleshooting and analytical skills with the ability to interpret complex telemetry and log data.

Experience contributing to automation initiatives or reliability improvements.

Effective communication skills for cross-functional collaboration.

Motivation to continue developing deeper skills in automation, AIOps, infrastructure-as-code, and cloud reliability engineering.

Additional Information

Working Hours: 9:00 AM – 6:00 PM IST (first shift), supporting global platform operations.

On-Call Support: Participation in a shared, rotational on-call schedule is required.

If a Genesys employee referred you, please use the link they sent you to apply.

About Genesys:

Genesys® empowers more than 8,000 organizations worldwide to create the best customer and employee experiences. With agentic AI at its core, Genesys Cloud™ is the AI-Powered Experience Orchestration platform that connects people, systems, data and AI across the enterprise. As a result, organizations can drive customer loyalty, growth and retention while increasing operational efficiency and teamwork across human and AI workforces. To learn more, visit www.genesys.com.

Reasonable Accommodations:

If you require a reasonable accommodation to complete any part of the application process, or are limited in your ability to access or use this online application and need an alternative method for applying, you or someone you know may contact us at reasonable.accommodations@genesys.com.

You can expect a response within 24–48 hours. To help us provide the best support, click the email link above to open a pre-filled message and complete the requested information before sending. If you have any questions, please include them in your email.

This email is intended to support job seekers requesting accommodations. Messages unrelated to accommodation—such as application follow-ups or resume submissions—may not receive a response.

Genesys is an equal opportunity employer committed to fairness in the workplace. We evaluate qualified applicants without regard to race, color, age, religion, sex, sexual orientation, gender identity or expression, marital status, domestic partner status, national origin, genetics, disability, military and veteran status, or other protected characteristics.

Please note that recruiters will never ask for sensitive personal or financial information during the application phase.

Senior Operations Reliability Engineer – Cloud Infrastructure (AWS & Windows)
