We are

At Synechron, we believe in the power of digital to transform businesses for the better. Our global consulting firm combines creativity and innovative technology to deliver industry-leading digital solutions. Synechron's progressive technologies and optimization strategies span end-to-end Artificial Intelligence, Consulting, Digital, Cloud & DevOps, Data, and Software Engineering, serving an array of noteworthy financial services and technology firms. Through research and development initiatives in our FinLabs, we develop solutions for modernization, from Artificial Intelligence and Blockchain to Data Science models, Digital Underwriting, mobile-first applications, and more. Over the last 20+ years, our company has been honored with multiple employer awards recognizing our commitment to our talented teams. Synechron serves top clients with a global workforce of 14,500+ and 58 offices in 21 countries across key global markets.

Our challenge

We are seeking a highly skilled Lead Site Reliability Engineer (SRE) / DevOps Engineer to drive the reliability, observability, and operational excellence of our platforms. This role will lead major initiatives in monitoring, automation, incident response, and performance optimization, leveraging enterprise tools such as Dynatrace, BigPanda, and LogScale/MonPro. The candidate will partner closely with engineering, operations, and product teams to build robust systems, improve service availability, and ensure a seamless user experience through proactive observability and best-in-class SRE practices.

Additional Information

The base salary for this position will vary based on geography and other factors. In accordance with the law, the base salary for this role, if filled in Pittsburgh, PA or Dallas, TX, is $125k to $135k per year plus benefits (see below).

The Role

Responsibilities:

Observability & Monitoring

Implement and enhance proactive observability frameworks to anticipate and mitigate issues before they occur.
Optimize experience monitoring and user interaction metrics across applications and services.
Manage and improve the event catalog, ensuring all system events are structured and actionable.
Build and maintain dashboards, alerts, and health reporting using tools like Dynatrace, BigPanda, MonPro, and LogScale.
Perform service tuning to improve system performance based on real-time metrics and data analysis.
Establish and maintain observability standards and best practices across teams.
Conduct chaos testing and resilience validation to ensure high system availability.
Lead anomaly detection practices to quickly identify and respond to unusual system behavior.

SRE Practices

Ensure platform stability, performance, and reliability through proven reliability engineering principles.
Drive SRE initiatives, including continuous improvement projects within the Site Reliability Center.
Develop, maintain, and scale automated orchestration pipelines to streamline operations and improve efficiency.
Create, maintain, and enforce SRE standards, including SLIs, SLOs, and operational playbooks.
Lead and conduct root cause analysis for critical incidents and drive long-term remediation improvements.

Problem Management

Own the problem management lifecycle: identifying, tracking, and resolving underlying issues to prevent recurring incidents.
Collaborate with cross-functional teams to address systemic issues and drive operational resilience.

Requirements:

7+ years of experience in SRE, DevOps, or Infrastructure Engineering roles.
Hands-on expertise with observability/monitoring tools such as:
- Dynatrace (APM, RUM, dashboards, alerting)
- BigPanda (event correlation, incident response)
- LogScale / MonPro / LogicMonitor or similar log and metrics platforms
Solid experience with cloud platforms (AWS, Azure, or GCP).
Strong proficiency in automation & orchestration (Terraform, Ansible, Jenkins, GitHub Actions, etc.).
Proven track record in incident management, RCA, and implementing reliable SRE practices.
Experience with CI/CD pipelines, infrastructure as code, and configuration management.
Deep understanding of Linux systems, networking fundamentals, and distributed system design.
Strong scripting abilities (Python, Bash, PowerShell, or equivalent).
Excellent communication, leadership, and cross-team collaboration skills.

Preferred, but not required:

Experience leading SRE or DevOps teams.
Knowledge of chaos engineering, advanced anomaly detection, and proactive alerting strategies.
Experience implementing SLI/SLO frameworks and performance optimization programs.
Familiarity with containerization (Docker, Kubernetes) and service meshes.

We offer:

A highly competitive compensation and benefits package.
A multinational organization with 58 offices in 21 countries and the possibility to work abroad.
10 days of paid annual leave (plus sick leave and national holidays).
Maternity & paternity leave plans.
A comprehensive insurance plan including medical, dental, vision, life insurance, and long-/short-term disability (plans vary by region).
Retirement savings plans.
A higher education certification policy.
Commuter benefits (varies by region).
Extensive training opportunities, focused on skills, substantive knowledge, and personal development.
On-demand Udemy for Business for all Synechron employees, with free access to more than 5,000 curated courses.
Coaching opportunities with experienced colleagues from our Financial Innovation Labs (FinLabs) and Centers of Excellence (CoE) groups.
Cutting-edge projects at the world's leading tier-one banks, financial institutions, and insurance firms.
A flat and approachable organization.
A truly diverse, fun-loving, and global work culture.

SYNECHRON’S DIVERSITY & INCLUSION STATEMENT

Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative 'Same Difference' is committed to fostering an inclusive culture: promoting equality, diversity, and an environment that is respectful to all. We strongly believe that, as a global company, a diverse workforce helps us build stronger, more successful businesses. We encourage applicants of all backgrounds, races, ethnicities, religions, ages, marital statuses, genders, sexual orientations, and disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.

All employment decisions at Synechron are based on business needs, job requirements, and individual qualifications, without regard to the applicant's gender, gender identity, sexual orientation, race, ethnicity, disability or veteran status, or any other characteristic protected by law.
