Grade Level (for internal use):
12
Department overview
S&P Global provides innovative products and services that enhance transparency, reduce risk, and improve operational efficiency. Our customers include banks, hedge funds, asset managers, central banks, regulators, auditors, fund administrators and insurance companies. We develop large-scale technology platforms and enterprise software to produce global financial data with a focus on analysis and regulatory requirements.
Position summary
We are seeking a seasoned Senior Site Reliability Engineer (SRE) to join our team. You will be responsible for the big-picture architecture, day-to-day operations, and continuous improvement of our production systems, ensuring their availability, performance, and resilience. This role is pivotal in blending cutting-edge observability and automation with proactive engineering practices.
Responsibilities
Design, implement, and maintain comprehensive observability solutions to track the health and performance of our systems.
Analyze observability data and explore AIOps methodologies to identify potential issues, predict failures, and proactively troubleshoot problems before they impact users.
Develop and implement alerts and notifications for critical events to ensure timely intervention.
Collaborate with development teams to design and implement solutions that enhance system resilience and reduce downtime, in part by designing and executing chaos engineering experiments (e.g., using AWS FIS).
Analyze performance metrics to identify and resolve latency bottlenecks in our infrastructure.
Implement performance optimization techniques and tools to improve the overall responsiveness of our systems.
Work with development teams to ensure that new features and code changes do not introduce performance regressions.
Develop and maintain metrics dashboards to track key performance indicators (KPIs) for our critical systems.
Identify performance trends and anomalies that may indicate potential issues or areas for improvement.
Recommend and implement performance optimization strategies to enhance the overall efficiency of our systems.
Optimize resource utilization and minimize unnecessary expenditure on IT infrastructure.
Identify and implement cost-effective solutions that improve the efficiency of our IT operations and reduce toil.
Design and implement automated deployment and rollback procedures to mitigate risks associated with software updates.
Monitor the performance of new releases and address any issues that arise promptly.
Analyze root causes of incidents to identify and implement preventive measures to minimize their recurrence.
Document incident responses and communicate lessons learned to enhance our incident handling processes.
Requirements
Proficient in application and infrastructure observability; Splunk OpenTelemetry preferred.
A deep understanding and practical application of Site Reliability Engineering principles.
Ability to build and maintain a system and culture that supports and implements SLOs.
Experienced in production environments running in AWS.
Comfortable with Infrastructure as Code; Terraform is preferred.
Familiar with Docker & Kubernetes, specifically EKS & ECS.
Familiar with programming languages, with a strong preference for Python (for scripting, automation, and data analysis/AI).
Comfortable with CI/CD pipelines such as GitHub Actions or Azure DevOps.
Understanding of the application lifecycle.
Familiarity working in an agile environment.
Ability to review architecture designs, ensuring observability coverage, high availability, resilience, and disaster recovery principles.
Familiarity with Chaos Engineering principles and experience designing or running controlled experiments to test system resilience.
Demonstrable interest or experience in AIOps, including the application of AI/ML to operational data and familiarity with platforms like AWS Bedrock.
Excellent troubleshooting and problem-solving skills with a knack for identifying and resolving complex technical issues.
Ability to work independently and as part of a collaborative team, effectively communicating technical concepts to both technical and non-technical stakeholders.
A passion for maintaining high availability, performance, and reliability of critical systems in a fast-paced environment.
Ability to maintain relationships with other disciplines and stakeholders.
Strong sense of ownership, urgency, and drive.
Potential participation in an on-call rotation.
Qualifications
Bachelor's degree in Computer Science, Information Technology, or a related field.
10+ years of experience as a Site Reliability Engineer or in an equivalent role.
Proven experience in monitoring, analyzing, and optimizing the performance of large-scale distributed systems in a cloud environment.
Proven experience with Windows or Linux production environments, including managing servers, operating systems, and network configurations within the cloud.
Proven scripting and automation skills, preferably in PowerShell, Bash, or Python.
AWS certification preferred.
About S&P Global Market Intelligence
At S&P Global Market Intelligence, a division of S&P Global, we understand the importance of accurate, deep and insightful information. Our team of experts delivers unrivaled insights and leading data and technology solutions, partnering with customers to expand their perspective, operate with confidence, and make decisions with conviction.
For more information, visit www.spglobal.com/marketintelligence.
What’s In It For You?
Our Purpose:
Progress is not a self-starter. It requires a catalyst to be set in motion. Information, imagination, people, technology–the right combination can unlock possibility and change the world.
Our world is in transition and getting more complex by the day. We push past expected observations and seek out new levels of understanding so that we can help companies, governments and individuals make an impact on tomorrow. At S&P Global we transform data into Essential Intelligence®, pinpointing risks and opening possibilities. We Accelerate Progress.
Our People:
We're more than 35,000 strong worldwide—so we're able to understand nuances while having a broad perspective. Our team is driven by curiosity and a shared belief that Essential Intelligence can help build a more prosperous future for us all.
From finding new ways to measure sustainability to analyzing energy transition across the supply chain to building workflow solutions that make it easy to tap into insight and apply it, we are changing the way people see things and empowering them to make an impact on the world we live in. We’re committed to a more equitable future and to helping our customers find new, sustainable ways of doing business. We’re constantly seeking new solutions that have progress in mind. Join us and help create the critical insights that truly make a difference.
Our Values:
Integrity, Discovery, Partnership
At S&P Global, we focus on Powering Global Markets. Throughout our history, the world's leading organizations have relied on us for the Essential Intelligence they need to make confident decisions about the road ahead. We start with a foundation of integrity in all we do, bring a spirit of discovery to our work, and collaborate in close partnership with each other and our customers to achieve shared goals.
Benefits:
We take care of you, so you can take care of business. We care about our people. That’s why we provide everything you—and your career—need to thrive at S&P Global.
Our benefits include:
Health & Wellness: Health care coverage designed for the mind and body.
Flexible Downtime: Generous time off helps keep you energized for your time on.
Continuous Learning: Access a wealth of resources to grow your career and learn valuable new skills.
Invest in Your Future: Secure your financial future through competitive pay, retirement planning, a continuing education program with a company-matched student loan contribution, and financial wellness programs.
Family Friendly Perks: It’s not just about you. S&P Global has perks for your partners and little ones, too, with some best-in-class benefits for families.
Beyond the Basics: From retail discounts to referral incentive awards—small perks can make a big difference.
For more information on benefits by country visit: https://spgbenefits.com/benefit-summaries
Global Hiring and Opportunity at S&P Global:
At S&P Global, we are committed to fostering a connected and engaged workplace where all individuals have access to opportunities based on their skills, experience, and contributions. Our hiring practices emphasize fairness, transparency, and merit, ensuring that we attract and retain top talent. By valuing different perspectives and promoting a culture of respect and collaboration, we drive innovation and power global markets.
Recruitment Fraud Alert:
If you receive an email from a spglobalind.com domain or any other regionally based domains, it is a scam and should be reported to reportfraud@spglobal.com. S&P Global never requires any candidate to pay money for job applications, interviews, offer letters, “pre-employment training” or for equipment/delivery of equipment. Stay informed and protect yourself from recruitment fraud by reviewing our guidelines, fraudulent domains, and how to report suspicious activity.
-----------------------------------------------------------
Equal Opportunity Employer
S&P Global is an equal opportunity employer and all qualified candidates will receive consideration for employment without regard to race/ethnicity, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, marital status, military veteran status, unemployment status, or any other status protected by law. Only electronic job submissions will be considered for employment.
If you need an accommodation during the application process due to a disability, please send an email to: EEO.Compliance@spglobal.com and your request will be forwarded to the appropriate person.
US Candidates Only: The EEO is the Law Poster http://www.dol.gov/ofccp/regs/compliance/posters/pdf/eeopost.pdf describes discrimination protections under federal law. Pay Transparency Nondiscrimination Provision - https://www.dol.gov/sites/dolgov/files/ofccp/pdf/pay-transp_%20English_formattedESQA508c.pdf
-----------------------------------------------------------