Lilly

Observability Integration Specialist

India, Hyderabad Full time

At Lilly, we unite caring with discovery to make life better for people around the world. We are a global healthcare leader headquartered in Indianapolis, Indiana. Our employees around the world work to discover and bring life-changing medicines to those who need them, improve the understanding and management of disease, and give back to our communities through philanthropy and volunteerism. We give our best effort to our work, and we put people first. We’re looking for people who are determined to make life better for people around the world.

Observability Integration Specialist

Location: Lilly Hyderabad
Position Type: Full-Time
Level: P2

At Lilly, we believe in the talent of our workforce. One of the best ways to use and develop that talent is to fill new and open positions from within our existing workforce. If you are looking for a new position within Lilly, you can view and apply to open roles in the internal job posting system. To be considered, you must meet the minimum qualifications outlined in the job description and have, or be able to obtain, work authorization in the country where the position is located. When you apply internally for a position, your current supervisor will be notified of your application. We encourage employees to discuss the opportunity with their supervisor before applying.

Note: Roles are posted at the lowest level of a band; however, employees should search across all levels of the band to identify all opportunities. Employees hired into banded positions (e.g., P1–P3, R1–R2, B1–B3) transfer at their current level, regardless of the level indicated on the job posting.

The Observability Integration Specialist is a critical technical role within the AIOps Center of Excellence (COE). This specialist develops, maintains, and optimizes the end-to-end integration workflows that carry actionable insights from the Observability and AIOps platforms into downstream service management and automation tools. The primary goal is to accelerate incident resolution and strengthen operational resilience by automating response actions.

Key Responsibilities

  • Integration Development: Design, code, and maintain robust integration scripts and APIs that connect the Observability/AIOps platform (e.g., Splunk ITSI) with source systems (monitoring tools) and target systems (e.g., ITSM platforms such as ServiceNow, Splunk Observability Cloud, and automation platforms such as Ansible).
  • Automated Remediation Workflows: Develop specific scripts, workflows, and "playbooks" that contain the steps necessary to fix common, high-priority issues (e.g., restarting services, scaling resources).
  • AIOps Trigger Logic: Configure and manage the logic within the AIOps platform that automatically triggers these remediation playbooks when a "notable event" or root cause is identified by correlation engines.
  • System Connectivity: Act as the technical expert for connecting new monitoring elements, service management, and automation platforms to the core observability stack, ensuring secure, high-performance data exchange.
  • Performance Optimization: Directly contribute to reducing Mean Time to Recovery (MTTR) by maximizing the effectiveness and speed of automated incident response.
  • Documentation: Create and maintain comprehensive technical documentation for all developed integrations, APIs, and automated playbooks.

Required Qualifications

  • Bachelor’s degree in Computer Science, Information Technology, or a related field, or equivalent practical experience.
  • Experience: 3+ years of overall experience in IT Operations, DevOps, or Platform Engineering, with a focus on system integration and automation.
  • Technical Skills:
    • Proficiency in scripting languages such as Python, JavaScript/Node.js, or similar languages used for API integration and automation logic.
    • Deep understanding of REST APIs, webhooks, and queuing mechanisms for inter-system communication.
    • Experience integrating with enterprise-level ITSM platforms (e.g., ServiceNow, Jira Service Management) for incident creation and enrichment.
    • Direct experience with Automation/Orchestration tools (e.g., Ansible, Rundeck, Terraform) for triggering infrastructure changes.
  • Observability Knowledge: Familiarity with the data types and concepts within observability platforms (metrics, logs, traces) and how they translate into actionable events.

Preferred Qualifications

  • Experience working directly with AIOps platforms or solutions (e.g., Splunk ITSI, Dynatrace) to set up event correlation and alert routing.
  • Knowledge of standard IT service management (ITSM) and IT Infrastructure Library (ITIL) processes, particularly Incident, Problem, and Change Management.
  • Experience with Infrastructure as Code (IaC) tools and practices.

Lilly is dedicated to helping individuals with disabilities actively engage in the workforce, ensuring equal opportunities when competing for positions. If you require an accommodation to submit a resume for a position at Lilly, please complete the accommodation request form (https://careers.lilly.com/us/en/workplace-accommodation) for further assistance. Please note that this form is only for requesting an accommodation as part of the application process; any other correspondence will not receive a response.

Lilly does not discriminate on the basis of age, race, color, religion, gender, sexual orientation, gender identity, gender expression, national origin, protected veteran status, disability or any other legally protected status.

#WeAreLilly