The Incident and Problem Management Lead is responsible for ensuring the effective management of IT incidents and problems to minimize business impact and prevent recurrence. This role oversees the end-to-end process and drives timely incident resolution, root cause analysis, and continuous improvement initiatives. Additionally, the role manages a 24x7 Command Center operation with a team of 12 staff, ensuring continuous monitoring, rapid response, and operational excellence. As a leader in IT service operations supporting digital transformation, this role champions AI adoption, process optimization, and shift-left strategies to enhance service delivery and reduce manual overhead while maintaining operational stability.
Responsibilities
Incident Management:
- Lead the incident management process to ensure rapid restoration of services.
- Coordinate major incident response, including communication with stakeholders and escalation management.
- Ensure adherence to SLAs and KPIs for incident resolution.
- Maintain accurate incident records and reporting.
Problem Management:
- Drive root cause analysis for recurring incidents and major problems.
- Oversee the implementation of permanent fixes and preventive measures.
- Maintain the knowledge base of problems and ensure effective knowledge sharing.
- Collaborate with engineering and operations teams to reduce recurring incident volume.
- Review incident trends to identify measures that prevent incidents from occurring.
Command Center Operations:
- Manage a 24x7 Command Center with 12 staff across rotating shifts.
- Ensure continuous monitoring of critical systems and proactive detection of issues.
- Establish clear escalation protocols and ensure timely response to alerts.
- Optimize staffing schedules and maintain high team performance.
- Implement automation and tools to improve operational efficiency.
Process Governance & Continuous Improvement:
- Define and enforce incident and problem management policies and procedures, and ensure they are reviewed annually.
- Monitor process performance and identify improvement opportunities.
- Provide training and guidance to teams and partners on best practices.
- Prepare and present regular reports to senior management.
- Implement shift-left strategies to streamline Infra Operations responses to common alerts and incidents.
- Act as the point of contact for audits related to Incident and Problem Management.
Stakeholder Management:
- Act as the escalation point of contact for incident and problem management.
- Communicate effectively with business units, vendors, and leadership during critical events.
- Ensure transparency and timely updates throughout the incident lifecycle, including post-incident reporting to Group Risk Management.
Champion the culture, conduct, and behavioral expectations of the Department/Division.
Ensure compliance with IT policies, contribute to the risk culture, and participate in audits.
About Encora
Encora is a global company that offers Software and Digital Engineering solutions. Our practices include Cloud Services, Product Engineering & Application Modernization, Data & Analytics, Digital Experience & Design Services, DevSecOps, Cybersecurity, Quality Engineering, and AI & LLM Engineering, among others.
At Encora, we hire professionals based solely on their skills and do not discriminate based on age, disability, religion, gender, sexual orientation, socioeconomic status, or nationality.