Encora

Incident and Problem Management Lead - IT Infrastructure

Kuala Lumpur Full Time

 

The Incident and Problem Management Lead is responsible for ensuring the effective management of IT incidents and problems to minimize business impact and prevent recurrence. This role oversees the end-to-end process and drives timely incident resolution, root cause analysis, and continuous improvement initiatives. Additionally, the role manages a 24x7 Command Center operation with a team of 12 staff, ensuring continuous monitoring, rapid response, and operational excellence. As a leader in IT service operations supporting digital transformation, this role champions AI adoption, process optimization, and shift-left strategies to enhance service delivery and reduce manual overhead while maintaining operational stability.

 

Responsibilities

Incident Management:

  • Lead the incident management process to ensure rapid restoration of services.
  • Coordinate major incident response, including communication with stakeholders and escalation management.
  • Ensure adherence to SLAs and KPIs for incident resolution.
  • Maintain accurate incident records and reporting.

 

Problem Management:

  • Drive root cause analysis for recurring incidents and major problems.
  • Oversee the implementation of permanent fixes and preventive measures.
  • Maintain the knowledge base of problems and ensure effective knowledge sharing.
  • Collaborate with engineering and operations teams to reduce recurring incident volume.
  • Review incident trends to identify preventive measures that reduce incident occurrence.

 

Command Center Operations:

  • Manage a 24x7 Command Center with 12 staff across rotating shifts.
  • Ensure continuous monitoring of critical systems and proactive detection of issues.
  • Establish clear escalation protocols and ensure timely response to alerts.
  • Optimize staffing schedules and maintain high team performance.
  • Implement automation and tools to improve operational efficiency.

 

Process Governance & Continuous Improvement:

  • Define and enforce incident and problem management policies and procedures, and ensure they are reviewed annually.
  • Monitor process performance and identify improvement opportunities.
  • Provide training and guidance to teams and partners on best practices.
  • Prepare and present regular reports to senior management.
  • Implement shift-left strategies to streamline Infra Operations responses to common alerts and incidents.
  • Act as the point of contact for audits related to Incident and Problem Management.

 

Stakeholder Management:

  • Act as the escalation point of contact for incident and problem management.
  • Communicate effectively with business units, vendors, and leadership during critical events.
  • Ensure transparency and timely updates throughout the incident lifecycle, including post-incident reporting to Group Risk Management.

 

Champion the culture, conduct, and behavioral expectations within the Department/Division.

Ensure compliance with IT policies, contribute to the risk culture, and participate in audits.

 

About Encora

Encora is a global company that offers Software and Digital Engineering solutions. Our practices include Cloud Services, Product Engineering & Application Modernization, Data & Analytics, Digital Experience & Design Services, DevSecOps, Cybersecurity, Quality Engineering, and AI & LLM Engineering, among others.

At Encora, we hire professionals based solely on their skills and do not discriminate based on age, disability, religion, gender, sexual orientation, socioeconomic status, or nationality.