About the Job
We are seeking a highly skilled L3 Senior IT Infrastructure Operations Engineer to serve as a subject matter expert for our enterprise server and network infrastructure. This senior-level position demands deep technical expertise, strategic thinking, and leadership capabilities to drive operational excellence across the infrastructure operations function. The ideal candidate will possess extensive experience with Dell PowerEdge server ecosystems, Cisco enterprise networking, and modern infrastructure management practices. You will lead complex incident resolution, architect standardized processes, mentor junior engineers, and partner closely with leadership to execute hardware refresh projects, develop automation initiatives, and maintain the highest levels of system availability in a demanding 24x7 global environment.
Key Responsibilities
Serve as the subject matter expert for Dell PowerEdge server infrastructure, providing expert-level troubleshooting of physical components, system configurations, and complex hardware failures.
Lead advanced network troubleshooting and fault isolation for complex issues across Cisco routers and switches, resolving critical incidents that have been escalated from L1/L2 teams.
Partner with the FTE IT Team Lead to develop and execute standardized processes for periodic firmware, BIOS, and driver updates across the entire server fleet, ensuring stability, security, and OS compatibility.
Develop, test, and maintain standardized procedures for IOS/NX-OS firmware and software updates across the Cisco infrastructure, ensuring compliance with security and operational requirements.
Lead hardware lifecycle management initiatives including planning and executing hardware refresh projects, EOL migrations, and ensuring continuous system availability during transitions.
Partner with the SRE team to develop comprehensive monitoring dashboards and integrate logging tools, creating proactive alerting mechanisms that flag instability or security events before service impact.
Collaborate with the SRE team to create and maintain code repositories for network configurations, enabling version control, configuration auditing, and rapid rollback capabilities.
Lead blameless post-mortem reviews following major incidents, driving root cause analysis, identifying systemic issues, and implementing preventative measures to enhance future reliability.
Partner with the IT Infrastructure Installation team to create and maintain detailed network diagrams, operational runbooks, and comprehensive documentation for all service architectures.
Conduct regular capacity planning and performance tuning assessments, providing strategic recommendations to leadership for infrastructure optimization and growth planning.
Mentor and develop L1 and L2 engineers through technical coaching, knowledge sharing sessions, and establishing best practices for infrastructure operations excellence.
Manage vendor relationships for hardware support, coordinating warranty claims, evaluating service quality, and ensuring contractual SLA compliance for break/fix and replacement activities.
Required Skills
Bachelor's degree in Computer Science, Information Technology, or related field with 7+ years of progressive experience in enterprise IT infrastructure operations and support.
Expert-level proficiency with Dell PowerEdge server architecture, including deep knowledge of iDRAC/Redfish, hardware diagnostics, firmware management, and enterprise server lifecycle practices.
Advanced expertise in Cisco networking technologies including routing protocols, switching architectures, IOS/NX-OS administration, and complex network troubleshooting methodologies.
Strong experience with infrastructure automation, configuration management, and version control systems; familiarity with Infrastructure as Code principles and scripting languages.
Demonstrated leadership capabilities with experience mentoring technical teams, driving process improvements, and collaborating with cross-functional stakeholders on strategic initiatives.
Industry certifications such as CCNP/CCIE, Dell Proven Professional, or equivalent; ITIL certification preferred. Ability to provide 24x7 on-call support.