Rakuten

Kubernetes/Container Development/Operation Engineer - Cloud Services Department (CLSD)

Tokyo, Japan | Full time

Job Description:

Business Overview 
The Technology Platforms Division (TPD) drives the growth of Rakuten's ecosystem by delivering innovative, high-quality technology platforms characterized by integrated control and strategic partnerships.

  
Within TPD, the Cloud Platform Supervisory Department (CPSD) develops and manages Rakuten's state-of-the-art cloud platform, empowering global scalability and accelerating innovation across its diverse business units. 

Department Overview

The Cloud Services Department (CLSD) at Rakuten Group provides high-quality cloud infrastructure and platform services to application developers across Rakuten. Our mission is to enable secure, scalable, and efficient digital innovation. We deliver key domain services, including compute, storage, core infrastructure components, databases, container platform, observability, and gateway solutions, empowering Rakuten application teams to focus on their core business objectives.

Position:

Why We Hire

Rakuten Group, Inc.'s business is growing rapidly, and our private cloud is growing along with it. To support this growth, many interesting and ambitious projects are ongoing. We are looking for new members who will enjoy working on these projects. We also welcome people with a flexible mindset who can propose new ideas, drawing on both internal and external technologies, to further support Rakuten Group, Inc.'s growth.

 

Position Details

We are seeking a highly skilled and motivated Infrastructure Engineer with a strong background in Kubernetes, container technologies, and Linux systems, coupled with proven software development capabilities. In this role, you will be instrumental in designing, developing, and operating our core infrastructure, ensuring high availability, performance, and security. The ideal candidate will thrive in a fast-paced environment, embrace a "Get Things Done" mindset, and contribute to a culture of operational excellence within a large enterprise setting.

Key Responsibilities

1) Operation

- Cluster/Node Provisioning

- Alert/Incident Handling (24/7 on-call; daily rotation, roughly 1 day per week)

- OS/Middleware Update

- Meeting Security Requirements

- Midnight Releases and Midnight Monitoring

- Operation Manual Creation

- Risk Analysis of Production Environment Operation

2) Development

- Design/Proposal Documents (diagrams, pros/cons comparisons)

- Cluster/Node Auto-Provisioning

- OS/Middleware Auto-Upgrade

- Self-Healing Engineering

3) User Support and Migration Support

- Support special cases that the user support group cannot handle

- Support migration from the legacy platform to the new private cloud

 

Work Environment

- 16 members

- Language: Go, Python, Groovy, Shell Script

- Infrastructure: Private Cloud (Kubernetes, Bare Metal, VM, Container)

- Provisioning/Operation: Ansible, multiple in-house tools written in Go, the operator pattern (Red Hat Operator Framework, etc.), Jenkins

- Monitoring: Prometheus, Cortex, Grafana, Kibana, Datadog, PagerDuty

- CI/CD: Jenkins

- Knowledge Tool: Confluence

- Project Management: JIRA

- Communication Tool: Slack, MS Teams, Viber 

 

Mandatory Qualifications:

- Certified Kubernetes Administrator (CKA) Holder (Note: If not currently held, successful candidates are required to obtain CKA certification within 3 months of joining) 

- Experience designing and developing web services in a statically typed programming language (Go, Java, C++, etc.)

- 3+ years of operations experience at a large company

- Strong sense of responsibility for maintaining system stability and delivering artifacts on schedule

- A "get things done" mindset for meeting project deadlines

- Experience leading projects

- Deep understanding of, and hands-on experience with, Kubernetes/container/Linux provisioning and troubleshooting

- Basic knowledge of networking (TCP/IP)

- Basic knowledge of distributed systems and high-availability (HA) architecture

- Experience operating large-scale systems (100+ servers)

- Willingness to follow strict rules, such as the document creation and approval processes that are mandatory for infrastructure work

 

Desired Qualifications:

- Participation in open source activities (OSS contributor)

- Bachelor's or Master's degree in computer science, engineering, or a related field

- Experience automating large-scale system operations

- Experience developing medium- to large-scale applications

- Experience with multiple monitoring tools (Prometheus, Cortex, Grafana, Datadog, New Relic, Elasticsearch, Kibana, etc.)

- Private/public cloud experience

#engineer #infrastructureengineer #technologyplatformdiv 

Languages:

English (Overall - 4 - Fluent)