Roche is hiring an Infrastructure Management and Provisioning Engineer

Job Description

At Roche you can show up as yourself, embraced for the unique qualities you bring. Our culture encourages personal expression, open dialogue, and genuine connections, where you are valued, accepted and respected for who you are, allowing you to thrive both personally and professionally. This is how we aim to prevent, stop and cure diseases and ensure everyone has access to healthcare today and for generations to come. Join Roche, where every voice matters.

The Position

Job description

As an Infrastructure Management and Provisioning Engineer within the Accelerated Compute Engineering (ACE) team, you will be responsible for overseeing and advancing our core infrastructure management and provisioning tech stack. This role has a strong focus on driving configuration-as-code, infrastructure-as-code (IaC), and modern automated provisioning best practices across our high-performance computing (HPC) environments and industry-leading AI Factory.

You will own the lifecycle, deployment, and optimization of bare-metal and virtualized compute environments that power Roche's advanced computing initiatives. By treating infrastructure strictly as code and eliminating manual configurations, you will ensure our advanced clusters are highly reproducible, securely patched, and rapidly scalable to meet the evolving demands of computational science and large-scale AI workloads.

Description of the area

Hosting and Infrastructure (HI) provides mission-critical on-premises infrastructure, cloud hosting, connectivity, and technology products that enable all functions at every Roche site to develop, innovate, connect, and deliver compliant digital products across the Roche Enterprise.

The Value Streams - Accelerated Compute Engineering (ACE) Team is focused on driving both customer success and platform success by acting as a center of excellence and delivery for the High Performance Compute and AI Infrastructure supporting AI and HPC use cases across Roche. The team facilitates seamless onboarding and adoption for business-vertical customers who need accelerated compute. It serves infrastructure consumers whose requirements emphasize high availability, seamless data transfer, flexibility, speed, and the rapidly changing needs of AI, helping them achieve rapid time-to-value.

Job Responsibilities

Automated Provisioning & Cluster Orchestration

  • Design, deploy, and manage large-scale automated provisioning systems for multi-node HPC and AI Factory environments.

  • Own and maintain the infrastructure management and provisioning tech stack underpinning the orchestration, monitoring, and provisioning of complex GPU and CPU workloads.

  • Streamline bare-metal provisioning and node imaging pipelines to ensure minimal downtime and rapid expansion capabilities.

Infrastructure-as-Code (IaC) & Configuration Governance

  • Enforce a strict configuration-as-code and infrastructure-as-code mindset, replacing manual interventions with repeatable automation scripts.

  • Author, review, and maintain complex Ansible playbooks and roles for configuration management, patch deployment, and compliance drift remediation.

  • Establish robust CI/CD pipelines using GitLab to test, validate, and deploy infrastructure changes safely across development, staging, and production clusters.

Operating System Engineering & Lifecycle Management

  • In partnership with Enterprise OS teams, standardize and manage operating system builds across both HPC and AI Factory platforms.

  • Utilize solutions such as Red Hat Image Builder and NVIDIA Base Command Manager to create optimized, compliant, and secure custom golden images tailored for AI and high-performance computing workloads.

  • Manage OS lifecycles, including kernel tuning, automated package updates, and vulnerability management, ensuring alignment with global security standards.

Platform Reliability & Collaboration

  • Implement proactive monitoring and alerting for infrastructure provisioning health, node availability, and configuration drift.

  • Address and help resolve complex, systemic infrastructure failures, contributing to post-mortem analyses to continuously improve platform resilience.

Qualifications

Education / Experience

  • Bachelor’s or advanced degree in Computer Science, Computer Engineering, or a similar technical discipline.

  • 5+ years of experience in systems engineering, DevOps, or platform infrastructure roles, with a proven track record of managing enterprise Linux environments at scale.

  • Deep, practical knowledge of operating system internals for both RHEL and Ubuntu.

Technical & Business Skills

  • Automation & Orchestration: Advanced capability with Ansible on the command line and experience building scalable infrastructure pipelines using GitLab CI/CD.

  • Provisioning Tooling: Experience using NVIDIA Base Command Manager (Bright Cluster Manager) and Red Hat Image Builder (or related tools like Kickstart/Satellite).

  • Modern Engineering Mindset: Strong adherence to git-based workflows, code-review methodologies, and infrastructure-as-code principles.

  • Troubleshooting Depth: Ability to isolate complex, multi-layered faults bridging hardware, kernel configurations, and automation scripts.

Leadership & Mindset

  • Lean & Agile Mindset: Passionate about continuous improvement, eliminating technical debt, and automating repetitive tasks to achieve scale.

  • Collaboration & Communication: Strong collaborative skills with an enterprise mindset, capable of working fluidly across team boundaries to drive platform success.

  • Intellectual Curiosity: Highly self-motivated to explore and adopt emerging technologies in the fast-evolving landscape of HPC and AI infrastructure engineering.

Who we are

A healthier future drives us to innovate. Together, more than 100’000 employees across the globe are dedicated to advancing science, ensuring everyone has access to healthcare today and for generations to come. Our efforts result in more than 26 million people treated with our medicines and over 30 billion tests conducted using our Diagnostics products. We empower each other to explore new possibilities, foster creativity, and keep our ambitions high, so we can deliver life-changing healthcare solutions that make a global impact.


Let’s build a healthier future, together.

Roche is an Equal Opportunity Employer.
