SecureHPC: Processing Sensitive Data on Shared HPC Systems

Europe/Berlin
https://meet.gwdg.de/b/mar-qyh-er6-r5p (virtual)

Description

Advances in data- and compute-intensive methods across numerous scientific domains are driving researchers who work with highly sensitive data to demand access to the computational resources needed to adopt those methods in their own disciplines. Integrating trustworthy security measures into existing High Performance Computing (HPC) clusters is an ongoing effort to meet these computing needs efficiently. The main difficulty in handling sensitive data securely is that HPC systems are usually shared systems optimized for efficiency rather than strong security. Moreover, relying on UNIX permissions alone is not secure enough, since new vulnerabilities are constantly being found. In this workshop, we will show you step by step how to use a secure workflow that has been developed and deployed on the Scientific Compute Cluster (SCC) to provide stronger security when analysing sensitive data on HPC systems.

    • 13:00 13:15
      Welcome

      Prof. Dr. Julian Kunkel and Prof. Dr. Dagmar Krefting will welcome the attendees.

      Conveners: Prof. Dagmar Krefting (UMG), Prof. Julian Kunkel (GWDG)
    • 13:15 13:40
      Data security requirements by UMG researchers

      The data security office of UMG will talk about the legal requirements for processing sensitive data with the secure HPC partition.

      Convener: Florian Gottschalk (UMG)
    • 13:40 14:10
      Introduction to Secure HPC: Infrastructure overview

      The secure workflow developed by the GWDG relies on an ensemble of methods and tools to ensure data privacy. It is automated end-to-end on a secure client, but it also allows manual work using the batch system. Instead of working on a run script on the HPC frontend, a user works on a similar file locally and executes our client script to submit the job to the secure partition.

      In this session, a high-level overview of the workflow will be presented with a focus on how it works on the user side.

      Convener: Trevor Khwam Tabougua (GWDG)
    • 14:10 14:25
      Use case 1: segmentation of pseudonymised T1 magnetic resonance images

      The Department of Diagnostic and Interventional Neuroradiology at the University Medical Center Göttingen is working on an approach that uses the GWDG Secure Workflow to segment pseudonymised T1 magnetic resonance images externally. The pipeline aims to generate reliable, reproducible and precise volumetric information on atrophy patterns within a patient's brain while running in parallel to the clinical workflow of a regular MRI scan. The goal is to provide the physician with quantified hints in a timely manner, right after the MRI scan has finished, to simplify the search for the correct diagnosis.

      The FastSurfer neural network used here was trained on unedited MRI images. As a result, almost any alteration of the actual imaging data made to maximize the protection of patient data (e.g. removal of facial features) leads to inaccurate and unusable segmentation results, which makes the use of externally provided hardware a difficult venture.

      The need for the best possible data protection, combined with the requirement for powerful hardware to meet the time constraints, makes this a well-fitting use case for the Secure Workflow on GWDG's HPC cluster.

      Convener: Philip Langer (UMG)
    • 14:25 14:40
      Use case 2: polysomnography

      Polysomnography (PSG) is a multimodal measurement of multiple biosignals such as electroencephalography (EEG), electrocardiography (ECG) and electromyography (EMG), and is used in clinical practice for sleep stage scoring. Traditionally, a somnologist visually assesses the signals in parallel in 30 s windows and classifies each window into one of five sleep stages. This is a tedious and time-consuming task that also suffers from intra- and inter-rater variability. Recently, this has led to the development of end-to-end machine learning approaches for processing PSG and computing sleep stages, which in turn call for high performance computing (HPC) for effective processing. However, due to their multimodality, PSGs are highly sensitive: they allow multiple types of disease to be identified, and it has already been demonstrated that individuals can be identified from ECG or EEG data to a certain degree.

      Therefore, the Biosignal Processing Group of the Department of Medical Informatics at the University Medical Center Göttingen uses the GWDG HPC Secure Workflow with the aim of providing end-to-end sleep staging based on protected PSG data.

      Convener: Dr Nicolai Spicher (UMG)
    • 14:40 15:10
      Discussion about possible improvements / requirements
    • 15:10 16:00
      Hands-on and Q&A

      Trevor and Hendrik will answer technical questions and go through remaining issues.

      Conveners: Trevor Khwam Tabougua (GWDG), Hendrik Nolte (GWDG)