Data Lake Admin Workshop

BigBlueButton (Online)


Julian Kunkel (GWDG), Hendrik Nolte (GWDG), Andreas Knuepfer (TU Dresden)

In recent years, classic HPC users have shown ever-increasing interest in using the public cloud as part of traditional HPC workflows. There are many reasons for this; for example, special hardware components such as TPUs or special GPUs are available in the cloud earlier than in a local data center. In addition, users need to store data for analysis with AI methods in different data silos and to access it flexibly from both HPC and cloud systems. Flexible data migration and provisioning in the data lake therefore play a central role in data analytics workflows. For this purpose, highly scalable object storage, mostly accessed via an S3 interface, has long been established in the cloud. From the user's point of view, another advantage of a consistent data management strategy, as offered by a data lake, is the uniform view it provides across the individual data silos.

This ongoing shift in the usage model of HPC systems requires admins to extend their consulting, software, and hardware offerings. This admin workshop will be split into two distinct parts. In the first part, three talks will present three different data management tools from an HPC perspective. This session will be accompanied by an honest and critical discussion of the shortcomings of each tool. The goal is not to promote one or all of these tools, but to identify common challenges and unique solutions, as a first step toward developing an NHR-wide strategy for data management.

In the second part, related topics will be presented and discussed, ranging from object storage from an HPC perspective and using an HPC system through a REST API to securely processing sensitive (medical) data on a shared HPC system. This session will conclude with a joint discussion about high-performance data analytics (HPDA), big data analytics (BDA), and scientific data management in general. The aim is to foster further collaboration on these topics across the different NHR centers.

Important Information

Date and Time: Thursday, September 29, 2022, 13:00 - 17:00
Venue: Virtual / Room: Data Lake Admin Workshop
Organizers: Julian Kunkel (Uni Göttingen/GWDG),
  Hendrik Nolte (GWDG),
  Andreas Knuepfer (TU Dresden),
  Alexander Goldmann (GWDG)


This workshop is funded by GWDG and supported by NHR.
