NHR Data Lakes Workshop

Europe/Berlin
online

Freja Nordsiek (GWDG)
Description

Data-driven science requires not only fast storage systems but also strategies to manage data efficiently within and across data centers. Big data tools can satisfy the need to search data based on user-specific metadata. However, there is a zoo of tools available, and no single tool can meet all the requirements of an HPC system in a data center. Data lakes, for example, are a reasonable approach, but there are alternative concepts and tools that also need to be considered. A uniform and consistent view of the millions of scientific data files on HPC systems, together with their efficient processing, is required to maximize exploitability and prevent segmented data silos between users or projects.

In this workshop we will discuss topics related to large-scale data management on HPC systems, ranging from the use of dedicated data management systems to secure and efficient data transfer strategies and the storage challenges caused by data-intensive computations.
