Workshop for Efficient Data Handling, Data Exchange and Storage Performance

Europe/Berlin
Hendrik Nolte (NHR@Göttingen)
Description

Within this admin workshop, all NHR centers will briefly (10-15 minutes each) present their data management services. This includes not only specific data management systems, but also the available storage tiers with their individual performance profiles, data exchange services, tools for workflow orchestration and provenance auditing, and the training courses offered. These talks should serve as a foundation for determining the current state within the NHR and give admins the chance for an open and critical exchange: Was there a service that users accepted very well, or one that was not worth the effort? What data management problems do admins encounter on a day-to-day basis? Which services and training courses can be recommended to users?
In addition to defining this baseline, the workshop should serve as a chance to network and brainstorm ideas for future directions, particularly on the question of what HPC centers can do to support the increasing volume of data-intensive projects.

    • 09:30 09:45
      Welcome
      Convener: Prof. Julian Kunkel (NHR@Göttingen)
    • 09:45 10:00
      Research data services and current challenges at NHR@ZIB

      The presentation will outline the current research data services at NHR@ZIB. A practical example from an ongoing data-intensive project will illustrate the challenges and practical considerations that arise in the planned handling of several hundred terabytes of data, both during transfer to the NHR center and during processing within the center.

    • 10:00 10:15
      Coscine – playing FAIR

      For many researchers, engagement with the FAIR principles does not begin until the publication of an article and the sometimes obligatory transfer of the research data to a repository. At this point, a significant amount of valuable information about the research project is often already lost. As a result, only a fraction of the data (and metadata) collected during a research project is ever published. This is particularly challenging when HPC systems are used, since large amounts of data are then generated, analyzed, and transferred. One way to make research data FAIR from the very beginning of its life cycle is to work day to day in a storage environment that implicitly implements the FAIR principles. To create such a storage environment, the research data management platform Coscine was developed at RWTH Aachen University. Coscine provides an integrated concept for research (meta)data management in addition to storage, management, and archiving of research data. To help researchers interact with Coscine through its interfaces and to improve integration with existing data management processes, tools, programs, and consultation on the technical adaptation of the platform are provided. This includes collecting or extracting metadata based on the data or on the environment in which it was generated. In this talk, we present the research data management platform Coscine, the underlying storage system, and methods for transferring data to and from Coscine.

      Convener: Mr Marcel Nellesen (RWTH)
    • 10:15 10:30
      Current State and Future Plans to Support Data-Intensive Projects

      This talk will provide a quick overview of the challenges we currently see our users facing when working on data-intensive projects. It outlines our current state and future plans, including specific storage tiers that serve the different I/O-related needs of our users. In addition, related services and training courses are mentioned, and the talk will end with some far-fetched ideas to prompt a lively discussion.

      Convener: Mr Hendrik Nolte (NHR@Göttingen)
    • 10:30 10:45
      Data Management at NHR@FAU

      Johannes Veh gives a short introduction to the different storage systems at NHR@FAU as well as insight into the work of the "FAU Competence Center for Research Data and Information (FAU CDI)".

      Convener: Dr Johannes Veh (FAU)
    • 10:45 11:00
      Research Data Management Challenges at PC2

      We give a short overview of the existing systems and services for research data management at PC2, as well as current efforts and challenges.
      The focus is on services that are suitable for practical scientific computational work and their integration into workflows and programs.

      Convener: Dr Robert Schade (PC2)
    • 11:00 11:30
      Break / Discussion
    • 11:30 11:45
      Building a Data Transfer Federation Between Research Centers

      The Steinbuch Centre for Computing (SCC) at Karlsruhe Institute of Technology (KIT) designed a data transfer federation in the context of the NFDI4Ing project and in close collaboration with Heidelberg University in the bwHPC-S5 project. The proposed federation will allow researchers to access and transfer data between different storage systems using their home organization's user account. The storage systems in a federation are either dedicated large-scale systems or systems associated with High-Performance Computing. At SCC, we have integrated our Large Scale Data Facility: Online Storage (LSDF OS) with the WebDAV protocol and OAuth2 authentication to give third-party applications access to the storage service. Additionally, we are working on integrating HoreKa's storage into the data transfer federation.
      We have also deployed an instance of the File Transfer Service (FTS) on-premises and integrated it with our identity provider (IdP). FTS is a low-level data management service that schedules reliable bulk transfer of files from one site to another. However, FTS is designed to be used with one identity provider, whereas in a federation, more than one IdP is involved. To address this limitation, we have designed a central IdP to issue tokens that are recognizable by the downstream identity providers which manage users’ access to their corresponding storage service in the federation. At each storage service, tokens issued by the central IdP are resolved to the corresponding user identity at the local IdP via a mapping policy. To achieve this, a unified approach in the federation regarding the information included in the token is necessary. This will enable users to have seamless access to a wider range of resources, thereby enhancing collaboration between research centers.
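
      The mapping step described above can be sketched as follows. This is a minimal illustration, not the SCC implementation: the issuer URL, the claim names `iss` and `sub`, the account names, and the helper `resolve_local_user` are all assumptions made for the example, and the token signature is assumed to have been verified beforehand.

```python
from typing import Dict, Optional, Tuple

# Hypothetical issuer URL of the central IdP (an assumption for this sketch).
CENTRAL_ISSUER = "https://central-idp.example.org"

# A per-site mapping policy: (issuer, subject) -> local account name.
MappingPolicy = Dict[Tuple[str, str], str]


def resolve_local_user(
    claims: dict,
    policy: MappingPolicy,
    trusted_issuer: str = CENTRAL_ISSUER,
) -> Optional[str]:
    """Map already-verified token claims to a local account, or return None.

    Signature and expiry checks are out of scope here; a real deployment
    would validate the token before this step.
    """
    issuer = claims.get("iss")
    subject = claims.get("sub")
    # Only tokens issued by the trusted central IdP are considered.
    if issuer != trusted_issuer or subject is None:
        return None
    # Unknown users are rejected rather than mapped to a default account.
    return policy.get((issuer, subject))


# Two hypothetical storage sites, each with its own local account names.
site_a_policy: MappingPolicy = {(CENTRAL_ISSUER, "alice-federated-id"): "ab1234"}
site_b_policy: MappingPolicy = {(CENTRAL_ISSUER, "alice-federated-id"): "alice_hd"}

claims = {"iss": CENTRAL_ISSUER, "sub": "alice-federated-id"}
print(resolve_local_user(claims, site_a_policy))
print(resolve_local_user(claims, site_b_policy))
```

      The same token resolves to a different local account at each site, which is why the sites need to agree on which token fields the mapping relies on.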

      Convener: Dr Mozhdeh Farhadi (KIT)
    • 11:45 12:00
      Data and Storage, Present and Plans at NHR@TUD

      A brief overview of the current data infrastructure at TUD and best practices, complemented by considerations of the challenges of a future TUD data management platform.

      Convener: Mr Christian Löschen (ZIH)
    • 12:00 12:30
      Concluding Discussion