Joint NHR Data Management Training

Europe/Berlin
BigBlueButton (Online)

Hendrik Nolte (GWDG)
Description

This is a joint NHR training held by five NHR centers. It consists of sessions covering a diverse set of challenges that arise in data management for HPC workloads. Although the sessions build on each other, they can be taken individually. However, to participate effectively in selected sessions, participants should be reasonably familiar with the concepts taught in earlier ones. The entire course takes place online and spans two days.

The course starts with a basic introduction to data management on HPC systems and its specific challenges. This includes the concept of storage tiering and how HPC workflows can be designed to use the tiers optimally. Also covered are permission concepts for organizing larger consortia efficiently and isolating users within their own well-defined space, along with further techniques for data sharing and data cataloging. All of these concepts are supplemented by hands-on sessions.

Then, further details on metadata and their extraction are given, followed by the introduction of dedicated data management systems, with a specific focus on Coscine.
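
As a rough illustration of what automatic metadata extraction can mean at its simplest, the following Python sketch collects a few technical properties of a file into a dictionary. The function name `extract_metadata` and the chosen fields are assumptions made for this example; they do not follow any particular metadata schema and are not taken from Coscine or from the course material.

```python
import hashlib
import os
from datetime import datetime, timezone


def extract_metadata(path):
    """Collect basic technical metadata for one file (hypothetical helper)."""
    st = os.stat(path)
    sha = hashlib.sha256()
    # Read in 1 MiB chunks so large files do not have to fit in memory.
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            sha.update(chunk)
    return {
        "name": os.path.basename(path),
        "size_bytes": st.st_size,
        "modified_utc": datetime.fromtimestamp(
            st.st_mtime, tz=timezone.utc
        ).isoformat(),
        "sha256": sha.hexdigest(),
    }
```

A record like this could be serialized with `json.dumps` and stored next to the data; domain-specific metadata would need additional, format-aware extractors.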

The second day starts with a deep dive into the Research Data Management Organizer (RDMO), a well-established tool for creating Data Management Plans (DMPs).

The course concludes with a detailed and holistic overview of storage systems. It starts by explaining what I/O, inodes, and files are. The differences between local file systems (like ext4) and parallel file systems (like BeeGFS or Lustre) and their implications are discussed. Then different access patterns for parallel I/O are introduced, and analysis tools like Darshan and Score-P are demonstrated. The session concludes with a summary of I/O best practices.
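
To give a small flavor of the relationship between files and inodes, the Python sketch below creates a file and a hard link to it in a temporary directory and shows that both names refer to the same inode. The helper names `inode_info` and `demo` are made up for this example, and it assumes a file system that supports hard links (as ext4 does); it is not taken from the course material.

```python
import os
import tempfile


def inode_info(path):
    """Return (inode number, hard-link count, size in bytes) for a path."""
    st = os.stat(path)
    return st.st_ino, st.st_nlink, st.st_size


def demo():
    """Create a file plus a hard link and return the inode info of both names."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "data.bin")
        with open(original, "wb") as fh:
            fh.write(b"x" * 1024)
        alias = os.path.join(tmp, "alias.bin")
        # A hard link is a second directory entry pointing at the same inode.
        os.link(original, alias)
        return inode_info(original), inode_info(alias)
```

Both tuples returned by `demo()` are identical: the two names share one inode, its link count is 2, and the data is stored only once.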

Everyone can join this course for free thanks to the funding that the project “Large Scale Data Management” received from “Nationales Hochleistungsrechnen”.

 

  • Tuesday, 5 November
    • 09:00 - 09:20
      Introduction to Research Data Management
      Convener: Marcel Nellesen (RWTH)
    • 09:20 - 10:50
      Introduction to Data Management on Tiered Storage Systems
      Convener: Hendrik Nolte (GWDG)
    • 10:50 - 11:15
      Break
    • 11:15 - 12:15
      Introduction to Snakemake and iRODS
      Convener: Christian Meesters (NHR Süd West)
    • 13:30 - 14:30
      Working Together: Data Sharing and User Isolation
      Convener: Hendrik Nolte (GWDG)
    • 14:30 - 15:00
      Introduction to Coscine
      Convener: Katja Jansen (RWTH)
    • 15:00 - 15:30
      Writing Storage Applications in JARDS
      Convener: Katja Jansen (RWTH)
    • 15:30 - 16:00
      Data Transfers Between HPC Systems and S3 Buckets
      Convener: Marcel Nellesen (RWTH)
    • 16:00 - 16:40
      Automatic Metadata Extraction (Including Metadata Schemas)
      Convener: Marcel Nellesen (RWTH)
  • Wednesday, 6 November
    • 10:30 - 12:00
      Introduction to Data Management Plans using RDMO
      Convener: Tim Hasler (ZIB)
    • 13:30 - 17:30
      Efficient Parallel I/O
      Convener: Sebastian Oeste (TUD)