9–11 Oct 2024
Mannheim, Schloss
Europe/Berlin timezone

Text+ LLM Service

10 Oct 2024, 16:45
1h 15m
O 138 (Fuchs-Petrolub-Saal) (Mannheim, Schloss)

Schloss 68161 Mannheim

Speakers

Alexander Steckel (Georg-August-Universität Göttingen)
Umut Basaran (Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen)
Stefan Buddenbohm (Georg-August-Universität Göttingen)
Maik Wegener (Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen)
Philipp Wieder (Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen)

Description

AI as a Service

Text- and language-based humanities offer extensive use cases for Large Language Models (LLMs). Text+ currently facilitates access to research data via the Text+ Registry, the Federated Content Search (FCS), and contributing partners' data repositories. Through GWDG, a national high-performance computing and AI center, an additional web service will be made available on the Text+ website, providing free access to open-source and custom fine-tunable LLMs such as (Meta) Llama, Mixtral, Qwen, and Codestral, as well as OpenAI's ChatGPT [1].

Text+ aims to be the first NFDI consortium to host an LLM service for its user base, ensuring that researchers' data remain private and are not stored without their consent [2]. This emphasis on data protection is particularly important when dealing with sensitive and/or copyrighted materials.
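The privacy model described here, where individual users are not exposed to an external provider, can be illustrated with a minimal sketch. The field names and the shared identity below are illustrative assumptions, not the actual GWDG implementation:

```python
# Illustrative sketch: pool all users behind one shared identity before a
# chat request leaves the service. Field names are hypothetical and do not
# reflect the actual GWDG implementation.
def anonymize_request(request: dict) -> dict:
    """Return a copy of the request with user-identifying fields removed."""
    identifying_fields = {"user_id", "email", "session_id", "client_ip"}
    pooled = {k: v for k, v in request.items() if k not in identifying_fields}
    # Every request is attributed to the same shared service identity,
    # so the external provider cannot distinguish individual users.
    pooled["user"] = "textplus-shared"
    return pooled

req = {"user_id": "u42", "email": "a@b.de",
       "messages": [{"role": "user", "content": "Hi"}]}
print(anonymize_request(req))
```

In this pattern, only the pooled request ever reaches the external provider, which is one way to make "all users appear as one".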

Implementation and Advantages

The LLM service enables users to create, edit, and delete custom LLMs. A collaborations section allows users to invite collaborators to chat with the custom LLMs. The service offers free use of various open-source models, a sources section on generated answers that lets users check and enable citations, retrieval-augmented generation on personal documents, and compliance with legislative requirements and user privacy interests. Currently, the service is available to project participants who log in via Academic Cloud [3]. The service excels at ensuring that no user-related data is transferred externally with the open-source LLMs, as the host servers are GWDG's [4]. With OpenAI's ChatGPT, no individual user-related data is transmitted externally, as the current implementation makes all users appear as one.
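Retrieval-augmented generation with citable sources, as mentioned above, can be sketched in a few lines: retrieve the document snippets most similar to the question and prepend them to the prompt as numbered sources. The word-overlap scoring and prompt wording below are simplifications for illustration; real systems use embeddings and a vector index:

```python
# Minimal sketch of retrieval-augmented generation (RAG): rank document
# snippets by naive word overlap with the question, then prepend the best
# matches to the prompt as numbered, citable sources. Illustrative only.
def words(text: str) -> set[str]:
    """Lowercase word set with surrounding punctuation stripped."""
    return {w.strip("?.,!;:") for w in text.lower().split()}

def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the question."""
    q = words(question)
    return sorted(documents, key=lambda d: len(q & words(d)), reverse=True)[:k]

def build_prompt(question: str, documents: list[str]) -> str:
    """Assemble a prompt with numbered sources the model can cite."""
    context = "\n".join(
        f"[{i + 1}] {s}" for i, s in enumerate(retrieve(question, documents))
    )
    return f"Answer using the sources and cite them:\n{context}\n\nQuestion: {question}"

docs = [
    "GermaNet is a lexical-semantic net for German.",
    "The Federated Content Search queries distributed repositories.",
    "Mannheim Palace hosts university lectures.",
]
print(build_prompt("What is GermaNet?", docs))
```

Because the retrieved snippets carry numbers, the model's answer can reference them, which is the basis for a checkable sources section.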

(Future) Use Cases

Within the context of Text+, the service is planned to assist in various domains. Data preprocessing using Named Entity Recognition (NER), APIs with external ports opening in a GPU-supported runtime environment for Docker containers, and context knowledge via Entity Linking are covered. Additional scenarios, to name just a few, include supporting Federated Content Search query formulation based on natural-language descriptions, improving GermaNet entries by generating example sentences, historical normalization through seq2seq transformer models, and APIs for components offering neural models such as speech reproduction and event detection.
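The first use case above, NER output feeding entity linking, can be sketched with a toy dictionary-based linker. Real pipelines would use trained NER models and authority files such as GND or Wikidata; the knowledge-base identifiers below are hypothetical:

```python
# Toy sketch of entity linking on NER output: map recognized surface forms
# to knowledge-base identifiers. The identifiers are hypothetical; real
# pipelines link against authority files such as GND or Wikidata.
KB = {
    "Göttingen": "EX-001",  # hypothetical id
    "Mannheim": "EX-002",   # hypothetical id
}

def link_entities(ner_spans: list[tuple[str, str]]) -> list[dict]:
    """Attach a KB id to each (text, label) span, or None if unknown."""
    return [
        {"text": text, "label": label, "kb_id": KB.get(text)}
        for text, label in ner_spans
    ]

spans = [("Göttingen", "LOC"), ("Mannheim", "LOC"), ("Text+", "ORG")]
print(link_entities(spans))
```

Linked identifiers are what turn raw NER spans into the "context knowledge" the abstract refers to, since they connect mentions to external reference data.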

Feedback

As an agile development, the LLM service will undergo constant enhancement of its functionality and accessibility over time, with user feedback playing a major role. Users will therefore be asked to use the contact form [5] to share their experiences and suggestions. This service is also just the first step towards a growing number of LLM-related offerings.

References

  1. https://kisski.gwdg.de/leistungen/2-02-llm-service/

  2. Cf. https://www.researchgate.net/publication/381883055_Chat_AI_A_Seamless_Slurm-Native_Solution_for_HPC-Based_Services

  3. https://academiccloud.de/

  4. Cf. https://datenschutz.gwdg.de/services/chatai

  5. https://text-plus.org/helpdesk/#kontaktformular

Presentation materials

There are no materials yet.