pub H-BRS | 004 Data processing; Computer science

Textual Entailment Recognition with Semantic Features from Empirical Text Representation (2023)

Shajalal, Md ; Atabuzzaman, Md. ; Baby, Maksuda Bilkis ; Karim, Md. Rezaul ; Boden, Alexander

Harnessing Prior Knowledge for Explainable Machine Learning: An Overview (2023)

Beckh, Katharina ; Müller, Sebastian ; Jakobs, Matthias ; Toborek, Vanessa ; Tan, Hanxiao ; Fischer, Raphael ; Welke, Pascal ; Houben, Sebastian ; Rueden, Laura von

Generating Musical Compositions through a Data-Driven Approach along with Static Implementations of Theoretical Principles (2022)

Jiang, Daniel

In the field of automatic music generation, one of the greatest challenges is consistently generating pieces that the majority of the audience perceives positively, since there is no objective method to determine the quality of a musical composition. However, composing principles, refined over millennia, have shaped the core characteristics of today's music. A hybrid music generation system, mlmusic, which combines various static, music-theory-based methods with data-driven subsystems, is implemented to automatically generate pieces that the average listener considers acceptable. First, a MIDI dataset of over 100 hand-picked pieces of various styles and complexities is analysed using basic music theory principles, and the abstracted information is fed into explicitly constrained LSTM networks. For chord progressions, each network is trained on one specific sequence length, while phrases are created by consecutively predicting each note's offset, pitch and duration. Using these outputs as the foundation of a composition, additional musical elements, along with constrained recurring rhythmic and tonal patterns, are generated statically. Although no survey of how the pieces were received could be carried out, the successful generation of numerous compositions of varying complexity suggests that integrating these fundamentally distinct approaches might lead to success in other branches as well.
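The abstract does not say how the LSTM outputs are "explicitly constrained". One plausible reading is sketched below, purely as an assumption: a phrase is a list of (offset, pitch, duration) notes, a model proposes a score for every MIDI pitch, and pitches outside the current key's major scale are masked out before the best one is chosen. The names `Note`, `constrain_pitch`, `generate_phrase` and `toy_predictor` are hypothetical, and `toy_predictor` is only a stand-in for a trained network.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Pitch classes of the major scale relative to its tonic (0 = tonic, 11 = leading tone).
MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)


@dataclass(frozen=True)
class Note:
    offset: float    # onset in quarter notes from the start of the phrase
    pitch: int       # MIDI note number (60 = middle C)
    duration: float  # length in quarter notes


# Hypothetical stand-in for a trained network: given the phrase so far, it returns
# (time step to the next onset, one score per MIDI pitch 0..127, duration).
Predictor = Callable[[list[Note]], tuple[float, list[float], float]]


def constrain_pitch(scores: Sequence[float], tonic: int, scale=MAJOR_SCALE) -> int:
    """Pick the best-scoring pitch after masking out every pitch outside the scale."""
    allowed = {(tonic + step) % 12 for step in scale}
    candidates = [p for p in range(len(scores)) if p % 12 in allowed]
    return max(candidates, key=lambda p: scores[p])


def generate_phrase(predict: Predictor, seed: list[Note], length: int, tonic: int) -> list[Note]:
    """Extend `seed` by `length` notes, one (offset, pitch, duration) note at a time."""
    phrase = list(seed)
    for _ in range(length):
        step, pitch_scores, duration = predict(phrase)
        offset = phrase[-1].offset + step if phrase else 0.0
        pitch = constrain_pitch(pitch_scores, tonic)
        phrase.append(Note(offset, pitch, duration))
    return phrase


def toy_predictor(phrase: list[Note]) -> tuple[float, list[float], float]:
    """Toy model that always 'wants' to move up 1.5 semitones from the last note."""
    target = phrase[-1].pitch + 1.5
    return 1.0, [-abs(p - target) for p in range(128)], 1.0


if __name__ == "__main__":
    notes = generate_phrase(toy_predictor, [Note(0.0, 60, 1.0)], 4, tonic=0)
    print([n.pitch for n in notes])  # → [60, 62, 64, 65, 67] (C D E F G in C major)
```

Masking at decoding time keeps the network itself unconstrained while guaranteeing that every emitted pitch fits the key; constraining the training targets or the loss would be equally valid alternatives.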

From Zero to Hero: Generating Training Data for Question-To-Cypher Models (2022)

Opitz, Dominik ; Hochgeschwender, Nico

Graph databases employ graph structures such as nodes, attributes and edges to model and store relationships among data. To access this data, graph query languages (GQL) such as Cypher are typically used, which end users may find difficult to master. In the context of relational databases, sequence-to-SQL models, which translate natural language questions into SQL queries, have been proposed. While these Neural Machine Translation (NMT) models increase the accessibility of relational databases, NMT models for graph databases are not yet available, mainly due to the lack of suitable parallel training data. In this short paper, we sketch an architecture that enables the generation of synthetic training data for the graph query language Cypher.
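To make "synthetic parallel training data" concrete, the sketch below fills question/query templates from a graph schema to produce (question, Cypher) pairs. This is an illustrative assumption, not the architecture from the paper: the `Relation` type, the template strings, the `name` property on nodes and the example schema (`Person`, `ACTED_IN`, `Movie`) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    source_label: str  # node label of the subject, e.g. "Person"
    rel_type: str      # relationship type, e.g. "ACTED_IN"
    target_label: str  # node label of the object, e.g. "Movie"
    verb: str          # natural-language phrasing of the relationship
    noun: str          # plural noun used for target nodes in questions


# Each template pairs a question pattern with the Cypher query that answers it.
# Nodes are assumed (hypothetically) to store their display value in a `name` property.
TEMPLATES = [
    ("Which {noun} did {name} {verb}?",
     "MATCH (s:{source} {{name: '{cypher_name}'}})-[:{rel}]->(t:{target}) RETURN t.name"),
    ("How many {noun} did {name} {verb}?",
     "MATCH (s:{source} {{name: '{cypher_name}'}})-[:{rel}]->(t:{target}) RETURN count(t)"),
]


def escape_cypher_string(value: str) -> str:
    """Escape backslashes and single quotes for a single-quoted Cypher string literal."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def generate_pairs(relations: list[Relation], entities: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Instantiate every template for every (relation, source entity) combination."""
    pairs = []
    for rel in relations:
        for name in entities.get(rel.source_label, []):
            fields = dict(noun=rel.noun, verb=rel.verb, name=name,
                          cypher_name=escape_cypher_string(name),
                          source=rel.source_label, rel=rel.rel_type,
                          target=rel.target_label)
            for question, query in TEMPLATES:
                pairs.append((question.format(**fields), query.format(**fields)))
    return pairs


if __name__ == "__main__":
    acted_in = Relation("Person", "ACTED_IN", "Movie", "act in", "movies")
    for question, query in generate_pairs([acted_in], {"Person": ["Tom Hanks"]}):
        print(question, "=>", query)
```

Templates give exact, executable targets for free, but yield little linguistic variety; a real pipeline would likely add paraphrasing or sample entities and relations from an actual database.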

Multimodal Transformers for Biomedical Text and Knowledge Graph Data (2022)

Balabin, Helena

Recent advances in Natural Language Processing have substantially improved contextualized representations of language. However, incorporating factual knowledge, particularly in the biomedical domain, remains challenging. Hence, many Language Models (LMs) are extended with Knowledge Graphs (KGs), but most approaches require entity linking (i.e., explicit alignment between text and KG entities). Inspired by single-stream multimodal Transformers operating on text, image and video data, this thesis proposes the Sophisticated Transformer trained on biomedical text and Knowledge Graphs (STonKGs). STonKGs uses a novel multimodal architecture based on a cross encoder that applies the attention mechanism to a concatenation of input sequences derived from text and KG triples, respectively. More than 13 million so-called text-triple pairs from PubMed, assembled using the Integrated Network and Dynamical Reasoning Assembler (INDRA), were used in an unsupervised pre-training procedure to learn representations of biomedical knowledge in STonKGs. The proposed knowledge integration method was empirically validated by comparing STonKGs to an NLP baseline and a KG baseline (operating on text or KG data only, respectively) on a benchmark of eight fine-tuning tasks. Specifically, on tasks with comparatively small datasets and a larger number of classes, STonKGs yielded considerable performance gains, exceeding the F1-score of the best baseline by up to 0.083. The source code used to implement STonKGs is publicly available so that the proposed method can be extended to many other biomedical applications.
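A single-stream cross encoder sees both modalities as one token sequence, so attention can relate any text token to any triple token. The sketch below shows only that input-construction step. The special tokens, the fixed per-modality budget `half_len`, the segment ids and the representation of a triple as three string tokens are assumptions modelled on BERT-style cross encoders, not the documented STonKGs input format; `build_text_triple_input` is a hypothetical helper.

```python
from typing import Sequence

CLS, SEP, PAD = "[CLS]", "[SEP]", "[PAD]"


def build_text_triple_input(
    text_tokens: Sequence[str],
    triple: tuple[str, str, str],
    half_len: int,
) -> tuple[list[str], list[int], list[int]]:
    """Concatenate a text segment and a KG-triple segment into one cross-encoder input.

    Each modality gets a fixed budget of `half_len` positions: the text half is
    [CLS] text... [SEP] and the KG half is subject relation object [SEP], each
    truncated or padded to `half_len`. Returns tokens, segment ids and attention mask.
    """
    if half_len < 4:
        raise ValueError("half_len must leave room for a triple plus special tokens")
    text = [CLS] + list(text_tokens)[: half_len - 2] + [SEP]
    kg = list(triple)[: half_len - 1] + [SEP]
    # Pad each half separately so both modalities always occupy the same positions.
    text += [PAD] * (half_len - len(text))
    kg += [PAD] * (half_len - len(kg))
    tokens = text + kg
    segment_ids = [0] * half_len + [1] * half_len  # 0 = text, 1 = KG triple
    attention_mask = [0 if tok == PAD else 1 for tok in tokens]
    return tokens, segment_ids, attention_mask


if __name__ == "__main__":
    tokens, segments, mask = build_text_triple_input(
        ["akt1", "activates", "mtor"], ("AKT1", "activates", "MTOR"), half_len=5)
    print(tokens)
    print(segments)
    print(mask)
```

Padding each half independently is a design choice that keeps the KG segment at fixed positions; packing both segments tightly and padding only at the end would work equally well with the segment ids carrying the modality information.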

A qualitative study of Machine Learning practices and engineering challenges in Earth Observation (2021)

Jentzsch, Sophie ; Hochgeschwender, Nico

Machine Learning (ML) is advancing across nearly every domain. Like many others, Earth Observation (EO) increasingly relies on ML applications, in which ML methods process vast amounts of heterogeneous and continuous data streams to answer socially and environmentally relevant questions. However, developing such ML-based EO systems remains challenging: development processes and the workflows employed are often barely structured and poorly reported. The application of ML methods and techniques is considered opaque, and this lack of transparency runs counter to the responsible development of ML-based EO applications. To improve this situation, a better understanding of the current practices and engineering-related challenges in developing ML-based EO applications is required. In this paper, we report observations from an exploratory study in which five experts shared their views on ML engineering in semi-structured interviews. We analysed these interviews with coding techniques commonly applied in empirical software engineering. The interviews provide informative insights into the practical development of ML applications and reveal several engineering challenges. In addition, interviewees took part in a novel workflow-sketching task, which provided a tangible reflection of implicit processes. Overall, the results confirm a gap between theoretical conceptions and real practices in ML development, even though the sketched workflows were abstract and textbook-like. The results pave the way for a large-scale investigation into the requirements for ML engineering in EO.

Quincy: Detecting Host-Based Code Injection Attacks in Memory Dumps (2017)

Barabosch, Thomas ; Bergmann, Niklas ; Dombeck, Adrian ; Padilla, Elmar

Software Feature Request Detection in Issue Tracking Systems (2016)

Merten, Thorsten ; Falis, Matúš ; Hubner, Paul ; Quirchmayr, Thomas ; Bürsner, Simone ; Paech, Barbara

Efficient Template Attacks Based on Probabilistic Multi-class Support Vector Machines (2013)

Bartkewitz, Timo ; Lemke-Rust, Kerstin