pub H-BRS | 005 Computerprogrammierung, Programme, Daten

Making Order in Household Accounting - Digital Invoices as Domestic Work Artifacts (2024)

Dethier, Erik ; Kern, Dean-Robin ; Stevens, Gunnar ; Boden, Alexander

The digitization of financial activities in consumers' lives is increasing, and the digitalization of invoicing processes is expected to play a significant role, although this area is not well understood regarding the private sector. Human-Computer Interaction (HCI) and Computer Supported Cooperative Work (CSCW) research have a long history of analyzing the socio-material and temporal aspects of work practices that are relevant for the domestic domain. The socio-material structuring of invoicing work and the working styles of consumers must be considered when designing effective consumer support systems. In this ethnomethodologically-informed, design-oriented interview study, we followed 17 consumers in their daily practices of dealing with invoices to make the invisible administrative work involved in this process visible. We identified and described the meaningful artifacts that were used in a spatial-temporal process within various storage locations such as input, reminding, intermediate (for postponing cases) buffers, and archive systems. Furthermore, we identified three different working styles that consumers exhibited: direct completion, at the next opportunity, and postpone as far as possible. This study contributes to our understanding of household economics and domestic workplace studies in the tradition of CSCW and has implications for the design of electronic invoicing systems.

Data Cart: A Privacy Pattern for Personal Data Management in Organizations (2023)

Tolsdorf, Jan ; Lo Iacono, Luigi

The European General Data Protection Regulation requires the implementation of Technical and Organizational Measures (TOMs) to reduce the risk of illegitimate processing of personal data. For these measures to be effective, they must be applied correctly by employees who process personal data under the authority of their organization. However, even data processing employees often have limited knowledge of data protection policies and regulations, which increases the likelihood of misconduct and privacy breaches. To lower the likelihood of unintentional privacy breaches, TOMs must be developed with employees’ needs, capabilities, and usability requirements in mind. To reduce implementation costs and help organizations and IT engineers with the implementation, privacy patterns have proven to be effective for this purpose. In this chapter, we introduce the privacy pattern Data Cart, which specifically helps to develop TOMs for data processing employees. Based on a user-centered design approach with employees from two public organizations in Germany, we present a concept that illustrates how Privacy by Design can be effectively implemented. Organizations, IT engineers, and researchers will gain insight on how to improve the usability of privacy-compliant tools for managing personal data.

Vergleich von Open Source MLOps Tools zur Unterstützung von Machine Learning basierten Zeitreihenanalysen (2024)

Autenrieth, Erik

Projekte des maschinellen Lernens (ML), insbesondere im Bereich der Zeitreihenanalyse, gewinnen heute zunehmend an Bedeutung. Die Bereitstellung solcher Projekte in einer Produktionsumgebung mit dem gleichen Automatisierungsgrad wie bei klassischen Softwareprojekten ist ein komplexes Unterfangen. Die Umsetzung in Produktionsumgebungen erfordert neben klassischen DevOps auch Machine Learning Operation (MLOps) Technologien und Werkzeuge. Ziel dieser Studie ist es, einen umfassenden Überblick über verfügbare MLOps Tools zu bieten und einen spezifischen Techstack für Zeitreihen ML Projekte zu entwickeln. Es werden aktuelle Trends und Werkzeuge im Bereich MLOps durch eine multivokale Literaturrecherche (MLR) untersucht und analysiert. Die Studie identifiziert passende MLOps Werkzeuge und Methoden für die Zeitreihenanalyse und präsentiert eine spezifische Implementierung einer MLOps Pipeline für die Aktienkursprognose des S&P 500. MLOps und DevOps Tools nehmen eine essenzielle Rolle bei der effektiven Konstruktion und Verwaltung von ML Pipelines ein. Bei der Auswahl geeigneter Werkzeuge ist stets eine spezifische Anpassung an die jeweiligen Projektanforderungen erforderlich. Die Bereitstellung einer detaillierten Darstellung der aktuellen MLOps Tool Landschaft erweist sich hierbei als wertvolle Ressource, die es Entwicklern ermöglicht, die Effizienz und Effektivität ihrer ML Projekte zu optimieren.

Datengetriebenes Requirements Engineering für die Integration von KI in Softwaresysteme (2024)

Tewes, Alexander

Angesichts der raschen Entwicklungen und der Besonderheiten von Softwaresystemen, welche Künstliche Intelligenz (KI) nutzen, ist ein angepasstes Requirements Engineering (RE) erforderlich. Die spezifischen Anforderungen von KI-Projekten müssen dabei erkannt und angegangen werden. Hierfür wird eine systematische Überprufung bestehender Herausforderungen des RE in KI-Projekten durchgeführt. Darauf aufbauend werden neue RE-Ansätze und Empfehlungen präsentiert, die auf die Datensicht von KI-Projekten abzielen. Mithilfe der Analyse bestehender Lösungsansatze, Methoden, Frameworks und Tools soll aufgezeigt werden, inwiefern die Herausforderungen im RE bewältigt werden können. Noch bestehende Lücken im Forschungsstand werden identifiziert und aufgezeigt.

A Methodology for Integrating Hierarchical VMAP-Data Structures into an Ontology Using Semantically Represented Analyses (2024)

Spelten, Philipp ; Meyer, Morten-Christian ; Wagner, Anna ; Wolf, Klaus ; Reith, Dirk

Integrating physical simulation data into data ecosystems challenges the compatibility and interoperability of data management tools. Semantic web technologies and relational databases mostly use other data types, such as measurement or manufacturing design data. Standardizing simulation data storage and harmonizing the data structures with other domains is still a challenge, as current standards such as the ISO standard STEP (ISO 10303 ”Standard for the Exchange of Product model data”) fail to bridge the gap between design and simulation data. This challenge requires new methods, such as ontologies, to rethink simulation results integration. This research describes a new software architecture and application methodology based on the industrial standard ”Virtual Material Modelling in Manufacturing” (VMAP). The architecture integrates large quantities of structured simulation data and their analyses into a semantic data structure. It is capable of providing data permeability from the global digital twin level to the detailed numerical values of data entries and even new key indicators in a three-step approach: It represents a file as an instance in a knowledge graph, queries the file’s metadata, and finds a semantically represented process that enables new metadata to be created and instantiated.

Knowledge Base Question Answering by Transformer-Based Graph Pattern Scoring (2023)

Lamott, Marcel ; Hees, Jörn ; Ulges, Adrian

Question Answering (QA) has gained significant attention in recent years, with transformer-based models improving natural language processing. However, issues of explainability remain, as it is difficult to determine whether an answer is based on a true fact or a hallucination. Knowledge-based question answering (KBQA) methods can address this problem by retrieving answers from a knowledge graph. This paper proposes a hybrid approach to KBQA called FRED, which combines pattern-based entity retrieval with a transformer-based question encoder. The method uses an evolutionary approach to learn SPARQL patterns, which retrieve candidate entities from a knowledge base. The transformer-based regressor is then trained to estimate each pattern’s expected F1 score for answering the question, resulting in a ranking ofcandidate entities. Unlike other approaches, FRED can attribute results to learned SPARQL patterns, making them more interpretable. The method is evaluated on two datasets and yields MAP scores of up to 73 percent, with the transformer-based interpretation falling only 4 pp short of an oracle run. Additionally, the learned patterns successfully complement manually generated ones and generalize well to novel questions.

Developing OERs for Teaching Database Systems (2023)

Rakow, Thomas C. ; Kless, André ; Hasler, Charlotte ; Knolle, Harm ; Faeskorn-Woyke, Heide ; Saatz, Inga Marina ; Lambert, Jens ; Focken, Mareike

In the project EILD.nrw, Open Educational Resources (OER) have been developed for teaching databases. Lecturers can use the tools and courses in a variety of learning scenarios. Students of computer science and application subjects can learn the complete life cycle of databases. For this purpose, quizzes, interactive tools, instructional videos, and courses for learning management systems are developed and published under a Creative Commons license. We give an overview of the developed OERs according to subject, description, teaching form, and format. Following, we describe how licencing, sustainability, accessibility, contextualization, content description, and technical adaptability are implemented. The feedback of students in ongoing classes are evaluated.

A systematic literature review about the consumers’ side of fake review detection – Which cues do consumers use to determine the veracity of online user reviews? (2023)

Walther, Michelle ; Jakobi, Timo ; Watson, Steven James ; Stevens, Gunnar

Background Consumers rely heavily on online user reviews when shopping online and cybercriminals produce fake reviews to manipulate consumer opinion. Much prior research focuses on the automated detection of these fake reviews, which are far from perfect. Therefore, consumers must be able to detect fake reviews on their own. In this study we survey the research examining how consumers detect fake reviews online. Methods We conducted a systematic literature review over the research on fake review detection from the consumer-perspective. We included academic literature giving new empirical data. We provide a narrative synthesis comparing the theories, methods and outcomes used across studies to identify how consumers detect fake reviews online. Results We found only 15 articles that met our inclusion criteria. We classify the most often used cues identified into five categories which were (1) review characteristics (2) textual characteristics (3) reviewer characteristics (4) seller characteristics and (5) characteristics of the platform where the review is displayed. Discussion We find that theory is applied inconsistently across studies and that cues to deception are often identified in isolation without any unifying theoretical framework. Consequently, we discuss how such a theoretical framework could be developed.

Risk-Based Authentication for OpenStack: A Fully Functional Implementation and Guiding Example (2023)

Unsel, Vincent ; Wiefling, Stephan ; Gruschka, Nils ; Lo Iacono, Luigi

Online services have difficulties to replace passwords with more secure user authentication mechanisms, such as Two-Factor Authentication (2FA). This is partly due to the fact that users tend to reject such mechanisms in use cases outside of online banking. Relying on password authentication alone, however, is not an option in light of recent attack patterns such as credential stuffing. Risk-Based Authentication (RBA) can serve as an interim solution to increase password-based account security until better methods are in place. Unfortunately, RBA is currently used by only a few major online services, even though it is recommended by various standards and has been shown to be effective in scientific studies. This paper contributes to the hypothesis that the low adoption of RBA in practice can be due to the complexity of implementing it. We provide an RBA implementation for the open source cloud management software OpenStack, which is the first fully functional open source RBA implementation based on the Freeman et al. algorithm, along with initial reference tests that can serve as a guiding example and blueprint for developers.

TreeSatAI Benchmark Archive : a multi-sensor, multi-label dataset for tree species classification in remote sensing (2023)

Ahlswede, Steve ; Schulz, Christian ; Gava, Christiano ; Helber, Patrick ; Bischke, Benjamin ; Förster, Michael ; Arias, Florencia ; Hees, Jörn ; Demir, Begüm ; Kleinschmit, Birgit

Airborne and spaceborne platforms are the primary data sources for large-scale forest mapping, but visual interpretation for individual species determination is labor-intensive. Hence, various studies focusing on forests have investigated the benefits of multiple sensors for automated tree species classification. However, transferable deep learning approaches for large-scale applications are still lacking. This gap motivated us to create a novel dataset for tree species classification in central Europe based on multi-sensor data from aerial, Sentinel-1 and Sentinel-2 imagery. In this paper, we introduce the TreeSatAI Benchmark Archive, which contains labels of 20 European tree species (i.e., 15 tree genera) derived from forest administration data of the federal state of Lower Saxony, Germany. We propose models and guidelines for the application of the latest machine learning techniques for the task of tree species classification with multi-label data. Finally, we provide various benchmark experiments showcasing the information which can be derived from the different sensors including artificial neural networks and tree-based machine learning methods. We found that residual neural networks (ResNet) perform sufficiently well with weighted precision scores up to 79 % only by using the RGB bands of aerial imagery. This result indicates that the spatial content present within the 0.2 m resolution data is very informative for tree species classification. With the incorporation of Sentinel-1 and Sentinel-2 imagery, performance improved marginally. However, the sole use of Sentinel-2 still allows for weighted precision scores of up to 74 % using either multi-layer perceptron (MLP) or Light Gradient Boosting Machine (LightGBM) models. Since the dataset is derived from real-world reference data, it contains high class imbalances. We found that this dataset attribute negatively affects the models' performances for many of the underrepresented classes (i.e., scarce tree species). However, the class-wise precision of the best-performing late fusion model still reached values ranging from 54 % (Acer) to 88 % (Pinus). Based on our results, we conclude that deep learning techniques using aerial imagery could considerably support forestry administration in the provision of large-scale tree species maps at a very high resolution to plan for challenges driven by global environmental change. The original dataset used in this paper is shared via Zenodo (https://doi.org/10.5281/zenodo.6598390, Schulz et al., 2022). For citation of the dataset, we refer to this article.

Open Access

005 Computerprogrammierung, Programme, Daten

Refine

H-BRS Bibliography

Departments, institutes and facilities

Document Type

Year of publication

Language

Has Fulltext

Keywords

52 search hits