005 Computerprogrammierung, Programme, Daten
Refine
H-BRS Bibliography
- yes (41)
Departments, institutes and facilities
- Fachbereich Informatik (41) (remove)
Document Type
- Conference Object (19)
- Article (12)
- Part of a Book (2)
- Research Data (2)
- Working Paper (2)
- Book (monograph, edited volume) (1)
- Contribution to a Periodical (1)
- Doctoral Thesis (1)
- Preprint (1)
Language
- English (41) (remove)
Keywords
- Usable Security (5)
- Big Data Analysis (4)
- GDPR (4)
- Risk-based Authentication (4)
- Authentication (2)
- Authentication features (2)
- Password (2)
- Risk-based Authentication (RBA) (2)
- structural equation modeling (2)
- usable privacy (2)
The European General Data Protection Regulation requires the implementation of Technical and Organizational Measures (TOMs) to reduce the risk of illegitimate processing of personal data. For these measures to be effective, they must be applied correctly by employees who process personal data under the authority of their organization. However, even data processing employees often have limited knowledge of data protection policies and regulations, which increases the likelihood of misconduct and privacy breaches. To lower the likelihood of unintentional privacy breaches, TOMs must be developed with employees’ needs, capabilities, and usability requirements in mind. To reduce implementation costs and help organizations and IT engineers with the implementation, privacy patterns have proven to be effective for this purpose. In this chapter, we introduce the privacy pattern Data Cart, which specifically helps to develop TOMs for data processing employees. Based on a user-centered design approach with employees from two public organizations in Germany, we present a concept that illustrates how Privacy by Design can be effectively implemented. Organizations, IT engineers, and researchers will gain insight on how to improve the usability of privacy-compliant tools for managing personal data.
Is It Really You Who Forgot the Password? When Account Recovery Meets Risk-Based Authentication
(2024)
Question Answering (QA) has gained significant attention in recent years, with transformer-based models improving natural language processing. However, issues of explainability remain, as it is difficult to determine whether an answer is based on a true fact or a hallucination. Knowledge-based question answering (KBQA) methods can address this problem by retrieving answers from a knowledge graph. This paper proposes a hybrid approach to KBQA called FRED, which combines pattern-based entity retrieval with a transformer-based question encoder. The method uses an evolutionary approach to learn SPARQL patterns, which retrieve candidate entities from a knowledge base. The transformer-based regressor is then trained to estimate each pattern’s expected F1 score for answering the question, resulting in a ranking ofcandidate entities. Unlike other approaches, FRED can attribute results to learned SPARQL patterns, making them more interpretable. The method is evaluated on two datasets and yields MAP scores of up to 73 percent, with the transformer-based interpretation falling only 4 pp short of an oracle run. Additionally, the learned patterns successfully complement manually generated ones and generalize well to novel questions.
Although climate-induced liquidity risks can cause significant disruptions and instabilities in the financial sector, they are frequently overlooked in current debates and policy discussions. This paper proposes a macro-financial agent-based integrated assessment model to investigate the transmission channels of climate risks to financial instability and study the emergence of liquidity crises through interbank market dynamics. Our simulations show that the financial system could experience serious funding and market liquidity shortages due to climate-induced liquidity crises. Our investigation contributes to our understanding of the impact - and possible solutions - to climate-induced liquidity crises, besides the issue of asset stranding related to transition risks usually considered in the existing studies.
Risikobasierte Authentifizierung (RBA) ist ein adaptiver Ansatz zur Stärkung der Passwortauthentifizierung. Er überwacht eine Reihe von Merkmalen, die sich auf das Loginverhalten während der Passworteingabe beziehen. Wenn sich die beobachteten Merkmalswerte signifikant von denen früherer Logins unterscheiden, fordert RBA zusätzliche Identitätsnachweise an. Regierungsbehörden und ein Erlass des US-Präsidenten empfehlen RBA, um Onlineaccounts vor Angriffen mit gestohlenen Passwörtern zu schützen. Trotz dieser Tatsachen litt RBA unter einem Mangel an offenem Wissen. Es gab nur wenige bis keine Untersuchungen über die Usability, Sicherheit und Privatsphäre von RBA. Das Verständnis dieser Aspekte ist jedoch wichtig für eine breite Akzeptanz.
Diese Arbeit soll ein umfassendes Verständnis von RBA mit einer Reihe von Studien vermitteln. Die Ergebnisse ermöglichen es, datenschutzfreundliche RBA-Lösungen zu schaffen, die die Authentifizierung stärken bei gleichzeitig hoher Menschenakzeptanz.
In the project EILD.nrw, Open Educational Resources (OER) have been developed for teaching databases. Lecturers can use the tools and courses in a variety of learning scenarios. Students of computer science and application subjects can learn the complete life cycle of databases. For this purpose, quizzes, interactive tools, instructional videos, and courses for learning management systems are developed and published under a Creative Commons license. We give an overview of the developed OERs according to subject, description, teaching form, and format. Following, we describe how licencing, sustainability, accessibility, contextualization, content description, and technical adaptability are implemented. The feedback of students in ongoing classes are evaluated.
Risk-Based Authentication for OpenStack: A Fully Functional Implementation and Guiding Example
(2023)
Online services have difficulties to replace passwords with more secure user authentication mechanisms, such as Two-Factor Authentication (2FA). This is partly due to the fact that users tend to reject such mechanisms in use cases outside of online banking. Relying on password authentication alone, however, is not an option in light of recent attack patterns such as credential stuffing.
Risk-Based Authentication (RBA) can serve as an interim solution to increase password-based account security until better methods are in place. Unfortunately, RBA is currently used by only a few major online services, even though it is recommended by various standards and has been shown to be effective in scientific studies. This paper contributes to the hypothesis that the low adoption of RBA in practice can be due to the complexity of implementing it. We provide an RBA implementation for the open source cloud management software OpenStack, which is the first fully functional open source RBA implementation based on the Freeman et al. algorithm, along with initial reference tests that can serve as a guiding example and blueprint for developers.
Airborne and spaceborne platforms are the primary data sources for large-scale forest mapping, but visual interpretation for individual species determination is labor-intensive. Hence, various studies focusing on forests have investigated the benefits of multiple sensors for automated tree species classification. However, transferable deep learning approaches for large-scale applications are still lacking. This gap motivated us to create a novel dataset for tree species classification in central Europe based on multi-sensor data from aerial, Sentinel-1 and Sentinel-2 imagery. In this paper, we introduce the TreeSatAI Benchmark Archive, which contains labels of 20 European tree species (i.e., 15 tree genera) derived from forest administration data of the federal state of Lower Saxony, Germany. We propose models and guidelines for the application of the latest machine learning techniques for the task of tree species classification with multi-label data. Finally, we provide various benchmark experiments showcasing the information which can be derived from the different sensors including artificial neural networks and tree-based machine learning methods. We found that residual neural networks (ResNet) perform sufficiently well with weighted precision scores up to 79 % only by using the RGB bands of aerial imagery. This result indicates that the spatial content present within the 0.2 m resolution data is very informative for tree species classification. With the incorporation of Sentinel-1 and Sentinel-2 imagery, performance improved marginally. However, the sole use of Sentinel-2 still allows for weighted precision scores of up to 74 % using either multi-layer perceptron (MLP) or Light Gradient Boosting Machine (LightGBM) models. Since the dataset is derived from real-world reference data, it contains high class imbalances. We found that this dataset attribute negatively affects the models' performances for many of the underrepresented classes (i.e., scarce tree species). However, the class-wise precision of the best-performing late fusion model still reached values ranging from 54 % (Acer) to 88 % (Pinus). Based on our results, we conclude that deep learning techniques using aerial imagery could considerably support forestry administration in the provision of large-scale tree species maps at a very high resolution to plan for challenges driven by global environmental change. The original dataset used in this paper is shared via Zenodo (https://doi.org/10.5281/zenodo.6598390, Schulz et al., 2022). For citation of the dataset, we refer to this article.
Digital ecosystems are driving the digital transformation of business models. Meanwhile, the associated processing of personal data within these complex systems poses challenges to the protection of individual privacy. In this paper, we explore these challenges from the perspective of digital ecosystems' platform providers. To this end, we present the results of an interview study with seven data protection officers representing a total of 12 digital ecosystems in Germany. We identified current and future challenges for the implementation of data protection requirements, covering issues on legal obligations and data subject rights. Our results support stakeholders involved in the implementation of privacy protection measures in digital ecosystems, and form the foundation for future privacy-related studies tailored to the specifics of digital ecosystems.
A company's financial documents use tables along with text to organize the data containing key performance indicators (KPIs) (such as profit and loss) and a financial quantity linked to them. The KPI’s linked quantity in a table might not be equal to the similarly described KPI's quantity in a text. Auditors take substantial time to manually audit these financial mistakes and this process is called consistency checking. As compared to existing work, this paper attempts to automate this task with the help of transformer-based models. Furthermore, for consistency checking it is essential for the table's KPIs embeddings to encode the semantic knowledge of the KPIs and the structural knowledge of the table. Therefore, this paper proposes a pipeline that uses a tabular model to get the table's KPIs embeddings. The pipeline takes input table and text KPIs, generates their embeddings, and then checks whether these KPIs are identical. The pipeline is evaluated on the financial documents in the German language and a comparative analysis of the cell embeddings' quality from the three tabular models is also presented. From the evaluation results, the experiment that used the English-translated text and table KPIs and Tabbie model to generate table KPIs’ embeddings achieved an accuracy of 72.81% on the consistency checking task, outperforming the benchmark, and other tabular models.
Deployment of modern data-driven machine learning methods, most often realized by deep neural networks (DNNs), in safety-critical applications such as health care, industrial plant control, or autonomous driving is highly challenging due to numerous model-inherent shortcomings. These shortcomings are diverse and range from a lack of generalization over insufficient interpretability and implausible predictions to directed attacks by means of malicious inputs. Cyber-physical systems employing DNNs are therefore likely to suffer from so-called safety concerns, properties that preclude their deployment as no argument or experimental setup can help to assess the remaining risk. In recent years, an abundance of state-of-the-art techniques aiming to address these safety concerns has emerged. This chapter provides a structured and broad overview of them. We first identify categories of insufficiencies to then describe research activities aiming at their detection, quantification, or mitigation. Our work addresses machine learning experts and safety engineers alike: The former ones might profit from the broad range of machine learning topics covered and discussions on limitations of recent methods. The latter ones might gain insights into the specifics of modern machine learning methods. We hope that this contribution fuels discussions on desiderata for machine learning systems and strategies on how to help to advance existing approaches accordingly.
This open access book brings together the latest developments from industry and research on automated driving and artificial intelligence.
Environment perception for highly automated driving heavily employs deep neural networks, facing many challenges. How much data do we need for training and testing? How to use synthetic data to save labeling costs for training? How do we increase robustness and decrease memory usage? For inevitably poor conditions: How do we know that the network is uncertain about its decisions? Can we understand a bit more about what actually happens inside neural networks? This leads to a very practical problem particularly for DNNs employed in automated driving: What are useful validation techniques and how about safety?
This book unites the views from both academia and industry, where computer vision and machine learning meet environment perception for highly automated driving. Naturally, aspects of data, robustness, uncertainty quantification, and, last but not least, safety are at the core of it. This book is unique: In its first part, an extended survey of all the relevant aspects is provided. The second part contains the detailed technical elaboration of the various questions mentioned above.
Login Data Set for Risk-Based Authentication
Synthesized login feature data of >33M login attempts and >3.3M users on a large-scale online service in Norway. Original data collected between February 2020 and February 2021.
This data sets aims to foster research and development for <a href="https://riskbasedauthentication.org">Risk-Based Authentication (RBA) systems. The data was synthesized from the real-world login behavior of more than 3.3M users at a large-scale single sign-on (SSO) online service in Norway.
Risk-based authentication (RBA) aims to protect users against attacks involving stolen passwords. RBA monitors features during login, and requests re-authentication when feature values widely differ from those previously observed. It is recommended by various national security organizations, and users perceive it more usable than and equally secure to equivalent two-factor authentication. Despite that, RBA is still used by very few online services. Reasons for this include a lack of validated open resources on RBA properties, implementation, and configuration. This effectively hinders the RBA research, development, and adoption progress.
To close this gap, we provide the first long-term RBA analysis on a real-world large-scale online service. We collected feature data of 3.3 million users and 31.3 million login attempts over more than 1 year. Based on the data, we provide (i) studies on RBA’s real-world characteristics plus its configurations and enhancements to balance usability, security, and privacy; (ii) a machine learning–based RBA parameter optimization method to support administrators finding an optimal configuration for their own use case scenario; (iii) an evaluation of the round-trip time feature’s potential to replace the IP address for enhanced user privacy; and (iv) a synthesized RBA dataset to reproduce this research and to foster future RBA research. Our results provide insights on selecting an optimized RBA configuration so that users profit from RBA after just a few logins. The open dataset enables researchers to study, test, and improve RBA for widespread deployment in the wild.
The processing of employees’ personal data is dramatically increasing, yet there is a lack of tools that allow employees to manage their privacy. In order to develop these tools, one needs to understand what sensitive personal data are and what factors influence employees’ willingness to disclose. Current privacy research, however, lacks such insights, as it has focused on other contexts in recent decades. To fill this research gap, we conducted a cross-sectional survey with 553 employees from Germany. Our survey provides multiple insights into the relationships between perceived data sensitivity and willingness to disclose in the employment context. Among other things, we show that the perceived sensitivity of certain types of data differs substantially from existing studies in other contexts. Moreover, currently used legal and contextual distinctions between different types of data do not accurately reflect the subtleties of employees’ perceptions. Instead, using 62 different data elements, we identified four groups of personal data that better reflect the multi-dimensionality of perceptions. However, previously found common disclosure antecedents in the context of online privacy do not seem to affect them. We further identified three groups of employees that differ in their perceived data sensitivity and willingness to disclose, but neither in their privacy beliefs nor in their demographics. Our findings thus provide employers, policy makers, and researchers with a better understanding of employees’ privacy perceptions and serve as a basis for future targeted research
on specific types of personal data and employees.
Risk-based authentication (RBA) aims to strengthen password-based authentication rather than replacing it. RBA does this by monitoring and recording additional features during the login process. If feature values at login time differ significantly from those observed before, RBA requests an additional proof of identification. Although RBA is recommended in the NIST digital identity guidelines, it has so far been used almost exclusively by major online services. This is partly due to a lack of open knowledge and implementations that would allow any service provider to roll out RBA protection to its users. To close this gap, we provide a first in-depth analysis of RBA characteristics in a practical deployment. We observed N=780 users with 247 unique features on a real-world online service for over 1.8 years. Based on our collected data set, we provide (i) a behavior analysis of two RBA implementations that were apparently used by major online services in the wild, (ii) a benchmark of the features to extract a subset that is most suitable for RBA use, (iii) a new feature that has not been used in RBA before, and (iv) factors which have a significant effect on RBA performance. Our results show that RBA needs to be carefully tailored to each online service, as even small configuration adjustments can greatly impact RBA's security and usability properties. We provide insights on the selection of features, their weightings, and the risk classification in order to benefit from RBA after a minimum number of login attempts.
Risk-based authentication (RBA) extends authentication mechanisms to make them more robust against account takeover attacks, such as those using stolen passwords. RBA is recommended by NIST and NCSC to strengthen password-based authentication, and is already used by major online services. Also, users consider RBA to be more usable than two-factor authentication and just as secure. However, users currently obtain RBA's high security and usability benefits at the cost of exposing potentially sensitive personal data (e.g., IP address or browser information). This conflicts with user privacy and requires to consider user rights regarding the processing of personal data. We outline potential privacy challenges regarding different attacker models and propose improvements to balance privacy in RBA systems. To estimate the properties of the privacy-preserving RBA enhancements in practical environments, we evaluated a subset of them with long-term data from 780 users of a real-world online service. Our results show the potential to increase privacy in RBA solutions. However, it is limited to certain parameters that should guide RBA design to protect privacy. We outline research directions that need to be considered to achieve a widespread adoption of privacy preserving RBA with high user acceptance.
Components and Architecture for the Implementation of Technology-Driven Employee Data Protection
(2021)
Software developers build complex systems using plenty of third-party libraries. Documentation is key to understand and use the functionality provided via the libraries’ APIs. Therefore, functionality is the main focus of contemporary API documentation, while cross-cutting concerns such as security are almost never considered at all, especially when the API itself does not provide security features. Documentations of JavaScript libraries for use in web applications, e.g., do not specify how to add or adapt a Content Security Policy (CSP) to mitigate content injection attacks like Cross-Site Scripting (XSS). This is unfortunate, as security-relevant API documentation might have an influence on secure coding practices and prevailing major vulnerabilities such as XSS. For the first time, we study the effects of integrating security-relevant information in non-security API documentation. For this purpose, we took CSP as an exemplary study object and extended the official Google Maps JavaScript API documentation with security-relevant CSP information in three distinct manners. Then, we evaluated the usage of these variations in a between-group eye-tracking lab study involving N=49 participants. Our observations suggest: (1) Developers are focused on elements with code examples. They mostly skim the documentation while searching for a quick solution to their programming task. This finding gives further evidence to results of related studies. (2) The location where CSP-related code examples are placed in non-security API documentation significantly impacts the time it takes to find this security-relevant information. In particular, the study results showed that the proximity to functional-related code examples in documentation is a decisive factor. (3) Examples significantly help to produce secure CSP solutions. (4) Developers have additional information needs that our approach cannot meet.
Overall, our study contributes to a first understanding of the impact of security-relevant information in non-security API documentation on CSP implementation. Although further research is required, our findings emphasize that API producers should take responsibility for adequately documenting security aspects and thus supporting the sensibility and training of developers to implement secure systems. This responsibility also holds in seemingly non-security relevant contexts.
The ongoing digitisation in everyday working life means that ever larger amounts of personal data of employees are processed by their employers. This development is particularly problematic with regard to employee data protection and the right to informational self-determination. We strive for the use of company Privacy Dashboards as a means to compensate for missing transparency and control. For conceptual design we use among other things the method of mental models. We present the methodology and first results of our research. We highlight the opportunities that such an approach offers for the user-centred development of Privacy Dashboards.
Less is Often More: Header Whitelisting as Semantic Gap Mitigation in HTTP-Based Software Systems
(2021)
The web is the most wide-spread digital system in the world and is used for many crucial applications. This makes web application security extremely important and, although there are already many security measures, new vulnerabilities are constantly being discovered. One reason for some of the recent discoveries lies in the presence of intermediate systems—e.g. caches, message routers, and load balancers—on the way between a client and a web application server. The implementations of such intermediaries may interpret HTTP messages differently, which leads to a semantically different understanding of the same message. This so-called semantic gap can cause weaknesses in the entire HTTP message processing chain.
In this paper we introduce the header whitelisting (HWL) approach to address the semantic gap in HTTP message processing pipelines. The basic idea is to normalize and reduce an HTTP request header to the minimum required fields using a whitelist before processing it in an intermediary or on the server, and then restore the original request for the next hop. Our results show that HWL can avoid misinterpretations of HTTP messages in the different components and thus prevent many attacks rooted in a semantic gap including request smuggling, cache poisoning, and authentication bypass.