TY  - THES
U1  - Dissertation or Habilitation
A1  - Merten, Thorsten
T1  - Identification of Software Features in Issue Tracking System Data
N2  - Knowledge of Software Features (SFs) is vital for software developers and requirements specialists during all software engineering phases: to understand and derive software requirements, to plan and prioritize implementation tasks, to update documentation, or to test whether the final product correctly implements the requested SF. In most software projects, SFs are managed in conjunction with other information such as bug reports, programming tasks, or refactoring tasks with the aid of Issue Tracking Systems (ITSs). Hence, ITSs contain a variety of information that is only partly related to SFs. In practice, however, the use of ITSs to store SFs comes with two major problems: (1) ITSs are neither designed nor used as documentation systems. Therefore, the data inside an ITS is often uncategorized and SF descriptions are concealed in rather lengthy text. (2) Although an SF is often requested in a single sentence, related information can be scattered among many issues. For example, implementation tasks related to an SF are often reported in additional issues. Hence, the detection of SFs in ITSs is complicated: a manual search for SFs implies reading, understanding, and exploiting the Natural Language (NL) in many issues in detail. This is cumbersome and labor-intensive, especially if related information is spread over more than one issue. This thesis investigates whether SF detection can be supported automatically. First, the problem is analyzed: (i) An empirical study shows that requests for important SFs reside in ITSs, making ITSs a good target for SF detection. (ii) A second study identifies characteristics of the information and related NL in issues. These characteristics represent opportunities as well as challenges for the automatic detection of SFs.
Based on these problem studies, the Issue Tracking Software Feature Detection Method (ITSoFD) is proposed. The method has two main components and includes an approach to preprocess issues. Each component addresses one of the problems associated with storing SFs in ITSs. ITSoFD is validated in three solution studies: (I) An empirical study researches how NL that describes SFs can be detected with techniques from Natural Language Processing (NLP) and Machine Learning. Issues are parsed and different characteristics of the issue and its NL are extracted. These characteristics are used to classify the issue’s content and identify SF description candidates, thereby addressing problem (1). (II) An empirical study researches how issues that carry information potentially related to an SF can be detected with techniques from NLP and Information Retrieval. Characteristics of the issue’s NL are utilized to create a traceability network of related issues, thereby addressing problem (2). (III) An empirical study researches how NL data in issues can be preprocessed using heuristics and hierarchical clustering. Code, stack traces, and other technical information are separated from NL. Heuristics are used to identify candidates for technical information, and clustering improves the heuristics’ results. The technique can be applied to support components (I) and (II).
Y2  - 2017
U6  - https://doi.org/10.11588/heidok.00022655
DO  - https://doi.org/10.11588/heidok.00022655
ER  -