Volltext-Downloads (blau) und Frontdoor-Views (grau)

Identification of Software Features in Issue Tracking System Data

  • The knowledge of Software Features (SFs) is vital for software developers and requirements specialists during all software engineering phases: to understand and derive software requirements, to plan and prioritize implementation tasks, to update documentation, or to test whether the final product correctly implements the requested SF. In most software projects, SFs are managed in conjunction with other information such as bug reports, programming tasks, or refactoring tasks with the aid of Issue Tracking Systems (ITSs). Hence ITSs contains a variety of information that is only partly related to SFs. In practice, however, the usage of ITSs to store SFs comes with two major problems: (1) ITSs are neither designed nor used as documentation systems. Therefore, the data inside an ITS is often uncategorized and SF descriptions are concealed in rather lengthy. (2) Although an SF is often requested in a single sentence, related information can be scattered among many issues. E.g. implementation tasks related to an SF are often reported in additional issues. Hence, the detection of SFs in ITSs is complicated: a manual search for the SFs implies reading, understanding and exploiting the Natural Language (NL) in many issues in detail. This is cumbersome and labor intensive, especially if related information is spread over more than one issue. This thesis investigates whether SF detection can be supported automatically. First the problem is analyzed: (i) An empirical study shows that requests for important SFs reside in ITSs, making ITSs a good tar- get for SF detection. (ii) A second study identifies characteristics of the information and related NL in issues. These characteristics repre- sent opportunities as well as challenges for the automatic detection of SFs. Based on these problem studies, the Issue Tracking Software Feature Detection Method (ITSoFD), is proposed. The method has two main components and includes an approach to preprocess issues. Both components address one of the problems associated with storing SFs in ITSs. ITSoFD is validated in three solution studies: (I) An empirical study researches how NL that describes SFs can be detected with techniques from Natural Language Processing (NLP) and Machine Learning. Issues are parsed and different characteristics of the issue and its NL are extracted. These characteristics are used to clas- sify the issue’s content and identify SF description candidates, thereby approaching problem (1). (II) An empirical study researches how issues that carry information potentially related to an SF can be detected with techniques from NLP and Information Retrieval. Characteristics of the issue’s NL are utilized to create a traceability network vii of related issues, thereby approaching problem (2). (III) An empirical study researches how NL data in issues can be preprocessed using heuristics and hierarchical clustering. Code, stack traces, and other technical information is separated from NL. Heuristics are used to identify candidates for technical information and clustering improves the heuristic’s results. The technique can be applied to support components, I. and II.

Export metadata

Additional Services

Share in Twitter Search Google Scholar Check availability


Show usage statistics
Document Type:Doctoral Thesis
Author:Thorsten Merten
Referee:Barbara Paech, Kurt Schneider
Date of exam:2017/02/10
Contributing Corporation:Universität Heidelberg
Date of first publication:2017/02/14
Departments, institutes and facilities:Fachbereich Informatik
Dewey Decimal Classification (DDC):0 Informatik, Informationswissenschaft, allgemeine Werke / 00 Informatik, Wissen, Systeme / 004 Datenverarbeitung; Informatik
Entry in this database:2017/03/17