Software repository data, for example in issue tracking systems, contain both natural language text and technical information, the latter ranging from log files and code snippets to stack traces. However, data mining is often interested in only one of the two types, e.g., in natural language text in the case of text mining. Regardless of which type is being investigated, any technique used has to deal with noise caused by fragments of the other type, i.e., methods interested in natural language have to deal with technical fragments and vice versa. This paper proposes an approach to classify unstructured data, e.g., development documents, into natural language text and technical information using a mixture of text heuristics and agglomerative hierarchical clustering. The approach was evaluated using 225 manually annotated text passages from developer emails and issue tracker data. Using white space tokenization as a basis, the overall precision of the approach is 0.84 and the recall is 0.85.
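The abstract above names the ingredients (white space tokenization, text heuristics, agglomerative hierarchical clustering, precision and recall) but not the concrete heuristics. The following is a minimal sketch of how such a pipeline could look, not the authors' implementation. The two features, the average-linkage criterion, the rule that the cluster with more word-like tokens is "natural", and all function names are assumptions made for illustration.

```python
import string

# Characters whose presence inside a token hints at technical content.
# This set is an illustrative assumption, not taken from the paper.
SYMBOLS = set("{}()[];=<>/\\_:.")


def features(passage):
    """Return (share of word-like tokens, share of symbol-bearing tokens)."""
    tokens = passage.split()  # white space tokenization
    if not tokens:
        return (0.0, 0.0)
    wordlike = sum(1 for t in tokens if t.strip(string.punctuation).isalpha())
    symbolic = sum(1 for t in tokens if any(c in SYMBOLS for c in t.strip(".,!?")))
    return (wordlike / len(tokens), symbolic / len(tokens))


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def agglomerate(points, k=2):
    """Average-linkage agglomerative clustering down to k clusters of indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(distance(points[a], points[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters


def classify(passages):
    """Label each passage 'natural' or 'technical' (assumed labeling rule)."""
    points = [features(p) for p in passages]
    clusters = agglomerate(points, 2)
    # Assumption: the cluster with the higher mean word-like share is natural language.
    natural = max(clusters, key=lambda c: sum(points[i][0] for i in c) / len(c))
    return ["natural" if i in natural else "technical" for i in range(len(passages))]


def precision_recall(predicted, gold, positive="natural"):
    """Precision and recall of the positive class against manual annotations."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p == positive and g == positive)
    fp = sum(1 for p, g in pairs if p == positive and g != positive)
    fn = sum(1 for p, g in pairs if p != positive and g == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

With a handful of manually labeled passages, `classify` produces predicted labels and `precision_recall` compares them with the annotations, which mirrors the kind of evaluation the abstract reports for its 225 passages. The unsupervised clustering step means no training labels are needed; the annotations are used only for scoring.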
Application systems are often advertised with features, and features are used heavily for requirements management. However, software manufacturers often have only incomplete information about the features of their software. The information is distributed over different sources, such as requirements documents, issue trackers, user manuals, and code. In this paper, we research the occurrence of feature information in open source software engineering data. We report on a case study with three open source systems. We analyze what information about features can be found in issue trackers and user documentation. Furthermore, we study the abstraction levels on which the features are described and how feature information is related, and we discuss the possibility of discovering such information semi-automatically. To mirror the diversity of software development contexts, we choose open source systems that are quite different, e.g., in the rigor of issue tracker usage. The results differ accordingly. One main result is that the user documentation did not provide more accurate information than the issue tracker when compared to a provided feature list. The results also give hints on how the management of feature-relevant information can be supported.
[Context and motivation] Communication in distributed software development is usually supported by issue tracking systems (ITS). Within these systems, most of the communication is stored as unstructured natural language text. The natural language text, however, contains much information with respect to requirements management, e.g., discussion, clarification, and prioritization of features, bugs, and refactorings. [Question] This paper investigates the information stored in the issue tracking systems of four different open-source projects. It categorizes the text and reports on the distribution of issue types and information types. [Principal ideas/results] A manual analysis of 80 issues, using a grounded approach, is conducted to derive a taxonomy of issue types and information types. Subsequently, the taxonomy is used as a codebook to manually categorize and structure the text in another 120 issues. [Contribution] The first contribution of this paper is the taxonomy of issue and information types, and the second contribution is an in-depth analysis of the natural language data and the communication. This analysis showed, for example, that information with respect to prioritization and scheduling can be found in natural language data, whether or not the ITS supports such tasks in a structured way.
Issues in an issue tracking system contain different kinds of information, such as requirements, features, development tasks, bug reports, bug fixing tasks, and refactoring tasks. This information is generally accompanied by discussions or comments, which in turn contain different kinds of information (e.g., social interaction, implementation ideas, stack traces, or error messages). We propose to improve the automatic categorization of this information and to use the categorized data to support software engineering tasks. We want to obtain improvements in two different ways. Firstly, we want to obtain algorithmic improvements (e.g., natural language processing techniques) to retrieve and use categorized auxiliary data. Secondly, we want to utilize multiple task-based categorizations to support different software engineering tasks.
Additional figures, tables, experiment data, code, and results.