Evaluation of Drift Detection Techniques for Automated Machine Learning Pipelines
Machine learning-based solutions are frequently adopted in applications that operate on big data. The performance of a model deployed into operation is subject to degradation caused by unanticipated changes in the flow of input data. Hence, monitoring data drift is essential to maintaining the model's desired performance. According to the reviewed literature on drift detection, statistical hypothesis testing makes it possible to investigate whether incoming data is drifting away from the training data. Because Maximum Mean Discrepancy (MMD) and Kolmogorov-Smirnov (KS) have been shown in the literature to be reliable distance measures between multivariate distributions, both were selected from several existing techniques for experimentation. Within the scope of this work, the experiments addressed an image classification use case based on the Stream-51 dataset. Across the different drift experiments, both MMD and KS achieved high Area Under Curve values. However, KS ran faster than MMD and produced fewer false positives. The results further showed that using a pre-trained ResNet-18 for feature extraction maintained the high performance of the evaluated drift detectors, and that the performance of the drift detectors depends strongly on the sample sizes of the reference (training) data and of the test data that flow into the pipeline's monitor. Finally, the results showed that if the test data is a mixture of drifting and non-drifting data, the performance of the drift detectors depends not on how the drifting samples are interspersed among the non-drifting ones, but on their proportion in the test set.
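To illustrate the kind of hypothesis test the abstract refers to, the following is a minimal sketch of feature-wise two-sample KS drift detection. It is not the thesis's implementation: the function names (`ks_statistic`, `ks_p_value`, `detect_drift`), the Bonferroni correction across features, the asymptotic p-value approximation, and the synthetic Gaussian "features" standing in for ResNet-18 embeddings are all assumptions made for illustration.

```python
import math
import random


def ks_statistic(ref, test):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    ref, test = sorted(ref), sorted(test)
    n, m = len(ref), len(test)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(ref[i], test[j])
        # Advance both pointers past every value <= x (handles ties).
        while i < n and ref[i] <= x:
            i += 1
        while j < m and test[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_p_value(d, n, m):
    """Asymptotic p-value from the Kolmogorov series (an approximation)."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        # The p-value is effectively 1 here; avoid slow series convergence.
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return min(1.0, max(0.0, 2.0 * total))


def detect_drift(ref_rows, test_rows, alpha=0.05):
    """Flag drift if any feature rejects H0 at the Bonferroni-corrected level.

    ref_rows / test_rows: lists of equal-length feature vectors.
    Returns (drift_flag, list of per-feature p-values).
    """
    n_features = len(ref_rows[0])
    p_values = []
    for f in range(n_features):
        ref_col = [row[f] for row in ref_rows]
        test_col = [row[f] for row in test_rows]
        d = ks_statistic(ref_col, test_col)
        p_values.append(ks_p_value(d, len(ref_col), len(test_col)))
    return min(p_values) < alpha / n_features, p_values


def gaussian_rows(rng, n_rows, n_features, shift=0.0):
    """Synthetic stand-in for extracted image features (assumption)."""
    return [[rng.gauss(shift, 1.0) for _ in range(n_features)]
            for _ in range(n_rows)]


rng = random.Random(0)
reference = gaussian_rows(rng, 200, 4)
shifted = gaussian_rows(rng, 200, 4, shift=5.0)

drift_on_shifted, _ = detect_drift(reference, shifted)
drift_on_same, _ = detect_drift(reference, list(reference))
print("drift on shifted data:", drift_on_shifted)
print("drift on identical data:", drift_on_same)
```

Because the test runs once per feature, its cost grows linearly with the feature dimension, whereas a kernel-based MMD test must compare all pairs of samples. That difference is one plausible reason for the runtime gap the abstract reports, although the sketch itself does not measure it.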
Document Type: | Master's Thesis |
---|---|
Language: | English |
Author: | Hammam Abdelwahab |
Number of pages: | xviii, 86 |
DOI: | https://doi.org/10.24406/publica-718 |
Supervisor: | Alexander Asteroth, Nico Hochgeschwender, Claudio Martens |
Publisher: | Fraunhofer Publica |
Granting Institution: | Hochschule Bonn-Rhein-Sieg, Fachbereich Informatik |
Contributing Corporation: | Bonn-Aachen International Center for Information Technology (b-it); Fraunhofer IAIS |
Date of first publication: | 2023/01/15 |
Dewey Decimal Classification (DDC): | 0 Computer science, information, general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science |
Theses, student research papers: | Hochschule Bonn-Rhein-Sieg / Fachbereich Informatik |
Entry in this database: | 2023/04/28 |