Evaluation of Drift Detection Techniques for Automated Machine Learning Pipelines

Machine learning-based solutions are increasingly adopted in applications that rely on big data in operations. The performance of a model deployed into operations is subject to degradation caused by unanticipated changes in the flow of input data. Hence, monitoring data drift becomes essential to maintain the model’s desired performance. According to the reviewed literature on drift detection, statistical hypothesis testing makes it possible to investigate whether incoming data is drifting from the training data. Because Maximum Mean Discrepancy (MMD) and Kolmogorov-Smirnov (KS) have been shown to be reliable distance measures between multivariate distributions in the literature review, both were selected from several existing techniques for experimentation. Within the scope of this work, an image classification use case was investigated using the Stream-51 dataset. Across the different drift experiments, both MMD and KS showed high Area Under Curve values. However, KS ran faster than MMD and produced fewer false positives. The results also showed that using the pre-trained ResNet-18 for feature extraction maintained the high performance of the drift detectors, and that this performance depends strongly on the sample sizes of the reference (training) data and the test data that flow into the pipeline’s monitor. Finally, if the test data is a mixture of drifting and non-drifting data, the performance of the drift detectors depends not on how the drifting data are interspersed among the non-drifting data, but on their amount in the test set.
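
The two tests named in the abstract can be sketched as follows. This is a minimal illustration and not the thesis's implementation: the function names (`ks_drift`, `mmd2_rbf`, `mmd_drift`), the Bonferroni correction across features, the Gaussian kernel with a median-heuristic bandwidth, the permutation test, and the significance level are all assumptions. The input arrays stand in for feature vectors such as those a pre-trained ResNet-18 might produce; here they are synthetic.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import ks_2samp


def ks_drift(reference, test, alpha=0.05):
    """Feature-wise two-sample KS test; Bonferroni correction is an assumption."""
    n_features = reference.shape[1]
    p_values = np.array(
        [ks_2samp(reference[:, j], test[:, j]).pvalue for j in range(n_features)]
    )
    # Flag drift if any feature rejects at the corrected level.
    return bool(p_values.min() < alpha / n_features), p_values


def mmd2_rbf(x, y, sigma):
    """Biased estimate of the squared MMD with a Gaussian (RBF) kernel."""
    def kernel(a, b):
        sq = (a**2).sum(axis=1)[:, None] + (b**2).sum(axis=1)[None, :] - 2.0 * a @ b.T
        return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()


def mmd_drift(reference, test, alpha=0.05, n_permutations=200, seed=0):
    """Permutation test on the squared MMD (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([reference, test])
    # Median heuristic for the kernel bandwidth; fall back to 1.0 if degenerate.
    sigma = np.median(pdist(pooled)) or 1.0
    n_ref = len(reference)
    observed = mmd2_rbf(reference, test, sigma)
    exceed = 0
    for _ in range(n_permutations):
        perm = rng.permutation(len(pooled))
        exceed += mmd2_rbf(pooled[perm[:n_ref]], pooled[perm[n_ref:]], sigma) >= observed
    p_value = (exceed + 1) / (n_permutations + 1)
    return bool(p_value < alpha), float(p_value)


# Synthetic stand-ins for reference (training) features and drifted test features.
rng = np.random.default_rng(42)
reference = rng.normal(size=(200, 5))
shifted = rng.normal(loc=1.0, size=(100, 5))

ks_flag, ks_p = ks_drift(reference, shifted)
mmd_flag, mmd_p = mmd_drift(reference, shifted)
print("KS drift:", ks_flag, "MMD drift:", mmd_flag)
```

The KS variant runs one cheap univariate test per feature, whereas the MMD variant recomputes kernel matrices for every permutation, which is one plausible reason a KS-based detector would run faster on the same samples.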

Metadata
Document Type: Master's Thesis
Language: English
Author: Hammam Abdelwahab
Number of pages: xviii, 86
DOI: https://doi.org/10.24406/publica-718
Supervisor: Alexander Asteroth, Nico Hochgeschwender, Claudio Martens
Publisher: Fraunhofer Publica
Granting Institution: Hochschule Bonn-Rhein-Sieg, Fachbereich Informatik
Contributing Corporation: Bonn-Aachen International Center for Information Technology (b-it); Fraunhofer IAIS
Date of first publication: 2023/01/15
Dewey Decimal Classification (DDC): 0 Computer science, information, general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science
Theses, student research papers: Hochschule Bonn-Rhein-Sieg / Fachbereich Informatik
Entry in this database: 2023/04/28