Refine
Departments, institutes and facilities
Document Type
- Conference Object (2)
- Article (1)
Language
- English (3)
Keywords
- Abstract Syntax Tree (1)
- Malware (1)
- Markov Cluster Algorithm (1)
- Software Supply Chain (1)
Trojanized software packages used in software supply chain attacks constitute an emerging threat. Unfortunately, there is still a lack of scalable approaches that allow automated and timely detection of malicious software packages and thus most detections are based on manual labor and expertise. However, it has been observed that most attack campaigns comprise multiple packages that share the same or similar malicious code. We leverage that fact to automatically reproduce manually identified clusters of known malicious packages that have been used in real world attacks, thus, reducing the need for expert knowledge and manual inspection. Our approach, AST Clustering using MCL to mimic Expertise (ACME), yields promising results with a 𝐹1 score of 0.99. Signatures are automatically generated based on characteristic code fragments from clusters and are subsequently used to scan the whole npm registry for unreported malicious packages. We are able to identify and report six malicious packages that have been removed from npm consequentially. Therefore, our approach can support the detection by reducing manual labor and hence may be employed by maintainers of package repositories to detect possible software supply chain attacks through trojanized software packages.
Guzzo et al. (Reference Guzzo, Schneider and Nalbantian2022) argue that open science practices may marginalize inductive and abductive research and preclude leveraging big data for scientific research. We share their assessment that the hypothetico-deductive paradigm has limitations (see also Staw, Reference Staw2016) and that big data provide grand opportunities (see also Oswald et al., Reference Oswald, Behrend, Putka and Sinar2020). However, we arrive at very different conclusions. Rather than opposing open science practices that build on a hypothetico-deductive paradigm, we should take initiative to do open science in a way compatible with the very nature of our discipline, namely by incorporating ambiguity and inductive decision-making. In this commentary, we (a) argue that inductive elements are necessary for research in naturalistic field settings across different stages of the research process, (b) discuss some misconceptions of open science practices that hide or discourage inductive elements, and (c) propose that field researchers can take ownership of open science in a way that embraces ambiguity and induction. We use an example research study to illustrate our points.