
Maliga D, Nagy R, Buttyán L. 2024. A pipeline for processing large datasets of potentially malicious binaries with rate-limited access to a cloud-based malware analysis platform. EuroCyberSec 2024.

October 23, 2024 | October 25th, 2024 | Publications

Conference:
EuroCyberSec 2024, 23 October 2024, Krakow, Poland

Authors:
Maliga D, Nagy R, Buttyán L.

Abstract:
In this paper, we present a pipeline that we designed for cleaning and processing large datasets of potentially malicious binaries using rate-limited access to a cloud-based malware analysis platform. Our goal is to efficiently filter out and discard benign files, to extract metadata from the remaining, likely-to-be-malware samples, and to create graph-based databases containing only metadata of verified malware. The main issue we have to solve is the limited quota for accessing online malware analysis platforms, which can be used to decide whether a binary is malicious and to obtain metadata from static and dynamic analysis of samples. Our pipeline solves this problem by reaching, with a minimal number of requests to the online platform, a state where every sample in the database is either confirmed malware (based on its VirusTotal report) or similar to a confirmed malware sample. A database in such a state is already usable in practice, while confirming the malicious nature of all its samples and extracting their metadata can continue in the background.
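The abstract does not say how the pipeline chooses which samples to look up. One way to reach the described state (every kept sample confirmed or similar to a confirmed one) under a fixed quota is a greedy coverage loop. The sketch below is our illustration, not the authors' method. It assumes a hypothetical feature set per sample (for example, imported function names), Jaccard similarity with a hypothetical threshold of 0.5, and a stub `is_malicious` callback standing in for a VirusTotal report lookup. `QueryBudget` is a hypothetical helper that models the quota as a plain request count rather than a time-based rate limit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Set


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two feature sets (defined as 1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


@dataclass
class QueryBudget:
    """Hypothetical stand-in for the platform quota: at most `limit` lookups in total."""
    limit: int
    used: int = 0

    def try_consume(self) -> bool:
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


def greedy_verify(
    features: Dict[str, FrozenSet[str]],
    is_malicious: Callable[[str], bool],  # hypothetical stub for a report lookup
    budget: QueryBudget,
    threshold: float = 0.5,  # hypothetical similarity cut-off
) -> Dict[str, str]:
    """Label each sample 'malware', 'similar', 'benign' or 'pending'."""
    ids = sorted(features)
    # Each neighbourhood includes the sample itself (similarity 1.0 >= threshold).
    neighbours = {
        s: {t for t in ids if jaccard(features[s], features[t]) >= threshold}
        for s in ids
    }
    status: Dict[str, str] = {}
    uncovered: Set[str] = set(ids)  # neither confirmed, similar, nor discarded
    while uncovered:
        # Any sample not yet queried may be worth a lookup.
        candidates = [s for s in ids if status.get(s) not in ("malware", "benign")]
        # Pick the sample that would newly cover the most uncovered samples
        # (max() keeps the first maximum, so ties go to the smallest id).
        best = max(candidates, key=lambda s: len(neighbours[s] & uncovered))
        if not budget.try_consume():
            break  # quota exhausted; the rest stays pending
        uncovered.discard(best)
        if is_malicious(best):
            status[best] = "malware"
            for t in neighbours[best] & uncovered:
                status[t] = "similar"  # covered by a confirmed sample
            uncovered -= neighbours[best]
        else:
            status[best] = "benign"  # discarded from the database
    for s in uncovered:
        status[s] = "pending"  # to be verified later, in the background
    return status


if __name__ == "__main__":
    feats = {
        "a": frozenset({"x", "y", "z"}),
        "b": frozenset({"x", "y", "z", "w"}),
        "c": frozenset({"x", "y"}),
        "d": frozenset({"p", "q"}),
        "e": frozenset({"p", "q", "r"}),
        "f": frozenset({"m"}),
    }
    print(greedy_verify(feats, lambda s: s in {"a", "e"}, QueryBudget(limit=2)))
```

Each lookup is spent on the sample whose neighbourhood contains the most still-uncovered samples, so a single confirmed report can vouch for a whole cluster of near-duplicates. Samples left `pending` when the budget runs out correspond to the work the abstract describes as continuing in the background.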
