Martin Mocko

Research areas: machine learning, deep learning, data science, malware detection, malware clustering, phishing

Position: PhD Student

Martin is a research assistant focusing on information security, in particular malware analysis and detection, phishing and malicious behavior. He is also interested in data analysis, machine learning and deep learning. His research currently focuses on clustering of executable files and creating useful representations for machine learning models.

He holds a Master’s degree in Intelligent Systems from the Slovak University of Technology. During his studies, he received the Institute of Informatics at the Slovak Academy of Sciences award for excellent study performance. He is currently a PhD student at KInIT, pursuing his doctorate at the Faculty of Information Technology, Brno University of Technology.

He has collaborated on research projects with ČSOB bank and CEAi. His Master’s thesis focused on anomaly detection in the banking domain. He is a former member of the PeWe (Personalized Web) research group.

PhD topic: Malware Clustering Using Machine Learning

Supervising team: Daniela Chudá (KInIT), ESET (industry partner)

Every day, hundreds of thousands of malicious software samples are created with the goal of harming the victim’s computer, stealing their data, or causing financial damage. To combat the ever-evolving malware landscape, we need machine learning solutions that achieve the desired speed, accuracy and scalability of protection. One particular problem in this domain, which involves millions of binary files, is clustering existing and newly arriving samples while maintaining an acceptable level of speed and accuracy.

In our proposal, we formulate the reasons for the large heterogeneity of malware samples, analyze the state of the art in malware clustering, and identify shortcomings of state-of-the-art research. Several factors contribute to wildly different conclusions across studies: very small, custom-collected datasets; labeling problems; a lack of consensus on how clustering solutions should be set up; and differing evaluation metrics and methodologies that fail to address cluster size bias (the random chance of achieving a good clustering). Furthermore, in the past few years, deep clustering and contrastive learning have been gaining a lot of attention in the research community, while research in the malware field has yet to catch up. We address some of the identified issues in our research questions and present the plan of experiments that will provide answers to these questions.

Selected Student Supervision

Gáfrik Andrej – Detection of malicious activities using machine learning methods. Ongoing

Kabáč Maroš – Time series forecasting using neural networks. Defended 2020