About Me
Student Name: Martin Demovič
Advisor Name: doc. RNDr. Martin Homola, PhD.
Contact: demovic17@uniba.sk
Previous Work
Annotation
EMBER and SoReL-20M are popular datasets used in the development of malware detection and analysis tools.
The EMBER dataset was recently translated into a semantic format using the PE Malware Ontology.
This makes it possible to process the dataset with various explainable AI tools.
Goal
1) Translate SoReL-20M into a semantic format analogously to EMBER
2) Apply selected explainable AI tools to obtain characterizations of the malware samples in the dataset
3) Compare the results with previous works relying on the EMBER dataset
List of References and Links
A more up-to-date list is included at the end of the current version of the bachelor thesis PDF.
- Anderson, H. S., & Roth, P. (2018). EMBER: An open dataset for training static PE malware machine learning models. arXiv preprint arXiv:1804.04637.
- ten Cate, B., Funk, M., Jung, J. C., & Lutz, C. (2023). SAT-Based PAC Learning of Description Logic Concepts. arXiv preprint arXiv:2305.08511.
- Švec, P., Balogh, Š., Homola, M., & Kľuka, J. (2022). Knowledge-Based Dataset for Training PE Malware Detection Models. arXiv preprint arXiv:2301.00153.
Weekly updates
▶ Weeks
▶ Week 1 - (17.2.2025 - 23.2.2025)
- Set up SPELL DL learner on hardware, successfully ran preexisting tests from their GitHub repository, and planned to adjust these datasets for use.
- Additionally, contacted Peter Švec for consultation.
▶ Week 2 - (24.2.2025 - 2.3.2025)
- Coordinated with Peter Švec on how he would like the data processed so that he can review it.
- Ran an initial test using one of Peter’s datasets.
- Contacted the author of SPELL to report a bug; he agreed to fix it, and I found a workaround for smaller datasets in the meantime.
- Added all introductory sections to my bachelor’s thesis and revised its contents.
▶ Week 3 - (3.3.2025 - 9.3.2025)
- Discussed the contents with Martin Homola and agreed on the first sections to start writing.
- Sent an email to Maurice Funk, the author of SPELL, regarding the best configurations for running the tests.
- Started preparing the benchmarks for Peter Švec to run.
▶ Week 4 - (10.3.2025 - 16.3.2025)
- Contacted Maurice Funk about the configurations that give the best results.
- Wrote the first section about SPELL (will continue editing it, but will move on to the PE Ontology next).
▶ Week 5 - (17.3.2025 - 23.3.2025)
- Adjusted the first dataset to be used with SPELL's benchmarks
- Emailed Maurice Funk about bugs in SPELL
- Resolved these bugs
- Added a 2manchester function to SPELL, since we need to compare the learned concepts in Manchester syntax with previous works.
- Will send a .zip with all of the files to Peter Švec by Thursday; these datasets will be run on the FEI servers for the best comparison
▶ Week 6 - (24.3.2025 - 30.3.2025)
- Rolled back the changes to the 2manchester function; decided to reuse the two already existing functions and combine them
- Added a function that dynamically decides the amount of time for each experiment based on the size of the samples
- Sent the SPELL system, waiting for an answer
- Also added additional metrics after some back and forth with Maurice Funk
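The time-scaling function mentioned above could look roughly like the following minimal sketch. The linear rule, the name time_budget_seconds, and all constants are assumptions made for illustration; they are not the values or the code used in the actual experiments.

```python
def time_budget_seconds(n_samples, base=60.0, per_sample=0.5, cap=3600.0):
    """Return a time limit (in seconds) for one experiment.

    Hypothetical rule: a fixed base time plus a linear cost per sample,
    capped so that very large sample sets cannot run indefinitely.
    """
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    return min(cap, base + per_sample * n_samples)


if __name__ == "__main__":
    # Print the budget for a few illustrative sample sizes.
    for n in (100, 1_000, 10_000):
        print(n, time_budget_seconds(n))
```

A cap is included because a purely linear rule would give the largest sample sets an unbounded time limit.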
▶ Week 7 - (31.3.2025 - 6.4.2025)
- Started writing parts of the state-of-the-art sections
- Got an answer from Peter Švec: he will begin running the experiments and will keep me updated in case there are issues
▶ Week 8 - (7.4.2025 - 13.4.2025)
- Continuing to write the theoretical part of my bachelor's thesis
- Debugged, together with Peter Švec, an issue with files that could not be executed: after zipping, some files lost their executable (x) permission, so we had to add it back in order to run and validate them
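One way to restore lost execute permissions after unpacking an archive is sketched below. This is only an illustration for POSIX systems; the function names and the choice of file suffixes are hypothetical, and it is not the fix that was actually applied on the server.

```python
import os
from pathlib import Path


def make_executable(path):
    """Add the execute bit for every class (user/group/other) that can read the file."""
    mode = os.stat(path).st_mode
    # Read bits are 0o444 and execute bits are 0o111, so shifting right by 2
    # copies each r bit onto the matching x bit.
    mode |= (mode & 0o444) >> 2
    os.chmod(path, mode)


def restore_scripts(root, suffixes=(".sh", ".py")):
    """Make every script under root executable and return the fixed paths."""
    fixed = []
    for p in sorted(Path(root).rglob("*")):
        if p.is_file() and p.suffix in suffixes:
            make_executable(p)
            fixed.append(p)
    return fixed
```

Calling restore_scripts on the unpacked directory leaves files with other suffixes untouched.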
▶ Week 9 - (14.4.2025 - 20.4.2025)
- Continuing to write the theoretical part of my bachelor's thesis
- Before Easter, we will have a last meeting with my supervisor on 15.4. to decide on priorities over Easter
- Will try to gather all of the results from Peter Švec by 22.4. so I can focus on my experiments after that
- Finished the theoretical part of my bachelor's thesis, but will make some minor changes in the upcoming days
- Finished the experiments and will try to start working on the practical part of my bachelor's thesis with these results; will probably do additional runs with better configurations
▶ Week 10 - (21.4.2025 - 27.4.2025)
- Continuing to write the theoretical part of my bachelor's thesis
- After looking over the results of the experiments, we have to debug and make adjustments to the learner and run the experiments again.
▶ Week 11 - (28.4.2025 - 4.5.2025)
- Finished the theoretical part of my thesis; will review it with my supervisor and adjust accordingly
- Fixed a bug in the SPELL learner: some functions still did not use absolute IRIs, so the learned concepts weren't as expressive
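The idea behind the absolute IRI fix can be illustrated with a small sketch. This is not SPELL's code: the helper to_absolute_iri and the namespace http://example.org/onto# are hypothetical and only show how short names can be normalized to absolute IRIs before they are compared.

```python
from urllib.parse import urlparse

# Hypothetical namespace, used only for this illustration.
NAMESPACE = "http://example.org/onto#"


def to_absolute_iri(name, namespace=NAMESPACE):
    """Return name unchanged if it is already absolute, else prepend the namespace."""
    if urlparse(name).scheme in ("http", "https", "urn"):
        return name
    # Drop a leading '#' or ':' left over from a relative or prefixed form.
    return namespace + name.lstrip("#:")


if __name__ == "__main__":
    for n in ("ExecutableFile", "#ExecutableFile", NAMESPACE + "ExecutableFile"):
        print(to_absolute_iri(n))
```

With such a normalization, the same concept name always maps to one IRI, regardless of which function produced it.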
▶ Week 12 - (5.5.2025 - 11.5.2025)
- After I made adjustments, the professor commented on some of the parts; once these comments are addressed, the theoretical part will be ready.
- We started the experiments again.
▶ Last Week - 13 - (12.5.2025 - 18.5.2025)
- The experiments yielded very interesting results: even though the learner should run in polynomial time, it is slower than the EXP2 DL learner. These results will be noted and will be the main focus of the practical part of my bachelor's thesis. So far I have run the 1k and 10k experiments and am now running 20k. We are getting results similar to those of the DL learner algorithm, but at a very slow pace. I will look into it further and finish by the end of the week; next week we will review the whole bachelor's thesis and finalize it.