Bachelor thesis

Name of the thesis

Detection of tail fiber proteins via machine learning methods.


Andrej Baláž


As the occurrence of super-bacteria with antimicrobial resistance has been on the rise, pharmaceutical companies have found a new way to treat them: phages. For phages to be viable for medical treatment, one has to know the functionality of their genomes. Biologists are not yet able to identify all genomes with their current methods, so we are looking for new ways. The bachelor thesis focuses on the use of machine learning for the detection and identification of tail fiber genomes of phages and prophages. It aims to create a tool that helps biologists identify as yet unidentified genomes.

Dear Diary

By the start of this diary, I had already prepared the 2-mer datasets.

05.03.2021: went through sources (ESLII and ISLR) and read about logistic regression, found sources on SVM
12.03.2021: went through ESLII and ISLR and read about SVM + online lectures
19.03.2021: found online sources on genes, genomes, and phages
26.03.2021: found online sources for bacteriophage, tail fibers, NCBI database

02.04.2021: wrote the first parts of 1. Methods (biological background), also almost finished the part about tools
09.04.2021: finished tools, wrote the theory of logistic regression without graphs
16.04.2021: added graphs to logistic regression, created graphs for SVM and the string kernel
23.04.2021: wrote the theory of SVM
30.04.2021: wrote about kernels, checked and finished 1. Methods

07.05.2021: started the script that computes the string kernel on the server, created most of the bar charts on the validation dataset
14.05.2021: writing 2. Data and 3. Implementation, currently validating the test dataset on SVM (C=1, gamma=0.001) and PhANNs
21.05.2021: the string kernel is finally computed, but it was not done properly, so it will be left for future work; wrote abstract EN, abstract SK and Acknowledgement
28.05.2021: wrote 4. Discussion, also went through the conclusion and introduction; I'll check it in Grammarly tomorrow if it is finished

Final version of thesis:

Download here


name: Jozef Bača


phone: +421 918 246 080