Author: Bc. Daniel Pištek
Supervisor (Školitel): prof. Ing. Igor Farkaš, Dr.
University: Comenius University
Year: 2025
The introduction of Transformers in 2017 reshaped the landscape of deep learning. Originally proposed for sequence modelling, Transformers have since achieved widespread success across various domains. However, the scalability limitations of Transformers—particularly with respect to sequence length—have sparked renewed interest in novel recurrent models that are parallelizable during training, offer comparable performance, and scale more effectively.
1. Study and implement recent models of recurrent neural networks, such as minimized versions of LSTM and GRU (see the sketch after this list).
2. Compare the performance of these models on selected benchmark sequential tasks from the perspective of training time and complexity (number of trainable parameters, scalability).
3. Analyze the behaviour of these models using the methods of explainable AI.
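To make goal 1 concrete, the following is a minimal illustrative sketch (not the thesis implementation) of a simplified GRU-style cell in which, under the common "minimal GRU" assumption, the update gate and candidate state depend only on the current input rather than on the previous hidden state; this input-only gating is what makes such recurrences parallelizable over the sequence in principle. The class name MinGRU and its layer names are placeholders chosen here for illustration, and the cell is written in its simple sequential form.

```python
import torch
import torch.nn as nn


class MinGRU(nn.Module):
    """Illustrative sketch of a minimal GRU-style cell (sequential form).

    Assumption: gates and candidate states are computed from the input only,
    so the recurrence h_t = (1 - z_t) * h_{t-1} + z_t * h~_t has no nonlinear
    dependence on h_{t-1} inside the gates.
    """

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.to_z = nn.Linear(input_size, hidden_size)        # update gate
        self.to_h_tilde = nn.Linear(input_size, hidden_size)  # candidate state

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, input_size)
        batch, seq_len, _ = x.shape
        h = torch.zeros(batch, self.to_z.out_features, device=x.device)
        outputs = []
        for t in range(seq_len):
            z = torch.sigmoid(self.to_z(x[:, t]))   # gate from input only
            h_tilde = self.to_h_tilde(x[:, t])      # candidate from input only
            h = (1 - z) * h + z * h_tilde           # convex combination update
            outputs.append(h)
        return torch.stack(outputs, dim=1)          # (batch, seq_len, hidden_size)


# Example usage with dummy data:
# cell = MinGRU(input_size=8, hidden_size=16)
# y = cell(torch.randn(4, 10, 8))   # -> shape (4, 10, 16)
```

Because the gate z_t and candidate h~_t above do not depend on h_{t-1}, the loop over time steps can in practice be replaced by a parallel (associative) scan during training, which is the scalability advantage motivating these models.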