Keynote: xLSTM: New Architectures for Large Language Models
Speakers:
Maximilian Beck
Date:
Monday, November 18, 2024
Time:
9:05 am
Room:
Forum 1-3
Summary:
Today’s LLMs such as ChatGPT show impressive performance and have the potential to revolutionize our daily lives. All of these LLMs are based on the Transformer architecture with the Attention mechanism at its core. Because Attention scales quadratically with context length, processing long sequences is very expensive. In this talk, Maximilian presents xLSTM, a novel architecture for LLMs that scales only linearly with context length while still outperforming Transformers on language modeling.
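To make the scaling contrast in the abstract concrete, here is a minimal sketch (my own simplification, not code from the talk): standard causal Attention computes a T x T score matrix, so its cost grows quadratically with sequence length T, while a recurrent scan loosely following the mLSTM matrix-memory update described in the xLSTM paper keeps a fixed-size state and does O(d^2) work per token, i.e. linear in T. Gate choices, variable names, and the omission of stabilization tricks are assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 512, 16                      # sequence length, head dimension
q = rng.standard_normal((T, d))
k = rng.standard_normal((T, d))
v = rng.standard_normal((T, d))

# --- Attention: cost grows quadratically with T -------------------
# The full T x T score matrix is computed, so time and memory are O(T^2).
scores = q @ k.T                    # shape (T, T)
mask = np.tril(np.ones((T, T), dtype=bool))
scores = np.where(mask, scores, -np.inf)          # causal masking
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
attn_out = weights @ v              # O(T^2 * d) total work

# --- Recurrent (xLSTM-style) scan: cost grows linearly with T -----
# Simplified sketch of a matrix-memory (mLSTM-like) update: the state
# C has a fixed size (d x d) no matter how long the sequence gets.
f = 1.0 / (1.0 + np.exp(-rng.standard_normal(T)))  # forget gates in (0, 1)
i = 1.0 / (1.0 + np.exp(-rng.standard_normal(T)))  # input gates in (0, 1)

C = np.zeros((d, d))                # matrix memory
n = np.zeros(d)                     # normalizer state
rec_out = np.empty((T, d))
for t in range(T):
    C = f[t] * C + i[t] * np.outer(v[t], k[t])     # rank-1 update, O(d^2)
    n = f[t] * n + i[t] * k[t]
    rec_out[t] = (C @ q[t]) / max(abs(n @ q[t]), 1.0)
# Total work: O(T * d^2) -- linear in context length T.
```

Because the recurrent state is constant-size, generating each new token costs the same regardless of how long the context already is, which is the practical payoff of the linear scaling claimed in the abstract.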