**Multimodal Large Language Models** by **Anselmo Talotta**

This presentation explores the field of multimodal learning, which aims to improve machine perception by integrating various types of data such as text, images, audio, and video. The presentation covers the main developments in multimodal AI, from early fusion techniques to more advanced approaches such as CLIP (Contrastive Language-Image Pre-training) and recent multimodal large language models. Key topics include the challenges of combining different data modalities, the application of transformer architectures in multimodal contexts, and emerging capabilities in zero-shot learning. The presentation discusses practical applications such as visual querying and text-based image retrieval, while addressing current limitations of multimodal systems.
Where does it take place?
SnT - Luxembourg University
29 Av. John F. Kennedy
1855 Kirchberg Luxembourg