News
Apr 2026
Meta announced Muse Spark, its latest frontier LLM. My team contributed to its agentic coding capabilities.
Jun 2025
Joined Meta Superintelligence Labs (FAIR).
Apr 2025
Scaling Laws for Native Multimodal Models was accepted at ICCV 2025 as an oral.
Feb 2025
Released FlexTok, accepted at ICML 2025.
Jan 2025
Parameters vs FLOPs was accepted at ICML 2025.
Nov 2024
Released AIMv2, accepted at CVPR 2025 as a highlight.
Jun 2024
Released DataComp-LM, accepted at NeurIPS 2024.
Jan 2024
Released Autoregressive Image Models (AIM), accepted at ICML 2024.
Aug 2023
Joined Apple MLR as a Research Scientist.
May 2023
Mark Zuckerberg announced our recent foundational multimodal model ImageBind.
May 2023
Released ImageBind, accepted at CVPR 2023.
Research
I'm interested in agentic coding post-training, LLM pre-training, multimodal vision-language models, and large-scale visual representation learning.
Scaling Laws for Optimal Data Mixtures
Mustafa Shukor, Louis Bethune, Dan Busbridge, David Grangier, Enrico Fini, Alaaeldin El-Nouby, Pierre Ablin
NeurIPS 2025
paper
Scaling Laws for Native Multimodal Models
Mustafa Shukor, Enrico Fini, Victor Guilherme Turrisi da Costa, Matthieu Cord, Joshua Susskind, Alaaeldin El-Nouby
paper
FlexTok: Resampling Images into 1D Token Sequences of Flexible Length
Roman Bachmann, Jesse Allardice, David Mizrahi, Enrico Fini, Oguzhan Fatih Kar, Elmira Amirloo, Alaaeldin El-Nouby, Amir Zamir, Afshin Dehghan
ICML 2025
paper
code
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
Samira Abnar, Harshay Shah, Dan Busbridge, Alaaeldin El-Nouby, Josh Susskind, Vimal Thilak
ICML 2025
paper
Multimodal Autoregressive Pre-training of Large Vision Encoders
Enrico Fini, Mustafa Shukor, Xiujun Li, Philipp Dufter, Michal Klein, David Haldimann, Sai Aitharaju, Victor Guilherme Turrisi da Costa, Louis Bethune, Zhe Gan, Alexander T Toshev, Marcin Eichner, Moin Nabi, Yinfei Yang, Joshua M. Susskind, Alaaeldin El-Nouby
paper
code
DataComp-LM: In Search of the Next Generation of Training Sets for Language Models
Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Vaishaal Shankar et al.
NeurIPS 2024
paper
code
Scalable Pre-training of Large Autoregressive Image Models
Alaaeldin El-Nouby, Michal Klein, Shuangfei Zhai, Miguel Angel Bautista, Alexander Toshev, Vaishaal Shankar, Joshua M. Susskind, Armand Joulin
ICML 2024
paper
code
ImageBind: One Embedding Space To Bind Them All
Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra
paper
code
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab, Timothee Darcet, Theo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski
Preprint
paper
code
Improving Statistical Fidelity for Neural Image Compression with Implicit Local Likelihood Models
Matthew J. Muckley, Alaaeldin El-Nouby, Karen Ullrich, Herve Jegou, Jakob Verbeek
ICML 2023
paper
Image Compression with Product Quantized Masked Image Modeling
Alaaeldin El-Nouby, Matthew J. Muckley, Karen Ullrich, Ivan Laptev, Jakob Verbeek, Hervé Jégou
Transactions of Machine Learning Research (TMLR)
paper
OmniMAE: Single Model Masked Pretraining on Images and Videos
Rohit Girdhar*, Alaaeldin El-Nouby*, Mannat Singh*, Kalyan Vasudev Alwala*, Armand Joulin, Ishan Misra*
CVPR 2023
paper
code
Three things everyone should know about Vision Transformers
Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Jakob Verbeek, Hervé Jégou
ECCV 2022
paper
code
Are Large-scale Datasets Necessary for Self-Supervised Pre-training?
Alaaeldin El-Nouby*, Gautier Izacard*, Hugo Touvron, Ivan Laptev, Hervé Jégou, Edouard Grave
Under Review
paper
XCiT: Cross-Covariance Image Transformer
Alaaeldin El-Nouby, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, Armand Joulin, Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou
NeurIPS 2021
paper
video
code
ResMLP: Feedforward networks for image classification with data-efficient training
Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, Edouard Grave, Gautier Izacard, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou
TPAMI
paper
code
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference
Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze
ICCV 2021
paper
code
Training Vision Transformers for Image Retrieval
Alaaeldin El-Nouby, Natalia Neverova, Ivan Laptev, Hervé Jégou
Preprint
paper
Skip-Clip: Self-Supervised Spatiotemporal Representation Learning by Future Clip Order Ranking
Alaaeldin El-Nouby, Shuangfei Zhai, Graham W. Taylor, Joshua M. Susskind
Holistic Video Understanding Workshop, ICCV 2019 (Best Poster Award)
paper
poster
bibtex
Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction
Alaaeldin El-Nouby, Shikhar Sharma, Hannes Schulz, Devon Hjelm, Layla El Asri, Samira Ebrahimi Kahou, Yoshua Bengio, Graham W. Taylor
Proceedings of the 2019 IEEE International Conference on Computer Vision (ICCV)
paper
code
poster
blog
bibtex
Real-Time End-to-End Action Detection with Two-Stream Networks
Alaaeldin El-Nouby, Graham W. Taylor
15th Conference on Computer and Robot Vision, CRV 2018
Oral
paper
bibtex
Spatiotemporal Representation Learning For Human Action Recognition And Localization
Alaaeldin El-Nouby
paper
bibtex
Invited Talks
L3D-IVU Workshop, CVPR’24 - Scalable Pre-training of Large Autoregressive Image Models
IMAGINE lab, École des Ponts ParisTech - Bringing the power of Transformers to Computer Vision
Vector Institute / University of Guelph - Masked Image Modeling for Visual Representation Learning
Transformers for Vision workshop, CVPR’22 - Are Large-scale Datasets Necessary for Self-Supervised Pre-training?
Max Planck Institute / Tübingen AI Center - Are Large-scale Datasets Necessary for Self-Supervised Pre-training?
Computer Vision Group (CVG), University of Bern - Are Large-scale Datasets Necessary for Self-Supervised Pre-training?
Large Scale Holistic Video Understanding workshop, CVPR’21 - Training Vision Transformers for Image Retrieval
KTH Royal Institute of Technology - Training Vision Transformers for Image Retrieval
Microsoft Research Montreal - Sequential Scene Understanding and Generation
DeepVision workshop, Simon Fraser University - Real-Time End-to-End Action Detection with Two-Stream Networks
Twenty Billion Neurons - Real-Time End-to-End Action Detection with Two-Stream Networks
Reviewing
CVPR’22-’25, ECCV’22-’24, NeurIPS’24, ICLR’24, ICCV’21, NeurIPS’22 SSL workshop, TPAMI (2021-Present).