News
- [2024-04-15] Reka Core released
- [2023-09-18] I join Reka
- [2023-09-05] WhisperX hits 5k stars
- [2023-08-18] defended PhD thesis, no corrections ☺
Research Artefacts
2024
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Reka Team
Technical report & product, 2024.
[Paper] [Chat] [Showcase] [Blog]
Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models
Reka Team (Piotr Padlewski*, Max Bain* et al.)
Technical report, 2024.
[Paper] [Code] [Dataset] [Blog]
AutoAD III: The Prequel – Back to the Pixels
Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
CVPR, 2024.
[Paper] [Code]
2023
Understanding Video Through the Lens of Language
M. Bain
Doctoral Thesis, 2023.
[Thesis]
Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic Contrast Sets
B. Smith*, M. Farinha*, S. M. Hall, H. R. Kirk†, A. Shtedritski†, M. Bain†
Technical report, 2023.
[Paper] [Code]
AutoAD II: The Sequel – Who, When, and What in Movie Audio Description
Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
ICCV, 2023.
[Paper] [Code]
WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
Max Bain, Jaesung Huh, Tengda Han, Andrew Zisserman
Interspeech, 2023.
[Paper] [Code]
AutoAD: Movie Description in Context
Tengda Han*, Max Bain*, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
CVPR, 2023. [Highlight]
[Paper] [Code]
2022
A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning
H. Berg, S. Hall, Y. Bhalgat, W. Yang, H. R. Kirk, A. Shtedritski, M. Bain
AACL, 2022.
[Paper] [Code]
The CLIP-Hitchhiker's Guide to Long Video Retrieval
Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman
Technical report, 2022.
[Paper] [Code]
2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman
ICCV, 2021.
[Paper] [Code] [Project] [Dataset] [Demo]
Automated Audiovisual Behaviour Recognition in Wild Primates
M. Bain, A. Nagrani, D. Schofield, S. Berdugo, J. Bessa, J. Owen, K. J. Hockings, T. Matsuzawa, M. Hayashi, D. Biro, S. Carvalho, A. Zisserman
Science Advances, 2021.
[Paper] [Press]
2020
Condensed Movies: Story Based Retrieval with Contextual Embeddings
Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman
ACCV, 2020. [Oral]
[Paper] [Code] [Challenge]
2019
Count, Crop and Recognise: Fine-Grained Recognition in the Wild
Max Bain, Arsha Nagrani, Daniel Schofield, Andrew Zisserman
ICCVW, 2019. [Oral]
[Paper]
Useful links
1. WebVid. Dataset of 10 million captioned short videos.
https://github.com/m-bain/webvid.
2. WhisperX. Efficient and accurate speech transcription (& diarization).
https://github.com/m-bain/whisperX.
Mood
Berserk (1997)