The WebVid-10M Dataset

Lonely beautiful woman sitting on the tent looking outside. wind on the hair and camping on the beach near the colors of water and shore. freedom and alternative tiny house for traveler lady drinking.

Female cop talking on walkietalkie, responding emergency call, crime prevention

Billiards, concentrated young woman playing in club.

Cabeza de toro, punta cana/ dominican republic - feb 20, 2020: 4k drone flight over coral reef with manta

Kherson, ukraine - 20 may 2016: open, free, rock music festival crowd partying at a rock concert. hands up, people, fans cheering clapping applauding in kherson, ukraine - 20 may 2016. band performing

Runners feet in a sneakers close up. realistic three dimensional animation.

What is WebVid-10M?

WebVid-10M is a large-scale dataset of short videos with textual descriptions sourced from stock footage sites. The videos are diverse and rich in their content.
  • 10.7M video-caption pairs.
  • 52K total video hours.
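The two headline figures also give a rough sense of typical clip length. The sketch below assumes that each video-caption pair corresponds to exactly one clip. Both counts are rounded, so the result is only an approximate mean. The names `TOTAL_PAIRS`, `TOTAL_HOURS` and `mean_clip_seconds` are illustrative and do not come from any dataset tooling.

```python
# Back-of-the-envelope estimate of mean clip duration in WebVid-10M.
# Assumption: one caption per clip, so pairs == clips. Inputs are the
# rounded headline figures, so the output is approximate.
TOTAL_PAIRS = 10.7e6   # video-caption pairs
TOTAL_HOURS = 52e3     # total video hours


def mean_clip_seconds(total_hours: float, total_pairs: float) -> float:
    """Convert total hours to seconds and divide by the number of clips."""
    return total_hours * 3600 / total_pairs


avg = mean_clip_seconds(TOTAL_HOURS, TOTAL_PAIRS)
print(f"~{avg:.1f} s per clip")
```

This works out to roughly 17.5 seconds per clip. That figure is consistent with the dataset being described as short videos.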

Terms of Access

You must not use the content in this dataset if you do not agree to the terms outlined here.
We do not own the copyright to any of the collected data. Its use is authorised under the Intellectual Property Office's Exceptions to Copyright for Non-Commercial Research and Private Study.


Full 10M
2.5M Subset



M. Bain, A. Nagrani, G. Varol, A. Zisserman.
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval.
ICCV, 2021.
(hosted on arXiv)



Max Bain

Arsha Nagrani

Gül Varol

Andrew Zisserman
