Computer Vision Datasets

Datasets who is the best at X ?
Computer Vision Datasets
Introducing the Open Images Dataset
A parallel download util for Google’s open image dataset
Image & Vision Group - Datasets
Huizhong Chen - Datasets

Classification / Recognition

A Large-Scale Car Dataset for Fine-Grained Categorization and Verification

  • intro: The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
  • homepage:
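To make the 32x32 colour-image description concrete, here is a minimal sketch of decoding one CIFAR-10 image row. It assumes the layout used by the "python version" of the dataset, which the text above does not state: 3072 values per image, stored as 1024 red, then 1024 green, then 1024 blue values, each plane in row-major order. The `decode_row` helper and the synthetic row are illustrative names, not part of any official loader.

```python
# Sketch: decode one CIFAR-10 image row into a 32x32 grid of (R, G, B) pixels.
# Assumption: each image is 3072 values, 1024 red, then 1024 green, then
# 1024 blue, with every colour plane stored in row-major order.

SIDE = 32
PLANE = SIDE * SIDE  # 1024 values per colour channel


def decode_row(row):
    """Turn a flat sequence of 3072 channel values into rows of RGB tuples."""
    if len(row) != 3 * PLANE:
        raise ValueError(f"expected {3 * PLANE} values, got {len(row)}")
    red = row[:PLANE]
    green = row[PLANE:2 * PLANE]
    blue = row[2 * PLANE:]
    return [
        [(red[y * SIDE + x], green[y * SIDE + x], blue[y * SIDE + x])
         for x in range(SIDE)]
        for y in range(SIDE)
    ]


# Synthetic stand-in for a real image row: red plane 10, green 20, blue 30.
fake_row = [10] * PLANE + [20] * PLANE + [30] * PLANE
image = decode_row(fake_row)
```

With a real batch file, the same function would be applied to each row of the batch's data array after unpickling it.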


The MegaFace Benchmark: 1 Million Faces for Recognition at Scale
MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition
MSR Image Recognition Challenge (IRC)
UMDFaces: An Annotated Face Dataset for Training Deep Networks


The Comprehensive Cars (CompCars) dataset
BoxCars: Improving Fine-Grained Recognition of Vehicles Using 3-D Bounding Boxes in Traffic Surveillance [IEEE T-ITS]
Cars Dataset

Scene Recognition

Places: An Image Database for Deep Scene Understanding
The Places365-CNNs for Scene Classification


EMNIST: an extension of MNIST to handwritten letters


3 Million Instacart Orders, Open Sourced


YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video
DeepScores – A Dataset for Segmentation, Detection and Classification of Tiny Objects
Exclusively Dark (ExDark) Image Dataset
  • intro: The Exclusively Dark (ExDark) dataset is, to the best of the authors' knowledge, the largest collection to date of low-light images, taken in conditions ranging from very low light to twilight (10 different conditions), with image-class and object-level annotations.
  • github:

Face Detection

FDDB: Face Detection Data Set and Benchmark
WIDER FACE: A Face Detection Benchmark

Pedestrian Detection

Caltech Pedestrian Detection Benchmark
Caltech Pedestrian Dataset Converter
CityPersons: A Diverse Dataset for Pedestrian Detection
CrowdHuman: A Benchmark for Detecting Human in a Crowd
  • intro: CrowdHuman contains 15,000, 4,370, and 5,000 images for training, validation, and testing, respectively. The training and validation subsets contain a total of 470K human instances, an average of 23 persons per image, with various kinds of occlusions.
  • homepage:
EuroCity Persons Dataset
  • intro: collected on-board a moving vehicle in 31 cities of 12 European countries, over 238200 person instances manually labeled in over 47300 images, contains a large number of person orientation annotations (over 211200)
  • arxiv:
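The persons-per-image figure quoted for CrowdHuman can be recomputed from the annotation files. The sketch below assumes the annotations are JSON Lines (`.odgt`) records with an `ID` field and a `gtboxes` list whose entries carry a `tag` of `"person"` or `"mask"`. That layout is an assumption here, not something the list above documents, and the two sample records are made up for illustration.

```python
import json


def count_persons(odgt_lines):
    """Map each image ID to the number of boxes tagged "person".

    Assumes one JSON object per line with "ID" and "gtboxes" keys.
    """
    counts = {}
    for line in odgt_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        boxes = record.get("gtboxes", [])
        counts[record["ID"]] = sum(
            1 for box in boxes if box.get("tag") == "person"
        )
    return counts


# Hypothetical records standing in for lines read from an annotation file.
sample = [
    '{"ID": "img_a", "gtboxes": [{"tag": "person", "fbox": [0, 0, 10, 20]},'
    ' {"tag": "mask", "fbox": [5, 5, 3, 3]}]}',
    '{"ID": "img_b", "gtboxes": [{"tag": "person", "fbox": [1, 1, 4, 8]},'
    ' {"tag": "person", "fbox": [2, 2, 4, 8]}]}',
]
counts = count_persons(sample)
mean_per_image = sum(counts.values()) / len(counts)
```

On the real files, the lines would come from opening the annotation file and iterating over it instead of the inline `sample` list.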

Vehicle Detection

Toyota Motor Europe (TME) Motorway Dataset
Welcome to BIT-Vehicle Dataset

Saliency Detection

MSRA10K Salient Object Database

Logo Detection

QMUL-OpenLogo: Open Logo Detection Challenge
  • intro: QMUL-OpenLogo contains 27,083 images from 352 logo classes, built by aggregating and refining 7 existing datasets and establishing an open logo detection evaluation protocol
  • homepage:

Head Detection

HollywoodHeads dataset
Brainwash dataset

Detection From Video

YouTube-Objects dataset v2.2
ILSVRC2015: Object detection from video (VID)


Mapillary Vistas Dataset

Mapillary Vistas Dataset
Releasing the World’s Largest Street-level Imagery Dataset for Teaching Machines to See
Multi-Human Parsing


Augmented Pascal VOC

Supervisely Person

Microsoft COCO

The Oxford-IIIT Pet Dataset

  • intro: a 37 category pet dataset with roughly 200 images for each class. All images have an associated ground truth annotation of breed, head ROI, and pixel level trimap segmentation
  • homepage:
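The pixel-level trimap mentioned for the Oxford-IIIT Pet Dataset is often collapsed into a binary mask before training. The sketch below assumes the trimap encoding 1 = foreground, 2 = background, 3 = not classified (the boundary region), which is not spelled out in the text above. The `trimap_to_mask` helper and the toy 4x4 trimap are illustrative only.

```python
# Assumed trimap label values (not stated in the text above).
FOREGROUND, BACKGROUND, UNCERTAIN = 1, 2, 3


def trimap_to_mask(trimap, include_uncertain=False):
    """Convert a trimap (nested lists of 1/2/3) into a 0/1 mask.

    By default only foreground pixels become 1; with include_uncertain=True
    the unclassified boundary pixels are counted as part of the pet as well.
    """
    keep = {FOREGROUND, UNCERTAIN} if include_uncertain else {FOREGROUND}
    return [[1 if value in keep else 0 for value in row] for row in trimap]


# Toy trimap: one foreground pixel surrounded by an uncertain boundary.
toy = [
    [2, 2, 2, 2],
    [2, 3, 3, 2],
    [2, 3, 1, 3],
    [2, 2, 3, 2],
]
strict = trimap_to_mask(toy)
loose = trimap_to_mask(toy, include_uncertain=True)
```

Whether to fold the boundary region into the foreground is a design choice: the strict mask gives cleaner positives, while the loose mask avoids shrinking the object.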


COCO-Stuff: Thing and Stuff Classes in Context
COCO-Stuff 10K dataset v1.1

Scene Parsing

MIT Scene Parsing Benchmark
  • intro: train: 20,120 images; val: 2,000 images. Contains 150 stuff/object category labels (e.g., wall, sky, and tree) and 1,038 image-level scene descriptors (e.g., airport terminal, bedroom, and street).
  • homepage:
Semantic Understanding of Scenes through the ADE20K Dataset



Captioning / Description

TGIF: A New Dataset and Benchmark on Animated GIF Description
Collecting Multilingual Parallel Video Descriptions Using Mechanical Turk


Dataset       # Videos   # Classes   Year   Manually Labeled?
Kodak         1,358      25          2007
HMDB51        7,000      51
Charades      9,848      157
MCG-WEBV      234,414    15          2009
CCV           9,317      20          2011
UCF-101       13,320     101         2012
THUMOS-2      18,394     101         2014
MED-2014      ≈28,000    20          2014
Sports-1M     1M         487         2014
ActivityNet   27,801     203         2015
FCVID         91,223     239         2015
UCF101 - Action Recognition Data Set

HMDB51: A Large Video Database for Human Motion Recognition
ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding
Charades Dataset
  • intro: This dataset guides our research into unstructured video activity recognition and commonsense reasoning for daily human activities.
  • intro: The dataset contains 66,500 temporal annotations for 157 action classes, 41,104 labels for 46 object classes, and 27,847 textual descriptions of the videos.
  • homepage:
FCVID: Fudan-Columbia Video Dataset
YouTube-8M: A Large-Scale Video Classification Benchmark
stabilized video frames
The Kinetics Human Action Video Dataset
e-Lab Video Data Set(s)
  • intro: “Currently, e-VDS35 has 35 classes and a total of 2050 videos of roughly 10 seconds each (see histogram below). We are aiming to collect overall 1750 (50 × 35) videos with your help.”
  • homepage:
Video Dataset Overview


SceneNet RGB-D: 5M Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth

Autonomous Driving

BDD: Berkeley Deep Drive


COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
Chinese Text in the Wild


DeepFashion: In-shop Clothes Retrieval

Person Re-ID

Dataset   Description
CUHK01    971 identities, 3884 images, manually cropped
CUHK02    1816 identities, 7264 images, manually cropped
CUHK03    1360 identities, 13164 images, manually cropped + automatically detected
Person Re-identification Datasets
CUHK Person Re-identification Datasets
PRW (Person Re-identification in the Wild) Dataset

Person Re-identification in the Wild
  • intro: DukeMTMC-reID is a subset of the DukeMTMC for image-based re-identification, in the format of the Market-1501 dataset
  • intro: 16,522 training images of 702 identities, 2,228 query images of the other 702 identities and 17,661 gallery images
  • github:
Person Re-ID (PRID) Dataset 2011
MARS (Motion Analysis and Re-identification Set) Dataset
X-MARS Reordering of the MARS Dataset for Image to Video Evaluation
Labeled Pedestrian in the Wild
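Re-identification code usually recovers the identity and camera labels from image filenames. For the DukeMTMC-reID subset described above, a minimal sketch follows. It assumes filenames of the form `<person id>_c<camera>_f<frame>.jpg`, which this list does not document, and the example filename is hypothetical.

```python
import re

# Assumed filename pattern: person ID, camera number, frame number.
DUKE_NAME = re.compile(r"^(\d+)_c(\d+)_f(\d+)\.jpg$")


def parse_duke_name(filename):
    """Return (person_id, camera_id, frame) parsed from an image filename."""
    match = DUKE_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognised filename: {filename!r}")
    person_id, camera_id, frame = (int(group) for group in match.groups())
    return person_id, camera_id, frame


# Hypothetical filename following the assumed pattern.
person_id, camera_id, frame = parse_duke_name("0005_c2_f0046985.jpg")
```

Grouping the parsed tuples by `person_id` and `camera_id` is then enough to set up a cross-camera query/gallery evaluation.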


Large-scale Fashion (DeepFashion) Database
Apparel classification with Style

Attribute Datasets

Attribute Datasets

Pedestrian Attribute Recognition

A Richly Annotated Dataset for Pedestrian Attribute Recognition
Pedestrian Attribute Recognition At Far Distance


UA-DETRAC: A New Benchmark and Protocol for Multi-Object Detection and Tracking
DukeMTMC: Duke Multi-Target, Multi-Camera Tracking Project
  • intro: DukeMTMC aims to accelerate advances in multi-target multi-camera tracking. It provides a tracking system that works within and across cameras, a new large scale HD video data set recorded by 8 synchronized cameras with more than 7,000 single camera trajectories and over 2,000 unique identities
  • homepage:
The WILDTRACK Seven-Camera HD Dataset

Color Classification

Vehicle Color Recognition on an Urban Road by Feature Context

License Plate Detection and Recognition

Application-Oriented License Plate (AOLP) Database
CCPD: Chinese City Parking Dataset


VoTT: Visual Object Tagging Tool 1.5
  • intro: Visual Object Tagging Tool: an Electron app for building end-to-end object detection models from images and videos
  • github:
LabelImg: a graphical image annotation tool for labeling object bounding boxes in images

Pychet Labeller
ml-pyxis: Tool for reading and writing datasets of tensors (numpy.ndarray) with MessagePack and Lightning Memory-Mapped Database (LMDB).
  • intro: Tool for reading and writing datasets of tensors in a Lightning Memory-Mapped Database (LMDB). Designed to manage machine learning datasets with fast reading speeds.
  • github:
Open Image Dataset downloader
Data Labeler for Video
Computer Vision Annotation Tool (CVAT)

  • intro: Computer Vision Annotation Tool (CVAT) is a web-based tool which helps to annotate video and images for Computer Vision algorithms
  • github:


BAM! The Behance Artistic Media Dataset


CV Datasets on the web
Awesome Public Datasets
Machine Learning Repository
