Computer Vision Datasets

Datasets who is the best at X ?
Computer Vision Datasets
Introducing the Open Images Dataset
A parallel download util for Google’s open image dataset
Image & Vision Group - Datasets
Huizhong Chen - Datasets

Classification / Recognition

A Large-Scale Car Dataset for Fine-Grained Categorization and Verification

CIFAR-10 / CIFAR-100
  • intro: The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
  • homepage: http://www.cs.toronto.edu/~kriz/cifar.html
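The figures in the intro can be sanity-checked, and a single image unpacked, with a short sketch. It assumes the channel-planar row layout described on the homepage for the Python version (1024 red values, then 1024 green, then 1024 blue, each plane in row-major order); the function names are illustrative and not part of any official loader.

```python
# Figures quoted in the CIFAR-10 entry above.
NUM_CLASSES = 10
IMAGES_PER_CLASS = 6000
TRAIN_IMAGES = 50000
TEST_IMAGES = 10000
SIDE = 32       # images are 32x32
CHANNELS = 3    # colour images: red, green, blue

# ASSUMPTION: each image is stored as one flat row in channel-planar order
# (all red values, then all green, then all blue), each plane row-major.
PLANE = SIDE * SIDE


def flat_to_pixels(row):
    """Turn one flat 3072-value row into a 32x32 grid of (r, g, b) tuples."""
    if len(row) != CHANNELS * PLANE:
        raise ValueError(f"expected {CHANNELS * PLANE} values, got {len(row)}")
    red = row[:PLANE]
    green = row[PLANE:2 * PLANE]
    blue = row[2 * PLANE:]
    return [
        [(red[y * SIDE + x], green[y * SIDE + x], blue[y * SIDE + x])
         for x in range(SIDE)]
        for y in range(SIDE)
    ]


def check_counts():
    """Return the total image count after checking the quoted split."""
    total = NUM_CLASSES * IMAGES_PER_CLASS
    if total != TRAIN_IMAGES + TEST_IMAGES:
        raise ValueError("class counts and train/test split disagree")
    return total
```

With a real batch, `row` would be one row of the batch's data array; here any sequence of 3072 values works.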

Face

The MegaFace Benchmark: 1 Million Faces for Recognition at Scale
MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition
MSR Image Recognition Challenge (IRC)
UMDFaces: An Annotated Face Dataset for Training Deep Networks

Vehicle

The Comprehensive Cars (CompCars) dataset

http://mmlab.ie.cuhk.edu.hk/datasets/comp_cars/
BoxCars: Improving Fine-Grained Recognition of Vehicles Using 3-D Bounding Boxes in Traffic Surveillance [IEEE T-ITS]

https://medusa.fit.vutbr.cz/traffic/research-topics/fine-grained-vehicle-recognition/boxcars-improving-vehicle-fine-grained-recognition-using-3d-bounding-boxes-in-traffic-surveillance/
Cars Dataset

Scene Recognition

Places: An Image Database for Deep Scene Understanding
Places2
The Places365-CNNs for Scene Classification

MNIST

EMNIST: an extension of MNIST to handwritten letters
Fashion-MNIST

Food

3 Million Instacart Orders, Open Sourced
https://tech.instacart.com/3-million-instacart-orders-open-sourced-d40d29ead6f2

Detection

YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video
DeepScores – A Dataset for Segmentation, Detection and Classification of Tiny Objects
https://arxiv.org/abs/1804.00525
Exclusively Dark (ExDark) Image Dataset
  • intro: The Exclusively Dark (ExDark) dataset is, to the best of its authors' knowledge, the largest collection to date of low-light images, taken in conditions ranging from very low light to twilight (10 different conditions), with image-class and object-level annotations.
  • github: https://github.com/cs-chan/Exclusively-Dark-Image-Dataset

Face Detection

FDDB: Face Detection Data Set and Benchmark
WIDER FACE: A Face Detection Benchmark

Pedestrian Detection


Caltech Pedestrian Detection Benchmark
Caltech Pedestrian Dataset Converter
https://github.com/mitmul/caltech-pedestrian-dataset-converter
CityPersons: A Diverse Dataset for Pedestrian Detection
CrowdHuman: A Benchmark for Detecting Human in a Crowd
  • intro: CrowdHuman contains 15000, 4370 and 5000 images for training, validation, and testing, respectively. The train and validation subsets contain a total of 470K human instances, about 23 persons per image, with various kinds of occlusion.
  • homepage: https://sshao0516.github.io/CrowdHuman/
EuroCity Persons Dataset
  • intro: Collected on board a moving vehicle in 31 cities of 12 European countries. Over 238200 person instances are manually labeled in over 47300 images, including a large number of person orientation annotations (over 211200).
  • arxiv: https://arxiv.org/abs/1805.07193

Vehicle Detection

Toyota Motor Europe (TME) Motorway Dataset
Welcome to BIT-Vehicle Dataset

Saliency Detection

MSRA10K Salient Object Database
http://mmcheng.net/msra10k/

Logo Detection

QMUL-OpenLogo: Open Logo Detection Challenge
  • intro: QMUL-OpenLogo contains 27,083 images from 352 logo classes, built by aggregating and refining 7 existing datasets and establishing an open logo detection evaluation protocol
  • homepage: https://qmul-openlogo.github.io/

Head Detection

SCUT-HEAD
HollywoodHeads dataset
http://www.di.ens.fr/willow/research/headdetection/
Brainwash dataset.
https://exhibits.stanford.edu/data/catalog/sx925dc9385

Detection From Video

YouTube-Objects dataset v2.2
ILSVRC2015: Object detection from video (VID)

Segmentation


Mapillary Vistas Dataset

Mapillary Vistas Dataset
Releasing the World’s Largest Street-level Imagery Dataset for Teaching Machines to See
http://blog.mapillary.com/product/2017/05/03/mapillary-vistas-dataset.html
Multi-Human Parsing

https://lv-mhp.github.io/

PASCAL VOC


Augmented Pascal VOC

http://home.bharathh.info/pubs/codes/SBD/download.html

Supervisely Person


Microsoft COCO


The Oxford-IIIT Pet Dataset


  • intro: a 37-category pet dataset with roughly 200 images for each class. All images have an associated ground-truth annotation of breed, head ROI, and pixel-level trimap segmentation
  • homepage: http://www.robots.ox.ac.uk/~vgg/data/pets/
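A trimap usually has to be collapsed into a binary mask before it can be used for segmentation training. The sketch below assumes the label values described in the dataset's README (1 = foreground, 2 = background, 3 = not classified); the function name and the `unclassified_as_pet` switch are hypothetical, introduced only for illustration.

```python
# ASSUMPTION: trimap label values as described in the dataset's README.
FOREGROUND = 1
BACKGROUND = 2
UNCLASSIFIED = 3


def trimap_to_binary(trimap, unclassified_as_pet=False):
    """Collapse a trimap (list of rows of ints) into a 0/1 pet mask.

    Pixels labelled "not classified" (typically the pet boundary) become
    background unless unclassified_as_pet is True.
    """
    valid = {FOREGROUND, BACKGROUND, UNCLASSIFIED}
    pet_labels = {FOREGROUND}
    if unclassified_as_pet:
        pet_labels.add(UNCLASSIFIED)

    mask = []
    for row in trimap:
        out_row = []
        for value in row:
            if value not in valid:
                raise ValueError(f"unexpected trimap value: {value}")
            out_row.append(1 if value in pet_labels else 0)
        mask.append(out_row)
    return mask
```

Whether the unclassified band counts as pet or background is a modelling choice; some setups instead ignore those pixels in the loss.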

COCO-Stuff

COCO-Stuff: Thing and Stuff Classes in Context
COCO-Stuff 10K dataset v1.1
  • arxiv: https://arxiv.org/abs/1612.03716
  • github: https://github.com/nightrome/cocostuff

Scene Parsing

MIT Scene Parsing Benchmark
http://sceneparsing.csail.mit.edu/
ADE20K
  • intro: train: 20,210 images, val: 2,000 images. Contains 150 stuff/object category labels (e.g., wall, sky, and tree) and 1,038 image-level scene descriptors (e.g., airport terminal, bedroom, and street).
  • homepage: http://groups.csail.mit.edu/vision/datasets/ADE20K/
Semantic Understanding of Scenes through the ADE20K Dataset
https://arxiv.org/abs/1608.05442

ImageNet


ImageNet-Utils

Captioning / Description

TGIF: A New Dataset and Benchmark on Animated GIF Description
Collecting Multilingual Parallel Video Descriptions Using Mechanical Turk

Video


Dataset      # Videos   # Classes   Year   Manually Labeled?
Kodak        1,358      25          2007
HMDB51       7,000      51
Charades     9,848      157
MCG-WEBV     234,414    15          2009
CCV          9,317      20          2011
UCF-101      13,320     101         2012
THUMOS-2     18,394     101         2014
MED-2014     ≈28,000    20          2014
Sports-1M    1M         487         2014
ActivityNet  27,801     203         2015
FCVID        91,223     239         2015

UCF101 - Action Recognition Data Set

HMDB51: A Large Video Database for Human Motion Recognition
ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding
Sports-1M
Charades Dataset
  • intro: This dataset guides our research into unstructured video activity recognition and commonsense reasoning for daily human activities.
  • intro: The dataset contains 66,500 temporal annotations for 157 action classes, 41,104 labels for 46 object classes, and 27,847 textual descriptions of the videos.
  • homepage: http://allenai.org/plato/charades/
FCVID: Fudan-Columbia Video Dataset
YouTube-8M: A Large-Scale Video Classification Benchmark
stabilized video frames
The Kinetics Human Action Video Dataset
e-Lab Video Data Set(s)
  • intro: “Currently, e-VDS35 has 35 classes and a total of 2050 videos of roughly 10 seconds each (see histogram below). We are aiming to collect overall 1750 (50 × 35) videos with your help.”
  • homepage: https://engineering.purdue.edu/elab/eVDS
Video Dataset Overview

Scene

SceneNet RGB-D: 5M Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth

Autonomous Driving

BDD: Berkeley Deep Drive

OCR

COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
Chinese Text in the Wild

Retrieval

Oxford5k
Paris6k
Oxford105k
UKB
NUS-WIDE
ImageNet-YahooQA
DeepFashion: In-shop Clothes Retrieval

Person Re-ID


Dataset  Description
CUHK01   971 identities, 3884 images, manually cropped
CUHK02   1816 identities, 7264 images, manually cropped
CUHK03   1360 identities, 13164 images, manually cropped + automatically detected

Person Re-identification Datasets
CUHK Person Re-identification Datasets
http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html
PRW (Person Re-identification in the Wild) Dataset

Person Re-identification in the Wild
DukeMTMC-reID
  • intro: DukeMTMC-reID is a subset of the DukeMTMC for image-based re-identification, in the format of the Market-1501 dataset
  • intro: 16,522 training images of 702 identities, 2,228 query images of the other 702 identities and 17,661 gallery images
  • github: https://github.com/layumi/DukeMTMC-reID_evaluation
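Because the dataset follows the Market-1501 format, identity and camera labels can be read straight from the file names. The sketch below assumes names such as `0001_c2_f0046182.jpg` (DukeMTMC-reID) and `0001_c1s1_001051_00.jpg` (Market-1501), where the leading field is the person ID and the number after `c` is the camera; the helper names are hypothetical and the pattern should be checked against the files actually downloaded.

```python
import re

# Split sizes quoted in the entry above.
TRAIN_IMAGES = 16522
QUERY_IMAGES = 2228
GALLERY_IMAGES = 17661

# ASSUMPTION: file names start with "<person id>_c<camera id>", as in
# Market-1501-style naming. A leading "-" is allowed because Market-1501
# marks junk detections with the ID -1.
_NAME_PATTERN = re.compile(r"^(-?\d+)_c(\d+)")


def parse_reid_filename(filename):
    """Return (person_id, camera_id) parsed from a re-ID image file name."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognised file name: {filename}")
    return int(match.group(1)), int(match.group(2))


def count_identities(filenames):
    """Count distinct person IDs among the given file names."""
    return len({parse_reid_filename(name)[0] for name in filenames})


def total_images():
    """Sum of the train, query and gallery split sizes quoted above."""
    return TRAIN_IMAGES + QUERY_IMAGES + GALLERY_IMAGES
```

Running `count_identities` over the training directory listing is a quick way to confirm that the download matches the 702 training identities stated in the intro.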
DukeMTMC4ReID
Person Re-ID (PRID) Dataset 2011
https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/PRID11/
MARS (Motion Analysis and Re-identification Set) Dataset
X-MARS Reordering of the MARS Dataset for Image to Video Evaluation
MSMT17
Labeled Pedestrian in the Wild
SenseReID
https://drive.google.com/file/d/0B56OfSrVI8hubVJLTzkwV2VaOWM/view
3DPeS
http://www.openvisor.org/3dpes.asp

Fashion

Large-scale Fashion (DeepFashion) Database
Apparel classification with Style

Attribute Datasets

Attribute Datasets

Pedestrian Attribute Recognition

A Richly Annotated Dataset for Pedestrian Attribute Recognition
Pedestrian Attribute Recognition At Far Distance
Market-1501_Attribute
DukeMTMC-attribute

Tracking

UA-DETRAC: A New Benchmark and Protocol for Multi-Object Detection and Tracking
DukeMTMC: Duke Multi-Target, Multi-Camera Tracking Project
  • intro: DukeMTMC aims to accelerate advances in multi-target multi-camera tracking. It provides a tracking system that works within and across cameras, a new large scale HD video data set recorded by 8 synchronized cameras with more than 7,000 single camera trajectories and over 2,000 unique identities
  • homepage: http://vision.cs.duke.edu/DukeMTMC/
The WILDTRACK Seven-Camera HD Dataset
https://cvlab.epfl.ch/data/wildtrack

Color Classification

Vehicle Color Recognition on an Urban Road by Feature Context
http://mclab.eic.hust.edu.cn/~pchen/project.html

License Plate Detection and Recognition

Application-Oriented License Plate (AOLP) Database
http://aolpr.ntust.edu.tw/lab/download.html
CCPD: Chinese City Parking Dataset

Tools

VoTT: Visual Object Tagging Tool 1.5
  • intro: Visual Object Tagging Tool: an Electron app for building end-to-end object detection models from images and videos
  • github: https://github.com/Microsoft/VoTT
LabelImg: a graphical image annotation tool for labeling object bounding boxes in images

Pychet Labeller
ml-pyxis: Tool for reading and writing datasets of tensors (numpy.ndarray) with MessagePack and Lightning Memory-Mapped Database (LMDB).
  • intro: Tool for reading and writing datasets of tensors in a Lightning Memory-Mapped Database (LMDB). Designed to manage machine learning datasets with fast reading speeds.
  • github: https://github.com/vicolab/ml-pyxis
Open Image Dataset downloader
BBox-Label-Tool
Data Labeler for Video
Computer Vision Annotation Tool (CVAT)

  • intro: Computer Vision Annotation Tool (CVAT) is a web-based tool which helps to annotate video and images for Computer Vision algorithms
  • github: https://github.com/opencv/cvat

Artist

BAM! The Behance Artistic Media Dataset

Resources

CV Datasets on the web
http://www.cvpapers.com/datasets.html
Awesome Public Datasets
Machine Learning Repository
https://archive.ics.uci.edu/ml/datasets.html
