Papers with Data

Data reigns supreme 🥇

Every day it becomes more evident that data is the limiting factor for state-of-the-art 📈 machine learning. Your model architecture may be revolutionary, but without high-quality data 📊 to train on, it will be doomed to mediocrity.

Pair your ideas with execution and use top-notch data in your next project!

WACV 2024

Title	Tags	Paper	Dataset	Code
dacl10k: Benchmark for Semantic Bridge Damage Segmentation	`image`, `semantic segmentation`, `classification`, `construction`, `defect`

ICCV 2023

Title	Tags	Paper	Dataset	Code
Satlas: A Large-Scale, Multi-Task Dataset for Remote Sensing Image Understanding	`image`, `SAR`, `satellite`, `detection`, `climate`
Building3D: An Urban-Scale Dataset and Benchmarks for Learning Roof Structures from Point Clouds	`3D`, `point cloud`
EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding	`image`, `object`, `ego`
Equivariant Similarity for Vision-Language Foundation Models	`image`, `similarity`, `caption`
MOSE: A New Dataset for Video Object Segmentation in Complex Scenes	`video`, `segmentation`, `tracking`
SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes	`multi-object tracking`, `sports`

CVPR 2023

We've combed through the 2359 papers accepted to CVPR in 2023 and compiled a shortlist of papers introducing exciting new datasets.

Title	Tags	Paper	Dataset	Code
MVImgNet: A Large-scale Dataset of Multi-view Images	`multi-view`, `image`
GeoNet: Benchmarking Unsupervised Adaptation across Geographies	`geolocation`, `image`
Joint HDR Denoising and Fusion: A Real-World Mobile HDR Image Dataset	`denoising`, `image`
Spring: A High-Resolution High-Detail Dataset and Benchmark for Scene Flow, Optical Flow and Stereo	`optical flow`, `stereo`, `image`
ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing	`image`, `editing`
ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data	`RGB-D`, `segmentation`, `video`
Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-identification	`low-light`, `cross-modal`, `IR`
JRDB-Pose: A Large-scale Dataset for Multi-Person Pose Estimation and Tracking	`pose estimation`, `image`, `keypoint`, `tracking`
A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation	`synthetic`, `domain adaptation`, `supervised`

Papers from 2023

Title	Tags	Paper	Dataset	Code
DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data	`perceptual similarity`, `image`, `synthetic`, `diffusion`, `JND`, `2AFC`

Papers from 2022

Title	Tags	Paper	Dataset	Code
Calving fronts and where to find them: a benchmark dataset and methodology for automatic glacier calving front extraction from synthetic aperture radar imagery	`glacier`, `climate`, `SAR`, `satellite`, `image`, `semantic segmentation`
The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting	`conservation`, `detection`, `SONAR`, `video`, `tracking`, `counting`

Classics

Title	Tags	Paper	Dataset	Code
ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases	`x-ray`, `image`, `healthcare`, `detection`

Contributing 👋

We would love your help in making this repository even better! If we missed a paper that introduced a new dataset, or if you can think of any ways to improve the repository, feel free to open an issue or a pull request.

Note

This repository is inspired by paperswithcode, and the template was adapted from top-cvpr-2023-papers.

voxel51/papers-with-data
