  • Stars: 177
  • Rank: 214,805 (Top 5%)
  • Language: OpenEdge ABL
  • License: MIT License
  • Created: over 6 years ago
  • Updated: almost 2 years ago

Repository Details

Enriching MS-COCO with Chinese sentences and tags for cross-lingual multimedia tasks

COCO-CN

COCO-CN is a bilingual image description dataset enriching MS-COCO with manually written Chinese sentences and tags. The new dataset can be used for multiple tasks including image tagging, captioning and retrieval, all in a cross-lingual setting.

Chinese sentences              COCO-CN train   COCO-CN val   COCO-CN test
human written                  ✅              ✅            ✅
human translation              ❌              ❌            ✅
machine translation (baidu)    ✅              ✅            ✅
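
For scripts that need to choose a sentence source per split, the availability table above can be encoded as a small lookup. This is only an illustrative sketch: the names `AVAILABILITY` and `sources_for` are hypothetical and are not part of the COCO-CN distribution, and nothing is assumed here about how the released files are actually laid out.

```python
# Hypothetical encoding of the availability table above.
# These names are illustrative only; COCO-CN does not ship this code.
AVAILABILITY = {
    "human written": {"train": True, "val": True, "test": True},
    "human translation": {"train": False, "val": False, "test": True},
    "machine translation (baidu)": {"train": True, "val": True, "test": True},
}

SPLITS = ("train", "val", "test")


def sources_for(split):
    """Return the sentence sources available for a split, in table order."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return [name for name, by_split in AVAILABILITY.items() if by_split[split]]
```

With this lookup, `sources_for("train")` lists the human-written and machine-translated sentences, while only `sources_for("test")` also includes human translations.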

[Figure: COCO-CN annotation examples]

Progress

  • version 201805: 20,341 images (training / validation / test: 18,341 / 1,000 / 1,000), associated with 22,218 manually written Chinese sentences and 5,000 manually translated sentences. Data is freely available upon request. Please submit your request via Google Form.
  • Precomputed image features: ResNext-101
  • COCO-CN-Results-Viewer: A lightweight tool for inspecting the results of different image captioning systems on the COCO-CN test set, developed by Emiel van Miltenburg at Tilburg University.
  • NUS-WIDE100: An extra test set.

Citation

If you find COCO-CN useful, please consider citing the following paper: