format-corpus
An openly-licensed corpus of small example files, covering a wide range of formats and creation tools.
All items, apart from the source code under 'tools', are CC0 licensed unless otherwise stated. The source code is Apache 2.0 licensed unless otherwise stated.
A recent summary of the contents of the repository can be found here.
How to Contribute
See http://wiki.curatecamp.org/index.php/Collecting_format_ID_test_files for more information.
See metadata-template.ext.md for a simple per-file metadata template.
Pooled Signatures
As well as pooling example files, we also pool format signatures:
- Tika signatures staged here: https://github.com/openplanets/format-corpus/tree/master/tools/fidget/src/main/resources/tika-bl-staging
- Tika signatures later merged here: https://github.com/openplanets/format-corpus/blob/master/tools/fidget/src/main/resources/org/apache/tika/mime/custom-mimetypes.xml
- DROID signatures go here: https://github.com/openplanets/format-corpus/tree/master/tools/fidget/src/main/resources/droid
More details here: http://wiki.curatecamp.org/index.php/Improving_format_ID_coverage
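For readers new to format signatures, the sketch below illustrates the general idea behind them: matching known "magic" byte sequences at fixed offsets in a file. It is not how Tika or DROID are implemented, and it does not read the signature files in this repository. The `SIGNATURES` table and the `identify` and `identify_file` helpers are hypothetical names introduced only for illustration.

```python
# Hypothetical signature table: (format name, byte offset, magic bytes).
# The magic numbers for PDF, PNG and GIF are well known; the table layout
# itself is an assumption made for this example.
SIGNATURES = [
    ("application/pdf", 0, b"%PDF-"),
    ("image/png", 0, b"\x89PNG\r\n\x1a\n"),
    ("image/gif", 0, b"GIF8"),
]


def identify(data: bytes, default: str = "application/octet-stream") -> str:
    """Return the first format whose magic bytes match at its offset."""
    for name, offset, magic in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return name
    return default


def identify_file(path: str, max_bytes: int = 64) -> str:
    """Identify a file by reading only its first few bytes."""
    with open(path, "rb") as handle:
        return identify(handle.read(max_bytes))
```

Real signature sets are considerably richer (variable offsets, masks, container-aware rules), which is one reason pooling them alongside test files is useful.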