  • Stars: 155
  • Rank: 240,864 (Top 5 %)
  • Language: Java
  • License: Apache License 2.0
  • Created over 10 years ago
  • Updated almost 5 years ago

Repository Details

This contains an evolving dataset of fake and real images shared in social media.

image-verification-corpus

This repository contains an evolving dataset of fake and real posts with images shared on social media (Twitter only, for now). The dataset is intended as an open corpus for assessing online image verification approaches (based on tweet text and user features) and for building classifiers for new content (currently tweets containing images).

The dataset consists of four files:

  • set_images.txt: Contains the fake and real images that have been verified by online sources. These images were used to find the tweets that make up the dataset. Each entry has an image_id field used as a reference for the image, an image_url field giving the online URL of the image, an annotation declaring the veracity of the image, and the event the image comes from.
  • tweets_images.txt: Contains the tweets used to form the dataset and the images associated with them. Each entry has a tweet_id field giving the id of the tweet, an image_id giving the reference id of the associated image, an annotation declaring the veracity of the tweet, and the event the tweet comes from.
  • tweets_images_update.txt: Contains only the purely fake tweets from the previous file. Tweets with humorous content and tweets that themselves declare their content to be fake have been removed.
  • tweets_event.txt: Contains the tweets with fake content that we used but that are no longer available online, either because the user deleted them or because the user account was suspended.

To use the corpus, combine set_images.txt (the verified images) with one of the tweet files described above.
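As a rough illustration of that pairing, the sketch below joins tweet rows to image rows on the shared image_id field. The field names come from the file descriptions above; everything else is an assumption: the tab delimiter, the presence of a header line, the helper names read_rows and join_tweets_to_images, and the sample rows, which are made up and stand in for the real files.

```python
import csv
import io


def read_rows(handle, delimiter="\t"):
    """Read delimited rows with a header line into a list of dicts.

    The tab delimiter and the header line are assumptions; check the
    actual files and adjust if they differ.
    """
    return list(csv.DictReader(handle, delimiter=delimiter))


def join_tweets_to_images(image_rows, tweet_rows):
    """Attach each tweet to its verified image via the shared image_id."""
    images_by_id = {row["image_id"]: row for row in image_rows}
    joined = []
    for tweet in tweet_rows:
        image = images_by_id.get(tweet["image_id"])
        if image is None:
            continue  # tweet refers to an image not listed in the image file
        joined.append({
            "tweet_id": tweet["tweet_id"],
            "image_id": tweet["image_id"],
            "image_url": image["image_url"],
            "annotation": tweet["annotation"],
            "event": tweet["event"],
        })
    return joined


# Hypothetical sample content standing in for set_images.txt.
SAMPLE_IMAGES = (
    "image_id\timage_url\tannotation\tevent\n"
    "img_1\thttp://example.org/a.jpg\tfake\tevent_a\n"
    "img_2\thttp://example.org/b.jpg\treal\tevent_a\n"
)
# Hypothetical sample content standing in for tweets_images.txt.
SAMPLE_TWEETS = (
    "tweet_id\timage_id\tannotation\tevent\n"
    "1001\timg_1\tfake\tevent_a\n"
    "1002\timg_2\treal\tevent_a\n"
    "1003\timg_9\tfake\tevent_b\n"
)

image_rows = read_rows(io.StringIO(SAMPLE_IMAGES))
tweet_rows = read_rows(io.StringIO(SAMPLE_TWEETS))
joined = join_tweets_to_images(image_rows, tweet_rows)
```

With the real files, the same helpers could be fed file handles instead, for example open("set_images.txt", newline="", encoding="utf-8"), after confirming the delimiter and header layout.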

The computational-verification project implements a framework that uses the corpus. If you use this dataset and/or the linked framework in your research, please include the following reference in your work:

C. Boididou, S. Papadopoulos, Y. Kompatsiaris, S. Schifferes, N. Newman. Challenges of Computational Verification in Social Media. In Proceedings of SNOW II: Social News on the Web workshop, WWW'14 Companion.

The mediaeval2015 folder contains the version of the dataset provided for the Verifying Multimedia Use task of the MediaEval Workshop 2015. The devset and testset folders hold the tweet data shared for training and testing, respectively. For the tweets in each set, the task organizers also shared features based on tweet and user characteristics, as well as forensic features for the images associated with the tweets.

If you use this dataset for your research, please include a citation to the following paper: Boididou, C., Papadopoulos, S., Zampoglou, M., Apostolidis, L., Papadopoulou, O., & Kompatsiaris, Y. (2018). Detection and visualization of misleading content on Twitter. International Journal of Multimedia Information Retrieval, 7(1), 71-86.

@article{boididou2018detection,
  author = {Boididou, Christina and Papadopoulos, Symeon and Zampoglou, Markos and Apostolidis, Lazaros and Papadopoulou, Olga and Kompatsiaris, Yiannis},
  title = {Detection and visualization of misleading content on Twitter},
  journal = {International Journal of Multimedia Information Retrieval},
  volume={7},
  number={1},
  pages={71--86},
  year={2018},
  doi = {10.1007/s13735-017-0143-x},
  publisher={Springer}
}

More Repositories

  1. CUDA (HTML, 213 stars): GPU-accelerated LIBSVM is a modification of the original LIBSVM that exploits the CUDA framework to significantly reduce processing time while producing identical results. The functionality and interface of LIBSVM remain the same. The modifications were made in the kernel computation, which is now performed on the GPU.
  2. visil (Python, 203 stars): Authors' official PyTorch implementation of "ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning" [ICCV 2019]
  3. ndvr-dml (Python, 118 stars): Authors' official TensorFlow implementation of "Near-Duplicate Video Retrieval with Deep Metric Learning" [ICCVW 2017]
  4. FIVR-200K (Python, 78 stars): FIVR-200K dataset from "FIVR: Fine-grained Incident Video Retrieval" [TMM 2019]
  5. intermediate-cnn-features (Python, 63 stars): Feature extraction from videos based on intermediate layers of a Convolutional Neural Network.
  6. multimedia-indexing (Java, 59 stars): A framework for large-scale feature extraction, indexing and retrieval.
  7. greek-sentiment-lexicon (34 stars): A lexicon to be used for sentiment analysis in Greek.
  8. news-popularity-prediction (Python, 33 stars): A set of methods that predict the future values of popularity indices for news posts using a variety of features.
  9. pygrank (Python, 29 stars): Recommendation algorithms for large graphs
  10. reveal-graph-embedding (Python, 28 stars): Implementation of community-based graph embedding for user classification.
  11. fake-video-corpus (25 stars): A dataset of debunked and verified user-generated videos.
  12. ImproveMyCity-Mobile (Java, 21 stars): The Android mobile version of the web-based ImproveMyCity application
  13. MyoWebToolkit (JavaScript, 18 stars): Web tools to do research with Myo
  14. JGNN (Java, 16 stars): A Fast Graph Neural Network Library written in Native Java
  15. mmdemo-dockerized (JavaScript, 16 stars): A set of Docker-based services for monitoring multiple social media platforms.
  16. reveal-user-classification (Python, 11 stars): Performs user classification into labels using a set of seed Twitter users with known labels and the structure of the interaction network between them.
  17. topic-detection (R, 9 stars): Provides the implementation of a topic detection framework developed for the MULTISENSOR project.
  18. easIE (Java, 8 stars): easy Information Extraction: a framework for quickly and simply generating Web Information Extractors and Wrappers.
  19. simmo (Java, 8 stars): Socially interconnected/interlinked and multimedia-enriched objects: A model for representing multimedia content in the context of the Web and Social Media.
  20. prophet (Python, 6 stars): PROPheT (PERICLES Ontology Population Tool)
  21. decentralized-gnn (Python, 6 stars): A library for implementing Decentralized Graph Neural Network algorithms.
  22. reveal-user-annotation (Python, 5 stars): Utility methods for generating labels for Twitter users and handling their storage and retrieval.
  23. verge (PHP, 5 stars): VERGE is a hybrid interactive video retrieval system that searches video content by integrating different search modules based on visual and textual techniques.
  24. category-based-classification (Python, 5 stars): Contains the implementation of a category-based classification framework developed for the MULTISENSOR project.
  25. contextual-video-verification (Java, 4 stars): Provides support to end users for verifying web videos using metadata and contextual signals.
  26. DanceAnno (Python, 4 stars): Dance annotation tool for data obtained with the Kinect sensor
  27. hackair-data-retrieval (Java, 4 stars): Contains components for air quality data collection, image collection from Flickr and web cams, and image analysis for sky detection and localization.
  28. mgraph-summarization (Java, 4 stars): Implementation of MGraph framework for generating summaries from large collections of social media posts (e.g. tweets).
  29. adaptive-fairness (MATLAB, 3 stars): Implementation of an algorithmic framework for achieving optimal fairness-accuracy trade-offs.
  30. twitter-aq (Python, 3 stars): Dataset and code to reproduce results of Twitter-based Air Quality estimation.
  31. image-privacy (Java, 3 stars): Implements a personalized machine learning approach for image privacy classification.
  32. hugomklab (HTML, 3 stars): Lab's static website based on Hugo
  33. gnn-tf (Python, 2 stars): A TensorFlow framework for the definition and training of Graph Neural Network architectures on interoperable predictive tasks.
  34. usemp-pscore (Java, 2 stars): Implementation of the USEMP Privacy Scoring framework.
  35. hackair-decision-support-api (Java, 2 stars): Contains the hackAIR ontology and reasoning implementation.
  36. company-data-integration (Java, 1 star): Implements techniques for matching between company-related data across different sources.
  37. simmo-stream-manager (Java, 1 star): Stream manager adaptation for use with SIMMO.
  38. yamlres (Python, 1 star): Retrieving algorithm component combinations from online (or local) yaml resources.
  39. pericode (MATLAB, 1 star): PeriCoDe project
  40. patent_ontologies (1 star): PATExpert Semantic Representation Framework
  41. reveal-community-ranking (JavaScript, 1 star): Reveal Community Ranking
  42. multisensor-concept-event-detection (Python, 1 star)
  43. pygrank-f (Python, 1 star): A forward-oriented programming variation of pygrank