
Large Arabic Resources For Sentiment Analysis

Large Multi-Domain Resources for Arabic Sentiment Analysis

Winner of the 3rd Best Paper award at the International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2015)

Download the paper from here

Overview:

The repository includes the following:

  • 33K automatically annotated reviews in the domains of movies, hotels, restaurants and products
  • Domain-specific lexicons, semi-automatically generated from the datasets above (about 2K entries in total)
  • A total of 615 experiments over the datasets, covering:
    • Classifiers: linear SVM, logistic regression, KNN, Bernoulli Naive Bayes (BNB), and SGD-trained SVM (hinge loss, L1 penalty)
    • Standard features: TF-IDF, term count, term existence, Delta TF-IDF
    • Lexicon-based features: domain-specific and domain-general
    • Combined features: lexicon-based feature vectors + standard features
    • Classification problems: with or without the neutral class
    • Balanced or unbalanced datasets
  • Results of each of the experiments
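The four standard feature types listed above can be illustrated with a minimal, dependency-free sketch. It assumes plain whitespace tokenization (no Arabic normalization or stemming), natural-log IDF without smoothing, and add-one smoothing inside Delta TF-IDF; these are illustrative choices, not necessarily those used in the repository's experiments. Delta TF-IDF follows the usual definition (Martineau and Finin, 2009), weighting each term count by log2((|P| · N_t) / (|N| · P_t)), where P_t and N_t are the numbers of positive and negative training reviews containing term t.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization only; real Arabic preprocessing
    # (normalization, stemming) is not shown.
    return text.split()


def term_count(doc):
    """Term count: raw frequency of each term in the review."""
    return Counter(tokenize(doc))


def term_existence(doc):
    """Term existence: 1 for every term that occurs in the review."""
    return {t: 1 for t in set(tokenize(doc))}


def document_frequency(docs):
    """Number of reviews in which each term appears at least once."""
    df = Counter()
    for d in docs:
        df.update(set(tokenize(d)))
    return df


def tfidf(doc, corpus):
    """Term count weighted by ln(N / df); terms absent from the corpus are skipped."""
    df = document_frequency(corpus)  # recomputed per call: fine for a sketch
    n = len(corpus)
    return {t: c * math.log(n / df[t]) for t, c in term_count(doc).items() if df[t]}


def delta_tfidf(doc, pos_docs, neg_docs):
    """Delta TF-IDF: C(t,d) * log2((|P| * N_t) / (|N| * P_t)), add-one smoothed."""
    p_df, n_df = document_frequency(pos_docs), document_frequency(neg_docs)
    p, n = len(pos_docs), len(neg_docs)
    return {
        t: c * math.log2((p * (n_df[t] + 1)) / (n * (p_df[t] + 1)))
        for t, c in term_count(doc).items()
    }


# Usage on a toy, balanced training set (English tokens for readability).
pos = ["good film", "great good acting"]
neg = ["bad film", "boring bad plot"]
weights = delta_tfidf("good bad film", pos, neg)
# "film" occurs equally often in both classes, so its weight is 0.0;
# "good" gets a negative weight and "bad" a positive one.
```

With this sign convention, terms concentrated in negative reviews receive positive weights; a linear classifier is indifferent to the sign, so only the magnitude of the class imbalance per term matters.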

Dataset Statistics

Datasets:

ATT.csv

  • Dataset of attraction reviews scraped from TripAdvisor.com
  • 2154 reviews

HTL.csv

  • Dataset of hotel reviews scraped from TripAdvisor.com
  • 15572 reviews

MOV.csv

  • Dataset of movie reviews scraped from elcinema.com
  • 1524 reviews

PROD.csv

  • Dataset of product reviews scraped from souq.com
  • 4272 reviews

RES1.csv

  • Dataset of restaurant reviews scraped from qaym.com
  • 8364 reviews

RES2.csv

  • Dataset of restaurant reviews scraped from TripAdvisor.com
  • 2642 reviews

RES.csv

  • RES1.csv and RES2.csv combined
  • 10970 reviews
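The overview mentions experiments with and without the neutral class and on balanced or unbalanced data. The sketch below shows one way to prepare such variants from one of the CSV files above using only the standard library. The column names (`text`, `polarity`) and the label values (`"1"`, `"0"`, `"-1"`, with `"0"` as neutral) are hypothetical, since the file layout is not described here; check the actual headers before use. Balancing is done by random undersampling, which is one common choice and not necessarily the repository's method.

```python
import csv
import io
import random
from collections import defaultdict


def load_reviews(f, text_col="text", label_col="polarity"):
    """Read (text, label) pairs from a CSV file object with a header row.

    Column names are hypothetical defaults; pass the real ones if they differ.
    """
    return [(row[text_col], row[label_col]) for row in csv.DictReader(f)]


def drop_neutral(reviews, neutral_label="0"):
    """Two-class setting: remove reviews with the (assumed) neutral label."""
    return [r for r in reviews if r[1] != neutral_label]


def balance(reviews, seed=0):
    """Randomly undersample every class down to the size of the smallest one."""
    by_label = defaultdict(list)
    for review in reviews:
        by_label[review[1]].append(review)
    k = min(len(items) for items in by_label.values())
    rng = random.Random(seed)  # fixed seed for reproducible subsets
    balanced = []
    for items in by_label.values():
        balanced.extend(rng.sample(items, k))
    rng.shuffle(balanced)
    return balanced


# Usage with an in-memory stand-in for a file such as ATT.csv.
sample = io.StringIO("text,polarity\nجميل,1\nسيء,-1\nعادي,0\nرائع,1\n")
reviews = load_reviews(sample)
two_class = balance(drop_neutral(reviews))
```

For a real file, open it with `open(path, encoding="utf-8", newline="")` and pass the file object to `load_reviews`.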

Lexicons

Domain-specific lexicons, semi-automatically generated from the datasets above (about 2K entries in total)

Lexicon           MOV   RES   PROD   HTL   BOOK   Total
Size (entries)     87   734    369   218    874    1913
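The table gives only lexicon sizes; the exact encoding of the lexicon-based feature vectors is not described here. One plausible encoding, assumed for illustration, counts positive and negative lexicon hits per review, once with a domain-specific lexicon and once with a domain-general one, and appends those counts to the standard term-count features (the combined-features setting from the overview). The lexicons, the `dom`/`gen` prefixes and the ±1 polarity encoding below are hypothetical.

```python
from collections import Counter


def lexicon_features(tokens, lexicon, prefix):
    """Count positive and negative lexicon hits in a tokenized review.

    `lexicon` maps a term to +1 (positive) or -1 (negative); this encoding
    is an assumption for illustration.
    """
    pos = sum(1 for t in tokens if lexicon.get(t) == 1)
    neg = sum(1 for t in tokens if lexicon.get(t) == -1)
    return {f"{prefix}_pos": pos, f"{prefix}_neg": neg}


def combine(*feature_dicts):
    """Merge sparse feature dictionaries into one vector (later keys win)."""
    merged = {}
    for features in feature_dicts:
        merged.update(features)
    return merged


# Hypothetical domain-specific (movies) and domain-general lexicons.
movie_lexicon = {"رائع": 1, "ممل": -1}     # "wonderful", "boring"
general_lexicon = {"جميل": 1, "سيء": -1}   # "beautiful", "bad"

tokens = "فيلم رائع لكن ممل جدا".split()  # "a wonderful film but very boring"
features = combine(
    Counter(tokens),                                   # standard term counts
    lexicon_features(tokens, movie_lexicon, "dom"),    # domain-specific hits
    lexicon_features(tokens, general_lexicon, "gen"),  # domain-general hits
)
```

Keeping the lexicon counts as separately named features lets a linear classifier learn how much to trust each lexicon relative to the raw term features.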
