  • Stars: 128
  • Rank 281,044 (Top 6 %)
  • Language: Python
  • License: GNU General Publi...
  • Created about 8 years ago
  • Updated almost 2 years ago

Repository Details

Arabic Dictionary for Morphological analysis

Arramooz


Developers:

  • Taha Zerrouki: http://tahadz.com, taha dot zerrouki at gmail dot com
  • Manual data collection: Mohamed Kebdani, Morocco <med.kebdani gmail.com>

Feature | Value
Authors | Authors.md
Release | 0.3
License | GPL
Tracker | linuxscout/arramooz/Issues
Website | http://arramooz.sourceforge.net
Source | Github
Download | sourceforge
Feedback | Comments
Accounts | @Twitter, @Sourceforge

Description

Arramooz Alwaseet is an open source Arabic dictionary for morphological analysis. It can help natural language processing developers. The dictionary is generated from the raw data of Ayaspell (an Arabic spellchecker), which was collected manually.

This dictionary consists of three parts:

  • stop words
  • verbs
  • nouns

If you cite it in academic work, you can use this citation:

T. Zerrouki, Arramooz Alwaseet: Arabic Dictionary for Morphological analysis, http://arramooz.sourceforge.net/ https://github.com/linuxscout/arramooz

or in BibTeX format:

@misc{zerrouki2011arramooz,
  title={Arramooz Alwaseet : Arabic Dictionary for Morphological analysis},
  author={Zerrouki, Taha},
  url={http://arramooz.sourceforge.net/},
  year={2011}
}

API

The Python API is available as arramooz-pysqlite.

File formats

The dictionary files are available as:

  • Text format (tab separated)
  • SQL database
  • XML files
  • StarDict files
  • Python + SQLite library
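As an illustration of the tab-separated text format, here is a minimal sketch of reading such a file with Python's standard csv module. The column names (vocalized, unvocalized, root) and the two sample rows are assumptions made up for this example, not the actual Arramooz schema; the real fields are described in DataStructures.md.

```python
import csv
import io

# Hypothetical sample data: the header names and rows are assumptions for
# illustration only, not the real Arramooz columns.
SAMPLE_TSV = (
    "vocalized\tunvocalized\troot\n"
    "كَتَبَ\tكتب\tكتب\n"
    "دَرَسَ\tدرس\tدرس\n"
)


def read_entries(handle):
    """Yield one dict per row of a tab-separated file with a header line."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        yield row


# io.StringIO stands in for an opened dictionary file.
entries = list(read_entries(io.StringIO(SAMPLE_TSV)))
print(len(entries), entries[0]["unvocalized"])
```

With a real file, the same function could be called on a handle opened with encoding="utf-8" and newline="".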

Build the dictionary in multiple formats

The source files are stored in the data folder as OpenDocument spreadsheet files. The dictionary can then be built with

make

which will generate XML, SQL and text files, and package them in the releases folder.

To make Hunspell files only

make spell

To make StarDict files only

make stardict

NOTE: you must use stardict-editor to compile releases/stardict/arramooz.sdic in Babylon format.

To change the version, update the $VERSION variable in the Makefile.

To clean the releases, use:

make clean

To modify or update the data, open the files in data/ with LibreOffice Calc, clean the releases, and run make.
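The clean-then-build cycle described above could be scripted. The following is only a sketch: the helper names make_command and rebuild are hypothetical, and the only make targets it knows are the ones named in this section (the default target, spell, stardict and clean). It prints the commands by default instead of running them.

```python
import subprocess

# Targets named in the build instructions; "" stands for the default target.
KNOWN_TARGETS = ("", "spell", "stardict", "clean")


def make_command(target=""):
    """Return the argument list for `make` with an optional target."""
    if target not in KNOWN_TARGETS:
        raise ValueError(f"unknown make target: {target!r}")
    return ["make"] + ([target] if target else [])


def rebuild(dry_run=True):
    """Clean the releases, then run the default build.

    With dry_run=True the commands are only printed, not executed.
    """
    commands = [make_command("clean"), make_command()]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
    return commands


rebuild(dry_run=True)
```

Calling rebuild(dry_run=False) from the repository root would actually run the two commands and stop at the first failure.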

Stopwords

The stop words list is developed in an independent project (see http://arabicstopwords.sourceforge.ne)

Data Structure

The data structures for each format (CSV, SQL, XML) are described in DataStructures.md

  • Nouns and verbs are described in DataStructures.md
  • Stop words are explained in the separate Arabic Stopwords project
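To give an idea of how the SQL format might be queried, here is a small sketch using Python's built-in sqlite3 module. It does not use the arramooz-pysqlite API, and the table name (verbs) and columns (vocalized, unvocalized, root) are hypothetical; check DataStructures.md for the actual schema before adapting it.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only. The real table and
# column names are documented in DataStructures.md.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE verbs ("
    "id INTEGER PRIMARY KEY, vocalized TEXT, unvocalized TEXT, root TEXT)"
)
conn.executemany(
    "INSERT INTO verbs (vocalized, unvocalized, root) VALUES (?, ?, ?)",
    [("كَتَبَ", "كتب", "كتب"), ("دَرَسَ", "درس", "درس")],
)


def lookup(connection, word):
    """Return (vocalized, root) pairs whose unvocalized form equals `word`."""
    cursor = connection.execute(
        "SELECT vocalized, root FROM verbs WHERE unvocalized = ?", (word,)
    )
    return cursor.fetchall()


matches = lookup(conn, "كتب")
print(matches)
```

The query uses a parameter placeholder rather than string formatting, so arbitrary input words can be looked up safely.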

Script Files:

1- Generate the abstract dictionary from the raw manual dictionary:

python2 $SCRIPT/verbs/gen_verb_dict.py -f $DATA_DIR/verbs/verb_dic_data-net.csv > $OUTPUT/verbs.aya.dic

2- Generate the file format (xml, csv, sql) of the dictionary from verbs.aya.dic:

python2 $SCRIPT/verbs/gen_verb_dict_format.py -o xml -f $OUTPUT/verbs.aya.dic > $OUTPUT/verbs.xml

  • [scripts/verbs]

    1- verbdict_functions.py : functions to handle verbs dict used in the generation process

    2- verbs/gen_verb_dict.py: generate the abstract dictionary from the raw manual dictionary

    3- verbs/gen_verb_dict_format.py: generate the file format (xml, csv, sql) of dictionary from verbs.aya.dic

  • [scripts/nouns]

    1- noundict_functions.py : functions to handle nouns dict used in the generation process

    2- nouns/gen_noun_dict.py: generate the file format (xml, csv, sql) of dictionary

  • [requirement]

    1- libqutrub

    2- pyarabic
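The two verb generation steps listed under Script Files can be chained from Python. This is a sketch under assumptions: the function names verb_pipeline and run_pipeline are hypothetical, the directory names "scripts", "data" and "output" are placeholders for $SCRIPT, $DATA_DIR and $OUTPUT, and naming the second output file after the chosen format (verbs.xml, verbs.sql, ...) generalizes from the single xml example shown. The script paths and the -f and -o options are the ones that appear in the commands above.

```python
import subprocess
from pathlib import Path


def verb_pipeline(script_dir, data_dir, output_dir, fmt="xml"):
    """Return (command, output_path) pairs for the two generation steps."""
    script_dir = Path(script_dir)
    data_dir = Path(data_dir)
    output_dir = Path(output_dir)
    abstract = output_dir / "verbs.aya.dic"
    step1 = [
        "python2", str(script_dir / "verbs" / "gen_verb_dict.py"),
        "-f", str(data_dir / "verbs" / "verb_dic_data-net.csv"),
    ]
    step2 = [
        "python2", str(script_dir / "verbs" / "gen_verb_dict_format.py"),
        "-o", fmt, "-f", str(abstract),
    ]
    # Assumption: the formatted output is named after the chosen format.
    return [(step1, abstract), (step2, output_dir / f"verbs.{fmt}")]


def run_pipeline(steps):
    """Run each command, writing its stdout to the output file (like `>`)."""
    for command, output_path in steps:
        with open(output_path, "w", encoding="utf-8") as out:
            subprocess.run(command, stdout=out, check=True)


# Placeholder directories; only print the plan here instead of running it.
steps = verb_pipeline("scripts", "data", "output")
for command, output_path in steps:
    print(" ".join(command), ">", output_path)
```

Calling run_pipeline(steps) would execute both steps in order and stop at the first failing command. It requires python2, libqutrub and pyarabic to be installed, as listed in the requirements.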

Data Files:

These files, located in arramooz\verbs\data, are used to create the Ayaspell dictionary for spellchecking.

File | Description
verb_dic_data-net.csv | Raw data compiled manually by Mohamed Kebdani.
ar_verb_normalized.dict | A list of Arabic verbs, from the Qutrub project.
triverbtable.py | A list of trilateral verbs, used by Qutrub.
verbs.aya.dic | The verb dictionary in abstract format.

More Repositories

1- pyarabic: pyarabic (Python, 388 stars)
2- mishkal: Mishkal is an arabic text vocalization software (Python, 241 stars)
3- tashaphyne: Tashaphyne: Arabic Light Stemmer (Python, 86 stars)
4- qutrub: Qutrub: Arabic verb conjugator (Python, 70 stars)
5- arabicnlptoolslist: Arabic NLP tools List inventory (67 stars)
6- festival-tts-arabic-voices: Arabic voices for Festival TTS (Scheme, 65 stars)
7- adawat: Adawat: Arabic Text tools (Python, 48 stars)
8- yaraspell: YaraSpell is an simplified arabic spell checker (Python, 43 stars)
9- ayaspell: AyaSpell Arabic Dictionary for Hunspell Spellchecker (Shell, 33 stars)
10- qalsadi: Qalsadi: Arabic mophological analyzer Library for python. (Python, 32 stars)
11- alyahmor: Arabic flexionnal morphology generator (Python, 30 stars)
12- fareh: Fareh: Arabic rules database for grammar and style checking فارح: لغتنا الجميلة (Python, 29 stars)
13- arabicstopwords: Arabic Stop Word List (Python, 27 stars)
14- Arrand-arabic-random-text (Python, 25 stars)
15- ghalatawi: Ghalatawi: Arabic Autocorrect library (Python, 20 stars)
16- shellshal: Shell Scripts for Arabic Language (Shell, 17 stars)
17- mysam-tagmanager: Mysam: Arabic tags manager, ميسم: إدارة الوسوم العربية (Python, 16 stars)
18- tashkeela2: Arabic vocalized text corpus (14 stars)
19- mishtar: Mishtar: Named and temporal entities chunker (Python, 13 stars)
20- naftawayh: Naftawayh: arabic word tagger (Python, 12 stars)
21- miknaaz: Generate arabic golden standard corpus for morphology and stemming (Python, 12 stars)
22- yaziji: Yaziji : Arabic phrase generator (JavaScript, 12 stars)
23- awk-arabic: Arabic Texts task by AWK (11 stars)
24- festival-arabic: Arabic Support for Festival speech synthesis system (Scheme, 11 stars)
25- thaalab-aranasyn: Thaalab: Arabic Syntaxical Analyzer (Python, 10 stars)
26- arramooz-pysqlite: Arabic Dictionary for Morphological analysis - python + sqlite (Python, 9 stars)
27- sarf: Sarf - Arabic Morphology System (Java, 9 stars)
28- saygh: Arabic morphological generator (Python, 8 stars)
29- maskouk-pysqlite: Arabic collocations library and data for Python (Python, 8 stars)
30- arabic-roots: Arabic roots list resource (Python, 7 stars)
31- arabic-stemmers-tester: Arabic test for stemmers (Python, 7 stars)
32- aghlat: Aghlat: Arabic misspelling corpus (Python, 7 stars)
33- malsoune: Malsoune: deaf students assistant (Java, 7 stars)
34- adawat-latex: Text tools to handle conversion into Latex with arabic support (Python, 6 stars)
35- qutrubi: Qutrubi : Arabic verb conjugation Mobile Application (Java, 4 stars)
36- asmai-arabic-semantic: Asmai: Al'Asma'i arabic semantic analyzer (TSQL, 4 stars)
37- quran_word_index: Index of Quran words in arabic (Python, 4 stars)
38- AraCorpus: Arabic Corpus (3 stars)
39- linuxscout: My CV (3 stars)
40- arabic-affixes: Arabic Affixes (prefixes and suffixes) resource (3 stars)
41- sylajone-arabic-syntax: Sylajone: Arabic syntax Analyzer library (Python, 3 stars)
42- examanager: Exam and control Management (3 stars)
43- strm-tests: Create Random tests for Stucture Machine 1- first Year MI, Mathematiques & Informatiques in Algerian universities. (Python, 3 stars)
44- openCTT: Open course time tabler (C#, 2 stars)
45- salamscout: A game for scout to learn some skils (Python, 1 star)
46- mintiq-raspberry: Porting Arabic Speech synthesis on Raspberry (Makefile, 1 star)
47- nibras-app: Nibras: a technical terms dictionary for Students (Java, 1 star)
48- hunspell: Hunspell spellchecker (C++, 1 star)
49- hcla_lexique: Dictionaries of the High Council of the Arabic Language (Python, 1 star)
50- i3rab-quiz-data (Python, 1 star)