Arramooz
Arabic Dictionary for Morphological analysis
Developers: Taha Zerrouki, http://tahadz.com, taha dot zerrouki at gmail dot com
Manual data collection: Mohamed Kebdani, Morocco, < med.kebdani gmail.com>
Features | value |
---|---|
Authors | Authors.md |
Release | 0.3 |
License | GPL |
Tracker | linuxscout/arramooz/Issues |
Website | http://arramooz.sourceforge.net |
Source | Github |
Download | sourceforge |
Feedback | Comments |
Accounts | @Twitter @Sourceforge |
Description
Arramooz Alwaseet is an open source Arabic dictionary for morphological analysis. It is intended to help developers working on natural language processing. The dictionary is generated from the raw data of Ayaspell (an Arabic spellchecker), which was collected manually.
This dictionary consists of three parts:
- stop words
- verbs
- nouns
If you cite it in academic work, please use this citation:
T. Zerrouki, Arramooz Alwaseet : Arabic Dictionary for Morphological analysis, http://arramooz.sourceforge.net/ https://github.com/linuxscout/arramooz
or in BibTeX format:
@misc{zerrouki2011arramooz,
title={Arramooz Alwaseet : Arabic Dictionary for Morphological analysis},
author={Zerrouki, Taha},
url={http://arramooz.sourceforge.net/},
year={2011}
}
API
The Python API is available as arramooz-pysqlite.
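The arramooz-pysqlite interface is not described here, so the following is only a minimal sketch that uses Python's standard sqlite3 module rather than that package. It lists the tables and columns of a SQLite database so the real schema can be inspected before writing queries. The in-memory database and the `verbs` table with its columns are hypothetical stand-ins for the released database file.

```python
import sqlite3


def describe_database(conn):
    """Return {table_name: [column names]} for every table in the database."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        quoted = table.replace('"', '""')
        columns = conn.execute('PRAGMA table_info("%s")' % quoted).fetchall()
        schema[table] = [column[1] for column in columns]
    return schema


# Hypothetical stand-in for the released SQLite file (assumed table and columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE verbs (id INTEGER PRIMARY KEY, vocalized TEXT, root TEXT)")
schema = describe_database(conn)
print(schema)
```

To inspect a real release, pass the path of the downloaded database file to `sqlite3.connect` instead of `":memory:"`.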
File formats
The dictionary files are available as:
- Text format (tab separated)
- SQL database
- XML files
- StarDict files
- Python + SQLite library
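As an illustration of the tab-separated text format, here is a minimal sketch that reads such a file into dictionaries with Python's csv module. It assumes that the first non-comment line is a header row and that comment lines start with `#`; the column names and the sample row are hypothetical, and the actual fields are those described in DataStructures.md.

```python
import csv
import io


def read_tsv(stream, comment_prefix="#"):
    """Yield one dict per data row from a tab-separated text stream.

    Assumption: the first non-comment, non-empty line is the header row.
    """
    lines = (
        line for line in stream
        if line.strip() and not line.startswith(comment_prefix)
    )
    # QUOTE_NONE keeps quote characters in the data as literal text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


# Hypothetical sample content; real column names may differ.
SAMPLE = "#sample file\nvocalized\tunvocalized\troot\nكَتَبَ\tكتب\tكتب\n"
rows = list(read_tsv(io.StringIO(SAMPLE)))
print(rows[0]["unvocalized"])
```

For a real file, open it with `open(path, encoding="utf-8", newline="")` and pass the file object to `read_tsv`.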
Build the dictionary in multiple formats
The source files are kept in the data folder as OpenDocument spreadsheet files. Build the dictionary with
make
which generates XML, SQL and text files and packages them in the releases folder.
To make Hunspell files only
make spell
To make StarDict files only
make stardict
NOTE: you must use stardict-editor to compile releases/stardict/arramooz.sdic in Babylon format.
To change the version, update the $VERSION variable in the Makefile.
To clean releases use:
make clean
To modify or update the data, open the files in data/ (LibreOffice Calc format), edit them, clean the releases, and run make again.
Stopwords
The stop words list is developed in an independent project (see http://arabicstopwords.sourceforge.ne).
Data Structure
The data structures of the available formats (csv, sql, xml) are described in DataStructures.md
- Nouns and verbs are described in datastructures.md
- Stop words are explained in the separate project Arabic Stopwords
Script Files:
1- Generate the abstract dictionary from the raw, manually built dictionary:
python2 $SCRIPT/verbs/gen_verb_dict.py -f $DATA_DIR/verbs/verb_dic_data-net.csv > $OUTPUT/verbs.aya.dic
2- Generate the dictionary in the chosen file format (xml, csv, sql) from verbs.aya.dic:
python2 $SCRIPT/verbs/gen_verb_dict_format.py -o xml -f $OUTPUT/verbs.aya.dic > $OUTPUT/verbs.xml
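The two steps above can be chained from Python. The following is a minimal sketch, not part of the project: the function names `verb_pipeline_commands` and `run_pipeline` are hypothetical, and the directory arguments stand in for the $SCRIPT, $DATA_DIR and $OUTPUT variables. Only the script names and the `-f` and `-o` options shown above are taken from the commands themselves.

```python
import os
import subprocess


def verb_pipeline_commands(script_dir, data_dir, output_dir, fmt="xml"):
    """Return [(argv, output_path), ...] for the two verb generation steps."""
    abstract_dic = os.path.join(output_dir, "verbs.aya.dic")
    # Step 1: raw manual data -> abstract dictionary.
    step1 = [
        "python2",
        os.path.join(script_dir, "verbs", "gen_verb_dict.py"),
        "-f", os.path.join(data_dir, "verbs", "verb_dic_data-net.csv"),
    ]
    # Step 2: abstract dictionary -> requested file format (xml, csv or sql).
    step2 = [
        "python2",
        os.path.join(script_dir, "verbs", "gen_verb_dict_format.py"),
        "-o", fmt,
        "-f", abstract_dic,
    ]
    return [
        (step1, abstract_dic),
        (step2, os.path.join(output_dir, "verbs." + fmt)),
    ]


def run_pipeline(steps):
    """Run each command, redirecting its standard output to the target file."""
    for argv, target in steps:
        with open(target, "w", encoding="utf-8") as out:
            subprocess.run(argv, stdout=out, check=True)


# Only build and print the commands here; call run_pipeline(steps) to execute them.
steps = verb_pipeline_commands("scripts", "data", "releases", fmt="xml")
for argv, target in steps:
    print(" ".join(argv), ">", target)
```

Keeping command construction separate from execution makes it possible to review the exact commands before anything is run.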
- [scripts/verbs]
1- verbdict_functions.py: functions for handling the verb dictionary, used in the generation process
2- verbs/gen_verb_dict.py: generates the abstract dictionary from the raw, manually built dictionary
3- verbs/gen_verb_dict_format.py: generates the dictionary in the chosen file format (xml, csv, sql) from verbs.aya.dic
- [scripts/nouns]
1- noundict_functions.py: functions for handling the noun dictionary, used in the generation process
2- nouns/gen_noun_dict.py: generates the dictionary in the chosen file format (xml, csv, sql)
- [requirements]
1- libqutrub
2- pyarabic
Data Files:
These files are used to create the Ayaspell dictionary for spellchecking.
arramooz\verbs\data
File | Description |
---|---|
verb_dic_data-net.csv | Raw data compiled manually by Mohamed Kebdani. |
ar_verb_normalized.dict | A list of Arabic verbs, from the Qutrub project. |
triverbtable.py | A list of triliteral verbs, used by Qutrub. |
verbs.aya.dic | The verb dictionary in abstract format. |