

Mishkal is Arabic text vocalization software.

Mishkal

Mishkal: Arabic text vocalization software (Arabic name: مشكال)

Developers: Taha Zerrouki, http://tahadz.com, taha dot zerrouki at gmail dot com

Feature       Value
Authors       Authors.md
Release       1.10 Bouira
License       GPL
Tracker       linuxscout/mishkal/Issues
Mailing list  [email protected]
Website       tahadz.com/mishkal
Source        GitHub
Download      SourceForge
Feedback      Comments
Accounts      @Facebook @Twitter @Sourceforge

Citation

If you cite this software, please use the following citation:

@thesis{zerrouki2020adawat,
author = {Taha Zerrouki},
title = {Towards An Open Platform For Arabic Language Processing},
type = {PhD thesis},
institution = {Ecole Nationale Supérieure d'informatique, Alger, Algérie},
date = {2020},
}

Install

You can install Mishkal as a Python library or as standalone software.

Python lib

pip install mishkal

Install from GitHub

  1. Clone the Mishkal project from GitHub:
git clone https://github.com/linuxscout/mishkal.git
  2. Install the necessary packages:
pip install -r mishkal/requirements.txt

Requirements

- pyarabic  : basic Arabic library
- sylajone  : aranasyn syntactic analyzer
- arramooz  : Arabic morphological dictionary
- asmai     : semantic analyzer
- CodernityDB : pure-Python, fast NoSQL database, used as a cache to reduce the load on the morphological analyzer
- collocations : collocation library (deprecated)
- libqutrub : verb conjugation library used by the morphological analyzer
- maskouk   : collocation library
- naftawayh : word tagging library
- qalsadi   : morphological analyzer
- tashaphyne : light stemmer used by the morphological analyzer

Usage

Mishkal provides:

  • Console command line
  • Python library
  • GUI interface
  • Web interface
  • API interface

GUI:

  • Windows: MishkalGui.exe

  • Linux:

    python interfaces/gui/mishkal-gui.py
    

Web server (linux, windows)

python3 interfaces/web/mishkal-webserver

Console (linux/windows)

$ python3 bin/mishkal-console.py -f filename

Usage: bin/mishkal-console.py  -f filename [OPTIONS]
           bin/mishkal-console.py  'السلام عليكم' [OPTIONS]

        [-f | --file = filename]       input file 
        [-o | --outfile = filename]    output file to write vocalized text to, '$FILENAME (Tashkeel).txt' by default
        
        [-h | --help]             outputs this usage message
        [-v | --version]        program version
        [-p | --progress]      display progress status
        [-a | --verbose]       enable verbosity

        * Tashkeel Actions
        -------------------
        [-r | --reduced]        Reduced Tashkeel.
        [-s | --strip]             Strip tashkeel (remove harakat).
        [-c | --compare]      compare the vocalized text with the program output

        * Tashkeel Options
        ------------------
        [-l | --limit]             vocalize only a limited number of lines
        [-x | --syntax]         disable syntactic analysis
        [-m | --semantic]    disable semantic analysis
        [-g | --train]             enable training option
        [-i | --ignore]           ignore the last Mark on output words.
        [-t | --stat]               disable statistic tashkeel

This program is licensed under the GPL License
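
To drive the console from a script, the documented options can be assembled into an argument list. The sketch below is only an illustration: the helper names `build_console_args` and `run_console` are hypothetical, and the code uses nothing beyond the script path and the `-f`, `-o`, `-r`, and `-s` flags listed in the usage message above. It assumes the command is run from the repository root.

```python
import subprocess
import sys


def build_console_args(infile, outfile=None, reduced=False, strip=False):
    """Build an argument list for bin/mishkal-console.py (hypothetical helper).

    Only flags that appear in the usage message are used.
    """
    args = [sys.executable, "bin/mishkal-console.py", "-f", infile]
    if outfile is not None:
        args += ["-o", outfile]   # write the vocalized text to this file
    if reduced:
        args.append("-r")         # reduced tashkeel
    if strip:
        args.append("-s")         # strip tashkeel (remove harakat)
    return args


def run_console(infile, **options):
    """Run the console script and return its exit code."""
    return subprocess.run(build_console_args(infile, **options)).returncode
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces or Arabic characters.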

Example (Python library):

>>> import mishkal.tashkeel
>>> vocalizer = mishkal.tashkeel.TashkeelClass()
>>> text = u"تطلع الشمس صباحا"
>>> vocalizer.tashkeel(text)
' تَطْلُعُ الشَّمْسُ صَبَاحًا'
>>> 
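
For longer inputs it can be convenient to vocalize text line by line. This is a minimal sketch that assumes only the `tashkeel(text)` method shown above. The helper `vocalize_lines` is hypothetical and accepts any object exposing that method, so it can be tried without Mishkal installed.

```python
def vocalize_lines(lines, vocalizer):
    """Vocalize each non-empty line with vocalizer.tashkeel(); keep blank lines.

    `vocalizer` is any object with a tashkeel(text) method, such as the
    mishkal.tashkeel.TashkeelClass instance from the example above.
    """
    result = []
    for line in lines:
        stripped = line.strip()
        if stripped:
            # The example output above starts with a space, so trim the result.
            result.append(vocalizer.tashkeel(stripped).strip())
        else:
            # Blank lines carry no text to vocalize; pass them through.
            result.append("")
    return result


# Usage with Mishkal installed (not executed here):
#   import mishkal.tashkeel
#   vocalizer = mishkal.tashkeel.TashkeelClass()
#   for line in vocalize_lines(["تطلع الشمس صباحا"], vocalizer):
#       print(line)
```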

JSON connection API (remote vocalization)

The website's service can be called through JSON and AJAX from any site, so you can use it on your own website.

  • Calling method 1: using JSON with the jQuery library
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <script src="http://code.jquery.com/jquery-latest.js"></script>
</head>
<body>
  <div id="result">
  </div>
<script>
// Request vocalization once the page is ready and show the result.
$(document).ready(function() {
  $.getJSON("http://tahadz.com/mishkal/ajaxGet", {text:"السلام عليكم\nاهلا بكم\nكيف حالكم", action:"TashkeelText"},
    function(data) {
      $("#result").text(data.result);
    });
});
</script>
</body>
</html>

The call is made as follows:

$.getJSON("http://tahadz.com/mishkal/ajax...", {text:"السلام عليكم\nاهلا بكم\nكيف حالكم", action:"TashkeelText"},

where

  • text: the text to vocalize.
  • action: the requested operation, here TashkeelText.

The result has the following form:

{"result": " السّلامُ عَلَيكُمْ اهلا بِكُمْ كَيْفَ حالُكُمْ", "order": "0"}

where

  • result: the resulting vocalized text.
  • order: the line number in the original text. If the original text is large, Mishkal splits it into several lines, and they may not come back in the same order, so the order number is included.
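
The same call can be made from Python. The sketch below uses only the standard library. The endpoint URL and the `text` and `action` parameters are taken from the jQuery example above, while the helper names (`build_request_url`, `fetch_tashkeel`, `join_in_order`) are hypothetical. Whether the service is currently reachable at that URL is not verified here.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the jQuery example above.
MISHKAL_URL = "http://tahadz.com/mishkal/ajaxGet"


def build_request_url(text, action="TashkeelText", base_url=MISHKAL_URL):
    """Return the GET URL with the text and action encoded as query parameters."""
    query = urllib.parse.urlencode({"text": text, "action": action})
    return base_url + "?" + query


def fetch_tashkeel(text):
    """Send one request and return the decoded JSON object (needs network access)."""
    with urllib.request.urlopen(build_request_url(text)) as response:
        return json.loads(response.read().decode("utf-8"))


def join_in_order(responses):
    """Reassemble several responses using their "order" field."""
    ordered = sorted(responses, key=lambda item: int(item["order"]))
    return "\n".join(item["result"].strip() for item in ordered)
```

Sorting on `order` matters because, as noted above, the pieces of a large text may not come back in their original sequence.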

How Mishkal works:

Mishkal uses a rule-based method to detect relations between words and to choose diacritics. First, it analyzes all morphological cases: it generates every possible diacritized form of each word by detecting its affixes and checking them against a dictionary. Second, it attaches a word frequency to each form.

These two steps are performed by support/Qalsadi (Arabic morphological analyzer). The dictionary it uses is a separate project named "Arramooz: Arabic dictionary for morphology".

Third, a syntax analyzer detects all possible relations between words. The syntax library is named support/ArAnaSyn. This analyzer is basic for the moment: it uses only linear relations between adjacent words.

Fourth, the generated data and relations are analyzed semantically to detect semantic relations and reduce ambiguity. The library used is support/asmai (Arabic semantic analysis). Semantic relation extraction is corpus-based; the corpus used is named "Tashkeela: Arabic vocalized texts corpus".

In the final stage, the module mishkal/tashkeel tries to select the most suitable form of each word in context. It first looks for evident cases or the cases with the most relations; otherwise, it selects the most probable case using rules such as choosing a stop word by default or choosing the Mansoub (accusative) case by default.
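
The selection order in the final stage can be illustrated with a toy sketch. Everything in it is hypothetical: the `Candidate` fields, the scoring, and the function name are invented for illustration and are not Mishkal's actual data structures or code. Only the stop-word default is modeled; the Mansoub default is left out.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    vocalized: str              # one possible diacritized form of a word
    frequency: int = 0          # word frequency attached after morphological analysis
    relations: int = 0          # syntactic/semantic relations found with neighbours
    is_stop_word: bool = False


def choose_candidate(candidates):
    """Pick one candidate: related cases first, then defaults, then frequency."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    # An unambiguous word is an evident case.
    if len(candidates) == 1:
        return candidates[0]
    # Prefer the candidates that take part in the most relations.
    best_relations = max(c.relations for c in candidates)
    if best_relations > 0:
        pool = [c for c in candidates if c.relations == best_relations]
    else:
        # Default rule: fall back to stop words when any are available.
        stop_words = [c for c in candidates if c.is_stop_word]
        pool = stop_words or candidates
    # Among the remaining candidates, take the most frequent one.
    return max(pool, key=lambda c: c.frequency)
```

In this toy version, a rare form that has relations with its neighbours still wins over a frequent form that has none, which mirrors the priority of context over probability described above.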

The rest of the program provides functions that handle the interfaces and the API for web, desktop, and command-line use.

Featured Posts

  • How to vocalize letters, words, or even whole texts in Arabic within seconds from your browser (رضا بوربعة)
  • A new Arabic service: vocalizing Arabic texts (Sam Hamou)
  • Launch of the beta release of Mishkal, the Arabic text vocalization program (Zaid AlSaadi)
  • Mishkal: the road to vocalization (مدونة اليراع)
  • Mishkal for Arabic text vocalization: launch of the desktop interface (مدونة اليراع)
  • Get to know the “تحدّث” projects: projects for a great language (محمد هاني صباغ)
