Turkish deasciifier in Python based on Deniz Yüret's turkish-mode for Emacs

turkish-deasciifier: Turkish deasciifier

This is a deasciifier Python library and command line utility for Turkish that solves the problem of diacritics restoration (also known as diacritics reconstruction). It takes a Turkish string containing only ASCII characters (that is, without proper diacritics) and replaces the relevant characters with their corresponding Turkish letters.
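To make the problem concrete, the sketch below runs the transformation in the opposite direction: it strips the diacritics from Turkish text. The helper `asciify` and the table `TURKISH_TO_ASCII` are hypothetical names introduced only for this illustration and are not part of turkish-deasciifier. Stripping is a simple character mapping, but it loses information, which is why restoring the diacritics requires context.

```python
# Hypothetical helper for illustration only; NOT part of turkish-deasciifier.
# Maps each Turkish-specific letter to its closest ASCII letter.
TURKISH_TO_ASCII = str.maketrans({
    "ç": "c", "ğ": "g", "ı": "i", "ö": "o", "ş": "s", "ü": "u",
    "Ç": "C", "Ğ": "G", "İ": "I", "Ö": "O", "Ş": "S", "Ü": "U",
})


def asciify(text):
    """Return `text` with Turkish diacritics removed."""
    return text.translate(TURKISH_TO_ASCII)


# Yields the ASCII-only sample sentence used in the usage examples.
print(asciify("Öpüşmeği çağrıştıran çatırtılar."))
# → Opusmegi cagristiran catirtilar.

# Two different Turkish words collapse to the same ASCII form,
# so the reverse direction cannot be a plain lookup table.
print(asciify("açı"), asciify("acı"))
# → aci aci
```

Because "açı" (angle) and "acı" (pain) both become "aci", a deasciifier has to look at the surrounding characters to decide which letter to restore.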

The web-based, online version of this system is available at:

http://turkceyap.appspot.com/

Keep in mind that diacritics restoration (deasciification) for Turkish doesn't work 100% of the time; it is an active research topic! Still, this library is good enough for many practical purposes, and it has served many people and projects over the last 10 years.

This system is based on the turkish-mode for GNU Emacs by Prof. Deniz Yüret.

Table of Contents

  1. Installation
  2. Example Python Library Usage
  3. Example CLI (Command Line Interface) Usage
  4. Other Programming Languages and Systems
  5. Advanced Research

Installation

Python 3

For now, the recommended way to install is to use pip and install directly from the project's GitHub repository:

pip install git+https://github.com/emres/turkish-deasciifier.git

Python 2

Keep in mind that switching to Python 3 is strongly recommended! If you insist on using Python 2.x, you can install using the following command:

pip install Turkish-Deasciifier

Example Python Library Usage

Python 3

from turkish.deasciifier import Deasciifier

# ASCII-only Turkish input (diacritics missing)
my_ascii_turkish_txt = "Opusmegi cagristiran catirtilar."
# Wrap the text, then restore the Turkish letters
deasciifier = Deasciifier(my_ascii_turkish_txt)
my_deasciified_turkish_txt = deasciifier.convert_to_turkish()
print(my_deasciified_turkish_txt)
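For input that spans several lines, one possible approach is to convert line by line. The following is a minimal sketch under stated assumptions: `deasciify_lines` and `convert_with_library` are hypothetical helper names, not part of the library, and the only library calls used are `Deasciifier(...)` and `convert_to_turkish()` as shown in the usage example. The converter is passed in as a parameter so the helper can be tried with any function that maps a string to a string.

```python
def deasciify_lines(lines, convert):
    """Apply `convert` to every non-blank line, preserving line endings.

    `convert` is any callable that takes a str and returns a str.
    (Hypothetical helper; not part of turkish-deasciifier.)
    """
    result = []
    for line in lines:
        body = line.rstrip("\n")          # text without the trailing newline
        ending = line[len(body):]         # the newline(s) that were stripped
        converted = convert(body) if body.strip() else body
        result.append(converted + ending)
    return result


def convert_with_library(text):
    """Converter based on the two library calls shown in the usage example.

    The import is done lazily so that `deasciify_lines` can be used
    (and tested) without the library being installed.
    """
    from turkish.deasciifier import Deasciifier
    return Deasciifier(text).convert_to_turkish()
```

With the library installed, a call such as `deasciify_lines(open("somefile.txt", encoding="utf-8"), convert_with_library)` would return the converted lines. Whether line-by-line conversion gives the same result as converting the whole text at once has not been verified here.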

Python 2

Keep in mind that switching to Python 3 is strongly recommended! If you insist on using Python 2.x, you can use the library in the following manner:

from turkish.deasciifier import Deasciifier

my_ascii_turkish_txt = "Opusmegi cagristiran catirtilar."
deasciifier = Deasciifier(my_ascii_turkish_txt.decode("utf-8"))
my_deasciified_turkish_txt = deasciifier.convert_to_turkish()
print my_deasciified_turkish_txt.encode("utf-8")

Example CLI (Command Line Interface) Usage

Python 3

Example tested in a Bash shell:

$ echo "Opusmegi cagristiran catirtilar." | turkish-deasciify
$ cat somefile.txt | turkish-deasciify

Python 2

Keep in mind that switching to Python 3 is strongly recommended!

Example tested in a Bash shell:

$ echo "Opusmegi cagristiran catirtilar." | turkish-deasciify-python2
$ cat somefile.txt | turkish-deasciify-python2

Other Programming Languages and Systems

Advanced Research

For recent advanced scientific research articles, please see the following:
