• Stars: 1
• Language: Jupyter Notebook
• Created: over 3 years ago
• Updated: over 2 years ago


Repository Details

This repository contains work on an Urdu-language poetry dataset. The dataset was collected from five non-social-media websites, such as Urdu Library, Iqbal, and Rekhta. After collection, the data was preprocessed both manually and programmatically: punctuation marks such as commas, semicolons, and colons were removed.

Once preprocessing was complete, machine learning algorithms and neural networks were trained and tested on the dataset, using the gensim and scikit-learn (sklearn) libraries. The algorithms used were Support Vector Machine, Multinomial Naïve Bayes, Multilayer Perceptron (MLP), and the pre-trained word2vec model. Comparing these algorithms, the Support Vector Machine achieved the highest accuracy (82.85%) and precision (83.0%). The main focus of the thesis was to grow the dataset of unique couplets by three unique poets and to improve the accuracy of the trained models; a sketch of the described pipeline follows below.
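The sketch below illustrates the pipeline described above (punctuation removal, then training and comparing SVM, Multinomial Naïve Bayes, and MLP classifiers with scikit-learn). It is a minimal, hedged example, not the repository's actual notebook code: the file name urdu_couplets.csv, the column names "couplet" and "poet", the TF-IDF features, and all hyperparameters are assumptions for illustration.

# Minimal sketch of the preprocessing + classification pipeline described above.
# File name, column names, and hyperparameters are assumptions, not the repo's code.
import re
import string

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical CSV with one couplet per row and the poet as the label.
df = pd.read_csv("urdu_couplets.csv")

# Latin punctuation plus common Urdu punctuation marks.
PUNCTUATION = string.punctuation + "،؛؟۔"

def clean(text: str) -> str:
    """Strip punctuation and collapse whitespace, as in the described pre-processing."""
    text = re.sub(f"[{re.escape(PUNCTUATION)}]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

X = df["couplet"].astype(str).map(clean)
y = df["poet"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Compare the three classifiers named in the description on TF-IDF features.
models = {
    "Support Vector Machine": SVC(kernel="linear"),
    "Multinomial Naive Bayes": MultinomialNB(),
    "Multilayer Perceptron": MLPClassifier(max_iter=500),
}

for name, clf in models.items():
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    pipeline.fit(X_train, y_train)
    preds = pipeline.predict(X_test)
    acc = accuracy_score(y_test, preds)
    prec = precision_score(y_test, preds, average="macro")
    print(f"{name}: accuracy={acc:.4f}, macro precision={prec:.4f}")

Word2vec features (via gensim) could be swapped in for the TF-IDF step by averaging pre-trained word vectors per couplet, but that substitution is not shown here.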