  • Stars: 220
  • Rank: 174,477 (Top 4%)
  • Language: Jupyter Notebook
  • License: MIT License
  • Created about 6 years ago
  • Updated about 5 years ago

Repository Details

State-of-the-Art Language Modeling and Text Classification in Hindi Language

hindi2vec

Results

We achieved a state-of-the-art perplexity of 46.81 for Hindi, compared to 40.68 for English (lower is better).

  • To the best of my knowledge, as of September 18, 2018
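
For context, perplexity is the exponential of the average per-token negative log-likelihood, which is why lower is better. A minimal sketch, assuming you already have the probability the model assigned to each correct next token; the function name and the sample probabilities are illustrative and do not come from this repository's notebooks:

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-likelihood) over the target tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# Hypothetical probabilities assigned to the correct next token.
# A model guessing uniformly among 4 options has a perplexity of about 4.
uniform_guess = [0.25, 0.25, 0.25, 0.25]
print(perplexity(uniform_guess))
```

Read this way, a perplexity of 46.81 means the model is on average about as uncertain as if it were choosing uniformly among roughly 47 candidate tokens at each step.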

Update: nlp-for-hindi uses SentencePiece instead of the word-based spaCy tokenizer that I use. On those tokens, the measured perplexity for that LM is ~35. I encourage you to check that work out as well.
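
Because perplexity is normalized per token, values measured under different tokenizers are not directly comparable: a subword tokenizer spreads the same text over more tokens. One common way to compare is to renormalize the total negative log-likelihood by a shared unit, such as the word count. The sketch below illustrates that idea only; the helper name and all numbers are hypothetical and are not measurements from either project:

```python
import math

def renormalized_perplexity(token_ppl, n_tokens, n_units):
    """Convert a per-token perplexity into a per-unit (e.g. per-word) perplexity.

    Total NLL = n_tokens * ln(token_ppl). Dividing that total by n_units
    instead of n_tokens gives the perplexity per shared unit.
    """
    if token_ppl <= 0 or n_tokens <= 0 or n_units <= 0:
        raise ValueError("all arguments must be positive")
    total_nll = n_tokens * math.log(token_ppl)
    return math.exp(total_nll / n_units)

# Hypothetical corpus: 1,000 words split into 1,300 subword tokens,
# with a hypothetical per-subword perplexity of 20.
per_word = renormalized_perplexity(token_ppl=20.0, n_tokens=1300, n_units=1000)
print(f"per-word perplexity: {per_word:.1f}")
```

When the subword tokenizer produces more tokens than there are words, the per-word figure comes out higher than the per-subword one, so a lower subword perplexity does not by itself mean a better model.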

Downloads

TODO

  • Language modeling based on wikipedia dump
  • Release Language Models: Hindi Language Model
  • Create Text classification Datasets: BBC Hindi
  • Benchmark text classification with FastText

Idea Dump

  • Repurpose the custom head for transliteration instead of classification: Hindi script (Devanagari) to English script (Roman)
  • Multi-task learning (MTL) for training and inference using custom heads
  • Text to Speech - using datasets from news recordings or Hindi subtitles of dubbed movies

FastAI Installation

This version of the notebook uses v0.7 of the fastai library, the version used in their Part 2 v2 course in Summer 2018. The best way to install it is via conda, as mentioned here

Special thanks to Jeremy, Rachel and other contributors to fastai. This work reproduces their English-language work for Hindi. Thanks to @cstorm125 for thai2vec, which inspired this work.

More Repositories

  1. awesome-project-ideas (7,470 stars): Curated list of Machine Learning, NLP, Vision, Recommender Systems Project Ideas
  2. NLP_Quickbook (Jupyter Notebook, 560 stars): NLP in Python with Deep Learning
  3. best-of-jupyter (420 stars): Jupyter Tips, Tricks, Best Practices with Sample Code for Productivity Boost
  4. pytorch-web-deploy (Python, 70 stars): Simple, fast web deployment for your PyTorch models
  5. agentai (Python, 54 stars): Text to Python Objects via a LLM Function Call
  6. coronaIndia (Jupyter Notebook, 34 stars): Experiments & NLP Deployments for CoronaVirus Related Work
  7. Hinglish (Jupyter Notebook, 30 stars): Hinglish Text Classification
  8. breakoutlist-india (27 stars): High potential opportunities for ambitious engineers, designers, data people and future founders. The best teams to join.
  9. llama2demo (Python, 14 stars)
  10. Twitter-Geographical-Sentiment-Analysis (Python, 13 stars): Finds the Happiest US and Indian State based on Sentimental Analysis of Twitter Data
  11. keras-practice (Jupyter Notebook, 12 stars): Notebooks covering Intro to CNN, Transfer Learning using VGG16
  12. Genetic-Algorithm-Self-Study-Notes (8 stars): Notes, Reading Sources and Bibliography on Genetic Algorithms
  13. qdrant_tools (Jupyter Notebook, 7 stars): Python Tools to use with the Qdrant Python Client
  14. nirantk.github.io (Jupyter Notebook, 6 stars)
  15. Text-Summarization (C, 4 stars)
  16. awesome-vectordb (Python, 4 stars): Everything you need to decide and work with VectorDBs
  17. knee-xrays (Jupyter Notebook, 3 stars): Exploratory Repository
  18. fitz-wrapper (Python, 3 stars): CLI Utilities for PDF to Image Conversion, built with Py3
  19. OnDeckMLChallenge (Jupyter Notebook, 3 stars)
  20. fastvector (Python, 3 stars)
  21. DSA-BITS-Masti (C, 3 stars): Data Structures and Algorithms at BITS Pilani
  22. experiments (HTML, 2 stars): Repository for Experimental Code
  23. quickstart (Shell, 2 stars)
  24. comehomeandbuild (HTML, 2 stars)
  25. MITx-Analytics-Edge-Coursework (HTML, 2 stars): Code, Lecture Slides and Data from edx.org/course/analytics-edge-mitx-15-071x-0
  26. cohere-learn (Python, 1 star): Utils which wrap around Cohere API: FewShotClassify and more coming soon
  27. Noor (HTML, 1 star): Bringing Light to What We are Taught :)
  28. interview_practice (C++, 1 star): Archive
  29. Aditi (1 star)
  30. latest-news-ncert (Python, 1 star): Link educational topics to latest NEWS
  31. julie (Python, 1 star): Julie is a blogging assistant and linter for AI Hackers wanting to make their work more accessible
  32. qdrant-course (Jupyter Notebook, 1 star)
  33. CovidSeer (Jupyter Notebook, 1 star): Complimentary Repo for Publishing Public facing Covid India work
  34. go-demo (Go, 1 star): Demo code for the Golang lecture by @theonewolf
  35. bq (1 star): Binary Quantization in Numpy