Algorithm-for-Bengali-Error-Dataset-Generation
We present an algorithm for Bangla that generates erroneous (misspelled) Bangla words of the kind people often produce when typing Bangla on an English keyboard. The resulting Bangla error word dataset can be used to evaluate the performance of existing Bangla spell checkers, including paragraph correction, and so help improve the quality of the suggestions they offer for misspelled words. We designed the algorithm specifically for Bangla, taking its many complex context-sensitive rules into account. In our work, we built a list of clusters that group similarly pronounced Bangla letters, and we replace a letter with another letter from its cluster to produce an error word. Our goal is to make the generated errors as human-like as possible. To that end, we are working to identify the mistake patterns people make when writing Bangla on an English keyboard with a QWERTY layout, and we use those patterns in our work. We also tested various existing Bangla spell checkers with our dataset and found that the results vary. The results are shown in a table in the result section.
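
The cluster-based replacement described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the algorithm's actual implementation: the cluster contents, the names `CLUSTERS`, `build_lookup`, and `make_error_word`, and the choice to change exactly one letter per word are all hypothetical, and the sketch ignores the context-sensitive rules mentioned above.

```python
import random

# Hypothetical clusters of similarly pronounced Bangla letters.
# These groupings are illustrative assumptions, not the project's cluster list.
CLUSTERS = [
    ["শ", "ষ", "স"],  # sibilants
    ["ন", "ণ"],       # nasals
    ["জ", "য"],       # j-like consonants
    ["ি", "ী"],       # short / long i vowel signs
    ["ু", "ূ"],       # short / long u vowel signs
]


def build_lookup(clusters):
    """Map each letter to the other letters in its cluster."""
    lookup = {}
    for cluster in clusters:
        for ch in cluster:
            lookup[ch] = [c for c in cluster if c != ch]
    return lookup


LOOKUP = build_lookup(CLUSTERS)


def make_error_word(word, rng=None):
    """Return a copy of `word` with one clustered letter swapped.

    If the word contains no letter from any cluster, it is returned unchanged.
    """
    if rng is None:
        rng = random.Random()
    # Positions whose letter belongs to some cluster.
    positions = [i for i, ch in enumerate(word) if ch in LOOKUP]
    if not positions:
        return word
    i = rng.choice(positions)
    replacement = rng.choice(LOOKUP[word[i]])
    return word[:i] + replacement + word[i + 1:]


if __name__ == "__main__":
    for w in ["নদী", "শিক্ষা", "কলম"]:
        print(w, "->", make_error_word(w))
```

Passing a seeded `random.Random` instance as `rng` makes the generated errors reproducible, which may be useful when the same error dataset has to be regenerated to compare several spell checkers.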