Due to the large number of Chinese characters, recognizing and writing them is one of the main difficulties in learning Chinese. Moreover, in recent years the rate of Chinese handwriting has dropped, internet essays contain many Chinese spelling errors, and advertisements deliberately use homophones. These trends have unwittingly influenced students' cognition of words and made typos a characteristic of the internet era.
Research on the automatic detection of English spelling errors has been under way for decades. Building on the theoretical basis of natural language processing, the accuracy of English spelling error checking has reached above 95%. Hence, most search engines and word processors now support spelling error detection. However, the accuracy of Chinese spelling error checking does not easily reach even 70%. According to past research, the difficulty of detecting Chinese spelling errors can be roughly attributed to two causes. First, there is no delimiter between Chinese words, which makes it difficult to detect Chinese spelling errors. Second, the large number of Chinese characters means that the error model includes a large number of probability parameters. The error model needs a very large spelling error tagged corpus to estimate these parameters. Unfortunately, manually constructing a Chinese spelling error tagged corpus is extremely costly, and its size is difficult to expand.
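To give a rough sense of the second difficulty, the sketch below counts the entries of a simple character-level substitution model, one probability for every ordered pair of distinct symbols. Both the model form and the inventory of 5,000 Chinese characters are assumptions chosen for illustration. They are not the error model or the figures of this project.

```python
def substitution_parameter_count(vocabulary_size: int) -> int:
    """Count P(written | intended) entries over ordered pairs of distinct symbols."""
    return vocabulary_size * (vocabulary_size - 1)


ENGLISH_LETTERS = 26
# Illustrative assumption: an inventory of 5,000 commonly used characters.
ASSUMED_CHINESE_CHARACTERS = 5000

english_params = substitution_parameter_count(ENGLISH_LETTERS)
chinese_params = substitution_parameter_count(ASSUMED_CHINESE_CHARACTERS)

print(f"English letter substitutions:    {english_params:,}")
print(f"Chinese character substitutions: {chinese_params:,}")
```

Under these assumptions the Chinese model has tens of millions of entries against a few hundred for English letters, which suggests why a small manually tagged corpus cannot estimate them reliably.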
The purpose of our project is twofold. First, we plan to automatically collect a large number of Chinese spelling errors from the internet to build a large Chinese spelling error tagged corpus. This corpus can be used as research and teaching material for Chinese language education, as well as to develop typo-related applications. Second, we will combine the Chinese spelling error tagged corpus with artificial intelligence technology to develop a Chinese spell-checking system. This system will help students learn independently and help news media and publishers enhance the quality of their documents, creating a virtuous circle that reduces Chinese spelling errors.