The separable word is a distinctive phenomenon of the Chinese language. Because of its complicated grammar, it is regarded as one of the major difficulties in teaching Chinese as a foreign language. Although many scholars have proposed suggestions for teaching separable words, systematic teaching methods are still lacking. Research on the teaching of separable words indicates that learners need a better understanding of the structural features of both the separated and the combined forms of these words. These studies suggest marking the features of such words in textbooks and providing representative example sentences and context.
In recent years, researchers have gradually noticed that corpora are an important tool for observing separable words. However, few tools exist for identifying separable words in a corpus, so most Chinese corpora lack this information. Many researchers observe separable words in a corpus indirectly by enumerating all possible separable words. This method has two drawbacks: the process is extremely laborious and time-consuming, and it must rely on manually compiled separable word lists, which are likely to be incomplete. Hence, some researchers have proposed automatic identification methods for separable words. However, these studies still rely on manually compiled separable word lists. Furthermore, their identification tools have not been publicly released, so a reliable separable word identification tool is still lacking.
The main purpose of this project is to develop a reliable automatic identification and tagging system for separable words, through in-depth analysis of separable words and the use of the latest machine learning technology.
Meanwhile, the project will use the automatic identification tool to identify and tag all separable words in the COCT corpus. The tagged COCT corpus will be integrated with the corpus query system to provide query and statistical functions for separable words to Chinese language researchers, textbook editors, and instructors. The separable word identification tool will also be released publicly, so that corpus-based research on Chinese can benefit from the results of this project.