Pingpeng Yuan (袁平鹏)

Personal information

Professor     Doctoral Supervisor     Master's Supervisor

Gender: Male

Employment status: Currently employed

Affiliation: School of Computer Science and Technology

Education: Graduate studies (doctoral) completed

Degree: Ph.D.

Alma mater: Zhejiang University

Discipline: Computer Architecture
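Honors:
2015    Supervisor of an Outstanding Master's Thesis, Hubei Province
2013    Supervisor of an Outstanding Master's Thesis, Hubei Province
2009    Supervisor of an Outstanding Bachelor's Thesis, Hubei Province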

Learning Chinese Word Embeddings by Discovering Inherent Semantic Relevance in Sub-characters
Posted: 2022-08-15

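Paper type: Conference proceedings
First author: Wei Lu
Co-authors: Zhaobo Zhang, Pingpeng Yuan, Hai Jin, Qiangsheng Hua
Venue: CIKM'22
Indexed in: EI
Affiliation: School of Computer Science and Technology
Discipline category: Engineering
First-level discipline: Computer Science and Technology
Document type: C (conference proceedings)
Publication date: 2022-08-15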
Abstract: Learning Chinese word embeddings is important for many Chinese language information processing applications, such as entity linking, entity extraction, and knowledge graphs. A Chinese word consists of Chinese characters, which can be decomposed into sub-characters (radicals, components, strokes, etc.). Similar to roots in English words, sub-characters also indicate the origins and basic semantics of Chinese characters. Consequently, many studies adapt approaches designed for learning English word embeddings to improve Chinese word embeddings. However, some Chinese characters that share the same sub-characters have different meanings. Furthermore, with increasing cultural interaction and the popularization of the Internet and the web, many neologisms, such as transliterated loanwords and Internet slang, are emerging; these relate only to the pronunciation of their characters, not to their semantics. Here, we propose a tripartite weighted graph to model the semantic relationships among words, characters, and sub-characters, in which each relationship is evaluated according to Chinese linguistic information. Thus, the semantic relevance hidden in lower-level components (sub-characters, characters) can be used to further distinguish the semantics of the corresponding higher-level components (characters, words). The tripartite weighted graph is then fed into our Chinese word embedding model, insideCC, which reveals the semantic relationships among different language components and learns word embeddings. Extensive experiments on multiple corpora and datasets show that our proposed methods outperform state-of-the-art counterparts by a significant margin.
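To illustrate the shape of the tripartite graph described in the abstract, here is a minimal Python sketch of a three-layer structure (words, characters, sub-characters) with weighted edges only between adjacent layers. The abstract does not say how edge weights are derived from Chinese linguistic information, so this sketch uses placeholder uniform weights; the class `TripartiteGraph` and the function `build_graph` are hypothetical names, and this is not the insideCC implementation.

```python
from collections import defaultdict


class TripartiteGraph:
    """Hypothetical sketch: nodes live in three layers ("word", "char", "sub"),
    and weighted undirected edges connect only adjacent layers."""

    ALLOWED = {("word", "char"), ("char", "sub")}

    def __init__(self):
        # Adjacency map: (layer, node) -> {(layer, node): weight}
        self.adj = defaultdict(dict)

    def add_edge(self, upper_layer, upper, lower_layer, lower, weight):
        # Reject edges that skip a layer (e.g. word -> sub-character).
        if (upper_layer, lower_layer) not in self.ALLOWED:
            raise ValueError(f"edge {upper_layer}->{lower_layer} not allowed")
        u, v = (upper_layer, upper), (lower_layer, lower)
        self.adj[u][v] = weight
        self.adj[v][u] = weight  # store both directions (undirected graph)

    def neighbors(self, layer, node, target_layer):
        # Return {neighbor_name: weight} restricted to one target layer.
        # .get() avoids creating empty entries in the defaultdict.
        return {
            name: w
            for (lyr, name), w in self.adj.get((layer, node), {}).items()
            if lyr == target_layer
        }


def build_graph(words, char_parts):
    """Build the graph from words and a character -> sub-character mapping.
    Placeholder weights: each node's weight is split uniformly among its
    lower-layer parts (an assumption, not the paper's weighting scheme)."""
    g = TripartiteGraph()
    for word in words:
        chars = list(word)
        for c in chars:
            g.add_edge("word", word, "char", c, 1.0 / len(chars))
            parts = char_parts.get(c, [])
            for s in parts:
                g.add_edge("char", c, "sub", s, 1.0 / len(parts))
    return g


# Usage: 你 = 亻 + 尔, 好 = 女 + 子; 人 is left undecomposed here.
graph = build_graph(["你好", "好人"], {"你": ["亻", "尔"], "好": ["女", "子"]})
print(graph.neighbors("char", "好", "sub"))  # → {'女': 0.5, '子': 0.5}
```

Because the character 好 is shared by both words, it links upward to two word nodes and downward to its sub-characters; that shared middle layer is what would let relevance flow between lower- and higher-level components in a model of this kind.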