Pingpeng Yuan (袁平鹏), Huazhong University of Science and Technology faculty homepage: Learning Chinese Word Embeddings by Discovering Inherent Semantic Relevance in Sub-characters

Pingpeng Yuan (袁平鹏)

Personal Information

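Professor, Doctoral Supervisor, Master's Supervisor

Gender: Male

Employment status: Active

Affiliation: School of Computer Science and Technology

Education: Graduate (doctoral)

Degree: Doctor of Engineering

Alma mater: Zhejiang University

Discipline: Computer Architecture
Honors:
2015    Hubei Province Outstanding Master's Thesis Supervisor
2013    Hubei Province Outstanding Master's Thesis Supervisor
2009    Hubei Province Outstanding Bachelor's Thesis Supervisor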

Publications

Learning Chinese Word Embeddings by Discovering Inherent Semantic Relevance in Sub-characters

Posted: 2022-08-15

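Paper type: Conference proceedings
First author: Wei Lu
Co-authors: Zhaobo Zhang, Pingpeng Yuan, Hai Jin, Qiangsheng Hua
Published in: CIKM'22
Indexed in: EI
Affiliation: School of Computer Science and Technology
Discipline category: Engineering
First-level discipline: Computer Science and Technology
Document type: C (conference proceedings)
Publication date: 2022-08-15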
Abstract: Learning Chinese word embeddings is important for many Chinese language information processing tasks, such as entity linking, entity extraction and knowledge graph construction. A Chinese word consists of Chinese characters, which can be decomposed into sub-characters (radicals, components, strokes, etc.). Similar to roots in English words, sub-characters also indicate the origins and basic semantics of Chinese characters. Consequently, many studies adapt approaches designed for learning English word embeddings to improve Chinese word embeddings. However, some Chinese characters that share the same sub-characters have different meanings. Furthermore, with increasing cultural interaction and the spread of the Internet and the Web, many neologisms, such as transliterated loanwords and Internet slang, are emerging; their meanings follow the pronunciation of their characters rather than the characters' semantics. Here, a tripartite weighted graph is proposed to model the semantic relationships among words, characters and sub-characters, in which each relationship is weighted according to Chinese linguistic information. As a result, the semantic relevance hidden in lower-level components (sub-characters, characters) can be used to further distinguish the semantics of the corresponding higher-level components (characters, words). The tripartite weighted graph is then fed into our Chinese word embedding model, insideCC, which reveals the semantic relationships among the different language components and learns the word embeddings. Extensive experimental results on multiple corpora and datasets show that the proposed methods outperform state-of-the-art counterparts by a significant margin.
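The abstract describes the tripartite weighted graph only at a high level. As a rough illustration, a graph over words, characters and sub-characters can be stored as two weighted adjacency maps (word to character, character to sub-character). This is a minimal sketch under stated assumptions: the class name `TripartiteGraph`, its methods, the weighted-Jaccard relevance score and the example weights are all hypothetical, the edge weights are hand-assigned rather than derived from Chinese linguistic information as in the paper, and the insideCC embedding model is not reproduced.

```python
from collections import defaultdict


class TripartiteGraph:
    """Hypothetical tripartite weighted graph: words -> characters -> sub-characters.

    Edge weights are supplied by the caller. The paper's actual weighting
    scheme is NOT implemented here; this only shows the data structure.
    """

    def __init__(self):
        # word_char[w][c] = weight of the edge between word w and character c
        self.word_char = defaultdict(dict)
        # char_sub[c][s] = weight of the edge between character c and sub-character s
        self.char_sub = defaultdict(dict)

    def add_word_char(self, word, char, weight):
        self.word_char[word][char] = weight

    def add_char_sub(self, char, sub, weight):
        self.char_sub[char][sub] = weight

    def char_relevance(self, c1, c2):
        """Illustrative score: weighted Jaccard overlap of two characters' sub-character edges."""
        a = self.char_sub.get(c1, {})
        b = self.char_sub.get(c2, {})
        keys = set(a) | set(b)
        if not keys:
            return 0.0
        num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
        den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
        return num / den if den else 0.0

    def word_sub_profile(self, word):
        """Propagate weights down the graph: sum over characters of (word-char weight * char-sub weight)."""
        profile = defaultdict(float)
        for char, w_wc in self.word_char.get(word, {}).items():
            for sub, w_cs in self.char_sub.get(char, {}).items():
                profile[sub] += w_wc * w_cs
        return dict(profile)


# Hypothetical example: 江 and 河 (both "river") share the water radical 氵.
g = TripartiteGraph()
g.add_char_sub("江", "氵", 1.0)
g.add_char_sub("江", "工", 0.5)
g.add_char_sub("河", "氵", 1.0)
g.add_char_sub("河", "可", 0.5)
g.add_word_char("江河", "江", 0.5)
g.add_word_char("江河", "河", 0.5)

print(g.char_relevance("江", "河"))  # 0.5
print(g.word_sub_profile("江河"))    # {'氵': 1.0, '工': 0.25, '可': 0.25}
```

Because relevance depends on edge weights rather than on mere sharing of a component, giving a low weight to a shared but semantically misleading sub-character lowers the score. This loosely mirrors the abstract's point that characters with common sub-characters can still differ in meaning, but how the paper actually sets those weights is not shown here.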
