[Chinese Linked Geographical Dataset]
Although the geographical domain has long been an important part of Linked Data, the scarcity of Chinese linked geographical data impedes the integration and sharing of both Chinese and cross-lingual knowledge. In this project, we present Clinga, a new Chinese linked geographical dataset built from data in the largest Chinese wiki encyclopedia. We manually designed a new geography ontology to categorize a wide range of physical and human geographical entities, and automatically discovered links to existing knowledge bases. The resulting Clinga dataset currently contains over half a million Chinese geographical entities and is openly accessible.