Computer and Modernization


Instance Alignment Algorithm Between Encyclopedia Based on Semi-supervised Co-training

  

  1. Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100039, China
  • Received: 2017-04-11 Online: 2017-12-25 Published: 2017-12-26

Abstract: Traditional supervised learning algorithms for instance alignment depend on large amounts of labeled data, and their feature representation methods are poorly suited to encyclopedia data. To address these issues, a semi-supervised co-training method for instance alignment is proposed. Instance alignment is modeled as a constrained binary classification problem. Multiple features are then extracted from the different kinds of available information, including instance names, attributes, description texts, and critical discrete values extracted from the texts, such as temporal and numerical values. The features are divided into two relatively independent views, and two models are trained interactively on these views so that they iteratively learn more about the distribution of synonymous instances from the unlabeled data. Experimental results on two Chinese encyclopedia datasets show that the proposed method achieves an F1 score of 84.3% on instance alignment and outperforms the compared methods, demonstrating its effectiveness and applicability.
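
The two-view training loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `CentroidClassifier`, `co_train`, and `is_match`, the parameters `rounds`, `per_round`, and `threshold`, and the toy feature values are all assumptions made for illustration. A simple nearest-centroid scorer stands in for the real per-view models (the keywords suggest gradient boosting decision trees), and the constraint on the binary classification is not modeled here.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def _dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class CentroidClassifier:
    """Toy stand-in for a per-view model: compares distances to class centroids."""

    def fit(self, X: List[Vector], y: List[int]) -> "CentroidClassifier":
        self.centroids: Dict[int, List[float]] = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            # Assumption: both classes are present in the labeled set.
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x: Vector) -> float:
        """Pseudo-probability that a candidate instance pair is a match (class 1)."""
        d0 = _dist(x, self.centroids[0])
        d1 = _dist(x, self.centroids[1])
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)


def co_train(
    labeled_a: List[Vector], labeled_b: List[Vector], labels: List[int],
    pool_a: List[Vector], pool_b: List[Vector],
    rounds: int = 5, per_round: int = 2, threshold: float = 0.8,
) -> Tuple[CentroidClassifier, CentroidClassifier, List[int]]:
    """Co-train one model per view; index i of pool_a and pool_b is the same pair."""
    train_a, train_b, y = list(labeled_a), list(labeled_b), list(labels)
    pool = list(range(len(pool_a)))
    model_a, model_b = CentroidClassifier(), CentroidClassifier()
    for _ in range(rounds):
        model_a.fit(train_a, y)
        model_b.fit(train_b, y)
        picked: Dict[int, int] = {}  # pool index -> pseudo-label
        for model, view in ((model_a, pool_a), (model_b, pool_b)):
            scored = []
            for i in pool:
                p = model.predict_proba(view[i])
                scored.append((max(p, 1.0 - p), i, int(p >= 0.5)))
            scored.sort(key=lambda s: s[0], reverse=True)
            for conf, i, label in scored[:per_round]:
                if conf >= threshold:
                    # If both views nominate the same pair, view A's label is kept.
                    picked.setdefault(i, label)
        if not picked:
            break  # neither view is confident about anything left in the pool
        for i, label in picked.items():
            # A pseudo-labeled pair is added to the training data of both views.
            train_a.append(pool_a[i])
            train_b.append(pool_b[i])
            y.append(label)
        pool = [i for i in pool if i not in picked]
    model_a.fit(train_a, y)
    model_b.fit(train_b, y)
    return model_a, model_b, y


def is_match(model_a: CentroidClassifier, model_b: CentroidClassifier,
             xa: Vector, xb: Vector) -> bool:
    """Combine the two views by averaging their pseudo-probabilities."""
    return (model_a.predict_proba(xa) + model_b.predict_proba(xb)) / 2 >= 0.5


# Hypothetical toy data: view A = [name similarity],
# view B = [attribute overlap, description-text similarity].
labeled_a = [[0.9], [0.95], [0.1], [0.2]]
labeled_b = [[0.8, 0.9], [0.9, 0.7], [0.1, 0.2], [0.2, 0.1]]
pool_a = [[0.85], [0.15], [0.5]]
pool_b = [[0.9, 0.8], [0.1, 0.1], [0.5, 0.5]]
model_a, model_b, grown_labels = co_train(labeled_a, labeled_b, [1, 1, 0, 0], pool_a, pool_b)
```

In this sketch each view nominates only the unlabeled pairs it is most confident about, so ambiguous pairs stay in the pool instead of being forced into the training set. A real system would replace the centroid scorer with stronger per-view classifiers and derive the feature vectors from names, attributes, description texts, and extracted temporal or numerical values.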

Key words: instance alignment, semi-supervised method, co-training, feature representation, gradient boosting decision tree
