Computer and Modernization

• Database and Data Mining

A Multi-null Value Estimation Method Based on Multi-table Relationship Information in Relational Database

  

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China
  • Received: 2015-11-19 Online: 2016-06-16 Published: 2016-06-17
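  • About the authors: WU Fei (1991-), male, from Chizhou, Anhui, is a master's student at the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics; research interests: databases and data mining. MAO Yuguang (1962-), male, from Nanjing, Jiangsu, is an associate professor with a Ph.D.; research interests: database systems and theory, data mining and data warehousing, and multi-valued logic.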
  • Funding:
    Open Fund of the Graduate Innovation Base (Laboratory), Nanjing University of Aeronautics and Astronautics (kfjj201460)


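Abstract: Because the real world is complex, missing and uncertain information is ubiquitous. Databases, as a tool for representing the real world, use null values to express missing information. To address null values in relational databases, this paper proposes a multi-null-value estimation method based on multi-table relationships. The method first fixes the order in which the null values of each column are estimated, following the principle of introducing as little error as possible. It then estimates the null values of each column using information from the column's own table; when the prediction error exceeds a given threshold, it brings in information from other tables to improve prediction accuracy, choosing a different mode according to the form of the relationship between this table and the related tables. Experimental results show that, compared with other methods, the proposed method estimates null values with relatively high accuracy.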

Key words: relational database, null value, fuzzy clustering, regression coefficient
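The abstract describes a three-stage flow: order the columns, estimate each column from its own table, and fall back to related-table information when the error is too large. The abstract does not give the authors' formulas, so the sketch below is hypothetical throughout. It assumes numeric columns, ranks columns by null count as a stand-in for the paper's minimal-error ordering, uses one-variable least-squares regression in place of the paper's unstated per-table estimator, measures error as the mean absolute residual on known rows, and implements only a many-to-one relationship mode. It does not reproduce the fuzzy clustering named in the key words. The names `impute`, `estimation_order`, `best_predictor`, `fit_line`, `threshold`, and the `parent.` prefix are illustrative only.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # constant predictor: fall back to the mean of y
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def best_predictor(rows, target, candidates):
    """Try each candidate column as a one-variable predictor of `target`.

    Returns (mean_abs_residual, column, a, b) for the best fit, or None.
    """
    best = None
    for c in candidates:
        pairs = [(r[c], r[target]) for r in rows
                 if r[c] is not None and r[target] is not None]
        if len(pairs) < 2:
            continue
        xs, ys = zip(*pairs)
        a, b = fit_line(xs, ys)
        err = mean(abs(y - (a + b * x)) for x, y in pairs)
        if best is None or err < best[0]:
            best = (err, c, a, b)
    return best


def estimation_order(rows, cols):
    """Hypothetical stand-in for the paper's ordering: fewest nulls first."""
    counts = {c: sum(r[c] is None for r in rows) for c in cols}
    return sorted((c for c in cols if counts[c] > 0), key=counts.get)


def impute(rows, cols, parent=None, fk=None, threshold=1.0):
    """Fill None cells of `cols` in place; return {column: predictor used}."""
    used = {}
    for target in estimation_order(rows, cols):
        # Stage 1: use only the base table's own columns as predictors.
        best = best_predictor(rows, target, [c for c in cols if c != target])
        view = rows
        if parent is not None and (best is None or best[0] > threshold):
            # Stage 2 (many-to-one mode only): copy each row's referenced
            # parent attributes onto it and retry with those as predictors.
            view = [dict(r, **{"parent." + k: v for k, v in parent[r[fk]].items()})
                    for r in rows]
            remote_cols = ["parent." + k for k in next(iter(parent.values()))]
            remote = best_predictor(view, target, remote_cols)
            if remote is not None and (best is None or remote[0] < best[0]):
                best = remote
            else:
                view = rows
        if best is None:
            continue  # nothing to regress on; leave the nulls in place
        _, col, a, b = best
        for r, v in zip(rows, view):
            if r[target] is None and v[col] is not None:
                r[target] = a + b * v[col]
        used[target] = col
    return used


# Toy data: salary depends on the department budget, not on age.
dept = {1: {"budget": 10}, 2: {"budget": 20}, 3: {"budget": 30}}
emp = [
    {"dept": 1, "age": 30, "salary": 25.0},
    {"dept": 2, "age": 25, "salary": 45.0},
    {"dept": 3, "age": 40, "salary": 65.0},
    {"dept": 1, "age": 35, "salary": 25.0},
    {"dept": 3, "age": 28, "salary": None},
]
print(impute(emp, ["age", "salary"], parent=dept, fk="dept"))  # {'salary': 'parent.budget'}
print(emp[4]["salary"])  # 65.0
```

In the toy data the regression on `age` leaves a mean residual of about 15, which exceeds `threshold`, so the sketch switches to the parent table's `budget` column and recovers 65.0 exactly. Copying parent attributes onto child rows is only one simple way to realize a many-to-one mode; one-to-many or many-to-many relationships would need an aggregation step that this sketch omits.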


