Computer and Modernization

• Applications and Development •

Application of Parallel Data Mining Methods in the Analysis of Water Census Results

  

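  1. (College of Computer and Information, Hohai University, Nanjing, Jiangsu 210098)
  • Received: 2015-05-29 Published: 2015-10-10 Released: 2015-10-10
  • About the authors: DING Wei (b. 1988), male, from Anqing, Anhui, master's student at the College of Computer and Information, Hohai University; research interests: cloud computing and data mining. WAN Dingsheng (b. 1963), male, from Liyang, Jiangsu, professor; research interests: data management and data mining. FAN Long (b. 1988), male, from Nanyang, Henan, master's student; research interests: cloud computing and data mining.
  • Funding:
    Special Scientific Research Project for Public Welfare Industries, Ministry of Water Resources (201501022)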

Parallel Data Mining Methods in Analysis of Results of Water Census

  1. (College of Computer and Information, Hohai University, Nanjing 210098, China)
  • Received: 2015-05-29 Online: 2015-10-10 Published: 2015-10-10

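Abstract: The completion of the First National Water Census has produced a massive volume of water census data. Applying cloud computing technology to water census data mining makes it possible to support water conservancy decisions scientifically and reasonably, and to do so faster, more efficiently, and at lower cost. This paper proposes MRC4.5, a Map/Reduce-based decision tree classification method for water census data, and applies it to mining the groundwater intake well data of the national water census. Experimental results show that, compared with the traditional C4.5 algorithm, MRC4.5 has higher execution efficiency and a good speedup ratio when processing large-scale data sets.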

Keywords: water census, data mining, decision tree, C4.5 algorithm, Map/Reduce technology

Abstract: With the end of the first national water census, massive water census data have been generated. Using cloud computing technology in the area of water census data mining can provide scientific and reasonable support for water conservancy decisions in a quick, efficient, and economical way. This paper proposes MRC4.5, a decision tree classification mining algorithm for water census data based on Map/Reduce, and applies the algorithm to the groundwater well data of the national water census. The experimental results indicate that, compared with the traditional C4.5 algorithm, the MRC4.5 algorithm has higher execution efficiency and good speedup when dealing with massive data sets.

Key words: water census, data mining, decision tree, C4.5 algorithm, Map/Reduce
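
The abstract does not say how MRC4.5 divides C4.5's work between the map and reduce phases. As an illustration only, and not the authors' implementation, the sketch below shows one common way the attribute-selection step of C4.5 could be phrased in Map/Reduce terms: mappers emit a count of 1 for each (attribute, value, class) triple, a reducer sums the counts, and the gain ratio of each attribute is then computed from the summed counts. The function names, the record fields (`depth`, `use`, `metered`), and the toy records are all hypothetical, and the phases are simulated in a single Python process rather than on a cluster.

```python
import math
from collections import Counter, defaultdict


def map_record(record, label_key):
    """Map phase: emit ((attribute, value, label), 1) for each attribute of one record."""
    label = record[label_key]
    for attr, value in record.items():
        if attr != label_key:
            yield (attr, value, label), 1


def reduce_counts(pairs):
    """Reduce phase: sum the counts emitted for each (attribute, value, label) key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def entropy(counts):
    """Shannon entropy (base 2) of a collection of non-negative counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def gain_ratios(totals):
    """Compute the C4.5 gain ratio of every attribute from the reduced counts."""
    by_attr = defaultdict(lambda: defaultdict(Counter))
    for (attr, value, label), n in totals.items():
        by_attr[attr][value][label] += n

    ratios = {}
    for attr, values in by_attr.items():
        label_totals = Counter()
        for label_counts in values.values():
            label_totals.update(label_counts)
        n = sum(label_totals.values())
        sizes = [sum(lc.values()) for lc in values.values()]
        # Class entropy after splitting on attr, weighted by branch size.
        conditional = sum(
            size / n * entropy(lc.values())
            for size, lc in zip(sizes, values.values())
        )
        gain = entropy(label_totals.values()) - conditional
        split_info = entropy(sizes)
        ratios[attr] = gain / split_info if split_info > 0 else 0.0
    return ratios


def best_attribute(partitions, label_key):
    """Run the simulated map and reduce phases over all data partitions."""
    pairs = (
        pair
        for part in partitions
        for record in part
        for pair in map_record(record, label_key)
    )
    ratios = gain_ratios(reduce_counts(pairs))
    return max(ratios, key=ratios.get), ratios


# Hypothetical toy records, split into two partitions as if handled by two mappers.
partitions = [
    [
        {"depth": "deep", "use": "irrigation", "metered": "yes"},
        {"depth": "deep", "use": "domestic", "metered": "yes"},
    ],
    [
        {"depth": "shallow", "use": "irrigation", "metered": "no"},
        {"depth": "shallow", "use": "domestic", "metered": "no"},
    ],
]
best, ratios = best_attribute(partitions, "metered")
```

In this toy data `depth` separates the two classes completely while `use` carries no information, so `depth` is chosen as the split. Because the reducer only ever sums counts, the map work can be spread over any number of partitions without changing the result, which is the property a parallel C4.5 relies on to scale to large data sets.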

CLC number: