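[1] Luo Chaoran, Wang Chun, Liao Jianxin. Design and implementation of a news Web page content extraction module[J]. Telecommunications Technology, 2014(5):85-87. (in Chinese)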
[2] Chang C H, Kayed M, Girgis M R, et al. A survey of Web information extraction systems[J]. IEEE Transactions on Knowledge and Data Engineering, 2006,18(10):1411-1428.
[3] Grishman R. Information Extraction: Capabilities and Challenges[Z]. Notes Prepared for the 2012 International Winter School in Language and Speech Technologies, Rovira i Virgili University, Tarragona, Spain, 2012.
[4] Etzioni O, Banko M, Soderland S, et al. Open information extraction from the Web[J]. Communications of the ACM, 2008,51(12):68-74.
[5] Pol K, Patil N, Patankar S, et al. A survey on Web content mining and extraction of structured and semistructured data[C]// Proceedings of the 1st IEEE International Conference on Emerging Trends in Engineering and Technology. 2008:543-546.
[6] Crescenzi V, Mecca G, Merialdo P. RoadRunner: Towards automatic data extraction from large Web sites[C]// Proceedings of the 27th International Conference on Very Large Data Bases. 2001:109-118.
[7] Crescenzi V, Mecca G, Merialdo P. Automatic Web information extraction in the RoadRunner system[M]// Conceptual Modeling for New Information Systems Technologies. Springer Berlin Heidelberg, 2001:264-277.
[8] Crescenzi V, Mecca G, Merialdo P. Wrapping-oriented classification of Web pages[C]// Proceedings of the 2002 ACM Symposium on Applied Computing. 2002:1108-1112.
[9] Baroni M, Chantree F, Kilgarriff A, et al. Cleaneval: A competition for cleaning Web pages[C]// Proceedings of the 2008 International Conference on Language Resources and Evaluation. 2008:638-643.
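[10] Li Wenqi, Zhang Zhongneng. An improved algorithm for automatic page wrapper generation[J]. Computer Engineering and Applications, 2004,40(22):113-115. (in Chinese)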
[11] Wu Gongqing, Li Li, Hu Xuegang, et al. Web news extraction via path ratios[C]// Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. 2013:2059-2068.
[12] Wu Gongqing. Research on Web news content extraction based on tag path features[D]. Hefei: Hefei University of Technology, 2012. (in Chinese)
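[13] Wang Zhongfeng, Wang Zhihai. An optimization algorithm for Bayesian network classifiers based on the derivative of the conditional log-likelihood function[J]. Chinese Journal of Computers, 2012,35(2):364-374. (in Chinese)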
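[14] Gong Xiujun. Research on Bayesian learning theory and its applications[D]. Beijing: Graduate School of the Chinese Academy of Sciences, 2002. (in Chinese)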