Computer and Modernization ›› 2022, Vol. 0 ›› Issue (02): 79-84.

• Database and Data Mining •

ElasticSearch Index Optimization Strategy for Engineering Data Retrieval

  (1. School of Computer and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611731, Sichuan, China; 2. Guangzhou Metro Design & Research Institute Co. Ltd., Guangzhou 510010, Guangdong, China)
  • Online: 2022-03-31 Published: 2022-03-31
  • About the authors: Xu Xianhui (b. 1996), female, from Deyang, Sichuan, master's student, research interests: artificial intelligence and machine learning, E-mail: 1432112528@qq.com; Wang Shuying (b. 1974), female, research fellow, Ph.D., research interests: cloud service platform architecture and adaptive evolution technology, E-mail: w_shuying@126.com; Zeng Wenqu (b. 1980), male, research fellow, master's degree, research interests: BIM models and big data applications, E-mail: zengwenqu@163.com.
  • Funding:
    National Key Research and Development Program of China (2017YFB1201102)

Abstract: As the manufacturing sector develops, every industry generates large volumes of engineering data during production, and modern engineering practice requires that relevant results be retrieved quickly and accurately by keyword. ElasticSearch can be used to retrieve engineering data, but its performance leaves room for optimization. To address this problem, this paper studies the underlying principles of ElasticSearch in depth and optimizes three aspects: index creation, index sharding, and index segment merging. First, the ElasticSearch tokenizer is modified and configured with a custom dictionary. Second, an index sharding strategy based on cluster node performance and index data volume is proposed. Finally, the timing of index segment merging is optimized according to node performance. Experiments on the retrieval of subway engineering data show that the improved method increases both the write performance and the query performance of ElasticSearch.

Key words: ElasticSearch full-text search engine, index, shard, segment merge, performance optimization