Traffic Accident Text Information Extraction Model Based on BERT and BiGRU-CRF Fusion

Abstract

Key heterogeneous information such as time, place, and casualties and losses is difficult to extract effectively from existing traffic accident text data, and extraction methods built on deep learning models with static word vectors have low accuracy. To address these problems, a BERT-BiGRU-CRF fusion model based on dynamic word vectors is proposed for extracting key information from traffic accident text. BERT (Bidirectional Encoder Representations from Transformers) maps the text characters to dynamic vectors, which tackles ambiguity and insufficient context dependence at the level of data representation. A BiGRU (Bidirectional Gated Recurrent Unit) then extracts features from the vectorized text and outputs a feature-rich text sequence. Finally, a CRF (Conditional Random Field) layer computes the probability of the globally optimal output sequence in order to refine the labeling of the text sequence. Comparison experiments show that the model achieves an average accuracy of 0.952 and an F1 score of 0.925 on traffic accident text information extraction, which are 6.3 and 7.9 percentage points higher, respectively, than those of a model based on static Word2Vec word vectors.
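The role of the CRF layer can be illustrated with Viterbi decoding, the standard way to find the highest-scoring tag sequence under a linear-chain CRF. The sketch below is only an illustration: the abstract does not describe the decoding procedure or the tag set, so the BIO-style tags (`O`, `B-TIME`, `I-TIME`) and every score are hypothetical, and the emission scores stand in for what a BiGRU layer might output per character.

```python
from typing import Dict, List, Sequence

# Hypothetical tag set and scores, chosen only for illustration.
TAGS = ["O", "B-TIME", "I-TIME"]

# Per-character emission scores (stand-in for BiGRU outputs).
EMISSIONS = [
    {"O": 2.0, "B-TIME": 1.5, "I-TIME": 0.0},
    {"O": 0.5, "B-TIME": 0.0, "I-TIME": 2.0},
]

# TRANSITIONS[prev][cur]: score of moving from tag `prev` to tag `cur`.
# The invalid move O -> I-TIME is heavily penalized.
TRANSITIONS = {
    "O": {"O": 0.0, "B-TIME": 0.0, "I-TIME": -10.0},
    "B-TIME": {"O": 0.0, "B-TIME": 0.0, "I-TIME": 1.0},
    "I-TIME": {"O": 0.0, "B-TIME": 0.0, "I-TIME": 0.5},
}


def viterbi_decode(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: Sequence[str],
) -> List[str]:
    """Return the tag sequence with the highest total emission + transition score."""
    if not emissions:
        return []
    # best[t][tag]: best score of any path that ends in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    # back[t - 1][tag]: previous tag on that best path.
    back = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def greedy_decode(emissions: Sequence[Dict[str, float]]) -> List[str]:
    """Pick the best tag at each position independently, ignoring transitions."""
    return [max(e, key=e.get) for e in emissions]


if __name__ == "__main__":
    print("greedy :", greedy_decode(EMISSIONS))
    print("viterbi:", viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))
```

With these made-up scores, per-position decoding yields the invalid sequence `O, I-TIME`, while the sequence-level decoding yields `B-TIME, I-TIME`. This is the kind of global correction that a CRF layer placed on top of a recurrent encoder is meant to provide.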

Key words: deep learning, text information extraction, heterogeneous information, BERT, BiGRU, CRF

FAN Hai-wei, QIN Jia-jie, SUN Huan, ZHANG Li-miao, LU Xin-siyu. Traffic Accident Text Information Extraction Model Based on BERT and BiGRU-CRF Fusion[J]. Computer and Modernization, 2022, 0(05): 10-15.
