Computer and Modernization

High Illumination Visible Image Generation Based on Generative Adversarial Networks

ZHUANG Wen-hua, TANG Xiao-gang, ZHANG Bin-quan, YUAN Guang-ming

2023, 0(01): 1-6.

To address the low accuracy of target detection under low-illumination conditions at night, this paper proposes a generative adversarial network-based algorithm for generating high-illumination visible-light images. To improve the generator's ability to extract features, a CBAM attention module is introduced in the converter module. To avoid noise interference from artifacts in the generated images, the generator's decoder replaces deconvolution with up-sampling by nearest-neighbour interpolation followed by a convolution layer. To improve training stability, the adversarial loss function is changed from the cross-entropy function to the least-squares function. Compared with infrared images and night-time visible images, the generated visible images offer spectral information, rich detail and better visibility, so information about the target and scene can be obtained effectively. The method is validated with both image generation metrics and target detection metrics: the mAP measured on the generated visible images is 11.7 and 30.2 percentage points higher than on the infrared images and the real visible images, respectively, which effectively improves the detection accuracy and anti-interference capability for night-time targets.
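
The two generator changes named in the abstract can be illustrated with a minimal sketch. It assumes the common least-squares GAN formulation (labels 1 for real and 0 for generated, with a factor of 0.5) and shows only the interpolation half of the "interpolation plus convolution" decoder step; the function names are hypothetical and none of this is taken from the paper's implementation.

```python
def lsgan_discriminator_loss(real_scores, fake_scores, real_label=1.0, fake_label=0.0):
    """Least-squares discriminator loss: squared distance of scores to their labels.

    The label values and the 0.5 factor are assumptions (a common convention).
    """
    real_term = sum((s - real_label) ** 2 for s in real_scores) / len(real_scores)
    fake_term = sum((s - fake_label) ** 2 for s in fake_scores) / len(fake_scores)
    return 0.5 * (real_term + fake_term)


def lsgan_generator_loss(fake_scores, real_label=1.0):
    """Least-squares generator loss: push scores of generated images toward the real label."""
    return 0.5 * sum((s - real_label) ** 2 for s in fake_scores) / len(fake_scores)


def upsample_nearest(image, factor=2):
    """Nearest-neighbour up-sampling of a 2-D list: each pixel becomes a factor x factor patch."""
    out = []
    for row in image:
        wide = [px for px in row for _ in range(factor)]  # repeat along the width
        out.extend(list(wide) for _ in range(factor))     # repeat along the height
    return out
```

In a real decoder the up-sampled feature map would then pass through a learned convolution layer, which is omitted here.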

Enhanced Image Caption Based on Improved Transformer_decoder

LIN Zhen-xian, QU Jia-xin, LUO Liang

2023, 0(01): 7-12.

The Transformer decoder model (Transformer_decoder) has been widely used in image captioning tasks, where self-attention captures fine-grained features to achieve deeper image understanding. This article makes two improvements to self-attention: Vision-Boosted Attention (VBA) and Relative-Position Attention (RPA). Vision-Boosted Attention adds a VBA layer to Transformer_decoder and introduces visual features into the attention model as auxiliary information, guiding the decoder to generate descriptions whose semantics better match the image content. Building on self-attention, Relative-Position Attention introduces trainable relative position parameters that add the relative position relationships between words to the input sequence. Experiments on COCO2014 show that both VBA and RPA improve image captioning to a certain extent, and that the decoder model combining the two attention mechanisms achieves better semantic expression.
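
One way trainable relative position parameters can enter self-attention is as a scalar bias added to each attention score, indexed by the clipped offset between the two word positions. The sketch below assumes that form; the abstract does not state the exact formulation, so `rel_bias`, `max_distance` and the function names are hypothetical, and the causal mask a decoder would normally apply is left out.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def relative_position_attention(queries, keys, values, rel_bias, max_distance):
    """Scaled dot-product attention with a bias per clipped relative offset.

    rel_bias has 2 * max_distance + 1 entries (assumed trainable in a real model);
    index 0 corresponds to offset -max_distance.
    """
    d = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            offset = max(-max_distance, min(max_distance, j - i))
            dot = sum(a * b for a, b in zip(q, k))
            scores.append(dot / math.sqrt(d) + rel_bias[offset + max_distance])
        weights = softmax(scores)
        outputs.append([sum(w * v[c] for w, v in zip(weights, values))
                        for c in range(len(values[0]))])
    return outputs
```

With all biases at zero this reduces to ordinary scaled dot-product attention; a strongly negative bias for non-zero offsets makes each position attend mostly to itself.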

Method of Fish Image Expansion Based on NS-StyleGAN2 Network

LI Hai-tao, HU Ze-tao, ZHANG Jun-hu

2023, 0(01): 13-17.

Class imbalance often occurs in multi-class image classification and has a negative impact on the training of the classification model. It can be effectively addressed by expanding the classes that have fewer samples. A generative adversarial network, a type of neural network developed in recent years, can output generated samples that closely resemble real samples once trained on real image samples. Exploiting this property, this paper designs a noise-suppressed StyleGAN2 (NS-StyleGAN2) by combining the design philosophy of the second-generation style-based generative adversarial network (StyleGAN2) with the characteristics of fish images. NS-StyleGAN2 removes the noise input of the low-resolution layers in StyleGAN2's synthesis network, suppressing the noise weight of those layers and bringing the detail features of the generated samples closer to those of real samples. Training uses 202 images of silver carp. The proposed method is superior to DCGAN, WGAN and StyleGAN2 in inception score, Fréchet inception distance and kernel inception distance, which shows that it can be used effectively for image expansion.

Classification Method of Small Sample Apple Leaves Based on SE-ResNeXt

BAI Xu-guang, LIU Cheng-zhong, HAN Jun-ying, GAO Jia-meng, CHEN Jun-kang

2023, 0(01): 18-23.

Building on existing deep learning technology, this study adopts SE-ResNeXt, a variant of the residual neural network, to construct a convolutional neural network model that automatically classifies apple varieties, and trains the model with transfer learning. The data consist of apple leaf images of 20 varieties taken at the Apple Industry Base in Jingning County, Gansu Province, with 50 pictures per variety and 1000 pictures in total. On this dataset, comparison experiments are carried out with six models: ResNet50, ResNet101, SE-ResNet50, SE-ResNet101, SE-ResNeXt50 and SE-ResNeXt101. The results show that SE-ResNeXt101 outperforms the other models, with the highest accuracy of 97.5% and an inference time of only 0.125 s per image. The proposed method provides a means of identifying apple varieties efficiently and accurately, and can assist agricultural research and apple planting.

Non-saliency Feature Image Registration Algorithm Based on Local Overlapping Region

YANG Xu-zhao, LEI Zhi-yong, WANG Jiao-jiao

2023, 0(01): 24-29.

To address the speed and accuracy problems in registering scenes with non-salient features, such as desert and Gobi, this paper proposes a non-saliency feature image registration method based on local overlapping regions. First, each image to be registered is preprocessed using the image mark to enhance its features. Then the overlapping area of the multiple images to be registered is estimated in advance through multi-camera three-dimensional projection, and the overlapping area is segmented using the image mask and image segmentation technology. Finally, the overlapping area is registered by the ORB+GMS (Oriented FAST and Rotated BRIEF plus Grid-based Motion Statistics) fusion algorithm to complete the registration of the multiple images. Registering on the overlapping region avoids the low accuracy of registering whole images with non-salient features and, because the registration is local, it is faster than global registration. Experimental comparison with the traditional registration method shows that the improved method raises registration accuracy by 28%. The algorithm also has higher robustness and real-time performance.

Chinese Short Text Entity Disambiguation Based on Multi-feature Factor Fusion

WANG Yong-di, LEI Gang

2023, 0(01): 30-36.

Most existing Chinese short text entity disambiguation models consider only the semantic matching features between the mention context and the candidate entity description. They ignore other effective disambiguation features, such as the co-occurrence features between candidate entities in the same query text and the similarity features between the mention type and the candidate entities. To solve these problems, this paper first uses a pre-trained language model to obtain the semantic matching features of the mention context and the candidate entity description. Then, co-occurrence features and type features are proposed, built on entity embedding and mention type embedding. Finally, by fusing these features, an entity disambiguation model based on multiple feature factors is realized. The experimental results show that the proposed co-occurrence and type features are feasible and effective for entity disambiguation, and that the proposed method based on multi-feature factor fusion achieves better disambiguation results.

SAR Ship Classification Based on Multi-convolutional Neural Network Fusion

ZHANG Xiao, LYU Ji-yu, ZHAO Shuang, WU Yu-lun, WANG Chun-le

2023, 0(01): 37-42.

The accuracy of small ship classification in Synthetic Aperture Radar (SAR) images is low. To solve this problem, a classification approach based on the weighted fusion of the results of different convolutional neural networks is proposed. First, a high-resolution convolutional neural network is constructed to perform multi-scale feature fusion, and model fine-tuning and label smoothing are introduced to reduce over-fitting during training. Then three single classification models are trained using the high-resolution network, the MobileNetv2 network and the SqueezeNet network. Finally, the results of the three classification models are fused by weighted voting. In classification experiments on the GF-3 ship dataset, the fusion method achieves a precision of 94.83%, a recall of 95.43% and an F1 score of 0.9513. The experimental results show that the proposed model has better classification ability, which verifies its effectiveness for ship classification in high-resolution SAR images.
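
Weighted voting over three classifiers can be sketched as a weighted average of their class probabilities. The weights and probabilities in the usage note are made-up illustrative values, not the paper's, and the function names are hypothetical. The small `f1_score` helper uses the standard harmonic-mean definition and confirms that the reported F1 of 0.9513 agrees with the reported precision and recall.

```python
def weighted_vote(prob_lists, weights):
    """Fuse per-model class probabilities by weighted averaging.

    prob_lists: one probability list per model, all of the same length.
    weights: one non-negative weight per model (illustrative values only).
    Returns (index of the winning class, fused probabilities).
    """
    n_classes = len(prob_lists[0])
    total = sum(weights)
    fused = [
        sum(w * probs[c] for probs, w in zip(prob_lists, weights)) / total
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=fused.__getitem__)
    return best, fused


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

For example, `weighted_vote([[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]], [0.5, 0.25, 0.25])` picks class 1 even though the most heavily weighted model prefers class 0.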

Tibetan Medical Entity Recognition Based on Tibetan BERT

ZHU Ya-jun, Yong Tso, Nyima Tashi

2023, 0(01): 43-48.

Tibetan medicine character embedding is of great significance for Tibetan medical entity recognition, but high-quality Tibetan language models are lacking. Taking the structural characteristics of Tibetan into account, a syllable-based BERT model is trained on ordinary Tibetan news text, and a BERT-BiLSTM-CRF model is built on this Tibetan BERT model. First, the model uses Tibetan BERT to learn character embeddings of Tibetan medicine text, enhancing the ability of the embeddings to express Tibetan characters and their context. Then the BiLSTM layer further extracts the dependencies between characters in Tibetan medicine text. Finally, the CRF layer strengthens the legitimacy of the label sequence. The experimental results show that initializing character embeddings with the Tibetan BERT model helps improve Tibetan medical entity recognition, with the F1 value reaching 96.18%.
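
What "legitimacy of the label sequence" means can be made concrete with a small checker. The sketch assumes a BIO tagging scheme, which the abstract does not specify, and the entity type names in the usage note are hypothetical. A CRF layer learns transition scores that discourage the illegal transitions this function rejects.

```python
def is_legal_bio(tags):
    """Return True if a BIO tag sequence is well formed.

    An "I-X" tag is legal only directly after "B-X" or "I-X" of the same type X.
    The BIO scheme itself is an assumption made for illustration.
    """
    prev = "O"
    for tag in tags:
        if tag.startswith("I-"):
            # An inside tag may not start an entity or switch entity type.
            if prev == "O" or prev[2:] != tag[2:]:
                return False
        prev = tag
    return True
```

For instance, `["B-DRUG", "I-DRUG", "O"]` is legal, while `["O", "I-DRUG"]` and `["B-DRUG", "I-SYMPTOM"]` are not.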

Review of Relation Extraction Based on Pre-training Language Model

WANG Hao-chang, LIU Ru-yi

2023, 0(01): 49-57.

In recent years, with the continuous innovation of deep learning technology, pre-training models have been applied ever more widely in natural language processing, and relation extraction no longer depends purely on the traditional pipeline method. The development of pre-training language models has greatly promoted research on relation extraction, and these models have surpassed traditional methods in many fields. This paper first briefly introduces the development of relation extraction and the classic pre-training models; it then summarizes the commonly used datasets and evaluation methods and analyzes model performance on each dataset; finally, it discusses the challenges facing relation extraction and future research trends.

Identification of Main Streamline of Intersection Group Based on Coordinated Influence Flow

ZHANG Jian-xu, WU Cheng-feng

2023, 0(01): 58-62.

To determine the main streamline of an intersection group and better coordinate its control, an algorithm for identifying the main streamline based on coordinated influence flow is established. First, the flow components affected by route coordination are analyzed: through statistical analysis of floating-vehicle trajectories within the intersection group, the alternative main streamlines are determined and the number of floating vehicles affected by streamline coordination is calculated. Then the coordinated influence flow of each alternative main streamline is estimated from the intersection flow and steering ratio. Finally, from the counted number of floating vehicles affected by coordination of each alternative main streamline and the estimated coordinated influence flow, a streamline weight index is calculated, and the main streamline of the intersection group is determined by this index. Taking an intersection group delineated in Yanta District of Xi'an as an example, the main streamline is identified to verify the effect of the algorithm. The experimental results show that the algorithm can use floating-vehicle data and flow data to identify the main streamline of the intersection group in real time and can support signal coordination control of the intersection group.

Speech Emotion Recognition of Hybrid Multi-scale Convolution Combined with Dual-layer LSTM

LIANG Ke-jin, ZHANG Hai-jun, LIU Ya-qing, ZHANG Yu, WANG Yue-yang

2023, 0(01): 63-68.

To address the deficiencies of deep learning algorithms in extracting speech emotion features and their low recognition accuracy, effective emotion features are extracted from the speech data and then spliced and merged at multiple scales to construct speech emotion features and improve the performance of the deep learning model. Because traditional recurrent neural networks cannot solve the long-term dependence problem in speech emotion recognition, a dual-layer LSTM model is used to improve recognition, and a model combining hybrid multi-scale convolution with dual-layer LSTM is proposed. Experimental results on the Chinese emotion database of the Institute of Automation, Chinese Academy of Sciences (CASIA) and the Berlin emotion database (Emo-DB) show that the proposed model is considerably more accurate than other emotion recognition models.

Stock Movement Prediction Algorithm Based on Deep Learning

ZHOU Run-jia

2023, 0(01): 69-73.

To improve the accuracy of stock movement prediction, this paper proposes a stock movement prediction algorithm, AACL (Adversarial Attentive CNN-LSTM), which uses CNN and LSTM for feature extraction and combines an attention mechanism with adversarial training. The algorithm uses CNN to extract the overall trend information of a stock and LSTM to extract its short-term fluctuation information, and connects multiple stocks through the attention mechanism to capture the rise-and-fall relationships between stocks. It also introduces adversarial training, which improves robustness by perturbing the data. To verify the effectiveness of AACL, experiments are carried out on three datasets, KDD17, ACL18 and China50, and compared with existing algorithms. The experimental results show that the proposed algorithm obtains the best results.

Robust Defense Method for Graph Convolutional Neural Network

QIAN Xiao-zhao, WANG Peng

2023, 0(01): 74-80.

Recently, graph convolutional neural networks (GCNs) have become increasingly mature in research and application. Although their performance has reached a high level, GCNs have poor robustness when subjected to adversarial attacks. Most existing defense methods are based on heuristic empirical algorithms and do not consider the causes of the structural vulnerability of GCNs. Recent research has shown that GCNs are vulnerable because of non-robust aggregation functions. This paper analyzes the robustness of the winsorised mean and the mean aggregation function in terms of the breakdown point and the influence function. The winsorised mean has a higher breakdown point than the mean. Its influence function is bounded, with jumps, so it resists outliers, while the influence function of the mean is unbounded and very sensitive to outliers. An improved robust defense method, WinsorisedGCN, is then proposed within the GCN framework by replacing the aggregation function in the graph convolution operator with the more robust winsorised mean. Finally, this paper uses the Nettack adversarial attack method to analyze the robustness of the proposed model under different perturbation budgets, and evaluates model performance by accuracy and classification margin. The experimental results demonstrate that, compared with other benchmark models, the proposed defense scheme effectively improves robustness under adversarial attacks while preserving model accuracy.
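
The difference in outlier sensitivity between the two aggregation functions shows up on a single list of scalar values. The sketch below is a scalar illustration only: in a GCN the aggregation would run per feature over a node's neighbours, and the parameter `k` (how many values are clipped at each end) is a hypothetical name rather than the paper's notation.

```python
def mean(values):
    """Plain arithmetic mean; a single extreme value can move it arbitrarily far."""
    return sum(values) / len(values)


def winsorised_mean(values, k=1):
    """Mean after clipping the k smallest and k largest values to their nearest kept neighbours.

    Requires 2 * k < len(values) so that at least one value is left unclipped.
    """
    n = len(values)
    if 2 * k >= n:
        raise ValueError("k is too large for the number of values")
    ordered = sorted(values)
    low, high = ordered[k], ordered[n - k - 1]
    clipped = [min(max(v, low), high) for v in ordered]
    return sum(clipped) / n
```

On `[1, 2, 3, 4, 100]`, where 100 plays the role of an adversarially perturbed neighbour, the plain mean is 22.0 while the winsorised mean with `k=1` is 3.0.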

Deep Learning Classification Algorithm for Electrocardiogram Signal Based on Adversarial Domain Adaptation

JIANG Si-qing, CHEN Xiao-Jun, GAO Hao-Jun, HE Jia-jin, WU Jian

2023, 0(01): 81-87.

Cardiovascular disease has become one of the major diseases threatening human life and health. The electrocardiogram (ECG) is an important and common clinical method for diagnosing arrhythmia and is widely used in the health monitoring of patients with heart disease. Because existing medical resources are limited, the demand for artificial intelligence methods for analysis and diagnosis is increasingly urgent, and automatic detection and classification methods in clinical practice may help doctors diagnose diseases accurately and rapidly. In this paper, eight common arrhythmia types are classified, and a deep learning classification method for ECG signals based on adversarial domain adaptation is proposed, which addresses the problems of insufficient labeled training samples and of data distribution differences caused by individual differences. The method consists of three modules: multi-scale feature extraction module A, domain recognition module B and multi-classifier module C. Module A is composed of two groups of different parallel convolution blocks, which increases the width of feature extraction. Module B is composed of three convolution blocks and a fully connected layer to fully extract shallow features. In module C, time features and features extracted by deep learning are concatenated at the fully connected layer to enhance feature diversity. The experimental results show that the accuracy, sensitivity and positive predictive value of the method reach 98.8%, 97.9% and 98.1%, respectively, and the proposed model can help doctors accurately detect different types of arrhythmia in routine electrocardiograms.

Extended Isolated Forest Anomaly Detection Algorithm Based on Simulated Annealing

WANG Shi-yu, XIAO Li-dong, YAN Xin-chun, YING Wen-hao

2023, 0(01): 88-94.

Extended Isolation Forest (EIF) effectively solves the problem that Isolation Forest (iForest) is not sensitive to local anomalies. However, EIF replaces the axis-parallel isolation condition with a hyperplane of random slope, which causes the model to lose part of its generalization ability and increases the time cost because of the large number of vector dot-product operations. In response, an Extended Isolation Forest based on Simulated Annealing (SA-EIF) is proposed. The algorithm calculates the accuracy value and the difference value of each isolation tree (iTree) from that tree's predictions on the dataset, and builds a fitness function from them. The iTrees with better detection performance are then selected by the simulated annealing algorithm to construct the ensemble learning model. K-fold cross-validation experiments on the ODDS anomaly detection datasets indicate that SA-EIF is sensitive to local anomalies, reduces the time cost by 20%~40% compared with EIF, and achieves recognition accuracy about 5%~10% higher than EIF.

Stock Volatility Prediction of LightGBM-GRU Model under Corrective Learning Strategy

SHI Zhi-wei, WU Zhi-feng, ZHANG Zhe

2023, 0(01): 95-102.

To improve the accuracy of traditional intelligent algorithms in time series prediction and their adaptability to engineering data problems, a corrective learning strategy is proposed. Volatility is widely used in the financial field, so predicting stock volatility is of great value. Because stock price time series are non-linear and non-stationary, predicting stock market volatility is a difficult problem in time series forecasting. In this paper, a simulation experiment is carried out with the corrective learning strategy, and a LightGBM-GRU model is designed. Using LightGBM as the base model and GRU as the corrector, we predict the next-10-minute volatility of 126 stocks from different industries over a 3-year period. RMSPE, MAE, MSE, RMSE and other indicators show that the corrective learning strategy can improve both the accuracy and the generalization ability even of a classical ensemble learning model that already performs well. This paper points out that, in the era of abundant algorithms and big data, the central tension for intelligent algorithms has become that between their limited versatility and the diversity of engineering problems. Corrective learning strategies can provide new ideas for data simulation.
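
The four error indicators named in the abstract have standard definitions, sketched below in plain Python. RMSPE is taken here as the root of the mean squared relative error, which requires non-zero true values; whether the paper uses exactly this variant is an assumption.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mse(y_true, y_pred))


def rmspe(y_true, y_pred):
    """Root mean squared percentage error; every true value must be non-zero."""
    ratios = [((t - p) / t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(ratios) / len(ratios))
```

Because RMSPE divides each error by the true value, it weights errors on low-volatility observations more heavily than RMSE does, which is one reason the two are often reported together.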

Flame Detection Algorithm Based on Improved YOLOV5

WANG Hong-yi, KONG Mei-mei, XU Rong-qing

2023, 0(01): 103-107.

To address the low average detection accuracy and the high missed detection rate for small-target flames in existing flame detection algorithms, an improved YOLOV5 flame detection algorithm is proposed. The algorithm replaces the CSP bottleneck module at the end of the YOLOV5 backbone network with a Transformer Encoder module, which enhances the network's ability to capture different local information and improves the average accuracy of flame detection. In addition, a CBAM attention module is added to the YOLOV5 network, which enhances the network's ability to extract image features, particularly for small-target flames, and reduces their missed detection rate. In experiments on the public datasets BoWFire and Bilkent, the improved YOLOV5 network achieves a higher average flame detection accuracy of 83.9% and a lower small-target flame missed detection rate of only 1.6%, at a detection speed of 34 frames/s. Compared with the original YOLOV5 network, the average accuracy is improved by 2.4 percentage points and the small-target flame missed detection rate is reduced by 4.1 percentage points. The improved YOLOV5 network can meet the real-time and precision requirements of flame detection.

Energy Consumption Prediction of Air-conditioning System in Underground Engineering Based on XGBoost Hyperparameter Optimization

FENG Zeng-xi, CHEN Hai-yue, WANG Tao, ZHAO Jin-tong, LI Shi-yan

2023, 0(01): 108-113.

To address the difficulty of accurately predicting the energy consumption of air-conditioning systems in underground engineering, an energy consumption prediction model is proposed, based on the eXtreme Gradient Boosting (XGBoost) algorithm optimized by the Beetle Antennae Search (BAS) algorithm. The algorithm improves the position update strategy of the conventional beetle algorithm by introducing a typical optimal solution guidance mechanism, and uses a linearly decreasing strategy to adjust the beetle's search step size, so as to reach the global optimum and improve convergence speed. The number of decision trees and the maximum tree depth in XGBoost, which have a large impact on the prediction accuracy of the model, are optimized by the improved BAS to obtain the optimal XGBoost parameter combination and improve prediction accuracy. Finally, taking the air-conditioning system of an underground security project as the research object, the validity of the proposed prediction model is verified.
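
A beetle antennae search with a linearly decreasing step can be sketched as follows. This is the basic BAS update with a linear step schedule only: the paper's optimal solution guidance mechanism is not reproduced, and `antenna_ratio`, the step bounds and the function names are hypothetical choices. In the paper's setting the objective would be a validation error as a function of XGBoost's number of trees and maximum depth; here a simple quadratic stands in for it.

```python
import math
import random


def linear_step(t, n_iter, start, end):
    """Step size that falls linearly from start (t = 0) to end (t = n_iter - 1)."""
    frac = t / max(n_iter - 1, 1)
    return start + (end - start) * frac


def beetle_antennae_search(objective, x0, n_iter=100, step_start=1.0,
                           step_end=0.01, antenna_ratio=2.0, seed=0):
    """Minimise objective with basic BAS; returns (best_x, best_value)."""
    rng = random.Random(seed)
    x = list(x0)
    best_x, best_f = list(x), objective(x)
    for t in range(n_iter):
        step = linear_step(t, n_iter, step_start, step_end)
        d = antenna_ratio * step  # antenna length shrinks with the step (assumed)
        # Random unit direction in which the two antennae point.
        b = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(c * c for c in b)) or 1.0
        b = [c / norm for c in b]
        left = [xi + d * bi for xi, bi in zip(x, b)]
        right = [xi - d * bi for xi, bi in zip(x, b)]
        diff = objective(left) - objective(right)
        sign = (diff > 0) - (diff < 0)
        # Move toward the antenna that sensed the lower objective value.
        x = [xi - step * bi * sign for xi, bi in zip(x, b)]
        fx = objective(x)
        if fx < best_f:
            best_x, best_f = list(x), fx
    return best_x, best_f
```

Large early steps favour exploration and small late steps favour refinement, which is the usual motivation for a decreasing schedule; integer hyperparameters such as tree count would need rounding before evaluation.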

Recognition of Safety Helmets Based on Contextual Information Fusion

XIAO Li-hua, XU Chang, SHANG Hao-liang, LUO Zhong-da, WU Xiao-zhong, MA Xiao-feng, JIANG Zhi-wen, CHEN Jun-jie

2023, 0(01): 114-119.

To prevent accidents caused by a lack of personal protection, this paper focuses on the intelligent identification of personnel wearing helmets in complex construction scenarios. To address small object recognition and the missing texture information of helmets, it enhances the representation learning ability of one-stage object detection methods by extracting and fusing contextual information. First, this paper proposes a local context perception module and a global context fusion module to improve the discriminability of learned features. The local context perception module combines the information of head and helmet to obtain discriminative feature representations. The global context fusion module merges the semantic information from high-level layers with shallow features, which helps the model obtain more abstract feature representations. Second, to address small object detection, this paper uses multiple object detection modules to recognize multi-scale objects. Experimental results on the helmet recognition dataset show that the two proposed modules improve the mAP by 11.46 percentage points and the AP of helmet detection by 10.55 percentage points. The proposed method offers high speed and high precision and provides an effective technical solution for smart construction sites.

Medical Knowledge Extraction Based on BERT and Non-autoregressive

YU Qing, MA Zhi-long, XU Chun

2023, 0(01): 120-126.

To avoid the error accumulation and entity overlap problems caused by pipeline entity relation extraction models, a joint extraction model based on BERT and non-autoregressive decoding is established for medical knowledge extraction. First, the sentence encoding is obtained with the BERT pre-trained language model. Second, a non-autoregressive method is proposed to achieve parallel decoding: it extracts the relation type, extracts entities according to the indices of the subject and object entities, and obtains the medical triples. Finally, the extracted triples are imported into the Neo4j graph database to realize knowledge visualization. The dataset is derived from manually labeled data in electronic medical records. The experimental results show that the F1 value, precision and recall of the joint learning model based on BERT and non-autoregressive decoding are 0.92, 0.93 and 0.92, respectively. Compared with the existing model, all three evaluation indicators are improved, indicating that the proposed method can effectively extract medical knowledge from electronic medical records.
