Computer and Modernization

Proportional Dominance Logistic Regression Optimized Voice Disorder Index Algorithm

HE Ruonan1, FAN Xiang2, CHEN Yi1, JIANG Yufei1, CAO Hui1

2024, 0(08): 1-4. doi:10.3969/j.issn.1006-2475.2024.08.001

Abstract ( 230 )

PDF (1000KB) ( 296 )

The voice disorder index is built from traditional acoustic feature parameters and lacks analysis and optimization of non-traditional ones. To address this, this paper proposes an algorithm that optimizes the voice disorder index using ordinal proportional-odds logistic regression. First, spectral flatness is extracted and correlated with the voice disorder index. Second, a new equation for the voice disorder index is obtained by applying proportional-odds logistic regression. Finally, the resulting DSI is compared with the traditional voice disorder index on samples taken from the database. The optimization broadens the range of DSI values. The algorithm is then applied to the classification of voice disorders. Experimental results show that it determines DSI values effectively and quickly yields good classification results.
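
As a rough illustration of the two ingredients named in this abstract, the sketch below computes spectral flatness with its standard definition (geometric mean over arithmetic mean of a power spectrum) and turns a linear score into ordered category probabilities with the proportional-odds form P(Y <= j) = sigmoid(theta_j - score). The function names, thresholds, and score are hypothetical values chosen for the example; the paper's fitted equation is not given here, so this is not the authors' formula.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def spectral_flatness(power_spectrum):
    """Geometric mean divided by arithmetic mean of strictly positive power values."""
    n = len(power_spectrum)
    log_mean = sum(math.log(p) for p in power_spectrum) / n
    return math.exp(log_mean) / (sum(power_spectrum) / n)


def category_probabilities(score, thresholds):
    """Proportional-odds model: P(Y <= j) = sigmoid(theta_j - score).

    `thresholds` must be sorted in ascending order (hypothetical values here).
    Returns one probability per ordered category.
    """
    cumulative = [sigmoid(t - score) for t in thresholds] + [1.0]
    probabilities = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probabilities.append(cumulative[j] - cumulative[j - 1])
    return probabilities


if __name__ == "__main__":
    print(spectral_flatness([1.0, 100.0]))          # tonal spectrum, well below 1
    print(category_probabilities(0.3, [-1.0, 0.5, 2.0]))
```

A flat (noise-like) spectrum gives a flatness of 1, while a peaky spectrum gives a value closer to 0. The same score is shared by all thresholds, which is the "proportional odds" assumption.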

STRL: Testing Algorithm Based on Reinforcement Learning

ZHAO Huarui

2024, 0(08): 5-10. doi:10.3969/j.issn.1006-2475.2024.08.002

Abstract ( 219 )

PDF (1805KB) ( 134 )

Reinforcement learning has become a research focus in machine learning in recent years because it generates data dynamically through interaction with the environment and does not require a large number of training samples. This paper proposes STRL, a new software testing framework based on reinforcement learning, which can effectively address the long running time and low state coverage of regression testing. STRL uses the reinforcement learning algorithm PPO to achieve efficient adaptive exploration. Experimental results show that STRL outperforms manual testing and automated script testing in both state coverage and testing time.

Regional Enterprise Association Visualization and Relationship Mining Based on Knowledge Graph

WANG Xianshun, XIONG Qingzhi, WAN Lei, LI Xiang, LIN Chongshan, JIN An’an

2024, 0(08): 11-16. doi:10.3969/j.issn.1006-2475.2024.08.003

Abstract ( 214 )

PDF (2609KB) ( 186 )

The network structures produced by existing regional enterprise association analysis are complex and difficult to comprehend, and regional enterprise associations change dynamically in time and space. To address these challenges in interpreting results, this paper adopts a knowledge graph-based model for regional enterprise association analysis. It extracts knowledge from diverse, heterogeneous data and stores regional enterprise relationships in the Neo4j graph database. For the force-directed layout, repulsive force optimization and node-edge processing are used to visualize enterprise relationships. Through in-depth exploration and analysis of inter-enterprise associations, the aim is to reveal cooperation and competition relationships among regional enterprises, providing decision support for government industrial policy formulation, enterprise investment attraction, and inter-enterprise collaboration. Experimental results demonstrate that the model accurately reveals inter-enterprise relationships, offering robust support for regional economic development.

Combining Knowledge Tracing and Graph Convolution for Knowledge Concept Recommendation

WANG Yan, CONG Xin, ZI Lingling

2024, 0(08): 17-23. doi:10.3969/j.issn.1006-2475.2024.08.004

Abstract ( 249 )

PDF (1848KB) ( 592 )

Technological innovation has driven the rapid growth of online education platforms, which provide a huge amount of educational resources, each type containing rich knowledge concepts. Current research mainly focuses on personalized course resource recommendation based on knowledge graphs, which is vulnerable to data sparsity and difficult to extend. To address the difficulty of matching learners’ learning status with learning resources, the model KT-GCN (Knowledge Tracing-Graph Convolution Network) is proposed. First, knowledge tracing is used to model the learner’s overall knowledge level and obtain the learner’s current learning status. Then a graph convolutional network encodes paths to obtain learning paths adapted to the learner, and paths are selected using the TransE method and multi-hop paths. Finally, predictive scoring produces a recommended list of the best-matching learning resources. To validate the model, comparison experiments against baseline models are conducted on multiple datasets, and ablation experiments verify the contribution of each component.
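
The TransE method mentioned in this abstract scores a (head, relation, tail) triple by how close head + relation lies to tail in embedding space. The sketch below shows that scoring rule and uses it to rank candidate tails; the embeddings, entity names, and the `rank_tails` helper are hypothetical illustrations, and how KT-GCN combines this score with multi-hop paths is not specified in the abstract.

```python
import math


def transe_score(head, relation, tail):
    """Negative L2 distance ||h + r - t||; a higher score means a more plausible triple."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_tails(head, relation, candidates):
    """Sort candidate tail names from most to least plausible.

    `candidates` maps a (hypothetical) entity name to its embedding vector.
    """
    return sorted(
        candidates,
        key=lambda name: transe_score(head, relation, candidates[name]),
        reverse=True,
    )


if __name__ == "__main__":
    learner = [1.0, 0.0]            # toy 2-D embeddings, not learned values
    needs = [0.0, 1.0]
    concepts = {"recursion": [1.0, 1.0], "sorting": [3.0, 3.0]}
    print(rank_tails(learner, needs, concepts))
```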

Improved Deciduous Tree Nest Detection Method Based on YOLOv5s

CHENG Meng, LI Hao

2024, 0(08): 24-29. doi:10.3969/j.issn.1006-2475.2024.08.005

Abstract ( 234 )

PDF (2245KB) ( 206 )

To address the difficulty of detecting small bird nest targets in complex backgrounds, an improved YOLOv5s network architecture named YOLOv5s-nest is proposed. YOLOv5s-nest incorporates several enhancements: a refined attention mechanism called Bi-CBAM is inserted into the Backbone to effectively enhance the network’s perception of small targets; the SDI structure is introduced into the Neck to integrate more hierarchical feature maps and higher-level semantic information; the InceptionNeXt structure is inserted into the Neck to improve the model’s performance and computational efficiency; and in the detection head, ordinary convolutions are replaced with PConv to efficiently extract spatial features and enhance detection efficiency. The experimental results show that the average precision of the improved model reached 89.1%, representing an increase of 6.8 percentage points compared to the original model.

Review of Fall Detection Technologies for Elderly

WANG Mengxi, LI Jun

2024, 0(08): 30-36. doi:10.3969/j.issn.1006-2475.2024.08.006

Abstract ( 870 )

PDF (2530KB) ( 644 )

With the rapid aging of China’s population, the proportion of elderly people living alone has increased significantly, and age-friendly facilities have received growing attention. At home, the elderly are prone to falls for reasons such as lack of care, aging, and sudden illness, and falls have become one of the main threats to their health. Therefore, monitoring, detecting, and predicting falls of the elderly in real time can help ensure their safety and reduce the life and health risks caused by accidental falls. Based on a comprehensive overview of research on human fall detection, we divide fall detection into two categories, vision-free technologies and computer vision based methods, according to the sensors used for data acquisition. We summarize the system composition of the different methods, explore the latest relevant research, and discuss the characteristics and practical applications of each method. In particular, we review the deep learning based schemes that have developed rapidly in recent years, analyzing and discussing their principles and research results in detail. We also introduce public benchmark datasets for human fall detection, including dataset size and storage format. Finally, we discuss prospects for research in this area and offer suggestions on several aspects.

Indoor Scene Recognition Method Based on Multi-scale Feature and Attention Module

YUE Youjun1, 2, ZHANG Yuankun1, ZHAO Hui1, 2, WANG Hongjun1, 2

2024, 0(08): 37-42. doi:10.3969/j.issn.1006-2475.2024.08.007

Abstract ( 140 )

PDF (1331KB) ( 128 )

Scene recognition plays an important role in visual information retrieval, segmentation, and image/video understanding. With the development of deep learning theory, convolutional neural networks (CNN) have greatly improved scene recognition by recognizing discriminative objects in images. To realize autonomous scene recognition for home service robots such as intelligent wheelchair beds, and considering that the limited computing resources and memory of mobile terminals or embedded devices lead to a low scene recognition rate when the network produces a single discriminative output, an indoor scene recognition method based on multi-scale feature extraction and an attention module is proposed. The method is based on MobileNetV2; it selects different branches from the network and extracts features at different scales. To focus on more discriminative features in the scene, the MRLA-Light attention module is added to the branches. Simulation results show that accuracy improves markedly: test accuracy on the MIT Indoor 67 and Scene 15 datasets reaches 86.3% and 94.3%, respectively, which is higher than that of networks of the same type.

Improved YOLOv8 Behavior Detection Algorithm for Intelligent Operation and Maintenance System

MA Yong, WANG Jun, ZHANG Zijian, ZHAO Yuyang, ZHANG Jing, ZHOU Ming

2024, 0(08): 43-48. doi:10.3969/j.issn.1006-2475.2024.08.008

Abstract ( 214 )

PDF (1917KB) ( 168 )

The intelligent operation and maintenance system has difficulty stably detecting the behavior of computer room staff when maintaining computer room security, which creates potential safety hazards. To address this problem, an improved YOLOv8 behavior detection algorithm is proposed. First, an adaptive spatial weight convolution module is designed to improve the original C2f module and enhance the network’s ability to acquire multi-scale features. Second, a multi-residual deformable convolution module is proposed to enhance the algorithm’s ability to learn irregular spatial features, and it is integrated into the neck network to further improve the detection accuracy of computer room staff behavior. Then, to address the lack of computer room image datasets, relevant images are collected from existing media and labeled, and transfer learning is used to further tune and optimize the model based on existing training weights. Finally, the Wise-IoU loss function is introduced to reduce the impact of low-quality examples in the self-built dataset on training results. Experimental results show that the improved algorithm achieves a test accuracy of 87.84% on the standard NTU RGB+D dataset, which is superior to the comparison algorithm; compared with the original YOLOv8 in real computer room tests, the accuracy and recall rate are improved by 13.24% and 10.47%, respectively, and the parameter count is reduced by 18.07%.

Semantic Segmentation of Video Frame Scene Based on Lightweight

SHI Xianwei1, FAN Xin2

2024, 0(08): 49-53. doi:10.3969/j.issn.1006-2475.2024.08.009

Abstract ( 183 )

PDF (1575KB) ( 217 )

Scene segmentation is crucial for computers to understand the road environment. Large semantic segmentation models based on deep learning often achieve excellent segmentation performance, but they cannot be flexibly deployed on edge devices because of their large parameter counts and computational cost. To solve this problem, this paper proposes E-SegNet, an efficient lightweight scene semantic segmentation model. First, the lightweight feature extraction model EfficientNet-B0 is used as the encoder to extract hierarchical features. Then, CPAM and CCAM modules based on the self-attention mechanism establish the dependency between each element in the deep features and the global central element in the spatial and channel dimensions. Finally, the deep and shallow features are fused and the final prediction is output. Experimental results on the video frame dataset Camseq01 show that E-SegNet achieves better segmentation performance with less than 1/10 of the parameters and about 1/4 of the computation of the DeeplabV3+ model, which demonstrates the effectiveness of the model and provides further options for deploying lightweight models on edge devices.

Pedestrian Tracking Algorithm Based on Improved YOLOv5s and DeepSORT

ZHENG Shangpo1, CHEN Defu1, LI Jianli2, LIN Guoxian2, WANG Xingping3

2024, 0(08): 54-58. doi:10.3969/j.issn.1006-2475.2024.08.010

Abstract ( 203 )

PDF (2222KB) ( 154 )

This study focuses on enhancing the detection accuracy of the YOLOv5s algorithm within the DeepSORT framework. The work integrates the Convolutional Block Attention Module (CBAM) attention mechanism into the YOLOv5s network structure, refines the bidirectional feature fusion network Bi-directional Feature Pyramid Network (BiFPN), and adopts Enhanced Intersection over Union (EIoU) as the bounding box loss function. Test results on the VOC 2007 pedestrian dataset indicate improvements over the original algorithm: precision increases by 0.3 percentage points, recall by 1.0 percentage points, and average precision by 0.3 percentage points. The algorithm is then evaluated on the MOT17 dataset, where it shows significant improvements in multiple metrics. MOTA improves by 1.8 percentage points, while IDF1, MT, and IDR improve by 2.9 percentage points, 1, and 2.7, respectively. Moreover, the number of false negatives (FN) decreases by 4373, and the number of mostly lost targets (ML) decreases by 11. Overall, these findings support the efficacy of the improved YOLOv5s algorithm as a detector that improves tracking precision in various scenarios.
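
For readers unfamiliar with the tracking metrics quoted in this abstract, the sketch below uses the standard definitions MOTA = 1 - (FN + FP + IDSW) / GT and IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN). It shows why a drop in false negatives raises MOTA when the other counts stay fixed. The counts in the example are made up for illustration and are not the paper's MOT17 figures.

```python
def mota(false_negatives, false_positives, id_switches, ground_truth_objects):
    """Multiple Object Tracking Accuracy over a whole sequence."""
    if ground_truth_objects <= 0:
        raise ValueError("ground_truth_objects must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / ground_truth_objects


def idf1(id_true_positives, id_false_positives, id_false_negatives):
    """Identity F1 score: harmonic mean of identity precision and identity recall."""
    denominator = 2 * id_true_positives + id_false_positives + id_false_negatives
    return 2 * id_true_positives / denominator if denominator else 0.0


if __name__ == "__main__":
    # Illustrative counts only: fewer missed targets gives a higher MOTA.
    print(mota(false_negatives=10, false_positives=5, id_switches=5, ground_truth_objects=100))
    print(mota(false_negatives=6, false_positives=5, id_switches=5, ground_truth_objects=100))
    print(idf1(id_true_positives=8, id_false_positives=2, id_false_negatives=2))
```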

Multi-object Tracking of UAV Based on Improved YOLOX and New Data Association Method

FU Shugang1, 2, 3

2024, 0(08): 59-66. doi:10.3969/j.issn.1006-2475.2024.08.011

Abstract ( 191 )

PDF (3054KB) ( 186 )

Multi-object tracking in UAV videos is a crucial computer vision task with extensive applications across various domains. To address the challenges of occlusions, small objects, and complex, varying backgrounds in UAV video scenes, an improved UAV multi-object tracking model is proposed. This paper improves the YOLOX network by integrating the Swin Transformer to enhance global information extraction and adding an extra detection head to boost the detection of small objects. Furthermore, this paper introduces the CBAM attention module to focus on informative features. In the data association stage, this paper adopts a new data association approach that retains all detection boxes, dividing them into high-scoring and low-scoring boxes based on their confidence scores. The first association is performed between high-scoring detection boxes and tracking trajectories, while the second association is performed between unmatched trajectories and low-scoring detection boxes. Experimental results on the public datasets VisDrone2021 and UAVDT demonstrate that the proposed method is comparatively superior and robust in UAV multi-object tracking scenarios.
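
The two-pass association described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: boxes are `(x1, y1, x2, y2)` tuples, matching is a simple greedy IoU assignment swapped in for whatever assignment solver the paper uses, and the `high_score` and `min_iou` thresholds are hypothetical values. Motion prediction and track management are omitted.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, track_ids, detections, det_ids, min_iou):
    """Greedily pair tracks with detections in order of decreasing IoU."""
    candidates = sorted(
        ((iou(tracks[t], detections[d][0]), t, d) for t in track_ids for d in det_ids),
        reverse=True,
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for overlap, t, d in candidates:
        if overlap < min_iou:
            break  # remaining pairs overlap even less
        if t in used_tracks or d in used_dets:
            continue
        matches[t] = d
        used_tracks.add(t)
        used_dets.add(d)
    return matches


def associate(tracks, detections, high_score=0.5, min_iou=0.3):
    """Two-pass association: high-score boxes first, then low-score boxes.

    `tracks` maps track id -> box; `detections` is a list of (box, score).
    Returns a dict of track id -> detection index.
    """
    high = [i for i, (_, score) in enumerate(detections) if score >= high_score]
    low = [i for i, (_, score) in enumerate(detections) if score < high_score]
    first = greedy_match(tracks, list(tracks), detections, high, min_iou)
    unmatched = [t for t in tracks if t not in first]
    second = greedy_match(tracks, unmatched, detections, low, min_iou)
    return {**first, **second}


if __name__ == "__main__":
    tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
    detections = [((1, 1, 11, 11), 0.9), ((21, 21, 31, 31), 0.2)]
    print(associate(tracks, detections))
```

Keeping the low-score boxes for a second pass lets a partly occluded target, whose detection confidence has dropped, stay attached to its existing trajectory instead of being discarded.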

Survey on Gesture Recognition and Interaction

WEI Jiakun, WANG Jiarun

2024, 0(08): 67-76. doi:10.3969/j.issn.1006-2475.2024.08.012

Abstract ( 656 )

PDF (1322KB) ( 1119 )

Gesture recognition and interaction is a cornerstone of frontier research in human-computer interaction and artificial intelligence. Its main goal is for computers and devices to work together to recognize and process gesture information and carry out the machine operations corresponding to each gesture. It integrates technologies such as motion capture, image processing, image classification, and multi-terminal collaborative interaction, and it underpins cutting-edge intelligent and human-computer interaction work in command and control systems, robot interaction, medical operations, and other areas. Research on gesture recognition and interaction has become increasingly mature, with a wide range of application fields and rich application scenarios. This paper reviews the development of gesture recognition and the related interaction technologies and hardware. First, it comprehensively surveys research progress in gesture recognition and interaction technology and categorizes the key steps of gesture recognition. Second, it classifies and elaborates on related work on the mainstream depth sensors currently used for 3D gesture interaction. Next, it analyzes and discusses the real sense recognition technology for 3D gesture recognition. Finally, it analyzes the deficiencies and urgent problems in gesture recognition and interaction technology, proposes feasible research ideas and methods such as integrating deep learning, pattern recognition, and other cutting-edge technologies, and offers predictions and prospects for future research directions, technology development, and application areas in this field.

Multi-scale Dual Attention Image Super-resolution Reconstruction Method

WANG Xin, YU Lei

2024, 0(08): 77-87. doi:10.3969/j.issn.1006-2475.2024.08.013

Abstract ( 131 )

PDF (6525KB) ( 76 )

Addressing the issues of limited feature information extraction and low feature utilization in existing image super-resolution reconstruction methods, we propose a Multi-Scale Dual Attention (MSDA) approach. Firstly, this method employs multi-scale feature extraction blocks to capture feature information from different scales of the input image. Subsequently, a dual attention mechanism is introduced to enable the network to rapidly focus on high-frequency regions in the images, while utilizing skip connections to mitigate feature information loss during deep network propagation. Lastly, a dropout layer is employed to balance the importance of feature channels, preventing network co-adaptation and enhancing the model’s generalization capability. Experimental results on public test datasets, including Set5, Set14, BSD100, Urban100, and Manga109, demonstrate that MSDA achieves superior performance by generating images with enhanced high-frequency information, enriched texture details, and a perceptual resemblance to the original high-resolution images.

An Image Generation Method of Classroom Expression Images

XU Xin’ai, LI Gang

2024, 0(08): 88-91. doi:10.3969/j.issn.1006-2475.2024.08.014

Abstract ( 193 )

PDF (1296KB) ( 254 )

To build a database of classroom expression images and compensate for the lack of classroom expression diversity under specific conditions, a method for generating classroom expression images based on deep convolutional generative adversarial networks (DCGAN) is proposed. First, classroom expression images are collected independently from offline teaching surveillance videos and online classroom videos, yielding a small, balanced image set with abundant sample features. Second, the training image set of classroom expressions is constructed through image denoising, image enhancement, and image mirroring. Third, a classroom expression image generation network based on the DCGAN model is constructed with preliminary parameter settings, its hyperparameters are continually optimized, and the classroom expression image dataset is generated. Finally, a face detection algorithm and the Inception Score (IS) evaluation index are used to detect and evaluate the generated classroom expression images and to verify their feasibility and effectiveness in the detection network. Experimental results show that the DCGAN-based method can generate more realistic classroom expression images, effectively improve the classroom facial expression dataset, and enhance the diversity of classroom expression images.

Automated Drawing Psychoanalysis Based on Image Classification

ZHAO Xiaoming, PAN Ting, LIU Weifeng

2024, 0(08): 92-97. doi:10.3969/j.issn.1006-2475.2024.08.015

Abstract ( 311 )

PDF (3358KB) ( 191 )

Drawing psychoanalysis is widely used in the discovery and treatment of psychological illness and mental disorders. The House-Tree-Person (HTP) test is the most representative drawing psychoanalysis method; it projects the individual’s psychological state through the houses, trees, and persons drawn by the patient. Compared with psychological health questionnaires, it has the advantages of being non-verbal, projective, and creative, and can systematically release the subconscious. At present, the HTP test is administered and evaluated by a therapist, which takes a long time in large-scale psychological screening, and the evaluation results are affected by the therapist’s experience and subjectivity. Therefore, an automated method is needed to improve the objectivity, reliability, and efficiency of the HTP test. This paper proposes an automated drawing screening method for the HTP test based on the relationship between psychological states and drawing features. The method extracts key features such as the size, position, and shadow of the drawing and combines them to build a psychological state classifier. It can effectively screen out negative drawings for further diagnosis and treatment. In addition, this paper collects HTP test drawings from college psychological counseling centers and builds an HTP dataset for experiments. Experimental results demonstrate the superiority and application value of this method.

Survey on Group-level Emotion Recognition in Images

GAO Shuaipeng, WANG Yifan

2024, 0(08): 98-107. doi:10.3969/j.issn.1006-2475.2024.08.016

Abstract ( 752 )

PDF (1434KB) ( 256 )

In recent years, image-based group emotion recognition, which aims to accurately determine the overall emotional state of groups in different scenes and of different sizes, has received widespread attention. Because group emotion recognition involves analyzing and fusing multiple group emotion cues in pictures, such as facial emotional features, scene features, and human posture features, the field is very challenging. At present, the field lacks review articles that organize existing research and guide further work. This article sorts and categorizes group emotion recognition models by the emotional cues they use and by their processing methods. It also reviews and analyzes the processing methods and characteristics of existing models, and summarizes models with different fusion methods along with the mainstream databases in the field. Finally, a brief summary and outlook on the development of this field are given.

Vehicle Detection in UAV Image Based on YOLOv5s

WANG Tao1, 2, HUANG Dan1, 2, LIU Chanyi1, 2, ZHU Tao1, 2

2024, 0(08): 108-113. doi:10.3969/j.issn.1006-2475.2024.08.017

Abstract ( 196 )

PDF (2747KB) ( 233 )

Complex backgrounds and large variations in target scale in vehicle images captured by unmanned aerial vehicles (UAV) make it difficult for existing neural network models to detect small targets during vehicle detection, which easily leads to false and missed detections of small targets. To solve this problem, an improved method based on the YOLOv5s neural network is proposed. First, the K-means++ algorithm is used to cluster the dataset and obtain better anchors. Second, the SPD-Conv small target detection module is incorporated to reduce the false and missed detection rates and thereby improve vehicle detection accuracy. Finally, the detection head is replaced by a decoupled head that separates the classification and regression tasks, further improving classification accuracy. On the VisDrone-2019-DET dataset, the mean average precision (mAP) of the improved network reaches 53.0%, 6.3 percentage points higher than the original YOLOv5s model; the method effectively reduces the probability of false and missed detections of small objects and enables more accurate vehicle detection.
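
SPD-Conv is generally described as a space-to-depth rearrangement followed by a non-strided convolution, so that downsampling moves pixels into channels instead of discarding them. The sketch below shows only the space-to-depth step on a single-channel feature map stored as nested lists; the function name and the toy 4×4 input are assumptions for illustration, and the convolution that follows in the real module is omitted.

```python
def space_to_depth(feature_map, scale=2):
    """Rearrange an H x W map into scale*scale sub-maps of size (H/scale) x (W/scale).

    Every input value is kept; it is moved into one of the output channels
    according to its row and column offset within each scale x scale block.
    """
    height, width = len(feature_map), len(feature_map[0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")
    channels = []
    for dy in range(scale):
        for dx in range(scale):
            channels.append([row[dx::scale] for row in feature_map[dy::scale]])
    return channels


if __name__ == "__main__":
    feature_map = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16],
    ]
    for channel in space_to_depth(feature_map):
        print(channel)
```

Compared with a stride-2 convolution or pooling, this keeps all 16 values of the toy input, which is the property that is argued to help with small targets.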

Chinese Paper Invoice Text Recognition Method with Character Blurring

LAI Kun

2024, 0(08): 114-119. doi:10.3969/j.issn.1006-2475.2024.08.018

Abstract ( 226 )

PDF (1686KB) ( 218 )

This paper addresses the problem of low OCR recognition performance caused by character blurring in paper invoices. A novel adaptive iterative visual semantic model is proposed to tackle this issue. The model consists of two modules. The recognition module uses ResNet as the encoder and Transformer as the decoder to make initial predictions on the blurred text. The correction module feeds the recognition module’s predictions into a bidirectional language model, which leverages contextual semantic information to refine characters, thereby performing initial text correction. The results are then input to a discriminator, which outputs them directly if they pass or returns them to the language model for further refinement if they fail, effectively improving recognition accuracy. Experimental results demonstrate that the proposed model outperforms the current state-of-the-art Chinese recognition model ch_PP-OCRv3 by 3.39 percentage points in recognition accuracy and achieves an average improvement of 6.81 percentage points over other models. Moreover, the model exhibits excellent generalization performance on public datasets such as IC15, IIIT5K, and IC03-Word, validating its effectiveness.

News Long Text Classification Model Based on Improved TF-IDF and AGLCNN

ZHOU Xianxi, MU Li

2024, 0(08): 120-126. doi:10.3969/j.issn.1006-2475.2024.08.019

Abstract ( 263 )

PDF (1209KB) ( 179 )

Long news text classification is an important task in natural language processing, but traditional text representation methods suffer from problems such as sparse features and insufficient semantics. In addition, long news texts contain a large amount of redundant information and may involve other topics, all of which can lead to incomplete text feature extraction. Therefore, this article proposes a long news text classification model based on an improved TF-IDF algorithm and AGLCNN. The model first improves the TF-IDF algorithm by using the distribution and position information of feature items between and within classes, and combines it with Word2Vec word vectors for text representation. An attention mechanism highlights keyword information, and the result is input into a Bi-LSTM to capture contextual features. A CNN then extracts the salient features of news topics. Because long news texts may contain sentences about other topics, a gating mechanism is introduced to fuse the output features of the Bi-LSTM and the CNN into the final text feature representation. Finally, the feature vectors are input into the Softmax layer for news classification. Comparative experiments on the THUCNews dataset and the Sohu News dataset show that the proposed model achieves recall rates of 0.985 and 0.976, respectively, which are superior to those of other classification models.
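
For context, the baseline that this abstract sets out to improve can be sketched as plain TF-IDF: term frequency within a document multiplied by log(N / document frequency). The code below is that textbook baseline only; the class-distribution and position weighting of the paper's improved variant is not specified in the abstract and is not reproduced. The function name and toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document (unsmoothed TF-IDF)."""
    n_docs = len(documents)
    # Number of documents in which each term appears at least once.
    document_frequency = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / document_frequency[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    docs = [["market", "stocks"], ["market", "football"]]
    for row in tf_idf(docs):
        print(row)
```

A term that appears in every document, such as "market" in the toy input, gets a weight of zero, which illustrates why the plain formula ignores how terms are distributed across classes.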
