Computer and Modernization

Face Anti-spoofing Based on Domain Synthesis and Contrastive Learning

ZHANG Shanlu1, ZHANG Wei2

2025, 0(07): 1-8. doi:10.3969/j.issn.1006-2475.2025.07.001

Abstract: Face anti-spoofing (FAS) is an important means of securing face recognition systems. Existing FAS methods generalize poorly in cross-dataset testing, which leads to drastic performance degradation. To address this, this paper proposes a FAS method based on domain synthesis and contrastive learning. The method contains two novel modules: a domain synthesis module and a contrastive learning module. The former randomly swaps local regions of face images from different source domains at the image level to generate pseudo-source-domain samples. The face images are then reconstructed at the feature level by exchanging the local features at the positions that were swapped at the image level. By maximizing the similarity between the features of the image-level reconstructed samples and the feature-level reconstructed features, this module keeps the generated pseudo-source domain stable while expanding the number of samples and attack types, which provides a solid data foundation for learning a generalized feature space. The latter minimizes the intra-class distance of real-face representations and maximizes the inter-class distance between real and spoof faces. It also maximizes the inter-class distance between real faces and the reconstructed samples. This promotes intra-class compactness of real faces and helps the method learn a good decision boundary. The method is trained and tested on four public face liveness detection datasets, CASIA-FASD, Replay-Attack, MSU-MFSD, and OULU-NPU, and the experimental results show that it generalizes well in cross-dataset testing.
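
The image-level step of the domain synthesis module can be illustrated with a minimal sketch. It assumes two equally sized grayscale images stored as nested lists and a single square patch; the function names, the patch shape, and the sampling rule are illustrative assumptions, not details taken from the paper.

```python
import random

def swap_patch(img_a, img_b, top, left, size):
    """Return copies of two equally sized images with one square patch exchanged."""
    out_a = [row[:] for row in img_a]
    out_b = [row[:] for row in img_b]
    for r in range(top, top + size):
        for c in range(left, left + size):
            out_a[r][c], out_b[r][c] = img_b[r][c], img_a[r][c]
    return out_a, out_b

def random_swap(img_a, img_b, size, rng):
    """Pick a random patch position (hypothetical sampling rule) and swap it."""
    height, width = len(img_a), len(img_a[0])
    top = rng.randrange(height - size + 1)
    left = rng.randrange(width - size + 1)
    return swap_patch(img_a, img_b, top, left, size)

# Two toy "domains": an all-zero image and an all-one image.
a = [[0] * 4 for _ in range(4)]
b = [[1] * 4 for _ in range(4)]
mixed_a, mixed_b = random_swap(a, b, size=2, rng=random.Random(0))
```

Each output keeps most of its own source image and carries one region from the other, which is the sense in which the mixed images act as pseudo-source-domain samples.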

Insulator Defect Detection by UAV Based on Lightweight YOLOv8

JIANG Zhiwei1, FU Xiaojin1, CHEN Wenbin1, JIANG Yichen2

2025, 0(07): 9-14. doi:10.3969/j.issn.1006-2475.2025.07.002

Abstract: Complex backgrounds, varying scales, and small targets cause false and missed detections of insulator strings and of insulator self-explosion, damage, and flashover defects, which lowers detection accuracy. To address this, the CPCW-YOLOv8 algorithm is proposed. First, a lightweight CBAM attention mechanism is introduced into the backbone so that the model extracts features of insulator strings and insulator defects from complex backgrounds more effectively along both the channel and spatial dimensions. Second, a small-target detection layer is added, and multi-scale fusion is used to strengthen the network's extraction of shallow semantic information, capturing more detail of insulator defects and improving small-target detection accuracy. Third, to make the model lighter, a lightweight C2f-Faster module is constructed. Finally, the original CIoU loss is replaced with WIoU to accelerate convergence and improve detection accuracy. Experimental results show that, compared with the original model, the number of parameters of CPCW-YOLOv8 is reduced by 12.6 percentage points and the average accuracy is increased by 5.2 percentage points. The proposed network provides a more efficient method for detecting insulator defects in power systems.
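
CIoU and WIoU are both bounding-box regression losses built on the intersection-over-union (IoU) of a predicted box and a ground-truth box. The sketch below computes only that shared IoU term; the extra penalty and weighting terms that distinguish CIoU from WIoU are deliberately omitted, and the `(x1, y1, x2, y2)` box format is an assumption.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

A loss of the form `1 - iou(...)` is the usual starting point that such variants extend.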

ADHD Classification Based on ConvNeXt and Attention

WANG Tao1, WU Qian1, 2

2025, 0(07): 15-20. doi:10.3969/j.issn.1006-2475.2025.07.003

Abstract: Attention deficit hyperactivity disorder (ADHD) is a common behavioral disorder in children. Because ADHD has no clear etiology and the brain structures of ADHD patients and normal children differ only subtly, it is difficult for clinicians to make an effective diagnosis. For such disorders, a convolutional neural network based on ConvNeXt and attention mechanisms is proposed to distinguish ADHD patients from normal children. First, the sMRI data are preprocessed. Second, a pre-trained model is loaded, deep features are extracted by a ConvNeXt network containing multidimensional collaborative attention, and the ConvNeXt output layer is rebuilt to produce the final classification. Validated on the ADHD-200 dataset, the method reaches a classification accuracy of 97.3%, which is better than current mainstream methods, and the model's heat map highlights the prefrontal lobe and other brain regions related to the disorder, so it can serve as an effective and convenient auxiliary diagnostic method for ADHD.

Application of Multimodal Large Language Models in Diagnosis of Pigmented Skin Lesions

SUN Kaijie, HU Jili

2025, 0(07): 21-27. doi:10.3969/j.issn.1006-2475.2025.07.004

Abstract: Accurate diagnosis of pigmented skin lesions is a complex and challenging task. In contemporary medical practice, intelligent diagnostic tools can significantly enhance the precision of both diagnosis and treatment. This study proposes a multimodal large language model, SkinCPM-V, to address diagnostic challenges associated with textural patterns, hair artifacts, and vascular structures in dermoscopic images. SkinCPM-V is built on MiniCPM-V and customized for the characteristics of skin lesions. It is trained on publicly available dermatological datasets from Kaggle, using the LoRA technique for parameter-efficient fine-tuning. Evaluations show that SkinCPM-V achieves BLEU-4, ROUGE-1, ROUGE-2, and ROUGE-L scores of 0.8880, 0.9380, 0.9104, and 0.9349, respectively, indicating close alignment between generated outputs and reference texts. The model's effectiveness in real-world diagnostic tasks is further supported by an F1 score of 0.9067, precision of 0.9028, and recall of 0.9444. Compared with other multimodal large language models, SkinCPM-V achieves better results on all evaluation metrics. This highlights its ability to generate high-quality textual descriptions and its potential for integration into clinical workflows. The findings support the utility of SkinCPM-V in diagnosing pigmented skin lesions and point toward broader applications of multimodal large language models in medicine.
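
The LoRA technique mentioned above keeps a pre-trained weight matrix W frozen and learns a low-rank update, so the effective weight is W + (alpha / r) * B * A. The sketch below shows only that arithmetic on toy matrices; the matrix sizes, `alpha`, and `rank` values are illustrative assumptions and are not the settings used for SkinCPM-V.

```python
def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w, lora_b, lora_a, alpha, rank):
    """Return W + (alpha / rank) * B @ A without modifying W."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]

w = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2 x 2 weight
lora_b = [[1.0], [2.0]]       # trainable 2 x 1 factor (rank 1)
lora_a = [[3.0, 4.0]]         # trainable 1 x 2 factor
w_eff = lora_effective_weight(w, lora_b, lora_a, alpha=2.0, rank=1)
```

Only the two thin factors are trained, which is why the number of trainable parameters stays small compared with updating the full matrix.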

Android Malicious Application Detection Based on RA-CNN and Residual Network

HUA Man, LIU Xiaoliang

2025, 0(07): 28-32. doi:10.3969/j.issn.1006-2475.2025.07.005

Abstract: In recent years, Android malware detection methods based on bytecode images and deep learning have become increasingly popular, but they suffer from limited feature extraction and sensitivity to noisy data. To solve these problems, this paper proposes a detection method that fuses a Residual Network (ResNet) with a Recurrent Attention Convolutional Neural Network (RA-CNN). Three bytecode files, DEX, XML, and ARSC, are extracted from each software sample and mapped to an RGB image, and a convolutional neural network with embedded residual structures is then used for feature abstraction and extraction. Next, the Attention Proposal Network (APN) uses the feature map as a reference to iteratively generate local region attention from coarse to fine. The finer-scale network recurrently takes the magnified region of interest from the previous scale as its input, and classification is achieved through multi-scale learning. Experiments show that, compared with similar bytecode-image-based methods, the proposed method improves on some indicators, with accuracy reaching 98.28%.
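
One plausible way to map the three files to an RGB image is to treat each file as one color channel. The sketch below does this with zero-padding; the channel assignment (DEX to red, XML to green, ARSC to blue), the padding rule, and the fixed row width are assumptions, since the abstract does not specify the mapping.

```python
def bytes_to_rgb(dex, xml, arsc, width):
    """Map three byte strings to rows of (R, G, B) pixels.

    Shorter inputs are zero-padded so all channels share one length,
    and the last row is padded to the full width.
    """
    longest = max(len(dex), len(xml), len(arsc))
    padded_len = -(-longest // width) * width  # round up to a multiple of width
    channels = [data.ljust(padded_len, b"\x00") for data in (dex, xml, arsc)]
    pixels = list(zip(*channels))  # iterating bytes yields ints in 0..255
    return [pixels[i:i + width] for i in range(0, padded_len, width)]

image = bytes_to_rgb(b"\x01\x02\x03", b"\x0a", b"", width=2)
```

In practice the byte strings would come from the unpacked APK, and the resulting rows could be handed to an image library before being fed to the network.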

Double Privacy Protection Algorithm for User Location Based on Multi-anonymizer Architecture

HE Lili1, 2, ZHANG Chenglin1, 2, ZHANG Lei1, 2, CAO Mingzeng1, 2

2025, 0(07): 33-42. doi:10.3969/j.issn.1006-2475.2025.07.006

Abstract: With the proliferation and development of Location-Based Services (LBS), protecting the privacy of user location data has become an urgent issue. In existing anonymity-based privacy protection methods, the semantic information in anonymous locations and the query content can undermine anonymity. To address this, we propose a Dual Privacy Protection Algorithm Based on Multi-Anonymizer Architecture (MAA-DPPA), which uses a verifiable secret sharing algorithm. Unlike existing algorithms, MAA-DPPA enhances location privacy by combining location anonymity with encrypted queries. First, a query encryption algorithm based on verifiable secret sharing and multiple anonymizers is introduced to enhance query security. Second, an anonymization method is designed to improve semantic location privacy by satisfying individual semantic diversity and replacing sensitive semantic locations. Experimental results show that the algorithm achieves the highest level of privacy. Under multiple conditions, the average improvement of MAA-DPPA over algorithms that construct anonymous sets based on road-network semantic diversity ranges from 38.3% to 59.8%, and its improvement over algorithms based on semantic and trade-off perception ranges from 51.1% to 76.5%. MAA-DPPA significantly enhances privacy protection while improving algorithm efficiency, which verifies its effectiveness in safeguarding user location privacy.
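
The idea of splitting a query across several anonymizers can be sketched with plain Shamir secret sharing. This is a simplified stand-in: it omits the verification step that makes a scheme verifiable, and the field size, share count, and threshold are illustrative assumptions rather than the paper's parameters.

```python
import random

PRIME = 2**61 - 1  # a Mersenne prime used as the field modulus (illustrative choice)

def split_secret(secret, n_shares, threshold, rng):
    """Split an integer secret into n_shares points on a random polynomial."""
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_secret(shares):
    """Lagrange interpolation at x = 0 from at least `threshold` shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

# A seeded generator keeps the example reproducible; real use needs a CSPRNG.
shares = split_secret(123456789, n_shares=5, threshold=3, rng=random.Random(1))
recovered = recover_secret(shares[1:4])
```

Each share could go to a different anonymizer, so no single anonymizer holding fewer than `threshold` shares learns the secret.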

A Survey of Network Outage Detection

KUANG Ye1, ZHOU Mo2, LIU Ceyue1

2025, 0(07): 43-54. doi:10.3969/j.issn.1006-2475.2025.07.007

Abstract: With the popularization and development of the Internet, network outages have become a pressing problem for academia and industry and a hot research topic. Large-scale network outages seriously affect the operation of network services and infrastructure, and a deep understanding of them helps strengthen control over network conditions. Based on recent research progress in network outage detection, this survey first gives the definition and causes of network outages and describes the security threats they pose. Second, it summarizes network outage detection technologies and presents their advantages and disadvantages. Third, it summarizes the existing data collection platforms used for network outage detection. Finally, it presents research prospects for network outage detection, providing a reference for follow-up research on network security.

Long-and Short-Term Air Pollutant Concentration Forecasting Based on Optimized Transformer

CAI Bohan, LIU Jun

2025, 0(07): 55-62. doi:10.3969/j.issn.1006-2475.2025.07.008

Abstract: To address low prediction accuracy, short forecast horizons, and difficulty in capturing spatiotemporal features in air pollutant concentration prediction, a Transformer architecture based on conditional mask self-attention, named CondMSA-Transformer, is proposed. The paper improves the multi-head self-attention mechanism of the Transformer by introducing the concept of sparse attention. By integrating critical environmental factors such as wind speed and wind direction, it masks unnecessary station data and focuses on extracting the most valuable information in the spatiotemporal dimension. This strategy avoids interference from the weak signals of remote stations, reduces computational complexity, and enhances the model's ability to capture core features. Experiments on two real datasets from Beijing show that CondMSA-Transformer performs robustly in both short-term and long-term prediction, improving the mean absolute error (MAE) of PM2.5 prediction by up to 14.67% compared with other existing methods. This indicates its application potential in air quality prediction.
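
Masking in attention is commonly done by excluding masked entries from the softmax so that they receive zero weight. The sketch below pairs that mechanism with a hypothetical masking rule that keeps only stations the wind blows from toward the target; the rule, the 2D coordinates, and the function names are assumptions, not the conditions used in CondMSA-Transformer.

```python
import math

def upwind_mask(target, stations, wind):
    """Hypothetical rule: keep a station if the wind vector points from it toward the target."""
    keep = []
    for sx, sy in stations:
        dx, dy = target[0] - sx, target[1] - sy
        keep.append(dx * wind[0] + dy * wind[1] > 0)
    return keep

def masked_attention_weights(scores, keep):
    """Softmax over the kept scores; masked entries get weight 0."""
    kept = [s for s, k in zip(scores, keep) if k]
    if not kept:
        raise ValueError("at least one entry must be kept")
    peak = max(kept)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) if k else 0.0 for s, k in zip(scores, keep)]
    total = sum(exps)
    return [e / total for e in exps]

stations = [(-1.0, 0.0), (1.0, 0.0), (0.0, -1.0)]
keep = upwind_mask(target=(0.0, 0.0), stations=stations, wind=(1.0, 0.0))
weights = masked_attention_weights([0.5, 2.0, 1.0], keep)
```

Because masked stations contribute nothing, their scores never need to be computed in a sparse implementation, which is where the reduction in computation comes from.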

3D Human Motion Similarity Estimation from Different Perspectives

LI Zihe, WANG Yiding

2025, 0(07): 63-68. doi:10.3969/j.issn.1006-2475.2025.07.009

Abstract: Online fitness and dance instructional videos are abundant, but students who film themselves often struggle to compare their movements with the instructor's because camera angles and scales differ, which hinders accurate comparison of movement similarity. To address this problem, this paper builds on existing 3D human pose estimation methods and proposes a motion similarity evaluation algorithm for videos filmed from different angles with a monocular camera. For two videos of human actions from different perspectives, the method first extracts 2D human key points using the YOLOv8pose network, then lifts them to 3D key points using the GraphMLP network. It computes the Euclidean distance matrix between the two 3D key point sequences and uses the dynamic time warping (DTW) algorithm to identify corresponding frames between the two actions. The 3D key points of corresponding frames are then rotated and scaled so that action sequences from different perspectives are aligned. Finally, the cosine similarity of skeletal vectors is used as the similarity metric. Experiments on mocap animations from different perspectives demonstrate the effectiveness of the proposed method.
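
The frame-matching and scoring steps can be sketched as follows. The sketch assumes each frame is a list of `(x, y, z)` key points, uses textbook DTW with unit steps, and leaves out the rotation and scaling alignment; all function names are illustrative.

```python
import math

def frame_distance(f1, f2):
    """Euclidean distance between two frames of 3D key points."""
    return math.sqrt(sum((a - b) ** 2 for p, q in zip(f1, f2) for a, b in zip(p, q)))

def dtw_path(cost):
    """Return the minimum-cost warping path through a cost matrix."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = cost[i - 1][j - 1] + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    path, i, j = [], n, m
    while i > 0 and j > 0:  # backtrack from the end to the start
        path.append((i - 1, j - 1))
        _, i, j = min((acc[i - 1][j - 1], i - 1, j - 1),
                      (acc[i - 1][j], i - 1, j),
                      (acc[i][j - 1], i, j - 1))
    return path[::-1]

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

seq_a = [[(0, 0, 0)], [(1, 0, 0)], [(2, 0, 0)]]
seq_b = [[(0, 0, 0)], [(0, 0, 0)], [(1, 0, 0)], [(2, 0, 0)]]
cost = [[frame_distance(fa, fb) for fb in seq_b] for fa in seq_a]
path = dtw_path(cost)
```

Here the first frame of `seq_a` is matched to both opening frames of `seq_b`, which is how DTW absorbs a difference in timing before the per-frame skeletal vectors are compared with `cosine_similarity`.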

Speech Cloning Method Based on Self-attention Mechanism Speaker Encoder And SA-Decoder

JIAO Leyan, ZHU Xinjuan

2025, 0(07): 69-76. doi:10.3969/j.issn.1006-2475.2025.07.010

Abstract: The FreeVC model performs well in speech cloning. However, speech sequences carry complex, varying features and information such as timbre and style, and the Speaker Encoder module in FreeVC uses only a single LSTM network, which struggles to extract and represent speaker information accurately. This degrades the model's performance on speech sequences and affects the quality and accuracy of voice conversion. Moreover, FreeVC uses a traditional decoder whose upsampling (deconvolution) operation can lose detail, blurring articulation details in the reconstructed audio and producing audio artifacts. To address these issues, this paper proposes FreeVC-SA, a speech cloning method based on a self-attention speaker encoder and an SA-Decoder. The method takes the speaker's Mel spectrogram as input and adds a self-attention mechanism to the LSTM network to help the model better capture long-distance dependencies and more accurately extract features such as the speaker's tone and style. Using the SA-Decoder addresses the limitation of local receptive fields, making the cloned speech more realistic and clearer. Experimental results show that, compared with all baseline models, FreeVC-SA significantly improves naturalness similarity and emotional similarity and significantly reduces word error rate and character error rate.

Power Load Forecasting of Bi-LSTM Based on Improving Whale Algorithm

RAO Hongyu1, CHEN Xin2, CHEN Sheng3, XIA Tian3, SHEN Li4, LUY Guangqiang5

2025, 0(07): 77-82. doi:10.3969/j.issn.1006-2475.2025.07.011

Abstract: Accurate power load prediction is key to the stable operation of the power system after new energy sources are connected to the grid. A power load forecasting model based on a Bi-LSTM neural network and an attention mechanism is proposed. Because optimal hyperparameters are difficult to select for neural networks, the whale optimization algorithm is used to optimize them. To address the imbalance between global exploration and local exploitation in the whale optimization algorithm, a piecewise nonlinear adjustment of the convergence factor is proposed. To address insufficient exploitation in the late stage, adaptive weights are combined with random differential mutation for hyperparameter optimization. Simulations verify the effectiveness of the proposed method for hyperparameter design and the accuracy and effectiveness of power load forecasting based on the improved whale algorithm.
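
In the standard whale optimization algorithm the convergence factor a falls linearly from 2 to 0 over the iterations, and larger values favor global search. The sketch below contrasts that schedule with one hypothetical piecewise nonlinear schedule; the quadratic pieces are an assumption chosen for illustration and are not the formula proposed in the paper.

```python
def linear_a(t, t_max):
    """Standard schedule: a decreases linearly from 2 to 0."""
    return 2.0 * (1.0 - t / t_max)

def piecewise_a(t, t_max):
    """Hypothetical schedule: slow decay in the first half, fast decay in the second."""
    x = t / t_max
    if x <= 0.5:
        return 2.0 - 4.0 * x * x        # falls from 2 to 1, staying above the linear schedule
    return 4.0 * (1.0 - x) * (1.0 - x)  # falls from 1 to 0, staying below the linear schedule

schedule = [(t, linear_a(t, 100), piecewise_a(t, 100)) for t in (0, 25, 50, 75, 100)]
```

With this shape, a stays large for longer early in the run and shrinks faster late in the run, which is one way to shift effort from global search toward local refinement.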

Multi-modal Emotion Recognition Based on Text Guidance

ZHAI Junlong, GU Lin

2025, 0(07): 83-89. doi:10.3969/j.issn.1006-2475.2025.07.012

Abstract: Multi-modal emotion recognition is widely used in artificial intelligence, safe driving, and other fields. Multi-modal information provides rich modal representations, which makes emotion recognition more accurate. Text is a modality that expresses rich and accurate information. This paper proposes a multi-modal sentiment analysis model guided by multi-scale text features: text features of different scales are aggregated to optimize the features of the other modalities. A text-guided multi-modal aggregation module, AGG, is designed, and contrastive learning is introduced into the loss function to optimize the whole network. The experimental metrics show that the model performs well in multi-modal emotion recognition, and comparison and ablation experiments further support the rationality and validity of the design.

Complex Fault Diagnosis Method of Charging Facilities Based on Bi-LSTM Residual Network

DENG Chao1, 2, YANG Fengkun1, 2, LI Jun3, CHEN Liangliang1, 2, GUO Zhichong4

2025, 0(07): 90-96. doi:10.3969/j.issn.1006-2475.2025.07.013

Abstract: With the rapid development of electric vehicles, the reliability and safety of charging facilities have become key factors in ensuring user safety and improving user experience. Traditional fault diagnosis methods are unsuitable for complex fault diagnosis because of insufficient data annotation. To address this limitation, this paper proposes a complex fault diagnosis method for charging facilities based on a Bi-LSTM (Bidirectional Long Short-Term Memory) residual network, working from the maintenance work orders and historical operation work orders of charging facilities. The method extracts effective features from the original orders and work orders, fuses the time-series features from the Bi-LSTM with the original features through a residual network, and feeds the fused features into a machine learning model to diagnose complex faults of charging facilities. Experimental results show that the method clearly improves complex fault diagnosis compared with the machine learning model.

HGAT: Multivariate Time Series Anomaly Detection Based on Hybrid Graph Attention Network

WEI Qingsong1, WANG Xiaojun2, WEI Yuan1, ZENG Shangqi1, FANG Yuan1

2025, 0(07): 97-105. doi:10.3969/j.issn.1006-2475.2025.07.014

Abstract: In many complex systems, devices are monitored by network sensors and actuators, which generate large amounts of multivariate time series data. Accurately capturing the intricate relationships among sensors, and detecting and explaining anomalies that deviate from these relationships, has become a critical challenge. To make full use of spatial-temporal dependencies and improve the interpretability of anomalies, this paper proposes a multivariate time series anomaly detection method based on a Hybrid Graph Attention Network (HGAT). First, HGAT constructs a Feature Graph Attention Network (F-GAT) and a Temporal Graph Attention Network (T-GAT) using embedding vector similarity. Next, the two graph attention layers, F-GAT and T-GAT, are applied in parallel to learn the nonlinear dependencies along the variable and temporal dimensions. Finally, HGAT jointly optimizes a prediction-based model and a reconstruction-based model, and uses the anomaly scores from this joint optimization to identify anomalous instances, which improves the explainability of anomaly detection. Evaluations on the SWaT, WADI, and SMD datasets show that HGAT outperforms the state-of-the-art baseline, the GDN method, improving F1 by 2.73, 3.39, and 0.9 percentage points, respectively.
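
Building a graph from embedding vector similarity can be sketched by linking each node to its most similar peers. The sketch below uses cosine similarity and a top-k rule; both the similarity measure and the value of `k` are assumptions, since the abstract does not state how HGAT selects edges.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def top_k_neighbors(embeddings, k):
    """Map each node index to the indices of its k most similar other nodes."""
    neighbors = {}
    for i, e_i in enumerate(embeddings):
        sims = [(cosine(e_i, e_j), j) for j, e_j in enumerate(embeddings) if j != i]
        sims.sort(key=lambda s: (-s[0], s[1]))  # highest similarity first, ties by index
        neighbors[i] = [j for _, j in sims[:k]]
    return neighbors

# Four toy sensor embeddings forming two obvious pairs.
embeddings = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
graph = top_k_neighbors(embeddings, k=1)
```

The resulting adjacency lists define which nodes a graph attention layer is allowed to attend to.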

Lightweight Flame Detection Algorithm Based on Improved RT-DETR

WU Dong, FAN Yongsheng, SAN Binbin

2025, 0(07): 106-111. doi:10.3969/j.issn.1006-2475.2025.07.015

Abstract: To improve flame detection accuracy while making the model lighter, a lightweight RT-DETR flame detection algorithm is proposed. First, EfficientViT is selected as the feature extraction network to reduce model computation and complexity. Second, an efficient hybrid encoder is designed to reduce the number of parameters and the amount of computation while maintaining detection accuracy. The encoder consists of the LPE-AIFI module, which processes deep features, and the CGAFusion module, which improves detection capability through multi-scale feature fusion. Finally, the bounding box regression loss function MDPIoU is introduced to further improve accuracy. Experimental results show that, compared with the original model, the improved model reduces floating-point operations (FLOPs) by 48.8% and the number of parameters by 43.4%. With this lighter model, mAP@0.5 reaches 88.6% and mAP@0.5:0.95 reaches 67.4%, which are 2.2 and 2.7 percentage points higher than the baseline model, respectively.

Bert-BiGRU-CRF with Self-attention Fusion for Text Causal Relationship Extraction

GAO Ningbo, ZHANG Xiaobin

2025, 0(07): 112-118. doi:10.3969/j.issn.1006-2475.2025.07.016

Abstract: To address overlapping relations and long-distance dependencies in causal relation extraction from natural language text, this paper introduces the tag2triplet algorithm to handle multiple causal triplets within the same sentence and embedded causality. It combines a causal labeling scheme with deep learning architectures to minimize feature engineering while modeling causal relationships effectively. In addition, the paper integrates a self-attention mechanism into the Bert-BiGRU-CRF model to capture long-distance dependencies between causal relations, allowing information to flow freely within the network and thereby extracting causal relationships more accurately. To validate the approach, the model is compared with the widely used BiLSTM-softmax, BiLSTM-CRF, and Flair + CLSTM-BiLSTM-CRF models on the SemEval-2010 Task 8 dataset. The results show that the proposed model achieves a higher F1 score, 83.44%.

Feature Weighted Support Vector Machine Based on HSIC Lasso

LAI Zhiyong, WANG Tinghua, ZHANG Xin

2025, 0(07): 119-126. doi:10.3969/j.issn.1006-2475.2025.07.017

Abstract: The support vector machine (SVM) has been successfully applied to data classification by using kernel functions to transform the original low-dimensional problem into a high-dimensional linear problem. However, the classical SVM algorithm treats all features equally, ignoring the fact that different features contribute differently to the model output, so the construction of the kernel space may not be entirely reasonable. This paper introduces a feature-weighted SVM algorithm based on the Hilbert-Schmidt independence criterion (HSIC) Lasso, named HSIC Lasso-FWSVM. The algorithm uses the HSIC to measure the relationship between two random variables, computing the correlation between features and the correlation between features and labels within the feature space. These correlations are used as weights for the corresponding features. Next, the algorithm applies Lasso regression with sparsity constraints to re-evaluate the feature weights, shrinking the weights of irrelevant features to zero. Finally, the resulting feature weights are applied in the SVM kernel computation, which avoids interference from weakly correlated or uncorrelated features. The proposed algorithm was evaluated on nine UCI datasets and compared with the classical SVM and several recent feature-weighted SVM algorithms. The results show that HSIC Lasso-FWSVM exhibits superior generalization and robustness.
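
The final step, applying feature weights inside the kernel, can be sketched with a weighted RBF kernel. The form below, where each squared difference is multiplied by its feature weight, is one common choice and is assumed here; the weights are hand-picked for illustration and are not produced by HSIC Lasso.

```python
import math

def weighted_rbf(x, y, weights, gamma=1.0):
    """RBF kernel in which each feature's squared difference is scaled by its weight."""
    dist_sq = sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y))
    return math.exp(-gamma * dist_sq)

# With weight 0 on the second feature, a large difference there is ignored.
ignored = weighted_rbf((0.0, 0.0), (0.0, 5.0), weights=(1.0, 0.0))
counted = weighted_rbf((0.0, 0.0), (0.0, 5.0), weights=(1.0, 1.0))
```

A feature whose weight has been shrunk to zero drops out of the distance entirely, so it cannot distort the similarity between two samples.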
