Table of Contents

    24 September 2025, Volume 0 Issue 09
    Survey on Bundle Recommendation Algorithms
    LI Xiongqing1, 2, PENG Mingtian1, 2, LI Yong1, 2, WANG Junfei1, 2, LIU Dezhi1, 3, BIAN Yuxuan1, 3, CHAI Yuelin1, 3, LIU Yuntao1, 3
    2025, 0(09):  1-13.  doi:10.3969/j.issn.1006-2475.2025.09.001

    Abstract: Bundle recommendation combines multiple related goods, services, or content items and recommends the best combination to meet users’ varied needs. With the rapid development of sectors such as e-commerce and travel retail, bundle recommendation has become an important way to improve user experience and business benefits. This paper reviews the research progress and application status of bundle recommendation algorithms. First, the task definition, characteristics, challenges, and commonly used evaluation metrics are clarified. The challenges include bundle integrity, bundle diversity, data sparsity, cold start, and bundle generation. Second, existing algorithms are classified into three major categories (data mining-based, traditional machine learning-based, and deep learning-based algorithms) and further divided into seven subcategories, and the characteristics of each category are analyzed. Third, commonly used datasets for the bundle recommendation task are summarized. Finally, future development trends of bundle recommendation are discussed.

    PB-YOLOv7 Pedestrian Detection Method for Dense Scenes
    GUO Jinhao, WANG Fengping, WANG Haoqi
    2025, 0(09):  14-19.  doi:10.3969/j.issn.1006-2475.2025.09.002

    Abstract: To address the low detection speed and inaccurate localization of dense crowd detection against complex backgrounds, a pedestrian detection method for dense scenes, PB-YOLOv7, is proposed. First, a PP-LCNet-based network replaces the original backbone feature network, using depthwise separable convolution to reduce the computational complexity of the model. Second, the feature fusion idea of the bidirectional feature pyramid network (BiFPN) is adopted so that the feature fusion network makes better use of deep, shallow, and original feature information and loses less important feature information during convolution. Finally, the CBAM attention module is introduced at the junction location to strengthen the feature extraction capability of the algorithm and focus the network on effective information. Experimental results on the public dense pedestrian dataset WiderPerson show that, compared with the original algorithm, the improved algorithm raises mAP by 0.7 percentage points and the frame rate by 1.6 f/s, balancing detection accuracy and detection speed.
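
    The saving from the separable convolution mentioned above can be illustrated with a weight count. The following is a minimal sketch, not the PB-YOLOv7 implementation: the function names and the layer sizes are hypothetical, and biases are ignored.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = kernel * kernel * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only for illustration.
    k, c_in, c_out = 3, 128, 256
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.4f}")
```

    The ratio of the two counts is 1/c_out + 1/k², so for 3 x 3 kernels and wide layers the separable form needs roughly one ninth of the weights.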

    Target Pose Estimation Methods for Distribution Network Bypass Operation Robots
    YAO Jie1, YIN Honghai1, WANG Dahai1, LI Runzi2, ZHANG Qianwen2, GUO Yu2
    2025, 0(09):  20-26.  doi:10.3969/j.issn.1006-2475.2025.09.003

    Abstract: To replace humans in bypass operation tasks, robots must be able to autonomously estimate the pose of target objects in complex work environments. To address real-time target pose estimation by bypass operation robots under complex backgrounds and varying lighting conditions, this paper proposes a 6D pose estimation algorithm (RTFT6D) based on an improved YOLO-6D integrated with a Transformer model. The YOLOv8 backbone network is modified to increase inference speed, and a feature enhancement network incorporating the Transformer model is designed to improve the robustness of pose estimation. The experimental results show that the proposed algorithm surpasses most RGB image-based pose estimation algorithms in accuracy on the LINEMOD dataset and achieves excellent pose estimation performance for bypass operation targets under different lighting conditions.

    Low-data Fine-grained Image Classification Based on Self-distillation and Self-attention Enhancement
    ZHANG Jingying1, GENG Lin2, LIU Ningzhong2
    2025, 0(09):  27-34.  doi:10.3969/j.issn.1006-2475.2025.09.004
    Abstract: Training a fine-grained image classification (FGIC) model with limited data is a great challenge because subtle differences between categories may not be easily discernible. A common strategy is to use pre-trained network models to generate effective feature representations. However, when a pre-trained model is fine-tuned on limited fine-grained data, it often tends to extract less relevant features, which leads to overfitting. To address these issues, this paper designs a new FGIC method for low-data conditions, named SDA-Net, which optimizes feature learning by fusing a spatial self-attention mechanism with self-distillation. This mitigates the overfitting caused by data scarcity and improves the performance of deep neural networks in low-data environments. Specifically, SDA-Net improves the intra-class representation by introducing spatial self-attention to encode contextual information into local features. Meanwhile, a distillation branch is introduced and a distillation loss is applied to the augmented input samples, which deepens and transfers knowledge within the network. A comprehensive evaluation on three fine-grained benchmark datasets shows that SDA-Net achieves significant performance gains over both traditional fine-tuning methods and the current SOTA low-data FGIC strategy. In three scenarios with only 10% of the data, relative accuracy improves by 30%, 47%, and 29% over the standard ResNet-50, and by 15%, 28%, and 17% over SOTA, respectively.

    Lightweight Garbage Classification and Detection of Improved YOLOv8
    LUO Deyan, XU Yang, ZUO Fengyun, ZHANG Yongdan
    2025, 0(09):  35-42.  doi:10.3969/j.issn.1006-2475.2025.09.005

    Abstract: Current deep learning-based garbage classification and detection algorithms often have a large number of model parameters, which increases storage and computing costs and imposes a significant computational load on resource-constrained mobile devices. To solve these problems, a lightweight garbage detection algorithm based on an improved YOLOv8n is proposed. The improved algorithm uses the GhostNet convolution module to make the YOLOv8n feature extraction network lightweight. RepConv structural re-parameterization is used to improve the backbone network, which enhances its feature extraction ability and reduces its complexity at inference time. Additionally, the C2f module of the neck network is improved by using convolution kernels of different sizes to obtain multi-scale feature information, thereby enhancing the detection accuracy of the model. Finally, transfer learning is used to improve generalization and accelerate model training for better overall detection accuracy. The experimental results show that, compared with the original model, the improved algorithm reduces the parameter count and computation by 26.8% and 24.7%, respectively, while achieving mAP50 and mAP50:95 of 98.1% and 93.8%. Overall, the proposed method reduces model complexity, achieves better detection accuracy, and better meets the requirements of mobile devices.

    Defect Detection of Photovoltaic Panel Based on Improved YOLOv8-EDD
    JIA Tao1, WU Yuechao2, LYU Yang2, FU Wenlong3
    2025, 0(09):  43-49.  doi:10.3969/j.issn.1006-2475.2025.09.006

    Abstract: To solve the low accuracy and slow detection speed of existing defect detection methods for photovoltaic panels, a novel defect detection model for photovoltaic panels, the improved YOLOv8-EDD, is proposed. First, the multi-scale attention mechanism EMA is introduced so that the YOLOv8 model pays more attention to the defect areas of photovoltaic panels. Second, the deformable convolution DCNv2 is embedded into the original C2f module to enhance the model’s ability to extract irregular defect shapes. At the same time, to offset the reduction in detection speed caused by the large number of DCNv2 parameters, the lightweight upsampling operator DySample replaces the original upsampling operator of YOLOv8, reducing the number of model parameters and the computational complexity and thus increasing detection speed. Finally, the WIoUv3 loss function is integrated to reduce the influence of low-quality samples on accuracy and improve the generalization ability of the model. In the experiments, compared with the original model, the improved YOLOv8-EDD model increases accuracy by 15.3 percentage points, recall by 11.3 percentage points, mean average precision by 10.5 percentage points, and detection speed by 6.5 FPS. The results show that the proposed model improves detection accuracy while running faster and is better suited to defect detection of photovoltaic panels.

    Quantum Identity Authentication Protocol Based on Quantum Secure Direct Communication
    WANG Song1, LI Yuanzhi2, CHEN Wei1, BIAN Yuxiang3
    2025, 0(09):  50-54.  doi:10.3969/j.issn.1006-2475.2025.09.007

    Abstract: To ensure the security of identity authentication, a novel authentication protocol based on quantum secure direct communication is proposed. The protocol verifies user identities without exposing the verification key information. It uses single-photon quantum states as carriers and needs neither quantum state memories nor entangled photon sources, which makes quantum identity authentication more feasible under current technological conditions. The protocol comprises two stages: security detection and identity authentication. In the security detection stage, the communicating parties first use decoy sequences to check the security of the quantum channel, effectively resisting unauthorized attacks. In the identity authentication stage, the parties confirm each other’s identity through identity authentication sequences. Security analysis shows that the protocol can effectively withstand common attacks, making it a secure quantum identity authentication method. Comparative analysis indicates that it has advantages in practicality and effectiveness over other existing protocols.

    User Device Security Authentication Method for Cloud Computing Data Transmission
    JI Xiulan
    2025, 0(09):  55-60.  doi:10.3969/j.issn.1006-2475.2025.09.008

    Abstract: A user device security authentication method for cloud computing data transmission is proposed. With the support of cloud services, a unified user identity management model is established using the Security Assertion Markup Language (SAML). Depending on how assertions are transmitted, service provider pull and identity provider push methods are used to manage user identity authentication information, achieving flexible and efficient management of user identities. On this basis, a user device authentication scheme is designed. Encryption and verification algorithms are used to verify whether the device is a service terminal provided by a legitimate cloud service provider, protecting private information while also achieving access control of cloud computing data. The experimental results show that the proposed method yields high trust in user devices and requires relatively little computation in each stage of authentication, giving it the advantages of low computational cost and high security for cloud computing data transmission.

    Facial Sketch Image Conversion Based on CycleGAN and Attention Mechanism
    LIN Ruizi, YAO Da, DAI Xin, SHEN Guoyu, WANG Jiahui, WAN Weiguo
    2025, 0(09):  61-66.  doi:10.3969/j.issn.1006-2475.2025.09.009

    Abstract: In recent years, face sketch-photo synthesis has become a research hotspot because of demand in the law enforcement, criminal, and entertainment fields. As a deep learning model that does not require paired image supervision, CycleGAN is good at cross-domain image conversion, providing a powerful tool for efficient conversion between sketches and photos. In view of the difficulty of collecting large numbers of paired face photos and sketches, and the blurred, low-definition details in generated face sketches, an improved CycleGAN model is proposed. A self-attention mechanism is introduced into the residual blocks of the ResNet-architecture generator in CycleGAN, so that the generator learns the features of different channels and the importance of different regions of the face image more effectively and automatically focuses on important facial regions such as the eyes, nose, and mouth during image processing. At the same time, the edge clarity and integrity of the sketch are increased, improving the quality of the generated face sketch images. The proposed model is evaluated on the CUHK and FS2K datasets. The structural similarity, peak signal-to-noise ratio, and multi-scale structural similarity are 0.7741, 11.7451, and 0.8504 on CUHK and 0.7049, 13.2745, and 0.7970 on FS2K, respectively. These results outperform the comparison models CycleGAN, Pix2Pix, MUNIT, and DCLGAN. Comparison experiments and subjective visual assessment show that the proposed model can effectively perform face sketch generation and produce higher-quality face sketch images.
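
    Of the three metrics reported above, the peak signal-to-noise ratio is the simplest to state. The sketch below uses the standard definition, PSNR = 10 log10(MAX² / MSE), on flat lists of pixel values. It assumes 8-bit images (MAX = 255) and is not taken from the paper's evaluation code; the function name `psnr` is a hypothetical choice.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(generated) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Hypothetical 2 x 2 grayscale patches, flattened row by row.
    photo_sketch = [12, 200, 34, 90]
    generated_sketch = [10, 190, 40, 95]
    print(f"PSNR: {psnr(photo_sketch, generated_sketch):.2f} dB")
```

    Higher values mean the generated image is closer to the reference pixel by pixel, which is why PSNR is usually reported alongside structural measures such as SSIM.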

    Insulator Defect Detection Based on Complex Environment
    JI Xingyu1, HUANG Chenrong2, YAO Juncai2, WANG Kai1, GU Mingjie1
    2025, 0(09):  67-72.  doi:10.3969/j.issn.1006-2475.2025.09.010

    Abstract: Drones are now widely used in power inspection. However, because insulator defect images have complex backgrounds, the defects are small, and multiple damage types such as flashover, self-explosion, and breakage occur, detection speed and accuracy are greatly limited. To address these issues, an insulator defect detection method for complex environments based on an improved YOLOv5 is proposed. First, an improved feature extraction network, C2FNet, is adopted to obtain richer feature information while keeping the model lightweight. Second, the Res2Net module with multi-scale information is adopted to improve gradient propagation and training performance. Finally, a dynamic object detection head with adaptive fusion, 3-DyHead, is designed to dynamically adjust the network structure and parameters. The experimental results show that the average accuracy of this method reaches 94.2%, which is 4.1 percentage points higher than the original model. The precision P and recall R increase by 3.2 and 4.0 percentage points, respectively. The average accuracy for insulator flashover, hammer, and defect increases by 11.0, 2.0, and 6.5 percentage points, respectively.
    Arbitrary Style Transfer Method with Multi-scale Semantic Adaptation
    ZHU Lulu, GU Lin
    2025, 0(09):  73-78.  doi:10.3969/j.issn.1006-2475.2025.09.011

    Abstract: Addressing the challenge of balancing style and content information in existing arbitrary style transfer models, this paper introduces an improved model for arbitrary style transfer. The model incorporates a multi-scale semantic adjustment module before the style transformation process. This module deeply adjusts style and content feature mappings to enhance key feature expressions, improving the coherence of image content structure and style features after style transfer. Additionally, a semantic adjustment loss function is proposed to precisely preserve the original image’s content structure and delicately transfer the target style’s features. The experimental results show that this method not only maintains the content information but also enhances the style transfer effects.

    Multimodal Tunnel Fire Detection Based on Temporal Features of Video
    YANG Tianshun1, SONG Huansheng1, LIANG Haoxiang2, LIU Haonan1, MA Xinzhou1, SUN Shijie1, ZHANG Shaoyang1
    2025, 0(09):  79-89.  doi:10.3969/j.issn.1006-2475.2025.09.012
    Abstract: Tunnels are closed, narrow environments; once a fire occurs, its spread and the generation of harmful gases seriously threaten life and property. Existing tunnel fire detection methods based on single-frame images often struggle to distinguish accurately between flames and flame-like light sources. To address this issue, a multi-frame sequence feature extraction method based on the YOLOV network is proposed, which exploits the dynamic feature variations of targets in video sequences. The VSDFD module is designed to differentiate between flames and light sources by analyzing the feature similarity of adjacent interval frames. In addition, using ambient temperature information collected by a temperature sensor, a multimodal fusion method, MFD, is proposed based on the Dempster-Shafer theory (DST) of evidence and its derivative, the transferable belief model (TBM), to calculate the fire probability and perform tunnel fire detection. The experimental results show that the VSDFD module significantly improves the ability to distinguish between flames and light sources. The MFD method keeps the fused probability below 0.5 in false-alarm cases and above 0.5 in fire scenarios. Compared with other methods, the proposed approach achieves an average improvement of 2.8 percentage points in detection accuracy, a reduction of 2.7 percentage points in the missed detection rate, and a decrease of 5.2 percentage points in the false detection rate. Experiments in various real tunnel fire scenarios verify the accuracy of the proposed method in fire detection.
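
    The evidence fusion mentioned above can be illustrated with the classical Dempster rule of combination over a two-hypothesis frame. This is a minimal sketch under stated assumptions, not the paper's MFD method: the mass values for the video and temperature sources are made up, the names `FIRE`, `NORMAL`, `EITHER`, and `combine` are hypothetical, and the TBM variant is not shown.

```python
from itertools import product

# Frame of discernment: each focal element is a set of hypotheses.
FIRE = frozenset({"fire"})
NORMAL = frozenset({"normal"})
EITHER = FIRE | NORMAL  # mass assigned here expresses ignorance


def combine(m1, m2):
    """Fuse two mass functions with Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


if __name__ == "__main__":
    # Hypothetical evidence from the two modalities.
    video = {FIRE: 0.6, NORMAL: 0.1, EITHER: 0.3}
    temperature = {FIRE: 0.5, NORMAL: 0.2, EITHER: 0.3}
    fused = combine(video, temperature)
    for focal, mass in fused.items():
        print(sorted(focal), round(mass, 4))
```

    When both sources lean toward fire, the fused mass on fire exceeds either input, and a threshold such as the 0.5 used in the abstract can then be applied to the result.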

    Single View 3D Hair Modeling Based on Mask Constraints
    LI Su, LI Dequan, QING Yu
    2025, 0(09):  90-96.  doi:10.3969/j.issn.1006-2475.2025.09.013

    Abstract: A 3D hair reconstruction method based on mask constraints is proposed. First, the front-view hair mask is inferred using SAM, and a plausible back-view mask is predicted by a GAN. Depth pseudo-labels are introduced to fine-tune the Hourglass network, which predicts the hair depth map and strengthens the relative positional relationships of the hair. In addition, a U-Net infers the hair orientation map, which is fed into the Stacked Hourglass network; the front-view and back-view masks then constrain the generation of 3D hair spatial points so that the generated 3D hair point cloud projects within the mask range. Meanwhile, the robustness of the model at the hair edges is enhanced by controlling the sampling rate of negative samples. Finally, a parallel algorithm is used to accelerate hair synthesis and significantly improve reconstruction efficiency. The experimental results show that the method handles complex hairstyles effectively and improves the realism of the hair.

    CNN-BiLSTM and LightGBM Stock Prediction Based on Dual Attention Mechanism
    LIU Cheng, FENG Guang
    2025, 0(09):  97-103.  doi:10.3969/j.issn.1006-2475.2025.09.014

    Abstract: The stock market is crucial to economic development. Given its intense volatility, investors can reduce investment risk and achieve higher returns if they can predict changes in stock prices more accurately. Because traditional time series models such as ARIMA are limited in handling nonlinear problems, their forecasts are often unsatisfactory in the stock market. This paper proposes a hybrid algorithm that combines CNN-BiLSTM and LightGBM with a dual attention mechanism and uses the strong nonlinear learning ability of neural networks to predict stock market volatility efficiently and accurately. Specifically, the stock data are first preprocessed by an ARIMA model; a convolutional neural network combined with an attention mechanism then forms a feature attention module that adaptively extracts key features from the stock data. Next, a bidirectional long short-term memory network is integrated with an attention mechanism to form a temporal attention module that makes a preliminary prediction of the future trend of stock prices. Finally, to further improve prediction accuracy, LightGBM is introduced to build an error correction module that fine-tunes the preliminary prediction results. The experiments show that the proposed model improves prediction accuracy and provides strong decision support for investors and institutions, helping them identify market opportunities and pursue the goal of maximizing profits.

    Joint Knowledge Extraction for Cloud-edge Collaborative Multi-source Transmission Data
    SHANG Boxiang1, GUO Xiaoyan2, ZHENG Jian1, SUN Xianfan2
    2025, 0(09):  104-108.  doi:10.3969/j.issn.1006-2475.2025.09.015
     
    Abstract: The growing volume of massive multi-source unstructured data transmitted in the power IoT causes high latency in cloud-edge collaborative task scheduling and resource allocation. To address this, this paper exploits the advantages of knowledge graphs in data storage and knowledge extraction and proposes a multi-unit joint knowledge extraction method consisting of two independent sub-modules, one for extracting head entities and the other for extracting tail entities and their corresponding relations. First, candidate entities and relations are generated by enumerating the tagged sequences in the transmission data. Then, the two sub-modules are used to predict the entities and relations. Finally, the predicted entities and relations are jointly decoded to obtain relation triples, and the knowledge contained in the transmission data serves as the basis for the visual display of the transmission scheduling mapping. The experimental results show that the F1 value of the model reaches 79% and the accuracy is 6% higher than that of other traditional methods. The knowledge extraction effect is better, and the method can efficiently parse unstructured transmission data to support accurate decision-making in cloud-edge collaborative task scheduling and resource allocation.

    FedLDP: Efficient Federated Learning with Localized Differential Privacy
    CHENG Mengyuan, LI Yanhui, LYU Tianci, ZHAO Yuxin, HUANG Chen
    2025, 0(09):  109-118.  doi:10.3969/j.issn.1006-2475.2025.09.016

    Abstract: Federated learning, as a distributed machine learning framework, allows users to collaboratively train models by sharing model parameters without disclosing raw data. However, model parameters may still contain a substantial amount of sensitive information, and sharing them directly poses considerable threats to individuals’ privacy. The state-of-the-art solution to this problem is local differential privacy (LDP), which can resist adversaries with arbitrary background knowledge and protect private information thoroughly. Because federated learning parameters are high-dimensional and exchanged over many rounds, applying local differential privacy to federated learning is particularly challenging. In this paper, we propose FedLDP, an efficient algorithm for private federated learning. To avoid leaking individuals’ private information, the algorithm uses exponential mechanism-based dimension selection to choose important parameter dimensions for global aggregation and the Laplace mechanism to perturb the selected dimensions. In addition, to improve learning efficiency and overall model performance, an incremental privacy budget allocation strategy is designed to adjust the privacy budget allocation across iterations, optimizing the training process. We theoretically prove that FedLDP satisfies ε-LDP, and extensive experiments on the MNIST and Fashion-MNIST datasets demonstrate that FedLDP improves the final model’s accuracy by 5.07 percentage points and 3.01 percentage points under the same level of privacy constraints compared with state-of-the-art schemes.
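
    The perturbation step described above can be illustrated with the textbook Laplace mechanism applied to a few chosen coordinates of a local update. This is a minimal sketch, not the FedLDP algorithm: the clipping bound, the per-dimension budget, and the names `laplace_noise` and `perturb_selected` are assumptions, and the exponential-mechanism selection and the incremental budget allocation are not shown (the `selected` indices are simply given).

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_selected(update, selected, clip, epsilon_per_dim, rng):
    """Clip the selected coordinates to [-clip, clip] and add Laplace noise.

    Each clipped value ranges over an interval of width 2 * clip, so the noise
    scale is 2 * clip / epsilon_per_dim (assumed per-dimension budget).
    """
    if clip <= 0 or epsilon_per_dim <= 0:
        raise ValueError("clip and epsilon_per_dim must be positive")
    scale = 2.0 * clip / epsilon_per_dim
    noisy = {}
    for i in selected:
        clipped = max(-clip, min(clip, update[i]))
        noisy[i] = clipped + laplace_noise(scale, rng)
    return noisy


if __name__ == "__main__":
    rng = random.Random(42)                 # seeded only for reproducibility
    local_update = [0.8, -0.05, 1.7, 0.02]  # hypothetical parameter deltas
    chosen = [0, 2]                         # assumed output of dimension selection
    print(perturb_selected(local_update, chosen, clip=1.0,
                           epsilon_per_dim=0.5, rng=rng))
```

    Only the perturbed coordinates leave the client, so the total budget spent in a round grows with the number of selected dimensions; this is one reason for selecting a small set of important dimensions before adding noise.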

    Prediction of Breast Cancer Hospitalization Costs Based on Stacking Ensemble and Explainable Models
    ZHU Haiyu1, SUN Xiaoyan1, YUAN Zhenming1, YANG Lijing2
    2025, 0(09):  119-126.  doi:10.3969/j.issn.1006-2475.2025.09.017
     
    Abstract: Hospitalization costs are one of the factors that affect treatment choices and prognosis for breast cancer patients. Accurate prediction of hospitalization costs and personalized analysis of cost-influencing factors are crucial for efficient resource allocation and optimization of medical services. To address the weak generalization ability and poor interpretability of single-model hospitalization cost prediction, this paper proposes an interpretable stacking method. The method integrates the feature extraction capabilities of multiple models to predict hospitalization costs for breast cancer patients accurately. It employs a two-layer model fusion structure: the first layer selects four base models and uses Bayesian optimization and five-fold cross-validation to tune parameters, enhancing the predictive performance of each model, and the second-layer model then generates the final hospitalization cost prediction. Additionally, SHAP and LIME are used to analyze the breast cancer hospitalization cost predictions from global and individual perspectives. The experimental results on a five-year dataset of breast cancer inpatients from a certain hospital demonstrate that the stacking method achieves an R² of 0.877 in the cost prediction task, outperforming other related studies. The interpretability analysis indicates that length of stay and treatment method are the primary factors influencing overall costs, but the influencing factors vary among patients. This provides valuable insights for a deeper understanding of the key factors affecting hospitalization costs.