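Strategy of “Fighting the Landlord” Based on Deep Convolutional Neural Network

Computer and Modernization, 2020, Vol. 0, Issue (11): 28-32.

(College of Computer Science and Technology, Guizhou University, Guiyang 550025, China)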

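Online: 2020-12-03 Published: 2020-12-03
About the authors: XU Fang-jing (b. 1994), female, from Zunyi, Guizhou, master's student; research interests: artificial intelligence and pattern recognition, deep learning; E-mail: 496949302@qq.com. WEI Kun-peng (b. 1985), male, from Xinxiang, Henan, master's degree; research interests: artificial intelligence and pattern recognition; E-mail: weikunpeng@cmdi.chinamobile.com. WANG Yi-song (b. 1975), male, professor, doctoral supervisor, Ph.D.; research interests: knowledge representation and reasoning, artificial intelligence, machine learning; E-mail: yswang@gzu.edu.cn. PENG Qi-wen (b. 1995), male, from Zhijin, Guizhou, master's student; research interests: artificial intelligence and pattern recognition; E-mail: 937356655@qq.com. YU Xiao-min (b. 1988), male, from Tangshan, Hebei, doctoral student; research interests: artificial intelligence and pattern recognition, deep reinforcement learning; E-mail: 1031450835@qq.com.
Supported by:
National Natural Science Foundation of China (61976065)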


Abstract

Abstract: Deep neural networks have achieved remarkable results in a wide range of games. In recent years, convolutional neural networks have attracted considerable attention because of their distinctive unit structure and have been applied in game-playing AI agents such as AlphaGo and Libratus. “Fighting the Landlord” (Dou Dizhu) is a typical cooperative-adversarial game with imperfect information. This paper designs a 7-layer convolutional neural network, DDZ-CNN, and trains it on nearly 300,000 records generated by Monte Carlo tree search (MCTS) self-play to learn a “Fighting the Landlord” strategy. During training, the data are downsampled with a weight-based method to counter their uneven distribution, and the network converges quickly. Finally, the trained model is played against an MCTS-based agent and against human players and achieves a good win rate, which supports the effectiveness and feasibility of the proposed method.

Key words: imperfect information game, convolutional neural network, “Fighting the Landlord” strategy, nonuniform distribution
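
The abstract mentions weight-based downsampling of unevenly distributed training data but does not give the weighting scheme. The following is a minimal sketch under stated assumptions, not the authors' method: it assumes each record carries a class label (here a hypothetical move type), and it uses a per-class keep probability of min(1, cap / class_count) as the weight. The function name `downsample_by_weight`, the `cap` parameter, and the toy labels are all illustrative.

```python
import random
from collections import Counter


def downsample_by_weight(samples, label_of, cap, seed=0):
    """Keep each sample with probability min(1, cap / class_count).

    Classes with at most `cap` samples are kept whole; larger classes are
    thinned to roughly `cap` samples in expectation. (Assumed scheme.)
    """
    rng = random.Random(seed)
    counts = Counter(label_of(s) for s in samples)
    # The per-class keep probability acts as the sampling weight.
    keep_prob = {label: min(1.0, cap / n) for label, n in counts.items()}
    return [s for s in samples if rng.random() < keep_prob[label_of(s)]]


# Toy data: (state_id, move_type) pairs with a heavily skewed label distribution.
toy = (
    [(i, "pass") for i in range(900)]
    + [(i, "single") for i in range(90)]
    + [(i, "bomb") for i in range(10)]
)
balanced = downsample_by_weight(toy, label_of=lambda s: s[1], cap=50)
print(Counter(label for _, label in balanced))
```

In this sketch the rare class is kept in full, while the two frequent classes are each reduced to about `cap` records, so the label distribution seen during training is much flatter than in the raw data.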

XU Fang-jing, WEI Kun-peng, WANG Yi-song, PENG Qi-wen, YU Xiao-min. Strategy of “Fighting the Landlord” Based on Deep Convolutional Neural Network[J]. Computer and Modernization, 2020, 0(11): 28-32.

