[1] MITCHELL T M. Machine Learning[M]. New York: McGraw-Hill, 1997.
[2] GOODFELLOW I, BENGIO Y, COURVILLE A. Deep Learning[M]. Cambridge: MIT Press, 2016.
[3] MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[J]. arXiv preprint arXiv:1301.3781, 2013.
[4] DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2019:4171-4186.
[5] COULOMBE C. Text data augmentation made simple by leveraging NLP cloud APIs[J]. arXiv preprint arXiv:1812.04718, 2018.
[6] REGINA M, MEYER M, GOUTAL S. Text data augmentation: Towards better detection of spear-phishing emails[J]. arXiv preprint arXiv:2007.02033, 2020.
[7] NISHIKAWA S, RI R, TSURUOKA Y. Data augmentation for learning bilingual word embeddings with unsupervised machine translation[J]. arXiv preprint arXiv:2006.00262, 2020.
[8] SHORTEN C, KHOSHGOFTAAR T. A survey on image data augmentation for deep learning[J]. Journal of Big Data, 2019,6. DOI: 10.1186/s40537-019-0197-0.
[9] KO T, PEDDINTI V, POVEY D, et al. Audio augmentation for speech recognition[C]// Proceedings of the 16th Annual Conference of the International Speech Communication Association. 2015:3586-3589.
[10] LIU P, WANG X M, XIANG C, et al. A survey of text data augmentation[C]// Proceedings of the 2020 International Conference on Computer Communication and Network Security. 2020:191-195.
[11] LIU Y, ZHANG M. Neural network methods for natural language processing[J]. Computational Linguistics, 2018,44(1):193-195.
[12] ZHANG X, ZHAO J B, LECUN Y. Character-level convolutional networks for text classification[C]// Proceedings of the 29th Conference and Workshop on Neural Information Processing Systems. 2015:649-657.
[13] GARG S, RAMAKRISHNAN G. BAE: BERT-based adversarial examples for text classification[C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. 2020:6174-6181.
[14] MUELLER J, THYAGARAJAN A. Siamese recurrent architectures for learning sentence similarity[C]// Proceedings of the 30th AAAI Conference on Artificial Intelligence. 2016:2786-2792.
[15] PENG B L, ZHU C G, ZENG M, et al. Data augmentation for spoken language understanding via pretrained language models[C]// Proceedings of Interspeech 2021. 2021:1219-1223.
[16] XIE Q Z, DAI Z H, HOVY E, et al. Unsupervised data augmentation for consistency training[C]// Proceedings of the 2020 Conference on Neural Information Processing Systems. 2020.
[17] NIEDERHUT D. Niacin: A Python package for text data enrichment[J]. Journal of Open Source Software, 2020,5(50). DOI: 10.21105/joss.02136.
[18] QUTEINEH H, SAMOTHRAKIS S, SUTCLIFFE R. Textual data augmentation for efficient active learning on tiny datasets[C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. 2020:7400-7410.
[19] XIE Z A, WANG S D I, LI J W, et al. Data noising as smoothing in neural network language models[C]// Proceedings of the 2017 International Conference on Learning Representations. 2017.
[20] GUO H Y, MAO Y Y, ZHANG R C. Augmenting data with mixup for sentence classification: An empirical study[J]. arXiv preprint arXiv:1905.08941, 2019.
[21] RAILLE G, DJAMBAZOVSKA S, MUSAT C. Fast cross-domain data augmentation through neural sentence editing[J]. arXiv preprint arXiv:2003.10254, 2020.
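[22] ZHAO X B, BAO W, DONG J, et al. Research on Tibetan paraphrase detection based on data augmentation[J]. Journal of Chinese Information Processing, 2019,33(12):83-90. (in Chinese)
[23] HE J J. Research on sentiment analysis based on text data augmentation and hybrid models[D]. Hefei: Hefei University of Technology, 2018. (in Chinese)
[24] ZHAO P F. Data augmentation techniques for Chinese language models based on generative adversarial networks[D]. Harbin: Harbin Institute of Technology, 2018. (in Chinese)
[25] SONG X L, HAN X P, SUN L. A data augmentation method for recognizing new types of person names[J]. Journal of Chinese Information Processing, 2019,33(6):72-79. (in Chinese)
[26] Household Administration Research Center of the Ministry of Public Security. 2019 National Name Report[EB/OL]. (2020-01-20)[2021-02-15]. https://www.mps.gov.cn/n2254314/n6409334/c6874817/content.html. (in Chinese)
[27] XU L, TONG Y, DONG Q Q, et al. CLUENER2020: Fine-grained named entity recognition dataset and benchmark for Chinese[J]. arXiv preprint arXiv:2001.04351, 2020.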