Computer and Modernization


Stylometry-based Analysis of Literature Texts

  

  1. (School of Information Technology, Jiangxi University of Finance and Economics, Nanchang 330013, China)
  • Received: 2018-11-16 Online: 2019-05-14 Published: 2019-05-14

Abstract: This study compares literary works from the perspective of stylometry. Current research on literature is mainly qualitative and subjective, with few quantitative or empirical studies. A total of 225 literary works, including both Internet literature and classical literature, are collected and divided into three subsets labeled “excellent”, “good” and “poor”. For each work, a large number of features are extracted, covering article length, part of speech, rhythm, vocabulary and other properties. Based on these features, classifiers such as decision trees, neural networks and Bayesian models are constructed and used to identify the key differences among the three datasets. The results show that the three datasets differ markedly in their stylometric statistics, and that the discriminative power of the features varies with the pair of datasets being compared.
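
As a rough illustration of the kind of length and vocabulary features mentioned in the abstract, the following minimal Python sketch computes a few common stylometric statistics for one text. The abstract does not list the features the authors actually used, so the function name `stylometric_features`, the chosen statistics, and the regex-based tokenization are assumptions made for illustration only; Chinese text in particular would need a proper word segmenter and part-of-speech tagger.

```python
import re


def stylometric_features(text):
    """Return a few simple stylometric statistics for one text (illustrative only)."""
    # Split on common Western and CJK sentence terminators; drop empty pieces.
    sentences = [s for s in re.split(r"[.!?。！？]+", text) if s.strip()]
    # Treat runs of word characters as tokens. This is a simplification:
    # unsegmented Chinese text would come out as one long token per clause.
    tokens = re.findall(r"\w+", text.lower())
    n_tokens = len(tokens)
    return {
        "n_tokens": n_tokens,                      # text length in tokens
        "n_sentences": len(sentences),             # text length in sentences
        "avg_sentence_len": n_tokens / len(sentences) if sentences else 0.0,
        "type_token_ratio": len(set(tokens)) / n_tokens if n_tokens else 0.0,
        "avg_word_len": sum(len(t) for t in tokens) / n_tokens if n_tokens else 0.0,
    }


# Hypothetical sample text, not taken from the study's corpus.
sample = "The cat sat. The cat ran away!"
features = stylometric_features(sample)
```

Feature dictionaries of this kind, one per work, could then be stacked into a matrix and passed to any standard classifier to compare the subsets pairwise.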

Key words: stylometry, text analysis, literature text
