Computer and Modernization


Context Semantic-based Naive Bayesian Algorithm for Text Classification

  

  1. (Information Department, Beijing University of Technology, Beijing 100124, China)
  • Received: 2017-12-04 Online: 2018-07-05 Published: 2018-07-05

Abstract: The Naive Bayes classifier is based on the assumption that the samples' attributes are independent of one another. As a simple bag-of-words model, it ignores the influence of synonyms in context on classification. This paper proposes the concept of similar words and uses clusters of similar words instead of a keyword dictionary in training. First, word2vec is trained to obtain word embeddings. Second, the keyword dictionary is represented by these word embeddings, which are then clustered hierarchically; the clusters of similar words are built and expanded. The experimental results show that this method can improve the accuracy of text classification and avoid the instability in classification performance caused by differences in the training corpus.
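
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The hand-written 2-d vectors stand in for trained word2vec embeddings, and the average-linkage merge rule, the 0.95 cosine threshold, the Laplace smoothing, and all function names (`cluster_words`, `train`, `predict`) are assumptions made for illustration. The cluster expansion step is not shown.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy embeddings; a real system would use trained word2vec vectors.
EMBEDDINGS = {
    "football": (0.90, 0.10), "soccer": (0.88, 0.15), "match": (0.80, 0.30),
    "stock": (0.10, 0.90), "shares": (0.12, 0.92), "market": (0.25, 0.80),
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cluster_words(embeddings, threshold=0.95):
    """Agglomerative clustering with average linkage (an assumed choice).

    Repeatedly merges the most similar pair of clusters until no pair has
    mean cosine similarity >= threshold. Returns a word -> cluster id map.
    """
    clusters = [[w] for w in sorted(embeddings)]

    def sim(a, b):
        pairs = [cosine(embeddings[x], embeddings[y]) for x in a for y in b]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        best, i, j = max(
            (sim(a, b), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters) if i < j
        )
        if best < threshold:
            break
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return {w: cid for cid, members in enumerate(clusters) for w in members}

def train(docs, word_to_cluster):
    """Count cluster ids per class instead of individual keywords."""
    class_docs = Counter()
    feature_counts = defaultdict(Counter)
    for tokens, label in docs:
        class_docs[label] += 1
        for t in tokens:
            if t in word_to_cluster:
                feature_counts[label][word_to_cluster[t]] += 1
    return class_docs, feature_counts

def predict(tokens, model, word_to_cluster):
    """Multinomial Naive Bayes over cluster ids with Laplace smoothing."""
    class_docs, feature_counts = model
    n_features = len(set(word_to_cluster.values()))
    total_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        total = sum(feature_counts[label].values())
        score = math.log(class_docs[label] / total_docs)
        for t in tokens:
            if t in word_to_cluster:
                count = feature_counts[label][word_to_cluster[t]]
                score += math.log((count + 1) / (total + n_features))
        scores[label] = score
    return max(scores, key=scores.get)

word_to_cluster = cluster_words(EMBEDDINGS)
model = train(
    [(["football", "match"], "sports"), (["stock", "market"], "finance")],
    word_to_cluster,
)
# "soccer" never appears in the training documents, but it shares a cluster
# with "football", so it still contributes evidence for the "sports" class.
label_for_soccer = predict(["soccer"], model, word_to_cluster)
```

Because features are cluster ids rather than surface words, a synonym that is absent from the training corpus can still be scored, which is the effect the abstract attributes to replacing the keyword dictionary with clusters of similar words.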

Key words: text categorization, Naive Bayes, word2vec

CLC Number: