Computer and Modernization


Text Clustering Retrieval Based on LDA Model

  

  1. (School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China)
  • Received: 2017-11-07 Online: 2018-07-05 Published: 2018-07-05

Abstract: Traditional methods for judging the similarity of two documents do not take into account the semantic relations behind the texts, so the results returned by a retrieval system can differ considerably from users' query requirements. This paper presents a text clustering method based on the LDA topic model. First, the application principle of the LDA topic model is introduced and the basic methods of text mining are explained; then the LDA topic model is constructed, and Gibbs sampling is used to derive the probability distribution of the feature words. Finally, the test data sets are clustered with the K-means++ method, which optimizes the selection of the initial cluster centers, and the resulting LDA-Gibbs model is compared with the traditional TF-IDF model. Experimental results show that this model improves retrieval performance on the data and has good potential for wider application.
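The abstract does not give the exact formulation or hyperparameters, so the following is only a minimal standard-library sketch of the LDA-Gibbs plus K-means++ pipeline it describes. It assumes a standard collapsed Gibbs sampler for LDA and Lloyd's k-means with k-means++ (D² weighted) seeding, and it assumes documents are clustered by their document-topic distributions. The function names `lda_gibbs` and `kmeans_pp`, the toy corpus, and the values of `alpha`, `beta` and `iters` are hypothetical illustrations, not the authors' settings.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (hypothetical hyperparameter defaults).

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns theta, the per-document topic distributions.
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    ndk = [[0] * K for _ in docs]       # topic counts per document
    nkw = [[0] * V for _ in range(K)]   # word counts per topic
    nk = [0] * K                        # total tokens per topic
    z = []                              # current topic of every token
    for d, doc in enumerate(docs):      # random initial assignment
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | rest), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    # Posterior-mean estimate of each document's topic mixture.
    return [[(ndk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
            for d, doc in enumerate(docs)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp(points, k, iters=50, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        # Pick the next center with probability proportional to D(x)^2.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            centers.append(list(rng.choice(points)))
        else:
            centers.append(list(rng.choices(points, weights=d2)[0]))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute means.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    # Hypothetical toy corpus: word ids 0-3 form one theme, 4-7 another.
    docs = [[0, 1, 2, 3] * 3 for _ in range(3)] + [[4, 5, 6, 7] * 3 for _ in range(3)]
    theta = lda_gibbs(docs, num_topics=2, vocab_size=8)
    labels, _ = kmeans_pp(theta, 2)
    print(labels)
```

Clustering on the low-dimensional topic vectors rather than on raw TF-IDF term vectors is what lets documents that share a theme land in the same cluster even when their exact word overlap is small; k-means++ seeding mainly reduces sensitivity to a poor random choice of initial centers.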

Key words: topic model, text clustering, latent Dirichlet allocation (LDA), cluster evaluation, information retrieval (IR)
