Topic segmentation is the basis for efficiently retrieving and managing news story programs. Traditional topic segmentation techniques based on the Hidden Markov Model (HMM) use only the transitions between topics to locate
topic boundaries in the news, and do not take into account the latent semantic relationships among the words within each topic. This paper proposes an improved HMM-based algorithm that applies latent semantic analysis (LSA) to the word frequency vectors for dimensionality
reduction and feature extraction, thereby capturing the contextual relationships among words. During training, class labels are extracted from the documents by K-means clustering. The
LSA features and the class labels serve as the observations of the hidden states in the HMM, so that the model also accounts for the influence between different topics. Topic segmentation is then carried out with the trained model. Extensive
experiments show that the proposed model performs well at segmenting news documents.
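
As an illustration of the final segmentation step only, the following Python sketch assumes that each hidden state is a topic and that each sentence has already been reduced to a discrete cluster label (for example, by K-means on its feature vector). It uses standard Viterbi decoding, which is a common choice for HMMs but is not confirmed here as the decoding method of the proposed algorithm. The function names `viterbi` and `topic_boundaries` are hypothetical, and all probabilities are made-up toy values rather than parameters from this work.

```python
import math


def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely hidden-state (topic) sequence for `obs`.

    obs      : list of observation symbols (here: cluster labels, as ints)
    start_p  : start_p[s]    = P(first state is s)
    trans_p  : trans_p[r][s] = P(next state is s | current state is r)
    emit_p   : emit_p[s][o]  = P(observing o | state s)
    """
    n_states = len(start_p)
    # Work in log space so long documents do not underflow.
    score = [math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
             for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev = score
        score, ptr = [], []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda r: prev[r] + math.log(trans_p[r][s]))
            ptr.append(best)
            score.append(prev[best] + math.log(trans_p[best][s])
                         + math.log(emit_p[s][o]))
        back.append(ptr)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def topic_boundaries(path):
    """Indices at which the decoded topic differs from the previous one."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]


# Toy model (assumed values): 2 topics, 2 possible cluster labels.
start_p = [0.5, 0.5]
trans_p = [[0.9, 0.1],   # topics tend to persist from sentence to sentence
           [0.1, 0.9]]
emit_p = [[0.8, 0.2],    # topic 0 mostly emits label 0
          [0.2, 0.8]]    # topic 1 mostly emits label 1

labels = [0, 0, 1, 0, 0, 1, 1, 1]   # one cluster label per sentence
path = viterbi(labels, start_p, trans_p, emit_p)
print(path)                    # [0, 0, 0, 0, 0, 1, 1, 1]
print(topic_boundaries(path))  # [5]
```

In this toy run the isolated label 1 at position 2 does not produce a boundary, because the transition probabilities make a brief switch of topic less likely than a single noisy observation. This is the sense in which an HMM uses topic transitions, in addition to the observations themselves, to place boundaries.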