Latent Dirichlet Allocation for Topic Discovery and Segmentation in Big Data

Clementking, A. and Rani, S. and Roseline, R. and K E, Purushothaman and Kavitha, G. and Murugan, S. (2024) Latent Dirichlet Allocation for Topic Discovery and Segmentation in Big Data. In: UNSPECIFIED.

Full text not available from this repository.

Abstract

Applying Latent Dirichlet Allocation (LDA) to topic identification and segmentation in big data helps extract significant patterns and topics from large text corpora. LDA will be implemented and optimized to process and analyze large datasets rapidly, revealing hidden topics and improving content organization. A robust framework for accurate and scalable topic modeling would improve analysis and decision-making in social media analytics, consumer feedback, and academic research. The LDA technique must be refined to handle the high dimensionality and complexity of big data while remaining computationally efficient. An innovative topic identification tool that processes large-scale text data quickly and reliably will reveal thematic patterns and improve the management and use of big data. On the Bigdata Corpus, the topic distribution across documents, for a sample of 5 topics and 5 documents, ranges from 0.1 to 0.25. For top words per topic on the same dataset, a sample of 10 example words for 10 topics has values from 0.03 to 0.15. Document clustering based on topic proportions, for a sample of 5 documents and 5 topics, yields values from 0.1 to 0.75. © 2025 Elsevier B.V., All rights reserved.
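
The three outputs named in the abstract (topic distribution across documents, top words per topic, and document clustering by topic proportions) can be illustrated with a minimal sketch. The record does not say which inference method or library the authors used, so this sketch assumes collapsed Gibbs sampling, one common way to fit LDA, written in plain Python. The toy corpus, the function names (fit_lda, top_words, cluster_by_dominant_topic), and the hyperparameters alpha and beta are all hypothetical and are not taken from the paper.

```python
import random

def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Fit LDA by collapsed Gibbs sampling (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * V for _ in range(num_topics)]
    topic_total = [0] * num_topics
    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed estimates: theta[d][k] is the proportion of topic k in
    # document d, and phi[k][v] is the probability of word v under topic k.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + V * beta) for c in topic_word[k]]
        for k in range(num_topics)
    ]
    return theta, phi, vocab

def top_words(phi, vocab, n=3):
    """Return the n most probable words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in phi
    ]

def cluster_by_dominant_topic(theta):
    """Assign each document to the topic with the largest proportion."""
    return [max(range(len(row)), key=row.__getitem__) for row in theta]

# Hypothetical toy corpus, already tokenized.
docs = [
    "stock market price trade stock".split(),
    "market trade price investor".split(),
    "team match goal player team".split(),
    "player goal match coach".split(),
    "investor stock price market".split(),
    "coach team player match".split(),
]
theta, phi, vocab = fit_lda(docs, num_topics=2)
words = top_words(phi, vocab)
clusters = cluster_by_dominant_topic(theta)
```

Here each row of theta corresponds to the topic distribution for one document, each row of phi gives the word probabilities from which the top words per topic are read, and clusters groups documents by their dominant topic. A corpus at the scale the abstract describes would need an optimized or distributed implementation rather than this pure-Python loop.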

Item Type: Conference or Workshop Item (Paper)
Subjects: Computer Science > Computer Science
Divisions: Arts and Science > Vinayaka Mission's Kirupananda Variyar Arts & Science College, Salem > Computer Science
Depositing User: Unnamed user with email techsupport@mosys.org
Last Modified: 27 Nov 2025 06:38
URI: https://vmuir.mosys.org/id/eprint/1703
