Comparative Evaluation of Machine Learning Techniques for Social Media Sentiment Analysis

Sentiment analysis has become a key research area within Natural Language Processing (NLP). This study compares the effectiveness of three machine learning techniques, Support Vector Machine (SVM), Random Forest (RF), and Naïve Bayes (NB), in classifying sentiment expressed on social media, using Twitter as the case example. The dataset, obtained from Kaggle through the Kaggle JSON utility, consisted of tweets grouped into three sentiment classes: positive, negative, and neutral. Preprocessing involved text cleaning, tokenization, stop-word removal, and feature extraction with Term Frequency–Inverse Document Frequency (TF-IDF). Of the generated features, the 6,000 most relevant were retained for model training. The three algorithms were implemented in Python within a supervised learning framework and evaluated using accuracy, precision, recall, and F1-score. The SVM model recorded an accuracy of 87%, with precision, recall, and F1-score of 84%, 86%, and 85%, respectively. The NB classifier achieved 83% accuracy, with precision, recall, and F1-score of 83%, 82%, and 82%, indicating relatively balanced performance. The RF model delivered the highest performance, attaining 92% accuracy, 90% precision, 91% recall, and a 91% F1-score. These findings highlight the strength of Random Forest, particularly in handling class imbalance, and show that it outperformed SVM and NB on every evaluation metric, making it the most effective of the models evaluated for sentiment classification on social media.
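The four evaluation metrics named above can be illustrated with a minimal Python sketch. This is not the study's code: the function name `evaluate`, the toy labels, and the choice of macro averaging (an unweighted mean over the three classes) are assumptions made for illustration, since the abstract does not state how the per-class scores were averaged.

```python
from collections import Counter

# Sentiment classes named in the abstract.
LABELS = ("positive", "negative", "neutral")


def evaluate(y_true, y_pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is an assumption; the abstract does not say
    which averaging scheme the study used.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    # Count every (actual, predicted) pair once.
    pairs = Counter(zip(y_true, y_pred))

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = pairs[(label, label)]
        predicted = sum(n for (_, p), n in pairs.items() if p == label)
        actual = sum(n for (t, _), n in pairs.items() if t == label)

        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0

        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    correct = sum(n for (t, p), n in pairs.items() if t == p)
    k = len(labels)
    return {
        "accuracy": correct / len(y_true),
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


# Toy labels, invented for illustration only (not data from the study).
y_true = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
y_pred = ["positive", "neutral", "negative", "negative", "neutral", "positive"]

scores = evaluate(y_true, y_pred)
print(scores)
```

With imbalanced classes, macro averaging weights each class equally regardless of its size, so a model that neglects a minority class is penalized more visibly than it would be by accuracy alone.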
