Analysing Concerns and Expressions of Using ChatGPT on Social Media and Educational Platform: An Application of Natural Language Processing and Machine Learning

Farhana Bina

doi:10.11648/j.ijdsa.20251103.13

Research Article | Peer-Reviewed

Published in International Journal of Data Science and Analysis (Volume 11, Issue 3)

Received: 30 April 2025 Accepted: 14 May 2025 Published: 19 June 2025

Abstract

Advances in Artificial Intelligence technology bring new opportunities and challenges, particularly with the large language model ChatGPT, in various domains and especially in education. This research presents a comprehensive analysis of the concerns and expressions associated with this AI tool on the social media platform X and in academic contexts. Two distinct datasets, comprising X data and survey responses from academics, were used. The research first examines the concerns regarding ChatGPT among X users. Natural Language Processing (NLP) techniques, namely Sentiment Analysis and Topic Modeling using Latent Dirichlet Allocation (LDA), were applied to identify the significant insights expressed by social media users. The analysis found that the most frequently discussed keyword was “ChatGPT”. Among posts with positive sentiment, the dominant topic (49%) focused on the utility of ChatGPT. Among posts with negative sentiment, the dominant topic (47%) concerned students’ cheating in exams and the generation of inaccurate information, which could affect students’ learning skills and critical thinking. Furthermore, approximately 27% of the neutral discussions concerned the generation of content by ChatGPT. Various machine learning models were implemented to predict the sentiment labels. The Random Forest model achieved the highest accuracy (62%) among the models compared. This research also reports academics’ opinions in the context of education. A case study was conducted among academics, in which approximately 59% reported using ChatGPT for academic purposes and 24% reported using the tool occasionally. In terms of its usefulness, 32% of academics consider it useful, especially for generating written content. Additionally, 29% of them believed that the tool primarily improves students’ language and writing skills, but they also expressed concerns that overreliance could impair students’ critical thinking and violate academic integrity. The major keywords of concern for academics include “research”, “accuracy of information”, and “critical thinking”, while those for students include “academic integrity”, “critical thinking”, “risk”, “copy-paste”, and “creativity skills”. Negative sentiment about these concerns was more common with regard to students (38%) than with regard to academics (28%). Overall, academics expressed positive sentiments about the utility of ChatGPT. This research highlights these findings and recommends further exploration of the tool in educational practice, with a focus on the identified concerns, to guide future implementation.

Published in	International Journal of Data Science and Analysis (Volume 11, Issue 3)
DOI	10.11648/j.ijdsa.20251103.13
Page(s)	76-98
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2025. Published by Science Publishing Group

Keywords

ChatGPT, NLP, Sentiment Analysis, Topic Modeling, Machine Learning, Social Media, Education

1. Introduction

The rapid advancement of Artificial Intelligence (AI), particularly in large language models, has had a significant impact on the field of Natural Language Processing (NLP). ChatGPT, an innovation of OpenAI, has brought a new era of AI-driven language understanding and generation across various domains.

Within the realm of education, ChatGPT has become the most prominent topic of discussion. Millions of users share their concerns, opinions, and thoughts on the social media platform X (Figure 1). Their concerns center on the potential benefits and the associated risks of using this powerful language tool.

Figure 1. Word Cloud Image from X Content.

Within academia, recognition of this AI tool is growing very fast. Academics are actively engaging with the tool to explore its potential implications and benefits [12]. On the other hand, the automatic generation of content has become an alarming concern. Students increasingly rely on this tool to complete their assignments, often without considering academic integrity policies. This growing tendency worries educators because of its potential consequences for students’ critical thinking abilities and learning skills [13].

In the educational domain, a synthesis of existing articles, papers, and blogs authored by researchers or academics is carried out below to uncover the primary concerns regarding the use of AI tools.

An article on the impact of ChatGPT in education [1] indicates that ChatGPT plays a beneficial role in enhancing student engagement, customizing learning to student preferences with better learning outcomes, strengthening language skills and abilities, and optimizing teaching practices. However, the article also notes significant concerns about using ChatGPT, including privacy violations, threats to academic integrity, and biased responses. These concerns lead academics to emphasize the use of diverse and representative data to train ChatGPT more effectively [1]. Korkmaz (2023) presents a sentiment analysis of ChatGPT based on X data [9]. That study evaluated the opinions and thoughts expressed around the announcement of ChatGPT, analyzed them with various dictionaries, and concluded that a large number of users anticipated the tool’s success. Sudirman I. D. (2023) found that using ChatGPT in entrepreneurship education enhanced students’ engagement and critical thinking [10]. A blog by Nelson, J. (2023) discusses how the future of education under this AI-driven technology may affect the integrity of academic institutions [14], and reports concerns raised by experts about the use of ChatGPT. Zhai, X. (2022) and Dukewich K. (2023) examine the user experience of ChatGPT in education [15, 17]. Their findings claim that the tool assists researchers in writing research papers in a systematic and informative way. These studies also raise the concern that students may outsource their assignment tasks, and they suggest considering new assessment formats that focus on improving critical thinking and creativity. In the context of academic integrity, Cotton, D. R. (2023) studied the risk of using ChatGPT in examinations [16]. Academic integrity and plagiarism concerns have been raised, along with strategies such as arranging proper educational training and support for using AI tools responsibly [18, 19]. Fütterer et al. (2023) provided a global picture of ChatGPT in education based on X responses [11]. That study found a mix of sentiments among users, examined through sentiment analysis and topic modeling. Overall, multiple studies have shown the impact of ChatGPT in education and examined the threats of using this tool in teaching, research, assessment, and learning [20-26].

From the above discussion, it is evident that ChatGPT has attracted a growing level of interest among researchers. This research seeks to answer the following research questions:

(1) What are the reactions articulated by X users regarding the topic based on ChatGPT on social media platform?

(2) What are the key concerns and worries for both academics and students for using ChatGPT in educational contexts?

(3) In what manner do these concerns express as either positive or negative sentiments?

To answer the research questions above, this research focused on the following objectives:

(1) Conduct a comprehensive analysis to examine the concerns expressed by X users on social media platform;

(2) Investigate the users’ concerns through sentiment analysis, and topic modeling using Latent Dirichlet Allocation (LDA);

(3) Examine the levels of sentiment within the X content;

(4) Predict the sentiment levels through machine learning approaches;

(5) Evaluate the underlying concerns, sentiment and relevant factors by an explanatory data analysis among academics in a case study;

(6) Examine how the expressed concerns align with AI in education and its potential impact on teaching and learning.

In this context, this research focused on identifying the potential concerns about using ChatGPT, drawing on users’ opinions from the social media platform X and from academia.

2. Materials and Methods

2.1. Data Collection and Data Description

The data collection involves both primary and secondary sources.

2.1.1. Primary Data

The primary data collection focuses on a case study among academics from different universities. A questionnaire was developed to conduct an online survey, and the academics were contacted through email. The purpose of the survey was to discover the concerns, insights, and thoughts of academics regarding the use of ChatGPT in education. A total of 37 respondents participated, providing essential information for analysis. The questionnaire includes several questions about the academics’ user experience, such as the purposes for which they use ChatGPT in their academic work, the frequency of their usage, the benefits for students and academics, and the associated risks and concerns.

2.1.2. Secondary Data

To investigate the implications of ChatGPT in education on social media, an X dataset spanning January 2023 to March 2023 was collected from Kaggle [3, 8]. The original data collection process used snscrape in Python to scrape the data from X using the X API tool [4]. The dataset comprises a CSV file containing 500,000 X posts related to ChatGPT, with the variables date, id, content, username, like count, and retweet count. This study analyzes the insights related to ChatGPT in this dataset and examines the concerns associated with the use of this AI-powered technology in education [2].

2.2. Data Pre-processing

Two distinct datasets were used in this research, and each underwent a series of data pre-processing steps.

2.2.1. Survey Data Pre-processing

1. Data Cleaning

A total of 37 responses were received from the online survey. Because this research focuses only on information from academics, the 3 responses from non-academic participants were removed from the data. There is no missing data.

2. Data Transformation

To make the survey data suitable for analysis, the survey responses were exported from Google Drive to a Microsoft Excel worksheet in comma-delimited (CSV) format. The data were then imported into Python for analysis.

3. Variable Selection

The questionnaire data contain a timestamp, a username, and the responses to the research questions about the academics’ user experience. The analysis uses mainly the responses of the academics. The personal information of the participants is not disclosed anywhere in this research study.

2.2.2. X Data Pre-processing

The X dataset used in this study was pre-processed in the following steps, as detailed by Ansari [3].

1. Data Cleaning

The X dataset contained missing values and some inaccurately entered values, which were removed. Since the dataset is large, deleting these records does not affect the overall analysis [3].

2. Pre-processing for Sentiment Analysis and LDA

To ensure the accuracy of the sentiment analysis and avoid misleading the natural language processing steps, hyperlinks, hashtags, and account mentions were removed from the data. Punctuation, emoticons, and stop words were also removed. All text was converted to lower case, and contractions in the X posts were expanded [3].

3. Data Transformation

To prepare the text for analysis, tokenization split each text into a list of sub-words. To reduce words to their root forms, all words were lemmatized and stemmed, which improves the accuracy of the sentiment analysis [6]. In this step, suffixes and prefixes were removed from the text [3].

4. Vectorization

This step was carried out to identify the semantic relationships among words and to find the most frequently used words in the X posts [3].

5. Pre-processing for Topic Modeling

To identify the possible topics within the X posts, unnecessary words were removed so that only the important words remained in each of the sentiment data frames (positive, negative, neutral) [3]. The retrieved X posts were then converted into a text corpus [6].

6. Pre-processing for Machine Learning

To implement the ML models, the dataset was split into training and test sets.

2.3. Natural Language Processing

Natural Language Processing techniques are widely used to process human language effectively. Applications include language translation, summarization, speech recognition, sentiment analysis, and so on [9]. This research focuses on sentiment analysis and topic modeling using LDA to analyze the X content and survey responses.

2.3.1. Sentiment Analysis

Sentiment analysis, a branch of Natural Language Processing, allows us to extract insights and sentiments from X users’ posts related to ChatGPT. In this research, both lexicon-based and machine learning-based approaches were applied. The lexicon-based implementation follows a “Dictionary-Based Approach”; the machine learning approach is discussed in Section 2.4. The posts and responses were categorized as “positive”, “negative”, or “neutral”, and sentiment polarity and subjectivity scores were computed for comparison.
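As an illustration of the dictionary-based idea only, the following minimal sketch scores a tokenized post against a small word list and maps the score to one of the three labels. The lexicon entries, their weights, the averaging rule, and the 0.05 threshold are all hypothetical and are not the dictionary or settings used in the study.

```python
# Toy lexicon: hypothetical words and weights chosen for illustration only.
LEXICON = {
    "useful": 1.0, "great": 1.0, "helpful": 0.5,
    "wrong": -1.0, "cheating": -1.0, "risk": -0.5,
}

def polarity(tokens):
    """Average the weights of the tokens found in the lexicon (0.0 if none match)."""
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def label(score, threshold=0.05):
    """Map a polarity score to a sentiment label; the threshold is an assumption."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

posts = [
    "chatgpt is useful and great",
    "students cheating with chatgpt",
    "chatgpt wrote an essay",
]
labels = [label(polarity(p.split())) for p in posts]
print(labels)
```

A production lexicon additionally handles negation, intensifiers, and subjectivity, which this sketch leaves out.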

2.3.2. Topic Modeling Using LDA

To extract the underlying topics embedded within the text data, a widely used topic modeling algorithm, Latent Dirichlet Allocation (LDA), was employed. The following steps were followed to implement LDA:

(i). Data Preparation

This step involves text cleaning, tokenization, stemming, count vectorization, sparse matrix conversion, and mapping words.

(1) Count Vectorization

The pre-processed text data was transformed into numerical vectors of token counts for each sentiment category (using the “sklearn” toolkit: “feature_extraction”, “CountVectorizer”, “TfidfVectorizer”).

(2) Sparse Matrix

To handle large text datasets, it is essential to transform the text data into a sparse matrix format (using “gensim”, “Dictionary”). In this step, the text data was converted into a sparse matrix format for each sentiment category.

(3) Gensim Corpus Conversion

The sparse matrices were then converted into the gensim corpus format using “gensim” library.

(4) Mapping Word IDs to Words

Each word was assigned a unique ID. A mapping (“id2word”) was then created to connect these word IDs to their corresponding words. This mapping is used to decode the topics generated by LDA and associate them with meaningful terms from the original text corpus.
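The data structures described in the steps above can be sketched in plain Python. This is a hand-rolled illustration of a word-to-ID mapping and a sparse bag-of-words corpus, not the sklearn or gensim code used in the study; the two toy documents are hypothetical.

```python
from collections import Counter

# Two hypothetical pre-processed (tokenized) posts.
docs = [
    ["chatgpt", "write", "essay"],
    ["chatgpt", "answer", "question", "chatgpt"],
]

# Assign each distinct word an integer ID in first-seen order.
word2id = {}
for doc in docs:
    for word in doc:
        word2id.setdefault(word, len(word2id))

# Reverse mapping, used later to turn topic word IDs back into readable words.
id2word = {i: w for w, i in word2id.items()}

def to_bow(doc):
    """Sparse count vector: sorted (word_id, count) pairs, zero counts omitted."""
    counts = Counter(word2id[w] for w in doc)
    return sorted(counts.items())

corpus = [to_bow(doc) for doc in docs]
print(corpus)
print(id2word)
```

Storing only the non-zero (ID, count) pairs is what keeps the representation compact when the vocabulary is large and each post is short.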

(ii). Data Training LDA Model

After transformation and mapping, an LDA model was trained for each sentiment category to identify the possible topics in the text. This involved setting parameters such as the number of topics, the id2word mapping, the random state, and the number of passes. The LDA models were trained with 10 passes (iterations over the entire gensim corpus).

(iii). Identification of Topics

Once the LDA model was trained, the next step was to identify and interpret the topics in the X text content. The most frequent words within each topic were examined, and each topic was given a meaningful label. These discovered topics provide valuable insights into the X posts about ChatGPT.

2.4. Machine Learning Approach

This research focused on the following machine learning approaches to classify the sentiments of X users regarding potential aspects of ChatGPT in order to gain valuable insights into public perceptions and opinions. The following steps have been undertaken to implement the machine learning algorithms.

2.4.1. Dataset Preparation

The pre-processed dataset has been divided into two components: features (X) and sentiment labels (positive, negative, neutral) (y).

2.4.2. Data Splitting

To evaluate the models effectively, the cleaned dataset was split into training (80%) and test (20%) sets using the “train_test_split” function.

2.4.3. Machine Learning Models

The following machine learning algorithms (Table 1) were used to predict the positive, negative, and neutral sentiment classes of the X posts, using the listed classes from the “sklearn” toolkit.

Table 1. Machine Learning Models.

SI.	ML Models	Library
1.	Logistic Regression	“LogisticRegression”
2.	Random Forest	“RandomForestClassifier”
3.	Support Vector Machine	“SVC”
4.	Multinomial Naive Bayes	“MultinomialNB”
5.	K-Nearest Neighbors	“KNeighborsClassifier”
6.	Decision Tree	“DecisionTreeClassifier”
7.	Gradient Boosting	“GradientBoostingClassifier”
* Maximum iteration: 1000; * Maximum features: 100

2.4.4. Text Classification Pipeline

For each model a text classification pipeline has been created that includes TF-IDF vectorization and the model itself.

2.4.5. Training the ML Models

Each machine learning model has been fitted to the training data.

2.4.6. Model Prediction

Each trained model was then used to predict the sentiment labels of the test data.

2.4.7. Model Evaluation

To assess the performance of the ML models, the “classification_report” function from scikit-learn was used. This report provides detailed metrics (precision, recall, f1-score) for each sentiment category, along with the overall model accuracy.
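The workflow in Sections 2.4.1 to 2.4.7 can be sketched with scikit-learn as follows. This is a minimal illustration on hypothetical toy data, not the study's code. It assumes that the "Maximum features: 100" note under Table 1 refers to the TF-IDF vectorizer and that "Maximum iteration: 1000" refers to Logistic Regression; the stratified split and the random seed are also assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical labeled posts (10 per class) standing in for the X dataset.
positive = ["chatgpt is a useful writing assistant", "great tool for research and content"]
negative = ["students are cheating with chatgpt", "chatgpt gives wrong answers"]
neutral = ["chatgpt generated an answer", "searching with the new bing chatbot"]
texts = positive * 5 + negative * 5 + neutral * 5
labels = ["positive"] * 10 + ["negative"] * 10 + ["neutral"] * 10

# 80/20 split; stratification keeps all three classes in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# Pipeline: TF-IDF features followed by the classifier.
model = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=100)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Per-class precision, recall, f1-score, plus overall accuracy.
report = classification_report(y_test, y_pred, zero_division=0)
print(report)
```

Any other estimator from Table 1 can be swapped into the `"clf"` step without changing the rest of the pipeline.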

2.5. Software and Tools

The Python programming language and Microsoft Excel were used to produce the results of this research. The open-source Python code published on Kaggle by Ansari (2023a) [2] was followed in carrying out a few steps of the X data analysis. A variety of libraries and packages were used for the different tasks:

(1) Data visualization Tasks: “matplotlib.pyplot”, “seaborn”, “plotly”, “WordCloud”;

(2) Natural Language Processing Tasks: “nltk”, “word_tokenize”, “WordNetLemmatizer”, “SentimentIntensityAnalyzer”, “stopwords”, “collections”;

(3) Topic Modelling: “gensim”;

(4) LDA (Latent Dirichlet Allocation) tasks: “pyLDAvis.gensim”, “gensim.corpora.dictionary”;

(5) TF-IDF (Term Frequency-Inverse Document Frequency) tasks: “TfidfVectorizer”;

(6) Other tasks: “counter”, “stats”, “image”;

(7) ML Tasks: “LogisticRegression”; “RandomForestClassifier”; “SVC”; “MultinomialNB”; “KNeighborsClassifier”; “DecisionTreeClassifier”; “GradientBoostingClassifier”.

3. Results

3.1. X Data Analysis

The X dataset contains 500k posts on ChatGPT from January to March 2023. The data descriptions are shown in Table 2 and Table 3.

Table 2. X Data Description.

Data Description
Length	500036
Shape	(500036, 6)
Index	[‘date’, ‘id’, ‘content’, ‘username’, ‘like_count’, ‘retweet_count’]
Type	Object

Table 3. Number of unique and missing values in each column.

Index	No. of unique values in each column	No. of missing Values in each column
date	475394	0
id	500007	6
content	493744	6
username	250006	34
like_count	1066	62
retweet_count	489	62

3.1.1. Data Pre-processing

In order to prepare the data for sentiment analysis, topic modeling and machine learning model training, a sequence of data pre-processing steps have been followed to clean the raw data. The following tasks (Table 4) have been performed to pre-process the data.

Table 4. Data Pre-processing Tasks.

SI.	Tasks	Details
1.	Date conversion:	The date column converted into date time using “datetime” and “timedelta” libraries
2.	Missing values removal:	Missing values were removed using “dropna”
3.	Hashtag removal:	Any hashtags mentioned in the tweet content were removed
4.	URL link removal:	URL and web links were removed to eliminate any web references
5.	HTML conversion:	HTML entities were converted (“&amp;” to “and”, “&lt;” to “<”, “&gt;” to “>”) to ensure text consistency
6.	New line character removal:	To maintain text coherence, new line characters such as ‘\r’ and ‘\n’ were replaced with a space
7.	Account name removal:	Mentioned X id or account details were removed as they do not carry any sentiment information
8.	Expanding contractions:	Contractions like “don’t” were expanded to full forms “do not” for consistency using “contractions” library
9.	Punctuation and emoji removal:	Special characters (‘@’, ‘#’, ‘$’, ‘%’, ‘*’, ‘&’), punctuation (‘.’, ‘,’, ‘!’, ‘:’, ‘?’, ‘”’), and emojis were removed to focus on the tweet content
10.	Lowercasing words:	All texts were converted to lowercase to ensure uniformity
11.	Multiple space removal:	Multiple spaces were replaced with single space
12.	Stopwords removal:	Stopwords (e.g., ‘and’, ‘the’, ‘is’) were removed from the text
13.	Tokenization:	Word tokenization was used to break the texts down into sub-words using “word_tokenize” from the “NLTK” toolkit
14.	Lemmatization:	Words were lemmatized to their root or dictionary form using “WordNetLemmatizer” from the “NLTK” library
15.	Vectorization:	Text data were converted to numerical vectors for sentiment analysis, topic modeling, and training ML algorithms. Count vectorization and TF-IDF vectorization were used to assign numerical values to words based on their frequency. Sparse matrices were converted to a corpus using the “gensim” library

These steps ensure that the raw data is clean, consistent, and ready for further analysis and implementations.
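Several of the cleaning tasks in Table 4 can be sketched with the Python standard library alone. The helper names, regular expressions, sample post, and toy stopword list below are hypothetical; in particular, the sketch assumes that hashtags and mentions are dropped whole (symbol and word), and it omits contraction expansion, tokenization, and lemmatization, which rely on the “contractions” and “NLTK” libraries.

```python
import html
import re

# Toy stopword list for illustration; the study used a full stopword list.
STOPWORDS = {"and", "the", "is", "a", "to", "of"}

def clean_post(text: str) -> str:
    """Apply a subset of the Table 4 cleaning steps, in order."""
    text = text.replace("&amp;", "and")                  # "&amp;" -> "and", as in Table 4
    text = html.unescape(text)                           # remaining entities such as "&lt;"
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # URLs and web links
    text = re.sub(r"[@#]\w+", " ", text)                 # account mentions and hashtags
    text = re.sub(r"[\r\n]+", " ", text)                 # new line characters
    text = text.lower()                                  # lowercasing
    text = re.sub(r"[^a-z0-9\s]", " ", text)             # punctuation, symbols, emojis
    return re.sub(r"\s+", " ", text).strip()             # collapse multiple spaces

def remove_stopwords(text: str) -> str:
    return " ".join(word for word in text.split() if word not in STOPWORDS)

sample = "Check #ChatGPT &amp; the new model!\nIt is great @OpenAI https://example.com 😀"
cleaned = remove_stopwords(clean_post(sample))
print(cleaned)  # check new model it great
```

The order matters: URLs are removed before punctuation so that link fragments do not survive as stray tokens.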

3.1.2. Keyword Distribution

Figure 2. Keyword Distribution in X Posts.

Before implementing sentiment analysis and topic modeling, the pre-processed X content was scanned for specific keywords of interest (Figure 2). Each keyword represents a term, and the figure helps to identify the most prominent words in the posts. Notably, the keyword ‘chatgpt’ appeared approximately 105,826 times. The high frequency of the major keywords provides insight into the use of ChatGPT for tasks such as ‘content’ creation, ‘writing’, ‘research’, ‘programming’, and ‘assignment’. It is also worth noting that the keyword ‘student’ appeared 7,948 times, which is higher than the counts for ‘teacher’ (3,548) and ‘academic’ (1,732). Additionally, the words ‘fear’, ‘worry’, and ‘concern’ collectively appeared 5,944 times, a number too large to be overlooked. These words were mentioned by X users in the context of ‘plagiarism’, ‘academic integrity’, ‘copy’, and ‘cheat’, among other topics.
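A keyword tally of this kind can be sketched as below. The sample posts and keyword list are hypothetical, and the sketch assumes that every occurrence of a keyword is counted (rather than the number of posts containing it), which the text does not specify.

```python
from collections import Counter

# Hypothetical keywords of interest and pre-processed posts.
KEYWORDS = ["chatgpt", "student", "teacher", "academic"]
posts = [
    "chatgpt help student write",
    "teacher worry student cheat chatgpt",
    "chatgpt chatgpt everywhere",
]

def keyword_counts(posts, keywords):
    """Count every occurrence of each keyword across all posts."""
    counts = Counter()
    for post in posts:
        tokens = post.split()
        for keyword in keywords:
            counts[keyword] += tokens.count(keyword)
    return counts

counts = keyword_counts(posts, KEYWORDS)
print(counts.most_common())
```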

The most frequently mentioned words in the X posts (Figure 3) include ‘chatgpt’, which accounts for 42% of the top-10 word occurrences, and ‘ai’, which accounts for 18%. In the next step, sentiment analysis was implemented to categorize the X posts about ChatGPT by sentiment.

Figure 3. Top 10 Used Words in X Posts.

3.1.3. Sentiment Analysis of X Text

Figure 4. Trend of Sentiment Distribution Over Time.

In natural language processing (NLP), sentiment analysis categorizes the opinions expressed in text data by sentiment. The X posts were categorized into ‘positive’, ‘negative’, and ‘neutral’ sentiment labels to uncover trends, concerns, and potential insights. Figure 4 illustrates the trend in sentiment towards ChatGPT over the timeframe. The positive (green line) and neutral (blue line) sentiment counts fluctuate, while the negative (red line) sentiment count follows an upward trend, which indicates that negative sentiment towards ChatGPT increased over time. Notably, positive and neutral sentiment peaked between 2023-02-01 and 2023-02-15, and again around 2023-03-15. However, analyzing time-series events is beyond the scope of this study.

The pie chart (Figure 5) shows the distribution of sentiments within the X posts. The majority of the sentiments regarding ChatGPT are positive (51%), compared with 34% neutral and only 15% negative.

Figure 5. Sentiment Distribution of X Posts.

3.1.4. Topic Modeling of X Text Using LDA

Latent Dirichlet Allocation (LDA) was applied for topic modeling to identify the most frequent topic words associated with word weight distribution within the X text data.

(i). Identification of Positive Topics

The most frequent words in each of the five topics from the positive X posts are shown in Figure 6. The plot shows the weights that indicate the importance of specific words (“chatgpt”, “value”, “use”, “content”, etc.) within each labeled topic.

Figure 7 illustrates the overall distribution of the five positive topics most discussed in X posts related to ChatGPT. Among these topics, Topic 2 stands out as the most dominant, accounting for approximately 49% of the positive X content, and Topic 5 is the least dominant (about 3%).

Based on the words in Figure 6, a subjective inference about the topics was made (Table 5). Among the positive posts, Topic 2, “Utility of ChatGPT”, was the most prevalent subject discussed by X users. This indicates that users were engaged in conversations about the practical utility of ChatGPT in relation to business scenarios, stakeholders, policy makers, or educators. The second most prominent topic (Topic 4, about 18%) relates to the use of ChatGPT for writing assistance. X users also positively discussed the generation of free content through prompts to this AI tool (Topic 3), which accounts for about 17% of the positive discussion.

Figure 6. Weights of Words for Positive X Topics.

Figure 7. Topic Modeling for Positive X Contents.

Table 5. Topic Names for Positive X Contents.

Topics	Topic Names	Percentage
Topic 1:	ChatGPT OpenAI language model tool and search engine	13.30%
Topic 2:	Utility of ChatGPT	49.33%
Topic 3:	Create free content using ChatGPT prompt	16.66%
Topic 4:	ChatGPT good writing assistance	17.75%
Topic 5:	Potentials of AI technology in the future	2.96%

(ii). Identification of Negative Topics

Figure 8 presents the weights of the words most discussed by users in negative X posts.

Figure 8. Weights of Words for Negative X Topics.

Notably, this figure highlights dominant words such as “chatgpt”, “student”, “wrong”, “cheating”, “write”, and “think”, which underscore concerns about using ChatGPT in an educational context.

Furthermore, Topic 2, inferred as “Inaccuracy and concerns of cheating in examinations” (Table 6), has the highest share (47%) of the negative content, as shown in Figure 9. Users expressed concerns that students’ use of AI tools can lead to misuse and cheating during examinations. Academic integrity is also a matter of concern, and inaccurate information can negatively affect students’ learning skills.

Additionally, the second largest negative topic (Topic 1, about 22%) concerns the use of ChatGPT by students in school (Table 6) and how overreliance on this tool can affect people’s critical thinking abilities and creativity skills.

Figure 9. Topic Modeling for Negative X Contents.

Table 6. Topic Names for Negative X Contents.

Topics	Topic Names	Percentage
Topic 1:	Using ChatGPT by students in school and impact thinking	21.62%
Topic 2:	Inaccuracy and concerns of cheating in examinations	47.46%
Topic 3:	New AI model and user accessibility	17.23%
Topic 4:	Limitations of ChatGPT	6.99%
Topic 5:	Potential threats by search engines (Google AI, ChatGPT)	6.70%

Approximately 17% of the negative content (Topic 3) involves discussions among X users about the accessibility challenges and user experience of the new AI model. Topic 4 comprises 7% of the discussions and focuses on the limitations and drawbacks of ChatGPT, which are also a source of negative concern. Lastly, Topic 5 (6.70%) appears to concern potential threats posed by search engines (e.g., Google AI and ChatGPT), particularly regarding privacy and security (Table 6).

(iii). Identification of Neutral Topics

The words related to the neutral topics are shown in Figure 10, which presents the weights of the words most discussed by users in neutral X posts. Notably, this figure highlights dominant words such as “chatgpt”, “write”, “generated”, “answer”, “search”, and “use”.

The most frequent neutral topics associated with ChatGPT are shown in Figure 11 and Table 7. Topic 1, “Generating contents by ChatGPT”, has the highest share at about 27%; most of these users discussed the content generated by ChatGPT. Topic 3, nearly as prominent as Topic 1, is about searching for and retrieving answers using the AI prompt and accounts for 26% of the neutral discussion. Furthermore, Topic 2, on the possibilities and limitations of using ChatGPT in the future, comprises 22% of the overall neutral X content.

Figure 10. Weights of Words for Neutral X Topics.

Figure 11. Topic Modeling for Neutral X Contents.

Approximately 16% and 9% of the content involves discussions of the remaining topics: conversations with the chatbot to assist in writing (Topic 5) and the performance of the AI tool with Microsoft and Bing in 2023 (Topic 4), respectively.

Table 7. Topic Names for Neutral X Contents.

Topics	Topic Names	Percentage
Topic 1:	Generating contents by ChatGPT	26.98%
Topic 2:	Possibilities and limitations of OpenAI ChatGPT in future	22.44%
Topic 3:	Search questions and retrieve answers using ChatGPT	26.13%
Topic 4:	ChatGPT with Microsoft AI and Bing in 2023	8.58%
Topic 5:	AI-Powered conversations	15.86%

In the next step, different machine learning models were trained on the X dataset to examine model performance based on accuracy measures for each sentiment category.

3.1.5. Implementation of Machine Learning Models

The performance of several machine learning models was evaluated on the sentiment classification task. The classification reports include precision, recall, f1-score, and accuracy (Table 8).

Table 8. Classification Report of Machine Learning Models.

ML Models	Sentiments	precision	recall	f1-score	Accuracy
Logistic Regression:	Negative	0.34	0.01	0.02	0.60
	Neutral	0.54	0.64	0.58
	Positive	0.65	0.75	0.70
Random Forest:	Negative	0.39	0.09	0.14	0.62
	Neutral	0.54	0.74	0.62
	Positive	0.71	0.70	0.70
Support Vector Machine:	Negative	0.44	0.00	0.01	0.60
	Neutral	0.52	0.67	0.59
	Positive	0.66	0.73	0.69
Multinomial Naïve Bayes:	Negative	0.26	0.00	0.00	0.57
	Neutral	0.55	0.33	0.41
	Positive	0.57	0.89	0.70
K-Nearest Neighbour:	Negative	0.25	0.18	0.21	0.56
	Neutral	0.50	0.71	0.59
	Positive	0.72	0.58	0.64
Decision Tree:	Negative	0.28	0.15	0.20	0.59
	Neutral	0.52	0.76	0.62
	Positive	0.73	0.61	0.66
Gradient Boosting:	Negative	0.57	0.01	0.02	0.58
	Neutral	0.58	0.38	0.46
	Positive	0.58	0.89	0.70

Comparing the accuracy values, Random Forest achieved the highest accuracy (0.62), while K-Nearest Neighbour (KNN) achieved the lowest (0.56). However, the choice of the best model also depends on other metrics such as precision, recall, and f1-score. Since the f1-score balances precision and recall, we use it to evaluate the models’ performance on each sentiment class.

(1) For Negative Sentiments: None of the models perform particularly well, as all the f1-scores are below 0.25. KNN has the highest f1-score among all the models at 0.21, and Decision Tree has an f1-score of 0.20.

(2) For Neutral Sentiments: The f1-scores range from 0.41 to 0.62, with Random Forest and Decision Tree sharing the highest f1-score at 0.62, and Multinomial Naïve Bayes having the lowest at 0.41.

(3) For Positive Sentiments: All the models performed quite well for positive sentiment classifications, with f1-scores ranging from 0.64 to 0.70.
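The f1-score used in this comparison is the harmonic mean of precision and recall. As a quick check, the following minimal sketch recomputes two of the negative-class f1-scores from the rounded precision and recall values in Table 8. The helper name `f1_from_pr` is illustrative and is not taken from the study's code; because the inputs are already rounded to two decimals, a recomputed value can differ from a tabulated one in the last digit for other rows.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Negative-class precision and recall as reported in Table 8.
knn_negative = f1_from_pr(0.25, 0.18)
tree_negative = f1_from_pr(0.28, 0.15)

print(f"KNN negative f1: {knn_negative:.2f}")            # 0.21
print(f"Decision Tree negative f1: {tree_negative:.2f}")  # 0.20
```

The harmonic mean is pulled towards the smaller of the two inputs, which is why models with near-zero recall on the negative class obtain near-zero f1-scores even when their precision is moderate.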

In summary, determining the best-performing model depends on several factors, such as parametrization, computational efficiency, interpretability, and specific requirements. Based on the above classification report, Random Forest is the top-performing model overall: it has the highest accuracy (0.62) and the highest or joint-highest f1-scores for the neutral (0.62) and positive (0.70) classes, although KNN (0.21) and Decision Tree (0.20) achieve higher f1-scores than Random Forest (0.14) on the negative class.

Drawing on the concerns, sentiments and topics regarding ChatGPT extracted from the social media platform X above, a brief case study was conducted to investigate the application of AI tools in an educational context from an academic perspective.

3.2. Survey Data Analysis

In this section, sentiment analysis is applied to the survey data to discover concerns, insights, and thoughts regarding the use of ChatGPT in education.

3.2.1. Usage of ChatGPT by Academics

The pie chart (Figure 12) shows the usage of ChatGPT by academics: a larger share (58%) of them use this AI tool for academic purposes. A notable portion of academics use ChatGPT regularly (14.7%) or occasionally (23.5%) for their academic work (Figure 13).

Figure 12. Usage of ChatGPT by Academics.

Figure 13. Frequency of using ChatGPT by Academics.

Our survey findings show that a large portion (32%) of respondents find ChatGPT useful, while 26% expressed neutral views regarding its utility in academic tasks (Figure 14).

Figure 14. Usefulness of ChatGPT in Academic Tasks.

Figure 15. ChatGPT Accuracy Responses.

Figure 15 provides insight into the reliability of ChatGPT-generated information as perceived by the academics. The largest group (41%) considers the accuracy of the generated information to be moderate, while 26% believe that the information it provides is mostly accurate.

However, since ChatGPT operates as a tool, its accuracy depends on various features and variables. It gathers information from diverse sources, which can lead to inaccurate results. Consequently, none of the academics considers the responses generated by ChatGPT to be entirely accurate.

3.2.2. Major Purpose of Using ChatGPT

Figure 16 illustrates that 28% of the surveyed academics use this AI tool to help with their writing tasks. Additionally, academics find ChatGPT to be a valuable tool for generating ideas and brainstorming. ChatGPT also assists academics in programming and in preparing lectures and quizzes. Some respondents may not be using ChatGPT for these specific tasks. Overall, our survey indicates that ChatGPT helps academics mostly with writing-related activities.

Figure 16. Purpose of Using ChatGPT in Academic Works by Academics.

3.2.3. Allowing Students to Use ChatGPT

Academics were asked whether they allow their students to use ChatGPT. Approximately 27% of respondents were open to allowing their students to use ChatGPT, whereas 23% were opposed (Figure 17).

However, the decision to allow students to use AI tools in educational settings depends on various factors and considerations. It is essential to address both the benefits and the concerns of permitting their use, along with the ethical considerations of the academic community.

Figure 17. Allowing Students to Use ChatGPT.

3.2.4. Concerns of Academics About Potential Benefits of Using ChatGPT by Students

In our survey, academics shared their views on the potential benefits of this AI tool for students. They believe that ChatGPT primarily assists students in improving their language and writing skills (Figure 18). They also acknowledge that the tool can help enhance students’ critical thinking abilities. However, students also use ChatGPT to write their assignments automatically, which can make them dependent on it and raise ethical concerns.

Figure 18. Potential Benefits of using ChatGPT by Students.

3.2.5. Major Concerns for Academics and Students

Academics expressed their thoughts, worries and concerns about the use of ChatGPT by both themselves and students; these are shown in Table 9 and Table 10.

Table 9. Major Concerns for Academics.

	Major concerns for Academics
1.	Huge dependency on this for their research work. There's a risk that their own critical thinking and research skills might be side-lined.
2.	Reading book will impact.
3.	For academics, the concern is that the information given by this tool is highly questionable.
4.	Academics are supposed to be critical thinkers, so I think they are less in risk of falling in traps like students, however sometimes with the workload being vast, they might overlook the fact that information could be wrong in favor of getting more done in less time.
5.	Inaccurate information. The major concern for academics using ChatGPT for academic purposes is the challenge of assessing the originality and authenticity of the generated content, potentially compromising the integrity of research and scholarly work.

Table 10. Major Concerns for Students.

	Major concerns for Students
1.	Using ChatGPT for academic purposes is the potential risk of overreliance on the tool.
2.	However my concerns are that it is mostly used for giving answers to questions that require critical thinking or effort from the student's side, leaving them without the skills that they are meant to have after completing their studies.
3.	The major concern for students using ChatGPT for academic purposes is the risk of over-reliance on the tool, which could hinder their critical thinking and independent problem-solving skills.

These concerns were identified through a comprehensive keyword search (Figure 19). Notably, a significant concern about using ChatGPT in academic research is the risk of generating inaccurate information. Academics need to evaluate and verify the content generated by ChatGPT and consider the ethical issues. Moreover, excessive dependency on this tool can affect critical thinking and creativity, which may lead to poor-quality research.

From our survey, the major concerns of academics regarding students’ use of ChatGPT for academic purposes were also extracted. Notably, academics consider academic integrity the foremost concern (Figure 20). Academic tasks assigned to students are expected to be completed through their own critical thinking and effort. However, the automated generation of assignments and the copying of content without proper citation potentially violate academic integrity standards and affect students’ creativity. AI tools like ChatGPT can generate inaccurate content, and the worry is that students may not spend enough time verifying the accuracy of the information, which can compromise their learning. As previously mentioned, this tool can be used primarily to enhance writing skills within the boundaries of academic integrity standards, without students becoming overly reliant on it.

Figure 19. Major Concerns for Academics.

Figure 20. Major Concerns for Students.

3.2.6. Sentiment Analysis

A sentiment analysis was conducted on the survey data. The respondents expressed their concerns for academics in 21 sentences comprising 425 words, and their concerns for students in 29 sentences comprising 505 words. That is, academics wrote more about concerns for students than about concerns for themselves. The sentiment scores are reported in Table 11.

The average sentiment score for concerns about academics is -0.06232, and the score for concerns about students is also negative, at -0.03873. Despite this negative sentiment, the valuable insights raised by academics have a positive overall sentiment (0.14793), because academics cannot deny the benefits of this AI tool: it offers advantages despite the concerns and risks.

Table 11. Sentiment Scores.

Sentiment Scores	Concerns for Academics	Concerns for Students	Valuable Insights from Academics
Average Sentiment Score:	-0.06232	-0.03873	0.14793
Sentiment Label:	Negative	Negative	Positive
Total sentences:	21	29	29
Total words:	425	505	439
Sentiment Polarity:	0.04398	0.08841	0.15068
Sentiment Subjectivity:	0.49554	0.42447	0.44307

The sentiment polarity and subjectivity scores provide insight into the emotional tone and the degree of subjectivity in the academics’ responses. The polarity values are positive for both academics and students, indicating an overall positive tone in the concerns despite the negative average sentiment scores.

Table 12 shows the average scores for positive and negative sentiments. For both groups, the negative sentiments regarding the use of ChatGPT are higher than the positive sentiments.

Table 12. Average Sentiment Scores.

Sentiment Types	Sentiment Score for Academics (%)	Sentiment Score for Students (%)
Positive Sentiment	12%	11.54%
Negative Sentiment	28%	38.46%

Among the sentiments expressed by educators regarding academics, negative sentiments (28%) outweigh positive ones (12%), indicating significant concerns and reservations about using ChatGPT, including the possibility of poor-quality research, negative impacts on critical thinking, overreliance, and the presence of inaccurate information. The respondents emphasize addressing these concerns when using ChatGPT in an academic context.

Similarly, among the sentiments expressed by academics regarding students, negative sentiments (38%) outweigh positive ones (about 12%). The academics are concerned that students’ use of this tool in an academic context may weaken their learning and thinking capabilities, and they fear potential academic integrity issues.

4. Discussion

The key objective of this research was to investigate the underlying concerns, sentiments, and relevant factors related to OpenAI’s large language model “ChatGPT” through sentiment analysis and topic modeling using LDA, and to implement machine learning models for the sentiment classification task.

4.1. Discussion on X Data Analysis

The analysis of X posts about ChatGPT used an X dataset of about 500k records and involved extensive pre-processing, including the removal of hashtags, URLs, IDs, missing values, emoticons, and extra spaces. Datetime values were also converted and transformed, and the text was converted to lowercase. These steps prepared the data for further analysis. To perform the Natural Language Processing (NLP) tasks, we further prepared the data through tokenization, lemmatization, and vectorization using libraries from the “nltk” toolkit.
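The cleaning steps listed above can be sketched with Python's standard `re` module. This is a minimal illustration rather than the study's actual pipeline: the function names and regular expressions are assumptions, whole hashtag tokens are dropped (stripping only the `#` symbol is an equally plausible reading), and lemmatization with nltk is omitted to keep the example dependency-free.

```python
import re

# Illustrative patterns (assumptions, not taken from the study's code).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")        # user IDs / handles
HASHTAG_RE = re.compile(r"#\w+")        # whole hashtag token
NON_ALPHA_RE = re.compile(r"[^a-z\s]")  # digits, punctuation, emoticons


def clean_post(text: str) -> str:
    """Lowercase a post and strip URLs, handles, hashtags and non-letters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace


def tokenize(text: str) -> list[str]:
    """Split a cleaned post into whitespace-separated tokens."""
    return clean_post(text).split()


example = "Loving #ChatGPT!! It wrote my essay 😀 https://example.com @openai"
print(clean_post(example))  # loving it wrote my essay
```

Removing URLs and handles before the non-letter filter matters here: otherwise fragments such as "https" or "com" would survive as spurious tokens.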

A keyword distribution analysis revealed the most frequent terms associated with ChatGPT discussions on X, including “chatgpt”, “ai”, “like”, and “use”. Among the top 10 most frequent words, “chatgpt” appeared most often, with 221,654 occurrences.
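A keyword distribution of this kind amounts to counting tokens across all posts and ranking them. The sketch below uses `collections.Counter` on a few made-up token lists; the data and the helper name `top_terms` are hypothetical and only illustrate the counting step.

```python
from collections import Counter


def top_terms(token_lists, n=10):
    """Return the n most frequent tokens across all posts as (token, count) pairs."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts.most_common(n)


# Hypothetical tokenized posts, standing in for the cleaned X dataset.
posts = [
    ["chatgpt", "ai", "use"],
    ["chatgpt", "like", "ai"],
    ["chatgpt", "write", "essay"],
]
print(top_terms(posts, n=2))  # [('chatgpt', 3), ('ai', 2)]
```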

To categorize the opinions about ChatGPT expressed in the X posts, sentiment analysis was conducted to classify the content into positive, negative, and neutral sentiment labels. Positive sentiment towards ChatGPT was the most prevalent, accounting for 51% of posts, followed by neutral sentiment at 34% and negative sentiment at 15% (Table 13).

Table 13. Key Findings of X Data Analysis.

Most Frequent Word Counts	Positive Sentiment	Negative Sentiment	Neutral Sentiment
“chatgpt”: 221,654	51%	15%	34%

Sentiment	Topic	Percentage	Topic Names
Positive	Topic 2	49%	Utility of ChatGPT
Negative	Topic 2	47%	Inaccuracy and Concerns of Cheating in Examinations
Neutral	Topic 1	27%	Generating Contents by ChatGPT

Best Performing ML Model	Model Accuracy	F1-score (Positive Sentiments)	F1-score (Negative Sentiments)	F1-score (Neutral Sentiments)
Random Forest	62%	0.70	0.14	0.62

Next, to identify the key topics within each sentiment category, we applied Latent Dirichlet Allocation (LDA) for topic modeling. The LDA models were trained to generate five topics within each sentiment category. The weights of the most dominant words in each topic were evaluated within each sentiment category, yielding valuable findings for our research questions. From the key findings in Table 13, the most prominent positive topic revolved around the utility of ChatGPT, accounting for 49% of positive discussions, while the most prominent negative topic (47%) centered on concerns about generating inaccurate information, potential misuse of ChatGPT, and violations of academic integrity, which is alarming for students. The most prominent neutral topic (27%) concentrated mostly on ChatGPT assisting humans in generating free content and in writing.

Finally, various machine learning models were implemented and their accuracy metrics were measured. The comparison shows that the Random Forest model performed best, with the highest accuracy of 62% (Table 13). Additionally, the Random Forest model matched or exceeded the f1-scores of the other models for the neutral (0.62) and positive (0.70) classes, although its f1-score for the negative class was low (0.14).
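The classification step can be illustrated with a small scikit-learn pipeline. This is a sketch under stated assumptions: TF-IDF features, the hyperparameters, and the toy texts and labels are all illustrative choices, not the study's configuration, and the scores printed for such a tiny sample say nothing about the real dataset.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.pipeline import make_pipeline

LABELS = ["negative", "neutral", "positive"]

# Hypothetical labelled posts.
train_texts = [
    "chatgpt is amazing for writing",
    "love how chatgpt helps me code",
    "great tool very useful",
    "chatgpt gives wrong answers",
    "students cheat with chatgpt",
    "terrible inaccurate information",
    "chatgpt generated this text",
    "asked chatgpt a question",
    "microsoft adds chatgpt to bing",
]
train_labels = [
    "positive", "positive", "positive",
    "negative", "negative", "negative",
    "neutral", "neutral", "neutral",
]
test_texts = ["chatgpt is very useful", "wrong information again", "chatgpt text"]
test_labels = ["positive", "negative", "neutral"]

# TF-IDF features feeding a Random Forest (assumed configuration).
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(train_texts, train_labels)
predicted = model.predict(test_texts)

# Per-class precision, recall and f1-score, plus overall accuracy.
report = classification_report(
    test_labels, predicted, labels=LABELS, zero_division=0
)
print(report)
```

Swapping `RandomForestClassifier` for another estimator in the same pipeline gives a like-for-like comparison across models, which is the structure a table such as Table 8 summarises.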

4.2. Discussion on Survey Data Analysis

A brief case study was conducted through a questionnaire among academics from different universities to gather their concerns about ChatGPT in an educational context. The survey responses were pre-processed by removing non-academic responses and missing values.

The thoughts, concerns, and user experiences of academics regarding the use of ChatGPT in academic tasks were explored. A significant portion of the surveyed academics, approximately 59%, reported using ChatGPT for academic purposes. Among them, around 15% use this AI tool on a regular basis, while approximately 24% use it occasionally.

Regarding the utility of this large language model tool, the largest group (32% of academics) consider it useful, approximately 18% consider it very useful, and another 26% regard its utility as average.

Regarding the accuracy of the content generated from ChatGPT prompts, 41% of academics rated it as moderate, while 26% consider the generated information mostly accurate.

In terms of potential usage, 28% of academics find it most helpful in writing, 19% think it is good for generating ideas and brainstorming, and only a very small portion believe it does not assist much in research work.

Approximately 29% of academics believe that ChatGPT primarily assists students in improving their language and writing skills while potentially enhancing their critical thinking. However, they are also concerned that generating assignments automatically may negatively affect students’ learning behavior and creativity and lead them towards overreliance on the tool, which is a major concern.

A frequency distribution of the major keywords regarding ChatGPT in an educational context showed that “research”, “accuracy of information”, and “critical thinking” were the keywords of greatest concern for academics. Meanwhile, for students, “academic integrity”, “critical thinking”, “risk”, “copy-paste”, and “creativity skills” were identified as the major fears.

Table 14. Key Findings of Survey Analysis.

Valuable Key Findings from Survey Data Analysis
Usage of ChatGPT	Using ChatGPT	59%
Usage of ChatGPT	Not using ChatGPT	41%
User Type	Regular Users	15%
User Type	Occasional Users	23%
Usefulness	Useful	32%
Usefulness	Very useful	17%
Usefulness	Not at all useful	14%
Accuracy	Moderate accuracy	41%
Accuracy	Mostly Accurate	26%
Sentiment Score for Academics	Negative	28%
Sentiment Score for Students	Negative	38%
Perceived Benefits	Potential Benefit	“writing assistance”; “generating ideas and brainstorming”; “improving language and writing skills”; “enhancing critical thinking”;
Perceived Concerns	Potential Concerns	“poor quality research”; “academic integrity”; “overreliance”; “accuracy”; “impact critical thinking”; “impact creativity skills”;

Finally, a sentiment analysis of the concerns for both academics and students revealed that the sentiment label is “Negative” for both groups, with a higher share of negative sentiment in the concerns for students (38% versus 28%). Despite the negative sentiments, the valuable insights raised by academics had an overall positive sentiment, acknowledging the potential benefits of ChatGPT. The sentiment polarity and subjectivity measures indicate an overall positive tone in the concerns.

5. Conclusions

This research explored the applications and implications of the artificial intelligence tool ChatGPT across two distinct domains: social media and education. Our research questions and key findings offer valuable insights into the perception of ChatGPT. Through NLP, sentiment analysis, topic modeling using LDA, and machine learning models, the research identified important terms, keywords, and topics within each sentiment category.

Our findings underscore the significant benefits of ChatGPT as a valuable tool for writing assistance, idea generation, and boosting creativity. The advancement of artificial intelligence is changing the learning process in educational practice. However, despite these strengths and advantages, our research also uncovered critical concerns, fears, and worries among both social media users and educators. These concerns cannot be disregarded and should be addressed through careful safeguards, such as clear guidelines for effective use by students, especially in educational contexts.

Future research should consider the trade-offs between the various model evaluation metrics and the computational complexity of each machine learning model, particularly when advanced parametrizations are used. The constrained timeframe of this research prevented the implementation of more advanced models, which could have improved performance on the NLP tasks.

Abbreviations

AI	Artificial Intelligence
ML	Machine Learning
TF-IDF	Term Frequency-Inverse Document Frequency
RF	Random Forest
NLP	Natural Language Processing
LDA	Latent Dirichlet Allocation

Author Contributions

Farhana Bina is the sole author. The author read and approved the final manuscript.

Conflicts of Interest

The author declares no conflicts of interest.

References

[1]	Alves de Castro, C. (2023) ‘A discussion about the impact of CHATGPT in education: Benefits and concerns’, Journal of Business Theory and Practice, 11(2). https://doi.org/10.22158/jbtp.v11n2p28
[2]	Ansari, K. (2023) Cracking the CHATGPT code: A deep dive into 500,000 tweets using advanced NLP techniques, Medium. Available at: https://medium.com/@ka2612/the-chatgpt-phenomenon-unraveling-insights-from-500-000-tweets-using-nlp-8ec0ad8ffd37 (Accessed: 04 September 2023).
[3]	Ansari, K. (2023a) 500k chatgpt-related tweets Jan-Mar 2023, Kaggle. Available at: https://www.kaggle.com/datasets/khalidryder777/500k-chatgpt-tweets-jan-mar-2023 (Accessed: 04 September 2023).
[4]	Ansari, K. (2023c) Effortlessly scraping massive twitter data with snscrape: A guide to scraping 1000,000 tweets in., Medium. Available at: https://medium.com/@ka2612/effortlessly-scraping-massive-twitter-data-with-snscrape-a-guide-to-scraping-1000-000-tweets-in-d01c38e82d18 (Accessed: 04 September 2023).
[5]	Ashioyajotham (2023) Chat GPT tweet analysis, Kaggle. Available at: https://www.kaggle.com/code/ashioyajotham/chat-gpt-tweet-analysis/notebook (Accessed: 04 September 2023).
[6]	Goswami, S. and Raychaudhuri, D. (2020) Identification of disaster-related tweets using natural language processing: International conference on recent trends in Artificial Intelligence, IOT, Smart Cities & Applications (ICAISC-2020), SSRN. Available at: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3610676 (Accessed: 05 September 2023).
[7]	Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O'Reilly Media, Inc.
[8]	GitHub - igorbrigadir/twitter-advanced-search: Advanced Search for twitter. (n. d.). GitHub. https://github.com/igorbrigadir/twitter-advanced-search
[9]	Korkmaz, A., Aktürk, C., & Talan, T. (2023). Analyzing the User's Sentiments of ChatGPT Using twitter Data. Iraqi Journal for Computer Science and Mathematics, 202–214. https://doi.org/10.52866/ijcsm.2023.02.02.018
[10]	Sudirman, I. D., & Rahmatillah, I. (2023). Artificial Intelligence-Assisted Discovery Learning: An Educational Experience for Entrepreneurship Students Using ChatGPT. In IEEE World AI IoT Congress (AIIoT) (pp. 979-8-3503-3761-7/23/$31.00). IEEE. https://doi.org/10.1109/AIIoT58121.2023.10174472
[11]	Fütterer, T., Fischer, C., Alekseeva, A., et al. (2023). ChatGPT in Education: Global Reactions to AI Innovations. 10 May 2023. PREPRINT (Version 1). Available at Research Square. https://doi.org/10.21203/rs.3.rs-2840105/v1
[12]	K. A. (2023). AI in Education - Evaluating ChatGPT as a Virtual Teaching Assistant. International Journal For Multidisciplinary Research, 5(4). https://doi.org/10.36948/ijfmr.2023.v05i04.4484
[13]	Li, Lingyao & Ma, Zihui & Fan, Lizhou & Lee, Sanggyu & Yu, Huizi & Hemphill, Libby. (2023). ChatGPT in education: A discourse analysis of worries and concerns on social media. arXiv - CS - Computers and Society Pub Date: 2023-04-29, https://doi.org/arxiv-2305.02201
[14]	Nelson, J. (2023, March 31). ChatGPT sparks concerns about future of education: Will it impact the 'integrity' of academic institutions? Fox Business. Retrieved from: https://www.foxbusiness.com/media/chatgpt-sparks-concerns-future-education-impact-integrity-academic-institutions
[15]	Zhai, X. (2022). ChatGPT User Experience: Implications for Education. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.4312418
[16]	Cotton, D. R. E., Cotton, P. A., & Shipway, J. R. (2023). Chatting and cheating: Ensuring academic integrity in the era of ChatGPT. Innovations in Education and Teaching International, 12. https://doi.org/10.1080/14703297.2023.2190148
[17]	Dukewich, K., & Larsen, C. (2023). How are faculty reacting to ChatGPT? Slowly and thoughtfully written by two humans. Kwantlen Polytechnic University & Langara College. March 15, 2023.
[18]	Lo, C. K. (2023). What Is the Impact of ChatGPT on Education? A Rapid Review of the Literature. Education Sciences, 13(4), 410. https://doi.org/10.3390/educsci13040410
[19]	Gordon, C. (2023, April 30). How Are Educators Reacting To Chat GPT? Forbes. https://www.forbes.com/sites/cindygordon/2023/04/30/how-are-educators-reacting-to-chat-gpt/?sh=565bc762f1ca
[20]	Abecina, M. (2023). How ChatGPT will impact the future of education - McCrindle. McCrindle. https://mccrindle.com.au/article/how-chatgpt-will-impact-the-future-of-education/
[21]	Malik, A., Khan, M. L., & Hussain, K. (2023). How is ChatGPT Transforming Academia? Examining its Impact on Teaching, Research, Assessment, and Learning. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.4413516
[22]	Mandelaro, J. (2023) How will AI chatbots like chatgpt affect higher education?, News Center. Available at: https://www.rochester.edu/newscenter/chatgpt-artificial-intelligence-ai-chatbots-education-551522/ (Accessed: 18 August 2023).
[23]	Pittalwala I. (2023). Is chatgpt a threat to education? University of California, Riverside, News. Available at: https://news.ucr.edu/articles/2023/01/24/chatgpt-threat-education (Accessed: 18 August 2023).
[24]	Rege, M., & Yarmoluk, D. (2023, April 12). The Impact of Artificial Intelligence and ChatGPT on Education. University of St. Thomas, Newsroom. Retrieved from: https://news.stthomas.edu/the-impact-of-artificial-intelligence-and-chatgpt-on-education/
[25]	Essien, D. A. (2023) The impact of chatgpt in higher education: A closer look, Bristol Institute for Learning and Teaching Blog. Available at: https://bilt.online/the-impact-of-chatgpt-in-higher-education-a-closer-look/ (Accessed: 18 August 2023).
[26]	CambriLearn Online School - Accredited Online Schooling. (2023, January). The impact of CHATGPT on Education. Available at: https://cambrilearn.com/blog/impact-chatgpt-education (Accessed: 18 August 2023).

Cite This Article

APA Style

Bina, F. (2025). Analysing Concerns and Expressions of Using ChatGPT on Social Media and Educational Platform: An Application of Natural Language Processing and Machine Learning. International Journal of Data Science and Analysis, 11(3), 76-98. https://doi.org/10.11648/j.ijdsa.20251103.13

ACS Style

Bina, F. Analysing Concerns and Expressions of Using ChatGPT on Social Media and Educational Platform: An Application of Natural Language Processing and Machine Learning. Int. J. Data Sci. Anal. 2025, 11(3), 76-98. doi: 10.11648/j.ijdsa.20251103.13

AMA Style

Bina F. Analysing Concerns and Expressions of Using ChatGPT on Social Media and Educational Platform: An Application of Natural Language Processing and Machine Learning. Int J Data Sci Anal. 2025;11(3):76-98. doi: 10.11648/j.ijdsa.20251103.13

Copy | Download

@article{10.11648/j.ijdsa.20251103.13,
author = {Farhana Bina},
title = {Analysing Concerns and Expressions of Using ChatGPT on Social Media and Educational Platform: An Application of Natural Language Processing and Machine Learning
},
journal = {International Journal of Data Science and Analysis},
volume = {11},
number = {3},
pages = {76-98},
doi = {10.11648/j.ijdsa.20251103.13},
url = {https://doi.org/10.11648/j.ijdsa.20251103.13},
eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdsa.20251103.13},
abstract = {The advancement in Artificial Intelligence technology revolutionizes new opportunities and challenges, particularly with large language model ChatGPT, in various domains, especially in the educational platform. This research endeavors a comprehensive analysis to explore the concerns and expressions associated with this AI tool on the social media platform X and in academic contexts. Two distinct datasets, comprising X data and survey responses from academics, were utilized to achieve the objectives. This research examines the valuable concerns regarding ChatGPT among X users on social media platform. To implement the Natural Language Processing (NLP) techniques which included Sentiment Analysis and Topic Modeling using Latent Dirichlet Analysis (LDA), the study aimed to identify the significant insights expressed by the social media users. The analysis obtained that, most frequent discussed topic was “ChatGPT”. The majority of discussions among the X users were positive in sentiment (49%), focusing on the utility of ChatGPT. Comparatively, negative discussions (47%) were also expressed by the users (47%) about students’ cheating in exams, and the generation of inaccurate information, which could affect students’ learning skills, and their critical thinking. Furthermore, approximately 27% of the discussions were expressed neutral sentiment regarding the generation of contents by ChatGPT. Various machine learning models were implemented to predict the classification of sentiment labels correctly. The Random Forest model performed well to classify all the sentiment labels correctly compared to others with highest accuracy of 62%. This research also unveiled the academics’ opinion in the context of education. A case study was conducted among the academics, where approximately 59% reported using ChatGPT for academic purposes and academics (24%) use this tool occasionally. 
In terms of its usefulness, 32% academics consider it is as useful, especially for generating writing contents. Additionally, 29% of them believed that this tool primarily improves students’ language and writing skills but they also expressed the concerns about overreliance potentially impacting their critical thinking and violating academic integrity. The major concerned keywords for academics include “research”, “accuracy of information”, and “critical thinking”, while for students, “academic integrity”, “critical thinking”, “risk”, “copy-paste”, and “creativity skills”. The majority of the sentiments regarding the concerns were negative for students (38%), and minority for academics (28%). Overall, academics expressed positive sentiments about the utility of using ChatGPT. This research highlights these findings and recommends further exploration of using this tool in educational practices with a focus on the identified concerns to guide future implementation.
},
year = {2025}
}

Copy | Download

TY - JOUR
T1 - Analysing Concerns and Expressions of Using ChatGPT on Social Media and Educational Platform: An Application of Natural Language Processing and Machine Learning

AU - Farhana Bina
Y1 - 2025/06/19
PY - 2025
N1 - https://doi.org/10.11648/j.ijdsa.20251103.13
DO - 10.11648/j.ijdsa.20251103.13
T2 - International Journal of Data Science and Analysis
JF - International Journal of Data Science and Analysis
JO - International Journal of Data Science and Analysis
SP - 76
EP - 98
PB - Science Publishing Group
SN - 2575-1891
UR - https://doi.org/10.11648/j.ijdsa.20251103.13
AB - Advances in Artificial Intelligence technology create new opportunities and challenges in many domains, particularly in education, and especially with the large language model ChatGPT. This research presents a comprehensive analysis of the concerns and expressions associated with this AI tool on the social media platform X and in academic contexts. Two distinct datasets, comprising X data and survey responses from academics, were used to achieve the objectives. The study first examines the concerns about ChatGPT raised by X users. Natural Language Processing (NLP) techniques, including Sentiment Analysis and Topic Modeling using Latent Dirichlet Allocation (LDA), were applied to identify the significant insights expressed by social media users. The analysis found that the most frequently discussed topic was “ChatGPT”. The largest share of discussions among X users was positive in sentiment (49%), focusing on the utility of ChatGPT. By comparison, negative discussions (47%) concerned students’ cheating in exams and the generation of inaccurate information, which could affect students’ learning skills and critical thinking. Furthermore, approximately 27% of the discussions expressed neutral sentiment regarding the generation of content by ChatGPT. Various machine learning models were implemented to predict the sentiment labels. The Random Forest model classified the sentiment labels better than the other models, with the highest accuracy of 62%. This research also examined academics’ opinions in the context of education. A case study was conducted among academics, in which approximately 59% reported using ChatGPT for academic purposes and 24% reported using the tool occasionally.
In terms of usefulness, 32% of academics consider the tool useful, especially for generating written content. Additionally, 29% believed that it primarily improves students’ language and writing skills, but they also expressed concerns that overreliance could impair their critical thinking and violate academic integrity. The main keywords of concern for academics include “research”, “accuracy of information”, and “critical thinking”, while those for students include “academic integrity”, “critical thinking”, “risk”, “copy-paste”, and “creativity skills”. Sentiments regarding these concerns were more often negative for students (38%) than for academics (28%). Overall, academics expressed positive sentiments about the utility of ChatGPT. This research highlights these findings and recommends further exploration of the tool’s use in educational practice, with a focus on the identified concerns, to guide future implementation.

VL - 11
IS - 3
ER -


Author Information

Farhana Bina

Department of Statistics and Data Science, Jahangirnagar University, Dhaka, Bangladesh


http://orcid.org/0000-0002-5352-1489


APA Style

Bina, F. (2025). Analysing Concerns and Expressions of Using ChatGPT on Social Media and Educational Platform: An Application of Natural Language Processing and Machine Learning. International Journal of Data Science and Analysis, 11(3), 76-98. https://doi.org/10.11648/j.ijdsa.20251103.13


ACS Style

Bina, F. Analysing Concerns and Expressions of Using ChatGPT on Social Media and Educational Platform: An Application of Natural Language Processing and Machine Learning. Int. J. Data Sci. Anal. 2025, 11(3), 76-98. doi: 10.11648/j.ijdsa.20251103.13


AMA Style

Bina F. Analysing Concerns and Expressions of Using ChatGPT on Social Media and Educational Platform: An Application of Natural Language Processing and Machine Learning. Int J Data Sci Anal. 2025;11(3):76-98. doi: 10.11648/j.ijdsa.20251103.13

