twitter-airline-sentiment-analysis

Context

Twitter boasts a staggering 330 million monthly active users, providing businesses with an extensive platform to connect directly with customers. However, the sheer volume of information can make it challenging for brands to quickly identify negative social mentions that could potentially damage their reputation.

This has elevated the importance of sentiment analysis, a process that tracks the mood underlying social media conversations, as a critical component of social media marketing strategies.

By analyzing customer sentiments on Twitter, companies can gain insights into public perception about their products/services. This helps them stay informed about their brand image, monitor competitors, and spot emerging trends within the industry.

Objective

The goal is to analyze sentiment around the issues faced by each major U.S. airline by classifying tweets into positive, neutral, and negative categories.

Data Description

The Twitter data was scraped in February 2015. Contributors classified each tweet as positive, negative, or neutral, and further categorized the negative tweets by specific issue, such as "late flight" or "rude service."

Data Source:

  • This project uses a dataset available on Kaggle.
  • Access the dataset through the Kaggle project page: Twitter Airline Sentiment

Skills

  • Perform text-based exploratory data analysis (EDA).
  • Apply basic text pre-processing techniques.
  • After text pre-processing, implement:
    • Bag of words technique.
    • Tf-idf (Term Frequency-Inverse Document Frequency) method.
  • Develop the classification model.
  • Fine-tune and assess the model's performance.
  • Visualize important features identified after training the model.
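
The pre-processing and vectorization steps listed above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the stop-word list, the sample tweets, and the function names are assumptions made for the example, and the TF-IDF weighting shown is one common unsmoothed variant (scikit-learn's `TfidfVectorizer` additionally smooths the IDF term and L2-normalizes each row by default).

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real pipelines use a fuller list).
STOP_WORDS = {"the", "a", "an", "is", "was", "my", "to", "for", "and", "you", "i"}


def preprocess(tweet):
    """Lowercase, drop @mentions, URLs, and non-letters, then remove stop words."""
    text = tweet.lower()
    text = re.sub(r"@\w+|https?://\S+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def bag_of_words(docs):
    """Return the sorted vocabulary and one raw term-count vector per document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(vectors):
    """Reweight counts as tf * (ln(N / df) + 1), without smoothing or normalization."""
    n_docs = len(vectors)
    n_terms = len(vectors[0])
    # Document frequency: in how many documents each term appears.
    df = [sum(1 for vec in vectors if vec[j] > 0) for j in range(n_terms)]
    return [
        [vec[j] * (math.log(n_docs / df[j]) + 1.0) for j in range(n_terms)]
        for vec in vectors
    ]


# Invented sample tweets, used only to exercise the functions.
tweets = [
    "@united my flight was late for 2 hours!",
    "@AmericanAir thank you for the great flight",
    "@USAirways late again, rude service http://t.co/x",
]
docs = [preprocess(t) for t in tweets]
vocab, counts = bag_of_words(docs)
weights = tf_idf(counts)
```

In the bag-of-words vectors every occurrence counts equally, whereas TF-IDF gives a word that appears in few tweets (such as "hours" here) more weight than one that appears in many (such as "flight").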

Actionable Insights and Recommendations

Summary

  • The objective of this project was to develop a predictive model for the sentiment of tweets about airlines.
  • Initial steps involved exploratory data analysis to understand the dataset's characteristics.
  • We pre-processed the data to optimize it for the modeling process.
  • Feature vectors were created using CountVectorizer and TF-IDF vectorization methods.
  • We constructed two RandomForestClassifier models.
  • Both models were trained and evaluated using test data to assess their performance.

Insights (model)

  • The CountVectorizer technique outperformed TF-IDF slightly in predictive accuracy, achieving 75% compared to TF-IDF’s 74%.
  • The recall for Neutral and Negative tweets was low across both models.
  • Both models achieved higher F1-scores for Negative tweets than for the other classes, indicating a better balance of precision and recall for this category.
  • A common issue with both models was their tendency to misclassify Neutral and Negative tweets.
  • The three most influential words identified were: hour, thank, and flight.
  • Overall, the models performed moderately well in predicting tweet sentiments.
  • Of the two, the model using CountVectorizer was more effective.
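
Because the insights above rest on per-class recall and F1 rather than on accuracy alone, it may help to see how those numbers are derived. The sketch below computes them from paired true and predicted labels; the function name and the sample labels are invented for illustration and are not the project's results.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1)} computed from paired label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    report = {}
    for label in labels:
        tp = pairs[(label, label)]
        predicted = sum(n for (_, p), n in pairs.items() if p == label)
        actual = sum(n for (t, _), n in pairs.items() if t == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1)
    return report


# Invented labels, chosen only to exercise the function.
y_true = ["negative"] * 4 + ["neutral"] * 2 + ["positive"] * 2
y_pred = [
    "negative", "negative", "negative", "neutral",
    "negative", "neutral", "positive", "negative",
]
report = per_class_report(y_true, y_pred)
```

Recall answers "of the tweets that truly belong to this class, how many did the model find?", so a class that is frequently predicted as another class shows low recall even when overall accuracy looks acceptable.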

Insights (data)

  • The target variable is imbalanced across the three sentiment classes, which could affect model accuracy.
  • Negative tweets dominate the dataset, which suggests that customers mostly tweet about airlines when they have had a negative experience.
  • United, US Airways, and American Airlines received the highest proportion of negative tweets among the airlines.
  • Tweets are also unevenly distributed among the airlines, with United Airlines receiving the largest share.
  • As a possible way to improve accuracy, consider lexicon- and rule-based tools like TextBlob or VADER (often loosely described as unsupervised). They score sentiment from built-in word lists, so they require no training on labeled data.
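
The class imbalance and the per-airline share of negative tweets described above can be checked with a few lines of counting. This sketch uses invented `(airline, sentiment)` rows; in the Kaggle CSV the corresponding columns are assumed to be `airline` and `airline_sentiment`.

```python
from collections import Counter


def class_shares(rows):
    """Fraction of rows belonging to each sentiment class."""
    counts = Counter(sentiment for _, sentiment in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def negative_share_by_airline(rows):
    """Fraction of each airline's tweets that are labeled negative."""
    totals, negatives = Counter(), Counter()
    for airline, sentiment in rows:
        totals[airline] += 1
        if sentiment == "negative":
            negatives[airline] += 1
    return {airline: negatives[airline] / totals[airline] for airline in totals}


# Invented rows, not taken from the dataset.
rows = [
    ("United", "negative"), ("United", "negative"),
    ("United", "neutral"), ("United", "negative"),
    ("Delta", "positive"), ("Delta", "negative"), ("Delta", "neutral"),
    ("US Airways", "negative"), ("US Airways", "negative"),
    ("American", "negative"),
]
shares = class_shares(rows)
by_airline = negative_share_by_airline(rows)
```

Comparing proportions per airline rather than raw counts matters here, because an airline with more tweets overall would otherwise appear to have more complaints simply due to volume.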

Recommendations

Address Data Imbalance:

  • Apply more sophisticated techniques to balance the dataset, such as different resampling strategies for the minority class or using anomaly detection methods to refine the handling of Neutral tweets.
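
One of the simplest resampling strategies is random oversampling, which duplicates minority-class examples until every class matches the largest one. The sketch below is an illustrative baseline under that assumption, not the technique the project used; the function name and sample data are invented.

```python
import random
from collections import Counter, defaultdict


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority-class examples until all classes are equal in size."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)

    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Sample with replacement to fill the gap up to the majority-class size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Invented training examples with a 3:1:1 imbalance.
texts = ["late again", "lost bag", "rude crew", "what gate", "thank you"]
labels = ["negative", "negative", "negative", "neutral", "positive"]
bal_texts, bal_labels = random_oversample(texts, labels)
balanced_counts = Counter(bal_labels)
```

Resampling should be applied to the training split only. Oversampling before the train/test split would place copies of the same tweet in both sets and inflate the measured performance.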

Targeted Sentiment Analysis on Specific Airlines:

  • Build customized sentiment analysis models for each airline, particularly those with a high volume of negative feedback like United, US Airways, and American Airlines. Tailored models could capture specific sentiment triggers unique to each airline.

Integration of Unsupervised Learning Models:

  • Use lexicon-based tools like TextBlob or VADER (often loosely described as unsupervised, since they need no labeled training data) to complement the supervised learning approach. Comparing the two sets of predictions can help validate findings and uncover additional insights from the data.
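
To illustrate the idea of scoring sentiment without training, here is a deliberately tiny lexicon-based scorer together with an agreement check against another model's predictions. It is a toy stand-in, not the TextBlob or VADER API: the lexicon, the negation rule, and the function names are all hypothetical.

```python
import re

# Hypothetical mini-lexicon; VADER and TextBlob ship far larger, curated ones.
LEXICON = {
    "thank": 1.0, "great": 1.0, "love": 1.0,
    "late": -1.0, "rude": -1.0, "cancelled": -1.0, "lost": -1.0,
}
NEGATIONS = {"not", "no", "never"}


def lexicon_sentiment(text):
    """Sum word polarities, flipping a word's sign when it directly follows a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    for i, tok in enumerate(tokens):
        value = LEXICON.get(tok, 0.0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def agreement_rate(texts, model_labels):
    """Share of texts where the lexicon label matches another model's label."""
    matches = sum(lexicon_sentiment(t) == y for t, y in zip(texts, model_labels))
    return matches / len(texts)


# Invented texts and classifier outputs, used only to exercise the functions.
sample_texts = ["Thank you, great crew", "flight was late and staff rude"]
sample_model_labels = ["positive", "neutral"]
rate = agreement_rate(sample_texts, sample_model_labels)
```

Tweets where the two approaches disagree are good candidates for manual review, since they may reveal labeling noise or phrasing that one of the methods handles poorly.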

Continuous Model Evaluation and Updating:

  • Set up a pipeline for continuous learning where the model periodically retrains on new data, incorporating the latest customer sentiments and feedback. This helps the model stay relevant as customer behaviors and language evolve.
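
One way to decide when to retrain is to track accuracy over a rolling window of newly labeled tweets and flag a retrain once it falls below a threshold. The class below is a hypothetical sketch of that trigger only; the class name, window size, and threshold are assumptions, and the retraining job itself is out of scope.

```python
from collections import deque


class RetrainMonitor:
    """Track recent prediction outcomes and signal when accuracy degrades."""

    def __init__(self, window=500, min_accuracy=0.70):
        # Only the most recent `window` outcomes are kept.
        self.outcomes = deque(maxlen=window)
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        """Store whether the latest prediction matched its human-provided label."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing was recorded yet."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def should_retrain(self):
        """True once the window is full and accuracy is below the threshold."""
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.min_accuracy


monitor = RetrainMonitor(window=4, min_accuracy=0.75)
for predicted, actual in [
    ("negative", "negative"), ("neutral", "neutral"),
    ("positive", "positive"), ("negative", "neutral"),
]:
    monitor.record(predicted, actual)
before = monitor.should_retrain()

monitor.record("positive", "negative")  # a second miss pushes out the oldest hit
after = monitor.should_retrain()
```

Waiting for a full window before triggering avoids retraining on the basis of a handful of early mistakes.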

Expanding Data Sources:

  • To reduce the bias towards Twitter’s style and demographic, integrate data from other platforms such as Facebook, Instagram, and airline review sections on travel websites. This will help diversify the data inputs and likely improve model robustness.


About

Utilizing Natural Language Processing to Analyze Sentiment Trends in Airline Customer Tweets.