Sentiment analysis helps us understand how people feel about a product, a person, a business, or anything else around us. Suppose you want to learn about a person, evaluate a product or a business before buying, or assess a prime property in a location; you need that understanding before you comment on it. Sentiment analysis is very useful in business decision making and competitive analysis.
Let’s see how sentiment analysis can be done. The primary requirement is data, and data can live in many places: Word documents, servers, social media platforms, websites and so on.
What are the steps involved in sentiment analysis using Python?
- Data cleansing (text preprocessing)
- Use a pre-trained algorithm, or train your own
- Run your model on unseen (cleaned) data
- Classify whether the sentiment is Positive or Negative
Sentiment analysis can be performed in many ways: NLTK, regular expressions, VaderSentiment, TextBlob, Logistic Regression, Naïve Bayes, SGD and so on.
Here I am performing sentiment analysis with Logistic Regression and SGD (Stochastic Gradient Descent) in Python.
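Before walking through the steps, here is a minimal sketch of the whole idea using scikit-learn. The tiny toy dataset below is illustrative only; real training would use a much larger corpus such as the Cornell movie-review data used later in this post.

```python
# Minimal sketch: sentiment classification with Logistic Regression and SGD.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier

train_texts = [
    "I loved this movie, it was great",
    "what a fantastic, wonderful film",
    "absolutely terrible, a waste of time",
    "I hated it, truly awful acting",
]
train_labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

# Convert raw text into a bag-of-words feature matrix
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# Fit both classifiers on the same features
logreg = LogisticRegression().fit(X_train, train_labels)
sgd = SGDClassifier(random_state=42).fit(X_train, train_labels)

# Predict the sentiment of unseen text
X_new = vectorizer.transform(["a wonderful and great movie"])
print("LogReg:", logreg.predict(X_new)[0])
print("SGD:", sgd.predict(X_new)[0])
```

Both models return 1 (positive) or 0 (negative) for the new sentence, which is exactly the classification step listed above.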
Steps involved in Python
1.) Library – Which library are we going to use? There are many options for sentiment analysis: NLTK, TextBlob, VaderSentiment, or classifiers such as Logistic Regression, Naive Bayes and SGD (Stochastic Gradient Descent).
2.) Data (training) for the algorithm – text pre-classified as positive or negative, drawn from a large corpus.
a.) Download data from this link – http://www.cs.cornell.edu/people/pabo/movie-review-data/review_polarity.tar.gz
b.) Text preprocessing – clean the training data
c.) Split the data into training and testing sets
3.) Fit model & predict – fit the model on the training data, then predict on the test data to measure accuracy.
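Steps 2c and 3 can be sketched as follows. The toy review texts here stand in for the Cornell movie-review corpus; after extracting review_polarity.tar.gz you would read the real reviews from its txt_sentoken/pos and txt_sentoken/neg folders instead.

```python
# Split labelled reviews into train/test sets, fit Logistic Regression and
# SGD, and measure accuracy on the held-out test data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Stand-in data: 40 short labelled reviews (1 = positive, 0 = negative)
texts = ["good movie", "great plot", "loved this film", "wonderful acting",
         "bad movie", "awful plot", "hated this film", "terrible acting"] * 5
labels = [1, 1, 1, 1, 0, 0, 0, 0] * 5

# Hold out 20% of the data for testing, keeping the class balance
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels)

# Fit each classifier on the training split, score it on the test split
for clf in (LogisticRegression(), SGDClassifier(random_state=0)):
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(type(clf).__name__, "accuracy:", acc)
```

On a real corpus you would compare the two accuracy scores to decide which classifier to keep for the unseen data in step 4.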
4.) Data (real) sentiment analysis – extract data from any source: a string typed by you, a text file, a CSV file, social media streaming (Twitter, Facebook, Reddit etc.) or web scraping (web pages).
a.) Extract data – Here data is extracted from ‘Twitter’
b.) Text preprocessing (we do not perform every step listed below; several of them are handled by library functions)
– Unstructured data to structured data
– Removing special characters, symbols
– Removing stop words
– BOW (Bag of words) / Tokenization
– Upper case to Lower case conversion
– Stemming/ Lemmatization
– NER ( Named Entity Recognition)
– Convert ‘word to vector’
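A few of these steps can be sketched with the standard library alone. This is a hedged, minimal example: the stop-word list is a small illustrative subset, and steps like stemming, lemmatization and NER would typically use NLTK or spaCy rather than hand-rolled code.

```python
# Lowercasing, special-character removal, tokenization and stop-word removal.
import re

STOP_WORDS = {"the", "a", "an", "is", "it", "this", "and", "of"}  # illustrative subset

def preprocess(text):
    text = text.lower()                       # upper case -> lower case
    text = re.sub(r"[^a-z\s]", " ", text)     # remove special characters, symbols, digits
    tokens = text.split()                     # simple whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]  # drop stop words

print(preprocess("This movie is GREAT!!! 10/10, loved it."))
```

The cleaned token list is what then gets converted to vectors (for example with scikit-learn's CountVectorizer or TfidfVectorizer) before being fed to the classifier.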