The post Twitter Sentiment Analysis using Logistic Regression, Stochastic Gradient Descent appeared first on Data science Tutor.

Sentiment analysis helps us understand how people feel about a product, a person, or anything around us. Suppose you want to learn about a person, a product, or a business before buying, say, a prime property in a location: you need to understand public opinion before forming a judgment. Sentiment analysis is very useful in business decision making and competitive analysis.

Let’s see how sentiment analysis can be done. The primary requirement is data, and knowing where it resides. Data can come in many forms: Word documents, servers, social media platforms, websites, etc.

**What are the steps involved in sentiment analysis using Python?**

- Libraries
- Data
- Data cleansing (text preprocessing)
- A pre-trained algorithm, or training your own
- Executing the model on unseen (cleaned) data
- Classifying the sentiment as positive or negative

Sentiment analysis can be performed in many ways: using NLTK, regular expressions, VaderSentiment, TextBlob, Logistic Regression, Naïve Bayes, SGD, etc.

Here I perform sentiment analysis with Logistic Regression and SGD (Stochastic Gradient Descent) in Python.

**Steps involved in python**

1.) Library – which library are we going to use? There are many options for sentiment analysis: NLTK, TextBlob, Logistic Regression, VaderSentiment, the Naïve Bayes algorithm, SGD (Stochastic Gradient Descent), etc.

2.) Training data for the algorithm – text pre-classified as positive or negative, drawn from a large corpus.

a.) Download the data from this link – http://www.cs.cornell.edu/people/pabo/movie-review-data/review_polarity.tar.gz

b.) Preprocess the text of the training data

c.) Split the data into training and testing sets

3.) Fit model & predict – fit the model on the training data, then predict on the test data to measure accuracy.

4.) Sentiment analysis on real data – extract data from any source: a string typed by you, a text file, a CSV file, social media streams (Twitter, Facebook, Reddit, etc.), or web scraping of web pages.

a.) Extract data – here, the data is extracted from Twitter.

b.) Text preprocessing (we do not perform every step listed below; several are handled internally by library functions):

- Unstructured data to structured data
- Removing special characters and symbols
- Removing stop words
- Bag of words (BoW) / tokenization
- Upper-case to lower-case conversion
- Stemming / lemmatization
- NER (Named Entity Recognition)
- Converting words to vectors
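Several of these steps can be sketched with Python's standard library alone. This is a simplified illustration: real pipelines typically use NLTK or spaCy, and the stop-word list below is a made-up sample, not a standard one.

```python
import re

# Toy stop-word list for illustration only; in practice you would use
# NLTK's stopwords corpus or a similar resource.
STOP_WORDS = {"the", "a", "an", "is", "in", "on", "and", "to", "of"}

def preprocess(text):
    """Lower-case, strip special characters, tokenize, remove stop words."""
    text = text.lower()                       # upper case to lower case
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # remove special characters
    tokens = text.split()                     # simple whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The iPhone's camera is AMAZING!!!"))
# ['iphone', 's', 'camera', 'amazing']
```

Stemming, lemmatization, NER, and word-to-vector conversion need dedicated libraries (NLTK, spaCy, gensim) and are left out of this sketch.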


The post Bias, Variance Trade-off in Machine Learning? appeared first on Data science Tutor.

**Bias**

Bias is the error introduced when a learning algorithm is built on faulty assumptions. If the bias is too large, the algorithm won't be able to model the relationship between the input features (X) and the target variable (y).

*“The error due to bias is taken as the difference between the average prediction of our model and the correct value which we are trying to predict.”* – Scott Fortmann-Roe

**Effect of High Bias** – assume the model is not complex enough; it then misses specific features or dynamics of the data. A low-complexity model draws a straight line through the data points, which yields high bias and low variance, so the predictions are, in general, far from the correct values.

The scenario of high bias leads to **‘underfitting’** the model.

**Variance**

Variance is the error that appears when the model is run on unseen or new data. A high-variance model is very specific to the training set: it gives low error on the training data but high error on new data.

**Effect of High Variance** – assume the model is highly complex and draws a high-degree polynomial curve to fit the data points, which yields high variance and low bias; the distance between predictions and correct values is very small on the training data, but large on unseen data.

The scenario of high variance leads to **‘overfitting’** the model.

*“The error due to variance is taken as the variability of a model prediction for a given data point.”* – Scott Fortmann-Roe

**Different combinations of bias and variance**

The figures above show the variation in data points under each combination.

**Bias, Variance trade-off**

The bias-variance trade-off is about finding an optimal point of model complexity. We can reduce either bias or variance, but we can't reduce both simultaneously. The trade-off is usually analyzed by decomposing the MSE (mean squared error).
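Concretely, for data generated as y = f(x) + ε with noise variance σ², the expected squared error of a fitted model decomposes into exactly these two terms plus an irreducible part (a standard identity, stated here for reference):

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible error}}
```

Since σ² is fixed by the data, minimizing the MSE means balancing the squared-bias term against the variance term.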

**Why Trade-off?**

- To minimize the error and get maximum accuracy from the model.
- To avoid overfitting and underfitting.
- To have consistency in predictions.

**How to overcome Bias and Variance problem?**

- Proper training & testing data splits
- Cross-validation
- Dimensionality reduction
- Regularization in linear models/ANNs
- Understanding overfitting
- Ensemble learning
- Choosing the optimal value of k in KNN
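As one illustration of the regularization point, the sketch below (assuming scikit-learn and NumPy; the data is synthetic) compares an unregularized high-degree polynomial fit with a ridge-regularized one:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: a sine curve with a little noise.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.1, 30)

# A degree-15 polynomial has low bias but high variance; Ridge's L2
# penalty shrinks the coefficients, accepting a little bias in
# exchange for a large reduction in variance.
plain = make_pipeline(PolynomialFeatures(15), LinearRegression()).fit(X, y)
ridged = make_pipeline(PolynomialFeatures(15), Ridge(alpha=1e-3)).fit(X, y)

norm_plain = np.linalg.norm(plain.named_steps["linearregression"].coef_)
norm_ridge = np.linalg.norm(ridged.named_steps["ridge"].coef_)
print("coefficient norm without regularization:", norm_plain)
print("coefficient norm with ridge:", norm_ridge)
```

The ridge model's much smaller coefficient norm is the mechanism by which regularization tames variance.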


The post What is Machine Learning in a Nutshell? appeared first on Data science Tutor.

Machine learning is the study of computer applications that can learn by themselves from experience, without user intervention. Machine learning algorithms are built using statistics, mathematics, and computer programming. Data is the key input for any machine learning algorithm.

**Formal Definition**

Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed. – Arthur Samuel (1959)

A computer program is said to learn from experience E with respect to some task T and performance measure P, if its performance on T, as measured by P, improves with experience E. – Tom Mitchell (1998)

**Difference between traditional programming and Machine Learning approach.**

In the traditional programming model, we provide data as input, the CPU processes a set of code, and the program produces the expected output. Any change due to errors or new requirements means the code itself must be redesigned.

In the machine learning approach, we provide past data (including known outputs) as input to an ML algorithm; the CPU processes the algorithm, and the system produces a new program as output. This program is typically called a 'model', and it can be reused any number of times with new data.

Machine learning is a subset of data science and a main component of analytics. It is classified into four types:

- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
- Evolutionary Learning

Let’s see all the Learning types one by one.

- Supervised Learning

When the training data set has both predictor (input) and outcome (output) variables, we use supervised learning algorithms. That is, the learning is supervised by the fact that both predictors (X) and outcomes (y) are available for the model to use. Techniques such as regression, logistic regression, decision trees, random forests, and so on are supervised learning algorithms.
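A minimal supervised-learning sketch (assuming scikit-learn; the feature and outcome values below are invented for illustration):

```python
from sklearn.linear_model import LinearRegression

# Both predictors (X) and known outcomes (y) are available: the known
# outcomes "supervise" the fit. Values here are hypothetical.
X = [[1], [2], [3], [4]]   # e.g. years of experience
y = [30, 40, 50, 60]       # e.g. salary in $k

model = LinearRegression().fit(X, y)
prediction = model.predict([[5]])
print(prediction)  # close to 70, following the learned linear trend
```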

- Unsupervised Learning

When the training data has only predictor (input) variables (X), but not the outcome variable (y), we use unsupervised learning algorithms. Techniques such as k-means clustering and hierarchical clustering are examples of unsupervised learning algorithms.
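A matching unsupervised sketch (again assuming scikit-learn; the coordinates are invented):

```python
from sklearn.cluster import KMeans

# Only predictors X are available; there is no outcome variable y.
# K-means discovers the grouping structure on its own.
X = [[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [8.1, 7.9]]
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # the two nearby points share a cluster label
```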

- Reinforcement Learning

In many cases, the input variable X and output Y are uncertain (as in a predictive keyboard or spell checker). These algorithms are used in sequential decision-making scenarios; techniques such as dynamic programming and Markov decision processes are examples of reinforcement learning.

- Evolutionary Learning

These are algorithms that imitate human/animal learning processes. They are most frequently used to solve prescriptive analytics problems. Techniques such as genetic algorithms and ant colony optimization belong to this category.


The post What is Data science in a nutshell? appeared first on Data science Tutor.

**Data science** is a multi-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. In doing so, it creates value for the organization.

Data science is a component of analytics; it draws on statistical and operations research techniques along with machine learning and deep learning algorithms. Given a problem, the objective of the data science component of analytics is to identify a statistical or machine learning model that can be applied to the business problem.

**Life cycle of Data science**

Most artificial intelligence components are subsets of data science or intersect with it, namely Artificial Intelligence, Machine Learning, Neural Networks, Deep Learning, and Big Data.

Massive amounts of data are generated every second, driven by businesses, the Internet, hardware (storage, processors), cloud computing, networks and communities, funding and investment, and business competitiveness.

Data is the new oil for industries, and data science is the electricity that powers any industry wanting a competitive advantage. If data is used properly, it adds real value to the organization by generating useful insights from past data at a scale humans can't handle manually.

Examples: sales prediction, effective stock maintenance, leveraging production capacity, fraud detection in financial institutions, and customer retention.

The term “data science” has appeared in various contexts over the past thirty years but did not become an established term until recently. In an early usage, it was used as a substitute for computer science by Peter Naur in 1960.

