2013-11-05

Today, for my 30-day challenge, I decided to take a break from JavaScript and learn about text processing using the Python programming language. I will focus on sentiment analysis in this blog post. My interest in sentiment analysis goes back a few years, to when I wanted to write an application that would process a stream of tweets about a movie and then output the overall sentiment about it. Having this information would help me decide whether I wanted to watch a particular movie.

I googled around and found that a Naive Bayes classifier can be used to solve this problem. The only programming language I knew at the time was Java, so I wrote a custom implementation and used the application for some time. I was too lazy to commit the code, so when my machine crashed, I lost both the code and the application. Now I commit all my code to GitHub, and I have close to 200 public repositories :)

In this blog, I will talk about a Python package called TextBlob which can help developers solve this problem. We will first cover some basics, and then we will develop a simple Flask application which will use the TextBlob API.



What is TextBlob?

TextBlob is an open source text processing library written in Python. It can be used to perform various natural language processing tasks such as part-of-speech tagging, noun-phrase extraction, sentiment analysis, text translation, and more. You can read about all the features supported by TextBlob in the official documentation.

Why should I care?

The reasons I decided to learn TextBlob are as follows:

I wanted to develop applications which require text processing. When we add text processing capabilities to an application, the application becomes more human in that it can understand human behavior better. Text processing is very hard to get right. TextBlob stands on the strong shoulders of NLTK, which is the leading platform for building Python programs to work with human language data.

I wanted to learn how text processing can be done in Python.

Install TextBlob

Before we can get started with TextBlob, we need to install Python and virtualenv on the machine. The Python version I am using in this blog post is 2.7.

There are various ways to install TextBlob on the machine, as mentioned in the official documentation. We will use pip. For developers unaware of pip, it is a Python package manager. We can install pip from the official website. Go to any convenient directory on your file system, and run the following commands.

The commands above will create a myapp directory on the local machine, activate a virtualenv with Python version 2.7, install the textblob package, and finally download the necessary NLTK corpora.
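Once the installation finishes, a quick sanity check can confirm that the packages are importable from the active virtualenv. The snippet below is only a minimal sketch: the helper name find_missing is my own invention for illustration and is not part of TextBlob or pip.

```python
import importlib


def find_missing(packages):
    """Return the names from `packages` that cannot be imported.

    `find_missing` is a hypothetical helper written for this post,
    not something provided by TextBlob, NLTK, or pip.
    """
    missing = []
    for name in packages:
        try:
            # import_module raises ImportError when the package is absent
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling find_missing(["textblob", "nltk"]) inside the virtualenv should return an empty list if the installation went through; any name that comes back still needs to be installed.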

GitHub Repository

The code for today's demo application is available on GitHub: day9-textblob-demo-openshift.

Application

The demo application is running on OpenShift at http://showmesentiments-t20.rhcloud.com/. It is a very simple example of using the TextBlob sentiment analysis API. As the user types, they will see whether the message is positive (green), negative (red), or neutral (orange).
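The color coding can be sketched as a small function over the polarity score, which TextBlob reports as a number between -1.0 and 1.0. This is a hedged illustration, not the demo's actual code: the function name classify_polarity and the cut-off at zero are my assumptions, and the real application may draw the line differently.

```python
def classify_polarity(polarity, threshold=0.0):
    """Map a polarity score in [-1.0, 1.0] to a (label, color) pair.

    The name `classify_polarity` and the default `threshold` of 0.0 are
    assumptions made for illustration; the demo may use other cut-offs.
    """
    if polarity > threshold:
        return ("positive", "green")
    if polarity < -threshold:
        return ("negative", "red")
    # Scores within [-threshold, threshold] are treated as neutral
    return ("neutral", "orange")
```

Raising the threshold, for example classify_polarity(0.05, threshold=0.1), widens the neutral band so that weakly opinionated messages stay orange.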



We will develop a simple Flask application which will expose a REST API. If you are not aware of Flask, you can refer to my earlier post on it.

Next, we will install the Flask framework. To do so, we will first activate the virtualenv and then use pip to install Flask.

As I mentioned in my earlier blog post on Flask, it is awesome for writing REST-based web services. Create a new file called app.py under the myapp folder.

Copy the following code and paste it in the app.py source file.

The code shown above does the following:

It imports the Flask class, jsonify function, and render_template function from flask package.

It imports the TextBlob class from textblob package.

It defines a route for the '/' and '/index' URLs. So, if a user makes a GET request to either '/' or '/index', then index.html will be rendered.

It defines a route for the '/api/v1/sentiment/<message>' URL. The <message> is a placeholder and will contain the text message the user wants to run sentiment analysis on. We create an instance of TextBlob, passing it the message. Next, we get the polarity and subjectivity of the message, and then create a JSON object and return it.

Finally, we start the development server to run the application using the python app.py command. We also enabled debugging by passing debug=True. Debug mode provides an interactive debugger in the browser when an unexpected exception occurs. Another benefit of debug mode is that it automatically reloads the application when the code changes. We can keep the server running in the background while we work on our application. This provides a highly productive environment.
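The sentiment route described above can be sketched roughly as follows. This is a minimal sketch rather than the exact app.py from the repository: it leaves out the index route, the helper names (build_sentiment_payload, textblob_analyze, create_app) and the JSON key names are my assumptions, and Flask and TextBlob are imported lazily so the payload helper can be tried without either package installed.

```python
def build_sentiment_payload(message, analyze):
    """Build the dict returned to the client as JSON.

    `analyze` is any callable returning (polarity, subjectivity).
    The key names below are assumptions, not the demo's exact schema.
    """
    polarity, subjectivity = analyze(message)
    return {
        "message": message,
        "polarity": polarity,
        "subjectivity": subjectivity,
    }


def textblob_analyze(message):
    """Return (polarity, subjectivity) for `message` using TextBlob."""
    from textblob import TextBlob  # needs `pip install textblob`

    sentiment = TextBlob(message).sentiment
    return sentiment.polarity, sentiment.subjectivity


def create_app(analyze=textblob_analyze):
    """Create a Flask app exposing only the sentiment route."""
    from flask import Flask, jsonify  # needs `pip install flask`

    app = Flask(__name__)

    @app.route("/api/v1/sentiment/<message>")
    def sentiment(message):
        # <message> from the URL is passed in as the `message` argument
        return jsonify(build_sentiment_payload(message, analyze))

    return app
```

To try it locally, one could call create_app().run(debug=True) at the bottom of the file and then request /api/v1/sentiment/ followed by some text in the browser.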

The index() function renders an HTML file. Create a new folder called templates in the myapp directory and then create a new file named index.html.

Copy the content to the index.html source file, which uses Twitter Bootstrap to add style. We are also using jQuery to make REST calls on a keyup event. We don't make REST calls if the key is backspace, tab, enter, left, right, up, or down.

You can copy the JS and CSS files from my GitHub repository.
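The REST call that the page makes on each keyup can be imitated from Python, which is handy for testing the API without a browser. Because the message travels inside the URL path, it has to be percent-encoded first. The sketch below assumes Python 3 and a development server listening on http://127.0.0.1:5000; the function name sentiment_url is made up for illustration.

```python
from urllib.parse import quote


def sentiment_url(message, base_url="http://127.0.0.1:5000"):
    """Build the sentiment API URL for `message`.

    `sentiment_url` and the default `base_url` are assumptions for this
    sketch; point `base_url` at wherever the application is running.
    """
    # safe="" percent-encodes every reserved character, including "/"
    encoded = quote(message, safe="")
    return "{0}/api/v1/sentiment/{1}".format(base_url.rstrip("/"), encoded)
```

The resulting URL can be opened with any HTTP client, for example urllib.request.urlopen, while the Flask application is running.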

Deploy to the cloud

Before we can deploy the application to our cloud environment, we have to do a few setup tasks:

Sign up for an OpenShift account. It is completely free, and Red Hat gives every user three free Gears on which to run applications. At the time of this writing, the combined resources allocated to each user are 1.5 GB of memory and 3 GB of disk space.

Install the rhc client tool on your machine. rhc is a Ruby gem, so you need Ruby 1.8.7 or above on your machine. To install rhc, just type: sudo gem install rhc
If you already have rhc installed, make sure it is the latest version. To update rhc, execute the command shown here: sudo gem update rhc
For additional assistance setting up the rhc command-line tool, see the following page: https://openshift.redhat.com/community/developers/rhc-client-tools-install

Set up your OpenShift account using the rhc setup command. This command will help you create a namespace and upload your SSH keys to the OpenShift server.

To deploy the application on OpenShift just type the command shown below.

It will do everything, from creating the application, to setting up public DNS, to creating a private git repository, and finally deploying the application using the code from my GitHub repository. The application will be deployed at http://day9demo-{domain-name}.rhcloud.com. Please replace {domain-name} with your account domain name. The app is running here: http://showmesentiments-t20.rhcloud.com/

That's it for today. Keep giving feedback.

What's Next

Sign up for OpenShift Online

Get your own private Platform As a Service (PaaS) by evaluating OpenShift Enterprise

Need Help? Ask the OpenShift Community your questions in the forums

Showcase your awesome app in the OpenShift Developer Spotlight. Get in the OpenShift Application Gallery today.
