Planet.python.org

Data Community DC: A Tutorial for Deploying a Django Application that Uses Numpy and Scipy to Google Compute Engine Usin...

2013-12-17

Introduction

This longer-than-initially-planned article walks you through the process of deploying a non-standard Django application on a virtual instance provisioned not from Amazon Web Services but from Google Compute Engine. This means we will be creating our own virtual machine in the cloud and installing all the software it needs to serve content, run the Django application, and handle the database all in one. Clearly, I do not expect an overwhelming amount of traffic to this site. Also, note that Google Compute Engine is very different from Google App Engine.

What makes this app “non-standard” is its use of both the Numpy and Scipy packages to perform fast computations. Numpy is written largely in C, Scipy also relies on a substantial amount of Fortran code, and both have complicated compilation dependencies. Binaries may be available in some cases, but not always for your preferred deployment environment. Most importantly, these two libraries prevented me from deploying my app to either Google App Engine (GAE) or Heroku. I’m not saying that it is impossible to deploy Numpy- or Scipy-dependent apps on either service. However, neither service supports apps dependent on both Numpy and Scipy out of the box, although a limited amount of Googling suggests it should be possible.

In fact, GAE could have been an ideal solution if I had re-architected the app, separating the Django application from the computational code. I could have run the Django application on GAE and allowed it to spin up a GCE instance as needed to perform the computations. One concern with this idea is the latency involved in spinning up the virtual instance for computation: Google Compute Engine instances spring to life quickly, but not instantaneously. Maybe I’ll go down this path for version 2.0 if there is a need.

Just in case you are wondering, the Django app in question is at https://github.com/murphsp1/ppi-css.com and the live site is at www.ppi-css.com.

If you have any questions or comments or suggestions, please leave them in the comments section below.

Google Compute Engine (GCE)

I am a giant fan of Google Compute Engine and love the fact that Amazon’s EC2 finally has a strong competitor. With that said, GCE definitely does not have the same number of tutorials or help content available online.

I will assume that you can provision your own instance in GCE either using gcutil at the command line or through the cloud services web interface provided by Google.

Once you have your project up and running, you will need to configure the firewall settings for your project. You can do this from your local machine with the following command:

Update the Instance and Install Tools

Next, boot the instance and ssh into it from your local machine. The command line parameters required to ssh in can be daunting, but fortunately Google gives you a simple way to copy and paste the command from the web-based cloud console. The command could look something like this:

Next, we need to update the default packages installed on the GCE instance:

and install some needed development tools:

and install some basic Python-related tools:

Note that in many of my sudo apt-get commands I include --yes. This flag just saves me from having to type “Y” to confirm each download and installation.

Install Numpy and Scipy (SciPy requires Fortran compiler)

To install SciPy, Python’s general-purpose scientific computing library (from which my app needs a single function), we need the Fortran compiler:

and then we need Numpy and Scipy and everything else:

Finally, we need to add ProDy, a protein dynamics and sequence analysis package for Python.
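Before going further, it can be worth confirming that the system Python really picks up these compiled packages. The sketch below is a hypothetical helper, not something from the original setup: it tries to import each module and reports its version. numpy, scipy, and prody are the usual import names for the three packages installed above.

```python
import importlib


def check_packages(names):
    """Try to import each module; return {name: (importable, version)}.

    version is the module's __version__ attribute, or None if the module
    could not be imported or does not define one.
    """
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = (False, None)
        else:
            report[name] = (True, getattr(module, "__version__", None))
    return report


if __name__ == "__main__":
    names = ["numpy", "scipy", "prody"]
    report = check_packages(names)
    for name in names:
        ok, version = report[name]
        status = "OK (version %s)" % version if ok else "NOT IMPORTABLE"
        print("%-8s %s" % (name, status))
```

Run it with the same python executable that mod_wsgi will use; a package that imports in one interpreter but not another is a common source of confusion.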

Install and Configure the Database (MySQL)

The Django application needs a database and there are many to choose from, most likely either Postgres or MySQL. Here, I went with MySQL for the simple reason that it took fewer steps to get the MySQL server up and running on the GCE instance than the Postgres server did. I actually run Postgres on my development machine.

The installation process should prompt you to create a root password. Please do so for security purposes.

Next, we are going to execute a script to secure the MySQL installation:

You already set a root password during installation, so there is no need to change it; otherwise, answer “Y” to every question.

With the DB installed, we now need to create our database for Django (mine is creatively called django_test). Please note that there must not be a space between “--password=” and your password on the command line.

Finally for this step we need the MySQL database connector for Python which will be used by our Django app:

Install the Web Server (Apache2)

You have two main choices for your web server: the tried and true Apache (now up to version 2+) or nginx. Nginx is supposed to be the new sexy when it comes to web servers, but this newness comes at the price of less documentation and fewer tutorials online. Thus, let’s play it safe and go with Apache2.

First Attempt

First things first, we need to install apache2 and mod_wsgi. Mod_wsgi is an Apache HTTP server module that provides a WSGI-compliant interface for web applications developed in Python.

This seems to be causing a good number of problems. In my Django error logs I see:

and in:

I see things like:

with the occasional segfault:

which is a strong indicator that something isn’t quite working correctly.

Second Attempt

A little bit of Googling suggests that this could be the result of a number of issues with a prebuilt mod_wsgi. The solution seems to be to grab the source code and compile it on my GCE instance. To do that, I:

Now, we need to grab mod_wsgi while ssh’ed into the GCE instance:

Once mod_wsgi is installed, Apache needs to be told about it. On Apache 2, this is done by adding the load declaration and any configuration directives to the /etc/apache2/mods-available/ directory.

The load declaration for the module needs to go in a file named wsgi.load (in the /etc/apache2/mods-available/ directory), which contains only this:

Then you have to activate the wsgi module with:

Note: a2enmod stands for “apache2 enable mod”; this executable creates the symlink for you. In fact, a2enmod wsgi is equivalent to:

Now we need to update the virtual host settings on the server. For Debian, they live here:

Restart the service:

and also change the owner of the directory on the GCE instance that will contain the files to be served by Apache:

Now that we have gone through all of that, it is nice to see things working. By default, the following page is served by the install:

If you go to the URL of the server (obtainable from the Google Cloud console), you should see a very simple example HTML page.
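If you would rather check from a script than a browser, here is a minimal Python 3 sketch (a hypothetical helper, not part of the original tutorial) that requests a URL and reports the HTTP status code, or None when nothing answers at all.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def http_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.getcode()
    except HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return err.code
    except (URLError, OSError):
        # DNS failure, refused connection, timeout, and similar problems.
        return None
```

For example, `http_status("http://203.0.113.10/")` should return 200 once Apache is serving the default page; 203.0.113.10 is a placeholder from a documentation-only address range, so substitute your instance's external IP. A result of None usually points at the firewall or at Apache not running, while a numeric error code means Apache is up but unhappy.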

Setup the Overall Django Directory Structure on the Remote Server

I have seen many conflicting recommendations in online tutorials about how to best lay out the directory structure of a Django application in development. It would appear that after you have built your first dozen or so Django projects, you start formulating your own opinions and create a standard project structure for yourself.

Obviously, this experiential knowledge is not available to someone building and deploying one of their first sites. And your directory structure directly affects your app’s URL routing and the daunting-at-first settings.py file. If you move a few directories around, things tend to stop working, and the resulting error messages aren’t necessarily the most helpful.

The picture gets even murkier when you go from development to production, and I have found much less discussion of best practices here. Luckily, I could ping my friend Ben Bengfort and tap into his devops knowledge. The directory structure on the remote server, as recommended by Mr. Bengfort, looks like this:

Apache will see the htdocs directory as the main directory from which to serve files.

/static will contain the collected set of static files (images, CSS, JavaScript, and more), and /media will contain uploaded documents.

/logs will contain relevant apache log files.

/django will contain the cloned copy of the Django project from GitHub.

The following shell commands set things up correctly:
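As a rough Python 3 illustration of the same idea, the hypothetical function below creates a layout like the one described above under a base directory of your choosing. The nesting is an assumption (static/ and media/ inside htdocs/, with logs/ and django/ beside it), as is any base path such as /var/www/ppi-css; adjust both to match your own tree.

```python
from pathlib import Path

# ASSUMPTION: static/ and media/ live inside htdocs/; logs/ and django/
# sit next to it. Edit this tuple to match the layout you actually use.
LAYOUT = (
    "htdocs",
    "htdocs/static",
    "htdocs/media",
    "logs",
    "django",
)


def create_layout(base_dir, layout=LAYOUT):
    """Create each directory under base_dir; existing ones are left alone."""
    base = Path(base_dir)
    created = []
    for relative in layout:
        path = base / relative
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

The sketch only creates empty directories; you would still clone the repository into django/ and give Apache ownership of the served directories as described earlier.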

Configuring Apache for Our Django Project

With the directory structure of our Django application sorted, let’s continue configuring Apache.

First, let’s disable the default virtual host for Apache:

The virtual host configuration file contains aliases that tell Apache about this structure. Fortunately, I have included the ppi-css.conf file in the repository, and it just needs to be moved into position:

Next, we must enable the site using the following command:

and we must reload the apache2 service (remember this command, as you will probably be using it a lot):

Now, when I restarted or reloaded the apache2 service, I got the following error message:

To remove this, I simply added the following line:

to the /etc/apache2/apache2.conf file using vi. A quick

shows that the error message has been banished.

Install a Few More Python Packages

The Django application contains a few more dependencies that were captured in the requirements file included in the repository. Note that since the installation of Numpy and Scipy has already been taken care of, those lines in the requirements.txt file can be removed.
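Deleting those lines by hand is easy, but a small Python sketch can do it too. The hypothetical function below drops any requirement whose package name matches numpy or scipy (ignoring case) and keeps everything else, including comments, blank lines, and option lines such as -r.

```python
import re

# Matches the package name at the start of a requirements line,
# e.g. "numpy==1.8.0" -> "numpy", "Django>=1.5" -> "Django".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def drop_requirements(lines, exclude=("numpy", "scipy")):
    """Return the lines whose package name is not in exclude (case-insensitive)."""
    excluded = {name.lower() for name in exclude}
    kept = []
    for line in lines:
        match = _NAME.match(line)
        if match and match.group(1).lower() in excluded:
            continue  # skip packages already installed system-wide
        kept.append(line)
    return kept
```

For example, `drop_requirements(open("requirements.txt").read().splitlines())` returns the lines you could write to a server-specific file (requirements-server.txt is a hypothetical name) and then install with `pip install -r`.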

Database Migrations

Before we can perform the needed database migrations, we need to update the database section of settings.py. It should look like the following:
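As a hedged sketch, a MySQL configuration for a database server running on the same instance typically looks something like the following. The database name django_test comes from the step above; the user name django_user and the DJANGO_DB_USER and DJANGO_DB_PASSWORD environment variables are assumptions (reading credentials from the environment simply keeps them out of version control).

```python
import os

# ASSUMPTION: "django_user" and the DJANGO_DB_* environment variables are
# placeholders; "django_test" is the database created earlier.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "django_test",
        "USER": os.environ.get("DJANGO_DB_USER", "django_user"),
        "PASSWORD": os.environ.get("DJANGO_DB_PASSWORD", ""),
        "HOST": "localhost",  # MySQL runs on the same GCE instance
        "PORT": "",           # empty string means the default port (3306)
    }
}
```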

From the GCE instance, issue the following commands:

Deploying Your Static Files

Static files (your CSS, JavaScript, images, and other unchanging files) can be problematic for new Django developers. During development, Django is more than happy to serve your static files for you through its local development server. However, this does not work in production settings.

The key to this is your settings.py file. In this file, we see:

For production, STATIC_ROOT must contain the directory where Apache2 will serve static content from. In this case, it should look like this:

For development, STATIC_ROOT looked like:
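One way to keep both values in a single settings file is to choose STATIC_ROOT according to whether the code is running in production. The helper below is a hypothetical sketch of that idea, not the project's actual settings; the production path /var/www/ppi-css/htdocs/static is a placeholder, and collected_static is an arbitrary local directory name.

```python
import os

# ASSUMPTION: placeholder path; use the static directory Apache serves.
PRODUCTION_STATIC_ROOT = "/var/www/ppi-css/htdocs/static"


def pick_static_root(production, project_dir):
    """Return the STATIC_ROOT to use for production or local development."""
    if production:
        # The directory Apache serves static files from on the GCE instance.
        return PRODUCTION_STATIC_ROOT
    # Locally, collect into a directory inside the project checkout.
    return os.path.join(project_dir, "collected_static")


# In settings.py this might be used as (BASE_DIR being your project root):
# STATIC_ROOT = pick_static_root(not DEBUG, BASE_DIR)
```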

Next, Django comes with a handy mechanism, the collectstatic management command, to round up all of your static files (which may be spread across separate app directories if you have a number of apps in a single project) and push them to a single parent directory when you go into production.

Be very careful when going into production. If any of the directories listed in the STATICFILES_DIRS variable do not exist on your production server, collectstatic will fail and will not do so gracefully. The official Django documentation has a pretty good description of the entire process.
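Because a missing STATICFILES_DIRS entry makes collectstatic fail, a quick pre-flight check can save some head-scratching. The function below is a hypothetical helper, not part of Django. It accepts both plain paths and the (prefix, path) tuples that STATICFILES_DIRS allows, and returns the entries that are not existing directories.

```python
import os


def missing_static_dirs(staticfiles_dirs):
    """Return the STATICFILES_DIRS paths that are not existing directories.

    Entries may be plain paths or (prefix, path) tuples.
    """
    missing = []
    for entry in staticfiles_dirs:
        path = entry[1] if isinstance(entry, (tuple, list)) else entry
        if not os.path.isdir(path):
            missing.append(path)
    return missing
```

For example, inside `python manage.py shell` on the server, `from django.conf import settings` followed by `missing_static_dirs(settings.STATICFILES_DIRS)` should return an empty list before you run `python manage.py collectstatic`.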

More Settings.py Updates

We aren’t quite done with the settings.py file and need to update the MEDIA_ROOT variable with the appropriate directory on the server:

Next, the ALLOWED_HOSTS variable must be set as shown below when the Django application is run in production mode and not in debug mode:
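For reference, a production fragment along these lines covers both MEDIA_ROOT and ALLOWED_HOSTS. The domain names are those of the live site; the media path and the commented-out IP address (taken from a range reserved for documentation) are placeholders.

```python
# Production values (sketch). The media path is a placeholder.
DEBUG = False

# With DEBUG = False, Django rejects requests (HTTP 400) whose Host header
# does not match an entry in this list.
ALLOWED_HOSTS = [
    "www.ppi-css.com",
    "ppi-css.com",
    # "203.0.113.10",  # hypothetical: the instance's external IP, if you browse by IP
]

MEDIA_ROOT = "/var/www/ppi-css/htdocs/media"  # absolute path for uploaded files
MEDIA_URL = "/media/"  # public URL prefix; must end with a slash
```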

And finally, check to make sure that the paths listed in wsgi.py reflect the actual paths on the GCE instance.

A Very Nasty Bug

After going through all of that work, I found a strange bug where the website would work fine but then become unresponsive. After extensive Googling, I found the cause, best explained below:

Some third party packages for Python which use C extension modules, and this includes scipy and numpy, will only work in the Python main interpreter and cannot be used in sub interpreters as mod_wsgi by default uses. The result can be thread deadlock, incorrect behaviour or process crashes. This is detailed here.

The workaround is to force the WSGI application to run in the main interpreter of the process using:

WSGIApplicationGroup %{GLOBAL}

If running multiple WSGI applications on the same server, you would want to start investigating daemon mode, because some frameworks don’t allow multiple instances to run in the same interpreter. This is the case with Django. Thus, use daemon mode so each is in its own process, and force each to run in the main interpreter of its respective daemon mode process group.

The ppi-css.conf file with the required changes is now part of the repository.
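To confirm that the directive took effect, one option is to temporarily point a WSGIScriptAlias at a tiny script like the sketch below (the file name and URL are up to you; this script is not part of the repository). It relies on the behavior described in the mod_wsgi documentation for checking an installation: mod_wsgi sets environ["mod_wsgi.application_group"] to an empty string when the application runs in the main interpreter.

```python
def application(environ, start_response):
    """Tiny WSGI app reporting which interpreter mod_wsgi runs it in.

    An empty application group means the main interpreter
    (WSGIApplicationGroup %{GLOBAL}); anything else is a sub interpreter.
    """
    group = environ.get("mod_wsgi.application_group")
    if group is None:
        message = "not running under mod_wsgi"
    elif group == "":
        message = "main interpreter (WSGIApplicationGroup %{GLOBAL})"
    else:
        message = "sub interpreter: %r" % group
    body = (message + "\n").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Remove the alias again once you have seen "main interpreter" in the browser, since there is no reason to leave a diagnostic page exposed.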

Some Debugging Hints

Inevitably, things won’t work on your remote server. Leaving your application in debug mode is acceptable only for the briefest time while you are trying to deploy, but there are other things to check as well.

Is the web server running?

If it isn’t, or you need to restart the server:

What do the apache error logs say?
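If you are already in a Python shell on the instance, a few lines can pull the end of the log. This is a minimal Python 3 sketch equivalent to tail -n. The path /var/log/apache2/error.log is the Debian default; if your virtual host sets its own ErrorLog (for example, into the logs directory described earlier), read that file instead. Reading the default log usually requires sudo.

```python
from collections import deque


def tail(path, lines=20):
    """Return the last `lines` lines of a text file, like `tail -n`."""
    with open(path, errors="replace") as handle:
        # A bounded deque keeps only the most recent lines while iterating.
        return list(deque(handle, maxlen=lines))
```

For example, `print("".join(tail("/var/log/apache2/error.log", 50)))` shows the 50 most recent entries.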

Also, it is never a bad idea to log into MySQL and take a look at the django_test database.

Virtual Environment – Where Did It Go?

You may have noticed that I had a requirements.txt file in my project. When I started doing local development on my trusty MacBook Air, I used virtualenv, an amazing tool. However, I had some difficulties getting Numpy and Scipy properly compiled and included in the virtualenv on my server, whereas it was pretty simple to get them up and running in the system’s default Python installation. When I talked it over with some of my more Django-experienced friends, they reassured me that while this wasn’t a best practice, it wasn’t a mortal sin either.

Getting to Know Git and GitHub

Git or another code versioning tool is a fact of life for any developer. While the learning curve for the novice may be steep (or vertical), it is essential to climb this mountain as quickly as possible.

As powerful as Git can be, I found myself using only a few commands on this small project.

First, I used git add with several different flags to stage files before committing. To stage all new and modified files (but not deleted files), use:

To stage all modified and deleted files (but not new files), use:

Or, if you want to be lazy and stage everything every time (new, modified, and deleted files), use:

Next, the staged files must be committed and then pushed to GitHub.

Commands in Local Development Environment

While Django isn’t the most lightweight web framework in Python (hello Flask and others), “launching” the site in the local development environment is pretty simple. Compare the few commands needed below with the rest of this post. (Note that I am running OS X 10.9 Mavericks on a MacBook Air with 8 GB of 1600 MHz DDR3.)

First, start the local Postgres server:

Next, start the local development web server using django-command-extensions, which enables debugging of the site in the browser.

Once a model has changed, we need to create a migration using South and then apply it with the two commands below:

References

There are a ton of different tutorials out there to help you with all aspects of deployment. Of course, piecing together the relevant parts may take some time, and this tutorial was assembled from many different sources.

A Simple Tutorial for GCE – This is a very basic tutorial that doesn’t get into the details of getting a very simple configuration up and running on GCE.

Some Background on Apache Configuration Files on Debian – Understanding the apache2 configuration files is important for getting this to work correctly, and it would appear that Debian does things in a somewhat non-standard way.

Apache 2 Web Server on Debian 7 (Wheezy)

Deploy Django on Apache with Virtualenv and mod_wsgi

The Django Book – Deploying Django, Chapter 12

Complete Single Server Django Stack Tutorial

Start to Finish – Serving Django with UWSGI/NGINX on EC2

Non-techie Guide to setting up Django, Apache, MySQL on Amazon EC2

Deploying Django on Amazon EC2 Server

Deploying Python, Django on EC2 Linux Instance

How to use Django with Apache and mod_wsgi

Setting up Django with Nginx, Gunicorn, virtualenv, supervisor and PostgreSQL

The post A Tutorial for Deploying a Django Application that Uses Numpy and Scipy to Google Compute Engine Using Apache2 and modwsgi appeared first on Data Community DC.