2013-12-19

Python 3 is usually seen as the new Python version that breaks compatibility and raises new Unicode issues, but it is much more than that. It is a clean new language with a more consistent syntax. It has many new features, including no fewer than 15 new modules. Python 3 is already well supported by major Linux distributions, whereas Python 2.7 is in maintenance mode and no longer receives new features. Increasingly, some bugs cannot be fixed in Python 2.7 anymore and are only fixed in the latest Python 3 release. Python 3 is now 5 years old and is considered a mature programming language.

This article describes Python 3.3, with a preview of Python 3.4, which is scheduled for release at the end of February 2014.

New Python 3 features

Python 3 has too many useful new features to list them all here. Read each “What’s New in Python 3.x” document for the full list of changes: Python 3.0, Python 3.1, Python 3.2, Python 3.3 and Python 3.4. A lot of exciting stuff (asyncio) is coming in Python 3.4!

To give you an overview of new features, 8 new modules were added between Python 2.7 and 3.3:

concurrent.futures (3.2): high-level interface for asynchronously executing callables, pool of threads and pool of processes ;

faulthandler (3.3): debug module to dump the traceback of Python threads on a crash, on a signal or after a timeout ;

importlib (3.1): portable implementation of the import statement written in pure Python (a minor subset is also available in Python 2.7) ;

ipaddress (3.3): create, manipulate and operate on IPv4 and IPv6 addresses and networks ;

lzma (3.3): classes and convenience functions for compressing and decompressing data using the LZMA compression algorithm ;

tkinter.ttk (3.1): Tk themed widget set introduced in Tk 8.5 ;

unittest.mock (3.3): replace parts of your system under test with mock objects and make assertions about how they have been used ;

venv (3.3): create lightweight “virtual environments” with their own site directories.

Python 3.2 also added the argparse module, which makes it easy to write user-friendly command-line interfaces; this module is also available in Python 2.7.
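
As a rough illustration of what the modules listed above offer, here is a minimal sketch that uses three of them (concurrent.futures, ipaddress and unittest.mock). The square() helper, the network address and the send callback are hypothetical names chosen for the example; they do not come from any real project.

```python
import ipaddress
from concurrent.futures import ThreadPoolExecutor
from unittest import mock


def square(x):
    """Hypothetical workload used to exercise the thread pool."""
    return x * x


# concurrent.futures: run a callable over a pool of worker threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    squares = list(pool.map(square, range(5)))

# ipaddress: test whether an address belongs to a network.
network = ipaddress.ip_network("192.168.0.0/24")
address = ipaddress.ip_address("192.168.0.42")
in_network = address in network

# unittest.mock: replace a dependency and check how it was called.
# "send" is a hypothetical callback, not part of any real API.
send = mock.Mock(return_value="ok")
reply = send("hello", retries=2)
send.assert_called_once_with("hello", retries=2)

print(squares, in_network, reply)
```

On Python 2.7, each of these would need a third-party backport; here everything comes from the standard library.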

Almost as many modules were added between Python 3.3 and 3.4 as between Python 2.7 and Python 3.3! Python 3.4 has 7 new modules:

asyncio: infrastructure for writing single-threaded concurrent code using coroutines, multiplexing I/O access over sockets and other resources, running network clients and servers, and other related primitives ;

enum: set of symbolic names (members) bound to unique, constant values ;

ensurepip: bootstrap the pip installer into an existing Python installation or virtual environment ;

pathlib: classes representing filesystem paths with semantics appropriate for different operating systems ;

selectors: high-level and efficient I/O multiplexing, built upon the select module primitives ;

statistics: functions for calculating mathematical statistics of numeric (Real-valued) data ;

tracemalloc: debug tool to trace memory blocks allocated by Python.
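
To give a feel for the Python 3.4 additions, the following sketch touches enum, pathlib and statistics. It assumes Python 3.4 or newer; the State class and the configuration path are made up for the example, and PurePosixPath is used so that nothing is read from the filesystem.

```python
import enum
import pathlib
import statistics


class State(enum.Enum):
    """Hypothetical set of symbolic names bound to constant values."""
    ACTIVE = 1
    STOPPED = 2


# enum: look up a member by value, then read its symbolic name.
state = State(1)
state_name = state.name

# pathlib: build and inspect a path without string manipulation.
# The path is illustrative only; PurePosixPath never touches the disk.
config = pathlib.PurePosixPath("/etc") / "myapp" / "myapp.conf"
config_name = config.name
config_dir = str(config.parent)

# statistics: basic statistics on numeric data.
average = statistics.mean([1, 2, 3, 4])
middle = statistics.median([1, 3, 5])

print(state_name, config_name, config_dir, average, middle)
```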

Having more modules in the Python standard library reduces the number of dependencies and so simplifies the deployment of an application.

Finally, Python 3 has much better support for Unicode. In short, it makes the internationalization (i18n) of an application easier. When porting an existing application written for Python 2 to Python 3, you may have the opposite feeling because all Unicode-related issues have to be fixed at once. If you write a new Python 3 application from scratch, Unicode “just works” and you don’t have to worry about bytes versus characters, encodings and mojibake.
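
The bytes versus characters distinction can be sketched in a few lines. This is only an illustration of the general behaviour of str and bytes in Python 3, not code taken from OpenStack; the sample word is arbitrary.

```python
# In Python 3, str holds characters and bytes holds raw bytes.
text = "caf\u00e9"            # 4 characters, the last one is "é"
data = text.encode("utf-8")   # explicit conversion to bytes

char_count = len(text)        # counts characters
byte_count = len(data)        # counts bytes: "é" takes 2 bytes in UTF-8

# Decoding with the same codec gives the original text back.
round_trip = data.decode("utf-8")

# Mixing the two types is an error instead of a silent, lossy conversion.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(char_count, byte_count, round_trip == text, mixed_ok)
```

Python 2 would implicitly decode the byte string as ASCII in the last step, which is where many encoding bugs come from; Python 3 forces the conversion to be explicit.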

For a nice overview of changes between Python 2.7 and 3.3, see the “Python 3.3: Trust Me, It’s Better Than Python 2.7” talk by Dr. Brett Cannon at PyCon US 2013: slides and video.

Python 3 is fast

Examples of Python 3 optimizations:

The mechanism for serializing execution of concurrently running Python threads (generally known as the GIL or Global Interpreter Lock) has been rewritten. Among the objectives were more predictable switching intervals and reduced overhead due to lock contention and the number of ensuing system calls.

range(), map(), dict.keys(), etc. now return a lazy object (a range object, an iterator or a dictionary view) instead of a temporary list. This uses less memory and can also be faster.

Unicode strings always use the most compact storage, up to 4 times smaller (e.g. 1 byte per ASCII character instead of 4). A nice side effect is that some string operations on ASCII strings are up to 4 times faster. Python 3.3 uses slightly less memory than Python 2.7 on Django; see PEP 393: Performance and resource usage.

The decimal module has been reimplemented in C: it is now between 12x and 120x faster

The main loop evaluating bytecode is now up to 20% faster thanks to “computed goto”.

The Python peephole optimizer produces more efficient bytecode. None, False and True are now keywords and so can be optimized. The “x in {1, 2, 3}” pattern is compiled as “x in frozenset({1, 2, 3})”, where the frozenset is stored as a pre-built constant.

Common text codecs (ASCII, Latin1, UTF-8) are two to four times faster.

The json module now has a C extension to substantially improve its performance.
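
Two of the optimizations above are easy to observe from the interpreter. The sketch below assumes CPython 3.3 or newer on a 64-bit platform; sys.getsizeof() reports implementation-specific sizes, so the exact numbers vary between versions and only the ratio matters.

```python
import sys

# range() does not build a list: a huge range costs almost no memory.
big = range(10**12)
length = len(big)
contains_last = (10**12 - 1) in big   # computed arithmetically, no scan

# map() returns an iterator that computes items on demand.
squares = map(lambda x: x * x, big)
first = next(squares)

# dict.keys() returns a view that follows later changes to the dict.
counters = {"a": 1}
keys = counters.keys()
counters["b"] = 2
live_keys = sorted(keys)

# Compact string storage: the per-character cost depends on the widest
# character in the string (1 byte for ASCII, 4 bytes outside the BMP).
ascii_size = sys.getsizeof("a" * 1000)
wide_size = sys.getsizeof("\U0001F600" * 1000)

print(length, contains_last, first, live_keys, ascii_size, wide_size)
```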

Other changes made Python 3 slower on some operations. To be fair, the overall performance of Python 3.3 is almost the same as that of Python 2.7. But some functions of your application can now be much faster on Python 3.

See the “Optimizations” section of each “What’s New in Python 3.x” document for the full list of optimizations: Python 3.1, Python 3.2, Python 3.3 and Python 3.4.

Status of Python 3 in Linux distributions

All major Linux distributions provide Python 3.3 or Python 3.2. Red Hat Enterprise Linux 6 has provided Python 3.3 through its new “Red Hat Software Collections” since September 2013.

Arch Linux switched to Python 3 by default three years ago. Fedora and Ubuntu plan to switch in the near future. Ubuntu wants to go further and remove Python 2 from the default installation in April 2014: Shipping only Python 3 on the 14.04 CD. Fedora has scheduled the switch for Fedora 22 (December 2014): Python 3 as the Default Implementation.

No more new features in Python 2.7

In Python, only one branch accepts new features: the default branch. Currently, the default branch is the future Python 3.4 release. Python 2.7 only accepts bug fixes: no new features and no syntax changes. Read the “Python 2.8 Un-release Schedule” (PEP 404) for the rationale.

The development branches of Python 2 and 3 have diverged so much in 5 years that it would require too much work to fix some bugs in Python 2. Python developers don’t want to duplicate their effort and prefer to focus on the next release. Another reason is to avoid introducing regressions. Python 2.7 is now very stable and heavily used in production. A single minor change might introduce a regression, even though Python has very good code coverage thanks to its huge test suite.

A recent example of a fix that cannot be applied to Python 2.7 is the “Secure and interchangeable hash algorithm” (PEP 456), which fixes a security vulnerability (“hash DoS”). The vulnerability was partially fixed in Python 2.7: the hash function can now be randomized, but this must be enabled explicitly on the command line (-R option) or using an environment variable (PYTHONHASHSEED). Read the Denial of service via hash collisions article (Jake Edge, January 2012) for more information. In Python 3.3, the hash function is randomized by default. Python 3.4 will use a new fast cryptographic hash function, “SipHash”; see Python adopts SipHash (Jake Edge, November 2013). Python 2.7 and 3.3 remain vulnerable and will not be fixed: see the end of the Python issue #14621: “Hash function is not randomized properly”.
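
Hash randomization can be observed by computing the same hash in separate interpreter processes. The sketch below assumes Python 3.3 or newer and that child processes can be started; hash_in_child() is a hypothetical helper written for this example. It relies on the PYTHONHASHSEED environment variable, where a fixed integer seed makes the hash reproducible.

```python
import os
import subprocess
import sys


def hash_in_child(seed):
    """Return hash('openstack') as computed by a fresh interpreter.

    `seed` is passed through PYTHONHASHSEED: an integer string fixes
    the seed, while "random" asks for a new random seed per process.
    """
    env = dict(os.environ, PYTHONHASHSEED=seed)
    output = subprocess.check_output(
        [sys.executable, "-c", "print(hash('openstack'))"], env=env)
    return int(output.decode("ascii").strip())


# Same seed: the hash is reproducible across processes.
same_seed_a = hash_in_child("1")
same_seed_b = hash_in_child("1")

# Different seed: the same string hashes to a different value.
other_seed = hash_in_child("2")

print(same_seed_a == same_seed_b, same_seed_a != other_seed)
```

An attacker who cannot predict the seed cannot precompute a set of colliding keys, which is what the “hash DoS” attack depends on.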

Another example of a fix that cannot be applied to Python 2.7 is “Safe object finalization” (PEP 442). This PEP fixes an annoying memory leak that has existed since the first Python version (23 years ago). When a group of Python objects form a reference cycle and at least one object has a destructor (__del__ method), these objects are never deleted. Reference cycles are difficult to identify, but the issue can be worked around using weak references. Thanks to PEP 442, implemented in Python 3.4, objects with a destructor that are part of a reference cycle are now deleted by the garbage collector.
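
The behaviour described above can be reproduced with two objects that reference each other. This is a minimal sketch assuming CPython 3.4 or newer; the Node class is hypothetical. On Python 2.7 and Python 3.3, the same objects would end up in gc.garbage and their destructors would never run.

```python
import gc

finalized = []


class Node:
    """Hypothetical object with a destructor, used to build a cycle."""

    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        finalized.append(self.name)


# Build a reference cycle: a -> b -> a.
a = Node("a")
b = Node("b")
a.other = b
b.other = a

# Drop the last external references; only the cycle keeps them alive.
del a, b

# The cycle collector finds the unreachable objects and, since PEP 442,
# calls their destructors before freeing them.
gc.collect()

leaked = [obj for obj in gc.garbage if isinstance(obj, Node)]
print(sorted(finalized), leaked)
```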

Why should OpenStack move to Python 3 right now?

The motivation to move away from Python 2 is technical debt. OpenStack is one of the largest open source projects in terms of lines of code, with more than 2.5 million lines. If an application is not updated to keep up with new features, it slowly dies. Technical debt has a concrete price: the longer you wait to port OpenStack to Python 3, the more expensive the port will be.

Linux distributions want to gradually remove Python 2 and stop supporting it. Python 2.7 has more and more bugs, some of which cannot be fixed anymore. If OpenStack is not ported to Python 3, it will become more and more difficult to maintain in the near future.

Development on Python 2 also becomes more expensive because useful new features must be backported. The backported code has to be maintained by the project, whereas new modules that are part of the Python standard library are maintained by Python developers.

Switching to Python 3 will also improve performance, reduce the number of dependencies and, more generally, make OpenStack more robust.

The port of OpenStack to Python 3 has already started. A second blog article will give the status of this port and propose a plan to port the servers, not only the clients.
