Note
This is about packaging libraries, not applications.
For now there are no special considerations for C extensions here.
I think the packaging best practices should be revisited; there are lots of good tools nowadays that are either unused or
underused. It's generally a good thing to re-evaluate best practices from time to time.
I assume here that your package is to be tested on multiple Python versions, with different combinations of dependency
versions, settings etc.
And a few principles that I like to follow when packaging:
If there's a tool that can help with testing, use it. Don't waste time building a custom test runner if you can just
use nose or py.test - they come with a large ecosystem of plugins that can improve your testing.
When possible, prevent issues early. This is mostly a matter of strictness and exhaustive testing. Design things to
prevent common mistakes.
Collect all the coverage data. Record it. Identify regressions.
Test all the possible configurations.
The structure
This is fairly important; everything revolves around it. People lay out packages like this:
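Roughly this kind of flat layout, with the package and the tests side by side at the root of the repository (a sketch - packagename and the file names are placeholders):

    ├─ packagename
    │  ├─ __init__.py
    │  └─ ...
    ├─ tests
    │  └─ ...
    └─ setup.py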
I think this is actually a bad practice, a popular anti-pattern perpetrated by outdated and abandoned docs.
The main reasoning behind it is that you shouldn't have to install your package in order to run the tests. The only palpable
advantage this brings is that you can test before you install/deploy/build/whatever. In my opinion this is not desirable
for a library.
I prefer this sort of layout:
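For instance (again a sketch, with placeholder names):

    ├─ src
    │  └─ packagename
    │     ├─ __init__.py
    │     └─ ...
    ├─ tests
    │  └─ ...
    ├─ setup.py
    └─ tox.ini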
The src dir is a better approach because:
You should not test code that is not installable - you should test installed code. Your users will use the
installed code, not whatever you happen to have in your current development directory.
There are better tools. You don't need to deal with installing packages just to run the tests anymore. Just use tox -
it will install the package for you [2] automatically, zero fuss.
You need to test the installation too. If you ever uploaded a distribution to PyPI
with missing modules or broken dependencies, it's because you didn't test the installation.
Simpler packaging code and manifest. It makes manifests very simple to write (e.g.: a Django app that has templates or
static files). It's also zero fuss for large libraries that have multiple packages. Clear separation of code being
packaged and code doing the packaging.
The setup.py should be as simple as this. E.g.:
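A minimal sketch of such a setup.py, using the src layout from above (the name, version and dependencies are placeholders):

    from setuptools import setup, find_packages

    setup(
        name='packagename',                 # placeholder
        version='0.1.0',                    # placeholder
        packages=find_packages('src'),      # discover packages under src/
        package_dir={'': 'src'},            # tell setuptools the code lives in src/
        install_requires=[
            # runtime dependencies only - test dependencies go in tox.ini
        ],
    )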
You'll notice that I don't include the tests in the installed packages, because:
Module discovery tools will trip over your test modules. Strange things usually happen in test modules. The help
builtin does module discovery, for instance - see the sketch after this list.
Tests usually require additional dependencies to run, so they aren't useful on their own - you can't run them
directly.
Tests are concerned with development, not usage.
It's extremely unlikely that the user of the library will run the tests - that's the library developer's job. E.g.: you
don't run the tests for Django while testing your apps - Django is already tested.
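To illustrate the module-discovery point above, here's a rough sketch of what can happen when a top-level tests package gets installed alongside your code (output abridged):

    >>> help('modules')
    Please wait a moment while I gather a list of all available modules...
    ...

If a top-level tests package got installed next to your library, it shows up in that listing and its modules may get imported along the way, running whatever import-time side effects they have (fixtures, monkey-patching, warnings, etc.).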
The tests
Again, it seems people fancy the idea of running python setup.py
test to run the package's tests. I think that's not worth doing - setup.py test is a failed experiment to
replicate some of CPAN's test system.
Python doesn't have a common test result protocol, so it serves no purpose to have a common test command [1]. At least
not for now - we'd need someone to build the specifications and services that make this worthwhile, and champion them. I
think it's important in general to recognize failure where there is failure and go back to the drawing board when
necessary - there are absolutely no services or tools that use the setup.py test command in a way that brings added
value. Something is definitely wrong here.
I believe it's too late now for PyPI to do anything about it; Travis is already a solid,
reliable, extremely flexible and free alternative. It integrates very well with GitHub - builds
will be run automatically for each pull request.
For local testing, tox is a very good way to run all the possible testing configurations (each configuration is a
tox environment). I like to organize the tests into a matrix with these additional environments:
check - check package metadata (e.g.: whether the reStructuredText in your long description is valid)
clean - clean coverage
report - make coverage report for all the accumulated data
docs - build sphinx docs
I also like to have environments with and without coverage measurement, and to run them all the time. Race conditions are
usually performance sensitive and you're unlikely to catch them if you run everything with coverage measurement.
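For example, the resulting envlist could look something like this (env names are illustrative; -cover/-nocov are the variants with and without coverage):

    [tox]
    envlist = clean,check,2.6-1.3-cover,2.6-1.3-nocov,2.7-1.5-cover,2.7-1.5-nocov,report,docs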
The test matrix
Depending on your dependencies you'll usually end up with a huge number of combinations of Python versions, dependency
versions and different settings. Generally people just hard-code everything in tox.ini, or only in .travis.yml.
They end up with incomplete local tests, or test configurations that run serially in Travis. I've tried that; didn't
like it. I've tried duplicating the environments in both tox.ini and .travis.yml. Still didn't like it.
Eventually I implemented a generator script that uses templates to generate tox.ini and .travis.yml. This is
way better: it's DRY, you can easily skip running tests on
specific configurations (e.g.: skip Django 1.4 on Python 3), and there's less work to change things.
I use something like this (an ugly configure file in the root of the project):
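The original script isn't reproduced here; a minimal sketch of the idea, assuming jinja2 is used for the templates and with an illustrative Python/Django matrix, could look like:

    #!/usr/bin/env python
    # Render tox.ini and .travis.yml from templates so the test matrix is
    # defined in exactly one place.
    import jinja2

    PYTHON_VERSIONS = ['2.6', '2.7', '3.3']         # illustrative
    DJANGO_VERSIONS = ['1.3', '1.4', '1.5', '1.6']  # illustrative
    SKIP = {('3.3', '1.3'), ('3.3', '1.4')}         # e.g.: no old Django on Python 3

    environments = [
        '{0}-{1}'.format(python, django)
        for python in PYTHON_VERSIONS
        for django in DJANGO_VERSIONS
        if (python, django) not in SKIP
    ]

    env = jinja2.Environment(loader=jinja2.FileSystemLoader('.'), trim_blocks=True)
    for template, target in [('tox.tmpl.ini', 'tox.ini'),
                             ('.travis.tmpl.yml', '.travis.yml')]:
        with open(target, 'w') as fh:
            fh.write(env.get_template(template).render(environments=environments))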
.travis.tmpl.yml:
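Again a sketch rather than the original template - it loops over the environments variable rendered by the generator above:

    language: python
    python: '2.7'
    env:
    {% for env in environments %}
      - TOX_ENV={{ env }}-cover
      - TOX_ENV={{ env }}-nocov
    {% endfor %}
    install:
      - pip install tox
    script:
      # preloading libSegFault.so makes segfaults print a backtrace instead of
      # dying silently (the exact library path depends on the Travis image)
      - LD_PRELOAD=/lib/x86_64-linux-gnu/libSegFault.so tox -e $TOX_ENV
    # the real template also upgraded PyPy before running tox; omitted here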
This has some goodies in it: the libSegFault.so trick and a PyPy upgrade.
It basically just runs tox.
tox.tmpl.ini:
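The tox template follows the same pattern; a sketch of its shape (the Django pinning, the check/docs tool choices and the env naming scheme are placeholders, and coverage combining assumes parallel = true in .coveragerc):

    [tox]
    envlist =
        clean
        check
    {% for env in environments %}
        {{ env }}-cover
        {{ env }}-nocov
    {% endfor %}
        report
        docs

    [testenv]
    deps =
        pytest
    commands =
        py.test {posargs}

    [testenv:check]
    deps =
        docutils
        check-manifest
    commands =
        python setup.py check --restructuredtext --strict
        check-manifest

    [testenv:clean]
    deps = coverage
    commands = coverage erase

    [testenv:report]
    deps = coverage
    commands =
        ; assumes parallel = true in .coveragerc so each env leaves its own data file
        coverage combine
        coverage report

    [testenv:docs]
    deps = sphinx
    commands = sphinx-build -b html docs dist/docs

    ; each generated env name encodes "pythonversion-djangoversion", e.g. 2.7-1.5
    {% for env in environments %}
    [testenv:{{ env }}-cover]
    basepython = python{{ env.split('-')[0] }}
    usedevelop = true
    deps =
        {[testenv]deps}
        pytest-cov
        Django=={{ env.split('-')[1] }}
    commands =
        py.test --cov=src {posargs}

    [testenv:{{ env }}-nocov]
    basepython = python{{ env.split('-')[0] }}
    deps =
        {[testenv]deps}
        Django=={{ env.split('-')[1] }}
    {% endfor %}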
Note
I'm still looking for a way to use this cleanly on Windows. Perhaps have the configure script find all the installed
Pythons and put the correct paths into the templates?
If you've been patient enough to read through that you'll notice:
The Travis configuration uses tox for each item in the matrix. This makes testing in Travis consistent with testing
locally.
The environment order for tox is clean, check, 2.6-1.3, 2.6-1.4, ..., report.
The environments with coverage measurement run the code without installing (usedevelop = true) so that coverage
can combine all the measurements at the end.
The environments without coverage will build an sdist and install it into a virtualenv (tox's default behavior [2]) so that
packaging issues are caught early.
The report environment combines all the runs at the end into a single report.
Having the complete list of environments in tox.ini is a huge advantage:
You can run everything in parallel locally (if your tests don't need strict isolation) with detox. And you can still run
everything in parallel if you want to use drone.io instead of Travis.
You can measure cumulative coverage for everything (merge the coverage measurements from all the environments into a
single report) locally.
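A local run, assuming the sketched tox.ini above, might then look like this:

    # run the whole matrix in parallel (detox is a parallel front-end for tox)
    pip install detox
    detox

    # merge the per-environment coverage data into a single report
    tox -e report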
I'm still not completely happy with the configure script: it's easy to mess up (code mixed with configuration, and the
templates look ugly because indentation can't be used), but at least there's very little of it. It's still way better
than maintaining tox.ini and .travis.yml by hand. Hopefully someone will build a tool one day to do this cleanly
[3].
Test coverage
There's Coveralls - a nice way to track coverage over time and across multiple builds. It will automatically add comments
on GitHub pull requests about changes in coverage.
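Hooking it up from Travis is typically just a matter of installing the coveralls package and calling it after a successful build (a sketch, assuming the coverage data was already combined by the report environment; public repos need no token):

    # appended to .travis.yml
    after_success:
      - pip install coveralls
      - coveralls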
TL;DR
Put code in src.
Use tox and detox.
Test both with coverage measurements and without.
Use a generator script for tox.ini and .travis.yml.
Run the tests in Travis with tox to keep things consistent with local testing.
[1]
There's subunit and probably others, but they are not widely used.
[2]
(1, 2) See https://testrun.org/tox/latest/example/basic.html?highlight=install
[3]
There is a feature specification/proposal in tox for multi-dimensional configuration, but it still doesn't solve the problem of generating the
.travis.yml file. There's also tox-matrix but it's not flexible
enough.