Planet.python.org

Martin Fitzpatrick: Gremlins in the Machine: Creating custom tools for the Pathomx data analysis platform

2014-11-18

Pathomx is a workflow-based data analysis tool built on IPython. It
began as a metabolomic-analysis toolkit, but has extended to support general data analysis
workflows. It aims to be simple to use for non-experts while powerful enough for complex
analysis tasks. Key to both of these goals is the ability to create ‘custom tools’ that
can be drag-dropped together to form larger workflows.

Previously, custom tool creation required a developer installation. However, with the release
of v3(.0.2) you can now create custom tools using just the standard release. To get an
idea of what you can do with Pathomx you can download it here and
follow along with this brief tutorial on creating custom tools.

The custom tool we’ll be creating is a ‘Gremlin’: a mischievous destroyer of data. You
probably won’t want to use this in your analysis, but it gives a good overview of
what is possible.

The tool stub

All tools follow a basic structure we’re going to call the tool stub. To get started on
a custom tool, simply download the tool stub to your local machine. Unzip the file
somewhere convenient, preferably in a specific folder for custom Pathomx tools. You should
end up with the following folder structure:

<root>
    .pathomx-plugin
    __init__.py
    loader.py
    stub.py
    stub.md
    icon.png

A brief description of each follows:

.pathomx-plugin indicates that this folder is a Pathomx plugin folder. It also holds some
metadata about the plugin in the Yapsy plugin format. However, you don’t need to know about
that to use it: just make your changes to the example provided.

__init__.py is an empty file required by Python to import the folder as a module. Leave empty.

loader.py contains the code required to initialise the plugin and start up. You can also
define config panels, dialogs and custom views (figure plots, etc.) in this file.

stub.py contains the actual code for the tool that will run on the IPython kernel.

stub.md contains the descriptive text in Markdown format.

icon.png is the default icon for all tools in this plugin. You can add other icons and define them
specifically on a per-tool basis if you require.

You can have more than one tool per plugin using the same loader to initialise them all.
This is useful when you have a number of tools that are conceptually related. This is
seen in the standard ‘Spectra’ toolkit that offers a number of tools for dealing with frequency data.

Customising the stub

To create your custom tool start with the stub file and customise from there. For this demo we’ll
create a custom tool that randomly reorders and drops data on each iteration. We’ll call
it ‘Gremlin’.

Open up the .pathomx-plugin file and edit the metadata. The only line
you have to edit is Name but feel free to edit the other data to match.
Do not change the Module line as this is needed to load the tool. Next
rename stub.md and stub.py to gremlin.md and gremlin.py
respectively. Then open up loader.py in a suitable text editor. We’re
going to add some features to the Gremlin tool to show how it is done.
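For orientation, Yapsy plugin description files are INI-style, with a [Core] section holding the Name and Module lines mentioned above. The snippet below is a minimal sketch that parses a made-up example using Python’s standard configparser. The values shown, including the Module value and the documentation fields, are assumptions for illustration and not the contents of the actual stub.

```python
import configparser

# Hypothetical contents of a .pathomx-plugin file after editing.
# Only the Name line has been changed; Module is left exactly as
# shipped ("loader" here is an assumed value, for illustration).
EXAMPLE_PLUGIN_INFO = """
[Core]
Name = Gremlin
Module = loader

[Documentation]
Author = Your Name
Version = 0.0.1
Description = A mischievous destroyer of data
"""


def read_plugin_info(text):
    """Parse Yapsy-style plugin metadata and return the [Core] values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # configparser lower-cases option names by default.
    return dict(parser["Core"])


info = read_plugin_info(EXAMPLE_PLUGIN_INFO)
print(info["name"], info["module"])
```

Reading the file this way is only a quick check that the edited metadata is still well-formed. Pathomx does its own loading.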

In the loader.py file you will find the following:

There are two parts to the tool: the StubTool class, which defines the tool
and configures its setup, and the Stub loader, which handles
registration of the launcher for creating new instances of the tool. You
can define as many tools in this file as you want (give them unique names)
and register them in the same Stub class __init__.

The name of the tool is defined by the name parameter to the tool definition.
If none is supplied the tool will take the name of the plugin by default.
The shortname defines the name of the files that source code and information
text are loaded from e.g. stub.py and stub.md. So change the shortname value
to gremlin and the name to Gremlin.
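The relationship between name, shortname and the loader can be sketched without Pathomx installed. The classes below are stand-ins written for this example only (ToolSketch, LoaderSketch and UnnamedTool are hypothetical names, not the Pathomx API). They mimic the behaviour described above: the name falls back to the plugin name, the shortname picks the source and description files, and one loader registers several tools.

```python
# Stand-ins for illustration only: these are NOT the Pathomx classes,
# just enough structure to show how name, shortname and registration relate.

class ToolSketch:
    name = None        # display name; falls back to the plugin name
    shortname = None   # base name of the source and description files

    def __init__(self, plugin_name):
        self.name = self.name or plugin_name
        self.source_file = self.shortname + ".py"
        self.info_file = self.shortname + ".md"


class GremlinTool(ToolSketch):
    name = "Gremlin"
    shortname = "gremlin"


class UnnamedTool(ToolSketch):
    shortname = "unnamed"   # no name given, so the plugin name is used


class LoaderSketch:
    """Plays the role of the loader class: registers each tool once."""

    def __init__(self, plugin_name):
        self.plugin_name = plugin_name
        self.launchers = []
        # Several tools can be registered from the same loader.
        for tool_class in (GremlinTool, UnnamedTool):
            self.register(tool_class)

    def register(self, tool_class):
        self.launchers.append(tool_class)

    def launch(self, index):
        """Create a new instance of a registered tool."""
        return self.launchers[index](self.plugin_name)


loader = LoaderSketch("Gremlin plugin")
gremlin = loader.launch(0)
unnamed = loader.launch(1)
print(gremlin.name, gremlin.source_file, gremlin.info_file)
print(unnamed.name, unnamed.source_file)
```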

Below this is the default config definition. Here you can set default
values for any configuration parameters using standard Python dictionary syntax.
We’ll add a parameter evilness that defines how much damage the gremlin
does to your data, and gremlin_type that defines what it does. Edit the self.config definition to:

We’ve defined the parameters and given them both a default value of 1. These will
now be available from within the run kernel as config['evilness'] and
config['gremlin_type'].
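As a rough illustration of those defaults, here is a minimal sketch in which a plain dictionary stands in for the tool’s config object. That substitution is an assumption made so the snippet runs on its own; in the tool itself the values live in the built-in config handler.

```python
# Plain dict standing in for the tool's config object (an assumption;
# in the tool these values are held by the built-in config handler).
config = {
    "evilness": 1,       # how much damage the gremlin does
    "gremlin_type": 1,   # what kind of damage it does
}

# Inside the run kernel the values are read by key.
evilness = config["evilness"]
gremlin_type = config["gremlin_type"]
print(evilness, gremlin_type)
```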

Below the config definition there are two lines defining the input and output ports
of the tool respectively. You can name them anything you like as long as
you follow standard Python variable naming conventions. Data will be passed
into the run kernel using these names. They are defined as input_data and
output_data by default and that is enough for our gremlin tool.
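If you do rename the ports, a quick way to check a candidate name against Python’s variable naming rules is sketched below. The helper is_valid_port_name is a hypothetical name for this example; it is not part of Pathomx, and Pathomx may apply further rules of its own.

```python
import keyword


def is_valid_port_name(name):
    """True if name could be used as a Python variable name."""
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ("input_data", "output_data", "2nd_input", "class", "my-port"):
    print(candidate, is_valid_port_name(candidate))
```

Names that start with a digit, contain a hyphen, or clash with a reserved word such as class are rejected.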

How to train your Gremlin

The runnable source code for tools is stored in a file named <shortname>.py in
standard Python script style. We’ve already renamed stub.py to gremlin.py
so you can open that now. In it you’ll find:

That does not do a lot. The first three lines simply import a set of standard
libraries for working with data: Pandas, NumPy and SciPy. You might
not need them all but it’s worth keeping them available for now. To start
our custom tool we need to add some code to mess up the data. First we need
a copy of the input_data to output, then we want to mess it up. Add the
following code to the file:

This is the main guts of our gremlin. A copy of the input_data is made to output_data
and then a simple loop iterates evilness times while performing
some or other task on the output_data. The available actions are: delete row,
delete column, switch two rows, switch two columns. An option is available to make a
random selection from these transformations. Setting evilness to 10 and gremlin_type
to 1 will perform 100 random operations on the data. Enough to drive anyone quite mad.
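The logic just described can be sketched in plain Python. This is not the tool’s actual source: it assumes a simple list-of-rows table instead of the Pandas data the tool receives, it performs exactly one operation per unit of evilness (the scaling in the real tool may differ), and every type code other than 1 for random is made up for the example.

```python
import copy
import random

# Only "random = 1" comes from the text; the other codes are assumed.
GREMLIN_RANDOM = 1
GREMLIN_DELETE_ROW = 2
GREMLIN_DELETE_COLUMN = 3
GREMLIN_SWITCH_ROWS = 4
GREMLIN_SWITCH_COLUMNS = 5


def delete_row(table, rng):
    if len(table) > 1:  # always keep at least one row
        del table[rng.randrange(len(table))]


def delete_column(table, rng):
    if len(table[0]) > 1:  # always keep at least one column
        col = rng.randrange(len(table[0]))
        for row in table:
            del row[col]


def switch_rows(table, rng):
    if len(table) > 1:
        a, b = rng.sample(range(len(table)), 2)
        table[a], table[b] = table[b], table[a]


def switch_columns(table, rng):
    if len(table[0]) > 1:
        a, b = rng.sample(range(len(table[0])), 2)
        for row in table:
            row[a], row[b] = row[b], row[a]


ACTIONS = {
    GREMLIN_DELETE_ROW: delete_row,
    GREMLIN_DELETE_COLUMN: delete_column,
    GREMLIN_SWITCH_ROWS: switch_rows,
    GREMLIN_SWITCH_COLUMNS: switch_columns,
}


def gremlin(input_data, config, rng=None):
    """Return a damaged copy of input_data; the input is left untouched."""
    rng = rng or random.Random()
    output_data = copy.deepcopy(input_data)
    for _ in range(config["evilness"]):
        gremlin_type = config["gremlin_type"]
        if gremlin_type == GREMLIN_RANDOM:
            # Pick one of the four concrete actions at random.
            gremlin_type = rng.choice(sorted(ACTIONS))
        ACTIONS[gremlin_type](output_data, rng)
    return output_data


input_data = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
output_data = gremlin(input_data, {"evilness": 2, "gremlin_type": GREMLIN_DELETE_ROW})
print(output_data)
```

Passing a seeded random.Random instance as rng makes a run repeatable, which is handy when checking what the gremlin did.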

Finally, we use built-in standard figure plotting tools to output a view of the transformed data.

Initial test

To see what damage the gremlin can do we need a set of data to work with. Download the
sample dataset, a set of processed 2D JRES NMR data with class assignments already applied.

Start up Pathomx as normal. Before we can use our Gremlin tool we’ll need to tell Pathomx
where to find it so it can be loaded. On the main toolbar select “Plugins” then “Manage plugins…”
to get to the plugin management view. Here you can activate and deactivate different plugins
and add/remove them from the Toolkit view. To find the Gremlin tool we’ll need to tell Pathomx
about the folder it is in.

Add the folder containing the Gremlin tool, or alternatively a parent folder if you want to create
more tools in the same place. Pathomx will automatically search through the entire tree
to find plugins so it’s probably best not to add an entire drive.
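To see why adding a whole drive is a bad idea, consider what a tree search involves. The sketch below is not Pathomx’s discovery code; it is an assumed, simplified version of the idea that walks every folder under a root and reports those containing a .pathomx-plugin file. The helper name find_plugin_folders and the folder names are hypothetical.

```python
import os
import tempfile

MARKER = ".pathomx-plugin"


def find_plugin_folders(root):
    """Return every folder under root that contains the marker file."""
    found = []
    for folder, _subfolders, files in os.walk(root):
        if MARKER in files:
            found.append(folder)
    return sorted(found)


# Build a throwaway tree to try it on.
with tempfile.TemporaryDirectory() as root:
    gremlin_dir = os.path.join(root, "custom_tools", "gremlin")
    other_dir = os.path.join(root, "custom_tools", "notes")
    os.makedirs(gremlin_dir)
    os.makedirs(other_dir)
    open(os.path.join(gremlin_dir, MARKER), "w").close()

    plugins = find_plugin_folders(root)
    print([os.path.relpath(p, root) for p in plugins])
```

Every folder under the root is visited, so the cost grows with the size of the tree rather than with the number of plugins in it.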

Once added, the plugin list will refresh and the new plugin will be listed (and automatically activated).
You can now close the plugin management list and see that your new tool is ready and waiting in
the Toolkit viewer. It will be there every time you run Pathomx.

Drag it into the workspace and click on it. You’ll notice that there isn’t much to see: there is
no configuration UI defined and we haven’t updated the about text. But it’s still a fully-operational
gremlin. So let’s see it in action.

Drag an Import Text/CSV tool into the workspace and select it. Using the open file widget
select the file you downloaded earlier containing the demo dataset. Have a look at the Spectra
view output to see how it should look.

Now drag from the Import Text/CSV output_data port to the Gremlin input_data port.
The gremlin tool will automatically calculate using the new data and display a modified plot
called ‘View’. If you can’t see the difference between this and the earlier plot try pressing
the green play button a few times to re-run the tool. You will see the data change each time.

Adding configuration

A tool is not a lot of use without the ability to control it. All tools can be modified by editing
the source directly (see the # tab) but that isn’t particularly convenient. Pathomx tools
can define configuration panels, containing multiple widgets that are linked to the defined config settings.

Add the following code to the loader.py file.

This block of code defines the configuration panel for the tool. This is done using standard
Qt (PyQt) widgets and layout code, which we won’t go into in detail here. However, the bits
unique to Pathomx tool code are worth a bit of explanation:

As previously described, tools have an in-built config handler (based on the pyqtconfig package
available on PyPI). This keeps track of settings and also allows widgets to be attached and
automatically synced with configuration settings. This is achieved with the self.config.add_handler line.
The first parameter is the config key to set, the second the widget and the (optional) third is a
mapping dictionary/lambda tuple that converts between the displayed and stored value.

This is used for the drop-down so that when Random is displayed, the stored value in
the config is actually 1. These mappings can be applied to any widget and can apply any transformation
required. The widget is synced to the config value as it is bound.
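The idea behind such a mapping can be shown without any Qt code. The sketch below builds a pair of conversion functions from a label-to-value dictionary. Only the pairing of Random with 1 comes from the text above; the other labels and codes, and the helper make_mapper, are assumptions for illustration, and a plain dict again stands in for the config object.

```python
# Only "Random" -> 1 comes from the text; the other entries are assumed.
GREMLIN_TYPES = {
    "Random": 1,
    "Delete row": 2,
    "Delete column": 3,
    "Switch rows": 4,
    "Switch columns": 5,
}


def make_mapper(mapping):
    """Return (to_stored, to_displayed) functions for a label->value dict."""
    reverse = {value: label for label, value in mapping.items()}
    return (lambda label: mapping[label]), (lambda value: reverse[value])


to_stored, to_displayed = make_mapper(GREMLIN_TYPES)

# Choosing "Random" in the drop-down stores 1 in the config.
config = {"gremlin_type": to_stored("Random")}
print(config["gremlin_type"])

# Reading the config back gives the label to show in the drop-down.
print(to_displayed(config["gremlin_type"]))
```

Keeping the stored value numeric means the run kernel can compare against simple constants while the interface shows readable labels.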

Each ConfigPanel has a default layout object defined to which
your widgets are attached. They can be placed directly using self.layout.addWidget(widget)
or, as above, by defining a new layout and assigning that. It’s usually useful to use a GridLayout
to place widgets on the panel alongside labels.

Finally, the self.finalise() call is required to apply the layouts and wrap up the initialisation.

Next, add the following line to the __init__ function of the GremlinTool class:

…and you’re good to go. Restart Pathomx and the Gremlin tool will reload automatically.
Drag the tool into the workspace and then select it. On the left hand side you should see
your shiny new control panel. Connect the tool up with the sample data as before, and then
experiment with the config settings to see the effect.

Since we output the result of the transformation via the output_data port you can also
connect up other tools and see the effect there. For example, connect up a PCA or PLS-DA
tool and see the effect that the gremlin has on the ability of those algorithms to
separate the two classes in the dataset.

The polish

Open up the gremlin.md file and edit the file to say whatever you would like it to. You can
also replace the icon.png with a PNG format image more appropriate to an evil gremlin tool.

The end

This doesn’t cover everything that is possible within a custom tool, but it should give
you enough to get started on your own. If you’re interested in creating your own custom
tools or contributing to Pathomx in any other way, get in touch!

The complete Gremlin tool is available for download.