2012-07-21

In this post I’ll give a quick introduction, by example, to using Zeitgeist’s Python API for good and profit.

If you’re interested in using Zeitgeist from C instead, see the libzeitgeist examples; to use it with C++/Qt, Trever’s «a web browser in 4 steps» may be of interest.

First things first

In case you’re not familiar with Zeitgeist, it may prove helpful to first read Mikkel’s «introduction to Zeitgeist» post.

If that’s too much to read, you just need to know that Zeitgeist is an event log. Like the history in your browser, it keeps track of which websites you opened and when. It also records when you closed them and which browser you used, since it’s a system-wide service. Furthermore, it does the same for files, conversations, e-mails, and anything else you want to insert into it.

So Zeitgeist is a database of events, and an event can be pretty much anything. But what does one look like? Its main attributes are the following:

timestamp – when did the event happen (milliseconds since Unix epoch)

interpretation – what sort of event is it (eg. opened, closed)

manifestation – why did it happen (user activity, notification…)

actor – which is the primary application involved

origin – where did it come from (eg. website where you clicked the link that opened this page)

Additionally, each event has one or more subjects, which have the following attributes:

uri

current_uri – updated URI if it changed since the event

interpretation – abstract type (document, image, video…)

manifestation – how it is stored (file, remote object, website)

origin – parent folder for files, domain name for websites

mimetype

text – a title for the subject (eg. filename, website title…)

storage – identifier for the storage medium of the subject (eg. local, online, pendrive X)
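To make the two attribute lists above concrete, here is a minimal sketch that models them as plain Python dataclasses. These classes are hypothetical stand-ins, not zeitgeist.datamodel’s Event and Subject; string labels such as "ACCESS_EVENT" or "AUDIO" only mimic the names of Zeitgeist’s interpretation and manifestation constants, and the file path and actor are made up.

```python
import time
from dataclasses import dataclass, field
from typing import List


@dataclass
class Subject:
    # Hypothetical stand-in for the subject attributes listed above.
    uri: str
    current_uri: str = ""      # defaults to uri until the resource moves
    interpretation: str = ""   # abstract type, e.g. "AUDIO"
    manifestation: str = ""    # how it is stored, e.g. "FILE_DATA_OBJECT"
    origin: str = ""           # parent folder or domain name
    mimetype: str = ""
    text: str = ""             # human-readable title
    storage: str = ""          # storage medium identifier

    def __post_init__(self):
        if not self.current_uri:
            self.current_uri = self.uri


@dataclass
class Event:
    # Hypothetical stand-in for the event attributes listed above.
    timestamp: int             # milliseconds since the Unix epoch
    interpretation: str        # what happened, e.g. "ACCESS_EVENT"
    manifestation: str         # why it happened, e.g. "USER_ACTIVITY"
    actor: str                 # primary application involved
    origin: str = ""
    subjects: List[Subject] = field(default_factory=list)


def now_ms() -> int:
    """Current time as a Zeitgeist-style timestamp (milliseconds, not seconds)."""
    return int(time.time() * 1000)


song = Subject(
    uri="file:///home/user/Music/track01.ogg",   # hypothetical file
    interpretation="AUDIO",
    manifestation="FILE_DATA_OBJECT",
    origin="file:///home/user/Music",
    mimetype="audio/ogg",
    text="track01.ogg",
    storage="local",
)
played = Event(
    timestamp=now_ms(),
    interpretation="ACCESS_EVENT",
    manifestation="USER_ACTIVITY",
    actor="application://rhythmbox.desktop",     # hypothetical actor
    subjects=[song],
)
print(played.subjects[0].text, played.timestamp)
```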

Retrieving recent data

Okay, so let’s say you want to know the last song you’ve listened to (if you have a Zeitgeist-enabled music player). It’s as simple as:

This may need some explaining. We start by importing the Zeitgeist client and some associated data structures (such as Event and Interpretation), and create an instance of the client. The ZeitgeistClient class is a wrapper around Zeitgeist’s D-Bus API which not only makes it much nicer to use, but also makes it easy to install monitors (as we will see later) and provides convenient functionality such as automatic reconnection if the connection to the engine is lost.

To query for the most recent song, we just need to create a template with the restrictions we want to impose and submit it to Zeitgeist using the find_events_for_template call. If you haven’t read Mikkel’s post yet, please do so, as it introduces the structure of events and subjects (in short: an event has a timestamp, some other properties, and one or more subjects representing the resources involved in the event, such as files, websites or people).

The Python API is inherently asynchronous (if for some reason you need a synchronous API, you may still use the lower-level ZeitgeistDBusInterface class), so we need to define a callback function to handle the results we receive from the Zeitgeist engine.

Finally, we need to create a main loop so the asynchronous functions can run.
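The callback-plus-main-loop pattern can be sketched without Zeitgeist at all. In this sketch, asyncio stands in for the main loop a real client program would run, and find_events_for_template is a hypothetical function that searches an in-memory list: like the real asynchronous API, it returns immediately and only delivers results to the callback once the loop gets control. Empty template fields act as wildcards.

```python
import asyncio

# Hypothetical in-memory log (dicts instead of real Event objects).
LOG = [
    {"timestamp": 1000, "interpretation": "ACCESS_EVENT", "subject_interpretation": "AUDIO", "text": "Song A"},
    {"timestamp": 2000, "interpretation": "ACCESS_EVENT", "subject_interpretation": "DOCUMENT", "text": "notes.txt"},
    {"timestamp": 3000, "interpretation": "ACCESS_EVENT", "subject_interpretation": "AUDIO", "text": "Song B"},
]


def matches(event, template):
    """Every non-empty template field must equal the event's field."""
    return all(event.get(key) == value for key, value in template.items() if value)


def find_events_for_template(template, callback, num_events=20):
    """Hypothetical async-style query: returns at once, delivers results later."""
    hits = [e for e in LOG if matches(e, template)]
    hits.sort(key=lambda e: e["timestamp"], reverse=True)   # most recent first
    asyncio.get_running_loop().call_soon(callback, hits[:num_events])


received = []


def on_events_received(events):
    received.extend(events)
    for event in events:
        print(event["text"])


async def main():
    template = {"interpretation": "", "subject_interpretation": "AUDIO"}
    find_events_for_template(template, on_events_received, num_events=1)
    # Nothing has been delivered yet; yielding lets the loop run the callback.
    await asyncio.sleep(0)


asyncio.run(main())
```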

Now this was a pretty simple example. Let’s make it more interesting. One song isn’t much, so let’s get the 5 most recent songs. Also, now we want both songs and videos.

The first part is easy: we just need to change the num_events parameter. For the second, we have to change the event template. In fact, we now need two different event templates and the find_events_for_templates function, which takes an arbitrary number of event templates and ORs them. The result is as follows:
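The ORing of templates can be modeled in a few lines. The sketch below uses hypothetical data and a hypothetical function that returns results directly instead of through a callback, for brevity: an entry is kept if it matches any of the templates, and only the num_events most recent matches are returned.

```python
# Hypothetical log entries: (timestamp in ms, subject interpretation, title)
LOG = [
    (1000, "AUDIO", "Song A"),
    (2000, "VIDEO", "Clip A"),
    (3000, "DOCUMENT", "notes.txt"),
    (4000, "AUDIO", "Song B"),
    (5000, "VIDEO", "Clip B"),
    (6000, "AUDIO", "Song C"),
    (7000, "IMAGE", "kitten.jpg"),
    (8000, "AUDIO", "Song D"),
]


def find_events_for_templates(templates, num_events=20):
    """An entry matches if it matches ANY of the templates (they are ORed)."""
    hits = [e for e in LOG
            if any(e[1] == t["subject_interpretation"] for t in templates)]
    hits.sort(key=lambda e: e[0], reverse=True)   # most recent first
    return hits[:num_events]


templates = [{"subject_interpretation": "AUDIO"},
             {"subject_interpretation": "VIDEO"}]
latest = find_events_for_templates(templates, num_events=5)
print([title for _, _, title in latest])
```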

This will work, but unless you’re lucky you’ll probably get some duplicate lines. Why is this? Well, besides the possibility that you played the same file twice, don’t forget that what you are requesting are actually events. If you started playing a given song, you probably also stopped playing it, so that’s actually two events (an AccessEvent and a LeaveEvent). Since this isn’t what we want, we’ll change the query a bit:

By requesting the most recent subjects instead of the most recent events, we can filter out events with duplicate URIs. See the ResultType documentation for other modes you can use; note particularly the MostPopularSubjects result type.
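The difference between these result types is easy to see on a toy log. The functions below are simplified models of what MostRecentEvents, MostRecentSubjects and MostPopularSubjects return, written against a hypothetical list of (timestamp, interpretation, URI) tuples; they are not the engine’s implementation. Note how each play produces both an access and a leave event.

```python
from collections import Counter

# Hypothetical log: (timestamp in ms, event interpretation, subject URI)
LOG = [
    (1000, "ACCESS_EVENT", "file:///music/a.ogg"),
    (1200, "LEAVE_EVENT", "file:///music/a.ogg"),
    (2000, "ACCESS_EVENT", "file:///music/b.ogg"),
    (2500, "LEAVE_EVENT", "file:///music/b.ogg"),
    (3000, "ACCESS_EVENT", "file:///music/a.ogg"),
    (3400, "LEAVE_EVENT", "file:///music/a.ogg"),
]


def most_recent_events(log, n):
    return sorted(log, key=lambda e: e[0], reverse=True)[:n]


def most_recent_subjects(log, n):
    """Keep only the newest event per URI, then order by recency."""
    newest = {}
    for ts, interp, uri in log:
        if uri not in newest or ts > newest[uri][0]:
            newest[uri] = (ts, interp, uri)
    return most_recent_events(list(newest.values()), n)


def most_popular_subjects(log, n):
    """Order URIs by how many events mention them."""
    return [uri for uri, _ in Counter(uri for _, _, uri in log).most_common(n)]


print([uri for _, _, uri in most_recent_events(LOG, 3)])    # a, a, b
print([uri for _, _, uri in most_recent_subjects(LOG, 3)])  # a, b
print(most_popular_subjects(LOG, 3))                        # a, b
```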

I also took the chance to introduce the storage_state parameter. It filters out events whose subjects Zeitgeist knows aren’t available (mostly, this means online resources won’t be shown if you don’t have a network connection; there’s also support for handling external storage media, but it is currently disabled because of problems with GIO).

Last but not least, the find_events_for_* methods also accept a timerange parameter. It defaults to TimeRange.until_now(), but you may change it to TimeRange.always() (if for some reason you’re working with events in the future) or to any other time range of your choice. Here it’s important to note that Zeitgeist’s timestamps use millisecond precision.
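Since timestamps are in milliseconds, a common mistake is to build a time range straight from time.time(), which returns seconds. The sketch below uses hypothetical helpers loosely modeled on TimeRange.until_now() and on a “last N seconds” range. It also mimics the storage_state filter by dropping events whose storage is known to be unavailable; the offline network is an assumption of the example.

```python
import time

DAY_MS = 86_400_000


def until_now():
    """(start, end) in milliseconds: from the Unix epoch up to the present."""
    return (0, int(time.time() * 1000))


def from_seconds_ago(seconds):
    """(start, end) in milliseconds covering the last `seconds` seconds."""
    now = int(time.time() * 1000)
    return (now - seconds * 1000, now)


def in_range(timestamp_ms, time_range):
    start, end = time_range
    return start <= timestamp_ms <= end


# Assumption: no network right now, so remote resources are unavailable.
# Storage media we know nothing about are treated as available.
AVAILABLE = {"local": True, "net": False}


def is_available(event):
    return AVAILABLE.get(event["storage"], True)


now_ms = int(time.time() * 1000)
events = [
    {"timestamp": now_ms - 3_600_000, "storage": "local", "text": "report.odt"},
    {"timestamp": now_ms - 60_000, "storage": "net", "text": "Some website"},
    {"timestamp": now_ms - 30 * DAY_MS, "storage": "local", "text": "old.odt"},
]

last_week = from_seconds_ago(7 * 24 * 3600)
recent = [e for e in events
          if is_available(e) and in_range(e["timestamp"], last_week)]
print([e["text"] for e in recent])   # ['report.odt']
```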

For more advanced queries, you can use more complex combinations of event and subject templates. The rule to keep in mind here is that event templates are ORed and subject templates are ANDed.

Additionally, some fields (actor, origin, mimetype, uri and current_uri) may be prefixed with an exclamation mark (“!”) for NOT, or suffixed with an asterisk (“*”) for prefix search. You can even combine the two operators. Here’s an example of a template you could build:

In my case, this template would fetch a list of the source code files I modified most recently but excluding those related to the Zeitgeist project.
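The matching rules above can be expressed as a simplified model: the “!” and “*” operators on a single field, every non-empty field of a template having to match, and a list of templates being ORed. For brevity it treats a subject field as a flat key (subject_uri), and the project path in the example template is hypothetical; remember that the real engine only accepts the operators on the fields listed above.

```python
def field_matches(value, pattern):
    """Simplified model of the '!' (NOT) and '*' (prefix) template operators."""
    if not pattern:                      # empty template field matches anything
        return True
    negate = pattern.startswith("!")
    if negate:
        pattern = pattern[1:]
    if pattern.endswith("*"):
        hit = value.startswith(pattern[:-1])
    else:
        hit = value == pattern
    return hit != negate                 # flip the result for '!'


def event_matches(event, templates):
    """Templates are ORed; within one template every field must match."""
    return any(
        all(field_matches(event.get(key, ""), pattern)
            for key, pattern in template.items())
        for template in templates
    )


template = {
    "interpretation": "MODIFY_EVENT",
    "subject_interpretation": "SOURCE_CODE",
    "subject_uri": "!file:///home/user/zeitgeist/*",   # hypothetical project path
}
events = [
    {"interpretation": "MODIFY_EVENT", "subject_interpretation": "SOURCE_CODE",
     "subject_uri": "file:///home/user/zeitgeist/src/engine.vala"},
    {"interpretation": "MODIFY_EVENT", "subject_interpretation": "SOURCE_CODE",
     "subject_uri": "file:///home/user/other/main.py"},
    {"interpretation": "ACCESS_EVENT", "subject_interpretation": "SOURCE_CODE",
     "subject_uri": "file:///home/user/other/util.py"},
]
matched = [e["subject_uri"] for e in events if event_matches(e, [template])]
print(matched)   # only main.py survives
```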

Working with big sets of data

In case you’re trying to do something crazy, you may end up with a Zeitgeist query failing because it exceeded the memory limit. You’re not supposed to do that. Instead, we provide some methods for working with large collections of events.

And there you have the source code files you worked with during the last 3 months, ordered from most to least popular (popularity is measured by counting the number of events; for more precision, you could limit the results to events with interpretation AccessEvent).

Why do we provide this mechanism instead of querying with a simple offset? Well, this avoids problems when the log changes (events are inserted or deleted). Have you ever been exploring the latest posts in some website, and as you change to the next page some of the results from the previous page show up again (because new posts have been added in the meantime)? With Zeitgeist this won’t happen.
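The following toy model contrasts the two approaches. It assumes a log keyed by event ID where higher IDs are newer. The offset-based pager re-runs the query for every page, while the ID-based one fetches the list of IDs once and then retrieves one chunk at a time, skipping IDs whose events were deleted in the meantime. All names here are hypothetical.

```python
# Hypothetical log: event ID -> event; higher IDs are newer.
log = {i: f"post {i}" for i in range(1, 11)}


def newest_ids(log):
    return sorted(log, reverse=True)


def page_by_offset(log, offset, size):
    """Re-runs the query for every page, so results shift when the log changes."""
    return newest_ids(log)[offset:offset + size]


def pages_by_id(ids, size):
    """Walks a list of IDs fetched once, one chunk at a time."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Offset paging: a new event arrives between page 1 and page 2.
offset_page1 = page_by_offset(log, 0, 3)   # [10, 9, 8]
log[11] = "post 11"
offset_page2 = page_by_offset(log, 3, 3)   # [8, 7, 6]: event 8 shows up again
del log[11]

# ID paging: the ID list is a snapshot, so later inserts don't shift pages.
pages = pages_by_id(newest_ids(log), 3)
id_page1 = next(pages)                     # [10, 9, 8]
log[11] = "post 11"
del log[6]                                 # deleted events are simply skipped
id_page2 = next(pages)                     # [7, 6, 5]
events_page2 = [log[i] for i in id_page2 if i in log]
print(events_page2)                        # ['post 7', 'post 5']
```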

Receiving information in real time

At this point you’re an expert at requesting all sorts of data from Zeitgeist, but now you want to show a list of the last kitten images you’ve viewed, updated in real time. Don’t worry, Zeitgeist can provide for this:

It’s important to note that on_delete won’t be called when an image is deleted (that’d be a newly inserted event with interpretation=DELETE_EVENT); rather, it’s called when a previously inserted event is deleted (for example, using the «forget recent history» option in Activity Log Manager).

In case you’re curious: for best performance, this doesn’t actually use D-Bus signals. Instead, this little call sets up a D-Bus object behind the scenes and registers it with the Zeitgeist engine, so the engine can notify that object when (and only when) an event it is interested in is logged.

To stop receiving notifications for a template, you’ll need to save the object returned by the install_monitor call:
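To make the monitor semantics concrete, here is a small in-memory model in plain Python. MonitorRegistry is hypothetical and its callbacks are simplified (the real client’s handler signatures differ), but it shows the three points above: on_insert fires only for events matching the template, on_delete reports the IDs of removed events rather than new DELETE_EVENT events, and nothing is reported once the saved handle has been passed to remove_monitor.

```python
import itertools


class MonitorRegistry:
    """Hypothetical in-memory model of install_monitor/remove_monitor."""

    def __init__(self):
        self._monitors = {}
        self._handles = itertools.count(1)
        self._log = {}
        self._ids = itertools.count(1)

    def install_monitor(self, template, on_insert, on_delete):
        handle = next(self._handles)
        self._monitors[handle] = (template, on_insert, on_delete)
        return handle                      # keep it to remove the monitor later

    def remove_monitor(self, handle):
        self._monitors.pop(handle, None)

    def insert_event(self, event):
        event_id = next(self._ids)
        self._log[event_id] = event
        for template, on_insert, _ in list(self._monitors.values()):
            if all(event.get(k) == v for k, v in template.items()):
                on_insert([event])         # only monitors whose template matches
        return event_id

    def delete_events(self, ids):
        deleted = [i for i in ids if self._log.pop(i, None) is not None]
        if deleted:
            for _, _, on_delete in list(self._monitors.values()):
                on_delete(deleted)         # e.g. "forget recent history"


zg = MonitorRegistry()
seen, forgotten = [], []
handle = zg.install_monitor({"subject_interpretation": "IMAGE"},
                            seen.extend, forgotten.extend)

kitten = zg.insert_event({"subject_interpretation": "IMAGE", "text": "kitten.jpg"})
zg.insert_event({"subject_interpretation": "AUDIO", "text": "song.ogg"})   # no match
zg.delete_events([kitten])
zg.remove_monitor(handle)
zg.insert_event({"subject_interpretation": "IMAGE", "text": "kitten2.jpg"})  # unreported
print([e["text"] for e in seen], forgotten)
```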

Pro Tip: You can use the Zeitgeist Explorer GUI to quickly try out different queries (note: it’s still a work in progress, so much functionality is missing, but it does work somewhat).

Contextual awesomeness: finding related events

By now you’re familiar with retrieving events and keeping them up to date. Now it’s time for a little secret:

This little query example will return up to 10 websites I used at the same time as the Vala files inside my Zeitgeist directory, considering only data from the last 6 months. Nice, huh?

This is an experimental feature, and it doesn’t work well when operating on big inputs, so it’s usually better to use the find_related_uris_for_uris variant (which replaces the first query_templates parameter with a list of URIs).
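As a rough intuition for “used at the same time”, the sketch below counts which URIs appear within a fixed window around the events of the seed URIs. This is a naive co-occurrence count with an assumed 5-minute window, not the algorithm the Zeitgeist engine uses; the log and all URIs are hypothetical.

```python
from collections import Counter

WINDOW_MS = 5 * 60 * 1000   # assumption: "same time" means within 5 minutes

# Hypothetical log: (timestamp in ms, subject URI)
LOG = [
    (0, "file:///home/user/zeitgeist/src/engine.vala"),
    (60_000, "http://docs.example.org/vala"),
    (120_000, "http://forum.example.org/thread"),
    (200_000, "file:///home/user/zeitgeist/src/engine.vala"),
    (250_000, "http://docs.example.org/vala"),
    (3_600_000, "http://news.example.org/"),
]


def related_uris(log, seed_uris, num=10, window=WINDOW_MS):
    """Rank other URIs by how often they occur near a seed URI's events."""
    seed_times = [ts for ts, uri in log if uri in seed_uris]
    counts = Counter(
        uri for ts, uri in log
        if uri not in seed_uris and any(abs(ts - s) <= window for s in seed_times)
    )
    return [uri for uri, _ in counts.most_common(num)]


print(related_uris(LOG, {"file:///home/user/zeitgeist/src/engine.vala"}))
```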

Advanced searching: the FTS extension

Some people think prefix searches aren’t good enough for them, which is why the Zeitgeist engine ships by default with an FTS (Full Text Search) extension.

Using the methods provided by this extension, you can perform more advanced queries against subjects’ current_uri and text properties (contrary to what the name may suggest, the FTS extension doesn’t index the content of the files, only the information in the event).
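To illustrate what indexing “only the information in the event” means, here is a naive inverted index over each subject’s text and current_uri. It is a toy, not the FTS extension’s engine or its query syntax: words are split on non-alphanumeric characters and every query word must match. A word that only appears inside a file’s contents can never be found.

```python
import re
from collections import defaultdict


def tokens(text):
    return re.findall(r"\w+", text.lower())


def build_index(subjects):
    """Map each word of a subject's text and current_uri to subject positions."""
    index = defaultdict(set)
    for pos, subject in enumerate(subjects):
        for word in tokens(subject["text"]) + tokens(subject["current_uri"]):
            index[word].add(pos)
    return index


def search(index, subjects, query):
    """All query words must occur (AND); returns the matching subjects' titles."""
    words = tokens(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return [subjects[p]["text"] for p in sorted(hits)]


# Hypothetical subjects; only these fields are indexed, never file contents.
subjects = [
    {"text": "Quarterly report", "current_uri": "file:///home/user/work/report-q3.odt"},
    {"text": "Kitten pictures", "current_uri": "http://photos.example.org/kittens"},
    {"text": "Holiday plans", "current_uri": "file:///home/user/notes/trip.txt"},
]
index = build_index(subjects)
print(search(index, subjects, "report work"))   # ['Quarterly report']
```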

This is exposed as zeitgeist_index_search in libzeitgeist (the C library), but unfortunately it isn’t currently available in the Python API. If you still need it, you’ll have to fall back to using the D-Bus interface more or less directly (you still get reconnection support, though). Here’s an example:

The most interesting thing here is the query parameter. Quoting from the C documentation:

Modifying the log

So far we’ve only queried Zeitgeist for information; let’s get a bit more active.

You can delete events from Zeitgeist with the following call:

The confirmation callback will receive a time range going from the timestamp of the first deleted event to that of the last one. If no events were deleted (because they didn’t exist), you’ll get (-1, -1).
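The return value described above can be modeled like this. delete_events here is a hypothetical function operating on an in-memory dict of event IDs to timestamps, and it returns the range directly instead of passing it to a callback.

```python
# Hypothetical log: event ID -> timestamp in milliseconds
log = {1: 1000, 2: 2500, 3: 4000, 4: 9000}


def delete_events(log, ids):
    """Delete the given IDs; return the time range they covered, or (-1, -1)."""
    removed = [log.pop(i) for i in ids if i in log]
    if not removed:
        return (-1, -1)
    return (min(removed), max(removed))


print(delete_events(log, [2, 4]))    # (2500, 9000)
print(delete_events(log, [2, 42]))   # (-1, -1): already gone or never existed
```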

And now for the interesting part. If your application involves resources of any sort (files, websites, contacts, etc.), you’ll probably want to let Zeitgeist know that you’re using them. It’s time to write a data-source!

We start by registering the data-source. Here we go:

Once that’s done (and if the data-source is enabled), we are free to send our events:
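The registration step and the “if it is enabled” check can be sketched as follows. Both classes are hypothetical: DataSourceRegistry plays the engine’s role of remembering each data-source and its enabled state, and the data-source only sends events while it is enabled. The identifier, name and template are made up for illustration.

```python
class DataSourceRegistry:
    """Hypothetical engine-side model: remembers sources and their enabled state."""

    def __init__(self):
        self.sources = {}
        self.log = []

    def register_data_source(self, unique_id, name, description, templates, enabled_callback):
        # Re-registering keeps the user's previous enabled/disabled choice.
        entry = self.sources.setdefault(unique_id, {"enabled": True})
        entry.update(name=name, description=description,
                     templates=templates, callback=enabled_callback)
        enabled_callback(entry["enabled"])

    def set_enabled(self, unique_id, enabled):
        # E.g. the user toggles the data-source off in a settings tool.
        entry = self.sources[unique_id]
        entry["enabled"] = enabled
        entry["callback"](enabled)


class MusicPlayerDataSource:
    """Hypothetical data-source that only logs while it is enabled."""

    UNIQUE_ID = "com.example.musicplayer"    # hypothetical identifier

    def __init__(self, registry):
        self.registry = registry
        self.enabled = False
        registry.register_data_source(
            self.UNIQUE_ID, "Music Player", "Logs the songs you play",
            [{"interpretation": "ACCESS_EVENT", "subject_interpretation": "AUDIO"}],
            self._on_enabled_changed,
        )

    def _on_enabled_changed(self, enabled):
        self.enabled = enabled

    def song_played(self, uri):
        if not self.enabled:
            return False                     # respect the user's choice
        self.registry.log.append({"interpretation": "ACCESS_EVENT", "subject_uri": uri})
        return True


registry = DataSourceRegistry()
player = MusicPlayerDataSource(registry)
first = player.song_played("file:///music/a.ogg")        # logged
registry.set_enabled(MusicPlayerDataSource.UNIQUE_ID, False)
second = player.song_played("file:///music/b.ogg")       # dropped
print(first, second, len(registry.log))                  # True False 1
```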

If you don’t know what interpretation and manifestation your subject should have, you can use the following utility methods:

Even better, with Zeitgeist 0.9 you can just leave the subject (but not the event!) interpretation and manifestation fields empty, and they’ll be guessed the same way as if you had used those utility methods.
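The guessing idea can be sketched with a tiny lookup table: the interpretation is derived from the MIME type, the manifestation from the URI scheme, and explicitly set fields are left untouched. The labels mirror the names of Zeitgeist’s constants, but the table itself is an assumption and far smaller than the mapping Zeitgeist actually uses.

```python
from urllib.parse import urlparse

# Assumed, heavily simplified mappings; more specific prefixes come first.
MIME_PREFIXES = {
    "audio/": "AUDIO",
    "video/": "VIDEO",
    "image/": "IMAGE",
    "text/x-": "SOURCE_CODE",
    "text/": "DOCUMENT",
}
SCHEMES = {
    "file": "FILE_DATA_OBJECT",
    "http": "WEB_DATA_OBJECT",
    "https": "WEB_DATA_OBJECT",
}


def guess_interpretation(mimetype):
    for prefix, label in MIME_PREFIXES.items():
        if mimetype.startswith(prefix):
            return label
    return ""


def guess_manifestation(uri):
    return SCHEMES.get(urlparse(uri).scheme, "")


def fill_subject(subject):
    """Fill empty interpretation/manifestation; keep values set explicitly."""
    subject = dict(subject)
    if not subject.get("interpretation"):
        subject["interpretation"] = guess_interpretation(subject.get("mimetype", ""))
    if not subject.get("manifestation"):
        subject["manifestation"] = guess_manifestation(subject["uri"])
    return subject


print(fill_subject({"uri": "file:///music/a.ogg", "mimetype": "audio/ogg"}))
```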

Pro Tip: You can examine all registered data-sources and toggle whether they are enabled or not using the zeitgeist-data-sources-gtk.py tool.

Conclusion

Wow, I’m impressed if you’ve got this far. By now you should have quite a good idea of how to use the Zeitgeist API, and I’m looking forward to seeing what you do with it in your next awesome project.

If you have any problems with Zeitgeist, feel free to visit us on IRC (#zeitgeist on irc.freenode.net), or join our mailing list. We’ll also be at GUADEC next week, so if you’re there make sure to say hi!

In case you missed them, here are some useful links:

Python API documentation

C API documentation

Zeitgeist project website

Meta-project on Launchpad

Bug tracker on FreeDesktop.org




© Siegfried-Angel Gevatter Pujals, 2012. |
License |
Post tags: gnome, zeitgeist
