2015-08-22



alvinashcraft
shared this story
from Rob Conery.

In part 1 of this series I setup a nice save function, as well as another function to create an opinionated document storage table on the fly.

This works well and does what’s needed, but we can do so much more. Specifically: I want Full Text Indexing on the fly and bulk saves within a transaction.

Let’s do it.

Full Text Search

Our document table has a search field in it that’s of type tsvector, which is indexed using the GIN index for speed. I want to update that field whenever I save the document, and I don’t want to have a lot of API noise when doing it.

So I’ll use some convention.

Typically, when creating a Full Text Index, you’ll store fields with fairly specific names. Things like:

First or Last name, maybe email

a Title or Description of something

Address Information

I’d like to scan my document when it gets saved to see if it has any keys I might want indexed, and then I want to save them in my search field. I can do that with a function which I’ll call update_search:

Again, I’m using Javascript to do this (PLV8) and I’m pulling out a document based on ID. I’m then looping over the keys to see if there are any that I might want to store, if there are, I’m pushing to an array.

If we have any hits in that array I’m concatenating the values and saving in the search field of the document using to_tsvector, which is a built-in Postgres function that takes loose text and turns it into indexable values.

And that’s it! Running this, we get the following:



That’s perfect – now I can just pop this into my save_document function right at the end, and it gets called transactionally whenever I save something:

Bulk Saves

Right now I can pass in a single document to save_document, but I’d also like to be able to pass in an Array. I can do this by checking the type of the argument, and then pulling things out in a loop:

The nice thing about working with Javascript here is that the logic required for this kind of routine is fairly straightforward (as opposed to using PLPGSQL). I’ve pulled out the actual save routine into its own function – this is Javascript after all – so I can avoid duplication.

Then I check to see if the passed-in argument is an Array. If it is, I loop over it and call executeSql, returning a rollup of what happened.

If it’s not an Array, I just execute things as I have been, returning the entire document. The result:

Nice! The best thing about this is that it happens in a transaction. I love that!

Node Weirdness

If only this could work perfectly from Node! I’ve tried in both .NET and Node and, with .NET, things just work (oddly) using the Npgsql library. With Node, not so much.

Long story short: the node_pg driver does some weird serialization when it sees an object or array coming in as a parameter. Consider the following:

This is fairly typical Node/PG code. At the bottom, the run function is set to call my save_document function and pass along some data. When PG sees the object come in, it will serialize it to a string and the save will happen fine.

If I send an array, however…

I’ll get an error back saying that I have invalid JSON. The error message (from Postgres) will say it’s due to this poorly formatted JSON:

Which … yeah is hideous. I’ve tried to figure out what’s going on, but basically it looks like the node_pg driver is stripping out the outer array – perhaps calling Underscores flatten method? I don’t know. To get around this you need to change your call to this:

Onward!

This save routine is pretty slick, and it makes me happy. In the next post I’ll tackle the finders, and also create a Full Text Search function.

The post PostgreSQL Document API Part 2: Full Text Search and Bulk Save appeared first on Rob Conery.

Show more