2016-03-09

How would you analyse and organise thousands – or even millions – of documents so they can be found and used as efficiently as possible?

In a large law firm, a typical knowledge management system often holds 10,000 to 20,000 documents, and general matter documents might number in the millions.

How would you then quickly repeat the exercise as documents are edited, added to or removed from such a database?

Artificial intelligence (AI) creates an opportunity to meet such challenges. The process is not, however, as simple as flicking a switch.

More than a name…

Taxonomy – the art of naming and sorting things – might be said to have its roots in ancient history. Egyptian wall paintings dating back to 1500 BC illustrating plants with medicinal properties might be viewed as early examples of basic taxonomy.

Fast-forward 3,500 years and taxonomists have so far managed to describe (ie name) about 1.9 million species of living things on Earth.

Now consider that, over the past 20 years, LexisNexis UK has built a digital database of over eight million legal documents.

A more taxing taxonomy…

In 2013 we were faced with the job of applying a new taxonomy to that entire database.

To illustrate the scale of the task, imagine an entirely manual review process. If each document took six minutes, a project of this size would require more than an average lifetime of non-stop work – 91 years!
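
The 91-year figure can be checked with a few lines of arithmetic. The sketch below assumes exactly eight million documents, six minutes per document and round-the-clock work at 365.25 days per year; the names used are purely illustrative.

```python
# Back-of-the-envelope check of the manual review estimate.
DOCUMENTS = 8_000_000         # database size quoted above (rounded down)
MINUTES_PER_DOCUMENT = 6      # review time assumed per document

# Non-stop work: 24 hours a day, no weekends or holidays (assumption).
MINUTES_PER_YEAR = 60 * 24 * 365.25


def manual_review_years(documents: int, minutes_each: float) -> float:
    """Return the years of non-stop work needed to review every document."""
    return documents * minutes_each / MINUTES_PER_YEAR


years = manual_review_years(DOCUMENTS, MINUTES_PER_DOCUMENT)
print(f"{years:.1f} years")
```

Under these assumptions the total comes to roughly 91.3 years, which matches the figure above. A reviewer working normal office hours would, of course, need several times longer.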

While our existing approach to taxonomy was not that manual, it was based on manually written keyword rules. Applying a new taxonomy would, therefore, have involved a significant amount of manual input.

We needed a new, more automated, system. However, no such system existed. We would have to build it ourselves.

Enlisting Thor and Roxie

Despite their names, Thor and Roxie are not superhumans but core components of our homegrown supercomputer – the High Performance Computing Cluster (HPCC). We combined HPCC with (open source) natural language processing (NLP) code to create an AI system for the project.

At this point, we just needed to switch on the AI and sit back, right? Unfortunately not.

Brain training for machines

“Man is a slow, sloppy, and brilliant thinker; computers are fast, accurate and stupid”, John Pfeiffer.

No AI is yet capable of understanding an important or complex problem and solving it with an acceptable level of accuracy without any human design and supervision. Such tasks still require a human being to:

define the problem;

tell the system how to learn from training (via software known as an “algorithm”); and

train the system (in an interactive process known as “machine learning”).

In the case of our taxonomy project, this involved the following steps:

A 10,000 term taxonomy was created by our in-house taxonomy experts.

A small batch of documents was selected by the taxonomy experts to comprise the “seed-set” and presented to our AI system for analysis.

Our system analysed the documents to identify rules and/or patterns which might indicate how a particular document would fit into the taxonomy. In essence, it sought to identify whether the document was “about” any of the terms in the taxonomy, not just whether it contained them – hence the need for natural language processing as part of the “toolkit”.

It then identified 100 other documents which – based on the rules and/or patterns so identified – it assessed to be potentially relevant and presented these back to a taxonomist.

The taxonomist then agreed or disagreed with each result presented back. Learning from this feedback, the system adjusted its patterns and rules before presenting back another 100 results.

This process repeated until the results reached the required level of accuracy. At this point, the final set of rules and patterns was applied to all eight million documents.
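
The feedback loop described in the steps above can be sketched in a few lines of Python. This is a toy illustration for a single taxonomy term, not the system we built: it swaps the HPCC and NLP components for simple word counts, and every name in it (`train`, `score`, `feedback_loop`, `is_relevant`, the `taxonomist` callback, the batch size and the target accuracy) is an assumption made for the example.

```python
from collections import Counter


def tokens(text):
    """Split a document into lower-case words (a stand-in for real NLP)."""
    return text.lower().split()


def train(labelled):
    """Learn word weights: +1 for each relevant document a word appears in,
    -1 for each irrelevant one."""
    weights = Counter()
    for text, relevant in labelled:
        for word in set(tokens(text)):
            weights[word] += 1 if relevant else -1
    return weights


def score(weights, text):
    """Sum the weights of the distinct words in a document."""
    return sum(weights[word] for word in set(tokens(text)))


def is_relevant(weights, text):
    """Apply the learned weights to classify a document."""
    return score(weights, text) > 0


def feedback_loop(seed, pool, taxonomist, batch_size=2, target=0.9, max_rounds=10):
    """Repeat: present the top-scoring documents, collect the taxonomist's
    verdicts, retrain, and stop once the agreement rate reaches the target."""
    labelled = list(seed)
    remaining = list(pool)
    weights = train(labelled)
    for _ in range(max_rounds):
        if not remaining:
            break
        # Present the documents the current weights rate as most relevant.
        remaining.sort(key=lambda doc: score(weights, doc), reverse=True)
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        verdicts = [(doc, taxonomist(doc)) for doc in batch]
        # Learn from the feedback before presenting the next batch.
        labelled.extend(verdicts)
        weights = train(labelled)
        agreement = sum(ok for _, ok in verdicts) / len(verdicts)
        if agreement >= target:
            break
    return weights


# Hypothetical seed set and document pool for one term ("employment").
seed = [
    ("employee dismissal tribunal claim", True),
    ("commercial lease rent review", False),
]
pool = [
    "tribunal hears dismissal appeal",
    "rent arrears on commercial lease",
    "employee redundancy claim",
    "lease renewal dispute",
]
EMPLOYMENT_WORDS = {"employee", "dismissal", "tribunal", "redundancy"}


def taxonomist(doc):
    """Stand-in for the human reviewer's agree/disagree decision."""
    return bool(EMPLOYMENT_WORDS & set(tokens(doc)))


weights = feedback_loop(seed, pool, taxonomist)
```

Once the loop stops, `is_relevant(weights, document)` plays the role of the final rules applied to the full database. A real system would replace the word counts with a proper language model and would handle thousands of taxonomy terms rather than one.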

From start to finish, this project took two years.

Saving a life(time)

Compared with the (probably unrealistically ambitious) scenario imagined above, in which a fully manual review would have taken 91 years at just six minutes per document, this represents a saving of 89 years!

Of course, the resultant system is not “single-use”. As our database evolves, it will be necessary for us to repeat this exercise and – next time – we’ll be able to complete the entire process in less than a day.

AI: beyond taxonomy

In an environment as document-heavy as the legal industry, the relevance of this sort of exercise to law firms and other large legal teams is self-evident. Indeed, a number of firms are already actively looking at how best to implement such capabilities into their own document management systems.

Beyond taxonomy, the number of potential applications to legal practice is limited only by our ability to identify problems that might be solved (wholly or in part) using AI and select (or create) the technical components necessary to put together the right AI “toolkit”.

For example, our recent article on the case of Pyrrho examined the use of machine learning to drastically reduce the time and costs associated with e-discovery.

Over to you…

The next time you encounter a problem or bottleneck in your business, ask yourself, “Could this be solved or improved with AI?”

If you think the answer might be yes, our platform innovation and product development teams are always up for a challenge.

Previous (free) workshops have ranged from helping to break down big problems into manageable technology chunks, all the way to inspiring proofs of concept for entirely new solutions to challenges that may be shared by your own business or even the entire industry!

If you would like to discuss anything in this article or would like to find out more about a workshop with our platform innovation team, please contact Alex Smith.
