2017-01-03



Google Search Appliance (GSA) is Google’s search solution for businesses, covering their private data stored in various formats. Most businesses need the crawl and search capability of Google (or something similar) for quick access to that private data. Without it, an organization wastes a lot of time finding the relevant document, or ends up recreating data or documents that already exist.

GSA is not an open source solution and, depending on your needs, it can cost a significant amount of money. This is where we felt the need for an alternative: a solution built from open source components that offers capabilities similar to GSA. Since fast, accurate and controlled search is the key criterion, we decided to use one of the most popular open source search engines, ElasticSearch.

ElasticSearch is an open source search and analytics engine built on top of Apache Lucene. It focuses on document storage and retrieval, searching, and sorting. It was designed for distributed environments, where it provides flexibility and scalability.

In this article, I list the most popular features of GSA and walk you through an implementation of GSA’s Spell Checker capability using ElasticSearch. Later articles in this series will cover the other features.

Major GSA functionalities

The following are the major GSA features, which businesses use for different reasons.

Spell Checker

Self-learning scorer

Highlight query terms

Dynamic navigation

Query Suggestions

Query Suggestions Blacklist

Synonyms

Related Queries

Collecting metrics

Advanced Search

Sorting by metadata

Autocomplete

Wildcard search

In this article, we will focus on Spell Checker!

Problem Statement

We have an e-commerce application where people search for products. Users may mistype a word while searching. To handle this, the application should be smart enough to suggest the correct spelling for the search term.

Prerequisites

Proficiency in J2SE and J2EE

Proficiency in ElasticSearch concepts

Understanding of GSA functionality

Spell Checker implementation using ElasticSearch

As part of this feature, ElasticSearch should check the spelling of search queries and offer spelling suggestions to users.

The Spell Checker should use the data in the ElasticSearch documents to make its suggestions. Suggestions are derived dynamically from the indexed documents, based on the search query.

When the Spell Checker detects a possible misspelling, a single spelling suggestion is returned with the query results. Spelling suggestions are not enabled by default; we need to make certain changes to the ElasticSearch index.

Setup

Create the ES index settings with a SpellChecker analyzer. A field indexed with this analyzer can then be queried for spelling suggestions.

PUT ecommerce_parts
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "stemmer": {
            "type": "stemmer",
            "language": "english"
          },
          "stopwords": {
            "type": "stop",
            "stopwords": ["_english_"]
          }
        },
        "analyzer": {
          "SpellChecker": {
            "type": "custom",
            "char_filter": ["html_strip"],
            "filter": ["lowercase"],
            "tokenizer": "standard"
          },
          "default": {
            "type": "custom",
            "char_filter": ["html_strip"],
            "filter": ["lowercase", "stopwords", "stemmer"],
            "tokenizer": "standard"
          }
        }
      },
      "number_of_replicas": "1",
      "number_of_shards": "5",
      "refresh_interval": "1000"
    }
  }
}
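
To make the difference between the two analyzers concrete, here is a rough Python stand-in for the SpellChecker analyzer chain (html_strip, standard tokenizer, lowercase). It is only a sketch: the function name is hypothetical, and the regular expressions are crude approximations of the real Lucene components, not their actual behavior in every case.

```python
import html
import re


def spell_checker_analyze(text):
    """Approximate the SpellChecker analyzer: strip HTML, tokenize, lowercase."""
    # Crude html_strip: drop tags, then decode entities such as &amp;
    without_tags = re.sub(r"<[^>]+>", " ", text)
    decoded = html.unescape(without_tags)
    # Crude stand-in for the standard tokenizer: runs of word characters
    tokens = re.findall(r"\w+", decoded)
    # lowercase token filter
    return [token.lower() for token in tokens]


print(spell_checker_analyze("<b>Sprayaway</b> Glass Cleaner"))
```

Because this chain skips the stopwords and stemmer filters used by the default analyzer, the indexed terms stay whole words, so whatever is suggested from them is a real (lowercased) word rather than a stem.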

Create ES Index Mappings

We create one additional field (spell_checker) in the ES index and link it to the SpellChecker analyzer above. The copy_to setting copies the values of other index fields into this field so they can be used for spelling suggestions. Add copy_to only to the fields that should feed spelling suggestions.

PUT ecommerce_parts/_mapping/ecommerce_parts_type
{
  "properties": {
    "BrandName": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "raw": { "type": "string" }
      },
      "copy_to": ["spell_checker"]
    },
    "Cat": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "raw": { "type": "string" }
      },
      "copy_to": ["spell_checker"]
    },
    "Desc": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "raw": { "type": "string" }
      },
      "copy_to": ["spell_checker"]
    },
    "SubCat": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "raw": { "type": "string" }
      },
      "copy_to": ["spell_checker"]
    },
    "Term": {
      "type": "string",
      "index": "not_analyzed",
      "fields": {
        "raw": { "type": "string" }
      },
      "copy_to": ["spell_checker"]
    },
    "spell_checker": {
      "type": "string",
      "analyzer": "SpellChecker"
    }
  }
}
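
The effect of copy_to can be pictured with a small Python sketch. The sample product document and the helper name are made up for illustration; only the field names come from the mapping. ElasticSearch does this copying internally at index time, and the copied values are indexed under spell_checker without being added to the stored _source.

```python
# Fields that declare "copy_to": ["spell_checker"] in the mapping
COPY_TO_FIELDS = ("BrandName", "Cat", "Desc", "SubCat", "Term")


def spell_checker_values(document):
    """Collect the raw field values that would be copied into spell_checker."""
    return [document[name] for name in COPY_TO_FIELDS if name in document]


# Hypothetical document, not taken from the article's data set
sample_document = {
    "BrandName": "Sprayaway",
    "Cat": "Cleaners",
    "Desc": "Glass cleaner aerosol",
    "Price": 4.99,  # no copy_to in the mapping, so it is ignored
}

print(spell_checker_values(sample_document))
```

Note that the original values are copied, not the indexed terms of the source fields, so the not_analyzed setting on BrandName and the others does not stop spell_checker from being tokenized by the SpellChecker analyzer.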

Demonstration:

Search for documents by BrandName (‘Sprayaway’) and verify the results.

Now query the SpellChecker analyzer for spelling suggestions with the search term ‘Sprayaway’ and verify the suggestions. We expect no spelling suggestion, because a brand named ‘Sprayaway’ exists.
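
The request itself is not reproduced in this article. A minimal sketch of what it could look like, assuming ElasticSearch's term suggester is run against the spell_checker field, is shown below as a Python helper that builds the body for POST ecommerce_parts/_search. The helper name and the suggestion name spell_check are hypothetical choices.

```python
import json


def build_spell_check_request(text, field="spell_checker", name="spell_check"):
    """Build a _search body that asks the term suggester for corrections."""
    return {
        "size": 0,  # we only want suggestions, not search hits
        "suggest": {
            name: {
                "text": text,
                "term": {
                    "field": field,
                    # "missing" (the suggester's default) only suggests for
                    # terms that are not already present in the index
                    "suggest_mode": "missing",
                },
            }
        },
    }


print(json.dumps(build_spell_check_request("Sprayaway"), indent=2))
```

With suggest_mode set to missing, a term that already exists in the index, such as an existing brand name, gets an empty options array, which matches the expectation stated here.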



In the query result above, the options array holds the spelling suggestions, and it is empty for the search term ‘Sprayaway’. This is the expected behavior.

Now query the SpellChecker analyzer for spelling suggestions with the misspelled brand name ‘Sprayway’ and verify the suggestions. We expect a spelling suggestion, because no brand named ‘Sprayway’ exists.
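
Why should a suggestion be expected here? Assuming the suggestions come from the term suggester, candidates are indexed terms within a small edit distance of the input (at most 2 edits by default). The sketch below uses plain Levenshtein distance as a stand-in; ElasticSearch's own default measure is an optimized variant based on Damerau-Levenshtein, so treat this purely as an illustration.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance: insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


print(edit_distance("sprayway", "sprayaway"))  # 1
```

A single inserted letter turns ‘sprayway’ into the indexed term ‘sprayaway’, which is well within the allowed distance.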

In the search results above, we can see ‘sprayaway’ as the spelling suggestion, because we supplied the wrong brand name (‘Sprayway’). This exercise shows that the Spell Checker is working as expected.
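
In application code, the suggestion has to be pulled out of the response before it can be shown to the user. The following sketch assumes the term suggester's response layout (one entry per query token, each with an options array) and a suggestion named spell_check, which is a hypothetical name. The two responses are hand-written to mirror the two cases above; real responses also carry a score and freq for each option.

```python
def first_suggestion(response, name="spell_check"):
    """Return the top option of the first token that has any, else None."""
    for entry in response.get("suggest", {}).get(name, []):
        options = entry.get("options", [])
        if options:
            return options[0]["text"]
    return None


# Hand-written examples of the response shape, not captured output
known_brand = {
    "suggest": {"spell_check": [
        {"text": "sprayaway", "offset": 0, "length": 9, "options": []}
    ]}
}
misspelled_brand = {
    "suggest": {"spell_check": [
        {"text": "sprayway", "offset": 0, "length": 8,
         "options": [{"text": "sprayaway"}]}
    ]}
}

print(first_suggestion(known_brand))       # None
print(first_suggestion(misspelled_brand))  # sprayaway
```

Returning None for an empty options array lets the caller skip the "Did you mean" hint when the query is already spelled correctly.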

Summary

In this article, I have listed the most popular features of GSA and explained how one specific GSA use case can be implemented using ElasticSearch. Later articles in this series will cover the other features. I hope this article helps you make better use of ElasticSearch.

At WalkingTree, we have been using ElasticSearch and the related product suite for a few years, and we would love to help you take advantage of this product.

