
Analyzers

In order to make your data searchable, it is typically _analyzed_.

When text is analyzed using Lucene's @StandardAnalyzer@, for example, whitespace and other irrelevant characters (eg punctuation) are discarded, as are uninteresting words (eg 'and', 'or', etc), and the remaining words are lower-cased. The input text is effectively normalized for the search index.
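As a rough illustration, this sketch runs some text through the @StandardAnalyzer@ (it assumes the Lucene 2.x-era TokenStream API that ships with Compass; the method names differ in later Lucene versions):

import org.apache.lucene.analysis.standard.StandardAnalyzer

def analyzer = new StandardAnalyzer()
def tokens = analyzer.tokenStream("title", new StringReader("The Quick, Brown Fox!"))
def token
while ((token = tokens.next()) != null) {
    println token.termText()
}
// prints "quick", "brown" and "fox" -- the punctuation, case and stop word 'the' are gone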

Additionally, when you search with a query string, that too is analyzed. This means the terms you search on are normalized in the same way as the terms in the index.

Lucene includes many analyzers out of the box and you can also provide your own.

What we get with Compass

Compass acts as a registry of Analyzers, each identified by a name.

Compass provides two analyzers: "default", which is used for indexing, and "search", which is used for searching (analyzing query strings).

They are both instances of Lucene's StandardAnalyzer (or equivalent).

You can re-define both of these or define additional analyzers with new names.

Defining Analyzer implementations

You can define an analyzer with #Compass settings and (since 0.5.1) as a #Spring bean.

Compass settings

The Compass settings can either be defined in the plugin's configuration or in a native Compass configuration file.

Compass actually provides shortcut names for some of the standard Lucene analyzers, and this is a simple way to define them, eg:

Map compassSettings = [
    'compass.engine.analyzer.german.type': 'German'
]

Here "German" is a synonym provided by Compass for one of the standard Lucene analyzers and it has been named "german".

But you can also define your own implementations this way with a fully qualified class name:

Map compassSettings = [
    'compass.engine.analyzer.swedishChef.type': 'com.acme.lucene.analysis.SwedishChefAnalyzer'
]
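Such a class just needs to extend Lucene's Analyzer. A minimal sketch (the SwedishChefAnalyzer name above is purely illustrative; this version simply splits on whitespace and lower-cases, again using the Lucene 2.x API):

package com.acme.lucene.analysis

import org.apache.lucene.analysis.Analyzer
import org.apache.lucene.analysis.LowerCaseFilter
import org.apache.lucene.analysis.TokenStream
import org.apache.lucene.analysis.WhitespaceTokenizer

class SwedishChefAnalyzer extends Analyzer {
    TokenStream tokenStream(String fieldName, Reader reader) {
        // tokenize on whitespace, then lower-case each token
        new LowerCaseFilter(new WhitespaceTokenizer(reader))
    }
}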

See the Compass settings reference and general discussion with XML examples for the complete range of options.

Spring bean

_Since 0.5.1_

If you define a Spring bean in resources.xml or resources.groovy that is an instance of org.apache.lucene.analysis.Analyzer then it will be automatically registered with Compass using the Spring bean name as its name.

This allows you to wire your analyzer together with other Spring beans and configuration, eg

import com.acme.lucene.analysis.MyHtmlAnalyzer

beans = {
    htmlAnalyzer(MyHtmlAnalyzer) {
        context = someContext
        includeMeta = true
    }
}

defines an analyzer called @"htmlAnalyzer"@, while

import org.apache.lucene.analysis.standard.StandardAnalyzer

beans = {
    'default'(StandardAnalyzer, new HashSet()) // there are now no stop words
}

re-defines the "default" analyzer so that it has no stop-words (and will not discard 'and', 'or', etc).

Using Analyzers

Indexing

For indexing purposes you define the analyzer in the mapping, either at the class level

class Book {
    static searchable = {
        analyzer 'bookAnalyzer'
    }
    String title
}

and/or at the property level

class Book {
    static searchable = {
        title analyzer: 'bookTitleAnalyzer'
    }
    String title
}

Property-level analyzers override class-level analyzers just for that property.
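For example, combining the two (the analyzer names here are illustrative), 'bookTitleAnalyzer' is used for the title property and 'bookAnalyzer' for everything else:

class Book {
    static searchable = {
        analyzer 'bookAnalyzer'              // class-level analyzer for all other properties
        title analyzer: 'bookTitleAnalyzer'  // overrides the class-level analyzer for title only
    }
    String title
    String author
}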

Note you can also use native Compass XML or annotations to map with custom analyzers.
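With Compass annotations, for instance, a mapping along these lines should work (a sketch assuming the analyzer attribute of Compass's Searchable and SearchableProperty annotations):

import org.compass.annotations.Searchable
import org.compass.annotations.SearchableProperty

@Searchable(analyzer = "bookAnalyzer")
class Book {
    @SearchableProperty(analyzer = "bookTitleAnalyzer")
    String title
}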

Searching

You can say which analyzer you want to use on a per-query basis

def sr = Song.search("only the lonely", analyzer: 'songLyricsAnalyzer')

or, in the plugin's configuration, you can choose a search analyzer for all search queries (unless overridden on a per-query basis).

defaultMethodOptions = [
    search: [reload: false, escape: false, offset: 0, max: 10, defaultOperator: "and", analyzer: 'myAnalyzer'],
    suggestQuery: [userFriendly: true]
]

You could also simply redefine the "search" analyzer to achieve the same effect.
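For example, re-using the Spring bean approach from above (MyQueryAnalyzer stands in for whatever Analyzer implementation you want applied to all query strings):

import com.acme.lucene.analysis.MyQueryAnalyzer

beans = {
    'search'(MyQueryAnalyzer)
}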