Last updated by admin 5 years ago

Compass and Grails how to

by Maurice Nicholson

Update

If you are trying Grails and have Groovy domain classes, check out the Searchable Plugin.

Why Compass?

Compass is an _Open Source Search Engine Framework_. It is _not_ an alternative or replacement for GORM/Hibernate: it works alongside GORM to give your application the power of a dedicated search library (also known as an "Information Retrieval" (IR) tool).

Really, if you find that you've reached the flexibility or performance limits of full-text search using your DB alone, then it's time to employ an IR tool, which is where Compass/Lucene come in.

So let's say you do need search: why not use SQL/JDBC/Hibernate/raw Lucene/Hibernate search/Spring modules' Lucene support/another search library?

Compass is sometimes described as an Object (to) Search Engine Mapper (OSEM) because it implements a mapping between an object model and a search engine, so by analogy, it does for search engines what Hibernate does for databases. Likewise using Compass instead of raw Lucene is comparable to using Hibernate instead of raw JDBC.

It's really beyond the scope of this tutorial to compare alternatives, but here are my reasons to start using Compass today:

  • Compass makes search incredibly easy and powerful
  • Built on Lucene (a highly respected and de facto standard)
  • Does the tedious conversion of object to index for you; much less code than using raw Lucene
  • As well as separate fields for each object property, automatically creates a searchable "all" field (all object's properties concatenated together)
  • It can automatically mirror DB updates to the index for you (Compass::Gps)
  • Takes care of thread-safety and query caching
  • Object orientated like Hibernate; easily maps inheritance and composition
  • Can map persistent and non-persistent classes (Hibernate search most likely only maps Hibernate classes)
  • Returns actual mapped objects as search results, whereas Lucene returns Document, which you would then have to map back to a domain object yourself
  • Does not hit the DB to load search results; objects are rehydrated from the index; Hibernate search hits the DB (I think) so will be slower
  • Able to map anything to the search index (not just objects); XML and arbitrary resources by implementing Compass API
  • Configuration allowing single/multiple physical indexes
  • It's mature, at version 1.0, well tested and very actively supported!
  • It supports dynamic meta data expressions in Groovy!
  • Using a pure DB solution (SQL/JDBC/Hibernate/etc) is really not even an option; it's slow and nowhere near as rich query-wise as a search library

What is covered

The following tutorial uses a Java POJO domain model. Prior to using Compass I had Groovy domain model classes, but couldn't get these to work with Compass so changed to POJOs. I debugged it briefly and it was due to the Classloader used by Compass not being Groovy aware (it just uses the current context classloader). I didn't have time to investigate further.

This tutorial expands on the bookmarks sample application that comes with Grails which uses an annotated POJO domain model. If you can't use Java 5, consider a Java domain model with XML Hibernate mappings.

I also use the Compass annotations for the search index mappings, though it is also possible to define these with XML (like for Hibernate).

The source code is attached to this wiki page (http://docs.codehaus.org/pages/viewpageattachments.action?pageId=70170) and works with Grails 0.3.1. Inevitably when using Compass you end up writing some kind of helper code, so the code examples presented here are not exactly those in the bookmark app, but are adapted for readability.

Lucene and Compass concepts

Compass is built on Lucene and I found it useful to know some Lucene concepts first so here's a quick tour and comparison:

Lucene: Documents, Fields, Queries and how to map classes

  • A Document is basically the thing that is returned for each search hit and is typically mapped using one _document = one domain model class = one DB row_
  • A Document has one or more Fields, eg, for a Document representing a Book domain model object, you might have an "id" Field, and then several Fields like "title", "ISBN", etc. You can think of a Document as a HashMap, however a Document can have multiple values for a Field, eg, the "author" Field might have the values "Erich Gamma", "Richard Helm", "Ralph Johnson", and "John Vlissides"
  • A Query is the thing you issue to Lucene to search the index to get results; the results for a query are a list of Documents ordered by relevance (or some other sort criteria). A Query is a first-class object (but can be parsed from strings) and Lucene provides an API to create queries programmatically.
  • With Lucene, it's up to the application developer to decide how to build the index: you need to write code to build each Document and add each Field for anything and everything you want in the index to be searchable
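To make the "Document as a HashMap with multi-valued Fields" idea concrete, here's a minimal plain-Java sketch. It is not the Lucene API: `ToyDocument`, `add` and `values` are made-up names used purely for illustration, and the book title is just sample data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for a Lucene Document (hypothetical, NOT the Lucene API):
// a map from field name to a list of values, to show multi-valued fields.
final class ToyDocument {
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    // Adding the same field name again appends a value instead of replacing it.
    ToyDocument add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        return this;
    }

    // All values stored under a field name (empty list if the field is absent).
    List<String> values(String name) {
        return fields.getOrDefault(name, Collections.emptyList());
    }

    public static void main(String[] args) {
        ToyDocument book = new ToyDocument()
            .add("id", "1")
            .add("title", "Design Patterns")
            .add("author", "Erich Gamma")
            .add("author", "Richard Helm")
            .add("author", "Ralph Johnson")
            .add("author", "John Vlissides");

        // prints [Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides]
        System.out.println(book.values("author"));
    }
}
```

With raw Lucene you would write the equivalent of those `add` calls yourself for every object you index; that is the work Compass takes over.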

Compass: Resources, Properties, Queries and how to map classes

  • Compass adds a layer of abstraction and morphs the Lucene concepts. In Compass a Document is called a Resource and Field is a Property.
  • The String query syntax is the same, but the API is different.
  • Compass indexes objects based on class mappings. You just define the mappings and Compass does the actual conversion and builds the index. Compass has the concept of "root" and non-root classes. Only root classes are returned as search results.
  • Because Compass can map heterogeneous entities (different classes and other resource types) it uses the term "alias". This is the "name" of a mapped entity type, eg, for a domain class (say org.grails.bookmarks.Bookmark) it is typically the shortened class name (so "Bookmark"). This gives you the ability to search for Resources of specific types by alias (class instances in the case of domain objects).

Install Lucene and Compass

Download Compass with dependencies from http://www.opensymphony.com/compass/download.action. Download Lucene separately if you need a specific version.

Copy the compass and lucene-core JARs to PROJECT_HOME/lib.

Map your classes

Map each of your "root" classes using Compass annotations. There is an XML equivalent for the annotations (as for Hibernate) if you aren't on Java 5.

Compass annotations all begin with "@Searchable", but note the below code also contains Java 5 persistence annotations.

Compass annotations overview

| @Searchable | marks a class as searchable: required |
| @SearchableId | the id field of the class: required |
| @SearchableProperty | marks a class property as searchable: for simple types (String/primitive/wrapper) |
| @SearchableComponent | marks a class property as searchable: _for complex types for which search result matches should return this object_; Compass effectively adds that object's searchable data to this object's |
| @SearchableReference | indicates that a property should be returned along with the class instance itself. Also works for collection types |
| @SearchableMetaData | adds additional constant or dynamic data to the searchable content for your class |

Here's a class with only simple types (@SearchableProperty):

// imports omitted…
@Searchable
@Entity
@Table(name="user")
public class User extends AbstractModel {
    // other details omitted…
    private Long id;
    private String login;

    @SearchableId
    @Id
    @Column(name="user_id")
    @GeneratedValue
    public Long getId() { return id; }

    @SearchableProperty(index = Index.UN_TOKENIZED)
    @Column(nullable=false,unique=true,length=10)
    public String getLogin() { return login; }

    @SearchableProperty
    @Column(name="u_first_name")
    public String getFirstName() { return firstName; }

    @SearchableProperty
    @Column(name="u_last_name")
    public String getLastName() { return lastName; }

    // other details omitted…
}

Here's a class that contains a reference to a collection of searchable class instances (@SearchableReference), includes the searchable data from another class (@SearchableComponent) and defines some dynamic meta data using a Groovy expression (@SearchableDynamicMetaData):

// imports omitted…
@Searchable
@SearchableDynamicMetaData(name = "tag", expression = "data.tags?.tag?.name?.unique()", store = Store.NO, index = Index.UN_TOKENIZED, converter = "groovy")
@Entity
public class Bookmark {
    // other details omitted…
    private User user;
    private Set<TagReference> tags;

    @SearchableReference
    @OneToMany(cascade=CascadeType.ALL)
    public Set<TagReference> getTags() { return tags; }

    @SearchableComponent
    @ManyToOne
    public User getUser() { return user; }

    // other details omitted…
}

When Compass returns an instance of Bookmark as a search result, the bookmark's "tags" property will also be populated. However using @SearchableReference requires that the referenced type (or component type of a collection) is also mapped as a root class, even if you don't actually need to search for them. In this case, TagReference also needs to be mapped with @Searchable, etc (not shown).

This referencing of relationships is not limited to direct relationships. We can also add @SearchableReference to the Tag tag field of TagReference and the Bookmark bookmark field of Tag which points back to the Bookmark, etc, so Compass can return a complete object graph from search. (Again, Tag needs to be mapped with @Searchable, etc.)

The @SearchableComponent means any search query matching User-searchable data will return associated Bookmarks for that user.

This line

@SearchableDynamicMetaData(name = "tag", expression = "data.tags?.tag?.name?.unique()", store = Store.NO, index = Index.UN_TOKENIZED, converter = "groovy")

adds a "tag" Property (Lucene Field) to the index Resource (Lucene Document) with as many different values as there are tags for the bookmark. Eg, if you have a Bookmark with the tags "snowboarding", "alps" and "extremesports", then the Document in the index for the bookmark will have 3 "tag" Fields with these 3 string values. (See later for example searches.)

Compass also automatically adds the meta data to the object's "all" property (ie, the other searchable properties combined); this can be disabled using the annotation's "excludeFromAll = true".

In some cases you can choose whether to store searchable data in the index or not. The above @SearchableDynamicMetaData explicitly chooses not to store the data in the index (which reduces the size of the index), but storing can be very useful for debugging.

Notice the Groovy expression to evaluate the data ;): "data" is the instance of the class at index-time.
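If the Groovy one-liner looks a bit magic, here's roughly what "data.tags?.tag?.name?.unique()" computes, spelled out in plain Java. This is only a sketch: the nested `Tag` and `TagReference` classes are hypothetical minimal stand-ins for the real domain classes, `uniqueTagNames` is a made-up helper, and unlike the Groovy expression it returns an empty list (rather than null) when there are no tags and simply skips missing values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class TagNames {
    // Hypothetical minimal stand-ins for the tutorial's Tag and TagReference classes.
    static final class Tag {
        final String name;
        Tag(String name) { this.name = name; }
    }

    static final class TagReference {
        final Tag tag;
        TagReference(Tag tag) { this.tag = tag; }
    }

    // Walks tags -> tag -> name, skipping nulls, and keeps each name once
    // (in first-seen order). Each resulting name becomes one "tag" value.
    static List<String> uniqueTagNames(Iterable<TagReference> refs) {
        Set<String> names = new LinkedHashSet<>();
        if (refs != null) {
            for (TagReference ref : refs) {
                if (ref != null && ref.tag != null && ref.tag.name != null) {
                    names.add(ref.tag.name);
                }
            }
        }
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        List<TagReference> refs = Arrays.asList(
            new TagReference(new Tag("snowboarding")),
            new TagReference(new Tag("alps")),
            new TagReference(new Tag("snowboarding")), // duplicate, dropped
            new TagReference(new Tag("extremesports")));

        // prints [snowboarding, alps, extremesports]
        System.out.println(uniqueTagNames(refs));
    }
}
```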

Config file, compass/compass.cfg.xml

There are various ways to configure Compass: programmatically, with schema-based XML or with DTD-based XML. The Compass documentation recommends the schema XML variety, but I had problems getting the XML schema validation working, so I use the DTD style.

Following the Hibernate+Grails model I created a directory called PROJECT_HOME/compass and added the compass.cfg.xml there:

<!DOCTYPE compass-core-configuration PUBLIC
    "-//Compass/Compass Core Configuration DTD 1.0//EN"
    "http://www.opensymphony.com/compass/dtd/compass-core-configuration.dtd">

<compass-core-configuration>
    <compass>
        <!-- The location of the index (Lucene Directory); takes optional prefix like "file:" or "ram:" -->
        <setting name="compass.engine.connection">/tmp/compass</setting>

        <!-- Class mappings -->
        <mapping class="org.grails.bookmarks.Bookmark" />
        <mapping class="org.grails.bookmarks.User" />
        <mapping class="org.grails.bookmarks.Tag" />
        <mapping class="org.grails.bookmarks.TagReference" />
    </compass>
</compass-core-configuration>

Build an instance of Compass

Your application needs an instance of the Compass class to index and search for objects.

You typically build and configure the instance at startup and use it globally throughout your application, like a Hibernate SessionFactory (although it's possible to use different Compass instances in an application).

Here's what you'll need to do that:

// Configure and build Compass instance
    def conf = new org.compass.annotations.config.CompassAnnotationsConfiguration().
        configure(new File("./compass/compass.cfg.xml")).
        setConnection("/path/to/index/directory") // or "ram:myRamIndex" for RAM-based index
    compass = conf.buildCompass()

It's also possible to build the Compass instance in Spring XML of course.

Index your domain model

Now you need some code to index your objects. There are two ways to do this:

  • Manually one-at-time by using compassSession.save(bookmark), with other methods on CompassSession to delete or load objects.

    Using this technique is probably fine for indexes that will not change much after construction, but if you need to keep the index up-to-date with changes in your domain objects, you'll have to do it yourself using these CompassSession methods, though of course you have finer-grained control over what is and is not indexed and how often.

    You would also need to use this method for non-persistent classes and other Resource types (eg, XML or custom implementations).

  • Using Compass::Gps to do a one-off index of every mapped class instance. (Compass::Gps is specifically for persistent classes). Compass::Gps can also automatically update the index by listening to Hibernate CRUD events.

I use the second approach because it's easier. (I do need to be careful not to show certain search results to every user, but I use smart queries to match only the appropriate objects.)

// Build Compass::Gps for Spring+Hibernate
    def device = new org.compass.spring.device.hibernate.SpringHibernate3GpsDevice()
    device.name = "hibernate"
    device.sessionFactory = sessionFactory // Hibernate sessionFactory; can be obtained from applicationContext or injected by Grails
    device.fetchCount = 10 // or higher for large datasets
    def compassGps = new org.compass.gps.impl.SingleCompassGps()
    compassGps.addGpsDevice(device)
    compassGps.compass = compass

    // start the gps, so that any changes made through the Hibernate API
    // are mirrored to the search engine
    compassGps.start()
    // index the database
    compassGps.index()

This code is adapted from the grails-app/services/CompassHelper.groovy class in the downloadable code bundle; it contains other useful methods to simplify Compass usage so I recommend getting it.

You could add the above code to Grails' ApplicationBootstrap#init closure to have this initialisation done whenever the app starts.

Let's search

Here's a partial re-implementation of the BookmarkController#search action using Compass instead of GORM/Hibernate:

    // Open Compass session
    def session = compass.openSession()

    // Get Compass QueryBuilder
    def builder = session.queryBuilder()

    // Build query and get search hits
    def hits = builder.bool().
        addMust(builder.alias("Bookmark")). // only Bookmarks
        addMust(builder.queryString(params.q).toQuery()). // matching "params.q" query string
        toQuery().
        hits()

    // Extract objects (0..<n is Groovy's exclusive range)
    def bookmarks = (0..<hits.length).collect { hits.data(it) }

    // Close session
    session.close()

bookmarks is now a list of Bookmark instances matching the search criteria!

Notice that we used the query string literally:

addMust(builder.queryString(params.q).toQuery()). // matching "params.q" query string

Compass has identical query syntax to Lucene so the query could be a simple word or words matching the default "all" property:

| eggnog | searches for "eggnog" in the "all" property |
| eggnog -christmas | searches for objects containing "eggnog" but not "christmas" in the "all" property |

or a more complicated query on specific properties:

| country:brazil description:"folk music" | searches for objects whose "country" property contains "brazil" and "description" property contains the exact phrase "folk music" |

or a combination of the two.

Here are more query string examples courtesy of the Lucene documentation: http://lucene.apache.org/java/docs/queryparsersyntax.html, however see the advice next.

Before you go writing lots of query strings...

Because of possible differences between the way that content is initially analysed at index time and the way a query string is later analysed at search time, you can get confusing results.

Consider that you have a User object and have indexed its "login" property as UN_TOKENIZED, which means it is stored exactly as it is (without modification by an analyser). Then you search for users using a query string:

|| User.login || search query string || result || the result of analysis for the "login" property when indexing || what is searched with ||
| john | login:john | (/) finds User for john | john | john |
| John | login:John | (x) no results ! | John | john |

Why does this happen? Notice that when searching using a query string, "John" is analysed, and part of that analysis lowercases it, and this no longer matches the value that was used to index the resource.

It's not just about case sensitivity, it's the bigger issue of index-time vs search-time analysis and something that you, the application developer, need to be aware of and deal with.
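The mismatch in the table can be reproduced without Lucene at all. The following plain-Java sketch is a deliberately simplified model, not Compass or Lucene code: `indexUntokenized` and `analyseQueryTerm` are hypothetical helpers, and the "analyser" here only lowercases (real analysers also tokenise, may drop stop words, and so on).

```java
import java.util.Locale;

final class AnalysisMismatch {
    // Index time, UN_TOKENIZED: the value goes into the index as one term, unchanged.
    static String indexUntokenized(String value) {
        return value;
    }

    // Search time: simplified stand-in for query-string analysis, which lowercases the term.
    static String analyseQueryTerm(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    // A term only matches if the indexed term and the searched term are identical.
    static boolean matches(String storedLogin, String queryTerm) {
        return indexUntokenized(storedLogin).equals(analyseQueryTerm(queryTerm));
    }

    public static void main(String[] args) {
        System.out.println(matches("john", "john")); // prints true
        System.out.println(matches("John", "John")); // prints false: "John" was indexed, "john" is searched
    }
}
```

The point is that both sides have to go through the same transformation (or both through none) for a match to be possible.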

What's the solution?

One solution is to build your queries programmatically, which gives you more control.

That's the next topic.

Building queries programmatically

This is done using the Compass QueryBuilder, so that's all I show in the below examples. I've added an action to the bookmark controller to demonstrate each of these queries.

Search for all bookmarks containing all tags

Eg, http://localhost:8080/bookmarks/bookmark/searchByTags?tags=animals,zoo

def tags = params.tags.split('[,]')
    def bool = builder.bool().addMust(builder.alias("Bookmark")) // only Bookmarks
    tags.each { tag ->
        bool.addMust(builder.term("tag", tag))
    }
    bool.toQuery()

(Creates a query like "+alias:Bookmark +tag:animals +tag:zoo".)

Search for all bookmarks for a user and containing all tags

Eg, http://localhost:8080/bookmarks/bookmark/searchByUserAndTags?user=maurice&tags=animals,zoo

def tags = params.tags.split('[,]')
    def login = params.user
    def bool = builder.bool().
        addMust(builder.alias("Bookmark")). // only Bookmarks
        addMust(builder.term("login", login))
    tags.each { tag ->
        bool.addMust(builder.term("tag", tag))
    }
    bool.toQuery()

(Creates a query like "+alias:Bookmark +login:maurice +tag:animals +tag:zoo".)
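To see how each addMust call lines up with a "+field:value" clause in those query strings, here's a small plain-Java sketch that assembles the same string by hand. It doesn't touch Compass: `MustQuerySketch` and `mustAll` are made-up names, there's no escaping of special characters, and a real term query is never run through the query parser or an analyser, so treat the string purely as a readable summary of the query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MustQuerySketch {
    // Builds one required ("+") clause per condition, mirroring the addMust calls:
    // the alias, an optional login, and one clause per tag.
    static String mustAll(String alias, String login, List<String> tags) {
        List<String> clauses = new ArrayList<>();
        clauses.add("+alias:" + alias);
        if (login != null) {
            clauses.add("+login:" + login);
        }
        for (String tag : tags) {
            clauses.add("+tag:" + tag);
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("animals,zoo".split(","));

        // prints +alias:Bookmark +tag:animals +tag:zoo
        System.out.println(mustAll("Bookmark", null, tags));

        // prints +alias:Bookmark +login:maurice +tag:animals +tag:zoo
        System.out.println(mustAll("Bookmark", "maurice", tags));
    }
}
```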

If you've ever used the Lucene API to build Queries programmatically, you'll immediately recognise that Compass' API is much more concise.

Optimisations for large data sets

I spent some time profiling the indexing process using raw Lucene and the majority of the time was spent on disk I/O.

Likewise Lucene/Compass people recommend using a larger-than-default in-memory index buffer, therefore making fewer writes to disk during the index process.

To achieve this effect with Compass add something like these lines to your compass.cfg.xml (DTD based) config file:

<!-- Settings for large data sets; means that the index is written to disk less frequently but needs more RAM -->
    <setting name="compass.engine.mergeFactor">10000</setting>
    <setting name="compass.engine.maxBufferedDocs">10000</setting>

Also set a larger fetchCount for Compass::Gps so that more records are fetched from the database when doing a bulk index:

device.fetchCount = 5000  // fetch more rows at a time; requires more RAM

or XML config equivalent.

But be aware that both of these settings mean more RAM usage.

More resources

Read the Lucene in Action book!

Read the Compass reference manual (http://www.opensymphony.com/compass/content/documentation.html)!

Download Luke (http://www.getopt.org/luke/) and use it to inspect the index that Compass creates. Apart from being a way to learn (and de-mystify) a little about the generated index, it may help you to optimise the index, eg, look for fields that are being stored that don't need to be. It's also sometimes essential for debugging your queries.

Download the code bundle attachment and see in particular "grails-app/BookmarkController.groovy" and "grails-app/services/CompassHelper.groovy".