Compass and Grails how to
by Maurice Nicholson
Update
If you are trying Grails and have Groovy domain classes, check out the Searchable Plugin.
Why Compass?
Compass is an _Open Source Search Engine Framework_. It is _not_ an alternative to or replacement for GORM/Hibernate: it works alongside GORM to give your application the power of a dedicated search library (also known as an "Information Retrieval" (IR) tool).
Really, if you find that you've reached the flexibility or performance limits of full-text search using your DB alone, then it's time to employ an IR tool, which is where Compass/Lucene come in.
So let's say you do need search: why not use SQL/JDBC/Hibernate/raw Lucene/Hibernate Search/Spring Modules' Lucene support/another search library?
Compass is sometimes described as an Object (to) Search Engine Mapper (OSEM) because it implements a mapping between an object model and a search engine, so by analogy, it does for search engines what Hibernate does for databases. Likewise using Compass instead of raw Lucene is comparable to using Hibernate instead of raw JDBC.
It's really beyond the scope of this tutorial to compare alternatives, but here are my reasons to start using Compass today:
- Compass makes search incredibly easy and powerful
- Built on Lucene (a highly respected and de facto standard)
- Does the tedious conversion of object to index for you; much less code than using raw Lucene
- As well as separate fields for each object property, automatically creates a searchable "all" field (all of the object's properties concatenated together)
- It can automatically mirror DB updates to the index for you (Compass::Gps)
- Takes care of thread-safety and query caching
- Object-oriented like Hibernate; easily maps inheritance and composition
- Can map persistent and non-persistent classes (Hibernate Search most likely only maps Hibernate classes)
- Returns actual mapped objects as search results, whereas Lucene returns a Document, which you would then have to map back to a domain object yourself
- Does not hit the DB to load search results; objects are rehydrated from the index; Hibernate Search hits the DB (I think) so will be slower
- Able to map anything to the search index (not just objects); XML and arbitrary resources by implementing Compass API
- Configuration allowing single/multiple physical indexes
- It's mature, at version 1.0, well tested and very actively supported!
- It supports dynamic meta data expressions in Groovy!
- Using a pure DB solution (SQL/JDBC/Hibernate/etc) is really not even an option; it's slow and nowhere near as rich query-wise as a search library
What is covered
The following tutorial uses a Java POJO domain model. Prior to using Compass I had Groovy domain model classes, but couldn't get these to work with Compass so changed to POJOs. I debugged it briefly and it was due to the Classloader used by Compass not being Groovy aware (it just uses the current context classloader). I didn't have time to investigate further.
This tutorial expands on the bookmarks sample application that comes with Grails which uses an annotated POJO domain model. If you can't use Java 5, consider a Java domain model with XML Hibernate mappings.
I also use the Compass annotations for the search index mappings, though it is also possible to define these with XML (like for Hibernate).
The source code is attached to this wiki page (http://docs.codehaus.org/pages/viewpageattachments.action?pageId=70170) and works with Grails 0.3.1. Inevitably when using Compass you end up writing some kind of helper code, so the code examples presented here are not exactly those in the bookmark app, but are adapted for readability.
Lucene and Compass concepts
Compass is built on Lucene and I found it useful to know some Lucene concepts first so here's a quick tour and comparison:
Lucene: Documents, Fields, Queries and how to map classes
- A Document is basically the thing that is returned for each search hit and is typically mapped using one _document = one domain model class = one DB row_
- A Document has one or more Fields, eg, for a Document representing a Book domain model object, you might have an "id" Field, and then several Fields like "title", "ISBN", etc. You can think of a Document as a HashMap, however a Document can have multiple values for a Field, eg, the "author" Field might have the values "Erich Gamma", "Richard Helm", "Ralph Johnson", and "John Vlissides"
- A Query is the thing you issue to Lucene to search the index to get results; the results for a query are a list of Documents ordered by relevance (or some other sort criteria). A Query is a first-class object (but can be parsed from strings) and Lucene provides an API to create queries programmatically.
- With Lucene, it's up to the application developer to decide how to build the index: you need to write code to build each Document and add each Field for anything and everything you want in the index to be searchable
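As a rough illustration of the Document/Field model described above, here is a toy stand-in written in plain Java. It is not the Lucene API: the class ToyDocument and its methods are hypothetical names made up for this sketch, which only shows that a single field name can carry more than one value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a Lucene Document (hypothetical, not the Lucene API). */
final class ToyDocument {
    // Field name -> every value added under that name, in insertion order
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    /** Adds a value to a field; calling it again with the same name adds another value. */
    ToyDocument add(String name, String value) {
        fields.computeIfAbsent(name, key -> new ArrayList<>()).add(value);
        return this;
    }

    /** Returns all values of a field, or an empty list if the field is absent. */
    List<String> values(String name) {
        return fields.getOrDefault(name, Collections.<String>emptyList());
    }

    /** Builds the Book example from the text: one title, several authors. */
    static ToyDocument bookExample() {
        return new ToyDocument()
                .add("id", "1")
                .add("title", "Design Patterns")
                .add("author", "Erich Gamma")
                .add("author", "Richard Helm")
                .add("author", "Ralph Johnson")
                .add("author", "John Vlissides");
    }

    public static void main(String[] args) {
        ToyDocument book = bookExample();
        System.out.println(book.values("title"));
        System.out.println(book.values("author"));
    }
}
```

With raw Lucene you would write the equivalent of bookExample() by hand for every class you want to be searchable, which is the tedious part that Compass takes over.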
Compass: Resources, Properties, Queries and how to map classes
- Compass adds a layer of abstraction and morphs the Lucene concepts. In Compass a Document is called a Resource and Field is a Property.
- The String query syntax is the same, but the API is different.
- Compass indexes objects based on class mappings. You just define the mappings and Compass does the actual conversion and builds the index. Compass has the concept of "root" and non-root classes. Only root classes are returned as search results.
- Because Compass can map heterogeneous entities (different classes and other resource types) it introduces the term "alias". This is the "name" of a mapped entity type, eg, for a domain class (say org.grails.bookmarks.Bookmark) it is typically the shortened class name (so "Bookmark"). This gives you the ability to restrict a search to specific Resource types by alias (class instances in the case of domain objects).
Install Lucene and Compass
Download Compass with dependencies from http://www.opensymphony.com/compass/download.action. Download Lucene separately if you need a specific version.
Copy the compass and lucene-core JARs to PROJECT_HOME/lib.
Map your classes
Map each of your "root" classes using Compass annotations. There is an XML equivalent for the annotations (as for Hibernate) if you aren't on Java 5.
Compass annotations all begin with "@Searchable", but note the below code also contains Java 5 persistence annotations.
Compass annotations overview
| @Searchable | marks a class as searchable: required |
| @SearchableId | the id field of the class: required |
| @SearchableProperty | marks a class property as searchable: for simple types (String/primitive/wrapper) |
| @SearchableComponent | marks a class property as searchable: _for complex types for which search result matches should return this object_; Compass effectively adds that object's searchable data to this object's |
| @SearchableReference | indicates that a property should be returned along with the class instance itself. Also works for collection types |
| @SearchableMetaData | adds additional constant or dynamic data to the searchable content for your class |
Here's a class with only simple types (@SearchableProperty):
// imports omitted…
@Searchable
@Entity
@Table(name="user")
public class User extends AbstractModel {
    // other details omitted…
    private Long id;
    private String login;
    @SearchableId
    @Id
    @Column(name="user_id")
    @GeneratedValue
    public Long getId() {
        return id;
    }
    @SearchableProperty(index = Index.UN_TOKENIZED)
    @Column(nullable=false,unique=true,length=10)
    public String getLogin() {
        return login;
    }
    @SearchableProperty
    @Column(name="u_first_name")
    public String getFirstName() {
        return firstName;
    }
    @SearchableProperty
    @Column(name="u_last_name")
    public String getLastName() {
        return lastName;
    }
    // other details omitted…
}
And here's a class with complex types (@SearchableReference and @SearchableComponent) and dynamic meta data:
// imports omitted…
@Searchable
@SearchableDynamicMetaData(name = "tag", expression = "data.tags?.tag?.name?.unique()", store = Store.NO, index = Index.UN_TOKENIZED, converter = "groovy")
@Entity
public class Bookmark {
    // other details omitted…
    private User user;
    private Set<TagReference> tags;
    @SearchableReference
    @OneToMany(cascade=CascadeType.ALL)
    public Set<TagReference> getTags() {
        return tags;
    }
    @SearchableComponent
    @ManyToOne
    public User getUser() {
        return user;
    }
    // other details omitted…
}
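The Groovy expression in @SearchableDynamicMetaData walks from the bookmark (available to the expression as data) to its tag references, then to each reference's tag, then to each tag's name, and finally removes duplicates. For readers more at home in Java, roughly the same computation is sketched below. The minimal Tag and TagReference classes and the uniqueTagNames helper are assumptions for illustration only (the real domain classes are in the attached source). Unlike the Groovy expression, which yields null when tags is null, this sketch returns an empty list and skips null entries.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

/** Java approximation of the Groovy expression data.tags?.tag?.name?.unique(). */
final class TagNames {
    /** Hypothetical minimal stand-in for the Tag domain class. */
    static final class Tag {
        final String name;
        Tag(String name) { this.name = name; }
    }

    /** Hypothetical minimal stand-in for the TagReference domain class. */
    static final class TagReference {
        final Tag tag;
        TagReference(Tag tag) { this.tag = tag; }
    }

    /** Collects the distinct tag names, keeping first-seen order. */
    static List<String> uniqueTagNames(Collection<TagReference> tags) {
        if (tags == null) {
            return Collections.emptyList(); // the Groovy expression would yield null here
        }
        return tags.stream()
                .filter(Objects::nonNull)
                .map(ref -> ref.tag)      // tags.tag
                .filter(Objects::nonNull)
                .map(tag -> tag.name)     // tags.tag.name
                .filter(Objects::nonNull)
                .distinct()               // unique()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<TagReference> refs = Arrays.asList(
                new TagReference(new Tag("zoo")),
                new TagReference(new Tag("animals")),
                new TagReference(new Tag("zoo")));
        System.out.println(uniqueTagNames(refs));
    }
}
```

Each resulting name is what ends up in the index under the "tag" property, which is what the tag queries later in this tutorial search against.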
Config file, compass/compass.cfg.xml
There are various ways to configure Compass: programmatically, with schema-based XML or with DTD-based XML. The Compass documentation recommends the schema XML variety, but I had problems getting the XML schema validation working, so I use the DTD style.
Following the Hibernate+Grails model I created a directory called PROJECT_HOME/compass and added the compass.cfg.xml there:
<!DOCTYPE compass-core-configuration PUBLIC "-//Compass/Compass Core Configuration DTD 1.0//EN" "http://www.opensymphony.com/compass/dtd/compass-core-configuration.dtd">
<compass-core-configuration>
    <compass>
        <!-- The location of the index (Lucene Directory); takes optional prefix like "file:" or "ram:" -->
        <setting name="compass.engine.connection">/tmp/compass</setting>
        <!-- Class mappings -->
        <mapping class="org.grails.bookmarks.Bookmark" />
        <mapping class="org.grails.bookmarks.User" />
        <mapping class="org.grails.bookmarks.Tag" />
        <mapping class="org.grails.bookmarks.TagReference" />
    </compass>
</compass-core-configuration>
Build an instance of Compass
Your application needs an instance of the Compass class to index and search for objects.
You typically build and configure the instance at startup and use it globally throughout your application, like a Hibernate SessionFactory (although it's possible to use different Compass instances in an application).
Here's what you'll need to do that:
// Configure and build Compass instance
def conf = new org.compass.annotations.config.CompassAnnotationsConfiguration().
    configure(new File("./compass/compass.cfg.xml")).
    setConnection("/path/to/index/directory") // or "ram:myRamIndex" for RAM-based index
compass = conf.buildCompass()
Index your domain model
Now you need some code to index your objects. There are two ways to do this:
- Manually, one at a time, using compassSession.save(bookmark), with other methods on CompassSession to delete or load objects.
  This technique is probably fine for indexes that will not change much after construction. If you need to keep the index up to date with changes in your domain objects, you'll have to do it yourself using these CompassSession methods, though in return you get finer-grained control over what is and is not indexed and how often.
  You would also need to use this method for non-persistent classes and other Resource types (eg, XML or custom implementations).
- Using Compass::Gps to do a one-off index of every mapped class instance. (Compass::Gps is specifically for persistent classes). Compass::Gps can also automatically update the index by listening to Hibernate CRUD events.
// Build Compass::Gps for Spring+Hibernate
def device = new org.compass.spring.device.hibernate.SpringHibernate3GpsDevice()
device.name = "hibernate"
device.sessionFactory = sessionFactory // Hibernate sessionFactory; can be obtained from applicationContext or injected by Grails
device.fetchCount = 10 // or higher for large datasets
def compassGps = new org.compass.gps.impl.SingleCompassGps()
compassGps.addGpsDevice(device)
compassGps.compass = compass
// start the gps, so that any changes made through the Hibernate API
// are mirrored to the search engine
compassGps.start()
// index the database
compassGps.index()
Let's search
Here's a partial re-implementation of the BookmarkController#search action using Compass instead of GORM/Hibernate:
// Open Compass session
def session = compass.openSession()
// Get Compass QueryBuilder
def builder = session.queryBuilder()
// Build query and get search hits
def hits = builder.bool().
    addMust(builder.alias("Bookmark")). // only Bookmarks
    addMust(builder.queryString(params.q).toQuery()). // matching "params.q" query string
    toQuery().
    hits()
// Extract objects
def bookmarks = (0..<hits.length).collect { hits.data(it) }
// Close session
session.close()
Before you go writing lots of query strings...
Because of possible differences between the way content is analysed when it is indexed and the way a query string is analysed at search time, you can get confusing results.
Consider that you have a User object and have indexed its "login" property as UN_TOKENIZED, which means it is stored exactly as it is (without modification by an analyser). Then you search for users using a query string:
|| User.login || search query string || result || the result of analysis for the "login" property when indexing || what is searched with ||
| john | login:john | (/) finds User for john | john | john |
| John | login:John | (x) no results! | John | john |
Why does this happen? Notice that when searching using a query string, "John" is analysed, and part of that analysis lowercases it, so it no longer matches the value that was used to index the resource.
It's not just about case sensitivity: it's the bigger issue of index-time vs search-time analysis, and something that you, the application developer, need to be aware of and deal with.
What's the solution?
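To make the mismatch concrete, it can be reproduced in a few lines of plain Java without Compass or Lucene. This is only a simulation under stated assumptions: the class AnalysisMismatchDemo and its methods are hypothetical, the "analyser" does nothing but lowercase, and the term-style lookup assumes (as with a Lucene TermQuery) that the searched value is compared with the indexed value as-is.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Simulates index-time vs search-time analysis for an UN_TOKENIZED property. */
final class AnalysisMismatchDemo {
    // Toy index: indexed login value -> the user it belongs to
    private final Map<String, String> loginIndex = new HashMap<>();

    /** Stand-in for an analyser; the only step modelled here is lowercasing. */
    static String analyse(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    /** Indexes the login exactly as given, like Index.UN_TOKENIZED (no analysis). */
    void indexUnTokenized(String login) {
        loginIndex.put(login, "User(" + login + ")");
    }

    /** Query-string style lookup: the query text is analysed before matching. */
    String searchByQueryString(String login) {
        return loginIndex.get(analyse(login));
    }

    /** Term style lookup: the value is matched as-is, with no analysis. */
    String searchByTerm(String login) {
        return loginIndex.get(login);
    }

    public static void main(String[] args) {
        AnalysisMismatchDemo index = new AnalysisMismatchDemo();
        index.indexUnTokenized("John");
        System.out.println(index.searchByQueryString("John")); // prints null: "john" was never indexed
        System.out.println(index.searchByTerm("John"));        // prints User(John)
    }
}
```

The second lookup succeeds because nothing rewrites the searched value on its way to the index, which is the kind of control the query string route does not give you.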
One solution is to build your queries programmatically, which gives you more control. That's the next topic.
Building queries programmatically
This is done using the Compass QueryBuilder, so that's all I show in the below examples. I've added an action to the bookmark controller to demonstrate each of these queries.
Search for all bookmarks containing all tags
Eg, http://localhost:8080/bookmarks/bookmark/searchByTags?tags=animals,zoo
def tags = params.tags.split('[,]')
def bool = builder.bool().addMust(builder.alias("Bookmark")) // only Bookmarks
tags.each { tag ->
bool.addMust(builder.term("tag", tag))
}
bool.toQuery()
Search for all bookmarks for a user and containing all tags
Eg, http://localhost:8080/bookmarks/bookmark/searchByUserAndTags?user=maurice&tags=animals,zoo
def tags = params.tags.split('[,]')
def login = params.user
def bool = builder.bool().
addMust(builder.alias("Bookmark")). // only Bookmarks
addMust(builder.term("login", login))
tags.each { tag ->
bool.addMust(builder.term("tag", tag))
}
bool.toQuery()
Optimisations for large data sets
I spent some time profiling the indexing process using raw Lucene and the majority of the time was spent on disk I/O.
Accordingly, the Lucene/Compass people recommend using a larger-than-default in-memory index buffer, so that fewer writes are made to disk during the index process.
To achieve this effect with Compass add something like these lines to your compass.cfg.xml (DTD based) config file:
<!-- Settings for large data sets; means that the index is written to disk less frequently but needs more RAM -->
<setting name="compass.engine.mergeFactor">10000</setting>
<setting name="compass.engine.maxBufferedDocs">10000</setting>
If you index with Compass::Gps, you can also raise the fetch count of the device:
device.fetchCount = 5000 // fetch more rows at a time; requires more RAM
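As a back-of-envelope illustration of why larger buffers help, the sketch below counts how many batches are needed to work through a data set for a given batch size. It is a deliberate simplification, not a model of Lucene's or Compass's internals: it ignores segment merging and memory pressure, and the class BatchEstimate, the data set size of one million and the small batch size of 10 are all made-up values for comparison.

```java
/** Rough estimate of how many batches a given batch size implies (illustrative only). */
final class BatchEstimate {
    /** Number of batches needed to process totalItems in chunks of batchSize (rounded up). */
    static long batches(long totalItems, int batchSize) {
        if (totalItems < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("totalItems must be >= 0 and batchSize > 0");
        }
        return (totalItems + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        long docs = 1_000_000L; // hypothetical data set size

        // Buffered documents: each full buffer means one write of buffered documents to disk
        System.out.println("flushes with a buffer of 10:     " + batches(docs, 10));
        System.out.println("flushes with a buffer of 10000:  " + batches(docs, 10_000));

        // Fetch count: each batch means one round of rows fetched through Hibernate
        System.out.println("fetches with fetchCount = 10:    " + batches(docs, 10));
        System.out.println("fetches with fetchCount = 5000:  " + batches(docs, 5_000));
    }
}
```

The trade-off is the one noted in the comments above: fewer, larger batches mean less disk and database chatter, but each batch has to fit in RAM.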