SQL to Lucene Index
November 17, 2008
Posted by Andre Vellino in Data Mining, Open Source, Search.
Ever wanted to create a Lucene index from the result of an SQL query? If so, you now have an open source option: LuSql. LuSql is a Java command-line tool developed by my colleague Glen Newton, and as one of the first users of this tool, I’m in a good position to give it a glowing review. From Glen’s announcement:
[LuSql] allows a user to control a number of parameters, including the SQL query to use, individual indexing/storage/term-vector nature of fields, analyzer, stop word list, and other tuning parameters. In its default mode it uses threading to take advantage of multiple cores.
LuSql can handle complex queries, allows for additional per record sub-queries, and has a plug-in architecture for arbitrary Lucene document manipulation. Its only dependencies are three Apache Commons libraries, the Lucene core itself, and a JDBC driver.
First – why is this useful? Consider a highly normalized database which, for the purposes of your application, is essentially static and read-only (for example, a bibliographic database) and for which you want Lucene-style indexing features (rank-ordered search, term frequency counts, phrase queries, proximity queries, etc.). Creating a Lucene index and having your application refer to it rather than the DB is often just what you need.
Secondly, programmatically querying a Lucene index is more lightweight than going through a Java persistence framework like Hibernate. Typically, you don’t need the power of Hibernate to query a bibliographic DB, which you want to look at only in a fixed way (e.g. on a per-bibliographic-item basis).
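To make the "Lucene-style features" above a little more concrete, here is a toy sketch in plain Java of what a positional inverted index does with records that have already been flattened out of a normalized database. This is deliberately not Lucene's API and not how LuSql works internally: the class `TinyIndex` and its methods are hypothetical names invented for illustration, and the tokenizer is a crude stand-in for a real analyzer. It only shows why term frequency counts and phrase queries become cheap once each record is stored as a single document.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

/** Toy positional inverted index (hypothetical; illustrates the idea, not Lucene's API). */
class TinyIndex {
    // term -> (document id -> positions at which the term occurs in that document)
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    /** Index one flattened record, e.g. a title and abstract joined from several tables. */
    public void add(int docId, String text) {
        String[] tokens = tokenize(text);
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new HashMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    /** Number of times a term occurs in the given document. */
    public int termFrequency(String term, int docId) {
        return positions(term, docId).size();
    }

    /** Ids (ascending) of documents containing the phrase's words at consecutive positions. */
    public List<Integer> phraseQuery(String phrase) {
        String[] words = tokenize(phrase);
        List<Integer> hits = new ArrayList<>();
        if (words.length == 0) {
            return hits;
        }
        Map<Integer, List<Integer>> first =
                postings.getOrDefault(words[0], Collections.emptyMap());
        for (int docId : new TreeSet<Integer>(first.keySet())) {
            for (int start : first.get(docId)) {
                if (matchesAt(words, docId, start)) {
                    hits.add(docId);
                    break; // one match per document is enough
                }
            }
        }
        return hits;
    }

    // True if words[1..] follow words[0] directly, starting at position `start`.
    private boolean matchesAt(String[] words, int docId, int start) {
        for (int i = 1; i < words.length; i++) {
            if (!positions(words[i], docId).contains(start + i)) {
                return false;
            }
        }
        return true;
    }

    private List<Integer> positions(String term, int docId) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptyMap())
                       .getOrDefault(docId, Collections.emptyList());
    }

    // Crude analyzer stand-in: lower-case, then split on anything that is not a-z or 0-9.
    private static String[] tokenize(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", " ").trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split(" ");
    }
}
```

In a real application, Lucene replaces all of this, and adds ranking, proximity queries and on-disk storage on top. The point of the sketch is only that a lookup is a couple of map accesses on a prebuilt structure, with no joins and no object-relational mapping layer in between.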
The hardest part in using LuSql is getting your DB query right and figuring out which Lucene options you want or need in your index. So you do need to understand Lucene indexes and, if you have a complex SQL query, to have your query well debugged before you invoke LuSql.
I so much prefer Lucene indexes over RDBs for my applications that I like to think of this tool as transforming lead into gold!