Lucene Integration


javidnia
Aug 24th, 2004, 02:10 AM
There is an interesting search engine (Jakarta Lucene) that I need to integrate with Spring so that I can put my search results into the HTTP session.

Where should I start? Could anyone help?

Alef Arendsen
Aug 24th, 2004, 09:34 AM
Could you elaborate a bit here, specifically on what architecture your application will have? Are you looking to use Lucene in combination with a database or with files? Also, where in the process do you see Lucene coming into action?

If you're storing things in a database, I could see an interceptor processing the objects before (or after) they are saved and adding them to the Lucene indices. We did this a long time ago and it worked pretty well (although this was done without Spring).
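
The interceptor idea above could be sketched with a plain JDK dynamic proxy, roughly as follows. This is only an illustration under stated assumptions: `ArticleDao`, `SearchIndexer`, and `IndexingInterceptor` are hypothetical names that do not come from this thread, and `SearchIndexer` stands in for whatever code would actually add a document to a Lucene index. In a Spring application the same hook would more likely be an AOP advice around the DAO.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical DAO interface, used only for this illustration.
interface ArticleDao {
    void save(String article);
}

// Hypothetical stand-in for the code that would update a Lucene index.
interface SearchIndexer {
    void index(Object entity);
}

final class IndexingInterceptor {
    // Wraps a DAO so that every successful save(...) is followed by an index update.
    static ArticleDao wrap(ArticleDao target, SearchIndexer indexer) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object result;
            try {
                result = method.invoke(target, args);
            } catch (InvocationTargetException e) {
                // Rethrow the DAO's own exception; nothing gets indexed in that case.
                throw e.getCause();
            }
            if (method.getName().equals("save") && args != null && args.length == 1) {
                indexer.index(args[0]); // "after save" interception point
            }
            return result;
        };
        return (ArticleDao) Proxy.newProxyInstance(
                ArticleDao.class.getClassLoader(),
                new Class<?>[] {ArticleDao.class},
                handler);
    }
}

final class IndexingInterceptorDemo {
    public static void main(String[] args) {
        List<String> database = new ArrayList<>(); // pretend database
        List<Object> indexed = new ArrayList<>();  // pretend search index
        ArticleDao dao = IndexingInterceptor.wrap(database::add, indexed::add);
        dao.save("Lucene and Spring");
        System.out.println("saved=" + database + " indexed=" + indexed);
    }
}
```

Indexing after the save (rather than before) means a failed save never leaves a stale entry in the index, which is one reason to prefer the "after" variant.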

Marcos Silva Pereira
Jan 11th, 2005, 04:04 PM
Quote: If you're storing things in a database, I could see an interceptor processing the objects before (or after) they are saved and adding them to the Lucene indices. We did this a long time ago and it worked pretty well (although this was done without Spring).
Hi, I was searching the forum for Lucene and found this topic. I have implemented interceptors to deal with indexes in the OpenNuke (https://opennuke.dev.java.net/) project:
https://opennuke.dev.java.net/source/browse/opennuke/src/java/org/opennuke/aop/lucene/

This works using an Indexer class (https://opennuke.dev.java.net/source/browse/opennuke/src/java/org/opennuke/lucene/Indexer.java) and processes objects with processor classes (https://opennuke.dev.java.net/source/browse/opennuke/src/java/org/opennuke/lucene/processor/) that deal with the domain objects. Later, the full-text search is used in the DAOs to execute a query with an SQL IN clause, as done in the fullTextSearch method of a basic DAO class (https://opennuke.dev.java.net/source/browse/opennuke/src/java/org/opennuke/dao/hibernate/HibernateBasicDao.java).
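
The "search first, then load with an IN clause" step could look roughly like the sketch below. It is not the OpenNuke code; `InClauseBuilder`, the table name, and the column name are assumptions made for illustration. The ids returned by the full-text search would be bound to the generated placeholders.

```java
import java.util.Collections;

final class InClauseBuilder {
    // Builds a parameterized query with one "?" per id found by the full-text search.
    static String selectByIds(String table, String idColumn, int idCount) {
        if (idCount <= 0) {
            // An empty IN () is not valid SQL, so the caller should skip the query instead.
            throw new IllegalArgumentException("idCount must be positive");
        }
        String placeholders = String.join(", ", Collections.nCopies(idCount, "?"));
        return "SELECT * FROM " + table + " WHERE " + idColumn + " IN (" + placeholders + ")";
    }
}

final class InClauseBuilderDemo {
    public static void main(String[] args) {
        // Three hits from the search index lead to three placeholders.
        System.out.println(InClauseBuilder.selectByIds("article", "id", 3));
    }
}
```

Many databases limit the number of elements in an IN list, so large result sets may need to be paged or split into several queries.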

I would really like to hear opinions about this.

Best Regards.

Alarmnummer
Jan 19th, 2005, 04:52 AM
I have checked your source code and have a few comments:

1) You don't update your documents in batches. I have created an Indexer interface and a BatchedIndexer implementation. The latter processes new documents in batches, so you can delete a batch of documents and then write a batch of documents.

2) You have a writer open and a reader for deletes open at the same time. Doesn't this cause problems?


public void updateDocument(Document doc) throws IOException {
    IndexWriter writer = makeWriter();
    // Problem: the writer is already open while deleteDocument(doc) runs.
    deleteDocument(doc);
    writer.addDocument(doc);
    writer.optimize();
    writer.close();
}


Should be:

public void updateDocument(Document doc) throws IOException {
    // Finish the delete before the writer is opened.
    deleteDocument(doc);

    IndexWriter writer = makeWriter();
    writer.addDocument(doc);
    writer.optimize();
    writer.close();
}


And if you add documents often, don't optimize every time.
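
The batching from point 1 and the delete-then-write ordering from point 2 could be combined roughly as in the sketch below. This is not the poster's actual BatchedIndexer, which was never shown in the thread. `IndexBackend` is a hypothetical stand-in for the index: a real version would perform all deletes first, close that handle, and only then open a writer for the adds.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the index; not a Lucene API.
interface IndexBackend {
    void deleteAll(List<String> ids);      // phase 1: all deletes in one go
    void addAll(Map<String, String> docs); // phase 2: all adds in one go
}

interface Indexer {
    void update(String id, String content);
    void flush();
}

final class BatchedIndexer implements Indexer {
    private final IndexBackend backend;
    private final int batchSize;
    // Pending updates keyed by document id; a later update for the same id wins.
    private final Map<String, String> pending = new LinkedHashMap<>();

    BatchedIndexer(IndexBackend backend, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.backend = backend;
        this.batchSize = batchSize;
    }

    @Override
    public synchronized void update(String id, String content) {
        pending.put(id, content);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    @Override
    public synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        // Deletes and adds never overlap: the delete phase completes before the add phase starts.
        backend.deleteAll(new ArrayList<>(pending.keySet()));
        backend.addAll(new LinkedHashMap<>(pending));
        pending.clear();
    }
}
```

A caller would invoke `flush()` once more at shutdown so that a partially filled batch is not lost.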


I'm currently writing a small, light framework on top of Lucene that lets me work with typed documents (each one wraps a Lucene document) and offers a lot of ready-to-use services.

@topicstarter:
In principle, though, Spring has no influence on how you are going to use Lucene.

Marcos Silva Pereira
Jan 21st, 2005, 12:31 PM
Hi, Alarmnummer, I appreciate your comments. Thanks for the feedback.
Quote: 1) You don't update your documents in batches. I have created an Indexer interface and a BatchedIndexer implementation. The latter processes new documents in batches, so you can delete a batch of documents and then write a batch of documents.
Some code has been changed but not yet committed: I have implemented an Indexer interface to provide an abstraction over Lucene, as was done with the Searcher interface. A BatchedIndexer will be implemented for batched inserts, and the new code will be sent to CVS.
Quote: 2) You have a writer open and a reader for deletes open at the same time. Doesn't this cause problems?
Thanks, it really is a bug. I will fix it by calling deleteDocument(doc) before the writer is opened, as you suggest.
Quote: And if you add documents often, don't optimize every time.
Hmm, maybe a boolean flag could tell the method whether or not to optimize? What do you think?
Quote: I'm currently writing a small, light framework on top of Lucene that lets me work with typed documents (each one wraps a Lucene document) and offers a lot of ready-to-use services.
Is it an open source project? Do you plan to use annotations? I will refactor some code to create processors based on annotations in the domain classes. Is that a good idea?

Best Regards.

Alarmnummer
Feb 15th, 2005, 02:57 PM
Quote: Hmm, maybe a boolean flag could tell the method whether or not to optimize? What do you think?

I have an optimize service that optimizes my Lucene index once in a while. (I use Quartz for the scheduling.)


Quote: Is it an open source project?

Maybe in the future, but not at the moment.


Quote: Do you plan to use annotations?

Annotations for what?


Quote: I will refactor some code to create processors based on annotations in the domain classes. Is that a good idea?

Could you elaborate?

Marcos Silva Pereira
Feb 16th, 2005, 08:26 PM
Hmm... using a scheduled job sounds like a good idea. I will think about it.

Well, I plan to use annotations in my POJOs and create a processor that reads these annotations and decides how to index each attribute of the POJO (keyword, stored, unstored, and field name). At the moment I am using simple reflection and treat all attributes the same way. It seems that you are thinking about indexing documents, whereas I am concerned with indexing objects. Even so, what is your opinion on using annotations to decide how each attribute is indexed?
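
A minimal sketch of what such an annotation-driven processor might look like is shown below. Everything in it is hypothetical: the `@Indexed` annotation, the `IndexMode` enum, and `AnnotationProcessor` are names invented for this illustration, and the result is a plain map rather than a Lucene document so that the example stays self-contained.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical indexing modes, mirroring "keyword, stored, unstored" from the post.
enum IndexMode { KEYWORD, STORED, UNSTORED }

// Hypothetical marker for attributes that should be indexed.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Indexed {
    String name() default "";                  // empty means: use the attribute name
    IndexMode mode() default IndexMode.STORED;
}

// One processed attribute: its string value plus how it should be indexed.
final class IndexedField {
    final String value;
    final IndexMode mode;

    IndexedField(String value, IndexMode mode) {
        this.value = value;
        this.mode = mode;
    }
}

final class AnnotationProcessor {
    // Reads @Indexed from each declared field and collects name -> (value, mode).
    static Map<String, IndexedField> process(Object pojo) {
        Map<String, IndexedField> result = new LinkedHashMap<>();
        for (Field field : pojo.getClass().getDeclaredFields()) {
            Indexed indexed = field.getAnnotation(Indexed.class);
            if (indexed == null) {
                continue; // attributes without the annotation are not indexed
            }
            field.setAccessible(true);
            Object value;
            try {
                value = field.get(pojo);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read " + field.getName(), e);
            }
            String name = indexed.name().isEmpty() ? field.getName() : indexed.name();
            result.put(name, new IndexedField(String.valueOf(value), indexed.mode()));
        }
        return result;
    }
}

// Example domain object with per-attribute indexing rules.
final class Article {
    @Indexed(mode = IndexMode.KEYWORD)
    private final String id;

    @Indexed(name = "body", mode = IndexMode.UNSTORED)
    private final String text;

    private final String internalNote; // not annotated, so it is skipped

    Article(String id, String text, String internalNote) {
        this.id = id;
        this.text = text;
        this.internalNote = internalNote;
    }
}
```

Compared with treating every attribute the same way through plain reflection, the annotation keeps the indexing decision next to the attribute it applies to.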

Best Regards...

Alarmnummer
Feb 20th, 2005, 11:17 AM
Quote: Hmm... using a scheduled job sounds like a good idea. I will think about it.

If you use the same scheduler thread for writing the index and optimizing it, you won't have any problems.
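
The single-thread idea could be sketched as follows. Note the substitution: this uses the JDK's `ScheduledExecutorService` instead of Quartz, purely to keep the example self-contained. `SingleThreadIndexService` is a hypothetical name, and the write and optimize tasks are plain `Runnable`s standing in for the real index operations.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class SingleThreadIndexService {
    // One worker thread runs both index writes and the periodic optimize,
    // so the two kinds of task can never execute at the same time.
    private final ScheduledExecutorService worker =
            Executors.newSingleThreadScheduledExecutor();

    SingleThreadIndexService(Runnable optimizeTask, long period, TimeUnit unit) {
        // Each optimize run starts 'period' after the previous one has finished.
        worker.scheduleWithFixedDelay(optimizeTask, period, period, unit);
    }

    // Queues an index write behind whatever the worker is currently doing.
    Future<?> submitWrite(Runnable writeTask) {
        return worker.submit(writeTask);
    }

    // Stops the periodic optimize; writes that are already queued still run.
    void shutdown() {
        worker.shutdown();
    }
}
```

Because every task is queued on the same thread, no extra locking between writer and optimizer is needed. The trade-off is that a long optimize run delays all writes queued behind it.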


Quote: Well, I plan to use annotations in my POJOs and create a processor that reads these annotations and decides how to index each attribute of the POJO (keyword, stored, unstored, and field name). At the moment I am using simple reflection and treat all attributes the same way. It seems that you are thinking about indexing documents, whereas I am concerned with indexing objects. Even so, what is your opinion on using annotations to decide how each attribute is indexed?

The thought has crossed my mind... But I want a typesafe document object because the document is my main 'domain' object. I need typesafe methods on that document, and therefore I have created a new document type (the basedoc) that wraps a Lucene document and translates my typesafe methods into untyped Lucene calls.
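
A typed wrapper of this kind might look roughly like the sketch below. The actual basedoc was never posted, so this is only a guess at the general shape. `RawDocument` is a hypothetical stand-in for a Lucene document (untyped name-to-string fields), and `ArticleDoc` with its fields is invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a Lucene document: untyped field name -> string value.
final class RawDocument {
    private final Map<String, String> fields = new LinkedHashMap<>();

    void put(String name, String value) {
        fields.put(name, value);
    }

    String get(String name) {
        return fields.get(name);
    }
}

// Base class that hides the untyped access behind protected helper methods.
abstract class BaseDoc {
    private final RawDocument raw;

    protected BaseDoc(RawDocument raw) {
        this.raw = raw;
    }

    protected String getString(String name) {
        return raw.get(name);
    }

    protected void setString(String name, String value) {
        raw.put(name, value);
    }

    protected int getInt(String name, int defaultValue) {
        String value = raw.get(name);
        return value == null ? defaultValue : Integer.parseInt(value);
    }

    protected void setInt(String name, int value) {
        raw.put(name, Integer.toString(value));
    }

    // Gives indexing code access to the wrapped untyped document.
    RawDocument unwrap() {
        return raw;
    }
}

// A typed document: callers see only typesafe getters and setters.
final class ArticleDoc extends BaseDoc {
    ArticleDoc() {
        super(new RawDocument());
    }

    String getTitle() {
        return getString("title");
    }

    void setTitle(String title) {
        setString("title", title);
    }

    int getPageCount() {
        return getInt("pageCount", 0);
    }

    void setPageCount(int pageCount) {
        setInt("pageCount", pageCount);
    }
}
```

Field names and string conversions live in one place, so a typo in a field name or a wrong conversion cannot spread through the calling code.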
