| CLucene README |
| ============== |
| |
| ------------------------------------------------------ |
| CLucene is a C++ port of Lucene. |
| It is a high-performance, full-featured text search |
| engine written in C++. CLucene is faster than lucene |
| as it is written in C++. |
| ------------------------------------------------------ |
| |
| CLucene has contributions from many, see AUTHORS |
| |
| CLucene is distributed under the GNU Lesser General Public License (LGPL) |
| *or* |
| the Apache License, Version 2.0 |
| See the LGPL.license and APACHE.license for the respective license information. |
| Read COPYING for more about the license. |
| |
| Installation |
| ------------ |
| * For Linux, MacOSX, cygwin and MinGW build information, read INSTALL. |
| * Boost.Jam files are provided in the root directory and subdirectories. |
| * Microsoft Visual Studio (6&7) are provided in the win32 folder. |
| |
| Mailing List |
| ------------ |
| Questions and discussion should be directed to the CLucene mailing list |
| at clucene-developers@lists.sourceforge.net |
| Find subscription instructions at |
| http://lists.sourceforge.net/lists/listinfo/clucene-developers |
| Suggestions and bug reports can be made on our bug tracking database |
| (http://sourceforge.net/tracker/?group_id=80013&atid=558446) |
| |
| The latest version |
| ------------------ |
| Details of the latest version can be found on the CLucene sourceforge project |
| web site: http://www.sourceforge.net/projects/clucene |
| |
| Documentation |
| ------------- |
| Documentation is provided at http://clucene.sourceforge.net/doc/doxygen/html/ |
| You can also build your own documentation by running doxygen from the root directory |
| of clucene. |
| CLucene is a very close port of Java Lucene, so you can also try looking at the |
| Java Docs on http://lucene.apache.org/java/ |
| |
| |
| Performance |
| ----------- |
| Very little benchmarking has been done on clucene. Andi Vajda posted some |
| limited statistics on the clucene list a while ago with the following results. |
| |
| There are 250 HTML files under $JAVA_HOME/docs/api/java/util for about |
| 6108kb of HTML text. |
| org.apache.lucene.demo.IndexFiles with java and gcj: |
| on mac os x 10.3.1 (panther) powerbook g4 1ghz 1gb: |
| . running with java 1.4.1_01-99 : 20379 ms |
| . running with gcj 3.3.2 -O2 : 17842 ms |
| . running clucene 0.8.9's demo : 9930 ms |
| |
| I recently did some more tests and came up with these rough tests: |
| 663mb (797 files) of Guttenberg texts |
| on a Pentium 4 running Windows XP with 1 GB of RAM. Indexing max 100,000 fields |
| Jlucene: 646453ms. peak mem usage ~72mb, avg ~14mb ram |
| Clucene: 232141. peak mem usage ~60, avg ~4mb ram |
| |
| Searching indexing using 10,000 single word queries |
| Jlucene: ~60078ms and used ~13mb ram |
| Clucene: ~48359ms and used ~4.2mb ram |
| |
| Platform notes |
| -------------- |
| |
| 'Too many open files' |
| Some platforms don't provide enough file handles to run CLucene properly. |
| To solve this, increase the open file limit: |
| |
| On Solaris: |
| ulimit -n 1024 |
| set rlim_fd_cur=1024 |
| |
| Acknowledgments |
| ---------------- |
| |
| The Apache Lucene project is the basis for this software, so the biggest |
| acknoledgment goes to that project. |
| |
| We wish to acknowledge the following copyrighted works that |
| make up portions of the CLucene software: |
| |
| CLucene relies heavily on the use of autoconf and libtool to provide |
| a build environment. |