Sphinx is a high-performance, free, open-source full-text search engine with a rich feature list. Recently I was experimenting with it to build a search platform for big and complex databases. It is an interesting tool, and a development environment is easy to set up on Mac OS. Compiling it with the method described in the Sphinx documentation does not always work smoothly; I found installing it with MacPorts much easier. Here is how I did it.

Installing Sphinx using MacPorts

1. If you don’t have MacPorts installed, download it from the MacPorts site and install it.
2. If you have not installed the Xcode tools, install them from the Mac OS X DVD; they are needed for compiling sources the UNIX way.
3. Now install Sphinx using the following command. [sourcecode language="plain"]sudo port install sphinx[/sourcecode] This will compile and install everything Sphinx needs to work.

Sphinx is now installed, and we have the ‘indexer’ and ‘search’ utilities and the ‘searchd’ daemon.
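
A quick way to confirm that the three tools are reachable from the terminal is sketched below. It is only an illustration: the function name check_tools is made up for this post, and the script assumes a POSIX shell. MacPorts normally installs its binaries under /opt/local/bin, so a "missing" result usually means that directory is not on your PATH.

```shell
#!/bin/sh
# Report which of the given commands can be found on the PATH.
# check_tools is a hypothetical helper name, not part of Sphinx or MacPorts.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Return non-zero if at least one tool was not found.
    return "$missing"
}

# Check the two utilities and the daemon that the port installs.
check_tools indexer search searchd || echo "Check that /opt/local/bin is on your PATH."
```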

Basic way to use Sphinx

Sphinx comes with a sample SQL file for setting up a SQL-query-based search (the simplest configuration). This SQL file and a sample Sphinx configuration are located at [sourcecode language="plain"]/opt/local/etc/sphinx[/sourcecode]. Try the following steps to get it running with your own configuration.
1. Create a directory somewhere on your system. In my case I created one on the desktop named ‘sphinx’. [sourcecode language="plain"]mkdir ~/Desktop/sphinx && cd ~/Desktop/sphinx[/sourcecode]
2. Create a ‘test’ database on your MySQL server and import the sample SQL.
3. Copy the sample configuration from /opt/local/etc/sphinx to this folder. [sourcecode language="plain"]cp /opt/local/etc/sphinx/sphinx.conf.dist sphinx.conf[/sourcecode] Then edit sphinx.conf and change the database parameters and paths. To index your own database, edit the queries to match your tables. In my case the file looks like the following.
[sourcecode language="plain"]
source src1
{
    type = mysql

    sql_host = localhost
    sql_user = test
    sql_pass =
    sql_db = test
    sql_port = 3306 # optional, default is 3306

    sql_query = \
        SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
        FROM documents

    sql_attr_uint = group_id
    sql_attr_timestamp = date_added

    sql_query_info = SELECT * FROM documents WHERE id=$id
}

index test1
{
    source = src1
    path = /Users/lijeeshms/Desktop/sphinx/data/test1
    docinfo = extern
    charset_type = sbcs
}

indexer
{
    mem_limit = 32M
}

searchd
{
    port = 3312
    log = /Users/lijeeshms/Desktop/sphinx/log/searchd.log
    query_log = /Users/lijeeshms/Desktop/sphinx/log/query.log
    read_timeout = 5
    max_children = 30
    pid_file = /Users/lijeeshms/Desktop/sphinx/log/searchd.pid
    max_matches = 1000
    seamless_rotate = 1
    preopen_indexes = 0
    unlink_old = 1
}
[/sourcecode]
4. Create two folders, ‘data’ and ‘log’, inside the ‘sphinx’ directory. [sourcecode language="plain"]mkdir data log[/sourcecode]
5. Now run the indexer with the configuration we created. [sourcecode language="plain"]indexer --config /<path-to>/sphinx.conf --all[/sourcecode] This will create the indexes defined in the configuration.
6. Now start the Sphinx search daemon with the same configuration. [sourcecode language="plain"]searchd --config /<path-to>/sphinx.conf[/sourcecode] If everything is fine, the Sphinx search daemon will start and serve the indexes configured above. If it reports an error, check the configuration file for mistakes.
7. You can now test a search with the ‘search’ command from the terminal. [sourcecode language="plain"]search <query>[/sourcecode] The matching results will be printed.
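
The directory preparation in the steps above can be collected into a small script. This is a sketch rather than a tested installer: the function name setup_sphinx_dir and its two arguments are invented for illustration, and the default sample path is the MacPorts location mentioned earlier. It deliberately only prints the indexer and searchd commands, because sphinx.conf still has to be edited by hand before they can succeed.

```shell
#!/bin/sh
# Prepare a Sphinx working directory: data/ and log/ folders plus a copy
# of the sample configuration. setup_sphinx_dir is a hypothetical helper.
#   $1 = working directory to create
#   $2 = sample config to copy (optional, defaults to the MacPorts path)
setup_sphinx_dir() {
    dir="$1"
    sample="${2:-/opt/local/etc/sphinx/sphinx.conf.dist}"

    # Create the working directory together with its data and log folders.
    mkdir -p "$dir/data" "$dir/log" || return 1

    # Copy the sample configuration once; never overwrite an edited copy.
    if [ ! -f "$dir/sphinx.conf" ] && [ -f "$sample" ]; then
        cp "$sample" "$dir/sphinx.conf" || return 1
    fi

    # Print the next commands instead of running them.
    echo "Edit $dir/sphinx.conf, then run:"
    echo "  indexer --config $dir/sphinx.conf --all"
    echo "  searchd --config $dir/sphinx.conf"
}

# Only act when a directory is passed, e.g.: sh setup.sh ~/Desktop/sphinx
if [ "$#" -gt 0 ]; then
    setup_sphinx_dir "$@"
fi
```

Keeping the copy step conditional means the script can be run again later without losing the edits made to sphinx.conf.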

Once all of the above is working, it is easy to use any of the client libraries for PHP (or PECL), Python, Ruby or Java to search from a web or desktop application. Sphinx can serve multiple indexes and search only selected ones, so a single daemon can handle several types of search.
Sphinx can also index data from an XMLPipe source. This is the best fit for building configurable search indexes from very dynamic data sources such as a CMS or CRM.
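
To give an idea of what such a feed looks like, here is a minimal sketch of a script that prints one document in the xmlpipe2 format. Several things here are assumptions: it uses xmlpipe2, the newer variant of the XML pipe source, the field and attribute names simply mirror the sample ‘documents’ table used above, the document itself is made-up sample data, and emit_docset is a name invented for this post. A real feed would generate the documents from the CMS or CRM instead of a fixed text. The script would be wired in through a source section with type = xmlpipe2 and xmlpipe_command pointing at the script.

```shell
#!/bin/sh
# Print a tiny xmlpipe2 document set on standard output.
# emit_docset is a hypothetical name; the document below is sample data.
emit_docset() {
cat <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<sphinx:docset>
<sphinx:schema>
<sphinx:field name="title"/>
<sphinx:field name="content"/>
<sphinx:attr name="group_id" type="int"/>
<sphinx:attr name="date_added" type="timestamp"/>
</sphinx:schema>
<sphinx:document id="1">
<title>Hello Sphinx</title>
<content>A document exported from a CMS.</content>
<group_id>1</group_id>
<date_added>1230768000</date_added>
</sphinx:document>
</sphinx:docset>
EOF
}

# The indexer reads whatever the configured command writes to stdout.
emit_docset
```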