Jan 11

Sphinx is a high-performance, free, open-source full-text search engine with a rich feature list. I recently experimented with it while developing a search platform for big and complex databases. It is an interesting tool, and setting up a development environment on Mac OS is easy. Compiling it by the method described in the Sphinx documentation does not always go smoothly, though, and I found installing it with MacPorts much easier. Here is how I did it.

Installing Sphinx using MacPorts

1. If you don’t have MacPorts installed, download it from the MacPorts site and install it.
2. If you have not installed the Xcode tools from the Mac OS X DVD, install them; they are needed for compiling sources the UNIX way.
3. Now install Sphinx using the following command.

sudo port install sphinx

This will compile and install everything required to make Sphinx work.

Now Sphinx is installed, and we have the ‘indexer’ and ‘search’ utilities and the ‘searchd’ daemon.

Basic way to use Sphinx

Sphinx comes with a sample SQL file for setting up a SQL-query-based search (the simplest configuration). This SQL file and a sample Sphinx configuration are located at

/opt/local/etc/sphinx

Try the following steps to get it running with your own configuration.
1. Create a directory somewhere on your system. In my case I created one named ‘sphinx’ on the desktop.

 cd ~/Desktop/sphinx 

2. Create a ‘test’ database in your MySQL server and import the sample SQL. Then copy the sample sphinx.conf from /opt/local/etc/sphinx to this folder.

cp /opt/local/etc/sphinx/sphinx.conf.dist sphinx.conf

3. Now edit sphinx.conf and change the database parameters and paths. In my case it looks like the following. Edit the queries to match your own databases.

source src1
{
	type					= mysql

	sql_host				= localhost
	sql_user				= test
	sql_pass				=
	sql_db					= test
	sql_port				= 3306	# optional, default is 3306

	sql_query				= \
		SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
		FROM documents

	sql_attr_uint			= group_id
	sql_attr_timestamp		= date_added

	sql_query_info			= SELECT * FROM documents WHERE id=$id
}

index test1
{
	source					= src1
	path					= /Users/lijeeshms/Desktop/sphinx/data/test1
	docinfo					= extern
	charset_type			= sbcs
}

indexer
{
	mem_limit				= 32M
}

searchd
{
	port					= 3312
	log						= /Users/lijeeshms/Desktop/sphinx/log/searchd.log
	query_log				= /Users/lijeeshms/Desktop/sphinx/log/query.log
	read_timeout			= 5
	max_children			= 30
	pid_file				= /Users/lijeeshms/Desktop/sphinx/log/searchd.pid
	max_matches				= 1000
	seamless_rotate			= 1
	preopen_indexes			= 0
	unlink_old				= 1
}
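
The paths in the listing above point at my own desktop folder, so they need adjusting on any other machine. Here is a minimal sketch of a helper for that, assuming the paths contain no ‘|’ characters and that a plain text substitution is good enough. The function name rewrite_base is my own invention for illustration and is not part of Sphinx.

```shell
# rewrite_base OLD_BASE NEW_BASE FILE
# Replace every occurrence of OLD_BASE with NEW_BASE inside FILE.
# Assumption: neither path contains '|', which is used as the sed delimiter.
rewrite_base() {
    old_base=$1
    new_base=$2
    conf=$3
    # Write to a temporary file first so a failed sed never truncates FILE.
    sed "s|$old_base|$new_base|g" "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
}
```

For example, running rewrite_base /Users/lijeeshms/Desktop/sphinx "$HOME/Desktop/sphinx" sphinx.conf from the ‘sphinx’ directory should point the index, log and pid file paths at your own folder.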

4. Create two folders, ‘data’ and ‘log’, inside the ‘sphinx’ directory.

mkdir data log

5. Now run the indexer from the directory we created, using the configuration we just edited.

indexer --config /<path-to>/sphinx.conf --all

This will create the indexes based on the configuration.
6. Now run the Sphinx search daemon with the same configuration.

searchd --config /<path-to>/sphinx.conf 

If everything is fine, the Sphinx search daemon will start with the above index configuration. If an error occurs, check the configuration for mistakes.
7. You can now test a search with the ‘search’ command from the terminal.

search <query> 

The results will be shown.
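
Steps 5 to 7 can be wrapped in a small script that stops at the first failure. The sketch below first checks that the directories named in the configuration exist, since a missing ‘data’ or ‘log’ folder is an easy mistake to make. The function names check_conf_dirs and run_sphinx_demo are hypothetical, and the sketch assumes that the paths in the configuration contain no spaces and that ‘search’ accepts the same --config option as the other two tools.

```shell
# check_conf_dirs FILE
# Print every path-like setting whose parent directory does not exist.
# Returns non-zero if at least one directory is missing.
check_conf_dirs() {
    conf=$1
    missing=0
    # Extract the values of the path, log, query_log and pid_file settings.
    for p in $(sed -n -E 's/^[[:space:]]*(path|log|query_log|pid_file)[[:space:]]*=[[:space:]]*([^[:space:]#]+).*$/\2/p' "$conf"); do
        dir=$(dirname "$p")
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir (needed for $p)"
            missing=1
        fi
    done
    return $missing
}

# run_sphinx_demo FILE QUERY...
# Build all indexes, start the daemon, then run one test query.
run_sphinx_demo() {
    conf=$1
    shift
    check_conf_dirs "$conf" || return 1
    indexer --config "$conf" --all || { echo "indexer failed" >&2; return 1; }
    searchd --config "$conf" || { echo "searchd failed to start" >&2; return 1; }
    search --config "$conf" "$@"
}
```

After loading these functions into the shell, run_sphinx_demo ./sphinx.conf test would index, start the daemon and search for the word ‘test’.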

Once all of the above is working, it is easy to use any of the client libraries for PHP (or PECL), Python, Ruby or Java to search from a web or desktop environment. Sphinx can run multiple indexes and search only selected ones, so one daemon can serve several types of search.
Sphinx can also index data from an XMLPipe source, which is well suited to building configurable search indexes from very dynamic data sources such as a CMS or CRM.
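
To give an idea of what an XMLPipe source involves, here is a rough sketch of a script that prints a document stream in the xmlpipe2 format. A source would point at such a script with type = xmlpipe2 and an xmlpipe_command setting, provided your Sphinx build includes xmlpipe2 support. The field names are borrowed from the sample query above. The tab-separated input format and the function names xml_escape and emit_docset are assumptions of mine for illustration.

```shell
# xml_escape: escape the characters that are special in XML text.
xml_escape() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# emit_docset: read "id<TAB>group_id<TAB>title<TAB>content" records from
# standard input and print them as an xmlpipe2 document set.
# Assumption: no field is empty, because consecutive tabs are collapsed.
emit_docset() {
    printf '%s\n' '<?xml version="1.0" encoding="utf-8"?>' '<sphinx:docset>'
    # Declare the full-text fields and the attribute once, up front.
    printf '%s\n' '<sphinx:schema>' \
        '<sphinx:field name="title"/>' \
        '<sphinx:field name="content"/>' \
        '<sphinx:attr name="group_id" type="int"/>' \
        '</sphinx:schema>'
    while IFS="$(printf '\t')" read -r id group_id title content; do
        printf '<sphinx:document id="%s">\n' "$id"
        printf '<group_id>%s</group_id>\n' "$group_id"
        printf '<title>%s</title>\n' "$(printf '%s' "$title" | xml_escape)"
        printf '<content>%s</content>\n' "$(printf '%s' "$content" | xml_escape)"
        printf '%s\n' '</sphinx:document>'
    done
    printf '%s\n' '</sphinx:docset>'
}
```

A CMS export job could pipe its records into emit_docset, which lets the indexer pick up fresh content without any access to the underlying database.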
