William Jiang

JavaScript,PHP,Node,Perl,LAMP Web Developer – http://williamjxj.com; https://github.com/williamjxj?tab=repositories


Setup coreseek_4.1-sphinx_2.0.1

Steps to set up coreseek_4.1-sphinx_2.0.1

Coreseek is a Chinese-language version of Sphinx that can search Chinese words: full-text search software for Chinese, built as a Chinese-language enhancement of Sphinx.
Here are the core steps for setting up coreseek_4.1-sphinx_2.0.1, which worked successfully on a CentOS 6.2 server:
1. Dump the MySQL table from the production environment into the development environment for testing:

$ mysqldump  --databases production --tables contents | mysql -D development

2. Edit $HOME/etc/coreseek_sphinx.conf to set up the Coreseek index.
Note: the sph_counter table is optional. If you use it, create it first, with two simple int columns, counter_id and max_id.
The table is used when processing the existing data; for data that arrives later, the setting here is ignored.
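
The counter table described above could be created along these lines. This is a minimal sketch: only the table name and the two column names come from the text. The primary key on counter_id is an assumption, added so that a REPLACE INTO statement overwrites the same row on every indexing run, and the helper name sph_counter_ddl is hypothetical.

```shell
# Hypothetical helper: prints the DDL for the optional counter table.
sph_counter_ddl() {
  cat <<'SQL'
CREATE TABLE IF NOT EXISTS sph_counter (
  counter_id INT NOT NULL PRIMARY KEY,  -- assumption: the key lets REPLACE INTO overwrite the row
  max_id     INT NOT NULL
);
SQL
}

# Usage, assuming the `development` database from step 1:
#   sph_counter_ddl | mysql -D development
```

Printing the SQL instead of running it directly makes it easy to review the statement before it touches the database.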


source contents {
	type	= mysql
	...	
	sql_query_pre		= SET NAMES 'utf8'
	sql_query_pre		= SET SESSION query_cache_type=OFF
	sql_query_pre		= REPLACE INTO sph_counter SELECT 1, MAX(cid) FROM contents
	sql_query_range		= SELECT MIN(cid), MAX(cid) FROM contents
	sql_query		= SELECT * FROM contents WHERE cid >= $start AND cid <= $end
	sql_attr_uint 		= cate_id
	sql_attr_uint	 	= iid
	sql_attr_str2ordinal	= language
	sql_attr_str2ordinal	= createdby
	sql_attr_timestamp	= pubdate
}
...
index contents {
	source			= contents
	path			= /var/data/demo/contents
	min_word_len		= 3
	charset_type		= zh_cn.utf-8
	charset_dictpath	= /usr/local/mmseg/etc/
	stopwords		= /usr/local/mmseg/etc/stopwords.txt
}
searchd {
	port			= 9312
	log			= /var/log/demo/searchd.log
	query_log		= /var/log/demo/query.log
	pid_file		= /var/log/demo/searchd.pid
}

3. Create the two directories referenced in the coreseek_sphinx.conf file above.

$ sudo mkdir -p /var/data/demo /var/log/demo
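
If you would rather not copy the paths by hand, the directories can be read back out of the config. A rough sketch, assuming each setting sits on its own line in `name = value` form as in the file above; the helper name conf_dirs is made up for illustration.

```shell
# Hypothetical helper: print the parent directory of every path, log,
# query_log and pid_file setting found in a Sphinx/Coreseek config file.
conf_dirs() {
  awk '$1 ~ /^(path|log|query_log|pid_file)$/ && $2 == "=" { print $3 }' "$1" |
    while read -r p; do dirname "$p"; done |
    sort -u
}

# Usage:
#   conf_dirs "$HOME/etc/coreseek_sphinx.conf" | xargs sudo mkdir -p
```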

4. Run the Coreseek ‘indexer’ to build the index for the MySQL table ‘contents’.
Then start the ‘searchd’ daemon, which PHP scripts query through the Sphinx API client, sphinxapi.php.

$ sudo /usr/local/coreseek/bin/indexer -c $HOME/etc/coreseek_sphinx.conf contents
  
# First, check whether port 9312 is already in use:
$ netstat -ant | grep 9312

# If the port is free, start the daemon:
$ sudo /usr/local/coreseek/bin/searchd -c $HOME/etc/coreseek_sphinx.conf

# Double-check that the daemon is running:
$ ps -ef | grep searchd | grep -v grep
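
A plain `grep 9312` also matches other ports that contain those digits, such as 19312, and connections that are not listeners. A stricter check might look like the sketch below. It assumes the usual Linux `netstat -ant` layout, with the local address in the fourth column and the state in the last one; the helper name port_is_listening is hypothetical.

```shell
# Hypothetical helper: reads `netstat -ant` output on stdin and succeeds
# only if some socket is in LISTEN state on exactly the given local port.
port_is_listening() {
  awk -v suffix=":$1" '
    $NF == "LISTEN" && length($4) >= length(suffix) &&
      substr($4, length($4) - length(suffix) + 1) == suffix { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Usage:
#   if netstat -ant | port_is_listening 9312; then echo "port 9312 is taken"; fi
```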

5. Appendix

Here are very good examples:

url: http://www.shroomery.org/forums/search.php
source: http://www.shroomery.org/forums/dosearch.php.txt

The Sphinx PHP API, which Coreseek inherits, is documented at:
Sphinx for PHP: http://php.net/manual/en/sphinx.examples.php


Install Coreseek-3.2.13 in CentOS-6.2

Coreseek-3.2.13 install in CentOS-6.2

Installing Coreseek full-text search from source is never easy. Coreseek’s installation documents are not very clear, and the installation depends heavily on the environment settings, so problems during installation are common.

After smoothly installing Sphinx 2.0.5 and SCWS-1.2.0, I finally managed to install Coreseek-3.2.13 on CentOS 6.2.
Here is a summary of the steps.

1. Pre-install the support packages.

Read the pre-install requirements carefully and check what is missing on the server.
I used ‘yum’ to install whatever I needed.

$ sudo yum install mysql-devel libxml2-devel expat-devel imake gcc-c++

You can use ‘sudo yum install’ for other packages such as ‘php-devel’ if needed. ‘Development tools’ is a package group, so it is installed with ‘sudo yum groupinstall’ instead.
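
To see quickly what is still missing before building, something like the following can help. It is only a sketch: it checks for commands on PATH rather than for installed packages, and the helper name missing_tools and the mapping from packages to binaries are assumptions.

```shell
# Hypothetical helper: print each named command that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Usage: the names are assumptions about which binaries the packages provide
# (g++ from gcc-c++, mysql_config from mysql-devel, xml2-config from libxml2-devel).
#   missing_tools g++ make imake mysql_config xml2-config
```

An empty result means every listed command was found. Header-only packages such as expat-devel cannot be detected this way.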

2. Download Coreseek 3.2.13

Download Coreseek, which includes three parts that need to be installed one by one:

$ wget http://www.coreseek.cn/uploads/csft/3.2/coreseek-3.2.13.tar.gz
$ tar xzvf coreseek-3.2.13.tar.gz
$ cd coreseek-3.2.13

3. Install the mmseg part

$ cd mmseg-3.2.13

If you run ./bootstrap directly, the first command:
aclocal -I config
will fail:
config/sys_siglist.m4:20: warning: underquoted definition of SIC_VAR_SYS_SIGLIST

Here are two small tricks to fix it:
(a) edit the bootstrap file and change the shebang from #!/bin/sh to #!/bin/bash to inherit the path and env variables.
(b) run it as root instead.
$ sudo ./bootstrap
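
Trick (a) can also be done with sed rather than an editor. A small sketch, assuming GNU sed (for -i) and that the first line of the script is exactly #!/bin/sh; the helper name use_bash_shebang is hypothetical.

```shell
# Hypothetical helper: rewrite a leading "#!/bin/sh" to "#!/bin/bash" in place.
# Only line 1 is touched, and only if it matches exactly.
use_bash_shebang() {
  sed -i '1s|^#!/bin/sh$|#!/bin/bash|' "$1"
}

# Usage, from inside mmseg-3.2.13:
#   use_bash_shebang bootstrap
```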

It should pass. Then use ‘root’ permission to run configure and make:
$ sudo ./configure --prefix=/usr/local/mmseg
$ sudo make
$ sudo make install
Without ‘sudo’, it will probably throw a lot of warnings and errors.

4. Install the Coreseek part

$ cd ../csft-3.2.13/

Again, I changed the shebang in buildconf.sh from #!/bin/sh to #!/bin/bash.

$ sudo bash buildconf.sh

$ ./configure --prefix=/usr/local/coreseek \
    --without-unixodbc --with-mmseg \
    --with-mmseg-includes=/usr/local/mmseg/include/mmseg/ \
    --with-mmseg-libs=/usr/local/mmseg/lib/ --with-mysql

$ make; sudo make install

This way it is compiled with MySQL support and the mmseg library.

5. Test with testpack

$ cd ../testpack/
$ locale
This shows the current LANG setting. Then set the following:
$ export LANG=zh_CN.utf-8
$ export LC_ALL=zh_CN.utf-8

You may put these variables into /etc/profile.d/lang.sh or $HOME/.bash_profile or /etc/sysconfig/i18n or $HOME/.i18n.
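
The exports only take effect if a zh_CN UTF-8 locale is actually installed on the server. One way to check, sketched under the assumption that `locale -a` lists one locale per line (glibc usually spells this one zh_CN.utf8); the helper name has_zh_cn_utf8 is hypothetical.

```shell
# Hypothetical helper: reads `locale -a` output on stdin and succeeds if a
# zh_CN UTF-8 locale is listed (matches zh_CN.utf8 and zh_CN.utf-8, any case).
has_zh_cn_utf8() {
  grep -qiE '^zh_CN\.utf-?8$'
}

# Usage:
#   locale -a | has_zh_cn_utf8 || echo "zh_CN UTF-8 locale is not installed"
```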

$ /usr/local/mmseg/bin/mmseg -d /usr/local/mmseg/etc var/test/test.xml
The correct word segments are displayed. That is cool.

According to its README (http://www.coreseek.cn/products-install/install_on_bsd_linux/), it needs higher versions of m4, automake and autoconf (on CentOS 5.5), which I don’t have.
My platform is CentOS 6.2 (not CentOS 5.5), and the versions of these tools are lower than required.
However, it still works.