Textpattern CMS support forum
#1 2007-08-01 08:57:30
- pspiliotis
- New Member
- Registered: 2007-07-16
- Posts: 3
customizing textpattern search engine to support case sensitivity
Hi,
I am currently developing 2 websites in Greek using Textpattern. I noticed that searching in Textpattern is case-sensitive. Is there a way to modify the search engine that comes with Textpattern to support, among other things, case-insensitive searching?
Any resources regarding customizing searching in Textpattern would also be greatly appreciated.
Thank you in advance for your time and effort.
Re: customizing textpattern search engine to support case sensitivity
The actual searching is done by MySQL which by default searches case-insensitive. If you can, change the collation on the table to something case-sensitive (utf8_bin?.. what you’d want is utf8_greek_cs or utf8_greek_ci but that doesn’t exist). If that’s not possible, here are some other tricks.
Last edited by ruud (2007-08-01 09:49:26)
#3 2007-11-08 18:56:57
- kostas45
- Member
- From: Greece
- Registered: 2007-11-08
- Posts: 61
Re: customizing textpattern search engine to support case sensitivity
Hi,
Let me recapitulate:
Searching in Txp is currently case-insensitive for English and other languages, but not for the Greek language.
We need searching to be case-insensitive for Greek also.
This is what I have found so far:
Txp uses MySQL’s RLIKE operator to do the search in the doArticles() function in siteroot/textpattern/publish.php (please correct me if I am wrong).
According to the MySQL manual (http://dev.mysql.com/doc/refman/5.0/en/string-comparison-functions.html):
Warning
The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multi-byte safe and may produce unexpected results with multi-byte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.
I believe the problem arises because Greek text stored as UTF-8 uses multi-byte characters (the basic Greek letters take two bytes each).
English letters, by contrast, take a single byte each in UTF-8, which is why the problem does not show up there.
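A quick way to see the byte-wise effect outside MySQL is the minimal Python sketch below. It is only an analogy, not MySQL's code: a case-insensitive regular expression applied to UTF-8 bytes folds ASCII letters only, while the same expression applied to decoded characters folds Greek case as well. The helper names bytewise_match and charwise_match are hypothetical.

```python
import re


def bytewise_match(pattern, text):
    """Case-insensitive search on UTF-8 bytes (an analogy for RLIKE in MySQL 5.0).

    With a bytes pattern, re.IGNORECASE only folds ASCII letters, so
    multi-byte characters such as Greek letters are compared byte for byte.
    """
    return re.search(re.escape(pattern.encode("utf-8")),
                     text.encode("utf-8"), re.IGNORECASE) is not None


def charwise_match(pattern, text):
    """Case-insensitive search on decoded characters (Unicode-aware)."""
    return re.search(re.escape(pattern), text, re.IGNORECASE) is not None


print(len("a".encode("utf-8")), len("α".encode("utf-8")))  # 1 2
print(bytewise_match("abc", "ABC"))  # True: ASCII letters are folded
print(bytewise_match("αβγ", "ΑΒΓ"))  # False: the second byte of each letter differs
print(charwise_match("αβγ", "ΑΒΓ"))  # True
```

"α" is encoded as CE B1 and "Α" as CE 91, so a byte-level comparison sees two different byte sequences even though they are the same letter in different case.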
To verify the above assumption, could someone using another language with multi-byte characters check whether searching in his/her language is case-insensitive?
If the assumption is verified, then the PHP code in publish.php should be changed by the developers so that it does not use RLIKE.
Thank you for your time,
Kostas
Last edited by kostas45 (2007-11-09 08:17:14)
Re: customizing textpattern search engine to support case sensitivity
Looks like I misread your question the first time.
Please post your full TXP diagnostics here.
#5 2007-11-08 20:31:51
- kostas45
- Member
- From: Greece
- Registered: 2007-11-08
- Posts: 61
Re: customizing textpattern search engine to support case sensitivity
Below are the full diagnostics of my dev machine:
Textpattern version: 4.0.5 (r2466)
Last Update: 2007-10-24 09:49:38/2007-07-01 20:03:44
Document root: c:/webs2 (C:\webs2)
$path_to_site: C:\webs2\txp2
Textpattern path: C:\webs2\txp2\textpattern
Permanent link mode: section_title
Temporary directory path: C:\webs2\txp2\textpattern\tmp
Site URL: localhost/txp2
PHP version: 5.2.1
GD Image Library: bundled (2.0.28 compatible); supported formats: GIF, JPG, PNG.
Server Local Time: 2007-11-08 22:26:49
MySQL: 5.0.27-community-nt
Locale: English_United Kingdom.1252
Server: Apache/1.3.37 (Win32) PHP/5.2.1
Apache version: Apache/1.3.37 (Win32) PHP/5.2.1
PHP Server API: apache
RFC 2616 headers:
Server OS: Windows NT 5.0
.htaccess file contents:
————————————
#DirectoryIndex index.php index.html
#Options +FollowSymLinks
#Options -Indexes
<IfModule mod_rewrite.c>
RewriteEngine On
#RewriteBase /relative/web/path/
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.+) - [PT,L]
RewriteRule ^(.*) index.php
</IfModule>
#php_value register_globals 0
————————————
Charset (default/config): latin1/utf8
character_set_client: utf8
character_set_connection: utf8
character_set_database: utf8
character_set_filesystem: binary
character_set_results: utf8
character_set_server: latin1
character_set_system: utf8
character_sets_dir: C:\Program Files\MySQL\MySQL Server 5.0\share\charsets\
17 Tables: -
PHP extensions: bcmath, calendar, com_dotnet/0.1, ctype, session, filter/0.11.0, ftp, hash/1.0, iconv, json/1.2.1, odbc/1.0, pcre, Reflection, date/5.2.1, libxml, standard/5.2.1, tokenizer/0.1, zlib/1.1, SimpleXML, dom/20031129, SPL, wddx, xml, xmlreader/0.1, xmlwriter/0.1, apache, gd, mbstring, mysql/1.0, mysqli/0.1, PDO, pdo_mysql/1.0.2, pdo_sqlite/1.0.1, SQLite, xsl/0.1
Apache modules: http_core, mod_so, mod_mime, mod_access, mod_auth, mod_negotiation, mod_include, mod_autoindex, mod_dir, mod_cgi, mod_userdir, mod_alias, mod_env, mod_log_config, mod_asis, mod_imap, mod_actions, mod_setenvif, mod_isapi, mod_rewrite, mod_php5
pretext_data: array (
  'id' => '',
  's' => '',
  'c' => '',
  'q' => '',
  'pg' => '',
  'p' => '',
  'month' => '',
  'author' => '',
  'request_uri' => '/txp2/e69e2de94a6d920493ea2103482dd837/?txpcleantest=1',
  'qs' => 'txpcleantest=1',
  'subpath' => '\\/txp2\\/',
  'req' => '/e69e2de94a6d920493ea2103482dd837/?txpcleantest=1',
)
/include/txp_category.php: r2243 (3706fea923cd77f7053f7803de169df4)
/include/txp_plugin.php: r1917 (c63f72f33986c08367672fc9fe7b42dd)
/include/txp_auth.php: r2356 (33255ec1ea1a825163c78272496d8783)
/include/txp_form.php: r1913 (ecea3fecf9d7d1f8088cda67f097eceb)
/include/txp_section.php: r1891 (1f0121b3e2969d94bc8a7fb98bfdfbd5)
/include/txp_tag.php: r2260 (1bd67bdb9dcfb72e34ea967e39406216)
/include/txp_list.php: r2450 (997a3b1bec7115bf49b76f62b28da146)
/include/txp_page.php: r2099 (56bde34b6c7bcb9123ac91e73065e894)
/include/txp_discuss.php: r2451 (91e0b29ef39a9471ae5c78d0b1bba086)
/include/txp_prefs.php: r2405 (a4b76476930b2376199f23fbfd5f1ac9)
/include/txp_log.php: r2439 (16730c34e2a437dd88b8f5cc7eff8218)
/include/txp_preview.php: r1238 (696728f35f3557b648c011bb4d6496c3)
/include/txp_image.php: r2439 (9fac6ed0d9d4c3d8196492051f38dc9a)
/include/txp_article.php: r2453 (bdac8fcac5df2f93f10afa7e50c3fb6f)
/include/txp_css.php: r2403 (4e8c52bb1cf5bfe2e2f0640892f9b92e)
/include/txp_admin.php: r2403 (f8700a3d453ece08e7f137b47c967eda)
/include/txp_link.php: r2463 (0a0171bf606296106332d3fdcb83a678)
/include/txp_diag.php: r2361 (dccf3269049dd25e59afdd7ad8d235cd)
/include/txp_file.php: r2403 (e62abd5fcadabe629322ed17135d89eb)
/include/txp_import.php: r1238 (70a6207c0f3604ecfc4b20369986c4d7)
/lib/admin_config.php: r1747 (a2eb09f94d7902a6e95750fc4abcea17)
/lib/txplib_misc.php: r2464 (615afd44a10311f1c0b7852d9bc15d24)
/lib/taglib.php: r1535 (9b519f9dc88791e5ee8eacc029dd6975)
/lib/txplib_head.php: r2404 (2e067b25997cf67cddbdd365570e69d5)
/lib/classTextile.php: r2462 (a031e2ea894e339711c601f230c5ee71)
/lib/txplib_html.php: r2403 (97e173da3058b438513df67fd7d1ceca)
/lib/txplib_db.php: r2406 (5ed67642f805639b54e381fb22efd208)
/lib/IXRClass.php: r765 (137b91497628f0058a2fca9eba5c3b7f)
/lib/txplib_forms.php: r2403 (438a734b52acef40b36d8a3ba23987e8)
/lib/class.thumb.php: r2329 (b2a2fda54371dbd6c40ba553941f090e)
/lib/constants.php: r2361 (ab6d51668fab1e3c98e7d520b1a59f0f)
/lib/txplib_update.php: r1239 (10f28a986d23187b436369dc29ab552f)
/lib/txplib_wrapper.php: r2286 (419125ec74a17a70bf1e86ebfcd45253)
/publish/taghandlers.php: r2444 (cc9de8f2018b01398a2ba542c5f5bdc6)
/publish/atom.php: r2402 (46c4402717f695fde0d49d806adfa4c4)
/publish/log.php: r1637 (5254d0f3942086bc55723923307a51db)
/publish/comment.php: r2460 (2d1ae1dec0784f044e7005fa5ed50930)
/publish/search.php: r1748 (8c86ebcb5be08e214d81ca15a32164ca)
/publish/rss.php: r2393 (09aac29bf22ffa71c1e118e851cff3c3)
/publish.php: r2436 (7087864f1e7c6efe096d3b8e07c350b1)
/index.php: r2466 (30ecf35de5c1edc6ef68e780c8c79daa)
/css.php: r944 (8beba8f83a091068723435cdcdc02f2f)
Re: customizing textpattern search engine to support case sensitivity
In /textpattern/publish.php, try replacing (Title rlike '$q' or Body rlike '$q')
with match (Title,Body) against ('$q')
#7 2007-11-08 21:13:10
- kostas45
- Member
- From: Greece
- Registered: 2007-11-08
- Posts: 61
Re: customizing textpattern search engine to support case sensitivity
I replaced line 577:
$search = " and (Title rlike '$q' or Body rlike '$q') $s_filter";
with:
$search = " and (match (Title,Body) against ('$q')) $s_filter";
but now search does not work at all (it finds nothing).
Is that what you meant?
Last edited by kostas45 (2007-11-09 06:29:51)
Re: customizing textpattern search engine to support case sensitivity
It does work, but it only works with keywords that are at least 4 characters long each.
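A rough Python model of the length rule, assuming MySQL's default ft_min_word_len of 4. It splits the query on whitespace only and ignores MySQL's real tokenizer and stopword list; indexed_keywords is a hypothetical helper, not part of Textpattern or MySQL.

```python
# Hypothetical model of MySQL's full-text minimum word length.
# Assumes the default ft_min_word_len of 4; check your server's setting.
FT_MIN_WORD_LEN = 4


def indexed_keywords(query, min_len=FT_MIN_WORD_LEN):
    """Return the query words long enough to be in a full-text index.

    Simplification: splits on whitespace only and ignores MySQL's
    stopword list, so real results can differ.
    """
    return [word for word in query.split() if len(word) >= min_len]


print(indexed_keywords("the cat sat lorem ipsum"))  # ['lorem', 'ipsum']
print(indexed_keywords("abc abcd"))  # ['abcd']
```

Words shorter than the limit never make it into the index, so MATCH ... AGAINST quietly ignores them rather than reporting an error.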
#9 2007-11-08 21:32:46
- kostas45
- Member
- From: Greece
- Registered: 2007-11-08
- Posts: 61
Re: customizing textpattern search engine to support case sensitivity
Does it work for you?
Here, the search finds nothing.
If I search for “consectetuer” it does not find the First Post of the fresh installation.
Re: customizing textpattern search engine to support case sensitivity
It works for me, I just tested it… but not on a fresh install; I have more articles. Using MATCH requires more articles, because a keyword that appears in half or more of the articles is ignored. When you only have one article (fresh install), then 1 match equals 100% of all articles.
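The 50% rule can be modelled the same way. This is a simplified Python sketch, not MySQL's relevance calculation: usable_keywords is a hypothetical helper, words are matched by whitespace splitting, and the sample article text is made up to stand in for the single article of a fresh install.

```python
def usable_keywords(keywords, articles):
    """Keep only keywords found in fewer than half of the articles.

    Simplified model of MySQL's natural-language 50% threshold: a word
    present in 50% or more of the rows is treated as too common.
    """
    total = len(articles)
    usable = []
    for keyword in keywords:
        # Count articles containing the keyword as a whole word.
        hits = sum(1 for text in articles
                   if keyword.lower() in text.lower().split())
        if total and hits / total < 0.5:
            usable.append(keyword)
    return usable


# Made-up stand-in for the single article of a fresh install.
fresh_install = ["Lorem ipsum dolor sit amet consectetuer adipiscing elit"]
print(usable_keywords(["consectetuer"], fresh_install))  # []

more_articles = fresh_install + ["Welcome to the new site", "Another short post"]
print(usable_keywords(["consectetuer"], more_articles))  # ['consectetuer']
```

With one article, any keyword it contains hits 100% of the rows and nothing is returned; adding two unrelated articles drops the share to one third and the keyword starts matching.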
#11 2007-11-09 08:06:45
- kostas45
- Member
- From: Greece
- Registered: 2007-11-08
- Posts: 61
Re: customizing textpattern search engine to support case sensitivity
You are right ruud.
I created more articles and the search works as you described.
Thanks a lot for the quick hack.
Will it be included in the next Txp release?
Cheers,
Kostas
Re: customizing textpattern search engine to support case sensitivity
Not in 4.0.x, because it does have some side-effects (it only matches keywords of at least 4 characters that appear in less than 50% of articles, and it requires a rewrite of the code that shows search excerpts), although personally I think that’s a good trade-off, because MATCH is very fast compared to RLIKE.