Textpattern CMS support forum
You are not logged in. Register | Login | Help
- Topics: Active | Unanswered
Disable case-sensitive search on Cyrillic site
I tried to disable case sensitivity :) in search, but it came to nothing. Somewhere on this forum I read that changing the collation to utf8_unicode_ci turns case sensitivity off.
This is the result of SHOW FULL COLUMNS FROM textpattern;
Field           Type          Collation        Null  Key  Default              Extra           Privileges
ID              int(11)       NULL             NO    PRI  NULL                 auto_increment  select,insert,update
Posted          datetime      NULL             NO    MUL  0000-00-00 00:00:00                  select,insert,update
AuthorID        varchar(64)   utf8_general_ci  NO                                              select,insert,update
LastMod         datetime      NULL             NO         0000-00-00 00:00:00                  select,insert,update
LastModID       varchar(64)   utf8_general_ci  NO                                              select,insert,update
Title           varchar(255)  utf8_general_ci  NO    MUL                                       select,insert,update
Title_html      varchar(255)  utf8_general_ci  NO                                              select,insert,update
Body            mediumtext    utf8_general_ci  NO                                              select,insert,update
Body_html       mediumtext    utf8_general_ci  NO                                              select,insert,update
Excerpt         text          utf8_general_ci  NO                                              select,insert,update
Excerpt_html    mediumtext    utf8_general_ci  NO                                              select,insert,update
Image           varchar(255)  utf8_general_ci  NO                                              select,insert,update
Category1       varchar(128)  utf8_general_ci  NO    MUL                                       select,insert,update
Category2       varchar(128)  utf8_general_ci  NO                                              select,insert,update
Annotate        int(2)        NULL             NO         0                                    select,insert,update
AnnotateInvite  varchar(255)  utf8_general_ci  NO                                              select,insert,update
comments_count  int(8)        NULL             NO         0                                    select,insert,update
Status          int(2)        NULL             NO         4                                    select,insert,update
textile_body    int(2)        NULL             NO         1                                    select,insert,update
textile_excerpt int(2)        NULL             NO         1                                    select,insert,update
Section         varchar(64)   utf8_general_ci  NO    MUL                                       select,insert,update
override_form   varchar(255)  utf8_general_ci  NO                                              select,insert,update
Keywords        varchar(255)  utf8_general_ci  NO                                              select,insert,update
url_title       varchar(255)  utf8_general_ci  NO                                              select,insert,update
custom_1        varchar(255)  utf8_general_ci  NO                                              select,insert,update
custom_2        varchar(255)  utf8_general_ci  NO                                              select,insert,update
custom_3        varchar(255)  utf8_general_ci  NO                                              select,insert,update
custom_4        varchar(255)  utf8_general_ci  NO                                              select,insert,update
custom_5        varchar(255)  utf8_general_ci  NO                                              select,insert,update
custom_6        varchar(255)  utf8_general_ci  NO                                              select,insert,update
custom_7        varchar(255)  utf8_general_ci  NO                                              select,insert,update
custom_8        varchar(255)  utf8_general_ci  NO                                              select,insert,update
custom_9        varchar(255)  utf8_general_ci  NO                                              select,insert,update
custom_10       varchar(255)  utf8_general_ci  NO                                              select,insert,update
uid             varchar(32)   utf8_general_ci  NO                                              select,insert,update
feed_time       date          NULL             NO         0000-00-00                           select,insert,update
And this error occurs when I try to submit the changes in phpMyAdmin:
How can I make search case-insensitive?
Providing help in hacking ATM! Come to courses and don’t forget to bring us notebook and hammer! What for notebook? What a kind of hacker you are without notebok?
Offline
Re: Disable case-sensitive search on Cyrillic site
Is there a way to make search case-insensitive?
Providing help in hacking ATM! Come to courses and don’t forget to bring us notebook and hammer! What for notebook? What a kind of hacker you are without notebok?
Offline
Re: Disable case-sensitive search on Cyrillic site
I think I read somewhere here that this problem (which also affects Greek sites) lies not with TXP but with MySQL. Hopefully a developer, or someone else with more knowledge than me, will be able to give you a more technical explanation.
Last edited by colak (2008-01-28 08:57:30)
Yiannis
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.
Offline
Re: Disable case-sensitive search on Cyrillic site
About the SQL error in your opening post: drop the fulltext index on the textpattern table, change the collation, then re-create the index.
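A minimal Python sketch of that order of operations, which only generates the SQL statements as strings. The index name `searching`, the indexed columns (Title, Body) and the use of `CONVERT TO CHARACTER SET` for the whole table are assumptions for illustration; check `SHOW INDEX FROM textpattern` for the real index definition before running anything.

```python
# Hypothetical helper: returns the ALTER TABLE statements in the order
# suggested above (drop the fulltext index, change the collation, re-create it).
# Index name, column list and the CONVERT TO approach are assumptions.

def collation_migration(table, index, columns, collation="utf8_unicode_ci"):
    """Return the SQL statements, in execution order, as plain strings."""
    column_list = ", ".join(f"`{c}`" for c in columns)
    return [
        # 1. Drop the fulltext index first, as advised in the post.
        f"ALTER TABLE `{table}` DROP INDEX `{index}`;",
        # 2. Convert all character columns of the table to the new collation.
        f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET utf8 COLLATE {collation};",
        # 3. Re-create the fulltext index on the converted columns.
        f"ALTER TABLE `{table}` ADD FULLTEXT INDEX `{index}` ({column_list});",
    ]


if __name__ == "__main__":
    for statement in collation_migration("textpattern", "searching", ["Title", "Body"]):
        print(statement)
```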
Offline
Re: Disable case-sensitive search on Cyrillic site
I re-created the fulltext index, but search is still case-sensitive :(
Providing help in hacking ATM! Come to courses and don’t forget to bring us notebook and hammer! What for notebook? What a kind of hacker you are without notebok?
Offline
Re: Disable case-sensitive search on Cyrillic site
Continuing from this topic: Ruud advised here trying to change:
$search = " and ($cols) $s_filter";
into:
$search = " and (match (`'.join('`, `', $cols)."`) against ('$q')) ".$s_filter;
But this line of code caused an error. Then I tried this replacement instead:
$search = " and (match (`Title`, `Body`) against ('$q')) ".$s_filter;
And it worked! Article search became case-insensitive. I’ll test this later on live hosting, but it seems that’s the answer to making search case-insensitive.
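For reference, a minimal Python sketch of building the same MATCH ... AGAINST fragment from a list of column names: each name is wrapped in backticks, the names are joined with "`, `", and the search term is left as a driver placeholder instead of being pasted into the SQL. The function name and the `%s` placeholder style (used by common Python MySQL drivers) are assumptions; this is not Textpattern code.

```python
def match_clause(columns, search_filter=""):
    """Build the WHERE fragment ' and (match (`A`, `B`) against (%s)) <filter>'.

    Each column name is wrapped in backticks and the names are joined with
    "`, `", which is apparently what the first PHP replacement aimed for.
    The search term is NOT inserted; %s is left for the database driver.
    """
    if not columns:
        raise ValueError("MATCH needs at least one column")
    column_list = "`" + "`, `".join(columns) + "`"
    return f" and (match ({column_list}) against (%s)) {search_filter}".rstrip()


print(match_clause(["Title", "Body"]))
# Hypothetical usage with a DB-API cursor:
# cursor.execute("SELECT ID FROM textpattern WHERE Status = 4" + match_clause(["Title", "Body"]), (q,))
```

Keeping the term out of the SQL string also sidesteps the quoting problems that come from mixing PHP string delimiters with SQL ones.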
The other question is <txp:search_result_excerpt />: it doesn’t build an excerpt from Russian text. Can I help with tracing this problem?
Last edited by the_ghost (2008-11-28 23:53:06)
Providing help in hacking ATM! Come to courses and don’t forget to bring us notebook and hammer! What for notebook? What a kind of hacker you are without notebok?
Offline
Re: Disable case-sensitive search on Cyrillic site
Victor, doesn’t search_result_excerpt show results when you enter a query that exactly matches the case of the found result? I mean, if you’re not relying on the search being case-insensitive.
In the function search_result_excerpt, try changing this line:
preg_match_all("/\b.{1,50}".preg_quote($q).".{1,50}\b/iu", $result, $concat);
into:
preg_match_all("/.{0,50}".preg_quote($q).".{0,50}/iu", $result, $concat);
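A Python sketch of the same excerpt extraction, for comparison. Python’s re module treats str patterns as Unicode, so IGNORECASE folds Cyrillic letters. The second function puts the word-boundary anchors back and adds re.ASCII, which roughly imitates a PCRE build without Unicode properties: Cyrillic letters then don’t count as word characters, so \b never matches in purely Cyrillic text. That is a plausible, unverified explanation for why dropping \b helped. Both function names are hypothetical.

```python
import re


def excerpts(text, query, context=50):
    """Return each snippet of `text` that contains `query`, with up to
    `context` characters on either side, matched case-insensitively.

    Mirrors the /iu pattern with .{0,50} on both sides; re.escape plays the
    role of preg_quote().
    """
    pattern = r".{0,%d}%s.{0,%d}" % (context, re.escape(query), context)
    return re.findall(pattern, text, flags=re.IGNORECASE)


def excerpts_ascii_boundaries(text, query, context=50):
    """Same idea with word-boundary anchors and re.ASCII, roughly imitating
    PCRE without Unicode properties: Cyrillic letters are not word characters,
    so the boundary anchors never match inside purely Cyrillic text."""
    pattern = r"\b.{1,%d}%s.{1,%d}\b" % (context, re.escape(query), context)
    return re.findall(pattern, text, flags=re.IGNORECASE | re.ASCII)


sample = "Город Южный стоит на берегу моря."
print(excerpts(sample, "южный"))                   # the whole sentence
print(excerpts_ascii_boundaries(sample, "Южный"))  # []: no \b positions
```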
Offline
Re: Disable case-sensitive search on Cyrillic site
Just noticed: after patching publish.php, searching for words shorter than 4 letters doesn’t work. If I remember correctly, this is a feature of MySQL’s MATCH and can be fixed by changing a MySQL parameter.
doesn’t search_result_excerpt show results when you enter a query that matches the case of the found result exactly
Nope, search_result_excerpt doesn’t build any output even in that case.
In the function search_result_excerpt, try changing this line:
preg_match_all("/.{0,50}".preg_quote($q).".{0,50}/iu", $result, $concat);
Yep! There is a result now, but with one strange thing: I have three identical articles (identical bodies), and search_result_excerpt is missing for the last of them (screenshot).
And another strange thing: it doesn’t highlight the query in search_result_excerpt if the case doesn’t match:
Last edited by the_ghost (2008-11-29 00:26:27)
Providing help in hacking ATM! Come to courses and don’t forget to bring us notebook and hammer! What for notebook? What a kind of hacker you are without notebok?
Offline
Re: Disable case-sensitive search on Cyrillic site
The minimum word length is set with the ft_min_word_len configuration parameter in MySQL. You’d have to re-create the fulltext index after changing this setting.
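To illustrate the effect, a tiny Python sketch of the filtering a MyISAM fulltext index applies: query words shorter than the minimum length are simply not in the index, so they can never match. The default of 4 follows MySQL’s documented default for ft_min_word_len; the stopword list is ignored, and the function name is hypothetical.

```python
# Illustration only: a MyISAM FULLTEXT index skips words shorter than
# ft_min_word_len (4 by default), so such query words can never match.
# MySQL's stopword list is not modelled here.
DEFAULT_FT_MIN_WORD_LEN = 4


def indexable_words(query, min_len=DEFAULT_FT_MIN_WORD_LEN):
    """Return the words of `query` that are long enough to be in the index."""
    return [word for word in query.split() if len(word) >= min_len]


print(indexable_words("дом у моря"))             # ['моря']
print(indexable_words("дом у моря", min_len=3))  # ['дом', 'моря']
```

On the server itself, the setting goes in the [mysqld] section of my.cnf (for example ft_min_word_len=3). After restarting MySQL, the MySQL manual suggests rebuilding the index, e.g. with REPAIR TABLE textpattern QUICK;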
About the missing excerpt for the last article: what happens if you remove the article whose excerpt isn’t currently shown (the last one)? Do the other two still show up with an excerpt in the search results, or is it now the second one (which is now the last result) that doesn’t show one?
From what I’ve read, case-insensitivity support in PCRE (and PHP in general) is rather limited when dealing with UTF-8-encoded strings. It works fine for the lower 7 bits, which is the US-ASCII character set, but for multibyte characters, such as Cyrillic ones and basically any character with an accent, it depends on how PCRE was compiled on your system, and even then it’s not guaranteed to work.
From the PCRE manual:
Case-insensitive matching applies only to characters whose values are less than 128, unless PCRE is built with Unicode property support. Even when Unicode property support is available, PCRE still uses its own character tables when checking the case of low-valued characters, so as not to degrade performance. The Unicode property information is used only for characters with higher values. Even when Unicode property support is available, PCRE supports case-insensitive matching only when there is a one-to-one mapping between a letter’s cases. There are a small number of many-to-one mappings in Unicode; these are not supported by PCRE.
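The same two limits can be seen with Python’s re module, used here only as a stand-in for PCRE (it is a different engine). Cyrillic case folding works for Unicode str patterns but disappears when re.ASCII restricts matching to ASCII, and ß, which has no single-character uppercase form, is not folded to "ss". The helper name is hypothetical.

```python
import re


def ci_contains(needle, haystack, ascii_only=False):
    """Case-insensitive literal search. With ascii_only=True, re.ASCII limits
    case folding to ASCII letters, loosely like PCRE without Unicode support."""
    flags = re.IGNORECASE | (re.ASCII if ascii_only else 0)
    return re.search(re.escape(needle), haystack, flags) is not None


print(ci_contains("южный", "Город Южный"))                   # True: Ю/ю is a one-to-one pair
print(ci_contains("южный", "Город Южный", ascii_only=True))  # False: Cyrillic not folded
print(ci_contains("ss", "Straße"))                           # False: ß is not expanded to ss
```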
Offline
Re: Disable case-sensitive search on Cyrillic site
Back again :) I really hope to make TXP Russian-friendly in the 4.0.8 release.
So, I installed 4.0.8 RC2.
Search behaviour without hacks (as-is install):
- Cyrillic words are searched case-sensitively: Южный != южный. I have 5 (five) articles with the same body (the titles differ). Every article contains the text Южный.
- If I search for южный, I get zero results.
- If I search for Южный, I get 5 (five) results, and search_result_excerpt builds a good search excerpt (with highlighting).
- If I search
So, in an as-is install, Cyrillic search is case-sensitive. Let’s try applying a little hack to publish.php (approx. line 635):
- $cols[$i] = "`$cols[$i]` rlike '$q'";
+ $cols[$i] = "lower(`$cols[$i]`) rlike lower('$q')";
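A likely reason this works: the MySQL 5.x manual warns that REGEXP/RLIKE operate byte-wise and are not multibyte safe, so case-insensitive matching of UTF-8 Cyrillic cannot be left to RLIKE itself, whereas LOWER() lowercases characters before the byte comparison happens. A minimal Python sketch of the same idea, lowercasing both sides before the regex test. The helper is hypothetical, and escaping the query so it is matched literally is an assumption of this sketch.

```python
import re


def rlike_ci(column_value, query):
    """Mimic `lower(col) rlike lower('$q')`: lowercase both sides, then run
    the regex test. The query is escaped so it is matched literally (an
    assumption made for this sketch)."""
    return re.search(re.escape(query.lower()), column_value.lower()) is not None


print(rlike_ci("Город Южный", "южный"))     # True
print(rlike_ci("город южный", "ЮЖНЫЙ"))     # True
print(rlike_ci("Город Северный", "южный"))  # False
```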
Searching after the hack:
- Cyrillic words are searched case-*in*sensitively: Южный == южный.
- If I search for южный, I get 5 results (five search excerpts), but the search query isn’t highlighted, even though it is displayed in search_result_excerpt.
- If I search for Южный, I get 5 (five) results, and search_result_excerpt builds a good search excerpt (with highlighting).
- If I search
1. I ran some benchmarks and noticed no increase in either query time or run time with the publish.php hack.
2. If the publish.php patch can be applied in core, how can we solve the problem with highlighting in the search excerpt?
Providing help in hacking ATM! Come to courses and don’t forget to bring us notebook and hammer! What for notebook? What a kind of hacker you are without notebok?
Offline
Re: Disable case-sensitive search on Cyrillic site
Well, thank God!
And here I was thinking search just didn’t work!
I can confirm that Victor’s hack works pretty well for me. I think it should be included in future releases of TXP.
Offline
Re: Disable case-sensitive search on Cyrillic site
- How many articles does your benchmarked website have, and how much text is in them (kB)?
- I have no idea how to solve that without making the excerpt lowercase.
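One possible way to highlight without lowercasing the excerpt, sketched in Python: match case-insensitively, but wrap the text that was actually matched, so the excerpt keeps its original capitalization. In PHP the analogue would be preg_replace() with the /iu modifiers and $0 in the replacement, which for Cyrillic still depends on the PCRE Unicode support discussed earlier in this thread. The function name and the <strong> tag are assumptions.

```python
import re


def highlight(excerpt, query, tag="strong"):
    """Wrap every case-insensitive occurrence of `query` in <tag>...</tag>.

    The replacement reuses the matched text (m.group(0)), so the excerpt keeps
    its original capitalization instead of being lowercased.
    """
    if not query:
        return excerpt
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return pattern.sub(lambda m: f"<{tag}>{m.group(0)}</{tag}>", excerpt)


print(highlight("Город Южный, южный берег", "ЮЖНЫЙ"))
# Город <strong>Южный</strong>, <strong>южный</strong> берег
```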
Offline