
Textpattern CMS support forum


#1 2008-01-02 14:04:50

the_ghost
Plugin Author
From: Minsk, The Republic of Belarus
Registered: 2007-07-26
Posts: 907

Disable case-sensitive search on a Cyrillic site

I tried to disable case sensitivity :) in search, but got nowhere. Somewhere here I saw that changing the collation to utf8_unicode_ci turns case sensitivity off.

This is the result of the SQL query SHOW FULL COLUMNS FROM textpattern; (columns: Field, Type, Collation, Null, Key, Default, Extra, Privileges, Comment):

ID	int(11)	NULL	NO	PRI	NULL	auto_increment	select,insert,update	 
Posted	datetime	NULL	NO	MUL	0000-00-00 00:00:00	 	select,insert,update	 
AuthorID	varchar(64)	utf8_general_ci	NO	 	 	 	select,insert,update	 
LastMod	datetime	NULL	NO	 	0000-00-00 00:00:00	 	select,insert,update	 
LastModID	varchar(64)	utf8_general_ci	NO	 	 	 	select,insert,update	 
Title	varchar(255)	utf8_general_ci	NO	MUL	 	 	select,insert,update	 
Title_html	varchar(255)	utf8_general_ci	NO	 	 	 	select,insert,update	 
Body	mediumtext	utf8_general_ci	NO	 	 	 	select,insert,update	 
Body_html	mediumtext	utf8_general_ci	NO	 	 	 	select,insert,update	 
Excerpt	text	utf8_general_ci	NO	 	 	 	select,insert,update	 
Excerpt_html	mediumtext	utf8_general_ci	NO	 	 	 	select,insert,update	 
Image	varchar(255)	utf8_general_ci	NO	 	 	 	select,insert,update	 
Category1	varchar(128)	utf8_general_ci	NO	MUL	 	 	select,insert,update	 
Category2	varchar(128)	utf8_general_ci	NO	 	 	 	select,insert,update	 
Annotate	int(2)	NULL	NO	 	0	 	select,insert,update	 
AnnotateInvite	varchar(255)	utf8_general_ci	NO	 	 	 	select,insert,update	 
comments_count	int(8)	NULL	NO	 	0	 	select,insert,update	 
Status	int(2)	NULL	NO	 	4	 	select,insert,update	 
textile_body	int(2)	NULL	NO	 	1	 	select,insert,update
textile_excerpt	int(2)	NULL	NO	 	1	 	select,insert,update
Section	varchar(64)	utf8_general_ci	NO	MUL	 	 	select,insert,update	 
override_form	varchar(255)	utf8_general_ci	NO	 	 	 	select,insert,update	 
Keywords	varchar(255)	utf8_general_ci	NO	 	 	 	select,insert,update	 
url_title	varchar(255)	utf8_general_ci	NO	 	 	 	select,insert,update	 
custom_1	varchar(255)	utf8_general_ci	NO	 	 	 	select,insert,update	 
custom_2	varchar(255)	utf8_general_ci	NO	 	 	 	select,insert,update	 
custom_3	varchar(255)	utf8_general_ci	NO	 	 	 	select,insert,update	 
custom_4	varchar(255)	utf8_general_ci	NO	 	 	 	select,insert,update	 
custom_5	varchar(255)	utf8_general_ci	NO	 	 	 	select,insert,update	 
custom_6	varchar(255)	utf8_general_ci	NO	 	 	 	select,insert,update	 
custom_7	varchar(255)	utf8_general_ci	NO	 	 	 	select,insert,update	 
custom_8	varchar(255)	utf8_general_ci	NO	 	 	 	select,insert,update	 
custom_9	varchar(255)	utf8_general_ci	NO	 	 	 	select,insert,update	 
custom_10	varchar(255)	utf8_general_ci	NO	 	 	 	select,insert,update	 
uid	varchar(32)	utf8_general_ci	NO	 	 	 	select,insert,update	 
feed_time	date	NULL	NO	 	0000-00-00	 	select,insert,update

And this error occurs when I try to submit the changes made in phpMyAdmin:

How can I make search case-insensitive?


Providing help with hacking ATMs! Come to our courses and don’t forget to bring a notebook and a hammer! Why a notebook? What kind of hacker are you without a notebook?


#2 2008-01-28 08:54:38

the_ghost
Plugin Author
From: Minsk, The Republic of Belarus
Registered: 2007-07-26
Posts: 907

Is there a way to make search case-insensitive?



#3 2008-01-28 08:56:20

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,007

I think I read somewhere here that the problem (which also affects Greek sites) is not with txp but with MySQL. Hopefully a developer, or someone else more knowledgeable than me, will be able to give you a more technical explanation.

Last edited by colak (2008-01-28 08:57:30)


Yiannis
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.


#4 2008-01-28 11:56:22

ruud
Developer Emeritus
From: a galaxy far far away
Registered: 2006-06-04
Posts: 5,068

About the SQL error you got in the opening post: drop the fulltext index on the textpattern table, change the collation, then re-create the index.
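A minimal sketch of that sequence, written in Python only to print the SQL so it can be reviewed before running it in phpMyAdmin or the mysql client. The index name `searching`, its columns (Title, Body), the unprefixed table name `textpattern` and the target collation are assumptions for illustration; check SHOW INDEX FROM textpattern; on your own database first.

```python
def collation_change_statements(table="textpattern",
                                index_name="searching",
                                index_columns=("Title", "Body"),
                                collation="utf8_unicode_ci"):
    """Return, in order, the MySQL statements for changing a table's
    collation around an existing FULLTEXT index: drop, convert, re-create.

    All defaults are assumptions (hypothetical for your install): verify
    the real index name and columns with SHOW INDEX FROM <table> first.
    """
    # "utf8_unicode_ci" -> "utf8": a MySQL collation name starts with its charset.
    charset = collation.split("_", 1)[0]
    cols = ", ".join(f"`{c}`" for c in index_columns)
    return [
        # 1. Drop the FULLTEXT index first, as advised in this thread.
        f"ALTER TABLE `{table}` DROP INDEX `{index_name}`;",
        # 2. Convert all text columns of the table to the new collation.
        f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset} COLLATE {collation};",
        # 3. Re-create the FULLTEXT index on the converted columns.
        f"ALTER TABLE `{table}` ADD FULLTEXT INDEX `{index_name}` ({cols});",
    ]


if __name__ == "__main__":
    for statement in collation_change_statements():
        print(statement)
```

Printing rather than executing keeps the sketch free of any database driver; paste the statements in once they look right, ideally after taking a backup.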


#5 2008-02-02 12:10:10

the_ghost
Plugin Author
From: Minsk, The Republic of Belarus
Registered: 2007-07-26
Posts: 907

I re-created the fulltext index, but search is still case-sensitive :(



#6 2008-11-28 22:04:34

the_ghost
Plugin Author
From: Minsk, The Republic of Belarus
Registered: 2007-07-26
Posts: 907

Continued from this topic.

Ruud advised here to try replacing:
$search = " and ($cols) $s_filter";
with:
$search = " and (match (`'.join('`, `', $cols)."`) against ('$q')) ".$s_filter;

But ^^ this line of code caused an error. Then I tried this replacement instead:
$search = " and (match (`Title`, `Body`) against ('$q')) ".$s_filter;

And it worked! Article search became case-insensitive. I’ll test this later on live hosting, but it seems that’s the answer to making search case-insensitive.
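As posted, Ruud’s line opens with a double quote and then switches to single quotes, so the string delimiters don’t pair up and PHP can’t parse it; on top of that, by the time that line runs in publish.php, $cols may already have been turned into a string. The hard-coded Title/Body version sidesteps both. Below is a Python sketch (a stand-in for the PHP, not Textpattern code) of building the same MATCH clause from a list of column names with consistent quoting. `match_clause` is a hypothetical helper, and `q` is assumed to be SQL-escaped already; the sketch does no escaping.

```python
def match_clause(cols, q, s_filter=""):
    """Build " and (match (`c1`, `c2`) against ('q')) <s_filter>".

    Generalises the hard-coded Title/Body line to any list of column
    names. `q` must already be escaped for SQL; nothing is escaped here.
    """
    # Back-quote every column name and separate them with ", ".
    col_list = "`" + "`, `".join(cols) + "`"
    return f" and (match ({col_list}) against ('{q}')) {s_filter}"
```

With cols = ["Title", "Body"] and an empty filter this yields exactly the string produced by the line that worked above.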

The other question is about <txp:search_result_excerpt />: it doesn’t build an excerpt from Russian text. Can I help with tracing this problem?

Last edited by the_ghost (2008-11-28 23:53:06)



#7 2008-11-28 22:18:38

ruud
Developer Emeritus
From: a galaxy far far away
Registered: 2006-06-04
Posts: 5,068

Victor, doesn’t search_result_excerpt show results when you enter a query that matches the case of the found text exactly? I mean, when you’re not relying on the search being case-insensitive.

In the function search_result_excerpt, try changing this line:

preg_match_all("/\b.{1,50}".preg_quote($q).".{1,50}\b/iu", $result, $concat);

into:

preg_match_all("/.{0,50}".preg_quote($q).".{0,50}/iu", $result, $concat);
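For comparison, here is the relaxed pattern as a Python sketch (not the PHP code): the \b anchors are gone and re.escape() stands in for preg_quote(). A plausible reason the \b version finds nothing is that, without Unicode property support, PCRE treats Cyrillic letters as non-word characters, so \b does not sit where expected. One caveat: Python’s re applies Unicode case folding to str patterns, so the case-insensitive flag works for Cyrillic here, which PHP’s PCRE may not do. `search_excerpts` and its `width` parameter are hypothetical names.

```python
import re


def search_excerpts(text, q, width=50):
    """Return fragments of up to `width` characters on either side of each
    case-insensitive match of `q`, like /.{0,50}QUERY.{0,50}/iu in PHP."""
    pattern = re.compile(
        # No \b anchors; re.escape() plays the role of preg_quote().
        r".{0,%d}%s.{0,%d}" % (width, re.escape(q), width),
        re.IGNORECASE,  # Unicode-aware for str patterns in Python 3
    )
    return pattern.findall(text)
```

For example, search_excerpts("Море. Южный берег.", "южный", width=3) returns ["е. Южный бе"]. As in the PHP version, "." does not match newlines here.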


#8 2008-11-29 00:11:22

the_ghost
Plugin Author
From: Minsk, The Republic of Belarus
Registered: 2007-07-26
Posts: 907

Just noticed: after patching publish.php, searching for words shorter than 4 letters doesn’t work. If I remember correctly, this is a feature of SQL MATCH and can be fixed by changing a MySQL parameter.

doesn’t search_result_excerpt show results when you enter a query that matches the case of the found result exactly

Nope, search_result_excerpt doesn’t build any output even in that case.

In the function search_result_excerpt, try changing this line:
preg_match_all("/.{0,50}".preg_quote($q).".{0,50}/iu", $result, $concat);

Yep! There is a result now, but with one strange thing: I have three identical articles (same body), and the search_result_excerpt is missing for the last of them (screenshot).

And another strange thing: it doesn’t highlight the query in search_result_excerpt if the case doesn’t match:

Last edited by the_ghost (2008-11-29 00:26:27)



#9 2008-11-29 13:12:11

ruud
Developer Emeritus
From: a galaxy far far away
Registered: 2006-06-04
Posts: 5,068

The minimum word length is set with the ft_min_word_len configuration parameter in MySQL. You’d have to re-create the fulltext index after changing this setting.
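For reference, the variable is set in the server configuration (for example ft_min_word_len=3 under [mysqld] in my.cnf); it cannot be changed at runtime, so the server needs a restart, and for MyISAM tables the MySQL manual suggests rebuilding the index with REPAIR TABLE textpattern QUICK;. The Python below is only a toy model of why short words disappear: anything shorter than the minimum never enters the index, so MATCH ... AGAINST has nothing to find. It is not MySQL’s real tokenizer (no stopword list, no special apostrophe handling), and `indexed_words` is a hypothetical name.

```python
import re

# MySQL's default ft_min_word_len for MyISAM FULLTEXT indexes.
DEFAULT_FT_MIN_WORD_LEN = 4


def indexed_words(text, min_len=DEFAULT_FT_MIN_WORD_LEN):
    """Toy model: split text into words and keep only those long enough
    to be stored in a FULLTEXT index."""
    words = re.findall(r"\w+", text.lower())  # \w matches Cyrillic in str patterns
    return {w for w in words if len(w) >= min_len}
```

With the default, indexed_words("Южный мир") keeps only "южный", so a search for the three-letter word "мир" cannot match; with min_len=3 both words are kept.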

Not showing an excerpt for the last article… what happens if you remove the article whose excerpt currently isn’t shown (the last one)? Do the other two still show up with an excerpt in the search results, or is it now the second one (which has become the last result) that doesn’t show one?

From what I’ve read, case-insensitivity support in PCRE (and PHP in general) is rather limited when dealing with UTF-8 encoded strings. It works fine for the lower 7-bit range, i.e. the US-ASCII character set, but for multibyte characters, such as Cyrillic ones and basically any accented character, it depends on how PCRE was compiled on your system, and even then it’s not guaranteed to work.

From the PCRE manual

Case-insensitive matching applies only to characters whose values are less than 128, unless PCRE is built with Unicode property support. Even when Unicode property support is available, PCRE still uses its own character tables when checking the case of low-valued characters, so as not to degrade performance. The Unicode property information is used only for characters with higher values. Even when Unicode property support is available, PCRE supports case-insensitive matching only when there is a one-to-one mapping between a letter’s cases. There are a small number of many-to-one mappings in Unicode; these are not supported by PCRE.
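A small Python illustration of the difference described in that quote (a model of the behaviour, not how PCRE is implemented). bytes.lower() only converts ASCII A to Z, so applying it to UTF-8 encoded text behaves like a matcher that only knows the lower 7-bit range: every byte of a Cyrillic letter is 0x80 or above and is left alone. str.casefold() does Unicode-aware folding. Both helper names are hypothetical.

```python
def bytewise_equal_ci(a: str, b: str) -> bool:
    """Case-insensitive comparison that, like a 7-bit-only matcher,
    folds ASCII letters only: bytes.lower() leaves bytes >= 0x80 alone."""
    return a.encode("utf-8").lower() == b.encode("utf-8").lower()


def unicode_equal_ci(a: str, b: str) -> bool:
    """Case-insensitive comparison using Unicode case folding."""
    return a.casefold() == b.casefold()
```

bytewise_equal_ci("Search", "SEARCH") is True, but bytewise_equal_ci("Южный", "южный") is False, while unicode_equal_ci("Южный", "южный") is True.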


#10 2009-01-19 23:53:28

the_ghost
Plugin Author
From: Minsk, The Republic of Belarus
Registered: 2007-07-26
Posts: 907

Back again :) I really hope to make txp Russian-friendly in the 4.0.8 release…

So, I installed 4.0.8 RC2.

Search behaviour without hacks (stock install):

  • Cyrillic words are searched case-sensitively: Южный != южный. I have five articles with identical bodies (the titles differ); every one of them contains the word Южный.
    • If I search for южный, I get zero results.
    • If I search for Южный, I get five results, and search_result_excerpt builds a good excerpt (with highlighting).

So, in a stock (“as is”) install, Cyrillic search is case-sensitive. Let’s apply a little hack to publish.php (approx. line 635):

-				$cols[$i] = "`$cols[$i]` rlike '$q'";
+				$cols[$i] = "lower(`$cols[$i]`) rlike lower('$q')";
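A Python model of what the hack changes (a sketch, not MySQL): the stock condition compares the column against the pattern as-is, while the hacked one lower-cases both sides first, so Южный and южный meet in the middle. The MySQL manual for the pre-8.0 REGEXP/RLIKE operators notes that they work byte-wise and are not multibyte-safe, which would be consistent with the stock condition staying case-sensitive for Cyrillic even under a _ci collation. `rlike` and `rlike_lowered` are hypothetical helpers, and Python’s re.search only approximates MySQL’s regex dialect.

```python
import re


def rlike(value, pattern):
    """Rough stand-in for MySQL `value RLIKE pattern`: true if the regex
    matches anywhere in value. Deliberately case-sensitive, like the
    stock condition on Cyrillic text."""
    return re.search(pattern, value) is not None


def rlike_lowered(value, pattern):
    """Stand-in for the hacked condition lower(`col`) rlike lower('$q')."""
    return rlike(value.lower(), pattern.lower())
```

One side effect worth keeping in mind: lower-casing the pattern also lower-cases any regex escapes a visitor types (for example \W becomes \w), which changes their meaning.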

Search behaviour after the hack:

  • Cyrillic words are searched case-*in*sensitively: Южный == южный.
    • If I search for южный, I get five results with five excerpts, but the query isn’t highlighted, even though it appears in search_result_excerpt.
    • If I search for Южный, I get five results, and search_result_excerpt builds a good excerpt (with highlighting).

1. I ran some benchmarks and noticed no increase in either query time or run time with the publish.php hack.
2. If the publish.php patch can be applied to core, how can the highlighting problem in search_result_excerpt be solved?



#11 2009-02-12 11:15:10

warmrobot
Member
From: Moscow, Russia
Registered: 2007-01-22
Posts: 31

Thank God!

I was starting to think the search just didn’t work!

I can confirm that Victor’s hack works pretty well for me. I think it should be included in future releases of TXP.


#12 2009-02-16 23:07:34

ruud
Developer Emeritus
From: a galaxy far far away
Registered: 2006-06-04
Posts: 5,068

  1. How many articles does your benchmarked website have, and how much text is in them (kB)?
  2. No idea how to solve that without making the excerpt lower case.
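One possible way around the highlighting problem, sketched in Python under the assumption that lower-casing does not change string length: find the query in lower-cased copies of the excerpt and the query, then use those positions to wrap the matching slices of the original, untouched excerpt, so the displayed text keeps its original case. In PHP the same idea could presumably be built from mb_strtolower() or mb_stripos() plus mb_substr() from the mbstring extension; this is not what Textpattern does, and `highlight` is a hypothetical helper.

```python
def highlight(excerpt, q, open_tag="<strong>", close_tag="</strong>"):
    """Wrap each case-insensitive occurrence of q in excerpt with tags,
    keeping the excerpt's original capitalisation.

    Positions are found in lower-cased copies and applied to the original.
    That only works if lower() keeps the length unchanged (true for
    Cyrillic, not for e.g. 'İ'), so otherwise the excerpt is returned as is.
    """
    low_text, low_q = excerpt.lower(), q.lower()
    if not low_q or len(low_text) != len(excerpt) or len(low_q) != len(q):
        return excerpt
    out, pos = [], 0
    while True:
        i = low_text.find(low_q, pos)
        if i == -1:
            break
        j = i + len(low_q)
        out.append(excerpt[pos:i])                           # text before the match
        out.append(open_tag + excerpt[i:j] + close_tag)      # match, original case
        pos = j
    out.append(excerpt[pos:])                                # text after the last match
    return "".join(out)
```

For example, highlight("Южный берег, южный ветер", "ЮЖНЫЙ") returns "<strong>Южный</strong> берег, <strong>южный</strong> ветер".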
