Go to main content

Textpattern CMS support forum

You are not logged in. Register | Login | Help

#1 2010-03-07 18:36:57

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,271
Website GitHub

Article searches since r3316

Having recently resubscribed to the txp-dev mailing list I’m just catching up. Since I’m a bit stupid and can’t figure out how to reply to a single topic on the list I’m going to post here instead.

One of the things ruud highlighted about r3316 is the following:

No point in using RLIKE if you escape the extended regex characters.
Might as well use LIKE instead, which is probably a lot faster as well and
definitely easier than creating a built-in validator for regex search strings.

Combining doSlash with preg_quote may not be what you want. For example:
Searching for '+' is escaped to '\\+' which means you’re getting results for
strings that contain 1 or more consecutive backslashes.

Having read up on preg_quote() I’m inclined to agree that it’s actually not the right thing to do, although it does work in this instance. Also, RLIKE is indeed regarded as slower than LIKE. In my rudimentary tests I haven’t been able to get a search for + or * to match any articles that contain double backslash characters. That’s not to say it can’t happen — I’ve probably not tested hard enough.

So what I’m struggling with is how to actually get it to work without borking. If I simply remove the preg_quote and switch over to LIKE instead of RLIKE, I get no results for any searches using:

'title_body_excerpt' => "Title like '$crit_escaped' or Body like '$crit_escaped' or Excerpt like '$crit_escaped'"

I figured this was to do with ‘like’ not being wild by default. So altering it to:

'title_body_excerpt' => "Title like '%".$crit_escaped."%' or Body like '%".$crit_escaped."%' or Excerpt like '%".$crit_escaped."%'"

Seems to allow me to search reasonably reliably without error, but if I search for literal \\ I then also pick up articles with single backslashes in them because I guess the first backslash is being treated as an escape character(?)

I’d also have to change all the other lines from RLIKE to LIKE — they’ve always been that way for reasons best known to the person who wrote it. So, my question is, should I just ditch the preg_quote() and the RLIKEs, replacing them with wildcarded LIKE matches and take the hit on someone searching for \\ getting slightly wrong results? Or does anybody have any better ideas or a more robust mechanism for searching?


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

Offline

#2 2010-03-07 21:32:29

ruud
Developer Emeritus
From: a galaxy far far away
Registered: 2006-06-04
Posts: 5,068
Website

Re: Article searches since r3316

http://dev.mysql.com/doc/refman/5.0/en/string-comparison-functions.html#operator_like

Because MySQL uses C escape syntax in strings (for example, “\n” to represent a newline character), you must double any “\” that you use in LIKE strings. For example, to search for “\n”, specify it as “\\n”. To search for “\”, specify it as “\\\\”; this is because the backslashes are stripped once by the parser and again when the pattern match is made, leaving a single backslash to be matched against.

So I guess after doSlash() you’d have to escape the backslashes once more.

Offline

#3 2010-03-16 11:26:54

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,271
Website GitHub

Re: Article searches since r3316

ruud wrote:

So I guess after doSlash() you’d have to escape the backslashes once more.

OK, thanks for the hint. Would:

$crit_escaped = addslashes(doSlash($crit));

cut it? It seems to work on my test rig after altering all rlikes to wild likes. Or would we then get into problems with magic_quotes_gpc?

If so, how about this… with a suitable explanation line above it! :

$crit_escaped = doSlash(doSlash($crit));

I’m really not well-versed enough in regex voodoo or escaping to know if this has any knock-on effects. Here’s the revised $critsql block, for reference:

$critsql = array(
	'id'                 => "ID in ('" .join("','", do_list($crit_escaped)). "')",
	'title_body_excerpt' => "Title like '%".$crit_escaped."%' or Body like '%".$crit_escaped."%' or Excerpt like '%".$crit_escaped."%'",
	'section'            => "Section like '%".$crit_escaped."%'",
	'keywords'           => "FIND_IN_SET('".$crit_escaped."',Keywords)",
	'categories'         => "Category1 like '%".$crit_escaped."%' or Category2 like '%".$crit_escaped."%'",
	'status'             => "Status = '".(@$sesutats[gTxt($crit_escaped)])."'",
	'author'             => "AuthorID like '%".$crit_escaped."%'",
	'article_image'      => "Image in ('" .join("','", do_list($crit_escaped)). "')",
	'posted'             => "Posted like '$crit_escaped%'",
	'lastmod'            => "LastMod like '$crit_escaped%'"
);

EDIT: searching for % or _ now causes a problem though… they return all articles because they are MySQL wildcards :-( Sam got round this in publish.php with str_replace(array('%','_'), array('\\%','\\_'), $crit). If those are the only two wildcards MySQL uses then that might need factoring in as well?

EDIT 2: and if replacing/fixing this search mechanism, it would probably be prudent to do the same on the other searchable pages: Files, Images, Links, Comments, Visitor Logs because searches for % or _ fail there too.

EDIT 3: But foreign character searches are now bogus :-( Searching for results in articles being returned that don’t contain that character sequence. Argghhhh!

Last edited by Bloke (2010-03-16 11:56:48)


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

Offline

#4 2010-06-04 11:00:13

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,271
Website GitHub

Re: Article searches since r3316

Based on Sam’s work in r3349, I’ve refactored the bogus attempt I made in the first place to now properly escape MySQL meta characters (% and _) and backslashes in the article searches on the admin side (r3357). In the process I’ve removed the last few rlike matches as per ruud’s suggestion.

I’ve also extended this to all tabs that have a search box (comment, file, image, link, and log).

Repercussions:

  • Searching for % or _ characters no longer return all articles. They are treated as literal characters
  • Searching for \ or \\ or + or *, etc are also treated literally
  • No more regex warnings
  • Searches remain case insensitive
  • Searches are accent insensitive ( = sa and é = e)

Acceptable? Any side-effects?


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

Offline

#5 2010-06-05 20:28:28

thebombsite
Archived Plugin Author
From: Exmouth, England
Registered: 2004-08-24
Posts: 3,251
Website

Re: Article searches since r3316

I’ve just upgraded sites to r3358 and I now have some missing images. If you go to thebombsite you will see 2 litlle boxes where there are supposed to be some auto-generated thumbnails. I’m using the thumb.php script to generate these and also the newer <txp:image_list /> tag in the following block of code:-

p(portfolio). ==<txp:image_list id="790"><a rel="prettyPhoto" href="<txp:image_url />"><img class="thumb" src="<txp:site_url />thumb.php?src=<txp:image_url />&amp;w=150&amp;q=80" alt="<txp:title />" /></a></txp:image_list>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<txp:image_list id="791"><a rel="prettyPhoto" href="<txp:image_url />"><img class="thumb" src="<txp:site_url />thumb.php?src=<txp:image_url />&amp;w=150&amp;q=80" alt="<txp:title />" /></a></txp:image_list>==

I haven’t changed this code at all so it was working fine and now it isn’t. The thing that partly brings me to this thread is that if you click on one of the empty boxes you should get a lightbox image of the full-size page but instead you get a message to the effect of “Image cannot be loaded. Make sure the path is correct and image exist.”.

Now I mention all this here because I see you have been doing a bit of jiggery-pokery with “backslash” and I’m wondering if somehow that has messed with the output of the <txp:image_url /> tag seeing as how it is contained within the <txp:image_list></txp:image_list> wrapper??


Stuart

In a Time of Universal Deceit
Telling the Truth is Revolutionary.

Offline

#6 2010-06-05 21:06:03

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,271
Website GitHub

Re: Article searches since r3316

thebombsite wrote:

“Image cannot be loaded. Make sure the path is correct and image exist.”.

Well that’ll be because it’s looking for images called 7900.jpg and 7910.jpg, when it should be looking for 790.jpg and 791.jpg. Question is, where has the extra ‘0’ come from…

jiggery-pokery with “backslash” and I’m wondering if somehow that has messed with the output of the <txp:image_url />

… it’s not related to the backslash thing (that’s admin-side only) but there has been a change to the way images are handled insofar as changesets r3350-r3352 have introduced a new global specifically for images and a new internal function for generating the image URLs.

It seems to be a bug in the way the new function imagesrcurl() interacts with the thumbnail in the image_list when making the URL. The image_list() function passes thumbnail as ‘0’ (indicating no thumbnail) so when it’s passed into the function it adds this value ‘0’ to the URL. It does it on my dev site too.

We should either change the image_list function so it always renders ‘false’ instead of ‘0’ or, probably better, explicitly detect the emptiness of the thumbnail parameter inside imagesrcurl() and reset it to ''. My guess is that altering line 2050 of txplib_misc.php from:

if ($thumbnail) $thumbnail = 't';

to:

$thumbnail = ($thumbnail) ? 't' : '';

will fix it. I’ll check with wet to see if that’s the best way to go about it and he’ll no doubt issue a suitable fix in due course.

Thanks for the report.


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

Offline

#7 2010-06-05 21:13:02

Gocom
Developer Emeritus
From: Helsinki, Finland
Registered: 2006-07-14
Posts: 4,533
Website

Re: Article searches since r3316

Bloke wrote:

‘false’ instead of ‘0’

False would just return 0 when echoed, just like it currently does, right? So yes, check if it is false and then change the output to ''.

Last edited by Gocom (2010-06-05 21:14:49)

Offline

#8 2010-06-05 21:23:11

thebombsite
Archived Plugin Author
From: Exmouth, England
Registered: 2004-08-24
Posts: 3,251
Website

Re: Article searches since r3316

Well I can confirm that the mod to txplib_misc.php fixes the problem so I shall use it as a quick-fix until something more official turns up. ;)


Stuart

In a Time of Universal Deceit
Telling the Truth is Revolutionary.

Offline

Board footer

Powered by FluxBB