Go to main content

Textpattern CMS support forum

You are not logged in. Register | Login | Help

#1 2015-06-11 15:35:09

Pat64
Plugin Author
From: France
Registered: 2005-12-12
Posts: 1,599
GitHub Twitter

SQL query for similar (enought) articles

Hi folks ;)

Because I need to retrieve some correspondences from similar (enough) titles, I want to use a query based on the first 2 or 3 words:

global $thisarticle;
// Keep only 2 or 3 first words (depending) into the current article's title
$title = strtok( $thisarticle['title'], ' ' ).' '.strtok(' ');
// Try a query to find similar enough article into an another section
$rs = safe_row('Title, Status, Section, url_title', 'textpattern', "1=1 and Title like '".$title."_' and Section = 'other-section' and Status = '4'");

But, unfortunatly I haven’t any results.
Could you please correct me? Tks.

Last edited by Pat64 (2015-06-11 15:38:19)


Patrick.

Github | CodePen | Codier | Simplr theme | Wait Me: a maintenance theme | [\a mi.ni.ma]: a “Low Tech” simple Blog theme.

Offline

#2 2015-06-11 16:00:23

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,271
Website GitHub

Re: SQL query for similar (enought) articles

Assuming your tokenization works (which you can verify by issuing dmp($title), try adding the wildcard % character instead of the single-character wildcard (underscore):

... and Title like '".$title."%' and Section = 'other-section' ...

The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

Offline

#3 2015-06-11 16:11:09

Pat64
Plugin Author
From: France
Registered: 2005-12-12
Posts: 1,599
GitHub Twitter

Re: SQL query for similar (enought) articles

Thanks Stef for your reply !

The article tokenization works well but unfortunately the query still fails…

WTF.


Patrick.

Github | CodePen | Codier | Simplr theme | Wait Me: a maintenance theme | [\a mi.ni.ma]: a “Low Tech” simple Blog theme.

Offline

#4 2015-06-11 18:22:10

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,271
Website GitHub

Re: SQL query for similar (enought) articles

Hmm, I haven’t tried it out so I’m not sure why offhand. Try adding , 1 just before the closing bracket of the safe_row() call. That will dump out the query that the code is using. If you take that and paste it into phpMyAdmin’s Query tool and run it you’ll be able to see if it’s outputting anything at all. If not, then you can at least tweak the query there to try and figure out why.

Once you’ve got it going we can explore ways to make the query safe from injection attacks… but let’s get it working first.


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

Offline

#5 2015-06-11 18:40:31

Pat64
Plugin Author
From: France
Registered: 2005-12-12
Posts: 1,599
GitHub Twitter

Re: SQL query for similar (enought) articles

OK.
Thanks lot for your advice, Stef.
I’ll try tomorow morning and give you the result.

;)


Patrick.

Github | CodePen | Codier | Simplr theme | Wait Me: a maintenance theme | [\a mi.ni.ma]: a “Low Tech” simple Blog theme.

Offline

#6 2015-06-12 07:05:27

Pat64
Plugin Author
From: France
Registered: 2005-12-12
Posts: 1,599
GitHub Twitter

Re: SQL query for similar (enought) articles

Ok. I found the problem and a kind of solution.

First of all, the article title contains changed signs in use by some languages as the French one: quotes ' (this default sign on keyboards) are transformed by a sign (which is better).

Second, because I couldn’t get good results with the like query argument, I tried REGEXP instead with much better succes.

So here is the new query:

$rs = safe_row('Title, Status, Section, url_title', 'textpattern', "1=1 and Title REGEXP '".$title."' and Section = 'other-section' and Status = '4'");

which have good results.

Unfortunatly, I use 2 str_replace PHP functions in order to sanitize my part (first 2 or 3 words) of the global $thisarticle['title']

Is there a better choice?

Last edited by Pat64 (2015-06-12 07:06:22)


Patrick.

Github | CodePen | Codier | Simplr theme | Wait Me: a maintenance theme | [\a mi.ni.ma]: a “Low Tech” simple Blog theme.

Offline

#7 2015-06-12 08:09:48

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,271
Website GitHub

Re: SQL query for similar (enought) articles

Pat64 wrote #291505:

I use 2 str_replace PHP functions

You can do it one str_replace() if you pass arrays in:

str_replace(
   array('1st_needle', '2nd_needle'),
   array('1st_replacement', '2nd_replacement'),
   $haystack
);

The arrays can be of any length, with any ‘missing’ replacements that don’t have a corresponding needle being set to the empty string.

EDIT: but see what makss wrote below, as there’s more to this than just replacing characters.

Last edited by Bloke (2015-06-12 10:38:19)


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

Offline

#8 2015-06-12 10:21:11

makss
Plugin Author
From: Ukraine
Registered: 2008-10-21
Posts: 355
Website

Re: SQL query for similar (enought) articles

Pat64 wrote #291505:

First of all, the article title contains changed signs in use by some languages as the French one: quotes ' (this default sign on keyboards) are transformed by a sign (which is better).

Always use mysql_real_escape_string() or safe_escape() in Txp.
$title = safe_escape($title); // before sql query

Get first 2-3-4… words from string. (Change {2} and specify the required number of words)

	$q = "qq111 ww222 e33333 444 5555 666";
	if( preg_match('%^((.*? ){2})(.*)$%su', $q, $mm) ){
		print_r($mm);
	}
Array
(
    [0] => qq111 ww222 e33333 444 5555 666
    [1] => qq111 ww222                      // first 2 words
    [2] => ww222 
    [3] => e33333 444 5555 666              // other part of string
)

aks_cron : Cron inside Textpattern | aks_article : extended article_custom tag
aks_cache : cache for TxP | aks_dragdrop : Drag&Drop categories (article, link, image, file)

Offline

#9 2015-06-12 10:42:26

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,271
Website GitHub

Re: SQL query for similar (enought) articles

makss wrote #291509:

Always use mysql_real_escape_string() or safe_escape() in Txp

Makss’ solution for extracting the words is far better. But try to avoid mysql_real_escape_string() unless painted into a corner and you have no choice. It’s deprecated. Using safe_escape() or other Txp core functions will at least permit your code to migrate seamlessly to PDO when that bit of the core is written, as the burden is on us devs to provide a short-term upgrade path.

We may well deprecate such functions too in favour of class methods at some point, but our functions are likely to be around a lot longer than PHP keeps its mysql_*() calls.


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

Offline

#10 2015-06-15 07:26:42

Pat64
Plugin Author
From: France
Registered: 2005-12-12
Posts: 1,599
GitHub Twitter

Re: SQL query for similar (enought) articles

Ok guys. I found the problem.

First of all, I changed this time consuming REGEXP by LIKE into my query (10 times faster):

$rs = safe_row('Title, Status, Section, url_title', 'textpattern', "1=1 and Title like '%".$title."%' and Section = 'other-section' and Status = '4'");

Second, simple ' quote signs are changed by utf-8 ' ones from my article’s title. Because MySQL stores only plain text, I can’t get the corresponding result from my query:

$title = safe_escape($thisarticle['title']);

Could you tell me please, the best choice to transform my article’s title into plain MySQL friendly text?

Thank you by advance.

Notice: all str_replace PHP function applied on this text do not get good results…

Last edited by Pat64 (2015-06-15 07:29:04)


Patrick.

Github | CodePen | Codier | Simplr theme | Wait Me: a maintenance theme | [\a mi.ni.ma]: a “Low Tech” simple Blog theme.

Offline

#11 2015-06-15 08:00:55

etc
Developer
Registered: 2010-11-11
Posts: 5,053
Website GitHub

Re: SQL query for similar (enought) articles

Pat64 wrote #291546:

simple ' quote signs are changed by utf-8 ' ones from my article’s title. Because MySQL stores only plain text, I can’t get the corresponding result from my query:

$title = safe_escape($thisarticle['title']);...

Singe quotes are transformed into &#039; when you call <txp:title />, in $thisarticle['title'] they should be ok. The problem must come from somewhere else. You can try to apply htmlspecialchars_decode to be sure:

$title = safe_escape(htmlspecialchars_decode($thisarticle['title'], ENT_QUOTES));

Offline

#12 2015-06-15 08:35:14

Pat64
Plugin Author
From: France
Registered: 2005-12-12
Posts: 1,599
GitHub Twitter

Re: SQL query for similar (enought) articles

Thanks lot etc ;)

Nope: &#039; are still here and htmlspecialchars_decode() function didn’t do nothing…

Last edited by Pat64 (2015-06-15 08:36:49)


Patrick.

Github | CodePen | Codier | Simplr theme | Wait Me: a maintenance theme | [\a mi.ni.ma]: a “Low Tech” simple Blog theme.

Offline

Board footer

Powered by FluxBB