
Textpattern CMS support forum


#13 2008-08-09 10:30:28

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,371

Re: smd_fuzzy_find: making search results less precise

whocarez

Weird that you get the timeout. Do you have a lot of articles or sections? If so, can you limit the plugin to only look in particular sections to cut down the article pool? I’ve not tested the plugin with more than about 70 articles so I don’t know how it performs. I guessed it might be slow but I didn’t realise it was that slow. Also, the internal server error is worrying. Do you have anything in your server logs you can send me/post from that time that might shed any light on which part of the script is failing?

Regarding the timeout code, I can probably add that as an option for those that need it. I’m a bit snowed under still, so I’d like to get to the bottom of this problem first and see if there’s a better fix, but if it requires big plugin changes, so be it.
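As a rough illustration of what an optional timeout could look like, here is a minimal Python sketch of a search loop with a time budget. It is not the plugin's code (the plugin is PHP); `search_with_budget`, `matches`, `budget_seconds` and `clock` are hypothetical names used only for this example. The idea is to stop scanning and return partial results instead of running into the server's own limit.

```python
import time
from typing import Callable, Iterable, List, Tuple


def search_with_budget(
    articles: Iterable[str],
    matches: Callable[[str], bool],
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[str], bool]:
    """Scan articles until finished or until the time budget is spent.

    Returns (hits, complete). complete is False when the scan stopped early.
    All names here are hypothetical; this is not the smd_fuzzy_find API.
    """
    deadline = clock() + budget_seconds
    hits: List[str] = []
    for article in articles:
        if clock() >= deadline:
            return hits, False  # out of time: hand back what we have so far
        if matches(article):
            hits.append(article)
    return hits, True
```

The clock is passed in so the cut-off can be tested without real waiting; a caller could show a "results may be incomplete" note whenever the second value is False.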


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp


#14 2008-08-09 13:24:29

whocarez
Plugin Author
From: Germany/Ukraine
Registered: 2007-10-08
Posts: 305

Re: smd_fuzzy_find: making search results less precise

Bloke wrote:

Weird that you get the timeout. Do you have a lot of articles or sections? If so, can you limit the plugin to only look in particular sections to cut down the article pool?

OK, my site currently has 588 articles in three sections, and the textpattern table in the database takes up about 6.6 MB in total.
I changed the fuzzy_find call a little, and I think (or rather, I feel :-)) that it helped a bit, but not completely. It now looks like this:

<txp:smd_fuzzy_find tolerance="3" form="search_results" section="article" category="Politik, Wirtschaft, Gesellschaft" />

The problem also appears if I change the tolerance to “2”.

I’ve not tested the plugin with more than about 70 articles so I don’t know how it performs. I guessed it might be slow but I didn’t realise it was that slow. Also, the internal server error is worrying. Do you have anything in your server logs you can send me/post from that time that might shed any light on which part of the script is failing?

I doubt it will help you, because the log is too short, but I can’t get more from my provider. Here is the log snippet. Maybe there is a particular expression you are looking for?

anon-91-14-250-115.t-dialin.net - - [09/Aug/2008:14:29:57 +0200] "GET /?q=Julja HTTP/1.1" 500 540 "-" "Mozilla/5.0 (X11; U; Linux i686; ru; rv:1.8.1.16) Gecko/20080702 Iceweasel/2.0.0.16 (Debian-2.0.0.16-0etch1)"

I think the corresponding Textpattern log entry is this one, but I’m not sure because of the time difference.

09 Aug 2008 14:28:54  91.14.251.231  p5B0EFBE7.dip.t-dialin.net  ?q=Julja  GET  200

I tested some expressions, and the problematic ones are, of course, the ones that appear often. For example, the site’s main theme is Ukraine. If a user misspells Ukraine and searches for “uraine”, they get this “500 Internal Server Error” with a fuzzy tolerance of “2” or “3”. The term “ukraine” occurs in nearly every second article. The term “julja” used above has no exact counterpart in the article texts. The nearest hit, “julia”, occurs in nearly every third article, so maybe that’s why a fuzzy tolerance of “2” works but “3” does not. I have now switched to a tolerance of “1”, which works for all my examples, but the results are not always satisfying … ;)
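To make the effect of the tolerance setting concrete, here is a minimal Python sketch. It assumes the tolerance behaves like a maximum edit (Levenshtein) distance between the search term and a candidate word; the thread does not confirm how the plugin actually scores matches, and `edit_distance` and `fuzzy_candidates` are hypothetical helper names. It shows how each extra step of tolerance lets more words through, which means more articles to check.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def fuzzy_candidates(term: str, words: Iterable[str], tolerance: int) -> List[str]:
    """Return the distinct words within `tolerance` edits of `term`, sorted."""
    term = term.lower()
    return sorted({w.lower() for w in words if edit_distance(term, w.lower()) <= tolerance})
```

Under this assumption, “uraine” is one edit away from “ukraine” and “julja” is one edit away from “julia”. With the sample words Julia, July, Jury and Ukraine, a search for “julja” matches only “julia” at tolerance 1, adds “july” at tolerance 2, and adds “jury” at tolerance 3.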

Regarding the timeout code, I can probably add that as an option for those that need it. I’m a bit snowed under still, so I’d like to get to the bottom of this problem first and see if there’s a better fix, but if it requires big plugin changes, so be it.

Ok, I understand ….


#15 2008-11-30 19:28:57

els
Moderator
From: The Netherlands
Registered: 2004-06-06
Posts: 7,458

Re: smd_fuzzy_find: making search results less precise

First, how this plugin works is a complete mystery to me, so forgive me if this is a dumb question… I installed wet_haystack and it would be great if smd_fuzzy_find could use the same search index. Is this possible at all?


#16 2008-11-30 19:38:20

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,371

Re: smd_fuzzy_find: making search results less precise

Els wrote:

First, how this plugin works is a complete mystery to me

And me :-)

I installed wet_haystack and it would be great if smd_fuzzy_find could use the same search index. Is this possible at all?

I have no idea right now. I saw wet’s plugin earlier and thought “damn clever”. Leave it with me and when I do my “4.0.7 update check” for all my plugins I’ll see if this needs updating to take advantage of the full text searches and if it can read stuff from wet_haystack.

Thanks for reminding me, I’d almost forgotten about this plugin.



#17 2008-11-30 19:41:31

els
Moderator
From: The Netherlands
Registered: 2004-06-06
Posts: 7,458

Re: smd_fuzzy_find: making search results less precise

I wrote:

you are the fastest replying plugin author I know!!

I need to repeat myself ;)

Thanks!


#18 2008-12-02 22:15:44

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,371

Re: smd_fuzzy_find: making search results less precise

New version is out:

v0.2 | compressed | Requires smd_lib v0.33 and Textpattern 4.0.7

Features:

  • Support for wet_haystack; searches whichever database fields you tell it
  • Improved search accuracy so it tries not to return partial words
  • Hopefully is not restricted to ASCII text any more: it only filters out Unicode punctuation and leaves everything else intact
  • Added container ability
  • Added delim option
  • Enhanced category/section matching to allow looking at <txp:variable /> and URL vars (using smd_if, you could now offer “advanced” search facilities where visitors can choose how fuzzy they want their search!)
  • Renamed subcats to sublevel and extended it to allow you to choose how far down the tree you traverse
  • Fixed some edge case bugs
  • Improved plugin help and debug output
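
The punctuation and partial-word points in the list above can be pictured with a small Python sketch. It is only an illustration under assumptions: the plugin itself is PHP, and `strip_punctuation` and `whole_words` are hypothetical names. The sketch drops only characters whose Unicode category is punctuation, keeps every other character (including non-ASCII letters), and then compares whole words rather than substrings.

```python
import unicodedata
from typing import List


def strip_punctuation(text: str) -> str:
    """Replace every Unicode punctuation character (category P*) with a space."""
    return "".join(
        " " if unicodedata.category(ch).startswith("P") else ch
        for ch in text
    )


def whole_words(text: str) -> List[str]:
    """Lower-case the text and split it into whole words, punctuation removed."""
    return strip_punctuation(text).lower().split()
```

Because matching is done against the word list, a term such as “julia” does not match inside a longer word like “Julians”, and Cyrillic or accented words survive intact.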

Try it and see how you get on, reporting any plugin misdemeanours or enhancements here as always.



#19 2008-12-03 11:56:38

els
Moderator
From: The Netherlands
Registered: 2004-06-06
Posts: 7,458

Re: smd_fuzzy_find: making search results less precise

Thank you! I installed it right away last night in my local install, started playing with it, and this morning I overslept… ;)
It’s looking great, wonderful to have it search my keywords (tags) as well. The search accuracy definitely seems to be a lot better! I’ll put it on the live site this evening.

Thanks again :)


#20 2009-01-28 18:05:51

pieman
Member
From: Bristol, UK
Registered: 2005-09-22
Posts: 491

Re: smd_fuzzy_find: making search results less precise

just got around to trying this, and not surprisingly it’s another minor classic from Bloke :-)

thanks Stef


#21 2009-04-07 20:52:24

els
Moderator
From: The Netherlands
Registered: 2004-06-06
Posts: 7,458

Re: smd_fuzzy_find: making search results less precise

This is silly… I’ve been using this plugin for ages and only now I notice that there is no sort attribute for the smd_fuzzy_find tag. Even sillier is that I have if_different in my search_results form (and sort="section" in the article tag) to display the section titles and only now I notice that the fuzzy_find output is of course looking quite messy because those results are not sorted by section…

So, since I managed to overlook this for a year or so, I guess it’s not that important ;) But I’m still curious if it would be easy to add a sort attribute… If it isn’t, never mind, I’ll just remove the if_different.

Edit: never mind, please forget the request. I decided it’s a better solution to just show the words, not the articles.

I have another question though, Stef. I just updated from 0.2 to 0.21, but forgot that I messed with the plugin code (again… bad habit, I know) to output a question mark after every suggested word. And now I can’t remember how I did that… everything I try gives me an internal server error, which of course serves me right :)

Last edited by els (2009-04-07 21:42:57)


#22 2009-04-08 13:04:09

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,371

Re: smd_fuzzy_find: making search results less precise

Els

The search results are supposed to be (emphasis on supposed) returned in weight order, i.e. the most likely candidates at the top. I’ve not looked at the code for a while so I don’t know offhand if it’s possible to reorder the output by some arbitrary means. But if it is possible I’ll make it so in the next release.
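
For anyone wondering how weight order and section grouping could coexist, here is a minimal Python sketch. It assumes each result carries a numeric weight where higher means a closer match; the `Hit` record and both function names are hypothetical and not taken from the plugin. A stable sort by section, applied after the weight sort, groups the results while keeping the best candidates first within each section.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    title: str
    section: str
    weight: float  # hypothetical score: higher means a closer match


def by_weight(hits: Iterable[Hit]) -> List[Hit]:
    """Most likely candidates first."""
    return sorted(hits, key=lambda h: h.weight, reverse=True)


def by_section_then_weight(hits: Iterable[Hit]) -> List[Hit]:
    """Group by section; Python's sort is stable, so weight order survives inside each group."""
    return sorted(by_weight(hits), key=lambda h: h.section)
```

With output grouped like this, a form that prints the section title only when it changes would show each title once instead of repeating it.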



#23 2009-04-08 16:57:29

els
Moderator
From: The Netherlands
Registered: 2004-06-06
Posts: 7,458

Re: smd_fuzzy_find: making search results less precise

Bloke wrote:

But if it is possible I’ll make it so in the next release.

No, please don’t! I’ve left the search results out now, and actually I like it better this way. So just forget that I asked, OK? Apologies for wasting your time!


#24 2010-05-13 11:55:42

alanfluff
Member
From: Ottawa, Canada
Registered: 2008-09-15
Posts: 222

Re: smd_fuzzy_find: making search results less precise

Hi Stef, thanks (againnn) for a brilliant plugin! Can I ask a probably stupid question? It says I need to use smd_lib v0.33. Is it OK to use the latest version of that instead? I’m about to use 0.36.

Tks! Cheers, -Alan


At LAST I’ve cheerfully donated to the core devs at #TXP. I only wish I were able to give more. Thanks to the devs and ALL fellow TXPers. -A

