Textpattern CMS support forum
Bug with single quote as apostrophe in search terms?
A client of mine made me aware of a problem with search. I can replicate it in my installations, so I assume this is a general problem, or maybe a bug.
Suppose you search for a term that has an apostrophe between two words, and you type the single quote key for the apostrophe. Then the search results are buggy: they include some of the articles that contain the search terms, but not all. And the search_result_excerpt output is blank!
But if you search for the same term typing the real (typographic) apostrophe, not the single quote, you get a different list of search results! And this time search_result_excerpt works.
I haven’t been able to identify any rule behind the results output in the two cases. The search terms are always present in the text.
If you want to replicate or observe the problem, you can see it live on one site of mine, http://www.ecologiadeisitiweb.net: just search for “architettura dell’informazione” (that is, information architecture) with both the single quote and the apostrophe, and you’ll see the different behavior. MySQL is 4.1.22.
Z-
#2 2008-06-17 14:22:33
- Mary
- Sock Enthusiast
- Registered: 2004-06-27
- Posts: 6,236
Re: Bug with single quote as apostrophe in search terms?
The FULLTEXT index is on the raw text that you enter, not the Textile-parsed text, which might explain the difference. As for the excerpt tag, I don’t know, but that tag depends upon a PHP regular expression, so I think the expression just needs adjusting?
Re: Bug with single quote as apostrophe in search terms?
The excerpt tag works on the parsed text, while the actual searching (as Mary indicated) is done on the raw text. Because Textile changes the single quote into a typographic apostrophe, the excerpt can’t be displayed when you search with the single quote: the exact search term isn’t found in the parsed text.
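To make the raw-versus-parsed mismatch concrete, here is a minimal Python sketch. It is not Textpattern's code: `fake_textile`, `search_hit` and `excerpt` are hypothetical stand-ins, a plain substring test replaces the FULLTEXT query, and the sample sentence is invented for illustration.

```python
import re

CURLY = "\u2019"  # typographic apostrophe (right single quotation mark)

def fake_textile(raw):
    """Hypothetical stand-in for Textile: curl apostrophes between letters."""
    return re.sub(r"(?<=\w)'(?=\w)", CURLY, raw)

def search_hit(raw, term):
    """Assumed simplification of the search step: substring match on RAW text."""
    return term.lower() in raw.lower()

def excerpt(parsed, term, radius=20):
    """Assumed simplification of the excerpt step: regex match on PARSED text."""
    m = re.search(re.escape(term), parsed, flags=re.IGNORECASE)
    if m is None:
        return ""  # nothing to highlight, so the excerpt comes out blank
    start = max(0, m.start() - radius)
    end = min(len(parsed), m.end() + radius)
    return parsed[start:end]

# Invented sample article body, typed with the single quote key.
raw_body = "Un corso di architettura dell'informazione per il web."
parsed_body = fake_textile(raw_body)

typed_straight = "architettura dell'informazione"
typed_curly = typed_straight.replace("'", CURLY)

print(search_hit(raw_body, typed_straight))          # found in raw text
print(repr(excerpt(parsed_body, typed_straight)))    # blank excerpt
print(search_hit(raw_body, typed_curly))             # not found in raw text
print(repr(excerpt(parsed_body, typed_curly)))       # excerpt would work
```

With the single quote the article is found but the excerpt is blank; with the typographic apostrophe the excerpt would work, but this article (stored with a single quote) is not found at all.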
Re: Bug with single quote as apostrophe in search terms?
Yes, I checked it out and can confirm: the raw text, surprisingly (for me…), contains the apostrophe or the single quote in accordance with the search results. Thanks, also for the excerpt explanation.
Not sure if this can be considered a bug, but now my problem is: is there something that can be done to take care of this in a safe and easy way for Textpattern? Clients or collaborators are not expected to know the difference between a single quote and an apostrophe when inserting or copying text into articles. And it is weird that two related functions are executed on two different (even if somewhat related) data sources.
I initially thought of parsing the raw text before saving (maybe via a plugin, though I don’t know how difficult that is) to replace the apostrophe with the single quote. That would solve the majority of search situations, I suppose, because users almost always search with the single quote, not the apostrophe.
But then I realized that the excerpt would never work on search terms containing a single quote…
Conversely, converting single quotes into apostrophes in the raw text before saving would miss any search done with the single quote… I’ve got a headache.
Any ideas?… Maybe some parsing done directly on the search term entered by the user, before doing the search, would solve it? Is there a way to catch both the single quote and the apostrophe and exclude them from the search in the raw text and from the excerpt generation, since they are irrelevant?…
…mumbling…
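The idea of treating the single quote and the apostrophe as the same character on both sides of the comparison could be sketched like this in Python. This is only an illustration under assumptions: `normalize_quotes` and `matches` are hypothetical helpers, and the set of variants handled (including the `&#8217;` entity form) is my guess at what parsed text might contain, not something taken from Textpattern.

```python
# Map typographic apostrophe variants to the plain single quote.
_QUOTE_TABLE = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (typographic apostrophe)
    "\u02bc": "'",  # modifier letter apostrophe
})

def normalize_quotes(text):
    """Fold apostrophe variants (characters and numeric entities) to "'"."""
    # Assumption: parsed HTML may carry the apostrophe as a numeric entity.
    text = text.replace("&#8217;", "'").replace("&#8216;", "'")
    return text.translate(_QUOTE_TABLE)

def matches(text, term):
    """Case-insensitive substring match after normalizing both sides."""
    return normalize_quotes(term).casefold() in normalize_quotes(text).casefold()

print(matches("architettura dell\u2019informazione", "dell'informazione"))
print(matches("architettura dell'informazione", "dell\u2019informazione"))
```

Applying the same normalization to the search term, the text being searched and the text used for the excerpt would make the two spellings interchangeable, at the cost of normalizing on every comparison or storing a normalized copy.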
Re: Bug with single quote as apostrophe in search terms?
I’m with you Zanza. Over at neme.org we receive the texts in .doc format which contain smart quotes, all kinds of dashes, apostrophes, etc.
For now we run a series of search and replace routines in BBEdit but that is only because we are careful about it. Most users would not do it.
As Txp is a multilingual CMS, the problem of course affects every language.
Somehow I think that although a solution should eventually be found I can also appreciate the complexities.
Yiannis
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.
Re: Bug with single quote as apostrophe in search terms?
The solution will probably be to totally ignore non-alphanumeric chars while searching.
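As a rough illustration of what ignoring non-alphanumeric characters could mean, here is a Python sketch that reduces both the text and the search phrase to sequences of alphanumeric tokens and compares those. The helper names are hypothetical, and this says nothing about how it would be done efficiently inside a database query.

```python
import re

def tokens(text):
    """Split text into lowercase runs of letters/digits, dropping all else."""
    # [^\W_]+ means "word characters except underscore", i.e. alphanumerics.
    return [t.casefold() for t in re.findall(r"[^\W_]+", text)]

def phrase_in_text(text, phrase):
    """True if the phrase's tokens appear consecutively in the text's tokens."""
    hay = tokens(text)
    needle = tokens(phrase)
    if not needle:
        return False  # a phrase made only of punctuation matches nothing
    n = len(needle)
    return any(hay[i:i + n] == needle for i in range(len(hay) - n + 1))

body = "Un corso di architettura dell\u2019informazione per il web."
print(phrase_in_text(body, "architettura dell'informazione"))
print(phrase_in_text(body, "architettura dell\u2019informazione"))
```

Both spellings of the phrase reduce to the same tokens (`architettura`, `dell`, `informazione`), so the kind of quote typed no longer matters.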
Re: Bug with single quote as apostrophe in search terms?
ruud wrote:
The solution will probably be to totally ignore non-alphanumeric chars while searching.
Yes, after a night’s sleep I now think the same. :) Is it something that can (quite) easily be accomplished in the core, or would it need a plugin?
Re: Bug with single quote as apostrophe in search terms?
A plugin would be very inefficient.
Changing this in core is not trivial as it probably requires completely changing the way searching is done.
Re: Bug with single quote as apostrophe in search terms?
Mmm… I see.
So is there any chance this could be done (or tried) in the core? As Colak noted, this is a problem that affects many non-English sites (even English sites, actually, it is just less frequent).
Also, changing the way searching is done could be a good chance to offer some more options by default, such as AND or literal searching. That is maybe not the most efficient, but I don’t want to go deeper into this, it’s another topic. I just feel that a more robust search would simply benefit TXP, and that the non-alphanumeric character problem should be taken into account as a bottom line. Should we open a “future ideas” thread?
Re: Bug with single quote as apostrophe in search terms?
No need for a new topic. It’s a known issue that we’ll have to address at some point.
Re: Bug with single quote as apostrophe in search terms?
Good. Thanks for considering it.