Textpattern CMS support forum
How to get exact search results?
When I search for “Ashiya” (the name of a town in Japan) on my Textpattern site, I also get articles with the term “Higashiyamate” (an area in Kyoto) because: “Hig ashiya mate”.
I can’t seem to find anything in the tags to remedy this.
I only want articles with the exact term, e.g.:
- Articles with “Ashiya”, not articles with words that contain “ashiya”.
- When searching for “loc”, only articles with “loc”, not articles with “location”, “clock” or “woodblock”…
How can I fix this?
Last edited by Kjeld (2021-07-08 12:16:45)
• Old Photos of Japan – Japan in the 1850s~1960s (100% txp)
• MeijiShowa – Stock photos of Japan in the 1850s~1960s (100% txp)
• JapaneseStreets.com – Japanese street fashion (mostly txp)
Re: How to get exact search results?
At the moment I’m not sure you can fix it. Using match="exact" or surrounding search terms with double quotes only treats phrasal content (with spaces) as a grouped term. It still does a LIKE match under the hood.
Further, we seem to automatically trim the search term, otherwise you could search for “ashiya ” or “ ashiya” and find the term without the compound word being found. But the former wouldn’t help you if the article contained the sentence “in Ashiya!” or with some other punctuation after the word.
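To make the two points above concrete, here is a minimal sketch. It is not Textpattern's actual query: an in-memory SQLite database stands in for MySQL, and the articles table, its columns and the sample rows are all made up for illustration. It shows a LIKE substring match picking up the compound word, and a space-padded term missing “Ashiya!”.

```python
import sqlite3

# In-memory SQLite stands in for the site's MySQL database (assumption).
conn = sqlite3.connect(":memory:")
# Hypothetical schema and sample rows, for illustration only.
conn.execute("CREATE TABLE articles (title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [
        ("Seaside", "A view of Ashiya from the hills."),
        ("Kyoto", "A street in Higashiyamate at dusk."),
        ("Beach", "We spent the summer in Ashiya!"),
    ],
)


def like_search(term):
    """Return titles whose body contains `term` as a substring (LIKE %term%)."""
    rows = conn.execute(
        "SELECT title FROM articles WHERE body LIKE ? ORDER BY title",
        ("%" + term + "%",),
    )
    return [title for (title,) in rows]


# Trimmed term: plain substring match, so the compound word is found too.
trimmed = like_search("ashiya")
# Trailing space kept: the compound word is skipped, but so is "Ashiya!".
padded = like_search("ashiya ")
print(trimmed)
print(padded)
```

With the trimmed term all three sample rows come back; with the padded term only the row where a space follows the word does.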
One solution is that we could introduce a regex-style match feature. Using RLIKE or a user-defined matching pattern would enable us (you) to specify how the match was performed. Not entirely sure how to do that so I’ll take guidance if anyone has any clues.
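As a rough idea of what a regex-style match could look like, here is a sketch using Python's re module as a stand-in for RLIKE. MySQL's own regex syntax differs, and the function names below are hypothetical, not anything in Textpattern.

```python
import re


def whole_word_pattern(term):
    """Compile a case-insensitive pattern matching `term` only as a whole word."""
    # re.escape neutralises regex metacharacters in the user's input.
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


def matches_whole_word(term, text):
    """True if `term` appears in `text` as a standalone word."""
    return whole_word_pattern(term).search(text) is not None


print(matches_whole_word("Ashiya", "We spent the summer in Ashiya!"))
print(matches_whole_word("Ashiya", "A street in Higashiyamate."))
print(matches_whole_word("loc", "The clock on the woodblock location."))
```

The word boundary handles trailing punctuation such as “Ashiya!”, which the space-padding trick cannot.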
The other, slightly low-key and inelegant, approach might be to somehow not trim the spaces off the search term. Maybe introduce some escape parameter or add a dedicated attribute so you could elect to bypass this automatic behaviour.
Better, we could change the way search works so it automatically leaves spaces intact and will only strip them off if you use trim or escape="trim" (or tidy) etc in the <txp:search_input> tag.
Finally, we could maybe find a way to trigger a binary search. There is a way to do that on some types of field but it’s hard-coded based on the field properties. Maybe there’s some mechanism we can put under user control that alters the search collation so you could pick a binary (or foreign language) collation when performing matches. This might be useful when we introduce multi-lingual content so it’s something we may need to think about anyway.
Failing that… any thoughts, Oleg? Not sure if etc_search has any tricks up its sleeve.
Last edited by Bloke (2021-07-08 12:49:04)
The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.
Txp Builders – finely-crafted code, design and Txp
Re: How to get exact search results?
Thanks a lot, Bloke!
I keep on asking the tough questions, don’t I…
I seem to vaguely remember that a very long time ago I changed something in phpMyAdmin to fix this, but can’t recall what and how. And my memory might be misleading me.
Re: How to get exact search results?
I expect it was to switch to a binary collation, but that will probably only help you match case-sensitive content. That in itself might be useful because a search for Ashiya would return only that term, and a search for ashiya would only pick it in the compound form of Higashiyamate.
But that won’t help you with the ‘loc’ thing because that appears in the lower-case form in many words.
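A small sketch of the difference, using plain Python string tests as a stand-in for the two kinds of collation. The function names are hypothetical and this only approximates what MySQL does:

```python
def contains_ci(term, text):
    """Case-insensitive substring test, like LIKE under a *_ci collation."""
    return term.lower() in text.lower()


def contains_binary(term, text):
    """Case-sensitive substring test, roughly what a binary collation gives."""
    return term in text


# Case-insensitive: the compound word matches either spelling of the term.
print(contains_ci("Ashiya", "Higashiyamate"))
# Binary: the capitalised term no longer matches the compound word...
print(contains_binary("Ashiya", "Higashiyamate"))
# ...but a lower-case term such as "loc" still matches inside longer words.
print(contains_binary("loc", "location"))
```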
Re: How to get exact search results?
Very clear. Thank you.
Re: How to get exact search results?
Bloke wrote #330936:
One solution is that we could introduce a regex-style match feature. Using RLIKE or a user-defined matching pattern would enable us (you) to specify how the match was performed. Not entirely sure how to do that so I’ll take guidance if anyone has any clues.
Not pretending to have any special knowledge, but, from what I’ve read, RLIKE is
- much slower than LIKE
- not multi-byte safe
- more subject to injections
A poor man’s solution (if no pagination is implied) would be to post-filter the search results via <txp:evaluate />. A better, but more demanding, way would be searching by custom fields (cf) or keywords.
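To illustrate why post-filtering only works when no pagination is involved, here is a sketch in Python rather than in Textpattern tags. The result list, the page size and the helper names are all hypothetical. It splits results into pages first, then filters each page for the whole word, which leaves the pages uneven.

```python
import re


def has_word(term, text):
    """True if `term` is one of the words in `text` (case-insensitive)."""
    return term.lower() in re.findall(r"\w+", text.lower())


def paginate(items, per_page):
    """Split `items` into consecutive pages of at most `per_page` entries."""
    return [items[i:i + per_page] for i in range(0, len(items), per_page)]


# Hypothetical results of a substring search for "loc", in result order.
results = [
    "Set loc here.",
    "The location.",
    "An old clock.",
    "A woodblock print.",
    "Use loc again.",
    "loc, then.",
]

pages = paginate(results, 2)
# Filtering after pagination: some pages shrink, some end up empty.
filtered_pages = [[t for t in page if has_word("loc", t)] for page in pages]
page_sizes = [len(page) for page in filtered_pages]
print(page_sizes)
```

The page count and sizes were computed before the filter ran, so they no longer match what the visitor sees.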
Not sure if etc_search has any tricks up its sleeve.
It must be possible to use RLIKE with etc_search, but… see above.
Re: How to get exact search results?
Though we have a FULLTEXT index on Title, Body, so NATURAL LANGUAGE MODE could help. Testing needed.
Re: How to get exact search results?
Yeah, if we can avoid RLIKE I’d be happier. The performance and security aren’t ideal.
Re: How to get exact search results?
If I must choose, I’d rather have unrelated results than lower security and a slower website…
Re: How to get exact search results?
In my tests, NATURAL LANGUAGE MODE search works rather well, except for very common (English) words like ‘and’, ‘name’ and so on that it fails to match. Should we add some m="natural" search mode to core, or is it too marginal and etc_search will suffice?
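For anyone wondering why a word-based index behaves this way, here is a toy model in Python. It is not MySQL's actual full-text algorithm: the stopword set holds only the two words mentioned above, and the function names are hypothetical. The idea is that text is split into whole words, common words are dropped, and a term matches only if it equals an indexed word.

```python
import re

# Illustrative stopword list (assumption): just the two words named above.
STOPWORDS = {"and", "name"}


def index_words(text):
    """Return the set of lower-cased words that would be indexed."""
    words = re.findall(r"\w+", text.lower())
    return {word for word in words if word not in STOPWORDS}


def natural_match(term, text):
    """True if `term` equals one of the indexed words of `text`."""
    return term.lower() in index_words(text)


print(natural_match("Ashiya", "We spent the summer in Ashiya!"))
print(natural_match("ashiya", "A street in Higashiyamate."))
print(natural_match("and", "Ashiya and Kobe"))
```

Whole-word matching gives the exact results asked for in the first post, at the cost of never finding the words that were left out of the index.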
Re: How to get exact search results?
If it can be safely added to the core, I think it would be a plus.
Re: How to get exact search results?
I’m all for it if it works. The option is always nice.