Textpattern CMS support forum

#1 2021-07-08 12:15:38

Kjeld
Member
From: Tokyo, Japan
Registered: 2005-02-05
Posts: 446
Website

How to get exact search results?

When I search for “Ashiya” (the name of a town in Japan) on my Textpattern site, I also get articles with the term “Higashiyamate” (an area in Kyoto) because: “Hig ashiya mate”.

I can’t seem to find anything in the tags to remedy this.

I only want articles with the exact term, e.g.:

  1. Articles with “Ashiya”, not articles with words that contain “ashiya”.
  2. When searching for “loc”, only articles with “loc”, not articles with “location”, “clock” or “woodblock”…

How can I fix this?

Last edited by Kjeld (2021-07-08 12:16:45)


Old Photos of Japan – Japan in the 1850s~1960s (100% txp)
MeijiShowa – Stock photos of Japan in the 1850s~1960s (100% txp)
JapaneseStreets.com – Japanese street fashion (mostly txp)

#2 2021-07-08 12:45:55

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,250
Website GitHub

Re: How to get exact search results?

At the moment I’m not sure you can fix it. Using match="exact" or surrounding search terms with double quotes only treats phrasal content (with spaces) as a grouped term. It still does a LIKE match under the hood.

Further, we seem to automatically trim the search term; otherwise you could search for “ashiya ” or “ ashiya” (with a trailing or leading space) and match the word without also matching the compound word. But the former wouldn’t help you if the article contained the sentence “in Ashiya!” or had some other punctuation directly after the word.
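To make the two points above concrete, here is a minimal Python sketch. It is not Textpattern code: `like_contains` is a hypothetical stand-in for a case-insensitive `LIKE '%term%'` comparison (ignoring SQL wildcards), and the sample sentences are invented.

```python
def like_contains(text: str, term: str) -> bool:
    """Rough stand-in for SQL `text LIKE '%term%'` under a
    case-insensitive collation (wildcards in `term` are ignored)."""
    return term.lower() in text.lower()


# Invented sample content, for illustration only.
articles = [
    "A day trip to Ashiya and Kobe.",
    "Walking through Higashiyamate at dusk.",
    "We stayed in Ashiya!",
]

# Plain substring match: the compound word is found too.
plain = [a for a in articles if like_contains(a, "ashiya")]

# Space-padded term (what an untrimmed search could do): the compound
# word is dropped, but so is the sentence ending in "Ashiya!".
padded = [a for a in articles if like_contains(a, " ashiya ")]

print(len(plain))  # all three sentences match
print(padded)      # only the first sentence matches
```

The padded search shows why leaving the spaces in place is only a partial fix: any punctuation next to the word defeats it.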

One solution is that we could introduce a regex-style match feature. Using RLIKE or a user-defined matching pattern would enable us (you) to specify how the match was performed. Not entirely sure how to do that so I’ll take guidance if anyone has any clues.
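As a rough illustration of what a regex-style whole-word match would have to do, here is a sketch using Python's `re` module rather than MySQL's `RLIKE`. The function names are hypothetical. Word-boundary syntax differs between database versions (older MySQL uses `[[:<:]]` and `[[:>:]]`, MySQL 8 uses `\b`), so treat this as the idea, not as the query Textpattern would run.

```python
import re


def whole_word_pattern(term: str) -> re.Pattern:
    # re.escape neutralises regex metacharacters in the user's input,
    # so the term is always matched literally.
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


def matches_whole_word(text: str, term: str) -> bool:
    return whole_word_pattern(term).search(text) is not None


print(matches_whole_word("We stayed in Ashiya!", "ashiya"))
print(matches_whole_word("Walking through Higashiyamate", "ashiya"))
print(matches_whole_word("location, clock, woodblock", "loc"))
```

One caveat: `\b` only behaves as expected when the term starts and ends with a word character, so a term such as `c++` would need different handling.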

The other, slightly low-key and inelegant, approach might be to somehow not trim the spaces off the search term. Maybe introduce some escape parameter or add a dedicated attribute so you could elect to bypass this automatic behaviour.

Better, we could change the way search works so it automatically leaves spaces intact and will only strip them off if you use trim or escape="trim" (or tidy) etc in the <txp:search_input> tag.

Finally, we could maybe find a way to trigger a binary (case-sensitive) comparison. There is a way to do that on some types of field but it’s hard-coded based on the field properties. Maybe there’s some mechanism we can put under user control that alters the search collation so you could pick a binary (or foreign language) collation when performing matches. This might be useful when we introduce multi-lingual content so it’s something we may need to think about anyway.

Failing that… any thoughts, Oleg? Not sure if etc_search has any tricks up its sleeve.

Last edited by Bloke (2021-07-08 12:49:04)


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

#3 2021-07-08 12:54:55

Kjeld
Member
From: Tokyo, Japan
Registered: 2005-02-05
Posts: 446
Website

Re: How to get exact search results?

Thanks a lot, Bloke!

I keep on asking the tough questions, don’t I…

I seem to vaguely remember that a very long time ago I changed something in phpMyAdmin to fix this, but can’t recall what and how. And my memory might be misleading me.



#4 2021-07-08 13:08:32

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,250
Website GitHub

Re: How to get exact search results?

I expect it was to switch to a binary collation, but that will probably only help you match case-sensitive content. That in itself might be useful because a search for Ashiya would return only that term, and a search for ashiya would only pick it in the compound form of Higashiyamate.

But that won’t help you with the ‘loc’ thing because that appears in the lower-case form in many words.
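A small Python sketch of the difference, assuming a binary collation simply makes the substring comparison case-sensitive. `binary_contains` is a hypothetical stand-in, not a Textpattern or MySQL function.

```python
def binary_contains(text: str, term: str) -> bool:
    """Stand-in for `LIKE '%term%'` under a binary (case-sensitive)
    collation: a plain, case-sensitive substring test."""
    return term in text


# Capitalised search term: the compound word no longer matches.
print(binary_contains("A trip to Ashiya", "Ashiya"))
print(binary_contains("Higashiyamate", "Ashiya"))

# Lower-case search term: only the compound form is found.
print(binary_contains("Higashiyamate", "ashiya"))

# 'loc' is lower case inside other words, so case sensitivity does not help.
print(binary_contains("location", "loc"))
```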



#5 2021-07-08 14:04:25

Kjeld
Member
From: Tokyo, Japan
Registered: 2005-02-05
Posts: 446
Website

Re: How to get exact search results?

Bloke wrote #330938:

I expect it was to switch to a binary collation, but that will probably only help you match case-sensitive content. That in itself might be useful because a search for Ashiya would return only that term, and a search for ashiya would only pick it in the compound form of Higashiyamate.

But that won’t help you with the ‘loc’ thing because that appears in the lower-case form in many words.

Very clear. Thank you.



#6 2021-07-09 14:29:33

etc
Developer
Registered: 2010-11-11
Posts: 5,028
Website GitHub

Re: How to get exact search results?

Bloke wrote #330936:

One solution is that we could introduce a regex-style match feature. Using RLIKE or a user-defined matching pattern would enable us (you) to specify how the match was performed. Not entirely sure how to do that so I’ll take guidance if anyone has any clues.

Not pretending to have any special knowledge, but, from what I’ve read, RLIKE is

  • much slower than LIKE
  • not multi-byte safe
  • more subject to injections

A poor man’s solution (if no pagination is involved) would be to post-filter the search results via <txp:evaluate />. A better, but more demanding, way would be to search by custom field (cf) or keywords.
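To show why post-filtering and pagination do not mix well, here is a sketch in Python, not in Textpattern tags. The result records, `post_filter` and `paginate` are all hypothetical. Filtering each page after it has been cut leaves pages with uneven counts, while filtering first needs the whole result set up front.

```python
import re


def post_filter(results, term):
    """Keep only results where `term` appears as a whole word."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [r for r in results
            if pattern.search(r["title"]) or pattern.search(r["body"])]


def paginate(items, page, per_page):
    start = (page - 1) * per_page
    return items[start:start + per_page]


# Invented search results, as a LIKE-based search might return them.
results = [
    {"title": "Higashiyamate at dusk", "body": "An evening walk."},
    {"title": "Ashiya station", "body": "On the JR line."},
    {"title": "Kyoto walk", "body": "Through Higashiyamate."},
    {"title": "Hanshin line", "body": "Stops in Ashiya."},
]

# Filtering each page after pagination: every page comes up short.
page1_then_filter = post_filter(paginate(results, 1, 2), "ashiya")
page2_then_filter = post_filter(paginate(results, 2, 2), "ashiya")

# Filtering the full set first gives a properly filled page.
filter_then_page1 = paginate(post_filter(results, "ashiya"), 1, 2)

print(len(page1_then_filter), len(page2_then_filter))
print([r["title"] for r in filter_then_page1])
```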

Not sure if etc_search has any tricks up its sleeve.

It must be possible to use RLIKE with etc_search, but… see above.

#7 2021-07-09 14:57:33

etc
Developer
Registered: 2010-11-11
Posts: 5,028
Website GitHub

Re: How to get exact search results?

That said, we do have a FULLTEXT index on Title and Body, so a NATURAL LANGUAGE MODE search could help. Testing needed.

#8 2021-07-09 15:03:27

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,250
Website GitHub

Re: How to get exact search results?

Yeah, if we can avoid RLIKE I’d be happier. The performance and security aren’t ideal.



#9 2021-07-09 16:08:36

Kjeld
Member
From: Tokyo, Japan
Registered: 2005-02-05
Posts: 446
Website

Re: How to get exact search results?

Bloke wrote #330951:

Yeah, if we can avoid RLIKE I’d be happier. The performance and security aren’t ideal.

If I must choose, I’d rather have unrelated results than lower security and a slower website…



#10 2021-07-10 09:53:41

etc
Developer
Registered: 2010-11-11
Posts: 5,028
Website GitHub

Re: How to get exact search results?

In my tests, NATURAL LANGUAGE MODE search works rather well, except for very common (English) words like ‘and’, ‘name’ and so on that it fails to match. Should we add some m="natural" search mode to core, or is it too marginal and etc_search will suffice?
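For anyone who wants to repeat the test, this is roughly the shape of query involved, written as a small Python helper that only builds the SQL and its parameters. The table name `textpattern`, the `score` alias and the helper itself are assumptions for illustration. This is not what core or etc_search actually runs.

```python
def natural_language_query(term: str):
    """Build a parameterised MySQL full-text query (hypothetical helper).

    Assumes an articles table named `textpattern` with a FULLTEXT index
    covering exactly (Title, Body). The term is passed as a bound
    parameter, never interpolated into the SQL string.
    """
    sql = (
        "SELECT ID, Title, "
        "MATCH (Title, Body) AGAINST (%s IN NATURAL LANGUAGE MODE) AS score "
        "FROM textpattern "
        "WHERE MATCH (Title, Body) AGAINST (%s IN NATURAL LANGUAGE MODE) "
        "ORDER BY score DESC"
    )
    return sql, (term, term)


sql, params = natural_language_query("ashiya")
print(sql)
print(params)
```

Full-text search matches indexed words rather than substrings, which is why it avoids the compound-word problem. The misses on very common words are probably down to the server's stopword list, and short terms such as ‘loc’ may fall under the minimum indexed word length, depending on configuration.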

#11 2021-07-10 10:05:43

Kjeld
Member
From: Tokyo, Japan
Registered: 2005-02-05
Posts: 446
Website

Re: How to get exact search results?

etc wrote #330956:

In my tests, NATURAL LANGUAGE MODE search works rather well, except for very common (English) words like ‘and’, ‘name’ and so on that it fails to match. Should we add some m="natural" search mode to core, or is it too marginal and etc_search will suffice?

If it can be safely added to the core, I think it would be a plus.



#12 2021-07-10 13:47:31

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,250
Website GitHub

Re: How to get exact search results?

I’m all for it if it works. The option is always nice.

