
Textpattern CMS support forum


#1 2017-01-31 01:21:55

GugUser
Member
From: Quito (Ecuador)
Registered: 2007-12-16
Posts: 1,473

Problems with umlauts in the search term

When I search for “Händel” on a German-language website I’m working on, the search results show the titles of many articles without the search excerpt: there is no excerpt and no highlighted text. These articles don’t contain the word “Händel”. When I analysed the database, the only similar string I could find in one of these articles was the word “behandelt”.

What can I do for better search results?

Offline

#2 2017-01-31 05:59:59

phiw13
Plugin Author
From: Japan
Registered: 2004-02-27
Posts: 3,058
Website

Re: Problems with umlauts in the search term

I added the word “Händel” in the body of an article, I can easily search for it both from the front end and the back-end.

How is the ä displayed in the source code of the page? As ä or as &auml;?
Or maybe it has something to do with the encoding of the database?


Where is that emoji for a solar powered submarine when you need it ?
Sand space – admin theme for Textpattern

Offline

#3 2017-01-31 21:34:13

ruud
Developer Emeritus
From: a galaxy far far away
Registered: 2006-06-04
Posts: 5,068
Website

Re: Problems with umlauts in the search term

@guguser, which DB charset do you use (see diagnostics!): utf-8 or latin1?

Offline

#4 2017-02-01 00:22:04

GugUser
Member
From: Quito (Ecuador)
Registered: 2007-12-16
Posts: 1,473

Re: Problems with umlauts in the search term

Thanks phiw13 and ruud for your answers.

Perhaps my problem description was incomplete.

On this site, the only work I did was the upgrade from Textpattern 4.0.8 to 4.6.2 and the implementation of the search function.

The details of the DB are:

Charset (default/config): latin1/utf8
character_set_client: utf8
character_set_connection: utf8
character_set_database: latin1
character_set_filesystem: binary
character_set_results: utf8
character_set_server: latin1
character_set_system: utf8
character_sets_dir: /usr/share/mysql/charsets/
17 Tables: OK

The search on the website finds articles with Händel, but it also finds articles without Händel that contain any word with “andel”. The search does not distinguish between ä and a.

The same happens when I run a search in phpMyAdmin.

The umlauts in the articles are ä etc., not entities.

Last edited by GugUser (2017-02-01 00:23:02)

Offline

#5 2017-02-01 07:02:52

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,007
Website GitHub Mastodon Twitter

Re: Problems with umlauts in the search term

Did you consider converting the tables to UTF-8? rvm_latin1_to_utf8


Yiannis
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.

Offline

#6 2017-02-01 09:34:37

etc
Developer
Registered: 2010-11-11
Posts: 5,028
Website GitHub

Re: Problems with umlauts in the search term

This is the normal behavior of MySQL search for columns with a _ci (case-insensitive) collation. You can try to convert the Body and Title fields to utf8_bin, but then the search will be both case- and accent-sensitive, which is very impractical for German. It seems tricky to obtain an accent-sensitive, case-insensitive search.

Edit: or try converting both Title and Body fields to latin1_german1_ci or latin1_general_ci?
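
To illustrate the trade-off described above, here is a minimal Python sketch that emulates the two kinds of comparison. It is not MySQL's collation algorithm; `fold`, `ci_contains` and `bin_contains` are hypothetical helper names, and stripping combining marks is only an approximation of an accent-insensitive _ci collation.

```python
import unicodedata


def fold(text):
    """Strip combining accents and fold case (rough stand-in for a _ci collation)."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def ci_contains(haystack, needle):
    """Substring test that ignores both case and accents."""
    return fold(needle) in fold(haystack)


def bin_contains(haystack, needle):
    """Substring test on the exact characters (rough stand-in for a _bin collation)."""
    return needle in haystack


# "behandelt" contains "handel", so the folded comparison reports a match.
print(ci_contains("Das Thema wird behandelt", "H\u00e4ndel"))
# The exact comparison does not match "behandelt" ...
print(bin_contains("Das Thema wird behandelt", "H\u00e4ndel"))
# ... but it also misses a lowercase query against the capitalised name.
print(bin_contains("Georg Friedrich H\u00e4ndel", "h\u00e4ndel"))
```

The third call shows why a binary collation is awkward for a site search: visitors who type the name in lowercase would get no results.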

Offline

#7 2017-02-01 14:59:05

GugUser
Member
From: Quito (Ecuador)
Registered: 2007-12-16
Posts: 1,473

Re: Problems with umlauts in the search term

Many thanks to everyone for answering my question.

At the moment I don’t dare to change anything directly in the database. I don’t understand the logic behind the described behavior. Whether in UTF-8 or in Latin-1 (ISO-8859-1), A, Á, Ä, a, á and ä etc. are distinct characters: in Unicode, a is U+0061, ä is U+00E4, etc.
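
The point about distinct characters can be checked with a few lines of Python. This is only a sketch to show the stored values; `describe` is a hypothetical helper name. It prints the code point and the bytes each character gets in Latin-1 and in UTF-8.

```python
def describe(ch):
    """Return (code point, Latin-1 bytes as hex, UTF-8 bytes as hex) for one character."""
    return (
        f"U+{ord(ch):04X}",
        ch.encode("latin-1").hex(),
        ch.encode("utf-8").hex(),
    )


for ch in ("a", "\u00e4"):  # "\u00e4" is the precomposed ä
    print(ch, *describe(ch))
# a U+0061 61 61
# ä U+00E4 e4 c3a4
```

So the two letters are stored differently in either encoding. Whether a search treats them as equal is decided by the comparison rule (the collation), not by the stored bytes.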

Nevertheless I tried in another installation and searched for “Männerschmuck”. It found all with “Männerschmuck” and highlighted it. Then I searched for “Mann”. It found articles with “Mann”, showed the excerpts and highlighted “Mann”. But in the displayed search results there were also articles containing “Männerschmuck”, “fachmännisch” etc., but without displaying the excerpts and thus without highlighting.

Strange, I don’t understand why.

Offline

#8 2017-02-01 16:52:53

etc
Developer
Registered: 2010-11-11
Posts: 5,028
Website GitHub

Re: Problems with umlauts in the search term

GugUser wrote #303782:

Nevertheless I tried in another installation and searched for “Männerschmuck”. It found all with “Männerschmuck” and highlighted it. Then I searched for “Mann”. It found articles with “Mann”, showed the excerpts and highlighted “Mann”. But in the displayed search results there were also articles containing “Männerschmuck”, “fachmännisch” etc., but without displaying the excerpts and thus without highlighting.

Strange, I don’t understand why.

The search is performed by MySQL and the highlighting by PHP; this can explain the discrepancy. It’s strange, though, that with the latin1_general_ci (?) collation, a search for “Mann” also matches “Männ” (not in my tests). Are you sure the bodies do not contain “mann” too?
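
A small Python model of that two-stage process shows how a result can be listed without an excerpt. This is a sketch under assumptions, not Textpattern's or MySQL's actual code: `db_match` stands in for an accent-insensitive database search, `highlight` for a highlighter that only ignores case, and both names are hypothetical.

```python
import re
import unicodedata


def db_match(body, term):
    """Stage 1 (assumed): accent- and case-insensitive substring search."""
    def fold(text):
        decomposed = unicodedata.normalize("NFD", text)
        return "".join(
            ch for ch in decomposed if not unicodedata.combining(ch)
        ).casefold()
    return fold(term) in fold(body)


def highlight(body, term):
    """Stage 2 (assumed): case-insensitive but accent-sensitive highlighting.

    Returns the body with matches wrapped in <strong>, or None if the
    literal term does not occur, in which case no excerpt can be shown.
    """
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    if not pattern.search(body):
        return None
    return pattern.sub(lambda m: f"<strong>{m.group(0)}</strong>", body)


articles = ["Ein Mann tr\u00e4gt Schmuck", "M\u00e4nnerschmuck aus Silber"]
for body in articles:
    if db_match(body, "Mann"):
        # Both articles are listed; only the first yields an excerpt.
        print(body, "->", highlight(body, "Mann"))
```

Under these assumptions both articles appear in the result list for “Mann”, but the second one has nothing for the highlighter to mark, which matches the symptom of titles shown without excerpts.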

Offline

#9 2017-02-02 01:42:38

GugUser
Member
From: Quito (Ecuador)
Registered: 2007-12-16
Posts: 1,473

Re: Problems with umlauts in the search term

etc wrote #303784:

It’s strange, though, that with the latin1_general_ci (?) collation, a search for “Mann” also matches “Männ” (not in my tests).

What is the risk if I change to latin1_general_ci? The site has 1036 articles. Should I change the whole “textpattern” table?

Are you sure the bodies do not contain “mann” too?

Yes, you can try it yourself, sollberg-schmuck.ch (this is the website of my second search experiment).

Offline

#10 2017-02-02 09:28:59

etc
Developer
Registered: 2010-11-11
Posts: 5,028
Website GitHub

Re: Problems with umlauts in the search term

The risk is losing any non-Latin characters, but latin1_general_ci does not seem to work anyway. I have tested latin1_general_ci on an outdated French site, and the search distinguishes between “associe” and “associé”, but umlauts may need special treatment. Have you tried latin1_german1_ci? Converting just Title and Body should be OK, but this is only speculation on my part, without any warranty.

Offline

#11 2017-02-02 19:27:21

ruud
Developer Emeritus
From: a galaxy far far away
Registered: 2006-06-04
Posts: 5,068
Website

Re: Problems with umlauts in the search term

colak wrote #303776:

Did you consider converting the tables to UTF-8? rvm_latin1_to_utf8

Try this, but back up first.

Offline

#12 2017-02-03 01:59:02

GugUser
Member
From: Quito (Ecuador)
Registered: 2007-12-16
Posts: 1,473

Re: Problems with umlauts in the search term

Thank you both, etc and ruud.

I’ll try it in a few days; at the moment I am busy with other things.

Offline
