Textpattern CMS support forum
#1 2017-01-31 01:21:55
- GugUser
- Member
- From: Quito (Ecuador)
- Registered: 2007-12-16
- Posts: 1,473
Problems with umlauts in the search term
When I search for “Händel” on a German-language website I’m working on, the search results show the titles of many articles without a search excerpt: no text at all, and nothing highlighted. These articles don’t contain the word “Händel”. The only similar string I could find in one of the articles when I analysed the database is the word “behandelt”.
What can I do for better search results?
Re: Problems with umlauts in the search term
I added the word “Händel” to the body of an article, and I can easily search for it from both the front end and the back end.
How is the ä displayed in the source code of the page? As a literal ä or as an HTML entity?
Or maybe it has something to do with the encoding of the database?
Re: Problems with umlauts in the search term
@guguser, which DB charset do you use (see diagnostics!): utf-8 or latin1?
#4 2017-02-01 00:22:04
- GugUser
Re: Problems with umlauts in the search term
Thanks phiw13 and ruud for your answers.
Perhaps my problem description was incomplete.
This is the site where the only work I did was the upgrade from Textpattern 4.0.8 to 4.6.2 and the implementation of the search function.
The details of the DB are:
Charset (default/config): latin1/utf8
character_set_client: utf8
character_set_connection: utf8
character_set_database: latin1
character_set_filesystem: binary
character_set_results: utf8
character_set_server: latin1
character_set_system: utf8
character_sets_dir: /usr/share/mysql/charsets/
17 Tables: OK
The search on the website finds articles with “Händel”, but it also finds articles without “Händel” that contain any word with “andel” in it. The search makes no distinction between ä and a.
The same happens when I run a search in phpMyAdmin.
The umlauts in the articles are stored as literal characters (ä etc.), not as entities.
Last edited by GugUser (2017-02-01 00:23:02)
Re: Problems with umlauts in the search term
Did you consider converting the tables to UTF-8? See rvm_latin1_to_utf8.
Yiannis
Re: Problems with umlauts in the search term
This is the normal behavior of MySQL search for columns with a _ci (case-insensitive) collation. You can try converting the Body and Title fields to utf8_bin, but then the search will be both case- and accent-sensitive, which is very impractical for German. It seems tricky to obtain an accent-sensitive, case-insensitive search.
Edit: or try converting both the Title and Body fields to latin1_german1_ci or latin1_general_ci?
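To illustrate the difference described above, here is a rough Python emulation. It is not MySQL's real collation code: the helper names fold, ci_contains and bin_contains are made up for this sketch, and stripping accents via Unicode decomposition is only an approximation of what an accent-insensitive collation does.

```python
import unicodedata


def fold(text: str) -> str:
    """Lower-case the text and drop combining marks (ä becomes a).

    Assumption: this only approximates an accent- and case-insensitive
    comparison; real MySQL collations use their own weight tables.
    """
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def ci_contains(haystack: str, needle: str) -> bool:
    """Substring test that ignores case and accents."""
    return fold(needle) in fold(haystack)


def bin_contains(haystack: str, needle: str) -> bool:
    """Substring test on the exact characters, like a binary comparison."""
    return needle in haystack


if __name__ == "__main__":
    body = "Das Thema wird behandelt."
    print(ci_contains(body, "Händel"))    # True: "handel" is inside "behandelt"
    print(bin_contains(body, "Händel"))   # False: exact characters differ
    print(bin_contains("ein Stück von händel", "Händel"))  # False: case differs too
```

The last line shows why a binary comparison is awkward for ordinary site search: it no longer matches a term that differs only in case.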
#7 2017-02-01 14:59:05
- GugUser
Re: Problems with umlauts in the search term
Many thanks to everyone for answering my question.
At the moment I don’t dare to change anything directly in the database. I don’t understand the logic behind the described behavior. Whether in UTF-8 or in ISO-8859-1 (Latin-1), A, Á, Ä, a, á, ä etc. are distinct characters: in Unicode, a is U+0061, ä is U+00E4, etc.
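The point about distinct characters can be checked with a few lines of Python. This is only an illustrative sketch (the function name describe is made up here). It shows that a and ä have different code points and different bytes in both encodings, so whether a search treats them as equal is decided by the comparison rules (the collation), not by the encoding.

```python
def describe(ch: str) -> tuple[str, str, str]:
    """Return the code point, UTF-8 bytes and Latin-1 bytes of one character."""
    code_point = f"U+{ord(ch):04X}"
    utf8_hex = ch.encode("utf-8").hex()
    latin1_hex = ch.encode("latin-1").hex()
    return code_point, utf8_hex, latin1_hex


if __name__ == "__main__":
    for ch in "aäAÄ":
        code_point, utf8_hex, latin1_hex = describe(ch)
        print(f"{ch}  {code_point}  utf-8={utf8_hex}  latin-1={latin1_hex}")
    # a  U+0061  utf-8=61    latin-1=61
    # ä  U+00E4  utf-8=c3a4  latin-1=e4
    # A  U+0041  utf-8=41    latin-1=41
    # Ä  U+00C4  utf-8=c384  latin-1=c4
```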
Nevertheless, I tried another installation and searched for “Männerschmuck”. It found all articles with “Männerschmuck” and highlighted the word. Then I searched for “Mann”. It found articles with “Mann”, showed the excerpts and highlighted “Mann”. But the displayed search results also included articles containing “Männerschmuck”, “fachmännisch” etc., without excerpts and thus without highlighting.
Strange, I don’t understand why.
Re: Problems with umlauts in the search term
GugUser wrote #303782:
Nevertheless, I tried another installation and searched for “Männerschmuck”. It found all articles with “Männerschmuck” and highlighted the word. Then I searched for “Mann”. It found articles with “Mann”, showed the excerpts and highlighted “Mann”. But the displayed search results also included articles containing “Männerschmuck”, “fachmännisch” etc., without excerpts and thus without highlighting.
Strange, I don’t understand why.
The search is performed by MySQL and the highlighting by PHP, which can explain the discrepancy. It’s strange, though, that with the latin1_general_ci (?) collation a search for “Mann” also matches “Männ” (not in my tests). Are you sure the bodies do not contain “mann” too?
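The split between the two steps can be sketched in Python. Both functions are hypothetical stand-ins, not Textpattern's or MySQL's actual code: sql_like_match assumes the database comparison ignores accents and case, while highlight assumes the excerpt step only ignores case.

```python
import re
import unicodedata


def fold(text: str) -> str:
    """Lower-case and strip combining marks (an approximation only)."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def sql_like_match(body: str, term: str) -> bool:
    """Stand-in for the database search: accent- and case-insensitive."""
    return fold(term) in fold(body)


def highlight(body: str, term: str):
    """Stand-in for the excerpt step: case-insensitive but accent-sensitive.

    Returns the body with the term wrapped in <strong>, or None when the
    exact term does not occur (so there is nothing to excerpt).
    """
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    if not pattern.search(body):
        return None
    return pattern.sub(lambda m: f"<strong>{m.group(0)}</strong>", body)


def search(body: str, term: str):
    """Combine both steps the way a search results page would."""
    matched = sql_like_match(body, term)
    return matched, highlight(body, term) if matched else None


if __name__ == "__main__":
    print(search("Schmuck für den Mann", "Mann"))
    # (True, 'Schmuck für den <strong>Mann</strong>')
    print(search("Wir arbeiten fachmännisch.", "Mann"))
    # (True, None): listed as a hit, but with no excerpt to show
```

Under these assumptions the second article appears in the result list, yet the highlighting step finds nothing to mark, which matches the empty excerpts described in the thread.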
#9 2017-02-02 01:42:38
- GugUser
Re: Problems with umlauts in the search term
etc wrote #303784:
It’s strange, though, that with the latin1_general_ci (?) collation a search for “Mann” also matches “Männ” (not in my tests).
What is the risk if I change to latin1_general_ci? The site has 1036 articles. Should I change the whole “textpattern” table?
Are you sure the bodies do not contain “mann” too?
Yes, you can try it yourself at sollberg-schmuck.ch (this is the website of my second search experiment).
Re: Problems with umlauts in the search term
The risk is losing any non-Latin characters, but latin1_general_ci does not seem to work anyway. I have tested latin1_general_ci on an outdated French site, and there the search does distinguish between “associe” and “associé”, but umlauts may need special treatment. Have you tried latin1_german1_ci? Converting just Title and Body should be OK, but this is only speculation on my part, without any warranty.
Re: Problems with umlauts in the search term
colak wrote #303776:
Did you consider converting the tables to UTF-8? See rvm_latin1_to_utf8.
Try this, but back up first.
#12 2017-02-03 01:59:02
- GugUser
Re: Problems with umlauts in the search term
Thank you both, etc and ruud.
I’ll try it in a few days, at the moment I am busy with other things.