Textpattern CMS support forum

#1 2012-11-05 20:16:56

harri
Member
From: ita
Registered: 2004-11-25
Posts: 15
Website

[4.4 -> 4.5 upgrade] garbled charset

Ok, I upgraded my installation yesterday as I had already done for several websites, but this time something went wrong. The site in question is in Italian, the character set is UTF-8, and since the upgrade all accents and hyphens are garbled. You can see for yourself:
http://papuasia.afasici.net/article/317/isolate-the-enemy

I don’t know whether my hosting provider has changed something on the server, and I have very little understanding of how the character set is managed in MySQL/PHP/HTML.

I need help :-)

Offline

#2 2012-11-06 01:24:15

uli
Moderator
From: Cologne
Registered: 2006-08-15
Posts: 4,303

Re: [4.4 -> 4.5 upgrade] garbled charset

I’ve seen something similar when copying text from a PDF, and also with some German umlauts that I then had to change to HTML entities. But this is too much to come from PDF copying alone, and the garbled characters are in the page’s source code, so it’s not a font issue.

  • When you post a new article, what happens to the characters in question therein?
  • What value for “collation” do you see for the “textpattern” table? (In phpMyAdmin: Click the database name for your TXP installation, look for the line beginning with “textpattern”.)
  • In phpMyAdmin, click “textpattern” in the left frame/column, then click the “Structure” tab. What values for “collation” do you see now for Body and Body_html?
  • Now switch to the “Browse” tab for table “textpattern”: Are the weird characters contained in the database in both fields, Body and Body_html?

In bad weather I never leave home without wet_plugout, smd_where_used and adi_form_links

Offline

#3 2012-11-06 10:16:22

harri
Member
From: ita
Registered: 2004-11-25
Posts: 15
Website

Re: [4.4 -> 4.5 upgrade] garbled charset

uli wrote:

I’ve seen something similar when copying text from a PDF, and also with some German umlauts that I then had to change to HTML entities. But this is too much to come from PDF copying alone, and the garbled characters are in the page’s source code, so it’s not a font issue.

When you post a new article, what happens to the characters in question therein?

They display perfectly, all accents and hyphens work as they should

What value for “collation” do you see for the “textpattern” table? (In phpMyAdmin: Click the database name for your TXP installation, look for the line beginning with “textpattern”.)

utf8_unicode_ci

In phpMyAdmin, click “textpattern” in the left frame/column, then click the “Structure” tab. What values for “collation” do you see now for Body and Body_html?

utf8_unicode_ci

Now switch to the “Browse” tab for table “textpattern”: Are the weird characters contained in the database in both fields, Body and Body_html?

Yes. Garbled in both fields

Thank you very much!

Last edited by harri (2012-11-06 10:16:53)

Offline

#4 2012-11-06 13:24:39

uli
Moderator
From: Cologne
Registered: 2006-08-15
Posts: 4,303

Re: [4.4 -> 4.5 upgrade] garbled charset

harri wrote:

They display perfectly, all accents and hyphens work as they should

So we can probably rule out Textile as the cause. Let’s see:

Yes. Garbled in both fields

Are the garbled characters the same in both fields, i.e. if you find a è in the Body field, do you find it exactly like that in the Body_html field as well? Are there no hidden, invisible characters in either field? (To test that, put your cursor into a word before an occurrence and watch the cursor move while you press the right arrow key.)

utf8_unicode_ci

That’s as it should be, and I don’t think such conversions take place during a TXP update. But your site looks bilingual: do you happen to have MLP installed, and have you perhaps installed the necessary update? If so, please post in the MLP topic so the MLP devs are informed.

OK, back to fixing the problem: if the garbled characters are exactly the same in both fields, I’d do a search and replace with phpMyAdmin, after backing up the database, of course. If you don’t know how to do that, just ask.


Offline

#5 2012-11-06 13:29:01

maniqui
Member
From: Buenos Aires, Argentina
Registered: 2004-10-10
Posts: 3,070
Website

Re: [4.4 -> 4.5 upgrade] garbled charset

Hi Harri.
Check if the rvm_latin1_to_utf8 plugin helps you.
Just remember to back up your database before using the plugin.


La música ideas portará y siempre continuará

TXP Builders – finely-crafted code, design and txp

Offline

#6 2012-11-06 13:47:14

uli
Moderator
From: Cologne
Registered: 2006-08-15
Posts: 4,303

Re: [4.4 -> 4.5 upgrade] garbled charset

Didn’t know that plugin, Julían, thanks. I just tested it in a sandbox, and the mentioned è character pair remained unchanged.
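For anyone wondering where a pair like è comes from: it is what the two UTF-8 bytes of è look like when they are read as a single-byte charset such as Windows-1252 (roughly what MySQL calls latin1). A minimal Python sketch of that misreading and its reversal follows; that this is what happened on harri’s site is an assumption, not something confirmed in this thread.

```python
# Illustration only: how one accented letter turns into a two-character pair.
original = "è"
utf8_bytes = original.encode("utf-8")      # two bytes: 0xC3 0xA8
garbled = utf8_bytes.decode("cp1252")      # the same bytes misread as Windows-1252
print(garbled)

# Reversing the misreading recovers the original character.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)
```

If the stored text round-trips like this, the data is most likely UTF-8 that was converted once too often, rather than text damaged in some other way.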


Offline

#7 2012-11-06 22:12:58

maniqui
Member
From: Buenos Aires, Argentina
Registered: 2004-10-10
Posts: 3,070
Website

Re: [4.4 -> 4.5 upgrade] garbled charset

Hi Uli.
Was your sandboxed DB on latin1 to begin with?


Offline

#8 2012-11-06 23:49:49

uli
Moderator
From: Cologne
Registered: 2006-08-15
Posts: 4,303

Re: [4.4 -> 4.5 upgrade] garbled charset

That’s a legitimate question. No, like harri’s, it was utf8_unicode_ci. But your question made me read Ruud’s post more carefully this time, and I saw that his plugin is only for DBs created with MySQL < 4.1, so it’s no use simply “reverting” my newer DB to latin1.

But harri will perhaps remember in which year the initial installation took place. It shouldn’t be too difficult to find out whether it might have been one of the affected MySQL versions back then.


Offline

#9 2012-11-07 05:15:58

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,007
Website GitHub Mastodon Twitter

Re: [4.4 -> 4.5 upgrade] garbled charset

What if you add AddDefaultCharset UTF-8 to your .htaccess file?


Yiannis
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.

Offline

#10 2012-11-07 09:58:12

harri
Member
From: ita
Registered: 2004-11-25
Posts: 15
Website

Re: [4.4 -> 4.5 upgrade] garbled charset

Hello everybody and thank you very much for your help!

The database was indeed very old: this site was set up around 2003, and since then I have changed hosting providers several times. I’ve been with the current host for two or three years now. I tried both rvm_latin1_to_utf8 and the .htaccess solution, neither of which worked, so I simply replaced the weird characters in a dump of the textpattern database and reimported it with phpMyAdmin.
A rough solution, if you like, but it worked :-)
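For readers who would rather not replace each pair by hand, here is a minimal Python sketch of the same idea applied to the text of a dump. It assumes the damage is UTF-8 that was read as Windows-1252 exactly once; the function names are made up for illustration, and this is not the procedure actually used on this site.

```python
import re

def _seen_as_cp1252(first, last):
    """Characters that bytes first..last become when read as Windows-1252."""
    chars = []
    for value in range(first, last + 1):
        try:
            chars.append(bytes([value]).decode("cp1252"))
        except UnicodeDecodeError:
            pass  # a few byte values have no Windows-1252 character
    return "".join(re.escape(c) for c in chars)

# A UTF-8 lead byte (0xC2-0xF4) followed by 1-3 continuation bytes (0x80-0xBF),
# as those bytes appear after being misread as Windows-1252.
_GARBLED = re.compile(
    "[%s][%s]{1,3}" % (_seen_as_cp1252(0xC2, 0xF4), _seen_as_cp1252(0x80, 0xBF))
)

def _fix(match):
    run = match.group(0)
    # Try the longest prefix first, so a trailing legitimate character is kept.
    for end in range(len(run), 1, -1):
        try:
            return run[:end].encode("cp1252").decode("utf-8") + run[end:]
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue
    return run  # nothing round-trips cleanly: leave the text untouched

def repair(text):
    """Undo one round of UTF-8-read-as-Windows-1252 garbling."""
    return _GARBLED.sub(_fix, text)
```

To use it, one would read the dump as UTF-8 text, pass it through `repair()`, and write the result to a new file, keeping the original dump as a backup. Letters whose UTF-8 bytes have no Windows-1252 character (Á, for example) are not handled by this sketch, and a false match on correctly stored text is unlikely but possible, so compare the two files before reimporting.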

I also realized that I still had a bunch of old plugins active but no longer in use, things like rss_suparchive and chh_article_custom (no MLP though), so maybe this could have been a factor.

However, it is all solved now. Thank you very much for your help guys!

Offline

#11 2012-11-07 13:03:50

uli
Moderator
From: Cologne
Registered: 2006-08-15
Posts: 4,303

Re: [4.4 -> 4.5 upgrade] garbled charset

harri wrote:

I simply replaced the weird characters in a dump of the textpattern database and reimported it with phpMyAdmin.
A rough solution, if you like, but it worked :-)

Glad it worked for you!

This isn’t criticism after the fact, just a word of warning to those looking for a solution to their own character problems:

I wouldn’t recommend this method, because you never know exactly which character pairs are used in other tables, e.g. by plugins or by additional scripts sharing the same database. I’d have done a search/replace from inside phpMyAdmin, and only on the tables that are affected. And once again: back up first!
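As a rough illustration of a search/replace limited to the affected table, the Python sketch below prints one UPDATE statement per column and character, using MySQL’s REPLACE() string function. The table and column names are the ones mentioned in this thread; the list of characters and the helper names are assumptions for the example, and the statements should be reviewed, and the database backed up, before anything is run in phpMyAdmin’s SQL tab.

```python
def garbled_form(char):
    """How `char` looks after its UTF-8 bytes are misread as Windows-1252."""
    return char.encode("utf-8").decode("cp1252")

def sql_quote(value):
    # Minimal escaping for a MySQL string literal (backslash and single quote).
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

def replace_statements(table, columns, chars):
    """Build one UPDATE ... REPLACE() statement per column and character."""
    statements = []
    for column in columns:
        for char in chars:
            statements.append(
                "UPDATE `%s` SET `%s` = REPLACE(`%s`, %s, %s);"
                % (table, column, column,
                   sql_quote(garbled_form(char)), sql_quote(char))
            )
    return statements

# Hypothetical character list: the common Italian accented vowels.
for line in replace_statements("textpattern", ["Body", "Body_html"], "àèéìòù"):
    print(line)
```

Because each statement names a single table and column, other tables sharing the database are left alone, which is the point of the warning above.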


Offline
