Textpattern CMS support forum
#1 2009-12-29 17:27:43
- matthijs
- Member
- Registered: 2008-12-15
- Posts: 32
Character set problems UTF-8 or latin
While backing up the database of a Textpattern site, I noticed weird characters.
The website in the browser, served with UTF-8 headers, displays
ö and ô
phpMyAdmin and SQL dump files (made with the Txp plugin admin_dbmanager) display
Ã¶ and Ã´
This has to do with the character encoding. I’ve read tons about it, but still fail to understand what’s going on here. Is the first line UTF-8? Is the second line (the one from the sql file and within phpmyadmin) something else, like latin-1?
Can I do something about this? I want to have good backups of the database, and having all these weird characters in them makes me uncomfortable.
Thanks!
Re: Character set problems UTF-8 or latin
Try adding $txpcfg['dbcharset'] = 'latin1';
to your textpattern/config.php file on the new install.
If that solves the problem, you can then use my plugin rvm_latin1_to_utf8 to convert the database to proper UTF-8, assuming you have a non-ancient MySQL version installed (but be sure to have good backups before doing so).
UTF-8 is a multibyte charset, which means that it can require multiple bytes to store a single character, which is what happens for all characters that are not part of US-ASCII, such as: ö and ô
If you look at that in a dump file, which treats it as latin1 characters (latin1 always uses 1 byte per character), you see: Ã¶ and Ã´
Each accented o takes up 2 bytes, and each of those bytes corresponds to 1 latin1 character, so it is displayed as 2 latin1 characters.
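The byte-level effect described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `misread_as_latin1` is made up for the example, and it assumes the text really is stored as UTF-8 while the viewer decodes it as ISO-8859-1.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as latin1 (ISO-8859-1)."""
    raw = text.encode("utf-8")      # accented o's become two bytes each
    return raw.decode("latin-1")    # every byte turns into one latin1 character


for ch in ("ö", "ô"):
    raw = ch.encode("utf-8")
    # Show the character, its UTF-8 bytes in hex, the byte count,
    # and what a latin1 viewer would display for those bytes.
    print(ch, raw.hex(" "), len(raw), misread_as_latin1(ch))
```

Plain US-ASCII text passes through unchanged, which is why only the accented characters look wrong in the dump.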
Last edited by ruud (2009-12-29 20:52:14)
Re: Character set problems UTF-8 or latin
Seems to me that your db is configured as UTF-8, and only your editor reading the SQL dump and your phpMyAdmin are displaying the UTF-8 content as ISO-8859-1 (latin1). So the weird chars are displayed as Ruud described.
Are you sure your editor and phpMyAdmin are configured as UTF-8?
#4 2009-12-30 15:17:10
- matthijs
- Member
- Registered: 2008-12-15
- Posts: 32
Re: Character set problems UTF-8 or latin
Ruud and trenc, thanks for the answers.
Looking at the config file I use now on the site, it is
$txpcfg['dbcharset'] = 'latin1';
That’s weird though, since it seems the characters are in fact utf8
And just to be clear: the character ö does display as ö on the website
My text editor is TextMate; the file encoding is UTF-8.
Looking at the response headers from phpMyAdmin:
Date: Wed, 30 Dec 2009 15:09:15 GMT
Server: Apache
X-Powered-By: PHP/5.2.5
Set-Cookie: pmaCookieVer=4; expires=Fri, 29-Jan-2010 15:09:15 GMT; path=/phpMyAdmin/; httponly
phpMyAdmin=YfX6edgtefE4wduhehdE46SrbETtV2; path=/phpMyAdmin/; HttpOnly
pma_fontsize=100%25; expires=Fri, 29-Jan-2010 15:09:15 GMT; path=/phpMyAdmin/; httponly
pma_lang=en-utf-8; expires=Fri, 29-Jan-2010 15:09:16 GMT; path=/phpMyAdmin/; httponly
pma_charset=iso-8859-1; expires=Fri, 29-Jan-2010 15:09:16 GMT; path=/phpMyAdmin/; httponly
pma_collation_connection=utf8_unicode_ci; expires=Fri, 29-Jan-2010 15:09:16 GMT; path=/phpMyAdmin/; httponly
pma_theme=original; expires=Fri, 29-Jan-2010 15:09:16 GMT; path=/phpMyAdmin/; httponly
Expires: Wed, 30 Dec 2009 15:09:16 GMT
Cache-Control: no-store, no-cache, must-revalidate, pre-check=0, post-check=0, max-age=0
Last-Modified: Wed, 30 Dec 2009 15:09:16 GMT
X-ob_mode: 1
Pragma: no-cache
Content-Encoding: gzip
Vary: Accept-Encoding
Keep-Alive: timeout=15, max=91
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
200 OK
That’s the frustrating thing with character sets. I can look at any character but never know what it is that I’m seeing …
What I could also try is to write a PHP script and do a direct select from the db, making sure the connection is set to UTF-8.
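One way to stop guessing is to look at the raw bytes rather than at rendered characters. The following Python sketch is a rough check, not a definitive detector: `describe` is a hypothetical helper name, and the byte strings are hand-written samples rather than data from the actual database.

```python
def describe(raw: bytes) -> str:
    """Return the hex bytes plus a verdict on whether they decode as UTF-8."""
    try:
        raw.decode("utf-8")
        verdict = "valid UTF-8"
    except UnicodeDecodeError:
        verdict = "not valid UTF-8 (possibly latin1)"
    return f"{raw.hex(' ')}: {verdict}"


print(describe(b"\xc3\xb6"))  # the two bytes UTF-8 uses for an o with diaeresis
print(describe(b"\xf6"))      # the single byte latin1 uses for the same character
```

A "valid UTF-8" verdict only says the bytes could be UTF-8; pure US-ASCII text passes as well, so the check is most useful on a sample that contains accented characters.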
Re: Character set problems UTF-8 or latin
$txpcfg['dbcharset'] = 'latin1'
means that the database thinks it’s storing latin1, when in fact TXP uses it to store UTF-8. If you then create a backup of such a table and store it as UTF-8, you’re seeing the latin1 representation of UTF-8 encoded in UTF-8. It’s probably normal.
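The "latin1 representation of UTF-8 encoded in UTF-8" can be reproduced in Python to see what ends up in such a backup file. This is a sketch under assumptions: the function names are invented for the example, and Python's `latin-1` codec stands in for MySQL's latin1, which differs from ISO-8859-1 for some bytes in the 0x80 to 0x9F range.

```python
def double_encode(text: str) -> bytes:
    """UTF-8 bytes, misread as latin1, then written out as UTF-8 again."""
    return text.encode("utf-8").decode("latin-1").encode("utf-8")


def undo_double_encoding(raw: bytes) -> str:
    """Reverse the steps above.

    May raise UnicodeError if the input was not double-encoded this way.
    """
    return raw.decode("utf-8").encode("latin-1").decode("utf-8")


dump_bytes = double_encode("ö")
print(dump_bytes.hex(" "))               # four bytes instead of the original two
print(undo_double_encoding(dump_bytes))  # the original character again
```

Because the round trip is reversible for these characters, a backup made this way has not lost the data, even though it looks wrong in an editor.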
I found that annoying, which is why I wrote a plugin to convert the database from latin1 to UTF-8.
I’d recommend against doing a direct select using a UTF-8 connection to the database, especially since the dbcharset is set to latin1.
#6 2009-12-30 17:04:53
- matthijs
- Member
- Registered: 2008-12-15
- Posts: 32
Re: Character set problems UTF-8 or latin
So in the meantime I wrote a short test script in PHP, and that confirmed that the characters are UTF-8. When I set the headers of the script to UTF-8 and do a select, the chars display fine. Without the header(utf8) they display as a mess.
So it’s the fault of phpMyAdmin, which retrieves the data incorrectly. Now I have to look for a db backup script which does it correctly.