Textpattern CMS support forum

#1 2009-04-30 10:11:37

ultramega
Member
Registered: 2006-02-26
Posts: 221

Strange UTF-8 thing

In short:

When I look at the source code of a new site I’m building, Scandinavian special characters are not shown properly. For example, the character ä (&auml;) is replaced with two chars. The page itself looks fine in the browser window.

So it is (at least currently) not a problem, but to be on the safe side I’d like to know what is causing this odd behaviour, as I haven’t seen it before and all my settings are the same as always.

In depth:

  • My default source viewer is Notepad2. It reports that the source is encoded as “ANSI”.
  • When I view the source with Firefox’s built-in viewer, these characters are fine.
  • The page-template table in the database is set to UTF-8. When I look at the table’s content in the database, it is correct.
  • When I view the source of other sites I have on the same server (with the same database settings), Notepad2 says they are UTF-8 and the characters show correctly.
  • TXP’s built-in templates like “default” show correctly.
  • I have the correct UTF-8 character set declaration in the template code.
  • If I put this PHP block at the start of the problem page: header('Content-type: text/html; charset=UTF-8'); then the source opens correctly in Notepad2 too.
  • The page template code shows correctly in TXP’s editor too.
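
The header point in the list above suggests that the declared charset matters to whatever opens the source. As a rough illustration only, here is a minimal Python sketch that reads the charset parameter out of a Content-Type header value. The helper name `charset_from_content_type` is hypothetical, and nothing here is confirmed to be how Notepad2 or the server decides on an encoding.

```python
from email.message import Message


def charset_from_content_type(value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset()


if __name__ == "__main__":
    print(charset_from_content_type("text/html; charset=UTF-8"))
    print(charset_from_content_type("text/html"))
```

A response that only says `text/html` carries no charset hint at all, so a tool reading it has to fall back on something else, such as a meta tag or guessing from the bytes.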

The only difference I can see between this site and earlier ones is the way the templates were built. I coded the earlier sites directly in TXP, but for this new one I first made a working HTML-based demo, from which I’m now copy-pasting code into TXP. And yes, these HTML pages are saved as UTF-8 too :) They are not saved with a BOM either, if that makes a difference.
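
Since the BOM question comes up here, a small sketch of how one might check a saved page: whether its bytes start with a UTF-8 BOM and whether they decode as valid UTF-8. This is an illustration in Python under my own assumptions. The function name `describe_encoding` is made up, and the bytes would come from wherever the page source was saved.

```python
import codecs


def describe_encoding(data):
    """Return (has_utf8_bom, is_valid_utf8) for a bytes object."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    try:
        data.decode("utf-8")
        is_valid = True
    except UnicodeDecodeError:
        is_valid = False
    return has_bom, is_valid


if __name__ == "__main__":
    # UTF-8 without BOM: valid, but carries no explicit marker for an editor.
    print(describe_encoding("ä".encode("utf-8")))
    # The same text with a BOM in front.
    print(describe_encoding(codecs.BOM_UTF8 + "ä".encode("utf-8")))
    # A lone 0xE4 byte (ä in ISO-8859-1) is not valid UTF-8.
    print(describe_encoding(b"\xe4"))
```

Without a BOM, an editor that is given no other hint has to guess the encoding from the bytes themselves.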

So which mysterious thing somewhere between these elements is causing this? Virtual beer for the solver!

Last edited by ultramega (2009-04-30 10:15:12)

#2 2009-04-30 11:49:18

trenc
Plugin Author
From: ⛵️, currently Göteborg, SE
Registered: 2008-02-27
Posts: 574

Re: Strange UTF-8 thing

Hi,

Like the character ä (&auml;), is replaced with two chars.

This happens when UTF-8 encoded text is viewed in a tool that decodes it as ISO-8859-X or “ANSI”. Those encodings use 1 byte per character, while UTF-8 uses 1 to 4 bytes per character (2 for ä), so each byte of the ä is shown as a separate character.
So your site is correct UTF-8, but Notepad2 doesn’t display it as UTF-8. The Firefox source viewer always uses the encoding the site was sent with, so it displays the source correctly.

In the end: I think it’s a Notepad2 issue, in that it does not detect the correct encoding of the file (source code). Maybe you have to set it manually?
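
To make the “two chars” effect concrete, here is a minimal Python sketch. It assumes that “ANSI” means Windows-1252 (`cp1252`), which is the usual case on Western European Windows but is an assumption on my part. The function name `misread_as_ansi` is only for illustration.

```python
def misread_as_ansi(text):
    """Encode text as UTF-8, then decode the bytes as Windows-1252 (assumed 'ANSI')."""
    return text.encode("utf-8").decode("cp1252")


if __name__ == "__main__":
    for ch in "äöå":
        raw = ch.encode("utf-8")
        # Each of these letters is two bytes in UTF-8, so the misread
        # result is two characters long.
        print(ch, raw, misread_as_ansi(ch), len(misread_as_ansi(ch)))
```

For ä the UTF-8 bytes are 0xC3 0xA4, which Windows-1252 shows as Ã followed by ¤. That matches the symptom described: one letter turning into two.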


Digital nomad, sailing the world on a sailboat: 32fthome.com

#3 2009-05-01 10:44:51

ultramega
Member
Registered: 2006-02-26
Posts: 221

Re: Strange UTF-8 thing

I would think so too, but that does not seem to be the reason: when I view the source of any of my other sites (especially my own sites on the same server space, so with the same settings), Notepad2 shows them correctly, but not this particular site with its self-made template.

There are actually two differences between this upcoming site and the existing ones:

  • this time I copy-pasted HTML into the TXP editor
  • the new one uses a subdomain

Otherwise everything seems to be the same. I even copy-pasted the charset declaration from a working HTML page into the new one, to make sure I hadn’t made a copy error or typo. I hadn’t. Strange!

#4 2009-05-05 12:12:53

ultramega
Member
Registered: 2006-02-26
Posts: 221

Re: Strange UTF-8 thing

Still fighting with this: if I use the XHTML Strict doctype (like I almost always do), Notepad2 opens the source with the wrong encoding. If I replace it with Transitional, the characters show correctly. OK, but all the other sites mentioned (on the same server) use Strict too, and they still show correctly. Does anyone understand this? I can even take everything out of the template, leaving only the basic HTML there, and it still doesn’t work.

Sure, it is not doing any harm, as everything renders nicely, but it is annoying to know there is something out of your control :)
