
Textpattern CMS support forum


#13 2004-02-24 15:48:39

Dean
Founder (Gone, but not forgotten)
From: Languedoc
Registered: 2004-02-14
Posts: 235
Website

Re: Textile Internalization

Clearly there should be a ‘noencode’ option for Textile, one that allows utf-8 to pass through intact, yes?


text*


#14 2004-02-24 16:02:36

pospel
Member
From: Ukraine
Registered: 2004-02-23
Posts: 40
Website

Re: Textile Internalization

> Dean wrote:

> Clearly there should be a ‘noencode’ option for Textile, one that allows utf-8 to pass through intact, yes?

I think so. And the charset (the meta content-type) should be set from a variable rather than hardcoded in the HTML, because it has to change between Western, Cyrillic, and Asian encodings.
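
To illustrate the idea of a configurable charset, here is a minimal sketch. The helper name contentTypeMeta() and its parameter are hypothetical and are not part of Textpattern; the charset value would presumably come from a site preference.

```php
<?php
// Hypothetical helper: build the Content-Type meta tag from a configurable charset
// instead of hardcoding it in the page template.
function contentTypeMeta(string $charset = 'utf-8'): string
{
    // Escape the value so a malformed setting cannot break out of the attribute.
    $safe = htmlspecialchars($charset, ENT_QUOTES, 'UTF-8');
    return '<meta http-equiv="Content-Type" content="text/html; charset=' . $safe . '">';
}

// Example: a site that serves Cyrillic pages in windows-1251.
echo contentTypeMeta('windows-1251'), "\n";
```

The same value would normally also be sent in the HTTP Content-Type header, since browsers give the header precedence over the meta tag.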


#15 2004-02-24 18:17:34

bici
Member
From: vancouver
Registered: 2004-02-24
Posts: 2,100
Website Mastodon

Re: Textile Internalization

Howdy. If I understood the exchanges about language options, I take it that Txp/Textile can parse UTF-8. Is this correct? I need to know before I start to dive in. And Dean, many thanks for all this.

ciao

P.S. Today we find ourselves very short of time. By the end of this week I will change


…. texted postive


#16 2004-02-24 19:16:46

mamash
Member
From: Prague
Registered: 2004-02-21
Posts: 127
Website

Re: Textile Internalization

2Pospel: The beauty of Unicode (UTF-8) is that it can encode virtually any language. Any decent modern browser (including Netscape 4) supports UTF-8 and works with Russian and other languages just fine.

2Wil: Quotes are language-specific, at least for some languages. For example, there is an official norm in the Czech Republic which clearly states which quotes are allowed (i.e. the German ones) and how they may be nested. Using other quoting patterns is considered a breach of typographical/language rules, although it might be justifiable in some cases. In Russia the French/French alternative prevails, at least in classical printed matter, although the Internet often breaks these rules because not all CMSs/editors are ready for this.

I believe this is a great opportunity for users from different countries and languages to collaborate and help Dean build a system of Textile plugins (?) for different languages. English is fine as a lingua franca of globalization, but preserving traditional rules is an intellectual challenge par excellence (or am I just being too conservative?).


Who’s gonna textdrive you home tonight?


#17 2004-02-24 20:36:32

mamash
Member
From: Prague
Registered: 2004-02-21
Posts: 127
Website

Re: Textile Internalization

Well, at least you can say “we are using German quotes”. It sounds a little different from when we (the Czech) say “we are using German quotes” – for we do :)))


Who’s gonna textdrive you home tonight?


#18 2004-02-24 22:02:31

RickCogley
New Member
Registered: 2004-02-24
Posts: 3

Re: Textile Internalization

Japanese also has its own style of quotes, which look like (hopefully):

「TEST」

… and there are “quote-like” Japanese charset representations of standard Western quotes as well, which do not show up if you cannot view the Japanese charset.


Rick Cogley :: Tokyo Japan


#19 2004-02-25 07:41:38

pospel
Member
From: Ukraine
Registered: 2004-02-23
Posts: 40
Website

Re: Textile Internalization

> mamash wrote:

> 2Pospel: The beauty of Unicode (UTF-8) is that it can encode virtually any language. Any decent modern browser (including Netscape 4) supports UTF-8 and works with Russian and other languages just fine.

I wish it worked as you say :) But if I leave Textpattern's charset unchanged, using its UTF-8, almost 1/3 of all symbols display wrong and plenty of text is just “eaten” by the UTF-8 conversion functions. So, the verdict: it doesn’t work correctly with Cyrillic letters using UTF-8.

new:
It seems that in admin mode, while editing and re-editing articles, they display correctly, but in the site view plenty of text is garbled or simply gone. Also, article titles are distorted and randomly trimmed in both admin mode and the live site view.

Last edited by pospel (2004-02-25 07:53:13)


#20 2004-02-25 09:36:48

mamash
Member
From: Prague
Registered: 2004-02-21
Posts: 127
Website

Re: Textile Internalization

So the problem obviously lies in the quality of PHP encoding functions. UTF8 works in general, believe me. No problem mixing English, Russian, Arabic and Japanese on the same page. ;)


Who’s gonna textdrive you home tonight?


#21 2004-02-25 09:53:26

Dean
Founder (Gone, but not forgotten)
From: Languedoc
Registered: 2004-02-14
Posts: 235
Website

Re: Textile Internalization

As far as Textile is concerned, it also depends on whether PHP has been built with the mbstring functions. Otherwise, Textile is forced to rely on htmlentities(), which is dodgy at best.

If Txp is running on a machine using PHP as CGI (in many ways the preferred setup), then loading the mbstring libraries on the fly is possible.
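
As a rough sketch of the mbstring check described above (not Textile's actual code): the function name encodeHighChars() is made up for illustration, and falling back to untouched text when mbstring is missing is an assumption rather than what Textile does. The file is assumed to be saved as UTF-8.

```php
<?php
// Sketch: turn non-ASCII characters into numeric entities when mbstring is available.
function encodeHighChars(string $text, string $charset = 'UTF-8'): string
{
    if (!function_exists('mb_encode_numericentity')) {
        // mbstring is missing: pass the text through untouched (an assumption here)
        // instead of relying on htmlentities().
        return $text;
    }
    // Conversion map: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($text, $map, $charset);
}

echo encodeHighChars('Привет'), "\n";
```

ASCII text comes back unchanged either way, so only characters above 0x7F are affected.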


text*


#22 2004-02-25 10:22:10

pospel
Member
From: Ukraine
Registered: 2004-02-23
Posts: 40
Website

Re: Textile Internalization

Dean, I think I have figured out the problem.

In the method Textilethis() we have text preprocessing functions:

$text = $this->incomingEntities($text);
$text = $this->encodeEntities($text);

and in encodeEntities()

function encodeEntities($text)
{
    return (function_exists('mb_encode_numericentity'))
        ? $this->encode_high($text)
        : htmlentities($text, ENT_NOQUOTES, "utf-8");
}

So the problem is the PHP function htmlentities(): its conversion of non-ISO characters is buggy. A quick quotation from the PHP manual comments:

———

If you ever plan to support more than ISO8859-1 characters, you should stop using htmlentities() right now. Instead, you should:

1) Use htmlspecialchars() wherever you use htmlentities()
2) Add the character encoding you use to your html file:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

This will make sure that all characters reach the end-user of your webpages unharmed. When you change to UTF-8, use the mb_strlen instead of strlen etc., make sure your text is in utf-8 everywhere, and use this one instead in the <head> section of your html file:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

In case you use XHTML instead of old HTML, please note that the character encoding is also specified in the xml header. Replace this:

<?xml version="1.0" encoding="iso-8859-1"?>

with this:

<?xml version="1.0" encoding="utf-8"?>
——

read more here.

When I commented out this $text = $this->encodeEntities($text), everything looked OK.

So, we need a custom charset definition and the ability to switch off the htmlentities() conversion.
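
One way such a switch might look, as a sketch only: prepareText() and its $noencode flag are hypothetical names and not Textile's real API. Using htmlspecialchars() on the 'noencode' path follows the manual comment quoted above, and is an assumption about how the option would behave.

```php
<?php
// Hypothetical sketch of a 'noencode' switch with a configurable charset.
function prepareText(string $text, bool $noencode = false, string $charset = 'UTF-8'): string
{
    if ($noencode) {
        // Leave multibyte characters intact; escape only &, < and >.
        return htmlspecialchars($text, ENT_NOQUOTES, $charset);
    }
    // Default path: convert every character that has a named entity.
    return htmlentities($text, ENT_NOQUOTES, $charset);
}

echo prepareText('Привет <b>', true), "\n";
```

With the flag set, Cyrillic text reaches the page as plain UTF-8, so the charset declared for the page has to match the one passed in here.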

Last edited by pospel (2004-02-25 10:24:16)

