Textpattern CMS support forum

#16 2004-02-24 18:17:34

bici
Member
From: vancouver
Registered: 2004-02-24
Posts: 2,289
Website Mastodon

Re: Textile Internalization

howdy, if i understood the exchanges re language options, i take it that txp/textile can parse utf-8 … is this correct? need to know before i start to dive in. and dean, many thanks for all this.

ciao

P.S. Today we find ourselves in a very tiny sliver of time. By the end of this week there will be a change…


… texted positive

Offline

#17 2004-02-24 19:16:46

mamash
Member
From: Prague
Registered: 2004-02-21
Posts: 128
Website

Re: Textile Internalization

2Pospel: The beauty of Unicode (UTF-8) is that it allows for virtually any language encoding. Any decent modern browser (including Netscape 4) supports UTF-8 and lets you work with Russian and other languages just fine.

2Wil: Quotes are language specific, at least for some languages. For example, there is an official norm in the Czech Republic which clearly states which quotes are allowed (i.e. the German ones) and how they may be nested. Using other quoting patterns is considered a breach of typographical/language rules, although it might be justifiable in some cases. In Russia the French/French alternative prevails, at least in classical printed matter, although the Internet often breaks these rules because not all CMSs/editors are ready for this.

I believe this is a great opportunity for users from different countries and languages to collaborate and help Dean build a system of Textile plugins (?) for different languages. English is fine as a globalization lingua franca, but preserving traditional rules is an intellectual challenge par excellence (or am I just being too conservative?).


Who’s gonna textdrive you home tonight?

Offline

#18 2004-02-24 20:36:32

mamash
Member
From: Prague
Registered: 2004-02-21
Posts: 128
Website

Re: Textile Internalization

Well, at least you can say “we are using German quotes”. It sounds a little different from when we (the Czechs) say “we are using German quotes”, for we do :)))


Who’s gonna textdrive you home tonight?

Offline

#19 2004-02-24 22:02:31

RickCogley
New Member
Registered: 2004-02-24
Posts: 3

Re: Textile Internalization

Japanese also has its own style of quotes, which look like (hopefully):

「TEST」

… and there are “quote-like” Japanese charset representations of standard Western quotes as well, which do not show up if you cannot view the Japanese charset.


Rick Cogley :: Tokyo Japan

Offline

#20 2004-02-25 07:41:38

pospel
Member
From: Ukraine
Registered: 2004-02-23
Posts: 40
Website

Re: Textile Internalization

> mamash wrote:

> 2Pospel: The beauty of Unicode (UTF-8) is that it allows for virtually any language encoding. Any decent modern browser (including Netscape 4) supports UTF-8 and lets you work with Russian and other languages just fine.

I wish it worked the way you say :) But if I leave the Textpattern charsets unchanged and use its UTF-8, almost 1/3 of all symbols display wrong and plenty of text is just “eaten” by the UTF-8 conversion functions. So, the verdict: it doesn’t work correctly with Cyrillic letters in UTF-8.

new:
It seems that in admin mode, while editing and re-editing articles, they display correctly, but in the site view plenty of text is garbled or simply gone. Also, article titles are distorted and randomly trimmed in both admin mode and the live site view.

Last edited by pospel (2004-02-25 07:53:13)

Offline

#21 2004-02-25 09:36:48

mamash
Member
From: Prague
Registered: 2004-02-21
Posts: 128
Website

Re: Textile Internalization

So the problem obviously lies in the quality of PHP encoding functions. UTF8 works in general, believe me. No problem mixing English, Russian, Arabic and Japanese on the same page. ;)


Who’s gonna textdrive you home tonight?

Offline

#22 2004-02-25 09:53:26

Dean
Founder (Gone, but not forgotten)
From: Languedoc
Registered: 2004-02-14
Posts: 235
Website

Re: Textile Internalization

As far as Textile is concerned, it also depends on whether PHP has been built with the mbstring functions. Otherwise, Textile is forced to rely on htmlentities(), which is dodgy at best.

If Txp is running on a machine using PHP as CGI (in many ways the preferred setup), then loading the mbstring libraries on the fly is possible.
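
A quick way to see which of the two paths a given PHP build would take is to test for the same function that encodeEntities() checks. This is only a minimal sketch: the helper name has_mb_numericentity() is made up for illustration and is not part of Textile or Textpattern.

```php
<?php
// Hypothetical helper (not part of Textile): reports whether the mbstring
// function that encodeEntities() tests for exists in this PHP build.
function has_mb_numericentity()
{
    return function_exists('mb_encode_numericentity');
}

echo has_mb_numericentity()
    ? "mbstring available: the multibyte-safe path can be used\n"
    : "mbstring missing: the htmlentities() fallback would be used\n";
```

Running this once on the hosting account shows whether the htmlentities() fallback is in play at all.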


text*

Offline

#23 2004-02-25 10:22:10

pospel
Member
From: Ukraine
Registered: 2004-02-23
Posts: 40
Website

Re: Textile Internalization

Dean, I think I have figured out the problem.

In method Textilethis() we have text preprocessing functions:

$text = $this->incomingEntities($text);
$text = $this->encodeEntities($text);

and in encodeEntities()

function encodeEntities($text)
{
    return (function_exists('mb_encode_numericentity'))
        ? $this->encode_high($text)
        : htmlentities($text, ENT_NOQUOTES, "utf-8");
}

So the problem is the PHP function htmlentities(): its conversion of non-ISO characters is buggy. A quick quotation from the PHP manual comments:

———

If you ever plan to support more than ISO8859-1 characters, you should stop using htmlentities() right now. Instead, you should:

1) Use htmlspecialchars() wherever you use htmlentities()
2) Add the character encoding you use to your html file:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

This will make sure that all characters reach the end-user of your webpages unharmed. When you change to UTF-8, use the mb_strlen instead of strlen etc., make sure your text is in utf-8 everywhere, and use this one instead in the <head> section of your html file:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

In case you use XHTML instead of old HTML, please note that the character encoding is also specified in the xml header. Replace this:

<?xml version="1.0" encoding="iso-8859-1"?>

with this:

<?xml version="1.0" encoding="utf-8"?>
——

read more here.

When I commented out the line $text = $this->encodeEntities($text); everything looked OK.

So, we need custom charset definition and ability to switch off htmlentities() conversion.
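
One way such a switch could look is sketched below. It is not Textile's actual code: the function name encode_entities_sketch() and the $cleanUtf8 flag are assumptions made up for illustration. With the flag on, only the markup-significant characters are escaped and multibyte text passes through untouched.

```php
<?php
// Hypothetical variant of encodeEntities(); the $cleanUtf8 flag is an
// assumption for illustration, not an existing Textile option.
function encode_entities_sketch($text, $cleanUtf8 = true)
{
    if ($cleanUtf8) {
        // Escape only &, < and >; multibyte characters stay as UTF-8 bytes.
        return htmlspecialchars($text, ENT_NOQUOTES, 'UTF-8');
    }
    if (function_exists('mb_encode_numericentity')) {
        // Convert every code point above ASCII to a numeric entity.
        $map = array(0x80, 0x10FFFF, 0, 0x1FFFFF);
        return mb_encode_numericentity($text, $map, 'UTF-8');
    }
    // Last resort: the fallback discussed in this thread.
    return htmlentities($text, ENT_NOQUOTES, 'UTF-8');
}

echo encode_entities_sketch("Привет <b>"), "\n"; // prints: Привет &lt;b&gt;
```

The page then has to be served with a matching charset declaration, otherwise the raw UTF-8 bytes will be misread by the browser.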

Last edited by pospel (2004-02-25 10:24:16)

Offline

#24 2004-02-25 10:28:14

Dean
Founder (Gone, but not forgotten)
From: Languedoc
Registered: 2004-02-14
Posts: 235
Website

Re: Textile Internalization

Yes, but htmlentities() is there for the (not exactly edge) cases where those writing vanilla Western-European text will benefit from having high-ascii characters converted to entities when there are no mbstring functions to rely on. As I say, it’s a flawed solution.

I do, as part of a larger Textile internationalisation effort, need to implement a ‘clean utf-8’ option for Textile. I will. Honest.


text*

Offline

#25 2004-02-25 10:40:06

pospel
Member
From: Ukraine
Registered: 2004-02-23
Posts: 40
Website

Re: Textile Internalization

We hope so and will help you as much as we can :)

Besides, there is the iconv() library for correctly converting characters between various encodings, but it is a module which needs to be compiled into PHP and is not commonly available on PHP hosting. Still, you could include it in the source code and distribute it with Textpattern.
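
For reference, a conversion with iconv() might look like the sketch below. The helper name cp1251_to_utf8() and the choice of Windows-1251 as the source encoding are assumptions for illustration only; the function_exists() guard covers hosts where the extension is not compiled in.

```php
<?php
// Hypothetical helper: converts Windows-1251 bytes to UTF-8 with iconv().
// Returns null when the iconv extension is missing, false if conversion fails.
function cp1251_to_utf8($bytes)
{
    if (!function_exists('iconv')) {
        return null;
    }
    return iconv('Windows-1251', 'UTF-8', $bytes);
}

// "Привет" encoded as Windows-1251 bytes.
$legacy = "\xCF\xF0\xE8\xE2\xE5\xF2";
$utf8 = cp1251_to_utf8($legacy);
echo ($utf8 === null) ? "iconv is not available\n" : $utf8 . "\n";
```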

Offline

#26 2004-06-29 19:31:47

Denbo
Member
From: Russia
Registered: 2004-06-27
Posts: 12
Website

Re: Textile Internalization

Hmm. I have a problem: when I commented out

$text = $this->encodeEntities($text)

nothing changed: my Russian text looks OK in the browser but not correct in source code :-(

example: http://minimal.ru/txp/

Can you help me?

Last edited by Denbo (2004-06-29 19:32:40)

Offline

#27 2004-06-29 19:43:50

jdueck
Plugin Author
From: Minneapolis, MN
Registered: 2004-02-27
Posts: 147
Website

Re: Textile Internalization

“not correct in source code” – what does that mean?

Offline

#28 2004-06-29 19:47:52

Hans
Member
From: Everywhere
Registered: 2004-03-07
Posts: 99
Website

Re: Textile Internalization

I’d say it looks pretty Russian to me, on the site and in the source code… at minimal.ru/txp/, of course.


Lumilux – A Photoblog

Offline

#29 2004-06-29 20:27:11

mamash
Member
From: Prague
Registered: 2004-02-21
Posts: 128
Website

Re: Textile Internalization

The site surely displays fine in Russian. Possible interpretations of your problem:

  1. your article titles in permanent links are garbled: there is no workaround for this, so you will either have to disable title attaching (see the admin_config.php file) or use the ‘URL-only title’ feature on your TXP ‘write’ tab
  2. you use Internet Explorer, which opens the source code in Notepad, and Notepad is not configured to display Unicode text properly (quite likely on Win98): use another editor which fully supports Unicode, and also check that you have selected a reasonable font in your editor (this doesn’t really have anything to do with TXP)

Last edited by mamash (2004-06-29 20:27:46)


Who’s gonna textdrive you home tonight?

Offline
