Textpattern CMS support forum
Textile Internalization
- Encoding special chars into entities is generally a bad idea when used with non-Western scripts. I don’t see the point in entity coding, as long as a correct charset declaration (UTF-8) is intact. Two issues are involved when using Textile:
- htmlspecialchars() destroys Czech (and possibly other) scripts.
- mb_encode_numericentity() is a safer way, though it only encodes some foreign characters while leaving others intact; therefore it’s useless IMHO.
- The way Textile converts quotes and other characters is incompatible with the typography rules of non-English languages. For example:
- Czech rules require different „quoting characters“.
- „Nested quotes have to ‚differ‘ in Czech“.
- «Russian quotes» are another fine example.
Now, wouldn’t it be fine if Textile had some sort of language detection mechanism, allowing the user to select a default language (in Textpattern) and change to others using a LANG tag? This would trigger different rendering behaviour. The list of supported languages could be expanded by competent users; otherwise Textile would default to English rules.
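For illustration only, here is a rough sketch of such a per-language rule table with an English fallback. It is written in Python rather than Textile's PHP; the names `QUOTE_RULES` and `typeset_quotes` are made up for this example, the Russian inner quotes are an assumption, and none of this is Textile's actual code.

```python
import re

# Hypothetical rule table: (outer open, outer close, inner open, inner close).
# The Czech and Russian outer quotes follow the examples in this post;
# the Russian inner pair is an assumption.
QUOTE_RULES = {
    "en": ("\u201c", "\u201d", "\u2018", "\u2019"),
    "cs": ("\u201e", "\u201c", "\u201a", "\u2018"),
    "ru": ("\u00ab", "\u00bb", "\u201e", "\u201c"),
}


def typeset_quotes(text, lang="en"):
    """Replace straight quotes with the quote marks of the given language.

    Unknown languages fall back to the English rules.
    """
    outer_open, outer_close, inner_open, inner_close = QUOTE_RULES.get(
        lang, QUOTE_RULES["en"]
    )
    # Pair up straight double quotes as the outer level.
    text = re.sub(
        r'"([^"]*)"',
        lambda m: outer_open + m.group(1) + outer_close,
        text,
    )
    # Pair up straight single quotes that are not apostrophes inside a word.
    text = re.sub(
        r"(?<!\w)'([^']+)'(?!\w)",
        lambda m: inner_open + m.group(1) + inner_close,
        text,
    )
    return text
```

Calling `typeset_quotes(text, "cs")` would apply the Czech marks, while an unlisted language code simply gets the English ones. Adding a language means adding one row to the table, which is the kind of extension competent users could contribute.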
Who’s gonna textdrive you home tonight?
Re: Textile Internalization
Not sure what you mean by “putting a lang parameter in the Textile class”, but Textile already recognizes a language setting tag. It doesn’t, however, trigger any special behavior (well, the browsers don’t really respond to it, either).
Some more food for thought: in Czech there are single-character prepositions that are not allowed to appear at the end of a line. Adding a non-breaking space after them is an easy solution to that. I’ve also been experimenting with word hyphenation, but that would require referencing a hyphenation dictionary/database, which would take more time than I’m currently willing to sacrifice.
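The non-breaking space rule could be sketched roughly like this. This is a Python illustration, not Textile code; the function name `bind_prepositions` is hypothetical, and the preposition list (k, o, s, u, v, z) is an assumption about which words the rule should cover.

```python
import re

# Single-letter Czech prepositions; assumed list for this illustration.
_PREPOSITIONS = "kosuvzKOSUVZ"

# A standalone one-letter preposition, followed by spaces and another word.
_PATTERN = re.compile(r"(?<!\w)([" + _PREPOSITIONS + r"])[ ]+(?=\w)")


def bind_prepositions(text):
    """Glue single-letter prepositions to the next word with U+00A0."""
    return _PATTERN.sub("\\1\u00a0", text)
```

The lookbehind keeps the pattern from touching the last letter of a longer word, so only free-standing prepositions are affected.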
My post was merely trying to suggest a possible future framework, as there is IMHO great potential in turning Textile into a general typographical tool for virtually any language. Hmmm… is this sane enough, Dean? :)
Who’s gonna textdrive you home tonight?
Re: Textile Internalization
It surely is, and it raises much to think about. I want to do a lot of work on internationalising Textile. Let’s discuss it further when the other rhino has left the dinner table.
text*
Re: Textile Internalization
> mamash wrote:
> Htmlspecialchars destroys Czech (and possibly other) scripts
Not only Czech, but any non-ISO-8859-1 encoding.
Re: Textile Internalization
htmlspecialchars() is surprisingly shitty for a PHP function – this is why Textile relies whenever possible on multibyte string functions. Mamash is right, however, in noting that mbstring encoding is only as effective as the map it is given – and the one currently made available on the Textile demo page is still decidedly (western) Euro-centric.
Though it’s still miles (or rather kilometres) better than htmlspecialchars().
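To illustrate the point about the map, here is a small Python sketch of numeric-entity encoding driven by a list of code point ranges. It is not PHP's mbstring and not Textile's code; the function name `encode_numeric_entities` and the ranges used below are made up for the example.

```python
def encode_numeric_entities(text, ranges):
    """Encode characters as &#NNN; only if their code point is in `ranges`.

    `ranges` is a list of (low, high) code point pairs, inclusive.
    Characters outside every range are passed through untouched.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if any(low <= cp <= high for low, high in ranges):
            out.append(f"&#{cp};")
        else:
            out.append(ch)
    return "".join(out)


# A map that stops at U+00FF catches "é" but lets the Czech "ř" (U+0159) through.
western_only = [(0x80, 0xFF)]
# A map covering everything above ASCII encodes both.
everything = [(0x80, 0x10FFFF)]
```

With `western_only`, the string `"é ř"` comes out as `"&#233; ř"`; with `everything` it becomes `"&#233; &#345;"`. That is the sense in which the encoding is only as effective as the map it is given.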
text*
Re: Textile Internalization
I agree.
Still, I’m not convinced that entity encoding is necessary in general. I mean, that’s what charset declarations are good for, right?
Which brings up another issue: programming for a Firebird/PostgreSQL database interface would probably be a good thing, since Unicode support in MySQL is still alpha.
Who’s gonna textdrive you home tonight?
Re: Textile Internalization
Believe me, I’d love to remove entities from the equation altogether, but bad browsers are still stinking up the place.
Every interaction Txp has with MySQL now involves vanilla SQL passed through a single safe_query() function, thus I believe a port to PostgreSQL, if someone wants to do it, would be pretty easy to pull off.
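The idea of a single choke point can be sketched as follows. This is a Python illustration using the standard `sqlite3` module as a stand-in database, not Textpattern's safe_query(); the name `run_query` and the in-memory connection are assumptions made for the example.

```python
import sqlite3

# Stand-in backend for the sketch; a port would swap only this connection
# (and the driver import), not the callers.
_connection = sqlite3.connect(":memory:")


def run_query(sql, params=()):
    """Hypothetical single entry point: every query in the app goes through here."""
    cursor = _connection.execute(sql, params)
    rows = cursor.fetchall()
    _connection.commit()
    return rows
```

As long as callers pass plain SQL and never touch the driver directly, changing the database mostly means changing this one function.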
text*
Re: Textile Internalization
Which “bad browsers” do you mean?
(Hm, maybe I should stop bugging you now, so that you could finally publish the gamma and allow me to start playing with it…)
Who’s gonna textdrive you home tonight?
#9 2004-02-24 07:23:29
- RickCogley
Re: Textile Internalization
Re Japanese:
- basic things seem to work.
- I have three articles, but search does not work.
- some Textile markup does not work, e.g. text where text is some Japanese string.
Regards
Rick Cogley :: Tokyo Japan
#10 2004-02-24 07:27:10
- RickCogley
Re: Textile Internalization
Re Japanese: text searching is NOT working from the site main page, as of the first gamma.
However it DOES work from the admin area.
Rick Cogley :: Tokyo Japan
Re: Textile Internalization
I’ve changed utf-8 to windows-1251 in every “charset” place where it appears. I also changed it in all functions where an encoding is specified.
Everything seems to work, but any Cyrillic text that is entered gets converted to character codes (&#symbol), so the source of any page with Russian letters ends up bloated. I guess it can be easily patched, but I haven’t found where exactly yet. Maybe someone in the community has already done it?
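To put a rough number on the bloat, here is a small Python sketch comparing the size of the same Cyrillic string in three forms. It only measures byte counts and does not reproduce what Textpattern or Textile do internally; the function name `encoded_sizes` is made up for the example.

```python
def encoded_sizes(text):
    """Return the byte length of `text` in three representations."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "windows-1251": len(text.encode("cp1251")),
        # Python's "xmlcharrefreplace" handler turns each non-ASCII
        # character into a decimal numeric entity such as &#1087;
        "numeric entities": len(text.encode("ascii", "xmlcharrefreplace")),
    }


sizes = encoded_sizes("привет")
# sizes == {"utf-8": 12, "windows-1251": 6, "numeric entities": 42}
```

For this six-letter word, the entity form is seven times the size of the windows-1251 bytes, which matches the bloated page source described above.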