Textpattern CMS support forum

#1 2012-10-10 09:14:27

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,450
Website GitHub

The multi-lingual Textpattern experience

Following on from the MLP thread, this is a place to discuss the technical, administrative, and workflow characteristics of what a multi-lingual Textpattern experience should be.

Turning Textpattern from being able to publish in one language to being able to publish in many (and the core vs plugin debate that ensues) should be a priority. While we encourage people to step in and offer direction, advice and scenarios so that the technical merits of each can be discussed, this is not a soapbox for “MLP should be core”. Arguably, some aspects of the current MLP could become a core feature or at the very least a lightweight plugin, while others (snippets, for example) may be deemed more of a bolt-on module. A multi-lingual interface is perhaps not something to force upon everyone (or perhaps it is, if it’s done right), but it should definitely be easier for a plugin to offer such a feature than it currently is.

Some of the problems in the current MLP are simply that it pre-dates a lot of good stuff in core. In the light of Textpacks, for example, snippets could be completely re-engineered with a fraction of the code. With the addition of an ‘owner’ for strings (planned in 4.6) this is simplified even further. A lot of great stuff has been put into Txp 4.5 (such as the pub/sub hub) which something like MLP could seriously take advantage of.

The major roadblocks and opportunities I can see after spending days in the guts of the plugin code are:

  1. Hacking txplib_db.php to offer language marker injection is inexcusable. We should look to augment the existing safe_* routines with perhaps:
    1. A conjoined query construction API, such as those offered by Escher.
    2. A callback in safe_query that allows the query parts to be modified before submission (not sure if this would slow things down too much).
  2. Additional hooks on the admin side may be required to inject content. Most of this, I feel, is now done with pluggable UI and the Partials framework; it’s just that MLP doesn’t (currently) take advantage of them to the best of its ability.
  3. Better support internally for custom fields. This is already slated for 4.6+ and I have some ideas on how to make the core arrays more suitable for multi-lingual skulduggery.
  4. The whole ‘rendition’ idea of having a ‘master’ article and its translations is great, but if there was a way to make the tables easier to manage (for example, nesting articles using a ‘parent’ and adding a single ‘language’ column to the Textpattern table, instead of using a bunch of separate tables) then it should be assessed.
  5. The interface for managing and translating image and file metadata needs work.
  6. Look into interface improvements to make LTR/RTL more seamless.
  7. The ability for Textpattern to natively understand language markers as part of its URI scheme would take a lot of load off a plugin — whether this is possible is unexplored at present.
  8. Take advantage of the pub/sub hub and Textpacks as mentioned above.
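
To make the callback idea in point 1.2 a bit more concrete, here is a minimal sketch. It is not Textpattern's API: `QueryFilters`, `build_select()` and the `lang` column are hypothetical names invented purely for illustration, and whether a hook like this is fast enough is exactly the open question raised above.

```php
<?php
// Hypothetical registry of query filters; nothing here exists in Textpattern core.
final class QueryFilters
{
    private static $filters = array();

    public static function register(callable $filter)
    {
        self::$filters[] = $filter;
    }

    // Pass the query parts through every registered filter in order.
    public static function apply(array $parts)
    {
        foreach (self::$filters as $filter) {
            $parts = call_user_func($filter, $parts);
        }

        return $parts;
    }
}

// Assemble a SELECT only after plugins have had the chance to adjust its parts.
function build_select($columns, $table, $where)
{
    $parts = QueryFilters::apply(array(
        'columns' => $columns,
        'table'   => $table,
        'where'   => $where,
    ));

    return 'SELECT '.$parts['columns'].' FROM '.$parts['table'].' WHERE '.$parts['where'];
}

// Example plugin filter: limit article queries to French renditions.
// The `lang` column is an assumption taken from point 4 above.
QueryFilters::register(function (array $parts) {
    if ($parts['table'] === 'textpattern') {
        $parts['where'] = '('.$parts['where'].") AND lang = 'fr'";
    }

    return $parts;
});

$articles = build_select('ID, Title', 'textpattern', 'Status = 4');
$sections = build_select('name', 'txp_section', '1 = 1');
```

Here only the article query picks up the language condition; the section query passes through untouched, which is the behaviour a plugin would want instead of patching txplib_db.php.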

Ideas, code fragments, and discussion on any of the above are welcome and encouraged. Textpattern has the power and flexibility to offer this kind of functionality very, very easily and integrate it into the “Write first” workflow. It just needs some clever thinking. Please help us think cleverer!

Last edited by Bloke (2012-10-10 09:27:35)


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

#2 2012-10-10 12:43:43

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,091
Website GitHub Mastodon Twitter

Re: The multi-lingual Textpattern experience

I’m not using MLP but always felt that stronger support for multi-lingual sites should eventually come to the core to reflect the interculturality of this community. Having said that, here are some additions to the above.

Further to the interface for managing and translating image and file metadata, it would be good to think about similar interfaces to translate sections, categories, authors, custom_fields (including excerpt, and keywords).


Yiannis
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.

#3 2012-10-10 12:54:22

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,450
Website GitHub

Re: The multi-lingual Textpattern experience

colak wrote:

similar interfaces to translate sections, categories, authors, custom_fields (including excerpt, and keywords).

Well, excerpts, keywords and CFs are covered by MLP now because you translate “an article” which encompasses all those things. As for sections and categories, I’m not sure what translation needs to occur. They’re both URL components so in theory are governed by the RFC about acceptable chars (although these are somewhat relaxed by browsers nowadays). Txp already (rightly or wrongly) converts ‘category’ into the local language in the URL so I suppose what you advocate is being able to specify that site.com/en/category/sheep and site.com/fr/catégorie/mouton are the same, right? Might be tricky to implement but it’s something to think about.
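
One way to picture that equivalence is a pair of lookup tables that map each localized URL segment back to a canonical name. This is only a sketch under assumptions: `resolve_localized_path()` and both maps are hypothetical, the data would really live in the database, and Textpattern's own URL handling is not shown.

```php
<?php
// Hypothetical lookup tables keyed by language, then by localized URL segment.
$segment_map = array(
    'en' => array('category' => 'category'),
    'fr' => array('catégorie' => 'category'),
);
$category_map = array(
    'en' => array('sheep' => 'sheep'),
    'fr' => array('mouton' => 'sheep'),
);

// Resolve /lang/type/name to canonical values, or null when anything is unknown.
function resolve_localized_path($path, array $segment_map, array $category_map)
{
    $bits = explode('/', trim(rawurldecode($path), '/'));

    if (count($bits) !== 3) {
        return null;
    }

    list($lang, $type, $name) = $bits;

    if (!isset($segment_map[$lang][$type], $category_map[$lang][$name])) {
        return null;
    }

    return array(
        'lang' => $lang,
        'type' => $segment_map[$lang][$type],
        'name' => $category_map[$lang][$name],
    );
}

$en = resolve_localized_path('/en/category/sheep', $segment_map, $category_map);
$fr = resolve_localized_path('/fr/cat%C3%A9gorie/mouton', $segment_map, $category_map);
```

Both paths resolve to the same canonical type and name and differ only in `lang`, so the rest of the page build could stay language-agnostic.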

As for authors I don’t know. An author’s name is her name in any language isn’t it? Or have I missed the point?

Last edited by Bloke (2012-10-10 12:55:22)


#4 2012-10-10 13:36:53

phiw13
Plugin Author
From: Japan
Registered: 2004-02-27
Posts: 3,196
Website

Re: The multi-lingual Textpattern experience

Bloke wrote:

An author’s name is her name in any language isn’t it? Or have I missed the point?

Oh, I don’t know… Take a bilingual En-Jpn site. On the Jpn side of things you’ll want to display the author’s name with kanji & kana (even for non-jpn authors), whereas on the En side, you’ll write the names with roman characters.

(still have to go through the rest of your initial post)


Where is that emoji for a solar powered submarine when you need it ?
Sand space – admin theme for Textpattern

#5 2012-10-10 13:38:49

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,450
Website GitHub

Re: The multi-lingual Textpattern experience

phiw13 wrote:

On the Jpn side of things you’ll want to display the author’s name with kanji & kana (even for non-jpn authors), whereas on the En side, you’ll write the names with roman characters.

Fair enough. Consider it considered!


#6 2012-10-10 18:09:28

makss
Plugin Author
From: Ukraine
Registered: 2008-10-21
Posts: 355
Website

Re: The multi-lingual Textpattern experience

Bloke wrote:

The whole ‘rendition’ idea of having a ‘master’ article and its translations is great, but if there was a way to make the tables easier to manage (for example, nesting articles using a ‘parent’ and adding a single ‘language’ column to the Textpattern table, instead of using a bunch of separate tables) then it should be assessed.

One option for supporting multiple languages in the textpattern table:

Prefs:
1. Add a `main_lang` pref (en, de, fr, …) that sets the default language for the frontend.
2. Checkboxes determine which fields in the textpattern table need translating (title, body/body_html, Excerpt/Excerpt_html, Keywords, custom_1, … custom_xx). A field that contains a numeric value, for example, does not need translating. Store the selection in prefs as a comma-separated list.
3. Some interface for adding and deleting languages, stored in prefs.

DB, table textpattern:
4. For every additional language, add a single ‘language’ column with the prefix lang_, e.g. lang_de, lang_ru, of type mediumtext or longtext.

Article edit/store for additional language:
5. Build an edit form with the fields defined in #2 (JavaScript/AJAX in the Write tab).
6. Store it in the lang_ field as serialize($additionallang[])

Change the article tag to support multiple languages:
if ($current_lang !== $main_lang) {
    // Read the lang_ column for the current language, unserialize() it,
    // and override the $thisarticle fields defined in #2.
}

That may be all.
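
As a rough, self-contained sketch of the override step: `apply_translation()` and the sample article are hypothetical, the array stands in for `$thisarticle`, and the `allowed_classes` option assumes PHP 7 or later.

```php
<?php
// Overlay the translated fields for $current_lang onto a copy of the article.
// Only fields listed in $translatable (the checkboxes from #2) are replaced.
function apply_translation(array $article, $current_lang, $main_lang, array $translatable)
{
    if ($current_lang === $main_lang) {
        return $article;
    }

    $column = 'lang_'.$current_lang;

    if (empty($article[$column])) {
        return $article; // no rendition stored for this language
    }

    // Refuse to instantiate objects from stored data.
    $translated = unserialize($article[$column], array('allowed_classes' => false));

    if (!is_array($translated)) {
        return $article;
    }

    foreach ($translatable as $field) {
        if (isset($translated[$field]) && $translated[$field] !== '') {
            $article[$field] = $translated[$field];
        }
    }

    return $article;
}

$article = array(
    'title'    => 'Sheep',
    'body'     => 'All about sheep.',
    'custom_1' => '42',
    'lang_de'  => serialize(array(
        'title'    => 'Schafe',
        'body'     => 'Alles über Schafe.',
        'custom_1' => '99',
    )),
);

$german  = apply_translation($article, 'de', 'en', array('title', 'body'));
$english = apply_translation($article, 'en', 'en', array('title', 'body'));
```

Because `custom_1` is not in the translatable list, it keeps its main-language value even though the stored rendition carries a different one.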

Last edited by makss (2012-10-10 18:13:44)


aks_cron : Cron inside Textpattern | aks_article : extended article_custom tag
aks_cache : cache for TxP | aks_dragdrop : Drag&Drop categories (article, link, image, file)

#7 2012-10-10 19:21:16

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,450
Website GitHub

Re: The multi-lingual Textpattern experience

makss

Nice ideas, but wouldn’t we lose the ability to search articles if all the data was serialized in columns?

Some interface for add/delete new language, store in prefs.

This is handled by core in the Languages panel. With suitable hooks a multi-lingual plugin could be notified of additions/deletions that way, yes?


#8 2012-10-11 04:51:45

wet
Developer Emeritus
From: Schoerfling, Austria
Registered: 2005-06-06
Posts: 3,330
Website Mastodon

Re: The multi-lingual Textpattern experience

Bloke wrote:

The whole ‘rendition’ idea of having a ‘master’ article and its translations is great, but if there was a way to make the tables easier to manage (for example, nesting articles using a ‘parent’ and adding a single ‘language’ column to the Textpattern table, instead of using a bunch of separate tables) then it should be assessed.

We will implement a “meta” store in 4.6 and use it for unlimited custom fields. If you abstract this feature it provides an apparatus to relate additional data to a single article. This additional data may as well be a rendition of an article’s property in another language.
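
To illustrate the abstraction only (the real 4.6 schema is not described here): a sketch in which each value is keyed by article, field name and language. `ArticleMeta` and its methods are hypothetical, and the array is an in-memory stand-in for whatever table the meta store ends up using.

```php
<?php
// Hypothetical in-memory stand-in for a meta table keyed by article, field and language.
final class ArticleMeta
{
    private $rows = array();

    public function set($article_id, $name, $lang, $value)
    {
        $this->rows[$article_id][$name][$lang] = $value;
    }

    // Return the value for $lang, else the fallback language, else null.
    public function get($article_id, $name, $lang, $fallback_lang)
    {
        if (isset($this->rows[$article_id][$name][$lang])) {
            return $this->rows[$article_id][$name][$lang];
        }

        if (isset($this->rows[$article_id][$name][$fallback_lang])) {
            return $this->rows[$article_id][$name][$fallback_lang];
        }

        return null;
    }
}

$meta = new ArticleMeta();
$meta->set(12, 'title', 'en', 'Sheep');
$meta->set(12, 'title', 'fr', 'Mouton');

$french = $meta->get(12, 'title', 'fr', 'en');
$german = $meta->get(12, 'title', 'de', 'en'); // no German rendition, falls back
```

Seen this way, a translated title is just another piece of data related to article 12, which is the point being made about reusing the custom field apparatus.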

How does MLP currently handle article feeds?

#9 2012-10-11 05:05:32

makss
Plugin Author
From: Ukraine
Registered: 2008-10-21
Posts: 355
Website

Re: The multi-lingual Textpattern experience

Bloke wrote:

Nice ideas, but wouldn’t we lose the ability to search articles if all the data was serialized in columns?

I need to check MySQL search, but I think it will work.

This is handled by core in the Languages panel. With suitable hooks a multi-lingual plugin could be notified of additions/deletions that way, yes?

I am not sure whether to bind strictly to the installed languages. Perhaps it could be possible to use a language without installing its lang pack.

For the frontend I usually use a simple table of languages.

name          lang_en   lang_de   lang_ru
some_string   English             Русский
some_string2  English2  Deutsch2  Русский2

Load lang:

function aks_text_load(){
	global $aks_text;
	$aks_text = array();
	// Column holding the active language, e.g. lang_de
	$lang = "lang_".get_current_lang();
	// Select the translated value, falling back to lang_en when it is empty
	if( $rs = safe_rows_start("name, IF($lang='', lang_en, $lang) AS val", 'aks_text', '1=1') ){
		while ($a = nextRow($rs)) {
			$aks_text[$a['name']] = $a['val'];
		}
	}
}

If a string is not translated, it is taken from the `lang_en` field. To do: replace the hard-coded `lang_en` with $prefs['main_lang'].
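
A self-contained sketch of the same fallback with the main language passed in instead of hard-coded. `load_strings()` is a hypothetical helper and the `$rows` array stands in for the aks_text table, so no database is needed to try it.

```php
<?php
// Build a name => text map for $current_lang, falling back to $main_lang
// whenever the translated cell is missing or empty.
function load_strings(array $rows, $current_lang, $main_lang)
{
    $column   = 'lang_'.$current_lang;
    $fallback = 'lang_'.$main_lang;
    $strings  = array();

    foreach ($rows as $row) {
        $value = isset($row[$column]) ? $row[$column] : '';

        if ($value === '' && isset($row[$fallback])) {
            $value = $row[$fallback];
        }

        $strings[$row['name']] = $value;
    }

    return $strings;
}

// Same data as the table above; some_string has no German translation.
$rows = array(
    array('name' => 'some_string',  'lang_en' => 'English',  'lang_de' => '',         'lang_ru' => 'Русский'),
    array('name' => 'some_string2', 'lang_en' => 'English2', 'lang_de' => 'Deutsch2', 'lang_ru' => 'Русский2'),
);

$german = load_strings($rows, 'de', 'en');
```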


#10 2012-10-11 05:19:34

wet
Developer Emeritus
From: Schoerfling, Austria
Registered: 2005-06-06
Posts: 3,330
Website Mastodon

Re: The multi-lingual Textpattern experience

makss wrote:

I need to check MySQL search, but I think it will work.

Providing a full text search for all article fields will lead to data leakage for sites which store sensitive information in custom fields. You’d basically have to mimic the current implementation for any translated field.

#11 2012-10-11 07:02:31

makss
Plugin Author
From: Ukraine
Registered: 2008-10-21
Posts: 355
Website

Re: The multi-lingual Textpattern experience

wet wrote:

Providing a full text search for all article fields will lead to data leakage for sites which store sensitive information in custom fields. You’d basically have to mimic the current implementation for any translated field.

There is no way to do this without some additional overhead. I see two ways to resolve this issue.

Search in two phases:
  1. A bulk SQL LIKE on the serialized lang_ fields, which returns a limited data set
  2. A PHP filter: unserialize() and strpos/preg_match on the allowed fields
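
The two phases could look roughly like this. It is a sketch under assumptions: `search_translations()` is hypothetical, a plain substring test stands in for the SQL LIKE so the example runs without a database, `stripos()` is only case-insensitive for ASCII, and `allowed_classes` assumes PHP 7 or later.

```php
<?php
// Return the IDs of rows whose translation for $lang matches $needle
// in one of the fields that is allowed to be searched.
function search_translations(array $rows, $lang, $needle, array $allowed_fields)
{
    $column = 'lang_'.$lang;
    $hits   = array();

    foreach ($rows as $row) {
        // Phase 1: cheap match on the serialized blob (stand-in for SQL LIKE).
        if (!isset($row[$column]) || stripos($row[$column], $needle) === false) {
            continue;
        }

        // Phase 2: unserialize and check only the allowed fields.
        $fields = unserialize($row[$column], array('allowed_classes' => false));

        if (!is_array($fields)) {
            continue;
        }

        foreach ($allowed_fields as $field) {
            if (isset($fields[$field]) && stripos($fields[$field], $needle) !== false) {
                $hits[] = $row['ID'];
                break;
            }
        }
    }

    return $hits;
}

$rows = array(
    array('ID' => 1, 'lang_de' => serialize(array('title' => 'Schafe', 'custom_1' => 'geheim'))),
    array('ID' => 2, 'lang_de' => serialize(array('title' => 'Ziegen', 'custom_1' => 'Schafe intern'))),
);

$hits = search_translations($rows, 'de', 'schafe', array('title', 'body'));
```

Article 2 survives phase 1 because the term sits in its custom field, but phase 2 drops it, which is how the leakage concern would be addressed.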

Another option is stemming, based on a library such as phpMorphy or similar.

Create a new table, txp_search, to store the meaningful, normalized forms of words.

Structure:

article_id
lang
normalize_text      mediumtext

UNIQUE KEY `uniq_idx` (article_id, lang),
KEY `article_id` (`article_id`)

normalize_text = normalize_func( join(' ', $data_from_all_allowed_fields_per_lang[] ) )

function normalize_func( $txt ){
	global $allowed_lang_part;   // parts of speech worth indexing, defined elsewhere
	$out=array();
	$txt = strip_JS_CSS_HTML_noText($txt);     // all [^\w] is stripped too
	$txt = strip_short_word($txt);   // drop words of <4 letters
	foreach( do_list($txt, ' ') as $word){
		$lang_part = get_lang_part($word);    // noun, verb, adjective, ...
		if( in_array($lang_part, $allowed_lang_part) ){
			$out[] = get_base_form_word( $word );   // stem to the base form
		}
	}
	return join(' ', array_unique($out) );
}

Fill the txp_search table on save_article, or through a separate action such as rebuild_all_txp_search.

On search:

$se_norm = get_base_form_words( all_search_words );
select article_id, lang from txp_search where normalize_text like(%....%)    // may need a join to the textpattern table to check other conditions (article status, category, section, ...)

Last edited by makss (2012-10-11 07:34:02)


#12 2012-10-11 07:12:47

wet
Developer Emeritus
From: Schoerfling, Austria
Registered: 2005-06-06
Posts: 3,330
Website Mastodon

Re: The multi-lingual Textpattern experience

makss wrote:

I see two ways to resolve this issue.

Please keep in mind that we need the matching terms in their surrounding context for <txp:search_result_excerpt />.
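
To show why the original text is still needed alongside any normalized index, here is a small sketch that pulls a matching term out with its neighbouring words. It is not how `<txp:search_result_excerpt />` is implemented; `excerpt_around()` is a hypothetical helper and the matching is ASCII case-insensitive only.

```php
<?php
// Return the first word containing $term plus up to $radius words on each side,
// or null when the term does not occur in the text.
function excerpt_around($text, $term, $radius = 3)
{
    $words = preg_split('/\s+/', trim(strip_tags($text)), -1, PREG_SPLIT_NO_EMPTY);

    foreach ($words as $i => $word) {
        if (stripos($word, $term) !== false) {
            $start  = max(0, $i - $radius);
            $length = $i - $start + $radius + 1;

            return implode(' ', array_slice($words, $start, $length));
        }
    }

    return null;
}

$body    = '<p>The farmer moved his sheep to the upper field before winter.</p>';
$excerpt = excerpt_around($body, 'sheep', 2);
```

A table holding only stemmed, de-duplicated words has lost this word order, so the excerpt would have to be built from the stored article fields in the matching language.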
