Textpattern CMS support forum
Public-side multi-lingual URLs
Some of the public URLs of Textpattern are affected by your admin-side language settings. If you switch to German, a category landing page could be rendered:
example.de/kategorie/artikel/my-article-title
or an author landing page:
example.de/autor/stef-dawson
This poses problems under multi-lingual installations, because anyone changing the admin-side language affects public-side URLs. There are other places this happens too, with image, link, and file landing pages, as well as file_downloads.
There’s going to be a big drive to translate strings soon, and I’d like to start paving the way for multi-lingual support to be baked into core (not necessarily an actual multi-lingual interface — it’s too early to say whether this is workable — but at the very least the ability for a simple plugin to do it without the need for a massive hack like MLP). The rendering of URLs in local language is standing in the way of that, to some degree.
It affects the following strings in the language packs:
- category
- author
- file_download
- context_article
- context_image
- context_file
- context_link
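To illustrate why these strings matter, here is a minimal sketch in Python (not Textpattern's actual PHP code) of a URL's first segment being matched against the trigger words of whichever language is active. The `TEXTPACKS` table, the language codes and `resolve_trigger` are hypothetical names for illustration only; the German words are the ones from the example URLs above.

```python
# Hypothetical, simplified language packs: string key -> translated trigger word.
TEXTPACKS = {
    "en-gb": {"category": "category", "author": "author", "context_image": "image"},
    "de-de": {"category": "kategorie", "author": "autor", "context_image": "bild"},
}


def resolve_trigger(path, admin_lang):
    """Return the string key triggered by the first URL segment, or None."""
    segments = [s for s in path.strip("/").split("/") if s]
    if not segments:
        return None
    first = segments[0].lower()
    for key in ("category", "author"):
        if TEXTPACKS[admin_lang][key] == first:
            return key
    return None


# A URL generated while the admin side was set to German...
german_url = "/kategorie/bild/das-pferd"
print(resolve_trigger(german_url, "de-de"))  # category
# ...stops triggering category behaviour once the admin language changes.
print(resolve_trigger(german_url, "en-gb"))  # None
```

The second call is the failure mode in a nutshell: the same public URL is only meaningful while the admin-side language stays put.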
We do have options:
- Force English for everyone (example.de/category/image/das-pferd). This affects every language with current strings that have translations in the list above.
- Stick with what we have. There’s a current bug that means it doesn’t work if the context_ entries are in mixed case, but I can fix that. This would mean retaining URLs of the form example.de/kategorie/bild/das-pferd, which is nicer for international users, with the caveat that the strings should never be altered in the Textpack, and you should never change admin side language. Assuming all languages have the above strings translated, this affects all installs that don’t currently have a translation for those strings. Such URLs will break in 4.6.0.
- Permit both forms (English and International). This, however, leads to possible duplicate content, which is bad for SEO, and affects the same group of sites as option 2. Plus I’ve no idea how to implement it.
- A preference setting to control how URL generation is handled, i.e. which language to render them in, irrespective of admin-side language chosen. It must be a language that you have installed, and you should not change it once set, unless you plan to do some redirects to maintain link juice. This at least gives site authors a fallback to retain the current English behaviour of their site if their language has the above strings translated prior to release, or to move forward and choose a different language. It’s quite a bit of work to retrofit Textpattern to make all the relevant tags aware of the setting, so they can render the appropriate URLs, but may be worth considering.
- Something else…?
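As a rough illustration of the preference idea (option 4), the sketch below builds a category URL from a `url_language` preference and ignores the admin-side language entirely. It is Python rather than Textpattern's PHP, and the preference name, the `TRIGGERS` table and `category_url` are all assumptions made up for this example.

```python
# Hypothetical trigger-word tables; real values would come from installed Textpacks.
TRIGGERS = {
    "en": {"category": "category", "context_image": "image"},
    "de": {"category": "kategorie", "context_image": "bild"},
}


def category_url(site, name, context, prefs, admin_lang):
    """Build a category landing-page URL.

    The language comes from the (hypothetical) 'url_language' preference.
    admin_lang is accepted only to show that it is deliberately ignored.
    """
    lang = prefs.get("url_language", "en")
    if lang not in TRIGGERS:
        # The preference must name a language that is installed.
        raise ValueError(f"language {lang!r} is not installed")
    words = TRIGGERS[lang]
    return f"{site}/{words['category']}/{words['context_' + context]}/{name}"


# German URLs are kept even though the admin side is in English.
print(category_url("example.de", "das-pferd", "image", {"url_language": "de"}, "en"))
```

Every tag that renders one of the special URLs would need to consult the preference in this way, which is where the retrofit effort comes from.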
As much as it pains me to break backwards compatibility in such a grave manner, I propose we go with Option 1: all “special” URLs that trigger Txp list or context-sensitive behavior using the above strings will be rendered in English. Failing that, option 4, although that’s a lot of work and I could do with a hand from a willing PHP volunteer to make the modifications to the tags.
To be clear, none of the above would affect the future ability to render actual section or article names in other languages for multi-lingual sites, it only affects the following special URL structures:
example.org/category/your-cat
example.org/author/some-user
example.org/file_download/1/your-file.ext
example.org/category/article/your-cat
example.org/category/image/your-image-cat
example.org/category/file/your-file-cat
example.org/category/link/your-link-cat
You would still be able to do this just fine:
example.org/butterfly/my-article
example.org/papillon/mon-article
example.org/schmetterling/meine-artikel
So, given all the potential pitfalls and barriers the current system brings to moving forward with internationalisation, can anyone see any big problems, besides the loss of some nice aesthetics in the URLs, with dropping support for these strings in the Textpacks and switching to English? Or is the notion of a preference better?
Or has anybody got any better ideas that allows us to retain support for different language strings in special URLs and move forward with multi-linguality in core, without introducing duplicate content problems or otherwise restricting the front side language to match the admin-side language?
Please chime in with your thoughts. Thanks!
The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.
Txp Builders – finely-crafted code, design and Txp
#2 2015-07-04 20:19:42
- GugUser
- Member
- From: Quito (Ecuador)
- Registered: 2007-12-16
- Posts: 1,473
Re: Public-side multi-lingual URLs
Hi Bloke
In which context can you output, according to your first example, this German URL “example.de/kategorie/artikel/my-article-title”? I’ve never seen that related to Textpattern.
Re: Public-side multi-lingual URLs
GugUser wrote #292571:
In which context can you output, according to your first example
If your admin-side is in German, that’s the URL structure that’s used. But you probably haven’t ever seen it because, until today (which is what sparked this whole post), the translation strings for category and context_* had not been done in the German Textpack. So it falls back on English. The /artikel part is optional (and assumed if no other context is given); I just gave it as an example.
A kind member of the community submitted the German translation strings the other day and it broke their category-based site URLs! It hadn’t occurred to me that the system of using translated strings was so fragile, hence this post and the call to action.
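The breakage described here can be reproduced with a small Python sketch of a string lookup that falls back to English. This is an assumption about the general shape of the behaviour, not Textpattern's real lookup code; `ENGLISH`, `lookup` and `category_url` are hypothetical names.

```python
# Hypothetical English defaults used when a Textpack lacks a translation.
ENGLISH = {"category": "category", "context_article": "article"}


def lookup(key, textpack):
    """Return the translated string, falling back to English if missing."""
    return textpack.get(key, ENGLISH[key])


def category_url(name, textpack):
    """Build a category URL from whatever the lookup currently returns."""
    return f"/{lookup('category', textpack)}/{name}"


german_before = {}                        # 'category' not yet translated
german_after = {"category": "kategorie"}  # translation submitted

print(category_url("news", german_before))  # /category/news
print(category_url("news", german_after))   # /kategorie/news
```

Nothing changed on the site itself; the URL moved purely because a translation arrived.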
Re: Public-side multi-lingual URLs
It seems to me that an ideal presentation of the urls would be two part:
If only one language is being used on the site, then the url always reflects the chosen language for the front of the site, regardless of the language used on the admin side (though most often it probably is the same as the front).
Example – Single language site is either:
example.com/kategorie/artikel/guten-morgen
or
example.com/category/article/good-morning
If multiple languages are used front side (a preference to turn on/off?), then the url scheme would consistently use the domain name followed by the ISO two letter designation, then the url would be completed entirely in the chosen language.
Example: Language 1
example.com/de/kategorie/artikel/guten-morgen
and
Example: Language 2
example.com/en/category/article/good-morning
If you roll multi-lingual friendliness into Textpattern, is it possible to make Textpattern behave this way?
If so, what might be the reasons not to do it this way? Are they compelling?
Edit: which, as I re-read your original post, is essentially option 4.
It’s more work, but seems like a higher-end solution that would present well and make Textpattern more competitive in the world of multi-lingual CMSes.
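A minimal Python sketch of the language-designator scheme, assuming the designator is always the first path segment and that the set of installed languages is known. `INSTALLED` and `split_language` are hypothetical names, not part of Textpattern.

```python
INSTALLED = {"de", "en"}  # assumed set of installed languages


def split_language(path, default="en"):
    """Split '/de/kategorie/...' into ('de', '/kategorie/...').

    If the first segment is not an installed language code, the path is
    treated as a single-language URL and the default language is used.
    """
    segments = [s for s in path.split("/") if s]
    if segments and segments[0] in INSTALLED:
        return segments[0], "/" + "/".join(segments[1:])
    return default, "/" + "/".join(segments)


print(split_language("/de/kategorie/artikel/guten-morgen"))
print(split_language("/category/article/good-morning"))
```

One thing the sketch makes visible: a section or category that happens to be named like a language code (say, a section called "de") would clash with the designator, so some rule for that case would be needed.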
Last edited by maverick (2015-07-04 21:28:50)
Re: Public-side multi-lingual URLs
Stef – did you ever have a chance to take a look at how Ionize handles multiple languages? It seems really well integrated and is fairly intuitive to use.
Last edited by maverick (2015-07-04 21:56:06)
Re: Public-side multi-lingual URLs
maverick wrote #292574:
It seems to me that an ideal presentation of the urls would be two part
You raise a good point, thanks, and this comes under the #5 “something else” category I mentioned in the OP. The current system is entirely broken because of the fact there is no language designator in the URL. You’re right that if that came into being, everything would work nicely with multi-lingual URLs.
When I scoped my plans for Txp’s multi-lingual ability I added precisely that URL schema, although my plan was to make it either partly or entirely optional, in the same way the context is optional if things are able to be determined by other means.
My original thinking was that if the person’s browser indicates they are a German user then, in the absence of any information to the contrary, the /de could be implied and they’d see the German content: no need to physically add the language designator in the URL.
If they subsequently switched language through choice, a cookie overrides the browser’s indicator and the site is delivered in the desired language. Even in that circumstance, the chosen language could in fact be omitted from the URL because it can be implied from the cookie.
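The precedence described so far (browser hint, overridden by a cookie) can be sketched in Python as below. Putting an explicit URL designator ahead of the cookie is my assumption, as are the function names; the Accept-Language parsing is deliberately simplified.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    weighted = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:
            weighted.append((quality, tag.strip().lower()))
    # sorted() is stable, so tags with equal weight keep their header order.
    return [tag for _, tag in sorted(weighted, key=lambda w: -w[0])]


def choose_language(installed, url_lang=None, cookie_lang=None,
                    accept_language="", default="en"):
    """Pick a language: URL designator, then cookie, then browser, then default."""
    for candidate in (url_lang, cookie_lang):
        if candidate in installed:
            return candidate
    for tag in parse_accept_language(accept_language):
        primary = tag.split("-")[0]  # 'de-de' -> 'de'
        if primary in installed:
            return primary
    return default


installed = {"en", "de"}
print(choose_language(installed, accept_language="de-DE,de;q=0.9,en;q=0.8"))
print(choose_language(installed, cookie_lang="en",
                      accept_language="de-DE,de;q=0.9,en;q=0.8"))
```

The first call yields the browser's language and the second the cookie's, which matches the idea that the designator can be left out of the URL whenever one of the other signals is available.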
Landing page content is different per language, so there’d be no problem with duplicate content, everything just works. If you share a link with someone who hasn’t yet set a language cookie, the fact that the article title / section name / category trigger is in the target language differentiates the desired language so that would allow us to show the intended content, even without the designator being present.
In all cases that I could think of, the language designator is superfluous. The only reason (as far as I can tell) to include it is as a visual hint to users which language they’re using, or a clue for dim-witted search engine spiders (although meta tags should be used to indicate language anyway, right?) If anyone has any thoughts on this, btw, I’d like to hear them. Maybe without a designator, RSS feeds would break? Not sure. Haven’t entirely thought it through.
My original plan, therefore, was that the language designator URL scheme was entirely optional. If you chose it from Prefs, then it’d use it and show it. If, however, you preferred not to show it, the site could still be multi-lingual using the /section/title permlink scheme because of the way we organise content and use multi-lingual trigger words. If, as I proposed in the OP, we went all English, that would be harder to achieve.
If only one language is being used on the site, then the url always reflects the chosen language for the front of the site, regardless of the language used on the admin side
The only way Txp knows which language to use on the front side, is through having the language(s) installed on the admin-side. So if only one language is installed, then by default you would always have the site strings in that language, regardless if you’d written your actual body text in Klingon.
If multiple languages are used front side (a preference to turn on/off?)
I’d rather the act of installing a language adds that language’s capability to the front-side than having to explicitly state you are intending to use them. Maybe that’s misguided, but the notion of which content you want to display on the front side should be down to your language-specific tags (such as the currently fictional <txp:language_list> tag). If you specify a list of langs, only those would be shown in the dropdown, so you could only (officially) see such content.
If you were to add a new language to the site after content in other languages had been created, in this way you could translate stuff behind the scenes without it appearing on the public side until you change your tag to include the new language designator: lang="en, de, es". It’d probably be possible to actually access any Live articles in the new language by directly entering the URL with the designator in it, but that’s no different to post-dating an article now: it’s still there, you just don’t tell anyone about it!
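A small Python sketch of how such a lang attribute might be interpreted, assuming it is a comma-separated list that gets filtered against the installed languages. The tag is fictional, so `visible_languages` and its behaviour are purely illustrative.

```python
def visible_languages(lang_attr, installed):
    """Return the languages named in the tag attribute that are installed.

    Order follows the attribute, so the tag author controls the order
    in which languages would appear in the dropdown.
    """
    requested = [code.strip() for code in lang_attr.split(",") if code.strip()]
    return [code for code in requested if code in installed]


installed = {"en", "de", "es", "fr"}
# French is installed and can be translated behind the scenes,
# but it stays out of the dropdown until the attribute names it.
print(visible_languages("en, de, es", installed))
```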
then the url would be completed entirely in the chosen language.
Yes, so Option 5 above becomes:
- Leave everything as it is now, live with the inconsistencies and potential pitfalls with multi-lingual trigger URLs, but move towards adding language designator support.
If that was the case, I’d vote for option 5 straight away.
EDIT: and no, not looked at Ionize. Thanks for the reminder. If you have time and you’d like me to share my multi-lingual vision document with you, I’d appreciate the feedback from someone who has used other systems to see if it stacks up and is doable (and, crucially that I’ve not missed anything).
Last edited by Bloke (2015-07-04 22:03:44)
#7 2015-07-04 22:08:55
- ax
- Plugin Author
- From: Germany
- Registered: 2009-08-19
- Posts: 165
Re: Public-side multi-lingual URLs
maverick wrote #292574:
If so, what might be the reasons not to do it this way? Are they compelling?
It seems to me that talking URLs are a concept of the past. Nowadays, most people probably never pay attention to URLs because they are often packed with tracking information and other stuff that is irrelevant to the user. Safari hides them anyway, and I wonder how many users would notice or even change that default setting. Therefore, localising URLs would be a waste of effort.
Re: Public-side multi-lingual URLs
ax wrote #292577:
It seems to me that talking URLs are a concept of the past.
Hehehe, well that’s Apple’s view at least! Take away more control, layer so many things on top of the OS that you can’t get to the nuts and bolts to fix it when it breaks, thus the only thing to do is buy a new device. Being a hacker, that’s not what I subscribe to, as I like to know exactly what’s going on inside my hardware.
But anyway, that’s all very well and good until search engines go down, then URLs become important, in the same way that if DNS goes down, you can still get to the site you want by using its IP (well, if you’re sad like me and know one or two :-)
But if the URL doesn’t matter, why do search engines give it such high status? And why do news sites use keyword stuffing techniques in the Title? It might not be deemed important to the user, but to the search engine spiders — and hence (indirectly) users who use search engines to find content — it still plays a vital role… at least for now, until the next wave of algorithms kick in that can more accurately spider content from body context.
Re: Public-side multi-lingual URLs
ax wrote #292577:
It seems to me that talking URLs are a concept of the past. Nowadays, most people probably never pay attention to URLs because they are often packed with tracking information and other stuff that is irrelevant to the user. Safari hides them anyway, and I wonder how many users would notice or even change that default setting. Therefore, localising URLs would be a waste of effort.
My thoughts are similar to Stef’s on the value of the url to the user, but I understand your point. I even agree that for the typical end user the url is usually not even on their radar – at least past the TLD url, and even then google preempts more often than not.
That said, urls are not merely for the end user. A url remains the essential means of reaching a page, even if it is under the hood or behind the scenes. You’ve got to have something, so you might as well set it up to be organized, logical, and as clean and humanly readable as possible.
Then you have a win-win for search engines and for those end users who care.
I’ve been involved recently in a project using Symphony CMS.
(Tangent: There are days I so wish there was a way to merge the Symphony and Textpattern projects. There’s so much overlap in philosophy, etc. Even some former Txp folks I recognize from long ago here on the forum.)
Anyway, back to my observation from Symphony. I’ve begun to learn how much I don’t understand re: the role of urls, and how much page/data processing can be done using a url. To that end, good urls matter as well.
Re: Public-side multi-lingual URLs
Bloke wrote #292576:
this comes under the #5 “something else” category I mentioned in the OP.
I obviously misunderstood #4 then when I re-read it. I’ll re-read it again :)
although my plan was to make it either partly or entirely optional, in the same way the context is optional if things are able to be determined by other means.
That sounds reasonable.
On a personal note, the part of me that is overly organized likes the language designation. Everything in its proper place and all that. However, from a readability perspective, I prefer simple and clean / less is more. To that end I’d leave them out.
It seems to me that the best of both worlds is to build the base url using the language designation, then have a preference to “hide” it. Some sort of url remapping feature either allows you to manually set a language url and/or automagically replaces the actual url with a spiffy url sans the language designation.
The only way Txp knows which language to use on the front side, is through having the language(s) installed on the admin-side. So if only one language is installed, then by default you would always have the site strings in that language, regardless if you’d written your actual body text in Klingon.
Of course. Makes sense, and I sort of knew this already – if I had thought a bit longer about it. So that part is already baked in :)
I’d rather the act of installing a language adds that language’s capability to the front-side than having to explicitly state you are intending to use them.
Ionize has a check box for using a language on the front of the site. Not sure why you would add a language and not use it, but perhaps there are use cases for more languages in the back than the front. Or perhaps it’s Ionize’s way to allow you to prepare translations for publishing behind the scenes and keep them from being accessible until they are ready to go live.
If you were to add a new language to the site after content in other languages had been created, in this way you could translate stuff behind the scenes without it appearing on the public side until you change your tag to include the new language designator: lang="en, de, es".
Makes sense.
Yes, so Option 5 above becomes: Leave everything as it is now, live with the inconsistencies and potential pitfalls with multi-lingual trigger URLs, but move towards adding language designator support.
That has my vote as well
If you have time and you’d like me to share my multi-lingual vision document with you, I’d appreciate the feedback from someone who has used other systems to see if it stacks up and is doable (and, crucially that I’ve not missed anything).
I’m interested and willing. I may not be the strongest candidate for feedback but will offer what I can. Disclaimer: The next three weeks are heavily booked. I should have more time towards the end of July.
My multilingual experience is recent. I’m part of a team asked to deploy a multilingual site with other needs that Textpattern wasn’t quite ready to handle, so I went looking for options. Ionize has a lot of Txp’s qualities, but handles multi-lingual out of the box. Unfortunately it doesn’t feel quite as mature as Txp and has an even smaller dev. team. Symphony does unlimited custom fields out of the box (every field is a custom field), and has multiple multi-lingual extensions. So we ended up going with Symphony. I’m in the middle of learning it.
Last edited by maverick (2015-07-04 23:29:42)
#11 2015-07-05 05:39:33
- ax
- Plugin Author
- From: Germany
- Registered: 2009-08-19
- Posts: 165
Re: Public-side multi-lingual URLs
Bloke wrote #292578:
But anyway, that’s all very well and good until search engines go down, then URLs become important
Get real. When should this happen? In an apocalyptic movie? Textpattern users will survive, thanks to their ingenious URLs?
But if the URL doesn’t matter, why do search engines give it such high status?
I seriously doubt that Google ranks an image higher if it is referenced in its native language, and what is the native language of an image anyway?
And why do news sites use keyword stuffing techniques in the Title?
SEO? No. This does not affect articles. With permanent link mode set to /section/id/title, it is all in your language, and beyond Textpattern’s control (well, except the id).
Re: Public-side multi-lingual URLs
For the (average) end user, the URL is just a black box, except maybe for the domain name. Like Peter (Ax), I personally don’t think it is worth the effort to localise those special URLs. And BTW, you can’t have nice category1 or category2 in your URL when you run TXP in CJK languages (and other non-roman languages, I think) [1]. So on that count, I’m leaning towards option 1 (that might mess up /category URLs on one of my sites, but I don’t worry too much about that, those URLs are no-index anyway; might need some htaccess magic for end users’ bookmarked URLs).
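The "htaccess magic" would in practice be web server rewrite rules, but the mapping it has to perform can be sketched in Python. The tables and `redirect_target` are hypothetical, and only the German words that appear earlier in this thread are included.

```python
# Hypothetical maps of previously translated trigger words to English ones.
LEGACY_TRIGGERS = {"kategorie": "category", "autor": "author"}
LEGACY_CONTEXTS = {"artikel": "article", "bild": "image"}


def redirect_target(path):
    """Return the English path to redirect to, or None if none is needed."""
    segments = [s for s in path.split("/") if s]
    if not segments or segments[0] not in LEGACY_TRIGGERS:
        return None
    segments[0] = LEGACY_TRIGGERS[segments[0]]
    # Only category URLs carry an optional context segment.
    if (segments[0] == "category" and len(segments) > 2
            and segments[1] in LEGACY_CONTEXTS):
        segments[1] = LEGACY_CONTEXTS[segments[1]]
    return "/" + "/".join(segments)


print(redirect_target("/kategorie/bild/das-pferd"))  # /category/image/das-pferd
print(redirect_target("/category/image/das-pferd"))  # None
```

A permanent (301) redirect to the returned path would keep old bookmarks working after a switch to English trigger words.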
Now, if you are thinking about a truly multilingual site (same articles in multiple languages) then a URL structure of the type example.com/de/category/category_name probably makes some sense at organisation level. But you need to go further, and include example.com/jp/section/article-title and so on. Apple.com uses that scheme at all levels.
PS – Here is another one that is messy for the end user: run Textpattern in Japanese; for category you’ll get a URL like this: /カテゴリ/foo/ (aka /category/foo). It displays fine in the location bar. If you bookmark that page in Safari and have a look at your bookmark manager, the URL is displayed as is. Now try that in Firefox: you get /%E3%82%AB%E3%83%86%E3%82%B4%E3%83%AA/foo
Nice and useful, eh?
I think search engines will show you /カテゴリ though. But I seriously doubt they give much weight to having that part of the URL localised. What matters, for the URL, is the article-title, if present in the URL.
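The encoded form is simply the UTF-8 bytes of the Japanese word written as percent escapes. A short Python check with the standard library reproduces it:

```python
from urllib.parse import quote, unquote

path = "/カテゴリ/foo"

# quote() percent-encodes the UTF-8 bytes and leaves '/' untouched by default.
encoded = quote(path)
print(encoded)  # /%E3%82%AB%E3%83%86%E3%82%B4%E3%83%AA/foo

# Decoding restores the readable form, so both spellings are the same URL.
print(unquote(encoded) == path)  # True
```

So the two bookmark displays point to the same address; the difference is only in whether the browser decodes it for display.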
– – –
[1] What we do for a Japanese site is have the category_name romanised (as in: ringo with category_title “林檎”) and the whole site runs on messy URLs.
Where is that emoji for a solar powered submarine when you need it ?
Sand space – admin theme for Textpattern