Textpattern CMS support forum
#16 2005-11-05 14:19:53
- igner
- Plugin Author

- Registered: 2004-06-03
- Posts: 337
Re: [issue] Special HTML characters (<, >, &) in article titles
loid – It’s a matter of entity encoding, and has been discussed here and elsewhere.
And then my dog ate my badger, and the love was lost.
Re: [issue] Special HTML characters (<, >, &) in article titles
In this issue, 4.0.2 breaks backward compatibility.
I’m thinking out loud:
For future releases:
- if someone wants to use special characters (like <, >, &) in article titles, he/she should type them directly as HTML entities (<code>&lt;</code>, <code>&gt;</code>, <code>&amp;</code>), and not expect TXP to do the conversion if they are typed as <, > or &.
- allow the use of HTML elements in article titles (and in section/category titles). This adds more flexibility to TXP, and also doesn’t break backward compatibility.
Finally, I quote myself:
<blockquote>when I include an ampersand in the article title, I do it by typing <code>&amp;</code>.
Then, I save my article.
But if I’m going to edit the article again, in the article title input field, the <code>&amp;</code> has been removed and simply replaced by &.</blockquote>
If I input <code>&amp;</code>, then, when re-editing the article, I get a & (not encoded, as I typed it before). My question: is this a “fault” of TXP, the database or the browser?
I mean, who is responsible for changing what I originally typed (<code>&amp;</code>) into something else (&)?
Thanks.
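One plausible explanation for the round trip described above, sketched in Python rather than TXP's PHP: if the stored title is written into the input's value attribute without being escaped again, the browser decodes the entity before showing the field. How TXP actually builds the edit form is not stated in this thread, so `render_value_attr`, `shown_in_field` and the sample title are illustrative assumptions, and `html.unescape` only stands in for the browser's entity decoding.

```python
from html import escape, unescape


def render_value_attr(stored, escape_output):
    """Text placed inside value="..." of the title input (hypothetical helper)."""
    return escape(stored, quote=True) if escape_output else stored


def shown_in_field(attr_text):
    """Approximate what the browser displays: entities in attribute values are decoded."""
    return unescape(attr_text)


stored = "Salt &amp; Pepper"  # exactly what the author typed and saved (made-up title)

# Written out as-is, the browser turns &amp; into & in the field.
unescaped_view = shown_in_field(render_value_attr(stored, escape_output=False))

# Escaped once more on output, the field shows the originally typed text again.
escaped_view = shown_in_field(render_value_attr(stored, escape_output=True))

print(unescaped_view)
print(escaped_view)
```

If this assumption holds, neither the database nor the browser is at fault: the text is simply decoded one time more than it was encoded.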
Last edited by maniqui (2005-11-05 15:18:00)
#18 2005-11-05 21:47:51
- zem
- Developer Emeritus

- From: Melbourne, Australia
- Registered: 2004-04-08
- Posts: 2,579
Re: [issue] Special HTML characters (<, >, &) in article titles
<blockquote>if someone wants to use special characters (like <, >, &) in article titles, he/she should type it directly as an html entity (<code>&lt;</code>, <code>&gt;</code>, <code>&amp;</code>), and dont expect that TXP does the conversion if they are typed as <, > or &.</blockquote>
That’s the opposite of most users’ expectations. It’s different to the behaviour of other fields like excerpt and body. It means that we rely on the user for RSS and Atom validity. And there are many named entities that are valid XHTML but invalid in an XML feed.
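The point about named entities can be checked with any plain XML parser. This is a minimal Python sketch, assuming a feed is parsed as XML without an HTML DTD; `parses_as_xml` and the sample titles are made up for illustration, and real feed readers may be more forgiving.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(title_text):
    """True if the text can sit inside an XML <title> element unchanged."""
    try:
        ET.fromstring("<title>" + title_text + "</title>")
        return True
    except ET.ParseError:
        return False


# &eacute; and &nbsp; are defined by the XHTML DTD, not by XML itself.
# &amp; is one of XML's five predefined entities; &#233; is a numeric reference.
samples = ["caf&eacute;", "caf&#233;", "a&nbsp;b", "a &amp; b"]
for sample in samples:
    print(sample, parses_as_xml(sample))
```

Named entities typed by hand therefore only work in a feed if they happen to be among the five that XML predefines.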
Alex
Re: [issue] Special HTML characters (<, >, &) in article titles
maniqui wrote:
if someone wants to use special characters (like <, >, &) in article titles, he/she should type it directly as an html entity (<code>&lt;</code>, <code>&gt;</code>, <code>&amp;</code>), and don’t expect that TXP does the conversion if they are typed as <, > or &.
As Zem said, we want the opposite: the simplest way. That is, if we want a >, we type a >. If that’s not valid for some technico-geeky-mysterious reason, let the software encode it properly (need I remind you that most users don’t know what an XML boundary character is, and even more don’t care?).
allow the use of HTML elements in article titles (and in section/category titles). This adds more flexibility to TXP, and also doesn’t break backward compatibility.
If that doesn’t break anything, yep, why not. I’m more inclined to allow Textile in titles (I personally make heavy use of book titles in article titles, and miss the appropriate tag, ??cite??).
Re: [issue] Special HTML characters (<, >, &) in article titles
> Etz Haim wrote:
<blockquote>For example, an article is titled “a > b is a conditional”, and <txp:page_title separator=" :: " /> outputs:
<title> My Site Name :: a > b is a conditional </title>
which is invalid XHTML.
</blockquote>
I have tested (using 4.0.1, but I’m sure it’s the same in 4.0.2) that the above is valid XHTML. I didn’t receive any errors when using > or < in titles. I also included an &, but it was correctly escaped to an HTML numeric entity.
When I run the page through validator.w3.org, it says: “This Page Is Valid XHTML 1.0 Strict!”
OK, I must admit I get some warnings, but that doesn’t mean “invalid”, or does it?
Quoted from the Mark-up Validation Service:
<blockquote>Warning Line 14 column 43: character “<” is the first character of a delimiter but occurred as data.
<code><title>A > B & C < D | </title></code>
This message may appear in several cases:
- You tried to include the “<” character in your page: you should escape it as “<code>&lt;</code>”
- You used an unescaped ampersand “&”: this may be valid in some contexts, but it is recommended to use “<code>&amp;</code>”, which is always safe.
- Another possibility is that you forgot to close quotes in a previous tag.</blockquote>
HTML Tidy for Firefox shows those nice green icons that read “0 errors / 0 warnings”.
But it’s very possible that I’m wrong and that the “warning” message from the validation service deserves more attention and respect.
Or maybe not, and it means nothing dangerous, and we can keep using > and < in our page titles without needing to escape them to HTML entities.
So, in the meantime, maybe HTML elements in article/section/category titles can be restored, and < and > can also be used without escaping them.
If the behaviour is reverted:
- if someone wants to use < and >, he can use them and still have valid code.
- if someone wants to include HTML elements in article/section/category titles, he can also use them (but remember to follow the tip above to remove them from the <code><title></code> tag by using the sab_striphtml plugin).
Am I missing something? (I have no doubt I am missing something)
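One way to see which of the three characters a strict XML parser refuses as plain data is to try each one on its own. This is a rough Python check, not what the W3C validator does internally; `rejected_characters` is a made-up helper, and well-formedness under an XML parser is only one yardstick next to the validator's warnings quoted above.

```python
import xml.etree.ElementTree as ET


def rejected_characters(title_text, candidates="<>&"):
    """Return the candidate characters present in the title that a strict
    XML parser refuses when they appear unescaped as character data."""
    rejected = []
    for ch in candidates:
        if ch not in title_text:
            continue
        try:
            # Test the character on its own so one failure does not hide another.
            ET.fromstring("<title>a " + ch + " b</title>")
        except ET.ParseError:
            rejected.append(ch)
    return rejected


print(rejected_characters("A > B & C < D |"))
```

Under this check a bare > gets through, while a bare < or & does not, which would make pages served as real XML, and feeds, the places where unescaped titles hurt most.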
Last edited by maniqui (2005-11-08 12:18:07)
Re: [issue] Special HTML characters (<, >, &) in article titles
Yes, it is invalid. Just think what a combination of < and > could do, ie. in a more complex formula.
Last edited by Etz Haim (2005-11-08 09:05:21)
VC3 :: weblog :: my wishlist
Re: [issue] Special HTML characters (<, >, &) in article titles
Etz Haim wrote:
<blockquote>Yes, it is invalid. Just think what a combination of < and > could do, ie. in a more complex formula.</blockquote>
When you say “a more complex formula”, are you talking about math and logical formulae like the “a > b” you gave as an example in the first post?
My argument
If you need to write more complex formulae, surely you will need more symbols than just <, > and =.
For sure, you will want to use formula symbols like:
<big>*⊇ ≥ ≡ ≠ φ ∞ ƒ ↔*</big> and many other symbols.
Do you have those symbols on your keyboard? I suppose you don’t, so how will you put them in your article title?
You will do the same as I did to display those symbols in this post:
you will type them as HTML entities:
<code>&supe; &ge; &equiv; &ne; &phi; &infin; &fnof; &harr;</code>
So, this is my logic: if you need to type those symbols as HTML entities, why won’t you also type < and > as HTML entities? Won’t you type them as entities just because you have them on your keyboard? Lazy boy! ;)
Well, as a second argument, I would ask: aren’t math/logical formulae out of the scope of a simple article title? I mean: aren’t formulae a very rare case in article titles? Won’t the need for formulae in titles apply only to a very small niche of TXP users?
Is it very common to have “a > b”-like titles in your posts? I don’t know, I’m just asking.
One counterargument to my argument
Sure, many users will use > and < in a context that is not a math formula.
Are article titles like “It’s my birthday > buy me a present!” in common use? (I don’t know.)
If users want to type > and < directly from the keyboard, they should accept that they will have invalid code in their <code><title></code> and elsewhere on their site.
Finally, I ask, have we lost the power of using HTML elements in our article/section/category titles just for the sake of escaping <, > and &?
I’m asking out of my own ignorance, because I don’t know how dangerous it could be to have a few unescaped characters in our code.
And there is no need to expect TXP to do the job of escaping those characters: just learn to type <code>&lt;</code>, <code>&gt;</code> and that’s all you need to know.
Thanks <small>and excuse my barbarian English and some lack of consistency behind my logic</small>
Re: [issue] Special HTML characters (<, >, &) in article titles
That’s what Unicode is for. Titles are supposed to handle complex glyphs.
How do you type them? Well, the same way you type them everywhere… if you need help, upm_quicktags does this wonderfully.
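To make the Unicode point concrete: the named entities listed earlier in the thread decode to ordinary Unicode characters, and those characters can also be written as numeric references, which need no HTML DTD. A small Python illustration; it says nothing about how TXP stores titles.

```python
from html import unescape

named = "&supe; &ge; &equiv; &ne; &phi; &infin; &fnof; &harr;"
typed = "⊇ ≥ ≡ ≠ φ ∞ ƒ ↔"  # the same symbols entered directly as Unicode text

# Decoding the named entities yields the characters themselves.
decoded = unescape(named)

# Any non-ASCII character can be emitted as a numeric character reference.
numeric = typed.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(decoded == typed)
print(numeric)
```

So a title typed as real Unicode characters does not depend on named entities at all.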
#24 2005-11-13 23:31:07
- zem
- Developer Emeritus

- From: Melbourne, Australia
- Registered: 2004-04-08
- Posts: 2,579
Re: [issue] Special HTML characters (<, >, &) in article titles
maniqui: it’s still not clear to me what you’re proposing. Send us some code that does what you’re suggesting and we’ll take a look at it.
Otherwise, I think this has strayed a long way from being a bug report.
Alex
Re: [issue] Special HTML characters (<, >, &) in article titles
> zem wrote:
> maniqui: it’s still not clear to me what you’re proposing. Send us some code that does what you’re suggesting and we’ll take a look at it.
Sorry, “pale face”, me not speak very good english ;)
I will try to summarize:
In TXP 4.0.1, it was possible to use HTML elements (acronym, strong, em, etc.) in article titles (and even in section/category titles). That was a nice feature that we no longer have in 4.0.2.
Now (4.0.2), if you use HTML elements in titles, the < and > are escaped, so the markup is displayed literally and you get article titles like “I love <code><abbr title="Textpattern">TXP</abbr></code>”, whereas before (4.0.1) the markup was rendered and you got a nice title like “I love <abbr title="Textpattern">TXP</abbr>”.
The same applies to section/category titles. Before, you could use HTML elements in them; now you can’t (and that was very, very cool, because it added a lot of flexibility… think: you could have spans or strong tags in the section title!)
So, as I see it, TXP has lost an interesting feature.
And as far as I understand, TXP lost that feature for one reason: to have valid content in the <code><title></title></code> tag.
You cannot use a > or a < in the <code><title></code>, but you can have <code>&lt;</code> or <code>&gt;</code>
And the problem started when someone used <code><txp:page_title /></code>.
In an individual article context, that txp tag will include the article title in the <code><title></code> tag.
So, if you have a title like “a > b”, you will get invalid code (a warning) if the “>” is not escaped in the <code><title></code>.
So, between 4.0.1 and 4.0.2, the change was to escape any > and < in article/section/category titles. The result: now you have valid content in <code><title></code> but you can’t use HTML elements in article/section/category titles. In 4.0.1, “out of the box”, if you use HTML elements you get invalid content in the title element, which is why I wrote the tip about using sab_striphtml to strip HTML elements from any article/section/category title.
My thoughts/suggestion: how many users use < and > in article titles? How many users write formulae in article titles? I don’t know, but I would bet they are just a minority.
Of course it’s nice to write formula symbols in article titles, but if you want to do that, then encode your special characters manually. Write <code>&lt;</code> instead of typing <, write <code>&gt;</code> instead of >, and so on.
That way, if you revert to the behaviour of 4.0.1, you can use HTML elements in article titles (and strip them from the <code><title></code>). Also, when you need to use special characters in titles (< > and many more) you type them as HTML entity references.
I think I have not been very clear again… no?
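The proposal above (keep HTML in the stored title, strip it only for the page's title element) can be sketched like this. It is a Python stand-in and not how sab_striphtml is implemented; `title_for_head` and the `_TextOnly` class are hypothetical names.

```python
from html import escape
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text content and ignores every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def title_for_head(article_title):
    """Strip markup from a title, then escape what is left for use in <title>."""
    parser = _TextOnly()
    parser.feed(article_title)
    parser.close()  # flush any text still buffered by the parser
    return escape("".join(parser.parts), quote=False)


print(title_for_head('I love <abbr title="Textpattern">TXP</abbr>'))
print(title_for_head("Salt &amp; Pepper"))
```

The article page could still print the stored title with its markup intact, while the document head receives plain, escaped text.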
Next time I will try to write no more than two paragraphs, I promise.
Thanks!
#26 2005-11-14 04:43:03
- zem
- Developer Emeritus

- From: Melbourne, Australia
- Registered: 2004-04-08
- Posts: 2,579
Re: [issue] Special HTML characters (<, >, &) in article titles
Sorry, “pale face”, me not speak very good english ;)
Your English isn’t the problem. Forgive my stubbornness, I’m trying to find the “what” amongst the “why”.
TXP lost that feature for one reason: to have valid content in the <title></title> tag.
Don’t forget feeds. And remember, entities that are valid in HTML aren’t necessarily valid in XML.
My thoughts/suggestion: how many users use < and > in the article titles? how many users write formulae in article titles?
How many use HTML in article titles? (This I don’t know either, but I do know we received more reports about the lack of encoding prior to 4.0.2, than we have about the presence of it now)
Now, if you were arguing for Textile in article titles, we might be on to something.
Alex
Re: [issue] Special HTML characters (<, >, &) in article titles
zem wrote:
Now, if you were arguing for Textile in article titles, we might be on to something.
Hey, I made that request months ago!
;->
Last edited by Jeremie (2005-11-14 05:22:55)
Re: [issue] Special HTML characters (<, >, &) in article titles
OK, after a frustrating time with an & character in my title, I found that the following approach works.
<code><title><txp:php>echo htmlspecialchars(page_title());</txp:php></title></code>
Am I missing something? Is there a good reason why this is bad? At least it validates now.
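For readers who don't know PHP, the idea is simply to escape the title at the moment it is printed. This is a Python stand-in for what `htmlspecialchars()` does to &, < and >; `page_title_for_head` and the site name are made up, and whether TXP's `page_title()` already escapes its output in a given version is not something this thread settles.

```python
from html import escape


def page_title_for_head(site_name, article_title, separator=" :: "):
    """Build the text for <title>, escaping &, < and > at output time."""
    return escape(site_name + separator + article_title, quote=False)


print("<title>" + page_title_for_head("My Site Name", "A > B & C < D") + "</title>")

# Caveat: escaping text that is already escaped encodes it twice.
print(escape("Salt &amp; Pepper", quote=False))
```

The second line of output shows the one thing to watch for: if the title reaching this step has already been escaped, visitors would see a literal `&amp;` in the browser's title bar.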
Aside: Oddly enough I found the solution ‘htmlspecialchars()’ from zem’s excellent “how to make a plugin” series, otherwise I’m a complete php dozer.
Last edited by mrdale (2006-03-24 19:04:59)
#29 2011-08-02 23:39:13
- wavesource
- Member

- From: Australia
- Registered: 2011-08-02
- Posts: 56
Re: [issue] Special HTML characters (<, >, &) in article titles
This is an old thread, but I found this solution useful, building on mrdale’s comment, allowing me to push HTML to a title in a specific form:
<code><txp:php>echo html_entity_decode(str_replace(' ',' ',title()));</txp:php></code>
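What the decode step does, shown in Python with `html.unescape` standing in for PHP's `html_entity_decode()`. The escaped sample title is invented, and `title_as_markup` is a hypothetical name. This only makes sense for titles you trust, since any markup in them is output as-is.

```python
from html import unescape


def title_as_markup(escaped_title):
    """Decode an entity-escaped title so the markup inside it is live again."""
    return unescape(escaped_title)


escaped = "I love &lt;abbr title=&quot;Textpattern&quot;&gt;TXP&lt;/abbr&gt;"
print(title_as_markup(escaped))
```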
#30 2011-12-29 23:23:20
- NyteOwl
- Member

- From: Nova Scotia, Canada
- Registered: 2005-09-24
- Posts: 539
Re: [issue] Special HTML characters (<, >, &) in article titles
Is this issue still afloat? I hacked my old 4.0.3 so that I could use ampersands in the article titles directly. (I don’t use Textile)
Obsolescence is just a lack of imagination. / 36-bits Forever! / #include <disclaimer.h>;