Textpattern CMS support forum

#1 2005-09-26 22:57:06

Etz Haim
Archived Plugin Author
From: Karlstad, Sweden
Registered: 2005-01-24
Posts: 262
Website

[issue] Special HTML characters (<, >, &) in article titles

Latest SVN revision: Special HTML characters (<, >, &) in article titles are not properly escaped. As a result, the output of <txp:page_title /> is invalid XHTML.

PS. Please use this instead of htmlspecialchars() to escape special HTML characters without mangling the non-ASCII characters of a string.

#2 2005-09-27 00:58:03

zem
Developer Emeritus
From: Melbourne, Australia
Registered: 2004-04-08
Posts: 2,579

Re: [issue] Special HTML characters (<, >, &) in article titles

> Etz Haim wrote:

> Special HTML characters (<, >, &)

..yet the function you suggest doesn’t touch ‘&’.

Can you elaborate on the cases this is intended to fix please?

Last edited by zem (2005-09-27 01:07:16)


Alex

#3 2005-09-27 12:07:11

Etz Haim
Archived Plugin Author
From: Karlstad, Sweden
Registered: 2005-01-24
Posts: 262
Website

Re: [issue] Special HTML characters (<, >, &) in article titles

> zem wrote:

> ..yet the function you suggest doesn’t touch ‘&’.

Indeed… and this also affects a patch I’ve sent to the dev-list. But you get the idea.

(trying to be more elaborate….)

For example, if an article is titled “a > b is a conditional”, then <txp:page_title separator=" :: " /> outputs:

<title> My Site Name :: a > b is a conditional </title>

which is invalid XHTML. The expected output would be:

<title> My Site Name :: a &gt; b is a conditional </title>
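
A minimal sketch of that expected output, written in Python rather than Textpattern's PHP. The function name `page_title` and its parameters are hypothetical and only mirror the tag's `separator` attribute; this is not the code in SVN. It escapes `&`, `<` and `>` for element content and leaves non-ASCII characters alone:

```python
from html import escape


def page_title(site_name: str, article_title: str, separator: str = " :: ") -> str:
    """Build a <title> element with its text escaped for element content.

    Hypothetical helper, for illustration only.
    """
    # quote=False: only &, < and > are replaced. Quotes are harmless in
    # element content, and non-ASCII characters pass through unchanged.
    text = escape(site_name + separator + article_title, quote=False)
    return "<title>" + text + "</title>"


print(page_title("My Site Name", "a > b is a conditional"))
# <title>My Site Name :: a &gt; b is a conditional</title>
```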

Last edited by Etz Haim (2005-09-27 12:24:03)

#4 2005-09-27 14:07:37

zem
Developer Emeritus
From: Melbourne, Australia
Registered: 2004-04-08
Posts: 2,579

Re: [issue] Special HTML characters (<, >, &) in article titles

I checked in some code that I think helps here, though it’s still not clear to me (or perhaps anyone) exactly what characters can and can’t be in article titles.


Alex

#5 2005-09-27 14:54:15

Etz Haim
Archived Plugin Author
From: Karlstad, Sweden
Registered: 2005-01-24
Posts: 262
Website

Re: [issue] Special HTML characters (<, >, &) in article titles

> zem wrote:

> it’s still not clear to me (or perhaps anyone) exactly what characters can and can’t be in article titles.

Indeed. If it helps, I think that quotes should be correctly escaped, in case the title is used inside an HTML attribute. Ampersands should also be escaped, and it was my mistake that I hadn’t noticed the function I suggested doesn’t touch them.
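
To illustrate the attribute case, here is a small Python sketch (again not Textpattern's PHP; `title_attribute` is a made-up name). Inside a quoted attribute value the quote characters have to be escaped in addition to `&`, `<` and `>`:

```python
from html import escape


def title_attribute(article_title: str) -> str:
    """Return a title="..." attribute with the value escaped (hypothetical helper)."""
    # quote=True (the default) additionally replaces " with &quot;
    # and ' with &#x27;, so the value cannot close the attribute early.
    return 'title="' + escape(article_title, quote=True) + '"'


print(title_attribute('The "best" of A & B'))
# title="The &quot;best&quot; of A &amp; B"
```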

#6 2005-09-27 23:00:58

zem
Developer Emeritus
From: Melbourne, Australia
Registered: 2004-04-08
Posts: 2,579

Re: [issue] Special HTML characters (<, >, &) in article titles

> Etz Haim wrote:

> Ampersands should also be escaped, and it was my mistake I hadn’t noticed the function I suggested doesn’t touch them.

I’m not sure that is the case. I can find article titles containing numeric entities, but none containing raw ampersands.

Anyone have a counterexample?
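
If stored titles can already contain numeric entities, blindly escaping every `&` would turn `&#233;` into `&amp;#233;`. One possible way around that, sketched in Python under the assumption that anything shaped like a character reference should be left untouched (the names `_BARE_AMP` and `escape_keeping_entities` are invented for this example):

```python
import re

# An ampersand that does NOT start a decimal, hexadecimal or named
# character reference such as &#233; &#x3b1; or &amp;
_BARE_AMP = re.compile(r"&(?!#[0-9]+;|#[xX][0-9a-fA-F]+;|[A-Za-z][A-Za-z0-9]*;)")


def escape_keeping_entities(title: str) -> str:
    """Escape <, > and bare ampersands, leaving existing entities alone."""
    title = _BARE_AMP.sub("&amp;", title)
    return title.replace("<", "&lt;").replace(">", "&gt;")


print(escape_keeping_entities("Caf&#233; & bar <b>"))
# Caf&#233; &amp; bar &lt;b&gt;
```

The trade-off is that a title meant to show the literal text `&copy;` would be passed through as an entity, so this only suits data where ampersands followed by a name and a semicolon are always intended as references.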


Alex

#7 2005-09-28 11:32:35

Etz Haim
Archived Plugin Author
From: Karlstad, Sweden
Registered: 2005-01-24
Posts: 262
Website

Re: [issue] Special HTML characters (<, >, &) in article titles

> zem wrote:

> I’m not sure that is the case. I can find article titles containing numeric entities, but none containing raw ampersands.

Yes, you are right. But I’ve found article titles generated by <txp:title /> that contain raw > and < characters (using the latest SVN, of course, on a test site on TXD).

Last edited by Etz Haim (2005-09-28 11:35:35)

#8 2005-10-07 13:39:01

igner
Plugin Author
Registered: 2004-06-03
Posts: 337

Re: [issue] Special HTML characters (<, >, &) in article titles

Simply encoding all “<” and “>” creates issues with article titles that include markup (such as <em> or <strong>). Since article titles get rendered in the body, I think barring the use of markup in titles is overly restrictive. Previously this worked properly in the body, though the <title> included the tags (so I’d get page titles like The New Pornographers, <em>Mass Romantic</em>). Now the page title is correct, but the body is incorrect.

Personally, I’d be content with allowing a strict subset of tags in titles: <em>, <strong> and <span> seem reasonable. From a logical perspective, perhaps the best solution would be to exclude the “allowed” tags from the encoding when the title is written to the database, then strip them out entirely in the page title.

Thoughts?


And then my dog ate my badger, and the love was lost.

#9 2005-10-07 15:15:02

Etz Haim
Archived Plugin Author
From: Karlstad, Sweden
Registered: 2005-01-24
Posts: 262
Website

Re: [issue] Special HTML characters (<, >, &) in article titles

What you’re suggesting is an abuse of HTML markup. The W3C validator will tell you that none of <em>, <strong>, <span> and the like can be embedded inside a <title> element.

#10 2005-10-07 15:48:00

igner
Plugin Author
Registered: 2004-06-03
Posts: 337

Re: [issue] Special HTML characters (<, >, &) in article titles

No, you misunderstand me.

> igner wrote:

> exclude the “allowed” tags from the encoding when the title is written to the database, then strip them out entirely in the page title.

A single escaping scheme won’t work, since the <em>article title</em> gets used in two different elements depending on context: the <title> and the <body>. When the article title appears in the <body> element (in either article list or individual article mode), <em>, <strong>, etc. are perfectly valid within a title, but all other special characters still need to be escaped. When the article title goes into the <title> element, strip out the otherwise acceptable tags entirely and encode the remaining special characters.
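
The two contexts described above could be sketched roughly as follows. This is Python for illustration, not Textpattern's PHP, and the names `ALLOWED`, `title_for_body` and `title_for_head` are hypothetical. It assumes the allowed tags appear bare, without attributes, and it makes no attempt to check that they are balanced:

```python
import re
from html import escape

ALLOWED = ("em", "strong", "span")  # the subset suggested in this thread

# Bare opening or closing tags of the allowed names, with no attributes.
_ALLOWED_TAG = re.compile(r"</?(?:%s)>" % "|".join(ALLOWED))


def title_for_body(raw: str) -> str:
    """Keep the allowed tags and escape everything else."""
    out, pos = [], 0
    for match in _ALLOWED_TAG.finditer(raw):
        out.append(escape(raw[pos:match.start()], quote=False))
        out.append(match.group(0))  # allowed tag passes through untouched
        pos = match.end()
    out.append(escape(raw[pos:], quote=False))
    return "".join(out)


def title_for_head(raw: str) -> str:
    """Strip the allowed tags, then escape what is left for <title>."""
    return escape(_ALLOWED_TAG.sub("", raw), quote=False)


raw = "The New Pornographers, <em>Mass Romantic</em> & a > b"
print(title_for_body(raw))
# The New Pornographers, <em>Mass Romantic</em> &amp; a &gt; b
print(title_for_head(raw))
# The New Pornographers, Mass Romantic &amp; a &gt; b
```

Any tag outside the allowed set, such as `<script>`, is escaped in both contexts rather than passed through.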

[Edited for typos and clarity]

Last edited by igner (2005-10-07 15:49:44)


And then my dog ate my badger, and the love was lost.

#11 2005-10-08 02:18:48

wet
Developer Emeritus
From: Schoerfling, Austria
Registered: 2005-06-06
Posts: 3,330
Website Mastodon

Re: [issue] Special HTML characters (<, >, &) in article titles

Wouldn’t it suffice if titles were entered with Textile markup as requested previously?

One would be able to use the aforementioned limited set of HTML markup (<em>, <strong>, ...) without introducing any actual need for escaping. Rev. 1002 introduces a per-article “use textile” preference, which could be applied to titles as well in order to avoid undesired side effects.

#12 2005-10-08 04:07:37

zem
Developer Emeritus
From: Melbourne, Australia
Registered: 2004-04-08
Posts: 2,579

Re: [issue] Special HTML characters (<, >, &) in article titles

> wet wrote:

> Wouldn’t it suffice if titles were entered with Textile markup as requested previously?

That won’t help with existing page titles.


Alex
