
Textpattern CMS support forum


#1 2005-09-26 22:57:06

Etz Haim
Archived Plugin Author
From: Karlstad, Sweden
Registered: 2005-01-24
Posts: 262

[issue] Special HTML characters (<, >, &) in article titles

Latest SVN revision: Special HTML characters (<, >, &) in article titles are not properly escaped. As a result, the output of <txp:page_title /> is invalid XHTML.

PS. Please use this instead of htmlspecialchars() to escape special HTML characters without mangling the non-ASCII characters of a string.
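For comparison, here is a rough Python analogue of escaping that leaves non-ASCII text alone. It is not the PHP function linked above. It uses the standard-library `html.escape`, which replaces `&`, `<`, `>` and, with `quote=True`, both quote characters, and touches nothing else. `escape_title` is a hypothetical name.

```python
import html

def escape_title(title: str) -> str:
    # Replace &, <, > and both quote characters with entities.
    # Every other character, including non-ASCII letters, is left as-is.
    return html.escape(title, quote=True)

print(escape_title("Räksmörgås: a > b & c"))
# prints: Räksmörgås: a &gt; b &amp; c
```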


#2 2005-09-27 00:58:03

zem
Developer Emeritus
From: Melbourne, Australia
Registered: 2004-04-08
Posts: 2,579

Re: [issue] Special HTML characters (<, >, &) in article titles

> Special HTML characters (<, >, &)

..yet the function you suggest doesn’t touch ‘&’.

Can you elaborate on the cases this is intended to fix please?

Last edited by zem (2005-09-27 01:07:16)


Alex


#3 2005-09-27 12:07:11

Etz Haim
Archived Plugin Author
From: Karlstad, Sweden
Registered: 2005-01-24
Posts: 262

Re: [issue] Special HTML characters (<, >, &) in article titles

> zem wrote:

> ..yet the function you suggest doesn’t touch ‘&’.

Indeed… and this also affects a patch I’ve sent to the dev-list. But you get the idea.

(Trying to elaborate a bit more...)

For example, if an article is titled “a > b is a conditional”, <txp:page_title separator=" :: " /> outputs:

<title> My Site Name :: a > b is a conditional </title>

which is invalid XHTML. The expected output would be:

<title> My Site Name :: a &gt; b is a conditional </title>
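A minimal Python sketch of the expected behaviour. `page_title` is a hypothetical stand-in for the tag, not Textpattern's PHP code. The joined text is escaped exactly once, at the point where it is written into `<title>`:

```python
import html

def page_title(site_name: str, article_title: str, separator: str = " :: ") -> str:
    # Hypothetical helper: join site name and article title, then escape
    # the result once so characters like ">" become entities inside <title>.
    text = site_name + separator + article_title
    return "<title>" + html.escape(text, quote=False) + "</title>"

print(page_title("My Site Name", "a > b is a conditional"))
# prints: <title>My Site Name :: a &gt; b is a conditional</title>
```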

Last edited by Etz Haim (2005-09-27 12:24:03)


#4 2005-09-27 14:07:37

zem
Developer Emeritus
From: Melbourne, Australia
Registered: 2004-04-08
Posts: 2,579

Re: [issue] Special HTML characters (<, >, &) in article titles

I checked in some code that I think helps here, though it’s still not clear to me (or perhaps anyone) exactly what characters can and can’t be in article titles.


Alex


#5 2005-09-27 14:54:15

Etz Haim
Archived Plugin Author
From: Karlstad, Sweden
Registered: 2005-01-24
Posts: 262

Re: [issue] Special HTML characters (<, >, &) in article titles

> zem wrote:

> it’s still not clear to me (or perhaps anyone) exactly what characters can and can’t be in article titles.

Indeed. If it helps, I think quotes should be correctly escaped too, in case the title is used inside an HTML attribute. Ampersands should also be escaped, and it was my mistake I hadn't noticed the function I suggested doesn't touch them.
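To illustrate why quotes matter inside attributes, here is a small Python sketch. `title_attribute` and the `<a>` markup are hypothetical. With `quote=True`, a `"` in the title becomes `&quot;` and cannot close the attribute early:

```python
import html

def title_attribute(title: str) -> str:
    # Hypothetical helper: place an article title inside a double-quoted
    # HTML attribute. quote=True escapes '"' so the attribute stays intact.
    return '<a href="#" title="' + html.escape(title, quote=True) + '">link</a>'

print(title_attribute('The "best" of a & b'))
# prints: <a href="#" title="The &quot;best&quot; of a &amp; b">link</a>
```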


#6 2005-09-27 23:00:58

zem
Developer Emeritus
From: Melbourne, Australia
Registered: 2004-04-08
Posts: 2,579

Re: [issue] Special HTML characters (<, >, &) in article titles

> Etz Haim wrote:

> Ampersands should also be escaped, and it was my mistake I hadn't noticed the function I suggested doesn't touch them.

I’m not sure that is the case. I can find article titles containing numeric entities, but none containing raw ampersands.

Anyone have a counterexample?
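If stored titles can already contain numeric entities, escaping every `&` would double-escape them (`&#8217;` would become `&amp;#8217;`). A minimal Python sketch of a more cautious alternative escapes only ampersands that do not already begin a character reference. The regular expression is an approximation chosen for illustration, not Textpattern's code, and `escape_bare_ampersands` is a hypothetical name.

```python
import re

# Matches an "&" that does NOT already start a character reference such as
# &#8217;, &#x2019; or &amp;. This is an approximation, not a full check
# against the HTML named-reference table.
BARE_AMP = re.compile(r"&(?!#[0-9]+;|#[xX][0-9a-fA-F]+;|[A-Za-z][A-Za-z0-9]*;)")

def escape_bare_ampersands(title: str) -> str:
    # Only bare ampersands become &amp;. Existing entities survive unchanged,
    # so running this on an already-escaped title is harmless.
    return BARE_AMP.sub("&amp;", title)

print(escape_bare_ampersands("R&D &#8217;s &amp; more"))
# prints: R&amp;D &#8217;s &amp; more
```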


Alex


#7 2005-09-28 11:32:35

Etz Haim
Archived Plugin Author
From: Karlstad, Sweden
Registered: 2005-01-24
Posts: 262
Website

Re: [issue] Special HTML characters (<, >, &) in article titles

> zem wrote:

> I’m not sure that is the case. I can find article titles containing numeric entities, but none containing raw ampersands.

Yes, you are right. But I’ve found article titles output by <txp:title /> that contain unescaped < and > characters (using the latest SVN, of course, on a test site on TXD).

Last edited by Etz Haim (2005-09-28 11:35:35)


#8 2005-10-07 13:39:01

igner
Plugin Author
Registered: 2004-06-03
Posts: 337

Re: [issue] Special HTML characters (<, >, &) in article titles

Simply encoding all “<” and “>” creates issues with article titles that include markup (such as <em> or <strong>). Since article titles get rendered in the body, I think barring the use of markup in titles is overly restrictive. Previously this worked properly in the body, though the <title> included the tags (so I’d get page titles like The New Pornographers, <em>Mass Romantic</em>). Now the page title is correct, but the body is incorrect.

Personally, I’d be content with allowing a strict subset of tags in titles – <em>, <strong>, <span> seems reasonable. From a logical perspective, perhaps the best solution would be to exclude the “allowed” tags from the encoding when the title is written to the database, then strip them out entirely in the page title.
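A minimal Python sketch of the body-side half of this idea: escape everything, then re-enable only a whitelist of bare tags. The whitelist and the helper name `encode_for_body` are hypothetical. Tags with attributes are left escaped, and the sketch does not check that tags are balanced.

```python
import html
import re

# Hypothetical whitelist taken from the suggestion above; tags are only
# restored when they carry no attributes.
ALLOWED = ("em", "strong", "span")
ESCAPED_TAG = re.compile(r"&lt;(/?)(" + "|".join(ALLOWED) + r")&gt;")

def encode_for_body(title: str) -> str:
    # 1. Escape everything, including any markup.
    escaped = html.escape(title, quote=False)
    # 2. Turn only the bare whitelisted tags back into markup,
    #    e.g. &lt;em&gt; -> <em>.
    return ESCAPED_TAG.sub(r"<\1\2>", escaped)

print(encode_for_body("The New Pornographers, <em>Mass Romantic</em> & <b>x</b>"))
# prints: The New Pornographers, <em>Mass Romantic</em> &amp; &lt;b&gt;x&lt;/b&gt;
```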

Thoughts?


And then my dog ate my badger, and the love was lost.


#9 2005-10-07 15:15:02

Etz Haim
Archived Plugin Author
From: Karlstad, Sweden
Registered: 2005-01-24
Posts: 262

Re: [issue] Special HTML characters (<, >, &) in article titles

What you’re suggesting is an abuse of HTML markup. The W3C validator will tell you that none of <em>, <strong>, <span> and the like can be embedded inside a <title> element.


#10 2005-10-07 15:48:00

igner
Plugin Author
Registered: 2004-06-03
Posts: 337

Re: [issue] Special HTML characters (<, >, &) in article titles

No, you misunderstand me.

> exclude the “allowed” tags from the encoding when the title is written to the database, then strip them out entirely in the page title.

A single escaping scheme won’t work, since the <em>article title</em> gets used in two different elements depending on context: the <title> and <body>. When the article title appears in the <body> element (in either article list or individual article mode), <em>, <strong>, etc. are perfectly valid within a title, but otherwise special characters still need to be escaped. When the article title gets appended to the <title> element, then strip out the otherwise acceptable tags entirely, and encode the remaining entities.
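A minimal Python sketch of the <title> side described here: drop the otherwise acceptable inline tags, then escape what remains. The whitelist regex and `title_for_head` are hypothetical illustrations, not Textpattern code.

```python
import html
import re

# Hypothetical whitelist: opening or closing em/strong/span tags,
# with or without attributes.
ALLOWED_TAG = re.compile(r"</?(?:em|strong|span)(?:\s[^<>]*)?>", re.IGNORECASE)

def title_for_head(title: str) -> str:
    # <title> may contain only text, so remove the whitelisted tags entirely,
    # then escape any special characters that remain.
    text = ALLOWED_TAG.sub("", title)
    return html.escape(text, quote=False)

print(title_for_head("The New Pornographers, <em>Mass Romantic</em> & more"))
# prints: The New Pornographers, Mass Romantic &amp; more
```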

[Edited for typos and clarity]

Last edited by igner (2005-10-07 15:49:44)


And then my dog ate my badger, and the love was lost.


#11 2005-10-08 02:18:48

wet
Developer Emeritus
From: Vöcklabruck, Austria
Registered: 2005-06-06
Posts: 3,416

Re: [issue] Special HTML characters (<, >, &) in article titles

Wouldn’t it suffice if titles were entered with Textile markup as requested previously?

One would be able to use the limited set of markup mentioned above (<em>, <strong>, ...) without actually needing any escaping. Rev. 1002 introduces a per-article “use textile” preference, which could be applied to titles as well to avoid undesired side effects.
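To illustrate the idea, here is a toy Python sketch. It is not Textile and not Textpattern's code. Two hypothetical rules stand in for Textile's `_emphasis_` and `*strong*` inline syntax. Because the author never types raw HTML, the title can be escaped first and the markup generated afterwards:

```python
import html
import re

def render_title(title: str) -> str:
    # Escape first so any literal <, > or & typed by the author is safe.
    out = html.escape(title, quote=False)
    # Then apply a tiny Textile-like subset (hypothetical, not real Textile):
    # _text_ -> <em>text</em>, *text* -> <strong>text</strong>.
    out = re.sub(r"(?<!\w)_([^_]+)_(?!\w)", r"<em>\1</em>", out)
    out = re.sub(r"(?<!\w)\*([^*]+)\*(?!\w)", r"<strong>\1</strong>", out)
    return out

print(render_title("The New Pornographers, _Mass Romantic_ & *more*"))
# prints: The New Pornographers, <em>Mass Romantic</em> &amp; <strong>more</strong>
```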


#12 2005-10-08 04:07:37

zem
Developer Emeritus
From: Melbourne, Australia
Registered: 2004-04-08
Posts: 2,579

Re: [issue] Special HTML characters (<, >, &) in article titles

> wet wrote:

> Wouldn’t it suffice if titles were entered with Textile markup as requested previously?

That won’t help with existing page titles.


Alex


#13 2005-10-08 11:36:06

wet
Developer Emeritus
From: Vöcklabruck, Austria
Registered: 2005-06-06
Posts: 3,416

Re: [issue] Special HTML characters (<, >, &) in article titles

> zem wrote:

> That won’t help with existing page titles.

Obviously true, but that shouldn’t stop us from introducing this from now on. I strongly believe that, for content authors, Textile (or any other markup generator) is generally preferable to HTML markup.


#14 2005-10-27 20:20:30

maniqui
Member
From: Buenos Aires, Argentina
Registered: 2004-10-10
Posts: 3,070

Re: [issue] Special HTML characters (<, >, &) in article titles

I want to share a little trick that can be useful for including HTML tags in article titles.
Using HTML tags in the article title and stripping them from the browser title tag: http://forum.textpattern.com/viewtopic.php?pid=83109#p83109

Please let me know if this works for you.

BTW, I want to ask something: when I include an ampersand in the article title, I type it as &amp;.
Then I save my article.
But when I edit the article again, the &amp; in the article title field has been replaced by a plain &.

The problem is: if I don’t notice that my article title has been modified and save the article without correcting the ampersand again, my site will output invalid code (unescaped ampersands).
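This behaviour is consistent with the stored title being printed into the edit field's `value` attribute without escaping. The browser decodes `&amp;` to `&`, and saving the form then stores the decoded text. A minimal Python sketch of the usual fix, assuming that is the cause (the markup, the field name and `title_input` are hypothetical):

```python
import html

def title_input(stored_title: str) -> str:
    # Hypothetical edit-form field. The browser decodes entities in attribute
    # values, so the stored text is escaped once more when written into
    # value="..."; otherwise a stored "&amp;" is shown, and re-saved, as "&".
    value = html.escape(stored_title, quote=True)
    return '<input type="text" name="Title" value="' + value + '" />'

print(title_input("Salt &amp; Pepper"))
# prints: <input type="text" name="Title" value="Salt &amp;amp; Pepper" />
```

After the browser decodes the attribute, the field shows `Salt &amp; Pepper`, so the stored text survives an edit-and-save round trip.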

Last edited by maniqui (2005-10-27 20:21:57)


La música ideas portará y siempre continuará

TXP Builders – finely-crafted code, design and txp


#15 2005-11-05 14:03:25

loid
Member
Registered: 2005-03-09
Posts: 38

Re: [issue] Special HTML characters (<, >, &) in article titles

Up until 4.01, or possibly even 4.02, I used page-break and italic/emphasis markup in article titles without problems. Now the markup itself shows up in the headlines (article titles) instead of creating a page break or italic text.

Last edited by loid (2005-11-05 14:04:22)



Powered by FluxBB