Textpattern Forum

#1 2011-07-21 15:42:42

frickinmuck
Member
From: Vancouver, BC
Registered: 2008-05-01
Posts: 105
Website

Strip inline CSS from articles?

This seems like a long shot, but I want my client to be able to use simple HTML in articles. The trouble is that one person working for them is constantly messing with things to make them look really garish. Is there any way I can prevent her from using inline CSS? Ideally I’d like to only allow certain HTML tags and certain attributes, but alas… one can dream…


The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.

Offline

#2 2011-07-21 15:47:39

frickinmuck
Member
From: Vancouver, BC
Registered: 2008-05-01
Posts: 105
Website

Re: Strip inline CSS from articles?

Alternately, if anyone knows of a plugin that makes Textile easier to use in the write window, I could force them to use it. * cackle *

Never mind, of course Textile wouldn’t stop them using HTML. DUH.

Last edited by frickinmuck (2011-07-21 15:51:48)


Offline

#3 2011-07-21 19:03:09

jakob
Moderator
From: Germany
Registered: 2005-01-20
Posts: 1,943
Website

Re: Strip inline CSS from articles?

There’s rah_textile_bar which in turn is based on hak_textile_tags.


TXP Builders – finely-crafted code, design and txp

Offline

#4 2011-07-21 21:52:10

Gocom
Developer
Registered: 2006-07-14
Posts: 4,476
Website

Re: Strip inline CSS from articles?

You could try to build some sort of plugin that checks the submitted data every time an article is saved, and strips all the HTML that isn’t needed.

Doing so is probably pretty easy, apart from the HTML filtering. Basic stripping can be done with strip_tags, which does fairly basic tag matching. A problem arises if the content is checked before Textile runs and the post has code blocks, such as code examples shown to visitors. That can be avoided by checking the content after Textile has parsed it. What strip_tags can’t do is attribute filtering, and it seriously doesn’t like invalid HTML. If further filtering is needed, you will probably need a more complex HTML node parser. It’s also possible with regex, though that is unlikely to be ideal.
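
To illustrate the attribute limitation, here is a minimal sketch with made-up sample markup (the input string is hypothetical, not taken from a real article). It only uses strip_tags with its allowed-tags argument: the disallowed tags are dropped, but the style attribute on the allowed tag survives untouched.

```php
<?php
// Hypothetical sample input, made up for this illustration.
$html = '<p style="color: red;">Hello <b>world</b><script>alert(1)</script><em>!</em></p>';

// Keep only <p> and <b>. Other tags are removed, their text content stays.
$clean = strip_tags($html, '<p><b>');

// The <script> and <em> tags are gone, but style="..." is still there,
// because strip_tags never looks at attributes of allowed tags.
echo $clean, "\n";
```

So strip_tags can help with a tag allowlist, but attribute filtering needs something else.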

Let’s give a small example. By hooking into the article event’s steps (edit, publish, create and save), we can easily check the contents of the article after it has been saved. In the example, style attributes are removed using PHP’s DOM extension, which has to be installed for the code to work.

/**
	Run xxx_strip function after the article has been
	saved - um, xxx_strip sounds nasty.
*/

	register_callback('xxx_strip','article','edit',0);
	register_callback('xxx_strip','article','publish',0);
	register_callback('xxx_strip','article','create',0);
	register_callback('xxx_strip','article','save',0);

/**
	Our function, xxx_strip, which was called by the
	register_callback
*/

	function xxx_strip() {

		/*
			Get the ID of the saved article.
			It's in $ID global if just created,
			in POST data otherwise
		*/

		$id = isset($GLOBALS['ID']) && !empty($GLOBALS['ID']) ? $GLOBALS['ID'] : ps('ID');


		/*
			Make sure the submit is valid,
			check form token. If not end here.
		*/

		if(!$id || (function_exists('form_token') && ps('_txp_token') != form_token()))
			return;

		/*
			Get contents of the parsed body content
		*/

		$body = fetch('Body_html', 'textpattern', 'ID', $id);

		/*
			If we don't have body, don't do
			anything
		*/

		if(!$body)
			return;

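		/*
			Remove style attributes from body
			using DOM
		*/

		$dom = new DOMDocument;

		/*
			Load body as the HTML. The encoding prefix
			makes DOM read the markup as UTF-8, the
			wrapper div gives us a single root to export
			later, and the internal error buffer keeps
			warnings about sloppy markup quiet.
		*/

		$previous = libxml_use_internal_errors(true);
		$dom->loadHTML('<?xml encoding="UTF-8"><div id="xxx-strip-root">'.$body.'</div>');
		libxml_clear_errors();
		libxml_use_internal_errors($previous);

		/*
			Create new XPath
		*/

		$xpath = new DOMXPath($dom);

		/*
			Find all nodes with style attribute
		*/

		$nodes = $xpath->query('//*[@style]');

		/*
			Go through the nodes and remove
			the style attributes
		*/

		foreach($nodes as $node) {
			$node->removeAttribute('style');
		}

		/*
			Return the cleaned up HTML to body.
			saveHTML() without arguments returns a whole
			document (doctype, html and body tags), so
			export only the children of the wrapper div.
			Passing a node needs PHP 5.3.6 or newer.
		*/

		$root = $xpath->query('//div[@id="xxx-strip-root"]')->item(0);
		$body = '';

		foreach($root->childNodes as $child) {
			$body .= $dom->saveHTML($child);
		}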

		/*
			Save our new body
		*/

		safe_update(
			'textpattern',
			"Body_html='".doSlash($body)."'",
			"ID='".doSlash($id)."'"
		);
	}
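
The same DOM approach could probably be stretched towards the "only allow certain attributes" wish from the first post. The following is just a sketch under assumptions: xxx_allow_attributes and the xxx-root wrapper id are hypothetical names made up for illustration, and the allowlist is an example, not a recommendation. It removes every attribute whose name is not on the list.

```php
<?php
/**
 * Hypothetical helper: keep only allowlisted attributes in an HTML fragment.
 */
function xxx_allow_attributes($html, array $allowed)
{
	$dom = new DOMDocument;

	// Parse as UTF-8 inside a wrapper div, silencing markup warnings.
	$previous = libxml_use_internal_errors(true);
	$dom->loadHTML('<?xml encoding="UTF-8"><div id="xxx-root">'.$html.'</div>');
	libxml_clear_errors();
	libxml_use_internal_errors($previous);

	$xpath = new DOMXPath($dom);
	$root = $xpath->query('//div[@id="xxx-root"]')->item(0);

	// All attribute nodes of elements below the wrapper (not the wrapper's own).
	$attributes = iterator_to_array($xpath->query('.//*/@*', $root));

	foreach ($attributes as $attribute) {
		if (!in_array(strtolower($attribute->nodeName), $allowed, true)) {
			$attribute->ownerElement->removeAttributeNode($attribute);
		}
	}

	// Export only the wrapper's children, not the whole document.
	$out = '';
	foreach ($root->childNodes as $child) {
		$out .= $dom->saveHTML($child);
	}

	return $out;
}

// Example allowlist: href and class stay, style and onclick are removed.
echo xxx_allow_attributes(
	'<p style="color:red" class="intro"><a href="/x" onclick="y()">link</a></p>',
	array('href', 'class')
), "\n";
```

A tag allowlist would need a similar loop over the elements themselves, and for anything security-related a dedicated library is likely the safer choice.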

Alternatively, if you don’t have access to the DOM extension, you could probably use HTML Purifier. Could be that such a library is one day included with Textpattern too ;-). Plain old regular expressions can be used as well, but keep in mind that simpler regular expressions may fail on more complex cases.

For example something like this could be used to replace the HTML parsing done by DOM. Just one line of regular expression.

$body = preg_replace('/(<[^>]+) style=".*?"/i', '$1', $body);

That gives us full code like this:

/**
	Run xxx_strip function after the article has been
	saved - um, xxx_strip sounds nasty.
*/

	register_callback('xxx_strip','article','edit',0);
	register_callback('xxx_strip','article','publish',0);
	register_callback('xxx_strip','article','create',0);
	register_callback('xxx_strip','article','save',0);

/**
	Our function, xxx_strip, which was called by the
	register_callback
*/

	function xxx_strip() {

		/*
			Get the ID of the saved article.
			It's in $ID global if just created,
			in POST data otherwise
		*/

		$id = isset($GLOBALS['ID']) && !empty($GLOBALS['ID']) ? $GLOBALS['ID'] : ps('ID');


		/*
			Make sure the submit is valid,
			check form token. If not end here.
		*/

		if(!$id || (function_exists('form_token') && ps('_txp_token') != form_token()))
			return;

		/*
			Get contents of the parsed body content
		*/

		$body = fetch('Body_html', 'textpattern', 'ID', $id);

		/*
			If we don't have body, don't do
			anything
		*/

		if(!$body)
			return;

		/*
			Remove style attributes from body
			using regular expressions
		*/

		$body = preg_replace('/(<[^>]+) style=".*?"/i', '$1', $body);

		/*
			Save our new body
		*/

		safe_update(
			'textpattern',
			"Body_html='".doSlash($body)."'",
			"ID='".doSlash($id)."'"
		);
	}
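
To show where a simple expression like the one above stops working, here is a small sketch with made-up sample markup. The $variant pattern is my own assumption for illustration and not part of the code above: it also accepts single quotes and whitespace around the equals sign. Neither pattern copes with a second style attribute in the same tag or with unquoted values.

```php
<?php
// The expression used in the plugin code: double-quoted values only.
$original = '/(<[^>]+) style=".*?"/i';

// Assumed variant (hypothetical): either quote style, optional
// whitespace around the equals sign.
$variant = '/(<[^>]+?)\s+style\s*=\s*("[^"]*"|\'[^\']*\')/i';

// Hypothetical sample markup.
$samples = array(
	'<p style="color: red;">double quotes</p>',
	"<p style='color: red;'>single quotes</p>",
);

foreach ($samples as $sample) {
	// The original pattern leaves the single-quoted sample untouched.
	echo 'original: ', preg_replace($original, '$1', $sample), "\n";
	echo 'variant:  ', preg_replace($variant, '$1', $sample), "\n";
}
```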

Keep in mind that all the code is just examples, and those examples are not tested at all. They should theoretically work, but I can’t guarantee anything. To run the code you would need to load it as a plugin. Plugin basics can tell you more about how to do that.


Rah-plugins | What? I’m a little confused… again :-) <txp:is_god />

Offline

#5 2011-07-21 22:36:55

Gocom
Developer
Registered: 2006-07-14
Posts: 4,476
Website

Re: Strip inline CSS from articles?

I ran some tests and the code I wrote seems to work; both versions should. With the regex version, an article with body contents of:

<span id="button" style="color: red; font: 26px 'Comic Sans';">Important announcement</span>. Comic sans is my favorite font, and I like to use anywhere, preferably with the base red text color. "Bacon strips":http://example.com.

Would become and be saved as:

<p><span id="button">Important announcement</span>. Comic sans is my favorite font, and I like to use anywhere, preferably with the base red text color. <a href="http://example.com">Bacon strips</a>.</p>

Just to point out: as the modifications are done only once, when the article is saved, they have no performance impact on the public side of the site.


Offline

#6 2011-07-21 22:48:00

frickinmuck
Member
From: Vancouver, BC
Registered: 2008-05-01
Posts: 105
Website

Re: Strip inline CSS from articles?

See? This is exactly why I love Textpattern so much. Every foolish dream I have regarding functionality turns out to be possible. It’s amazing. Thanks so much!


Offline

#7 2011-07-23 16:18:11

johnrynne
Member
Registered: 2005-07-20
Posts: 5

Re: Strip inline CSS from articles?

frickinmuck wrote:

Alternately, if anyone knows of a plugin that makes Textile easier to use in the write window, …

Can’t imagine how Textile could be made “easier” – it’s already a cinch, IMO.

John

Offline

#8 2011-07-25 19:45:27

frickinmuck
Member
From: Vancouver, BC
Registered: 2008-05-01
Posts: 105
Website

Re: Strip inline CSS from articles?

I just meant, something that gives Rich Text Editor-like features to the interface for clients who don’t bother learning new things. Sort of like hak_textile_tags, only for the write window.


Offline

#9 2011-07-25 21:36:25

jakob
Moderator
From: Germany
Registered: 2005-01-20
Posts: 1,943
Website

Re: Strip inline CSS from articles?

Sort of like hak_textile_tags, only for the write window.

Erm, is that not what I pointed to further up? Take another look at rah_textile_bar.


Offline

#10 2012-04-24 19:53:20

etc
Plugin Author
Registered: 2010-11-11
Posts: 1,268
Website

Re: Strip inline CSS from articles?

You could try etc_dom_query to strip stuff on the fly from already published articles. Just replace <txp:body /> (in the article form) with

<txp:etc_dom_query html='<txp:body />' remove="//@style" />

to remove inline styles. You can do much more with more elaborate XPath.
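
To give an idea of what a more elaborate expression selects, here is a sketch in plain PHP DOM. This is not how etc_dom_query is implemented, and xxx_remove_by_xpath and the xxx-root wrapper id are hypothetical names for illustration only. The expression removes style attributes plus any attribute whose name starts with "on".

```php
<?php
/**
 * Hypothetical helper: remove whatever the XPath expression matches.
 */
function xxx_remove_by_xpath($html, $expression)
{
	$dom = new DOMDocument;

	// Parse as UTF-8 inside a wrapper div, silencing markup warnings.
	$previous = libxml_use_internal_errors(true);
	$dom->loadHTML('<?xml encoding="UTF-8"><div id="xxx-root">'.$html.'</div>');
	libxml_clear_errors();
	libxml_use_internal_errors($previous);

	$xpath = new DOMXPath($dom);
	$root = $xpath->query('//div[@id="xxx-root"]')->item(0);

	foreach (iterator_to_array($xpath->query($expression)) as $node) {
		if ($node instanceof DOMAttr) {
			// Attribute match, e.g. //@style
			$node->ownerElement->removeAttributeNode($node);
		} elseif ($node->parentNode) {
			// Element or text match, e.g. //font
			$node->parentNode->removeChild($node);
		}
	}

	// Export only the wrapper's children, not the whole document.
	$out = '';
	foreach ($root->childNodes as $child) {
		$out .= $dom->saveHTML($child);
	}

	return $out;
}

// Hypothetical sample markup.
echo xxx_remove_by_xpath(
	'<p style="color: red;" onclick="x()">Hi <font color="red">there</font></p>',
	'//@style | //@*[starts-with(name(), "on")]'
), "\n";
```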

To reduce the server load, you can wrap it in some caching plugin.

Last edited by etc (2012-04-24 20:16:07)


etc_[ query | search | pagination | date | tree | url ]

Offline
