Making plugins first-class citizens

etc · 2014-12-08 14:14:15

What makes me hesitate too, is that putting

if(strpos($thing, '<txp:') === false) return $thing;

at the very beginning of parse() (and the like), divides the runtime by 5 if $thing does not contain <txp:tags />. This optimization would not be possible with “short” tags.

colak · 2014-12-08 16:35:35

I’ve been following this thread since the beginning. Although I can understand the idea of fewer keystrokes, I am wondering if this is just aimed at programmers who have to type all the time. In my view, although this would be a good update, it would also mean that many plugins will just stop working (correct me if I am wrong).

Wouldn’t it be more productive for now to focus on real updates of the txp system such as extending the custom fields?

Bloke · 2014-12-08 16:53:36

colak wrote #286329:

it would also mean that many plugins will just stop working (correct me if I am wrong).

It will have no impact on the behaviour of existing plugins, aside from turbo-charging them with their very own else tag. This enables site admins to match up a plugin’s container with a plugin-prefixed else tag, which improves template readability. Aside from that, yes, less typing if you wish to take advantage of it, otherwise it’s business as usual.

The new syntax would merely be offered as an option for those that prefer it.

ruud · 2014-12-08 18:53:59

etc wrote #286328:

What makes me hesitate too, is that putting

if(strpos($thing, '<txp:') === false) return $thing;...

at the very beginning of parse() (and the like), divides the runtime by 5 if $thing does not contain <txp:tags />. This optimization would not be possible with “short” tags.

Using this has the same effect in most cases:

if(strpos($thing, ':') === false) return $thing;

etc · 2014-12-08 19:11:32

Bloke wrote #286330:

turbo-charging them with their very own else tag.

only if the plugin is already <txp:else />-aware, right?

ruud wrote #286334:

Using this, has the same effect in most cases: if(strpos($thing, ':') === false) return $thing;...

Not if $thing contains http:, for example, which is rather common.
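
To make the difference concrete, here is a minimal sketch. `has_marker()` is a hypothetical helper written for this illustration, not a Textpattern function, and the sample strings are made up. It only shows which inputs get past each early-return test.

```php
<?php
// Hypothetical helper, for illustration only:
// true means the early-return guard would NOT fire and the full parser runs.
function has_marker($thing, $marker)
{
    return strpos($thing, $marker) !== false;
}

// A Textile-rendered body with a link but no txp tags (made-up sample).
$body = '<p><a href="http://example.com/">link</a></p>';
// A template fragment that really does contain a tag.
$tpl = '<txp:title />';

var_dump(has_marker($body, '<txp:')); // '<txp:' test: body can be returned untouched
var_dump(has_marker($body, ':'));     // ':' test: the "http:" forces a full parse
var_dump(has_marker($tpl, '<txp:'));  // real tags are caught by either test
```

With the `:` test, any body containing a URL falls through to the expensive path even though it has nothing to parse.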

ruud · 2014-12-08 21:33:31

etc wrote #286335:

only if the plugin is already <txp:else />-aware, right?

Yes.

Not if $thing contains http:, for example, which is rather common.

It’s not that common in the simple if/else constructs that this optimisation targets. I’ve checked two websites (one of them being TXP.org). I could find only one case where the : optimisation would fail. In around 95% of situations where the optimisation should save time it works just fine.

etc · 2014-12-09 09:02:08

ruud wrote #286340:

It’s not that common in the simple if/else constructs that this optimisation targets.

Ruud, I really appreciate your argumentation, and you are certainly right re EvalElse() statistics. But look, I write about parse() optimization too. About 95% of article bodies contain "link":http://... (transformed into <a href="http://..."> by Textile), or some <image src="http://..." />, and no <txp:tag />. For these, optimizing parse() would make <txp:body /> processing much faster.

ruud · 2014-12-09 20:01:31

You’re right. Patching a patch:

- static $prefix = null;
+ static $istag, $prefix = null;

  if ($prefix === null) {
-   $prefix = get_pref('allow_short_tags', '1') ? '[a-z]{3}' : 'txp';
+   if (get_pref('allow_short_tags', '1')) {
+     $prefix = '[a-z]{3}';
+     $istag = ':';
+   } else {
+     $prefix = 'txp';
+     $istag = '<txp:';
+   }
  }
+
+ if (strpos($thing, $istag) === false) return $thing;
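
Applied, the guard portion of the patch might read roughly as follows. This is a sketch only: `demo_get_pref()` is a hard-coded stand-in for `get_pref()` so the snippet runs outside Textpattern, `demo_guard()` is a hypothetical wrapper, and the actual parsing is left out.

```php
<?php
// Stand-in for Textpattern's get_pref(); hard-coded so the sketch is self-contained.
// Here short tags are switched off.
function demo_get_pref($name, $default = '')
{
    $prefs = array('allow_short_tags' => '0');
    return isset($prefs[$name]) ? $prefs[$name] : $default;
}

// Hypothetical wrapper that shows only the guard.
// Returns true when $thing holds nothing to parse and could be returned as-is.
function demo_guard($thing)
{
    static $istag, $prefix = null;

    // Decide once per request which prefix and which marker to use.
    if ($prefix === null) {
        if (demo_get_pref('allow_short_tags', '1')) {
            $prefix = '[a-z]{3}';
            $istag = ':';
        } else {
            $prefix = 'txp';
            $istag = '<txp:';
        }
    }

    return strpos($thing, $istag) === false;
}

var_dump(demo_guard('<a href="http://example.com/">x</a>')); // no tags to parse
var_dump(demo_guard('<txp:title />'));                        // needs parsing
```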

Adding a parse attribute <txp:body parse="0" /> is another way to solve this, although you lose part of the speed increase by having to parse the attribute ;)

etc · 2014-12-10 11:59:52

Great! I would only make short tags opt-in, since it’s a new feature: if (get_pref('allow_short_tags', '0')) ...

ruud wrote #286383:

Adding a parse attribute <txp:body parse="0" /> is another way to solve this, although you lose part of the speed increase by having to parse the attribute ;)

No, we don’t want to lose a dime. Let’s win even more: why re-parse what is already parsed? When you process, say

<txp:article limit="9999">
	<txp:title />
</txp:article>

you call preg_split(... '<txp:title />' ...) 9999 times. Why not do it once and cache the result? How could this $thing inside <txp:article /> change anyway?

My current parse() running here looks like this (sorry for flooding):

function parse($thing='') {

	static $stack = array();
	if(strpos($thing, '<txp:') === false) return $thing;

//	don't parse $thing twice!!!

	$hash = sha1($thing);
	if(isset($stack[$hash])) $tags = $stack[$hash];
	else
	{
		$tags = array();
		$level  = 0;
		$inside = '';
		$istag  = FALSE;

		$f = '@(</?txp:\w+(?:\s+\w+\s*=\s*(?:"(?:[^"]|"")*"|\'(?:[^\']|\'\')*\'|[^\s\'"/>]+))*\s*/?'.chr(62).')@s';
		$t = '@:(\w+)(.*?)/?.$@s';

		$parsed = preg_split($f, $thing, -1, PREG_SPLIT_DELIM_CAPTURE);

		foreach ($parsed as $chunk)
		{
			if ($istag)
			{
				if ($level === 0)
				{
					preg_match($t, $chunk, $tag);

					if (substr($chunk, -2, 1) === '/')
						$tags[] = array('tag' => $tag[1], 'atts' => $tag[2], 'thing' => null);
					else
					{ # opening
						$level++;
					}
				}
				else
				{
					if (substr($chunk, 1, 1) === '/')
					{ # closing
						if (--$level === 0)
						{
							$tags[] = array('tag' => $tag[1], 'atts' => $tag[2], 'thing' => $inside);
							$inside = '';
						}
						else
						{
							$inside .= $chunk;
						}
					}
					elseif (substr($chunk, -2, 1) !== '/')
					{ # opening inside open
						++$level;
						$inside .= $chunk;
					}
					else
					{
						$inside .= $chunk;
					}
				}
			}
			else
			{
				if ($level) $inside .= $chunk;
				else $tags[] = $chunk;
			}
			$istag = !$istag;
		}
		$stack[$hash] = $tags;
	}

	$out = '';
	foreach($tags as $i => $tag) $out .= $i&1 ? processTags($tag['tag'], $tag['atts'], $tag['thing']) : $tag;

	return $out;
}

Same idea for EvalElse() and splat(). There is a risk of SHA1 collision, but none (unlike MD5) is known for the moment. The speedup on long loop structures is measurable, though preg_split() is not the slowest part of it. No particular memory consumption detected atm.
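
Stripped of the parsing details, the caching pattern is just a static array keyed by the hash of the input. The sketch below is an assumption-laden illustration, not the EvalElse() or splat() code: `cached_split()` is a hypothetical function and its regex only recognises self-closing tags without attributes.

```php
<?php
// Hypothetical illustration of the caching pattern; not Textpattern code.
function cached_split($thing)
{
    // Survives between calls, so each distinct $thing is split only once.
    static $cache = array();

    $hash = sha1($thing);

    if (!isset($cache[$hash])) {
        // Simplified pattern: self-closing tags with no attributes.
        $cache[$hash] = preg_split(
            '@(<txp:\w+\s*/>)@',
            $thing,
            -1,
            PREG_SPLIT_DELIM_CAPTURE
        );
    }

    return $cache[$hash];
}

// The second call with the same string is served from the cache.
$first  = cached_split('a<txp:title />b');
$second = cached_split('a<txp:title />b');
print_r($first);
```

As in the parse() above, odd-numbered entries of the result are the tags and even-numbered entries are the text between them.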

What do you devs think of it?

ruud · 2014-12-10 20:41:01

Some benchmarks, each row with a different $thing string:

$thing = 'some very long text that does not contain an else tag';
$thing = '<txp:tag />';
$thing = '<txp:tag><txp:tag><txp:tag><txp:tag><txp:tag></txp:tag></txp:tag></txp:tag></txp:tag></txp:tag>';
$thing = '<txp:tag /><txp:tag /><txp:tag /><txp:tag /><txp:tag /><txp:tag /><txp:tag /><txp:tag /><txp:tag /><txp:tag />';

	0.6327	0.5520	0.6313	0.9997	1.1809
	2.2664	1.3447	1.6456	2.8423	3.8831
	5.2651	0.9088	1.4774	3.7682	6.3109
	16.1752	8.4240	9.6645	14.2784	25.8554

column1: current parser (250.000 times)
column2: etc parser (250.000 loop)
column3: etc parser (10 loop, 25.000 times)
column4: etc parser (2 loop, 125.000 times)
column5: etc parser (no loop, 250.000 times)

I should add that I removed the first optimisation from the etc parser to be able to test just the effect of sha1 hashing.

Summarised. You gain quite a bit in loops, but you also lose a lot in non looping series of tags.
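
For anyone wanting to repeat this kind of measurement, a timing loop could be set up along these lines. This is not the harness used for the numbers above: `time_calls()` is a hypothetical helper, and the closure is a trivial stand-in for whichever parser is under test.

```php
<?php
// Hypothetical timing helper: calls $fn($thing) $times times
// and returns the elapsed wall-clock time in seconds.
function time_calls($fn, $thing, $times)
{
    $start = microtime(true);

    for ($i = 0; $i < $times; $i++) {
        $fn($thing);
    }

    return microtime(true) - $start;
}

// Stand-in for a parser under test: only the early-return guard.
$guard_only = function ($thing) {
    return strpos($thing, '<txp:') === false ? $thing : '';
};

$elapsed = time_calls(
    $guard_only,
    'some very long text that does not contain an else tag',
    1000
);

printf("%.4f\n", $elapsed);
```

Swapping in each parser variant as `$fn` and keeping `$thing` and `$times` fixed gives figures that are comparable with each other, though absolute values will depend on the server.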

Bloke · 2014-12-10 21:29:03

ruud wrote #286407:

Summarised. You gain quite a bit in loops, but you also lose a lot in non looping series of tags.

Woah, there’s some great stuff in the etc parser, but that last case is a shame.

Open question then: is it possible to figure out through testing/benchmarking, the threshold at which the etc parser begins to degrade? Perhaps that would allow us to make an informed decision up-front via a simple test (even a tag count) on whether it’s beneficial to use the etc parser or the ruud-enhanced parser? Or a bit of both?

Just throwing it out there. I’ve no idea if that’s even feasible, but I’ve been looking at phpGolf today and I’m inspired by the lengths to which people will go, in order to find the most optimised (albeit in these cases unreadable) solution to a problem!

etc · 2014-12-10 22:25:10

ruud wrote #286407:

you also lose a lot in non looping series of tags.

Thanks for benchmarking! That’s not exactly what I get, but it could be server-dependent: for a string containing 1000 <txp:tag /> (more is very unlikely) parsed once I get ~0.012s for both. The most representative imho is column 3. I should add that processTags() is slightly optimized too, by replacing $out = $tag(splat($atts), $thing); inside if (maybe_tag($tag)) with

$out = $tag(ltrim($atts) === '' ? array() : splat($atts), $thing);

though this optimization shouldn’t intervene here, <txp:tag /> being fake.
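
The effect of that one-liner can be seen in isolation with a stand-in for splat(). In this sketch `demo_splat()` and `demo_atts()` are hypothetical; the real splat() parses the attribute string, while the stand-in just tags its input so it is visible whether it was called.

```php
<?php
// Stand-in for splat(): returns a marker array so a call is detectable.
// The real function parses the attribute string into name/value pairs.
function demo_splat($atts)
{
    return array('parsed' => $atts);
}

// Hypothetical wrapper around the expression from the post:
// skip splat() entirely when the attribute string is empty or whitespace.
function demo_atts($atts)
{
    return ltrim($atts) === '' ? array() : demo_splat($atts);
}

var_dump(demo_atts(''));            // empty: splat skipped
var_dump(demo_atts('   '));         // whitespace only: splat skipped
var_dump(demo_atts(' limit="10"')); // attributes present: splat called
```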

ruud · 2014-12-11 07:56:07

etc wrote #286411:

Thanks for benchmarking! That’s not exactly what I get,

I’ve replaced processTags with a function that either calls parse() if there is a $thing or otherwise just returns. My aim was to test only the parse() function for this benchmark. I’m not sure why that last $thing test string is so slow, because the only difference should be the one sha1() call. I’ll try to find out tonight.

Bloke wrote #286408:

Open question then: is it possible to figure out through testing/benchmarking, the threshold at which the etc parser begins to degrade?

So far the threshold seems to be: are we looping yes or no? If there is a cheap test to see if we’re in a loop, that could work. @etc has given me some inspiration for further optimisation ;)

etc · 2014-12-12 17:17:44

ruud wrote #286422:

I’ve replaced processTags with a function that either calls parse() if there is a $thing or otherwise just returns.

That’s what I thought. Then the new parser will indeed be slower, not because of sha1 but because of the array assignments. But with the real processTags() the slowdown is maybe 10%, and we gain ~40% in 10-loops (typical <txp:container /> limit).

@etc has given me some inspiration for further optimisation ;)

Great! I’m here if you need a pair of eyes. I’ve got a version that caches all tags (inside too) in one go, but the speed is roughly the same.

Last edited by etc (2014-12-12 17:27:58)

ruud · 2014-12-12 19:54:16

etc wrote #286474:

That’s what I thought. Then the new parser will be slower, indeed, not because of sha1, but array assignments.
But with the real processTags() the slowdown is of maybe 10%

The real processTags executes the corresponding taghandler function, which calls parse() when there is a $thing. I’m just skipping the taghandler function.

Great! I’m here if you need a pair of eyes. I’ve got a version that caches all tags (inside too) in one go, but the speed is roughly the same.

That’s exactly what I was testing yesterday. In my tests it’s up to 1.35 times faster (that’s without looping; 5.5 times with infinite looping) than the current TXP parser for complex tag structures [1] and a bit slower than the one you posted for simple situations. I haven’t tested this, but I wonder how this affects memory usage, because parsing all tags at once results in a hash table that contains a multiple of the original template size, due to tag nesting. On the other hand, most templates aren’t that big.

Have you run tests on an actual TXP install, compared to the original parser? How does this affect runtime in testing (or debug) mode?

[1] 5 levels of tag nesting with 5 self closed tags on the deepest level

Textpattern CMS support forum
