Making plugins first-class citizens

etc · 2014-12-12 22:14:31

ruud wrote #286479:

In my tests it’s up to 1.35 times faster (that’s without looping; 5.5 times with infinite looping) than the current TXP parser for complex tag structures

These are nice figures! Together with

parsing all tags at once results in a hash table that contains a multiple of the original template size, due to tag nesting.

it makes me think you have done it a different way than me. Mind posting the code?

Have you run tests on an actual TXP install, compared to the original parser? How does this affect runtime in testing (or debug) mode?

Yes, though I don’t remember the exact figures. There is no real change (~10%) in runtime/memory consumption for an “average” page. But trying to help in this case (long heavy loops), I replaced <txp:tags /> with etc_query {tokens}, which are parsed only once. That reduced the runtime from ~3s to ~1s, which was noticeable. I guess the new parse() would give the same result in “extreme” cases, though you cannot beat <txp:php /> at these.

ruud · 2014-12-12 23:48:45

Okay, here it goes. If you don’t use forms, the first parse() call puts everything you need in the hash table. The parseElse() function is equivalent to calling parse(EvalElse()).

One thing that may be interesting to try is to cache the $stack contents in a database. That way you don’t just benefit on loops, but also on repeated requests of the same page.

# Parse $thing, caching its tokenised tag tree in $stack under the sha1 of the input.
function parse($thing) {
  global $stack;

  $hash = sha1($thing);

  if(isset($stack[$hash])) {
    $tags[0] = $stack[$hash];
  } else {
    $tags[0] = array();
    $tag    = array(); 
    $level  = 0;
    $inside = array();
    $istag  = FALSE;  

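    # $f splits on complete txp tags (opening, closing or self-closing); chr(62) is '>'
    $f = '@(</?txp:\w+(?:\s+\w+\s*=\s*(?:"(?:[^"]|"")*"|\'(?:[^\']|\'\')*\'|[^\s\'"/>]+))*\s*/?'.chr(62).')@s';
    # $t extracts the tag name (group 1) and the raw attribute string (group 2) from one tag
    $t = '@:(\w+)(.*?)/?.$@s';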

    $parsed = preg_split($f, $thing, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parsed as $chunk) {
      if ($istag) {
        preg_match($t, $chunk, $tag[$level]);

        if (substr($chunk, -2, 1) === '/') {
          # self closed
          $tags[$level][] = array($tag[$level][1], $tag[$level][2], null);
          if ($level) $inside[$level] .= $chunk; 
        } elseif (substr($chunk, 1, 1) !== '/') {
          # opening
          if ($level) $inside[$level] .= $chunk;
          $level++;
          $inside[$level] = '';
          $tags[$level] = array();
        } else {
          # closing
          $sha = sha1($inside[$level]);
          $stack[$sha] = $tags[$level];
          $level--;
          $tags[$level][] = array($tag[$level][1], $tag[$level][2], $inside[$level+1]);
          if ($level) $inside[$level] .= $inside[$level+1] . $chunk;
        }
      } else {
        $tags[$level][] = $chunk;
        if ($level) $inside[$level] .= $chunk;
      }
      $istag = !$istag;
    }
    $stack[$hash] = $tags[0];
  }

  $out = '';
  foreach($tags[0] as $i => $tag) $out .= $i&1 ? processTags($tag[0], $tag[1], $tag[2]) : $tag;
  return $out;
}


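# Render the true or false branch of $thing, splitting at the else tag on its top level.
# Assumes parse() has already stored the token list for $thing in $stack.
function parseElse($thing, $condition)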
{
  global $stack;

  if (strpos($thing, ':else') === false) {
    return $condition ? parse($thing) : '';
  }

  $tags = $stack[sha1($thing)];
  $nr   = 1;
  $tot  = count($tags);

  while ($nr < $tot and $tags[$nr][0] !== 'else') $nr += 2;

  if ($condition) {
    $out = $tags[0];
    $min = 1;
    $max = $nr - 1;
  } elseif ($nr < $tot) {
    $out = $tags[$nr + 1];
    $min = $nr + 2;
    $max = $tot;   
  } else {
    return '';
  }

  for ($i = $min; $i < $max; $i += 2) {
    $out .= processTags($tags[$i][0], $tags[$i][1], $tags[$i][2]) . $tags[$i + 1];
  }

  return $out;
}
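
To see why the loop in parse() can simply toggle $istag, and why the output loop tests $i&1, it may help to look at what preg_split() returns with PREG_SPLIT_DELIM_CAPTURE. The sketch below is only an illustration: it uses a simplified tag pattern (an assumption, not the $f regex above) and a made-up template string.

```php
<?php
// Simplified tag pattern for illustration only; the $f regex in parse()
// is stricter about attribute quoting.
$pattern  = '@(</?txp:\w+[^>]*>)@s';
$template = 'a<txp:x />b<txp:y>c</txp:y>';

// With PREG_SPLIT_DELIM_CAPTURE the captured delimiters (the tags) are returned
// between the text pieces, so text sits at even indexes and tags at odd ones.
$chunks = preg_split($pattern, $template, -1, PREG_SPLIT_DELIM_CAPTURE);

foreach ($chunks as $i => $chunk) {
    echo $i, ' ', ($i & 1 ? 'tag ' : 'text'), ' ', var_export($chunk, true), "\n";
}
```

For the sample string this yields seven chunks, the last one an empty string. Because empty pieces are kept (no PREG_SPLIT_NO_EMPTY), the text/tag alternation holds even when the template starts or ends with a tag.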

Benchmarks

$thing1 = 'some very long text that does not contain an else tag';
$thing2 = '<txp:tag />';
$thing3 = '<txp:if>something<txp:else></txp:if>';
$thing4 = '<txp:if><txp:tag>something<txp:else></txp:if>';
$thing5 = '<txp:if><txp:tag/><txp:tag/><txp:tag/><txp:tag/><txp:tag/><txp:else /><txp:tag/><txp:tag/><txp:tag/><txp:tag/><txp:tag/></txp:if>';
$thing6 = '<txp:if><txp:tag><txp:tag><txp:tag><txp:tag><txp:tag><txp:if><txp:else /></txp:if></txp:tag></txp:tag></txp:tag></txp:tag></txp:tag></txp:if>';
$thing7 = '<txp:if><txp:if><txp:if><txp:if><txp:if><txp:else></txp:if></txp:if></txp:if></txp:if></txp:if></txp:if>';
$thing8 = '<txp:tag><txp:tag><txp:tag><txp:tag><txp:tag></txp:tag></txp:tag></txp:tag></txp:tag></txp:tag>';
$thing9 = '<txp:tag><txp:tag><txp:tag><txp:tag><txp:tag><txp:x /><txp:x /><txp:x /><txp:x /><txp:x /></txp:tag></txp:tag></txp:tag></txp:tag></txp:tag>';
$thing10 = '<txp:tag /><txp:tag /><txp:tag /><txp:tag /><txp:tag /><txp:tag /><txp:tag /><txp:tag /><txp:tag /><txp:tag />';

--- 250000 runs
1	0.6190	0.5795	0.5208	1.3569	1.2031
2	1.7544	0.9276	0.8718	2.9528	2.7302
3	2.3013	0.5777	0.5196	5.2591	2.9899
4	2.7781	0.5764	0.5138	6.3647	3.4664
5	17.1107	3.2209	7.8146	18.0441	20.3693
6	41.0843	7.0340	13.1653	27.3075	49.0829
7	38.6543	4.9259	20.8092	21.5019	44.6545
8	18.0888	4.4227	4.1398	18.2603	23.1512
9	34.7269	6.7369	6.4449	25.9961	42.0318
10	10.7585	3.7477	3.7159	14.4655	14.1418

1	0.6322	0.5906	0.5217	1.3820	1.2133
2	1.7328	0.9324	0.8587	2.9548	2.7509
3	2.2970	0.5853	0.5220	5.2234	2.9572
4	2.7618	0.5822	0.5181	6.3478	3.4632
5	17.2101	3.2082	7.9638	18.0067	20.5029
6	14.1054	1.7007	7.9149	21.8310	15.9788
7	11.6276	1.5889	6.5863	17.9265	13.4327
8	18.0982	4.3862	4.1270	18.2606	23.1700
9	34.8114	6.7226	6.3946	25.9918	41.9020
10	10.6841	3.7376	3.7256	14.4354	14.2409

Column 1 = original parser
Column 2 = my parser + parseElse, infinite loop
Column 3 = etc parser + EvalElse, infinite loop
Column 4 = my parser + parseElse, no loop
Column 5 = etc parser + EvalElse, no loop

First set of results with condition true for if tags.
Second set of results with condition false for if tags.

These results are measured in seconds, but for a single parse you should divide them by 250000. So even the slowest result takes only about 0.2ms in reality for a single parse. Of course these are relatively short test strings, but it does put things in perspective.
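
As a quick check of that division, here is the arithmetic for the slowest figure in the tables above (row 6, column 5, condition true). The variable names are just for illustration.

```php
<?php
// Slowest total from the first result set: $thing6, etc parser + EvalElse, no loop
$totalSeconds = 49.0829;
$runs         = 250000;

// Convert the total for all runs into milliseconds per single parse
$perParseMs = $totalSeconds / $runs * 1000;

printf("%.3f ms per parse\n", $perParseMs); // prints "0.196 ms per parse"
```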

ruud · 2014-12-13 00:35:13

Couldn’t resist. Same benchmarks.
Column 1 is still the original parser.
Column 2 is my parser + parseElse, infinite loop
Column 3 is my parser + parseElse, no loop

The difference from the previous results is that this time I’m simulating what would happen if you unserialize a serialized $stack after fetching it from a database. This doesn’t include the overhead of the additional MySQL call that would be required, but it does include the cost of the unserialize() call. That is why Column 2 is faster: with infinite looping, it unserializes only once.

--- 250000 runs
	0.6307	0.5672	1.0113
	1.7773	0.9071	1.6515
	2.3299	0.5715	1.2029
	2.8387	0.5705	1.2139
	17.3982	3.2207	7.4642
	42.0089	7.0624	11.6038
	39.4484	5.1024	8.7357
	18.4378	4.4360	7.5659
	35.3087	6.8439	11.4250
	11.1749	3.7019	7.1695

	0.6406	0.5751	1.0067
	1.7908	0.9112	1.6654
	2.3452	0.5795	1.2157
	2.8173	0.5747	1.2043
	17.7839	3.2729	7.5011
	14.3136	1.7049	6.1402
	11.8892	1.5756	5.1973
	18.5314	4.4291	7.5564
	35.4523	6.7621	11.4024
	11.0429	3.7263	7.1274

I think adding such a caching mechanism would only require a few lines of code, but we need some real-world parser input to test if doing so makes sense.
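
A minimal sketch of what such a caching mechanism might look like, under stated assumptions: saveStack() and loadStack() are hypothetical helper names, and a temp file stands in for the database table, so the extra MySQL call is not modelled here.

```php
<?php
// Hypothetical helpers: persist the parser's $stack between requests.
// A temp file stands in for the database discussed in the thread.
function saveStack(array $stack, string $file): void
{
    file_put_contents($file, serialize($stack), LOCK_EX);
}

function loadStack(string $file): array
{
    if (!is_file($file)) {
        return array();
    }
    // The cache holds only arrays, strings and nulls, so no classes are allowed
    $data = unserialize(file_get_contents($file), array('allowed_classes' => false));

    // unserialize() returns false on corrupt input; fall back to an empty cache
    return is_array($data) ? $data : array();
}

$file = tempnam(sys_get_temp_dir(), 'txpstack');

// Shape mirrors what parse() stores: text at even indexes,
// array(tag, atts, thing) at odd indexes
$stack = array(
    sha1('<txp:tag />') => array('', array('tag', ' ', null), ''),
);

saveStack($stack, $file);
$restored = loadStack($file);
unlink($file);

var_dump($restored === $stack);
```

In a real setup the load would presumably happen once per request before the first parse() call, and the save only when parse() has added new entries.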

--- 1000 runs (default TXP page template, excluding forms):
	0.5308	0.1109	0.2300

	0.8086	0.1530	0.2716

--- 1000 runs (this time with the template + forms for an individual article page combined):
	1.1444	0.2789	0.5225

	1.4097	0.2960	0.5448

In these examples, the difference is less than 1 millisecond. Which makes me wonder if this is worth doing at all (although it’s certainly fun trying).

etc · 2014-12-13 22:22:13

ruud wrote #286483:

Brilliant work and exciting results, but strange question :)

In these examples, the difference is less than 1 millisecond. Which makes me wonder if this is worth doing at all (although it’s certainly fun trying).

Of course it’s worth doing, and wherever possible. Here is a “real-life” example (monthly archive) where you win ~0.1s (~40%) for ~250 articles:

<txp:article_custom wraptag="ul" sort="Posted ASC" limit="9999">
	<txp:variable name="year" value='<txp:if_different><txp:posted format="%Y" /></txp:if_different>' />
	<txp:variable name="month" value='<txp:if_different><txp:posted format="%Y-%b" /></txp:if_different>' />
	<txp:if_first_article>
		<li><txp:posted format="%Y" /><ul>
		<li><txp:posted format="%b" /><ul>
	<txp:else />
		<txp:if_variable name="month" value=""><txp:else />
			</ul></li>
			<txp:if_variable name="year" value=""><txp:else />
				</ul></li>
				<li><txp:posted format="%Y" /><ul>
			</txp:if_variable>
			<li><txp:posted format="%b" /><ul>
		</txp:if_variable>
	</txp:if_first_article>
	<li class="article"><txp:permlink><txp:title /></txp:permlink></li>
	<txp:if_last_article>
		</ul></li></ul></li>
	</txp:if_last_article>
</txp:article_custom>

etc · 2014-12-14 09:09:58

ruud wrote #286482:

One thing that may be interesting to try is to cache the $stack contents in a database. That way you don’t just benefit on loops, but also on repeated requests of the same page.

This could even be partly (e.g. pages and forms) done admin-side, so fetch_form() would retrieve the fully parsed form tree. No additional db query is required this way.

ruud · 2014-12-14 12:48:44

etc wrote #286509:

This could even be partly (e.g. pages and forms) done admin-side, so fetch_form() would retrieve the fully parsed form tree. No additional db query is required this way.

You could even skip loading the actual form/page contents and just pass the sha1 hash as an argument to the parse() call (except in <txp:php> constructs that contain TXP tags) and save a few more microseconds (around 10% speed increase). And while we’re at it, also store the existence and location of the else tag, which would speed it up even further.

By parsing the moment you save/edit a page/form, you could additionally warn the user about parsing errors, thus avoiding non-functional websites to some degree. This is something that can be done with the current parser as well. In that case you’d want an additional, more strict parser that warns about improper tag nesting and missing closing tags.
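
A rough sketch of such a stricter check, assuming a hypothetical checkNesting() function and a simplified tag pattern that ignores a quoted '>' inside attributes. It only tracks container tags on a stack and reports mismatches; it is not the parser from this thread.

```php
<?php
// Hypothetical validator: returns a list of human-readable nesting problems.
function checkNesting(string $template): array
{
    $errors = array();
    $open   = array(); // names of container tags still waiting for their closing tag

    // Groups: 1 = '/' for a closing tag, 2 = tag name, 3 = '/' for a self-closing tag
    preg_match_all('@<(/?)txp:(\w+)[^>]*?(/?)>@s', $template, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        list(, $closing, $name, $selfClosed) = $m;

        if ($selfClosed === '/') {
            continue; // self-closing tags cannot be mis-nested
        }
        if ($closing === '') {
            $open[] = $name;
            continue;
        }
        $expected = array_pop($open);
        if ($expected === null) {
            $errors[] = "unexpected closing tag </txp:$name>";
        } elseif ($expected !== $name) {
            $errors[] = "expected </txp:$expected>, found </txp:$name>";
        }
    }

    // Anything left on the stack was never closed
    foreach (array_reverse($open) as $name) {
        $errors[] = "missing closing tag </txp:$name>";
    }

    return $errors;
}

print_r(checkNesting('<txp:a><txp:b></txp:a>'));
```

Run at save time, a non-empty result could be shown to the user as a warning instead of blocking the save.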

Having said that, using a caching plugin is a far more effective solution for some problems, like long article lists.

Bloke · 2014-12-14 20:47:39

This is getting serious… I like it, thanks!

Although possibly off-topic (ignore if so), would all this talk of caching and hashing give us the ability to bypass the forced linear nature of pages to avoid logic errors? An example that springs to mind is trying to use the navigation tags older/newer before a call to <txp:article />, which results in no navigation links. Or trying to use the search results before calling an article.

The current antidote, of course, is to include a pgonly article tag before the nav elements. But it’s a trifle annoying, because you have to remember to update both tags and use identical attributes in both places or things get weird.

I don’t know if the current parser is able to recognise the fact it’s seen a pgonly article already and thus bypass the second call to the database, or if it makes two calls for the same content, albeit the first one short-circuits prior to completion. I suspect it makes two calls, but have never really delved into it in great detail.

Being able to effectively ‘defer’ processing of dependent tags until after the dependent content has been executed, and then replace the tags in the template with the relevant content immediately prior to page display would be terrific. Failing that, some way to minimise the round-trip impact of being forced to use two calls for the same content in some scenarios would be a step up.

As I say, might be out of scope (and I don’t know if the number of people it affects is large enough to consider, compared with the effort expended in doing it), but thought I’d throw it out there for consideration in case it was a quick win while all this parser optimisation is being bounced around.

etc · 2014-12-14 22:14:36

Bloke wrote #286528:

Being able to effectively ‘defer’ processing of dependent tags until after the dependent content has been executed, and then replace the tags in the template with the relevant content immediately prior to page display would be terrific. Failing that, some way to minimise the round-trip impact of being forced to use two calls for the same content in some scenarios would be a step up.

Adi has made a very smart plugin that does things like this:

<txp:adi_if_content>
	<txp:adi_if_content_insert>
		<!-- will be processed after txp:article -->
 		<txp:older/newer />
	</txp:adi_if_content_insert>
	<txp:article />
</txp:adi_if_content>

I have written a version, based on a modified parse(), and can confirm that changing the processing order in the new parser is quite easy, but someone has to set this order. Should tags have some kind of dependency?

etc · 2014-12-15 16:37:44

ruud wrote #286516:

And while we’re at it, also store the existence and location of the else tag, which would speed it up even further.

And why not store both parsed true/false parts?

Having said that, using a caching plugin is a far more effective solution for some problems, like long article lists.

A clever core solution should be on the TXP to-do list.

ruud · 2014-12-15 19:20:15

etc wrote #286578:

And why not store both parsed true/false parts?

Because those are already stored in $stack when you parse the entire tree instead of just the current level. The only thing parseElse needs to know is if/where the <txp:else/> tag is in $stack[$hash][$level].
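
A minimal sketch of locating the else tag once so that its position could be stored next to the token list. elseIndex() is a hypothetical helper; the token layout follows what parse() builds (text at even indexes, array(name, atts, thing) at odd indexes).

```php
<?php
// Hypothetical helper: find the else tag on the current level of a token list.
function elseIndex(array $tags): ?int
{
    for ($nr = 1, $tot = count($tags); $nr < $tot; $nr += 2) {
        if ($tags[$nr][0] === 'else') {
            return $nr;
        }
    }
    return null; // no else tag on this level
}

// Tokens for: yes<txp:tag /><txp:else />no
$tags = array('yes', array('tag', ' ', null), '', array('else', ' ', null), 'no');

var_dump(elseIndex($tags)); // int(3)
```

With the index stored alongside the cached tokens, parseElse() would not need its while loop on later calls.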

ruud · 2014-12-15 20:13:35

Bloke wrote #286528:

Although possibly off-topic (ignore if so), would all this talk of caching and hashing give us the ability to bypass the forced linear nature of pages to avoid logic errors? An example that springs to mind is trying to use the navigation tags older/newer before a call to <txp:article />, which results in no navigation links. Or trying to use the search results before calling an article.

Can’t that be solved by returning the unparsed tag when $pretext['secondpass'] === false for tags that need deferred parsing?
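
To illustrate the idea only: in the sketch below, deferredTag() and the render callback are made-up names, not Textpattern API. The only thing taken from the thread is the $pretext['secondpass'] flag.

```php
<?php
// Hypothetical tag handler: on the first pass it hands back its own source,
// so the second pass meets the tag again after <txp:article /> has run.
function deferredTag(array $pretext, string $source, callable $render): string
{
    if ($pretext['secondpass'] === false) {
        return $source; // leave the tag in the output untouched
    }
    return $render();
}

$source = '<txp:older />';
$render = function () {
    return '<a href="?pg=2">older</a>'; // placeholder output for the sketch
};

$first  = deferredTag(array('secondpass' => false), $source, $render);
$second = deferredTag(array('secondpass' => true), $source, $render);

echo $first, "\n", $second, "\n";
```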

Destry · 2015-03-26 14:00:47

I’m reading this wonderful, gentlemanly (so far) thread about something I see full value in from a user standpoint (ease of reading template markup), and I’m wondering why the conversation just ends ten days before Christmas. Too much eggnog?

Bloke · 2015-03-26 15:02:07

Destry wrote #289454:

why the conversation just ends ten days before Christmas

I simply lost track of the various patches on patches on patches, optimisations and enhancements. If someone throws a pull request or unified diff my way, I’ll test it and merge it in. This has immense value to Textpattern.

ruud · 2015-03-26 16:11:33

I can prepare a patch. It would help to know which optimisations are acceptable.
I could start with the basics and perhaps then add optimisations?

Bloke · 2015-03-26 16:38:04

ruud wrote #289464:

I can prepare a patch. It would help to know which optimisations are acceptable.

Whichever ones you think make sense. The one that has the best performance for most situations and adds the <abc:else /> syntax, with the option of toggling short tag support on an as-needed basis would be amazing. I’m not sure from the thread so far if such a hybrid exists.

Minimising XML clashes for those that need it at the page level would be the most flexible, but I’m not sure which offers the best facilities without impacting performance. A global on/off pref is simplest, but maybe a tad coarse. A preference for a whitelist of tags that remain unparsed is flexible, but not necessarily easy to maintain. Perhaps some combo might work? A global on/off pref and, I dunno, a tag that can set the short tag parsing on/off during document processing? Or a tag that can add whitelisted tag prefixes that the parser should leave alone? Just shooting out ideas, not sure if any of them are viable.

I could start with the basics and perhaps then add optimisations?

If it’s not too much extra work, by all means, thank you.

Textpattern CMS support forum
