Textpattern CMS support forum
Re: Making plugins first-class citizens
ruud wrote #286479:
In my tests it’s up to 1.35 times faster (that’s without looping; 5.5 times with infinite looping) than the current TXP parser for complex tag structures
These are nice figures! Together with
parsing all tags at once results in a hash table that contains a multiple of the original template size, due to tag nesting.
it makes me think you have done it differently from me. Mind posting the code?
Have you run tests on an actual TXP install, compared to the original parser? How does this affect runtime in testing (or debug) mode?
Yes, though I don’t remember the exact figures. There is no real change (~10%) in runtime/memory consumption for an “average” page. But when trying to help in this case (long, heavy loops), I replaced <txp:tags /> with etc_query {tokens}, which are parsed only once. That reduced the runtime from ~3 s to ~1 s, which was noticeable. I guess the new parse() would give the same result in “extreme” cases, though you cannot beat <txp:php /> at these.
Re: Making plugins first-class citizens
Okay, here it goes. If you don’t use forms, the first parse() call puts everything you need in the hash table. The parseElse() function is equivalent to calling parse(EvalElse()).
One thing that may be interesting to try is to cache the $stack contents in a database. That way you don’t just benefit on loops, but also on repeated requests of the same page.
```php
// Tokenizes $thing into a flat list: text chunks at even indices, tags at odd
// indices as array(name, attributes, inner content or null). Every nesting
// level is cached in the global $stack, keyed by sha1() of its source, so
// container tags that later call parse() on their contents hit the cache.
function parse($thing)
{
    global $stack;
    $hash = sha1($thing);

    if (isset($stack[$hash])) {
        $tags[0] = $stack[$hash];
    } else {
        $tags[0] = array();
        $tag = array();
        $level = 0;
        $inside = array();
        $istag = false;
        // Matches <txp:name ...>, <txp:name ... /> and </txp:name>.
        $f = '@(</?txp:\w+(?:\s+\w+\s*=\s*(?:"(?:[^"]|"")*"|\'(?:[^\']|\'\')*\'|[^\s\'"/>]+))*\s*/?>)@s';
        // Captures the tag name (1) and the raw attribute string (2).
        $t = '@:(\w+)(.*?)/?.$@s';
        $parsed = preg_split($f, $thing, -1, PREG_SPLIT_DELIM_CAPTURE);

        foreach ($parsed as $chunk) {
            if ($istag) {
                preg_match($t, $chunk, $tag[$level]);
                if (substr($chunk, -2, 1) === '/') {
                    // self-closed tag
                    $tags[$level][] = array($tag[$level][1], $tag[$level][2], null);
                    if ($level) $inside[$level] .= $chunk;
                } elseif (substr($chunk, 1, 1) !== '/') {
                    // opening tag: descend one level
                    if ($level) $inside[$level] .= $chunk;
                    $level++;
                    $inside[$level] = '';
                    $tags[$level] = array();
                } else {
                    // closing tag: cache the finished level, then ascend
                    $stack[sha1($inside[$level])] = $tags[$level];
                    $level--;
                    $tags[$level][] = array($tag[$level][1], $tag[$level][2], $inside[$level + 1]);
                    if ($level) $inside[$level] .= $inside[$level + 1] . $chunk;
                }
            } else {
                // plain text
                $tags[$level][] = $chunk;
                if ($level) $inside[$level] .= $chunk;
            }
            $istag = !$istag; // preg_split alternates text and tags
        }
        $stack[$hash] = $tags[0];
    }

    $out = '';
    foreach ($tags[0] as $i => $item) {
        $out .= ($i & 1) ? processTags($item[0], $item[1], $item[2]) : $item;
    }
    return $out;
}

// Same result as parse(EvalElse($thing, $condition)), but reuses the token
// list that an earlier parse() call already cached for $thing in $stack.
function parseElse($thing, $condition)
{
    global $stack;

    if (strpos($thing, ':else') === false) {
        return $condition ? parse($thing) : '';
    }

    $tags = $stack[sha1($thing)];
    $nr = 1;
    $tot = count($tags);
    // Find the first else tag on this level (tags sit at odd indices).
    while ($nr < $tot and $tags[$nr][0] !== 'else') {
        $nr += 2;
    }

    if ($condition) {
        // true branch: everything before the else tag
        $out = $tags[0];
        $min = 1;
        $max = $nr - 1;
    } elseif ($nr < $tot) {
        // false branch: everything after the else tag
        $out = $tags[$nr + 1];
        $min = $nr + 2;
        $max = $tot;
    } else {
        return '';
    }

    for ($i = $min; $i < $max; $i += 2) {
        $out .= processTags($tags[$i][0], $tags[$i][1], $tags[$i][2]) . $tags[$i + 1];
    }

    return $out;
}
```
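Most of the speed-up comes from splitting each distinct template string only once and reusing the result by its sha1 hash. A stripped-down sketch of just that memoisation step, with a simplified tag regex (no attribute validation) and no processTags() call; `tokenize()`, `$tokenCache` and `$tokenizeCalls` are hypothetical names, not Textpattern API:

```php
<?php
// Hypothetical cache: sha1(template) => token list.
$tokenCache = [];
$tokenizeCalls = 0; // counts real regex splits, for illustration only

function tokenize(string $thing): array
{
    global $tokenCache, $tokenizeCalls;
    $hash = sha1($thing);
    if (isset($tokenCache[$hash])) {
        return $tokenCache[$hash]; // cache hit: no regex work at all
    }
    $tokenizeCalls++;
    // Simplified pattern: any opening, closing or self-closed <txp:...> tag.
    // As in parse(), text lands at even indices and tags at odd ones.
    $tokens = preg_split('@(</?txp:\w+[^>]*>)@', $thing, -1, PREG_SPLIT_DELIM_CAPTURE);
    $tokenCache[$hash] = $tokens;
    return $tokens;
}

$template = 'Hello <txp:title /> world';
for ($i = 0; $i < 1000; $i++) {
    $tokens = tokenize($template); // split once, then served from the cache
}
```

Inside a loop such as article_custom the same inner template is handed to the parser for every article, which is why the "infinite loop" columns benefit most.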
Benchmarks
```php
$thing1 = 'some very long text that does not contain an else tag';
$thing2 = '<txp:tag />';
$thing3 = '<txp:if>something<txp:else></txp:if>';
$thing4 = '<txp:if><txp:tag>something<txp:else></txp:if>';
$thing5 = '<txp:if><txp:tag/><txp:tag/><txp:tag/><txp:tag/><txp:tag/><txp:else /><txp:tag/><txp:tag/><txp:tag/><txp:tag/><txp:tag/></txp:if>';
$thing6 = '<txp:if><txp:tag><txp:tag><txp:tag><txp:tag><txp:tag><txp:if><txp:else /></txp:if></txp:tag></txp:tag></txp:tag></txp:tag></txp:tag></txp:if>';
$thing7 = '<txp:if><txp:if><txp:if><txp:if><txp:if><txp:else></txp:if></txp:if></txp:if></txp:if></txp:if></txp:if>';
$thing8 = '<txp:tag><txp:tag><txp:tag><txp:tag><txp:tag></txp:tag></txp:tag></txp:tag></txp:tag></txp:tag>';
$thing9 = '<txp:tag><txp:tag><txp:tag><txp:tag><txp:tag><txp:x /><txp:x /><txp:x /><txp:x /><txp:x /></txp:tag></txp:tag></txp:tag></txp:tag></txp:tag>';
$thing10 = '<txp:tag /><txp:tag /><txp:tag /><txp:tag /><txp:tag /><txp:tag /><txp:tag /><txp:tag /><txp:tag /><txp:tag />';
```
--- 250000 runs (total time in seconds)

First set of results, with the condition true for if tags:

| Test | Original parser | My parser + parseElse, infinite loop | etc parser + EvalElse, infinite loop | My parser + parseElse, no loop | etc parser + EvalElse, no loop |
|---|---|---|---|---|---|
| 1 | 0.6190 | 0.5795 | 0.5208 | 1.3569 | 1.2031 |
| 2 | 1.7544 | 0.9276 | 0.8718 | 2.9528 | 2.7302 |
| 3 | 2.3013 | 0.5777 | 0.5196 | 5.2591 | 2.9899 |
| 4 | 2.7781 | 0.5764 | 0.5138 | 6.3647 | 3.4664 |
| 5 | 17.1107 | 3.2209 | 7.8146 | 18.0441 | 20.3693 |
| 6 | 41.0843 | 7.0340 | 13.1653 | 27.3075 | 49.0829 |
| 7 | 38.6543 | 4.9259 | 20.8092 | 21.5019 | 44.6545 |
| 8 | 18.0888 | 4.4227 | 4.1398 | 18.2603 | 23.1512 |
| 9 | 34.7269 | 6.7369 | 6.4449 | 25.9961 | 42.0318 |
| 10 | 10.7585 | 3.7477 | 3.7159 | 14.4655 | 14.1418 |

Second set of results, with the condition false for if tags:

| Test | Original parser | My parser + parseElse, infinite loop | etc parser + EvalElse, infinite loop | My parser + parseElse, no loop | etc parser + EvalElse, no loop |
|---|---|---|---|---|---|
| 1 | 0.6322 | 0.5906 | 0.5217 | 1.3820 | 1.2133 |
| 2 | 1.7328 | 0.9324 | 0.8587 | 2.9548 | 2.7509 |
| 3 | 2.2970 | 0.5853 | 0.5220 | 5.2234 | 2.9572 |
| 4 | 2.7618 | 0.5822 | 0.5181 | 6.3478 | 3.4632 |
| 5 | 17.2101 | 3.2082 | 7.9638 | 18.0067 | 20.5029 |
| 6 | 14.1054 | 1.7007 | 7.9149 | 21.8310 | 15.9788 |
| 7 | 11.6276 | 1.5889 | 6.5863 | 17.9265 | 13.4327 |
| 8 | 18.0982 | 4.3862 | 4.1270 | 18.2606 | 23.1700 |
| 9 | 34.8114 | 6.7226 | 6.3946 | 25.9918 | 41.9020 |
| 10 | 10.6841 | 3.7376 | 3.7256 | 14.4354 | 14.2409 |
These are total times in seconds for 250000 runs; divide them by 250000 to get the time for a single parse. So even the slowest result takes only about 0.2 ms per parse in reality. Of course these are relatively short test strings, but it does put things in perspective.
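The conversion is simple arithmetic; a tiny helper makes it explicit. The figures are copied from the benchmark results for test 6 with the condition true; `msPerParse()` is just an illustrative name:

```php
<?php
// Convert a benchmark total (seconds for $runs parses) into milliseconds per parse.
function msPerParse(float $totalSeconds, int $runs): float
{
    return $totalSeconds / $runs * 1000;
}

$runs = 250000;
// Test 6, condition true: slowest figure overall (etc parser + EvalElse, no loop).
$slowestMs = msPerParse(49.0829, $runs);
// Test 6, condition true: original parser vs. parse() + parseElse(), infinite loop.
$speedup = 41.0843 / 7.0340;

printf("slowest: %.3f ms per parse\n", $slowestMs);
printf("test 6 speed-up: %.1fx\n", $speedup);
```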
Re: Making plugins first-class citizens
Couldn’t resist. Same benchmarks.
Column 1 is still the original parser.
Column 2 is my parser + parseElse, infinite loop
Column 3 is my parser + parseElse, no loop
The difference from the previous results is that this time I’m simulating what would happen if you unserialize a serialized $stack after fetching it from a database. This doesn’t include the overhead of the additional MySQL call, but does include the cost of the unserialize() call. That is why Column 2 is faster: with the infinite loop it unserializes only once.
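--- 250000 runs (total time in seconds)

Condition true for if tags:

| Test | Column 1 | Column 2 | Column 3 |
|---|---|---|---|
| 1 | 0.6307 | 0.5672 | 1.0113 |
| 2 | 1.7773 | 0.9071 | 1.6515 |
| 3 | 2.3299 | 0.5715 | 1.2029 |
| 4 | 2.8387 | 0.5705 | 1.2139 |
| 5 | 17.3982 | 3.2207 | 7.4642 |
| 6 | 42.0089 | 7.0624 | 11.6038 |
| 7 | 39.4484 | 5.1024 | 8.7357 |
| 8 | 18.4378 | 4.4360 | 7.5659 |
| 9 | 35.3087 | 6.8439 | 11.4250 |
| 10 | 11.1749 | 3.7019 | 7.1695 |

Condition false for if tags:

| Test | Column 1 | Column 2 | Column 3 |
|---|---|---|---|
| 1 | 0.6406 | 0.5751 | 1.0067 |
| 2 | 1.7908 | 0.9112 | 1.6654 |
| 3 | 2.3452 | 0.5795 | 1.2157 |
| 4 | 2.8173 | 0.5747 | 1.2043 |
| 5 | 17.7839 | 3.2729 | 7.5011 |
| 6 | 14.3136 | 1.7049 | 6.1402 |
| 7 | 11.8892 | 1.5756 | 5.1973 |
| 8 | 18.5314 | 4.4291 | 7.5564 |
| 9 | 35.4523 | 6.7621 | 11.4024 |
| 10 | 11.0429 | 3.7263 | 7.1274 |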
I think adding such a caching mechanism would only require a few lines of code, but we need some real-world parser input to test if doing so makes sense.
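A minimal sketch of such a cache, assuming a hypothetical key-value store in place of the MySQL table (a plain PHP array stands in for the database, and `loadStack()`/`saveStack()` are illustrative names, not Textpattern functions). It mirrors what the benchmark measures: `$stack` is unserialized once per request and only written back when parsing added new entries.

```php
<?php
// Hypothetical stand-in for a database table: page name => serialized $stack.
// A real version would SELECT/UPDATE a MySQL row instead.
$fakeDb = [];

function loadStack(string $page): array
{
    global $fakeDb;
    if (!isset($fakeDb[$page])) {
        return []; // cold cache
    }
    // One unserialize() per request; $stack holds only arrays and strings.
    return unserialize($fakeDb[$page], ['allowed_classes' => false]);
}

function saveStack(string $page, array $stack): void
{
    global $fakeDb;
    $fakeDb[$page] = serialize($stack);
}

// Request 1: cold cache; parse() would fill $stack (simulated here with the
// token list for '<txp:title />' in the same layout parse() uses).
$stack = loadStack('default');
$entriesBefore = count($stack);
$stack[sha1('<txp:title />')] = ['', ['title', ' ', null], ''];
if (count($stack) !== $entriesBefore) {
    saveStack('default', $stack); // write back only when parsing added entries
}

// Request 2: warm cache; the token lists come back without re-parsing.
$warmStack = loadStack('default');
```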
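--- 1000 runs (default TXP page template, excluding forms):

| Column 1 | Column 2 | Column 3 |
|---|---|---|
| 0.5308 | 0.1109 | 0.2300 |
| 0.8086 | 0.1530 | 0.2716 |

--- 1000 runs (this time with the template + forms for an individual article page combined):

| Column 1 | Column 2 | Column 3 |
|---|---|---|
| 1.1444 | 0.2789 | 0.5225 |
| 1.4097 | 0.2960 | 0.5448 |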
In these examples, the difference is less than 1 millisecond, which makes me wonder if this is worth doing at all (although it’s certainly fun trying).
Re: Making plugins first-class citizens
ruud wrote #286483:
Brilliant work and exciting results, but strange question :)
In these examples, the difference is less than 1 millisecond. Which makes me wonder if this is worth doing at all (although it’s certainly fun trying).
Of course it’s worth doing, wherever possible. Here is a “real-life” example (a monthly archive) where you gain ~0.1 s (~40%) with ~250 articles:
```html
<txp:article_custom wraptag="ul" sort="Posted ASC" limit="9999">
    <txp:variable name="year" value='<txp:if_different><txp:posted format="%Y" /></txp:if_different>' />
    <txp:variable name="month" value='<txp:if_different><txp:posted format="%Y-%b" /></txp:if_different>' />
    <txp:if_first_article>
        <li><txp:posted format="%Y" /><ul>
        <li><txp:posted format="%b" /><ul>
    <txp:else />
        <txp:if_variable name="month" value=""><txp:else />
            </ul></li>
            <txp:if_variable name="year" value=""><txp:else />
                </ul></li>
                <li><txp:posted format="%Y" /><ul>
            </txp:if_variable>
            <li><txp:posted format="%b" /><ul>
        </txp:if_variable>
    </txp:if_first_article>
    <li class="article"><txp:permlink><txp:title /></txp:permlink></li>
    <txp:if_last_article>
        </ul></li></ul></li>
    </txp:if_last_article>
</txp:article_custom>
```
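The if_different bookkeeping in this template amounts to grouping a date-sorted list by year and month. As an illustration only, here is a plain-PHP equivalent that builds the same nested list structure from an array of hypothetical `[title, timestamp]` pairs; `monthlyArchive()` is a made-up name and no Textpattern tags (permlinks, formats) are involved:

```php
<?php
// Build a year > month > article nested list from articles sorted by date (ASC).
function monthlyArchive(array $articles): string
{
    $out = '<ul>';
    $prevYear = $prevMonth = null;
    foreach ($articles as [$title, $ts]) {
        $year  = gmdate('Y', $ts);
        $month = gmdate('Y-M', $ts); // year-month key, like format="%Y-%b"
        if ($prevMonth !== null && $month !== $prevMonth) {
            $out .= '</ul></li>'; // close the previous month
            if ($year !== $prevYear) {
                $out .= '</ul></li>'; // close the previous year
            }
        }
        if ($year !== $prevYear) {
            $out .= '<li>' . $year . '<ul>';
        }
        if ($month !== $prevMonth) {
            $out .= '<li>' . gmdate('M', $ts) . '<ul>';
        }
        $out .= '<li class="article">' . htmlspecialchars($title) . '</li>';
        $prevYear = $year;
        $prevMonth = $month;
    }
    if ($prevMonth !== null) {
        $out .= '</ul></li></ul></li>'; // what if_last_article does
    }
    return $out . '</ul>';
}
```

Because the month key includes the year, a change of year always implies a change of month, which is why the year check can sit inside the month check, just as the nested if_variable tags do.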
Re: Making plugins first-class citizens
ruud wrote #286482:
One thing that may be interesting to try is to cache the $stack contents in a database. That way you don’t just benefit on loops, but also on repeated requests of the same page.
This could even be done partly admin-side (e.g. for pages and forms), so fetch_form() would retrieve the fully parsed form tree. No additional db query is required this way.
Re: Making plugins first-class citizens
etc wrote #286509:
This could even be done partly admin-side (e.g. for pages and forms), so fetch_form() would retrieve the fully parsed form tree. No additional db query is required this way.
You could even skip loading the actual form/page contents and just pass the sha1 hash as an argument to the parse() call (except in <txp:php> constructs that contain TXP tags) and save a few more microseconds (around 10% speed increase). And while we’re at it, also store the existence and location of the else tag, which would speed it up even further.
By parsing at the moment you save/edit a page/form, you could additionally warn the user about parsing errors, thus preventing broken websites to some degree. This is something that can be done with the current parser as well. In that case you’d want an additional, stricter parser that warns about improper tag nesting and missing closing tags.
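A rough sketch of such a save-time check, assuming a simplified tag regex (attribute values containing `>` are not handled) and treating `<txp:else />` like any other self-closed tag. `checkNesting()` is a hypothetical helper that returns human-readable warnings instead of rendering anything:

```php
<?php
// Report unclosed or improperly nested <txp:...> container tags.
function checkNesting(string $template): array
{
    $errors = [];
    $open = []; // names of container tags still waiting for their closing tag
    preg_match_all('@<(/?)txp:(\w+)[^>]*?(/?)>@', $template, $matches, PREG_SET_ORDER);

    foreach ($matches as [$tag, $closing, $name, $selfClosed]) {
        if ($selfClosed === '/') {
            continue; // <txp:tag /> has no partner
        }
        if ($closing === '') {
            $open[] = $name; // opening tag
        } elseif (!in_array($name, $open, true)) {
            $errors[] = "Unexpected closing tag </txp:$name>";
        } else {
            // Pop up to the matching opener; anything popped first was never closed.
            while (($top = array_pop($open)) !== $name) {
                $errors[] = "Missing closing tag </txp:$top>";
            }
        }
    }

    // Whatever is still open at the end was never closed (innermost first).
    foreach (array_reverse($open) as $name) {
        $errors[] = "Missing closing tag </txp:$name>";
    }
    return $errors;
}
```

An empty array means the nesting is consistent; anything else could be shown to the user when the page or form is saved.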
Having said that, using a caching plugin is a far more effective solution for some problems, like long article lists.
Re: Making plugins first-class citizens
This is getting serious… I like it, thanks!
Although possibly off-topic (ignore if so), would all this talk of caching and hashing give us the ability to bypass the forced linear nature of pages to avoid logic errors? An example that springs to mind is trying to use the navigation tags older/newer before a call to <txp:article />, which results in no navigation links. Or trying to use the search results before calling an article.
The current antidote, of course, is to include a pgonly article tag before the nav elements. But it’s a trifle annoying, because you have to remember to update both tags and use identical attributes in both places, or things get weird.
I don’t know if the current parser is able to recognise the fact it’s seen a pgonly article already and thus bypass the second call to the database, or if it makes two calls for the same content, albeit the first one short-circuits prior to completion. I suspect it makes two calls, but have never really delved into it in great detail.
Being able to effectively ‘defer’ processing of dependent tags until after the dependent content has been executed, and then replace the tags in the template with the relevant content immediately prior to page display would be terrific. Failing that, some way to minimise the round-trip impact of being forced to use two calls for the same content in some scenarios would be a step up.
As I say, might be out of scope (and I don’t know if the number of people it affects is large enough to consider, compared with the effort expended in doing it), but thought I’d throw it out there for consideration in case it was a quick win while all this parser optimisation is being bounced around.
The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.
Hire Txp Builders – finely-crafted code, design and Txp
Re: Making plugins first-class citizens
Bloke wrote #286528:
Being able to effectively ‘defer’ processing of dependent tags until after the dependent content has been executed, and then replace the tags in the template with the relevant content immediately prior to page display would be terrific. Failing that, some way to minimise the round-trip impact of being forced to use two calls for the same content in some scenarios would be a step up.
Adi has made a very smart plugin that does things like this:
```html
<txp:adi_if_content>
    <txp:adi_if_content_insert>
        <!-- will be processed after txp:article -->
        <txp:older/newer />
    </txp:adi_if_content_insert>
    <txp:article />
</txp:adi_if_content>
```
I have written a version based on a modified parse(), and can confirm that changing the processing order in the new parser is quite easy, but someone has to define that order. Should tags declare some kind of dependency?
Re: Making plugins first-class citizens
ruud wrote #286516:
And while we’re at it, also store the existence and location of the else tag, which would speed it up even further.
And why not store both parsed true/false parts?
Having said that, using a caching plugin is a far more effective solution for some problems, like long article lists.
A clever core solution should be on the TXP to-do list.
Re: Making plugins first-class citizens
etc wrote #286578:
And why not store both parsed true/false parts?
Because those are already stored in $stack when you parse the entire tree instead of just the current level. The only thing parseElse needs to know is if/where the <txp:else/> tag is in $stack[$hash][$level].
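As a sketch of storing that position: the flat token list that parse() caches (text at even indices, `[name, atts, content]` tag triples at odd ones) can be scanned once for the first else tag, and the index kept alongside it, so parseElse() no longer needs its `while` loop on every call. `withElseIndex()` and `branch()` are hypothetical helpers, and the sketch returns token slices rather than calling processTags():

```php
<?php
// Scan a cached token list once and remember where the else tag is.
function withElseIndex(array $tokens): array
{
    $else = null;
    for ($i = 1, $n = count($tokens); $i < $n; $i += 2) {
        if ($tokens[$i][0] === 'else') {
            $else = $i;
            break;
        }
    }
    return ['tokens' => $tokens, 'else' => $else];
}

// Pick the tokens of the true or false branch without rescanning.
function branch(array $entry, bool $condition): array
{
    $tokens = $entry['tokens'];
    $else = $entry['else'];
    if ($else === null) {
        return $condition ? $tokens : [];
    }
    return $condition
        ? array_slice($tokens, 0, $else)   // everything before <txp:else />
        : array_slice($tokens, $else + 1); // everything after it, reindexed from 0
}

// Tokens for: a<txp:x />b<txp:else />c
$entry = withElseIndex(['a', ['x', ' ', null], 'b', ['else', ' ', null], 'c']);
```

Both slices keep the even/odd text/tag layout, so either branch can be rendered by the same loop parse() uses.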
Re: Making plugins first-class citizens
Bloke wrote #286528:
Although possibly off-topic (ignore if so), would all this talk of caching and hashing give us the ability to bypass the forced linear nature of pages to avoid logic errors? An example that springs to mind is trying to use the navigation tags older/newer before a call to <txp:article />, which results in no navigation links. Or trying to use the search results before calling an article.
Can’t that be solved by returning the unparsed tag when $pretext['secondpass'] === false for tags that need deferred parsing?
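Roughly, that idea could look like this: on the first pass a deferred tag returns its own markup untouched, and a second pass over the output runs once the article data exist. A self-contained sketch with hypothetical names (`newer_older` is a made-up tag, `renderTag()`/`renderPass()` are illustrative, and a plain `$pretext` array stands in for Textpattern's global); it is not how the core currently works:

```php
<?php
$pretext = ['secondpass' => false, 'article_count' => null];
$deferredTags = ['newer_older']; // hypothetical tag that needs article data

function renderTag(string $name, string $raw): string
{
    global $pretext, $deferredTags;
    if (in_array($name, $deferredTags, true) && $pretext['secondpass'] === false) {
        return $raw; // leave the tag in place for the second pass
    }
    switch ($name) {
        case 'article':
            $pretext['article_count'] = 3; // pretend three articles were listed
            return '[articles]';
        case 'newer_older':
            return '[nav: ' . $pretext['article_count'] . ' articles]';
        default:
            return '';
    }
}

function renderPass(string $template): string
{
    // Only self-closed tags, to keep the sketch short.
    return preg_replace_callback(
        '@<txp:(\w+)\s*/>@',
        fn(array $m) => renderTag($m[1], $m[0]),
        $template
    );
}

$template = '<txp:newer_older /> <txp:article />';
$first = renderPass($template);  // nav tag survives untouched
$pretext['secondpass'] = true;
$final = renderPass($first);     // now the nav tag sees the article data
```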
Re: Making plugins first-class citizens
I’m reading this wonderful, gentlemanly (so far) thread about something I see full value in from a user standpoint (ease of reading template markup), and I’m wondering why the conversation just ends ten days before Christmas. Too much eggnog?
Re: Making plugins first-class citizens
Destry wrote #289454:
why the conversation just ends ten days before Christmas
I simply lost track of the various patches on patches on patches, optimisations and enhancements. If someone throws a pull request or unified diff my way, I’ll test it and merge it in. This has immense value to Textpattern.
Re: Making plugins first-class citizens
I can prepare a patch. It would help to know which optimisations are acceptable.
I could start with the basics and perhaps then add optimisations?
Re: Making plugins first-class citizens
ruud wrote #289464:
I can prepare a patch. It would help to know which optimisations are acceptable.
Whichever ones you think make sense. The one that has the best performance for most situations and adds the <abc:else /> syntax, with the option of toggling short-tag support on an as-needed basis, would be amazing. I’m not sure from the thread so far if such a hybrid exists.
Minimising XML clashes for those who need it at the page level would be the most flexible, but I’m not sure which approach offers the best facilities without impacting performance. A global on/off pref is simplest, but maybe a tad coarse. A preference for a whitelist of tags that remain unparsed is flexible, but not necessarily easy to maintain. Perhaps some combo might work? A global on/off pref and, I dunno, a tag that can switch short-tag parsing on/off during document processing? Or a tag that can add whitelisted tag prefixes that the parser should leave alone? Just shooting out ideas, not sure if any of them are viable.
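One way the "prefixes to leave alone" idea might be wired into the tokenizer: build the tag pattern so that any `prefix:name` tag counts as a Textpattern tag except those whose prefix is on an ignore list, so XML such as `<svg:rect>` passes through as plain text. A sketch only; `tagPattern()` and the example prefixes are hypothetical, and the attribute part of the regex is simplified:

```php
<?php
// Build a tag-splitting pattern that treats any prefix:name tag as a
// template tag, except prefixes listed as "leave alone".
function tagPattern(array $ignoredPrefixes): string
{
    $quoted = array_map(fn($p) => preg_quote($p, '@'), $ignoredPrefixes);
    // Negative lookahead rejects tags whose prefix is on the ignore list.
    $skip = $quoted ? '(?!(?:' . implode('|', $quoted) . '):)' : '';
    return '@(</?' . $skip . '\w+:\w+[^>]*>)@';
}

$pattern = tagPattern(['svg', 'xlink']);
$parts = preg_split(
    $pattern,
    '<abc:if><svg:rect width="2" /><abc:else /></abc:if>',
    -1,
    PREG_SPLIT_DELIM_CAPTURE
);
```

Here `<abc:if>`, `<abc:else />` and `</abc:if>` come out as tags at odd indices, while `<svg:rect width="2" />` stays inside a text chunk. A global on/off pref would simply switch between this pattern and the plain `txp:`-only one.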
I could start with the basics and perhaps then add optimisations?
If it’s not too much extra work, by all means, thank you.