SQL Performance of Textpattern queries

Gocom · 2011-11-30 06:09:07

Vienuolis wrote:

I am running a TxP site on my own server, a quite robust, clear, and still almost empty rack machine, and I have the same problem: runtime ranges from 0.2 sec for an individual page to 10 sec for section, category, and query pages, with no debugging errors and no communication lags. On.lt.

Well, you are running 129 queries and have some big, code-heavy loops on the page (e.g. category_lists, multiple article tags). The biggest reason for the slowness I see is that you are collecting all keywords from all articles 32 times, caused by tru_tags.

Instead of calculating tag weights for each article, you could just list the keywords from the current article. That total query time of about 1 second comes from tru_tags alone: one <txp:tru_tags_from_article /> takes about 0.03 sec in queries (it seems to pick all keywords from every article for its calculations), and 32 of them add up to roughly 1 sec.

One small thing is that you seem to fetch the host names of your visitors. That slows the page down too. You could turn Use DNS? off in Advanced Preferences. Also you could empty (if any) Spam blacklists (comma-separated) preference. Checking spam lists from the blacklist providers slows individual article pages.

Last edited by Gocom (2011-11-30 06:12:25)

merz1 · 2011-11-30 08:25:14

From the tru_tags instructions, #tru_tags_from_article:

Be careful, however, before turning this (useoverallcounts="1") on. This attribute causes tru_tags to do an extra database query for each article displayed on a page. That extra query is equivalent to the query used to generate the overall cloud. This can cause a significant load increase on your server.

Last edited by merz1 (2011-11-30 08:26:09)

Vienuolis · 2011-12-02 00:31:20

I am sorry, I am late. Jukka, Markus, thanks a lot for your advice. I am trying to adopt TxP for quite complex, large-scale scientific publishing, and I see the great potential of Textpattern.

Instead of calculating tag weights for each article, you could just list the keywords from the current article.

Hm, I have set <txp:tru_tags_from_article /> without useoverallcounts, with no attributes at all. I expect weight calculation only in a tag cloud, when generating a subject index on a separate webpage for <txp:tru_tags_cloud />. What is the purpose of keywords (tags), if not linking them to their indexes for subject indexing? I just do not understand the need for SQL querying regardless of context. I would expect a query only when a tag is clicked, since the keywords are already specified by the author of every article. BTW, an overall tag cloud loads quite quickly, although it is relatively big.

You could turn Use DNS? off in Advanced Preferences. Also you could empty (if any) Spam blacklists

I have toggled Use DNS?, with no significant effect. And comments are switched off completely in favour of Disq.us.

If I did not miss something important, the main improvement of Textpattern for large-scale publishing would be implementing a flat-file disk cache, IMHO.

maniqui · 2011-12-02 01:03:46

Hi Vladas.

Before or after you have squeezed your tags to spit out the last query, you could try to implement aks_cache.

Yesterday, I wrote a post on the thread with some tips and use cases for caching dynamically generated content in Textpattern with aks_cache.

If the tag cloud you are generating is used site-wide (check the second tip in the link above), you could manage it this way: generate it once (on the first page visit), cache it with aks_cache, and reuse it on whichever page you need.
Yes, the first hit (the one that generates the tag cloud) will still be query-intensive, but on subsequent hits the queries needed to render the tag cloud drop from many to just one.
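
The generate-once, reuse-afterwards idea can be sketched in plain PHP. This is only an illustration under assumptions: it needs a writable temp directory, the helper name cached_fragment and its parameters are hypothetical and are not aks_cache's API, and the closure merely stands in for the query-heavy tag cloud markup.

```php
<?php
// Hypothetical helper for illustration; not part of aks_cache or Textpattern.
function cached_fragment(string $key, int $ttlSeconds, callable $render, ?string $dir = null): string
{
    $dir  = $dir ?? sys_get_temp_dir();
    $file = $dir . DIRECTORY_SEPARATOR . 'fragment_' . md5($key) . '.html';

    // Serve the stored copy while it is still fresh.
    if (is_file($file) && (time() - filemtime($file)) < $ttlSeconds) {
        $cached = file_get_contents($file);
        if ($cached !== false) {
            return $cached;
        }
    }

    // First hit (or expired copy): do the expensive work once and store it.
    $html = $render();
    file_put_contents($file, $html, LOCK_EX);

    return $html;
}

// Usage: count how often the "expensive" renderer actually runs.
$calls = 0;
$cloud = function () use (&$calls): string {
    $calls++;
    return '<ul class="tag-cloud"><li>example</li></ul>';
};

$key    = 'tag-cloud-demo-' . uniqid();      // fresh key so the demo starts uncached
$first  = cached_fragment($key, 3600, $cloud); // renders and stores
$second = cached_fragment($key, 3600, $cloud); // served from the stored file
```

Only the first call pays for the rendering. Later calls within the lifetime read one small file instead of repeating the queries, which is the trade-off described above.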

Vienuolis · 2011-12-02 02:36:09

Oh, what an impressive tutorial, thank you, Julián, very much! I am wondering how I have missed such an important plugin, perhaps by confusing it with cnk_versioning, which is a completely different approach. I am going to try and learn aks_cache, thanks!

wet · 2011-12-02 07:04:27

Also, there’s Sencer’s cache plugin asy_jpcache for real flat-file caching.

Gocom · 2011-12-02 07:41:09

Vienuolis wrote:

Hm, I have set <txp:tru_tags_from_article /> without useoverallcounts, with no attributes at all. I expect weight calculation only in a tag cloud, when generating a subject index on a separate webpage for <txp:tru_tags_cloud />. What is the purpose of keywords (tags), if not linking them to their indexes for subject indexing? I just do not understand the need for SQL querying regardless of context. I would expect a query only when a tag is clicked, since the keywords are already specified by the author of every article.

Well, that is strange. I don’t know why it does that; I haven’t looked at tru_tags’ code since doing (or looking at) some patching some years ago. But whatever the reason is, every tag is collecting keywords from every article:

[SQL (0.00033998489379883): select name from txp_section where searchable != '1']
[SQL (0.026265859603882): select Keywords from textpattern where Keywords <> ''and Section != ''and Section != 'dalykai' and Status >= '4' and Posted <= now()]

Not just from that current article, but from all articles posted to searchable sections.

On one of my personal sites I’ve used (and still use) tru_tags. On that site, I ended up replacing <txp:tru_tags_from_article /> with a little bit of PHP that generates the linked list of tags directly from the article’s keywords.

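<txp:php>
	// Build the linked tag list from the already loaded article data (no extra queries).
	global $thisarticle;
	if(!empty($thisarticle['keywords'])) {
		$out = array();
		// Keywords are stored as a comma-separated string.
		$tags = explode(',',$thisarticle['keywords']);
		foreach($tags as $tag) {
			// hu is Textpattern's site URL constant; encode the tag for the URL and escape it for display.
			$out[] = '<a rel="nofollow" href="'.hu.'tag/'.urlencode($tag).'">'.htmlspecialchars($tag).'</a>';
		}
		echo implode(', ',$out);
	}
</txp:php>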

This makes showing the list of linked tags much faster. You could probably do the same with a plugin, like rah_repeat, or you could look into some other tagging solution like smd_tags. With rah_repeat the above PHP snippet could be turned into something like:

<txp:rah_repeat value='<txp:keywords />' break=",">
	<a href="<txp:site_url />tag/<txp:rah_repeat_value />">
		<txp:rah_repeat_value />
	</a>
</txp:rah_repeat>

BTW, an overall tag cloud loads quite quickly, although it is relatively big.

Yep, likely will, as the code is only run once. Put the same tag on a page 50 times along with an article list, and you will see quite different results.

Last edited by Gocom (2011-12-02 07:48:29)

merz1 · 2011-12-02 12:18:14

Well, that is strange. I don’t know why it does that; I haven’t looked at tru_tags’ code since doing (or looking at) some patching some years ago.

Please check if you all use the latest version of the tru_tags plug-in :)
Some performance enhancements were implemented.

Btw: I don’t use keyword lists in article lists because I think they are distracting. I think showing the used category (-ies) is enough.

Gocom · 2011-12-02 13:22:56

merz1 wrote:

Please check if you all use the latest version of the tru_tags plug-in :)
Some performance enhancements were implemented.

According to the tag trace, Vienuolis is indeed using the latest version. Only the latest version generates such queries. In previous releases the tru_tags_from_article tag would always just run a query that selects tags from the current article (w/o using already loaded article data). The feature (and behavior) Vienuolis’ tag trace shows was introduced in the latest release.

Looking at the latest release’s source code, the tru_tags_from_article tag functionality was improved. In the current version it is supposed to collect keywords from only the current article, without running any extra queries, as long as the tag is used in article context. That big query Vienuolis is experiencing should only be run when the tag is used outside article context. Why the code isn’t working as expected is another matter.

It seems the code should work just fine (it shouldn’t generate those mass queries), but I haven’t tested it, so don’t know for sure. There could be some underlying issue or conflict somewhere.
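
To make the intended behavior concrete, here is a rough sketch of that branching in plain PHP. It is not tru_tags’ actual code: the function name keywords_for_tag, its parameters, and the stand-in for the site-wide query are all hypothetical, and it assumes the article data is passed as an array, or as null outside article context.

```php
<?php
// Hypothetical sketch of the intended branching; not taken from tru_tags.
function keywords_for_tag(?array $thisarticle, callable $queryAllKeywords): array
{
    if ($thisarticle !== null && isset($thisarticle['keywords'])) {
        // Article context: the keywords are already loaded, so no query is needed.
        $lists = [$thisarticle['keywords']];
    } else {
        // Outside article context: fall back to the expensive site-wide query.
        $lists = $queryAllKeywords();
    }

    $tags = [];
    foreach ($lists as $list) {
        foreach (explode(',', $list) as $tag) {
            $tag = trim($tag);
            // Skip empty entries and duplicates.
            if ($tag !== '' && !in_array($tag, $tags, true)) {
                $tags[] = $tag;
            }
        }
    }

    return $tags;
}

// Usage: the closure stands in for "select Keywords from textpattern ...".
$queries  = 0;
$siteWide = function () use (&$queries): array {
    $queries++;
    return ['php,caching', 'caching,sql'];
};

$inArticle         = keywords_for_tag(['keywords' => 'sql,performance'], $siteWide);
$queriesInArticle  = $queries; // still 0: no query ran in article context
$outsideArticle    = keywords_for_tag(null, $siteWide);
```

If the first branch were never taken, for example because the article data is not visible where the tag runs, every tag on the page would fall through to the site-wide query, which would match the trace above.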

Last edited by Gocom (2011-12-02 13:28:35)

Vienuolis · 2011-12-03 00:57:16

Please check if you all use the latest version of the tru_tags plug-in :)

Version 3.6, the current release. Andreas made tru_tags-3.7-alpha a year ago, which fixes a significant performance problem. I will ask Nathan about this feature in his tru_tags forum thread.

That big query Vienuolis is experiencing should only be run when the tag is used outside article context.

Here is an excerpt of an article form (simplified and translated):

...	<div id="essence">
	<h1><txp:title /></h1>
<txp:body />
	</div><!-- essence -->
	<dl id="abstract">
<dt>Summary:</dt><dd><txp:excerpt /></dd>
<dt>Thesis:</dt><dd><txp:custom_field name="Thesis:" />.</dd>
<dt>Category:</dt><dd><txp:category1 link="1" title="1" />: <txp:category2 link="1" title="1" />.</dd>
<dt>Keywords:</dt><dd><txp:tru_tags_from_article />.</dd>
...
<dt>Author:</dt><dd><txp:author link="1" />, <txp:posted />.</dd>
	</dl>

And an excerpt of a listform:

	<dl class="abstract">
<dt><txp:article_image thumbnail="1" class="" /></dt>
<dt><strong><txp:permlink><txp:title /></txp:permlink></strong></dt>
<dd><txp:posted /> <txp:custom_field name="Description:" />.</dd>
<dd><txp:article_id />. <txp:section title="1" link="1" />: <txp:category1 title="1" link="1" />: <txp:category2 title="1" link="1" />: <txp:tru_tags_from_article />.</dd>
<dd><txp:author link="1" />: <txp:custom_field name="Thesis:" />.</dd>
<dd><txp:excerpt /></dd>
	</dl>

I do not see where I am out of context.

I don’t use keyword lists in article lists because I think they are distracting. I think showing the used category (-ies) is enough.

Yes, in most cases, I agree. But for serious research papers, in strict bibliography, categories and keywords have different meanings. As in UDK, categories are single and hierarchical (suitable for chapters of Wikipedia, for example), while keywords indicate exact subjects (like a Wikipedia article name) for subject indexing.

You could probably do the same with a plugin, like rah_repeat, or you could look into some other tagging solution like smd_tags.

Jukka, thank you very much, I did not find your plugin in the repository some time ago. rah_repeat and smd_tags look great; apparently I should replace the tagging plugin.

Also, there’s Sencer’s cache plugin asy_jpcache for real flat-file caching.

Thank you, Robert, I have missed this plugin, too. Why are this and some other important plugins not registered in the TxP repository? I think we should support good old services, instead of making new ones. BTW, Textpattern.org also stands as an example of quite good data architecture and usability.

Last edited by Vienuolis (2011-12-03 09:21:17)

Vienuolis · 2011-12-04 21:51:06

I have just replaced tru_tags v3.6 (the current release) with Andreas’ tru_tags-3.7-alpha, making page runtime about forty times (40x) faster: the load time of the biggest webpage dropped from 22 sec to 0.5 sec. Hooray!

I am willing to implement your caching solutions, too. Thank you for your lean, smart, and powerful carver — the brilliant Textpattern.

Textpattern CMS

Textpattern CMS support forum
