Textpattern Forum


#1 2009-07-28 01:58:46

johnstephens
Plugin Author
From: Harrisonburg, VA
Registered: 2008-06-01
Posts: 834

[resolved] rebuild_tree_full('article') chokes after 4,200 article categories.

Hi!

I’m working on a site with a lot of categories: I currently have 4,084 records in the txp_category table, but the category IDs go up to 4,211 because I have successfully added more, several of which have since been deleted. The site does not use link, image, or file categories.

But now, Textpattern chokes when I add a new article category using the “Categories” tab under “Content”: After clicking “Create”, Textpattern jumps to a blank screen with no error message. In phpMyAdmin, I see that new rows are added to the txp_category table, but the lft and rgt fields have a value of zero. I think it’s choking on the rebuild_tree_full('article') function, since I also get a blank screen when I delete an article category.

I think it might be hitting some kind of memory limit when rebuilding the category tree. If that’s the case, is there a way to extend the memory allowance for this function? Alternatively, is there a way to rebuild the txp_category table to remove any unnecessary data that might have built up, such as all those deleted category IDs? (Any change in the category IDs would need to be propagated to the associated articles.)

What are some other ways I could troubleshoot this issue?

Thanks!


#2 2009-07-28 10:44:47

jsoo
Developer
From: NC, USA
Registered: 2004-11-15
Posts: 1,730

You might be hitting a query limit set by your web host. rebuild_tree_full() will result in two queries per category (I think), so running it several times in succession with so many categories is a lot of queries. Is this on a hosted server?
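jsoo’s estimate is easy to sanity-check. Below is a minimal Python/sqlite3 model of a recursive nested-set rebuild, assuming (as jsoo suggests) that the rebuild issues one SELECT and one UPDATE per category. The simplified table, the CountingDB wrapper, and this rebuild_tree() are hypothetical stand-ins, not Textpattern’s PHP code.

```python
import sqlite3

# Simplified stand-in for txp_category (assumed columns, not Textpattern's real DDL).
# The two indexes only keep this demo fast; they say nothing about Textpattern.
SCHEMA = """
CREATE TABLE txp_category (
    id INTEGER PRIMARY KEY,
    name TEXT, type TEXT, parent TEXT,
    lft INTEGER DEFAULT 0, rgt INTEGER DEFAULT 0
);
CREATE INDEX cat_parent ON txp_category (parent, type);
CREATE INDEX cat_name ON txp_category (name, type);
"""


class CountingDB:
    """Hypothetical wrapper that counts every statement the rebuild issues."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript(SCHEMA)
        self.queries = 0

    def run(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def rebuild_tree(db, parent, left, cat_type):
    """Recursive nested-set rebuild: per node, one SELECT for its children
    and one UPDATE to store its own lft/rgt. Returns the next free index."""
    right = left + 1
    children = db.run(
        "SELECT name FROM txp_category WHERE parent = ? AND type = ? ORDER BY name",
        (parent, cat_type))
    for (child,) in children:
        right = rebuild_tree(db, child, right, cat_type)
    db.run("UPDATE txp_category SET lft = ?, rgt = ? WHERE name = ? AND type = ?",
           (left, right, parent, cat_type))
    return right + 1


# A flat tree like the one in this thread: root plus 4,000 article categories.
db = CountingDB()
db.conn.execute("INSERT INTO txp_category (name, type, parent) VALUES ('root', 'article', '')")
db.conn.executemany(
    "INSERT INTO txp_category (name, type, parent) VALUES (?, 'article', 'root')",
    [(f"pub-{i:04d}",) for i in range(4000)])
rebuild_tree(db, "root", 1, "article")
print(db.queries)  # 2 statements x 4,001 nodes = 8002
```

With a full rebuild after every create or delete, the statement count grows with the size of the whole tree, not with the size of the change.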



#3 2009-07-28 14:32:02

ruud
Developer emeritus
From: a galaxy far far away
Registered: 2006-06-04
Posts: 4,513

Or a timeout on the PHP process. Check the webserver error logs!

If the tree were updated on changes instead of being completely rebuilt, the number of required queries would depend on the number of changes (new/deleted/changed categories) instead of on total number of remaining categories.
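ruud’s alternative, an incremental update, can be sketched as the textbook nested-set leaf insert. This is a minimal Python/sqlite3 illustration with hypothetical make_db() and insert_leaf() helpers; as ruud notes later in the thread, Textpattern has no such function. The statement count stays at four whatever the tree size, but the returned row count shows that every node to the right of the insertion point (plus the ancestors) is still rewritten.

```python
import sqlite3


def make_db():
    """Tiny, already-consistent tree: root(1,6) -> a(2,3), b(4,5)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE txp_category (name TEXT, type TEXT, parent TEXT, lft INTEGER, rgt INTEGER)")
    conn.executemany(
        "INSERT INTO txp_category VALUES (?, 'article', ?, ?, ?)",
        [("root", "", 1, 6), ("a", "root", 2, 3), ("b", "root", 4, 5)])
    return conn


def insert_leaf(conn, name, parent, cat_type="article"):
    """Hypothetical incremental insert: open a gap of 2 at the parent's right
    edge, then place the new leaf in it. Always 4 statements; returns the
    number of rows those statements changed."""
    before = conn.total_changes
    (p_rgt,) = conn.execute(
        "SELECT rgt FROM txp_category WHERE name = ? AND type = ?",
        (parent, cat_type)).fetchone()
    conn.execute("UPDATE txp_category SET rgt = rgt + 2 WHERE type = ? AND rgt >= ?",
                 (cat_type, p_rgt))
    conn.execute("UPDATE txp_category SET lft = lft + 2 WHERE type = ? AND lft > ?",
                 (cat_type, p_rgt))
    conn.execute("INSERT INTO txp_category VALUES (?, ?, ?, ?, ?)",
                 (name, cat_type, parent, p_rgt, p_rgt + 1))
    return conn.total_changes - before


conn = make_db()
first = insert_leaf(conn, "c", "root")  # root gains a last child: 2 rows changed
second = insert_leaf(conn, "d", "a")    # 'a', root and everything right of 'a' shift: 7 rows
tree = {n: (l, r) for n, l, r in conn.execute("SELECT name, lft, rgt FROM txp_category")}
```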

PS. How come you have that many categories?


#4 2009-07-28 15:35:59

johnstephens
Plugin Author
From: Harrisonburg, VA
Registered: 2008-06-01
Posts: 834

Thanks for your feedback!

ruud:

How come you have that many categories?

My client wants the site to group our articles by the publication they refer to. He hired two people to enter thousands of publications into the system, and together they completed about one fourth of the list, so he intends to add more.

jsoo:

You might be hitting a query limit set by your web host. rebuild_tree_full() will result in two queries per category

That may be. The host has no limits on MySQL queries beyond the normal resource usage limits. I can’t tell whether this operation is within those limits, but I’m willing to check with their support team.

ruud:

Or a timeout on the PHP process. Check the webserver error logs!

I checked my error.log, and the last error is over ten days old, from before this problem appeared.

If the tree were updated on changes instead of being completely rebuilt, the number of required queries would depend on the number of changes (new/deleted/changed categories) instead of on total number of remaining categories.

Is that something I can toggle in Textpattern?

jsoo:

Is this on a hosted server?

Yes, but I have the same behavior on my development server, a Mac running MAMP.

Last night after posting here, I did a few things that have improved performance significantly, but it’s still pretty slow. Here’s what I did:

  1. The category data was entered in all caps before I upgraded to the latest version of Textpattern; this resulted in uppercase data in both the “title” and “name” columns. I have a notion (which may be misguided) that uppercase data takes up more memory than lowercase data, so I used TextMate to convert the category titles to title case.
  2. Then I used a SQL UPDATE to convert the category names to lowercase. After these two operations, I could add or delete a category without Textpattern choking, but it was very slow.
  3. I found and deleted several duplicate categories. Then I ran a SQL query that found and removed all rows that duplicate both name and type. This reduced the number of article category records to 4,003. (I kept a backup of the original table, in addition to my routine backups!)
  4. I ran rebuild_tree_full() from the Categories tab to bring the lft and rgt columns in line with the existing categories.
  5. I repaired and optimized all tables after each major change.
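Steps 1 to 3 can be rehearsed as a hedged Python/sqlite3 sketch on a hypothetical four-row table. Two assumptions: string.capwords() stands in for the manual TextMate title-casing (it does not treat small words like “of” the way a person would), and duplicates are collapsed by keeping the lowest id for each name. Lowercasing runs before the duplicate check so that “NEW-YORK-REVIEW” and “new-york-review” are recognised as the same category.

```python
import sqlite3
import string

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txp_category (id INTEGER PRIMARY KEY, name TEXT, title TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO txp_category (name, title, type) VALUES (?, ?, 'article')",
    [("root", "root"),
     ("NEW-YORK-REVIEW", "NEW YORK REVIEW"),
     ("new-york-review", "NEW YORK REVIEW"),   # becomes a duplicate after step 2
     ("HARPERS-WEEKLY", "HARPERS WEEKLY")])

# Step 1: title-case the display titles (skipping the special 'root' row).
rows = conn.execute(
    "SELECT id, title FROM txp_category WHERE type = 'article' AND name != 'root'").fetchall()
conn.executemany("UPDATE txp_category SET title = ? WHERE id = ?",
                 [(string.capwords(title), cat_id) for cat_id, title in rows])

# Step 2: lowercase the URL-safe names.
conn.execute("UPDATE txp_category SET name = lower(name) WHERE type = 'article'")

# Step 3: keep only the lowest id for each (name, type) pair.
conn.execute("""
DELETE FROM txp_category
 WHERE type = 'article'
   AND id NOT IN (SELECT MIN(id) FROM txp_category GROUP BY name, type)
""")

result = conn.execute("SELECT id, name, title FROM txp_category ORDER BY id").fetchall()
```

In MySQL the duplicate DELETE needs a different shape (for example a self-join or a derived table), because MySQL refuses a DELETE whose subquery reads from the table being deleted from. If articles store the old uppercase names in Category1/Category2, those columns may need the same lower() pass.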

Adding and deleting categories works again, but with a very long wait. My sense is that this doesn’t resolve the problem but only postpones it.

Thanks again for your feedback! Please let me know if you get any other ideas for troubleshooting or solving this.


#5 2009-07-28 16:03:05

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 5,925

johnstephens wrote:

they completed about one fourth of the list, so he intends to add more.

Holy sardines!

If the tree were updated on changes… Is that something I can toggle in Textpattern?

Sort of. I think instead of calling rebuild_tree_full('article') you can call rebuild_tree('root', leftmost_index, 'article') where leftmost_index is some number that corresponds to the ‘lft’ of a node. Most times you use 1 here, though whether that actually improves performance I couldn’t say.

I did some digging and found a few articles. One of them states (about halfway down):

The nested sets model is not good for trees which require frequent updates, and is pretty much unsupportable for large updatable trees

(source)

I believe the nested set model is another name for the type of tree used in TXP (willing to be proved wrong). If it’s getting super slow on updates and inserts and reindexing from a midpoint doesn’t work, then there’s not much you can do short of ripping out TXP’s algorithm and using something like a red-black tree (one with better performance on inserts and modifications), or not using cats at all. That’s a lot of work, and you’ve gotta be careful not to invalidate the existing 1000 or so already entered :-(

I used a SQL UPDATE to convert the category names to lowercase

That’s a good move. TXP enforces lower case by default now.

I found and deleted several duplicate categories

I think nodes in this type of structure have to be unique (stress: I think). Even if the structure itself allows duplicates (depending on parents I guess), TXP itself doesn’t like duplicates so that would help.

In all, a bit of a conundrum. Two thoughts:

1) Are your guys using nested categories? If not, could they? I wonder if parents would help on the server side (perhaps because you could rebuild the tree from the parent down instead of the whole tree).

2) My suspicion is that a lot of the processing time is because TXP’s Categories tab shows all categories, so at least 70+% of the time will be physically drawing the TXP interface to fit them all in. Tables are notoriously slow in most browsers, especially if they don’t know the number of rows up front. If there were some way the data entry people could use another interface (perhaps a customised version of the Categories tab, or a plugin that allows them to insert cats without displaying the tree each time), it might help.

Last edited by Bloke (2009-07-28 16:27:56)



#6 2009-07-28 16:45:41

ruud
Developer emeritus
From: a galaxy far far away
Registered: 2006-06-04
Posts: 4,513

johnstephens wrote:

My client wants the site to group our articles by the publication they refer to. He hired two people to enter thousands of publications into the system, and together they completed about one fourth of the list, so he intends to add more.

I think that’s the wrong approach. You’ll get a category list that is way too long. If they’ve added only a quarter of the articles and are already at 4000 categories, I’m guessing they’ll end up with 16000 categories. Imagine having to scroll through that in a select box.

Using a custom field would work a lot better or perhaps the publication date contains all the information needed to group articles according to publication.

> Or a timeout on the PHP process. Check the webserver error logs!

I checked my error.log, and the last error is over ten days old, from before this problem appeared.

How long does it take to create a new category? Compare that time to the ‘max_execution_time’ setting in php.ini.

> If the tree were updated on changes instead of being completely rebuilt, the number of required queries would depend on the number of changes (new/deleted/changed categories) instead of on total number of remaining categories.

Is that something I can toggle in Textpattern?

No. TXP doesn’t have the necessary functions for supporting incremental tree updates. It could, but I don’t think that is the right solution for this problem. Consider this: if an incremental update goes wrong, the tree is corrupted, and the only way to correct that is… by doing a full tree rebuild.
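Given ruud’s point that a failed incremental update leaves a corrupted tree, a cheap consistency check is useful whichever way the tree is maintained. A hedged Python sketch: check_nested_set() is hypothetical and tests only a few necessary conditions on (name, parent, lft, rgt) rows of one category type. Rows left at lft = rgt = 0, like the ones johnstephens saw in phpMyAdmin, are flagged.

```python
def check_nested_set(rows, root="root"):
    """Cheap sanity checks on (name, parent, lft, rgt) rows for one category
    type. Returns a list of problems; an empty list means no problem was found
    (it is not a full proof that the tree is valid)."""
    rows = list(rows)
    n = len(rows)
    problems = []
    # Every index from 1 to 2n should be used exactly once.
    bounds = sorted(v for _, _, lft, rgt in rows for v in (lft, rgt))
    if bounds != list(range(1, 2 * n + 1)):
        problems.append("lft/rgt values are not exactly 1..2n")
    pos = {name: (lft, rgt) for name, _, lft, rgt in rows}
    # The root must span the whole tree.
    if pos.get(root) != (1, 2 * n):
        problems.append(f"{root}: expected (1, {2 * n}), got {pos.get(root)}")
    for name, parent, lft, rgt in rows:
        if lft >= rgt:
            problems.append(f"{name}: lft >= rgt")
        if parent in pos:
            p_lft, p_rgt = pos[parent]
            if not (p_lft < lft and rgt < p_rgt):
                problems.append(f"{name}: not inside parent {parent}")
    return problems


good = [("root", "", 1, 6), ("a", "root", 2, 3), ("b", "root", 4, 5)]
# Same tree plus a new row left at lft = rgt = 0, as in the first post.
broken = good + [("c", "root", 0, 0)]

print(check_nested_set(good))  # []
print(check_nested_set(broken))
```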

bloke wrote:

Sort of. I think instead of calling rebuild_tree_full(‘article’) you can call rebuild_tree(‘root’, leftmost_index, ‘article’) where leftmost_index is some number that corresponds to the ‘lft’ of a node. Most times you use 1 here, though whether that actually improves performance I couldn’t say.

Won’t work. You need to start at the root category for the rebuild process to work properly; otherwise the ‘rgt’ value for ‘root’ would be incorrect.
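ruud’s objection shows up even on a toy tree. In this hypothetical Python sketch, rebuild() is a plain recursive nested-set numbering (not Textpattern’s code): renumbering only the subtree under ‘a’ fixes that subtree but leaves the root’s rgt stale and lets ‘a’ overlap its sibling ‘b’, while a rebuild from the root gets every value right.

```python
def rebuild(children, node, left, pos):
    """Assign (lft, rgt) to `node` and its descendants, starting at `left`.
    `children` maps a node to its ordered child list; results go into `pos`.
    Returns the next free index."""
    right = left + 1
    for child in children.get(node, []):
        right = rebuild(children, child, right, pos)
    pos[node] = (left, right)
    return right + 1


children = {"root": ["a", "b"], "a": []}
pos = {}
rebuild(children, "root", 1, pos)      # root (1,6), a (2,3), b (4,5)

children["a"].append("c")              # add a new category under 'a'
pos["c"] = (0, 0)                      # left at zero, like the rows in the first post

partial = dict(pos)
rebuild(children, "a", partial["a"][0], partial)  # renumber only the 'a' subtree

full = {}
rebuild(children, "root", 1, full)     # renumber from the root
```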


#7 2009-07-28 20:11:01

johnstephens
Plugin Author
From: Harrisonburg, VA
Registered: 2008-06-01
Posts: 834

ruud:

I think that’s the wrong approach. You’ll get a category list that is way too long. If they’ve added only a quarter of the articles and are already at 4000 categories, I’m guessing they’ll end up with 16000 categories. Imagine having to scroll through that in a select box.

Using a custom field would work a lot better or perhaps the publication date contains all the information needed to group articles according to publication.

Using a custom field might improve performance and make the code more elegant in several places. I’m almost ready to embrace that solution and re-factor my other decisions around it.

The most pressing question is how to get the Title Case data from each category into the associated articles. This query comes to mind:

UPDATE textpattern SET custom_17=category1;

But that gives me the sanitized-for-URL category names rather than the titles.

My MySQL skills aren’t quite at the required level for getting the Category Titles into my newly created custom field.

Thanks again for your input. The idea of using a custom field for this really broke the problem out of my assumed constraints.


#8 2009-07-28 20:50:33

ruud
Developer emeritus
From: a galaxy far far away
Registered: 2006-06-04
Posts: 4,513

Perhaps something like this (not tested):

UPDATE textpattern, txp_category
  SET textpattern.custom_17=txp_category.title
  WHERE textpattern.Category1=txp_category.name;
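For comparison, here is the same idea as a hedged Python/sqlite3 sketch on hypothetical sample rows, modelling only the columns involved. SQLite does not accept MySQL’s multi-table UPDATE syntax, so the sketch uses a correlated subquery instead; the extra type = 'article' filter is an added safeguard (not part of the query above) in case a link, image, or file category ever shares a name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE textpattern (ID INTEGER PRIMARY KEY, Title TEXT, Category1 TEXT, custom_17 TEXT);
CREATE TABLE txp_category (name TEXT, title TEXT, type TEXT);
INSERT INTO txp_category VALUES ('new-york-review', 'New York Review', 'article'),
                                ('harpers-weekly', 'Harpers Weekly', 'article');
INSERT INTO textpattern (Title, Category1) VALUES ('Essay one', 'new-york-review'),
                                                  ('Essay two', 'harpers-weekly'),
                                                  ('Uncategorised', '');
""")

# Copy each article's category *title* (not the URL-safe name) into custom_17.
conn.execute("""
UPDATE textpattern
   SET custom_17 = (SELECT c.title FROM txp_category AS c
                     WHERE c.name = textpattern.Category1 AND c.type = 'article')
 WHERE Category1 != ''
""")

custom = conn.execute("SELECT custom_17 FROM textpattern ORDER BY ID").fetchall()
```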

Last edited by ruud (2009-07-28 21:00:21)


#9 2009-07-28 21:35:21

johnstephens
Plugin Author
From: Harrisonburg, VA
Registered: 2008-06-01
Posts: 834

UPDATE textpattern, txp_category SET textpattern.custom_17=txp_category.title WHERE textpattern.Category1=txp_category.name;

That query seems to have done the trick! Thanks!

Now I have to replace all of my <txp:category-based code to use the custom field instead.

What’s the safest way to purge the article categories, both in the textpattern table and the txp_category?

Incidentally, I’m also curious what the textpattern_category table is for: throughout this whole scenario, it has had only five entries, and they don’t seem to make any sense.

Last edited by johnstephens (2009-07-28 21:36:43)


#10 2009-07-29 14:04:20

ruud
Developer emeritus
From: a galaxy far far away
Registered: 2006-06-04
Posts: 4,513

johnstephens wrote:

What’s the safest way to purge the article categories, both in the textpattern table

UPDATE textpattern SET Category1='', Category2='';

and the txp_category?

Not tested, but probably something like this:

DELETE FROM txp_category WHERE type='article' AND name!='root';

And then you’d have to add and delete a category in the category tab to rebuild the tree correctly or do something like this:

UPDATE txp_category SET lft=1, rgt=2 WHERE name='root' AND type='article'
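The purge sequence can be rehearsed on a throwaway copy first. This hedged Python/sqlite3 sketch runs the three statements inside one transaction on hypothetical sample data that includes an image category, confirming that other category types survive and the article root ends up at lft = 1, rgt = 2. A transaction only protects the real MySQL tables if their storage engine supports transactions (MyISAM does not), so a backup remains the real safety net.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE textpattern (ID INTEGER PRIMARY KEY, Category1 TEXT, Category2 TEXT);
CREATE TABLE txp_category (name TEXT, type TEXT, lft INTEGER, rgt INTEGER);
INSERT INTO textpattern (Category1, Category2) VALUES ('pub-a', 'pub-b'), ('pub-b', '');
INSERT INTO txp_category VALUES
    ('root', 'article', 1, 6), ('pub-a', 'article', 2, 3), ('pub-b', 'article', 4, 5),
    ('root', 'image', 1, 4), ('photos', 'image', 2, 3);
""")

# `with conn` commits if every statement succeeds and rolls back otherwise.
with conn:
    conn.execute("UPDATE textpattern SET Category1 = '', Category2 = ''")
    conn.execute("DELETE FROM txp_category WHERE type = 'article' AND name != 'root'")
    # With no children left, the article root spans just 1..2.
    conn.execute("UPDATE txp_category SET lft = 1, rgt = 2 WHERE name = 'root' AND type = 'article'")

categories = conn.execute(
    "SELECT name, type, lft, rgt FROM txp_category ORDER BY type, name").fetchall()
articles = conn.execute("SELECT Category1, Category2 FROM textpattern").fetchall()
```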

Incidentally, I’m also curious what the textpattern_category table is for: throughout this whole scenario, it has had only five entries, and they don’t seem to make any sense.

That’s a table created by the rss_unlimited_categories plugin.


Powered by FluxBB