Textpattern CMS support forum

#1 2025-04-09 15:01:54

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,714

Building huge category trees

Say I have a database of a few hundred (potentially a few thousand) nested items. I don’t want to make all these by hand so I use a PHP script like this:

$rows = file('path/to/file.txt');
... split into fields blah blah ...
...
foreach ($rows as $bits) {
      $brand_safe = doSlash(strtolower(sanitizeForUrl($bits['brand'])));
      $family_safe = doSlash(strtolower(sanitizeForUrl($bits['family'])));
      safe_upsert('txp_category', "type='article', title = '" . doSlash($bits['brand']) . "'", "name = '$brand_safe'");
      safe_upsert('txp_category', "type='article', title = '" . doSlash($bits['family']) . "', parent = '$brand_safe'", "name = '$family_safe'");
      ...
}
rebuild_tree_full('article');

Looks good. Inserts all rows. But none of the parents are assigned: everything is a flat tree structure with ‘root’ as parent.

If I shove the rebuild_tree_full('article'); inside the loop, I can see (via phpMyAdmin) that the table is being reindexed and the parents are gradually being applied, but the script runs out of memory or execution time.

Is there a way to rebuild only part of the tree, without knowing the $lft? I could probably use rebuild_tree($brand_safe, LEFT_VALUE_HERE, 'article'); if I knew the correct value of the $lft pointer. I tried to grab it with a second query but it’s not known (it’s 0) until the root node is rebuilt, which doesn’t buy me anything apart from more wasted memory.
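To illustrate why $lft can't be worked out for one branch in isolation, here is a rough sketch of generic nested-set numbering. It is not Textpattern's own code: number_tree() and the sample names are made up for the example, and it only assumes the usual lft/rgt scheme.

```php
<?php
// Hypothetical helper: number a category tree nested-set style.
// $parents maps each category name to its parent's name ('root' for top level).
function number_tree(array $parents, string $node = 'root', int $left = 1, array &$out = []): int
{
    $right = $left + 1;
    foreach ($parents as $name => $parent) {
        if ($parent === $node) {
            // Each child starts where the previous sibling's subtree ended.
            $right = number_tree($parents, $name, $right, $out);
        }
    }
    $out[$node] = ['lft' => $left, 'rgt' => $right];

    return $right + 1;
}

// Made-up sample data.
$parents = [
    'acme'   => 'root',
    'anvils' => 'acme',
    'rocket' => 'acme',
    'bolt'   => 'root',
];

$tree = [];
number_tree($parents, 'root', 1, $tree);
// 'bolt' gets lft = 8 only because the whole 'acme' subtree was numbered first.
```

The lft of any node depends on the size of every subtree numbered before it, which fits with the value staying at 0 until the root has been rebuilt.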

The only course of action I can think of is to run it in multiple sweeps. The first creates all the first-tier nodes dangling off root, then I rerun the script, passing a parameter ‘2’ (or something) which tells it to assume all the tier 1 nodes have been populated, so it upserts the next branch layer, and so forth. Bit of a pain.

Any bright ideas on how to bulk insert in a better way? Maybe use the script to build a single query and execute it in one command at the end?


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

#2 2025-04-09 15:19:59

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,714

Re: Building huge category trees

Bloke wrote #339445:

The only course of action I can think of is to run it in multiple sweeps. The first creates all the first-tier nodes dangling off root, then I rerun the script, passing a parameter ‘2’ (or something) which tells it to assume all the tier 1 nodes have been populated, so it upserts the next branch layer, and so forth. Bit of a pain.

Nope, that doesn’t work. Still runs out of memory during the 2nd sweep :(


#3 2025-04-09 16:28:18

Dragondz
Moderator
From: Algérie
Registered: 2005-06-12
Posts: 1,546

Re: Building huge category trees

Hi Stef,

You can use my plugin dzd_multicat_creator

https://github.com/dragondz/dzdplugin

You can build an Excel sheet listing the categories with their parents, then copy and paste it into the plugin’s textarea.

Maybe that can help you.

Cheers.

#4 2025-04-09 16:41:50

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,714

Re: Building huge category trees

Nice idea, thanks. If I can’t get anywhere with code, I might try that.

The main issue is that this is someone else’s spreadsheet and they have a single column in the middle of a HUGE product spreadsheet, and this column contains all the mappings, delimited in a very specific format.

So far, I’ve taken that field and put it in a file and used PHP to split it up, since it’s already in a nested tree (sort of) format. I guess I could iterate over the file and spit out an HTML table by duplicating the parent data in each row, then copy and paste that into Excel and then drag that into your plugin.
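As a rough illustration of "duplicating the parent data in each row", here is a sketch that flattens path-style lines into title/parent rows. The input format ("A > B > C", one path per line) is purely an assumption, since the real delimiter format isn't shown in this thread, and paths_to_rows() is a made-up name.

```php
<?php
// Hypothetical input: one "A > B > C" path per line. The real spreadsheet
// format is not shown in the thread, so treat this purely as an illustration.
function paths_to_rows(array $lines, string $sep = '>'): array
{
    $rows = [];
    foreach ($lines as $line) {
        $parent = 'root';
        foreach (array_map('trim', explode($sep, $line)) as $title) {
            if ($title === '') {
                continue; // ignore empty segments
            }
            // Key by parent + title so each pairing is emitted only once.
            $rows[$parent . "\0" . $title] = ['title' => $title, 'parent' => $parent];
            $parent = $title;
        }
    }

    return array_values($rows);
}

// Made-up sample data.
$rows = paths_to_rows([
    'Acme > Anvils',
    'Acme > Rocket Skates',
]);
```

Each resulting row carries its own parent, so it could be dumped as a table for pasting elsewhere or fed straight into an import loop.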

But I’m trying to make it easy for the client to keep his product lines up-to-date so we can export them using page templates into formats for many different ecommerce systems. He has thousands of products in various Excel sheets at the moment, so I’m looking for the best way to just upload the files and have Txp do the heavy lifting of inserting stuff into the database and making links between content types.


#5 2025-04-09 20:26:48

etc
Developer
Registered: 2010-11-11
Posts: 5,360

Re: Building huge category trees

Bloke wrote #339445:

Is there a way to rebuild only part of the tree, without knowing the $lft?

IIRC, insert_nodes(null, compact('name', 'title', 'parent')) (in place of safe_upsert) tries to avoid rebuilding the full tree when adding a category. I’m not sure it checks whether the category already exists, but you can probably keep track of already created parents?
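A minimal sketch of keeping track of already created categories, so each one is inserted only once and parents always go in before their children. Everything here is hypothetical: import_categories() and make_name() are made-up names, make_name() is only a stand-in for sanitizeForUrl(), and the $insert callable is where the insert_nodes() call quoted above might go (its signature is taken from this post and not verified).

```php
<?php
// Hypothetical importer: $insert is any callable that creates one category.
// In Textpattern it might wrap the insert_nodes() call (unverified assumption).
function import_categories(array $rows, callable $insert): int
{
    $seen = [];    // names already created during this run
    $created = 0;

    foreach ($rows as $bits) {
        // Brand (parent) first, then family (child), so the parent always exists.
        $pairs = [
            [$bits['brand'], null],
            [$bits['family'], $bits['brand']],
        ];
        foreach ($pairs as [$title, $parentTitle]) {
            $name = make_name($title);
            if (isset($seen[$name])) {
                continue; // already created: skip silently
            }
            $parent = $parentTitle === null ? 'root' : make_name($parentTitle);
            $insert(compact('name', 'title', 'parent'));
            $seen[$name] = true;
            $created++;
        }
    }

    return $created;
}

// Stand-in for sanitizeForUrl(): lowercase, runs of other characters become hyphens.
function make_name(string $title): string
{
    return trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($title)), '-');
}

// Usage with a logging callable instead of a real database insert.
$log = [];
$count = import_categories(
    [
        ['brand' => 'Acme', 'family' => 'Anvils'],
        ['brand' => 'Acme', 'family' => 'Rocket Skates'],
    ],
    function (array $cat) use (&$log) {
        $log[] = $cat['parent'] . '/' . $cat['name'];
    }
);
```

Passing the insert step in as a callable keeps the duplicate tracking testable without a database. Note that it only knows about categories created in the current run, so rows already in the table would still need a lookup.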

#6 2025-04-09 21:14:15

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,714

Re: Building huge category trees

etc wrote #339449:

IIRC, insert_nodes(null, compact('name', 'title', 'parent')) tries to avoid rebuilding the full tree when adding a category.

Sneaky! When did that land and how come this is the first I’ve heard of it? ;)

That could be just the ticket if I do a cheeky lookup beforehand to grab the id of any previous category name, to avoid duplicates.

In the meantime, I have actually managed to get the current list into the DB (just) under the memory limit by keeping track of what has been seen before in an array, and silently skipping it if so. But I think I’ll rewrite the importer to use insert_nodes() because it’s way more efficient than rebuilding the entire tree.

Thank you.


#7 2025-04-09 21:32:36

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,714

Re: Building huge category trees

Bloke wrote #339450:

That could be just the ticket

*blinks* Well that took processing down from nearly a minute to 1.1 seconds for upserting over 200 entries. That is indeed a winner!


#8 2025-04-11 07:49:40

etc
Developer
Registered: 2010-11-11
Posts: 5,360

Re: Building huge category trees

Bloke wrote #339450:

Sneaky! When did that land and how come this is the first I’ve heard of it? ;)

  1. Git says 3 years ago
  2. It was in August :-)

The optimisations weren’t meant for batch category creation, so there is (a little) room for improvement, but it’s not a 4.9 stopper.

#9 2025-04-11 11:15:38

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,714

Re: Building huge category trees

Blimey, three years. It’s brilliant.

Not sure if this is related, but I’m still getting a weird thing where if I add or delete cats from the Category panel, the announce message appears but nothing changes in the list above. I need to refresh to see the results of my adds/deletes.

Is it repeatable elsewhere or just me?


#10 2025-04-11 12:39:07

etc
Developer
Registered: 2010-11-11
Posts: 5,360

Re: Building huge category trees

Bloke wrote #339472:

Blimey, three years. It’s brilliant.

That’s where it all started.

Is it repeatable elsewhere or just me?

Seems to behave well for me. Any clue in js console?

#11 2025-04-11 13:33:26

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,714

Re: Building huge category trees

etc wrote #339473:

Seems to behave well for me. Any clue in js console?

I’ll check later when I get a moment.


#12 2025-04-11 16:30:38

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,714

Re: Building huge category trees

etc wrote #339473:

Seems to behave well for me. Any clue in js console?

Nothing. Network panel and Console all show usual stuff. No errors, no warnings.

It’s bizarre. The announce confirms everything’s fine but the list doesn’t change. If I refresh the page, the new category shows up. Likewise if I delete one: the category remains on screen even though the announce reports a successful deletion. If I refresh, it’s gone from the list.

EDIT: This is the latest Firefox on macOS, if that has any bearing.

Last edited by Bloke (2025-04-11 16:32:53)

