Textpattern CMS support forum
Global inventory for Txp content
Did you know the Txp project has an exciting rolling initiative to play with content? Well it does, and this psychedelic document is at the root of the fun…
Global content inventory: Textpattern
Apologies in advance for this long post, but it may provide useful info for those who want to help out.
The inventory was started several years ago to audit .com content and the FAQs; Phil Wareham is the doc owner. Then we all fell asleep, then woke up again. Now exciting things are in the wind. I’ve updated the auditing doc with a few more tabbed sheets because the effort has grown. We now want to assess content in the entire Textpatternsphere, because — content, baby! And professionalism. And marketing. And usability. And customer experience…
Anyone who would like access to help can give a shout to Phil or me via our forum profiles and provide a Google Drive-friendly address. We can’t make this open to public editing for obvious reasons.
The overall content assessment process is essentially two phases: establish the inventory in the respective site tab of the sheets, then audit the content for each URL in the inventory.
INVENTORY PROCESS
- Someone adds you as a contributor on the sheets.
- You use Screaming Frog, or any other URL scraping tool you want, to generate a list of URLs (at least) as a CSV file.
- Depending on what your scraping tool collects for each URL, the data that would be useful at some point in the auditing process is:
- URL (mandatory)
- title
- meta description
- meta keywords
If nothing else, we want the URLs. The meta items would be useful for metadata assessment, of course, but not all scraping tools gather them for free. If you manage to collect data for which there are no columns in the main auditing sheet yet, we’ll create the columns; no big deal.
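If you don’t have Screaming Frog handy, the extraction step can be sketched with nothing but the Python standard library. This is a minimal illustration, not a crawler: it parses one page’s HTML for the four fields above and writes CSV rows, and fetching pages (plus following links) is left out. The column header names are only suggestions:

```python
import csv
import io
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collects <title>, meta description, and meta keywords from one page."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.keywords = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            d = dict(attrs)
            name = (d.get("name") or "").lower()
            if name == "description":
                self.description = d.get("content") or ""
            elif name == "keywords":
                self.keywords = d.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()

def inventory_row(url, html):
    """Return one CSV-ready row: URL (mandatory), then the optional meta fields."""
    p = MetaExtractor()
    p.feed(html)
    return [url, p.title, p.description, p.keywords]

def write_inventory(rows, out):
    """Write rows to a CSV stream with suggested (not authoritative) headers."""
    writer = csv.writer(out)
    writer.writerow(["URL", "Title", "Meta description", "Meta keywords"])
    writer.writerows(rows)

if __name__ == "__main__":
    page = ("<html><head><title>Txp FAQ</title>"
            "<meta name='description' content='Answers'></head></html>")
    buf = io.StringIO()
    write_inventory([inventory_row("https://example.org/faq", page)], buf)
    print(buf.getvalue())
```

In practice you would feed `inventory_row` the HTML your fetcher returns for each URL; any real crawl should also respect robots.txt, which a proper tool like Screaming Frog handles for you.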
Once you have your CSV list (the inventory), you can port it to the central sheets. But here’s where you have to be careful. You won’t be able to cleanly auto-import the list to the existing sheet with the headers and columns and such as they are. So you could either…
- Use Google Sheets’ import functionality to bring your inventory into your own raw sheet, so the format matches and there are no headers, then manually copy-paste single columns of data, one at a time, from your rough sheet into the main inventory sheet, being careful not to overwrite any rows used as sheet headers.
- Do the same thing as described, except import the data to the main sheets as a new sheet. Then copy the data columns from that sheet to the existing sheet with headers, then delete the imported sheet when done.
- Send me the CSV file and I will get the data into the main inventory.
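For the copy-paste routes above, it helps to re-order the scraper’s columns to match the main sheet before copying anything. Here is a small standard-library sketch; the sheet column order and the Screaming Frog-style header names are assumptions you should adjust to your actual scraper output and sheet:

```python
import csv
import io

# Column order the main inventory sheet expects (assumed; check the sheet headers).
SHEET_COLUMNS = ["URL", "Title", "Meta description", "Meta keywords"]

def realign(csv_text, mapping):
    """Re-order and rename scraper columns to match the sheet, emitting no
    header row so a paste won't overwrite the sheet's own headers.

    mapping: sheet column name -> scraper column name; unmapped or missing
    columns come out empty.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out)
    for row in reader:
        writer.writerow([row.get(mapping.get(col, col), "") for col in SHEET_COLUMNS])
    return out.getvalue()

# Hypothetical Screaming Frog-style headers mapped onto the sheet's columns.
scraped = "Address,Title 1,Meta Description 1\nhttps://example.org/,Home,Welcome\n"
mapping = {"URL": "Address", "Title": "Title 1", "Meta description": "Meta Description 1"}
print(realign(scraped, mapping))
```

The realigned text can then be pasted straight into the data rows of the main sheet, or imported as a throwaway sheet as described in the second option.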
Once you do that, the “inventory” part (the easy part) is more or less done.
AUDITING PROCESS
Then comes the “auditing” part (the more tedious part), where we (the more hands the better) evaluate the content for each URL using the auditing columns in the sheet. You should also fill in the Topics column, especially if no keywords were pulled in the scraping step (#2 above), because topics help writers pull ideas for future blog articles, and so forth.
So, without going into excruciating detail you’re basically going to use the auditing columns (and perhaps some inventory columns too like “Topic”) to evaluate the integrity of the content at each URL. You’ll ask questions like these (Tables 1 and 2) and fill in row data accordingly.
Key columns of the master sheet will be Location (URL), Status, Relocate to (URL), and Notes. Note that some columns, like Status, offer a selection list for responses. The Status column will be important for knowing how to address the content (and it prioritizes the response effort). Be sure to use the Relocate to (URL) and Notes columns together, so that anyone fixing broken content will have a better idea of what the problems are.
KEEP IN MIND
Each tabbed sheet in the main Google worksheet file represents a different place where potentially useful content is located. Owners of those sites are expected to get involved with this if they honestly recognize their content has some ROT (Redundant, Outdated, Trivial) problems, and I guarantee your site does. ;)
No two sheets need be structured the same; hopefully that’s obvious. A given sheet should have whatever inventory and audit columns are needed to reflect the aims of that website. Sheet owners should add columns as needed. (Ask if you need help; I’m good with sheets.)
Likewise, the breadth of the inventory and the depth of the audit will depend on the current goals for a given site. For example, any website that:
- will undergo a complete content overhaul, be redesigned, or be taken offline should be thoroughly inventoried (e.g., .com, FAQs, Docs (wiki)…). One obvious goal here is to help Phil establish a content model for the .com redesign.
- has content that could be relocated to another website where it sits in better context alongside similar information (many possibilities here).
- holds information that blog authors, magazine article writers, or the newsletter editor could use to develop new and relevant content.
That said, this global inventory can be valuable for a number of content endeavors for the project. Let’s get it on!
Keep it as simple as possible. Maximum value for the minimum amount of effort needed — but making no effort doesn’t get us far.
If you have questions, ask them here.
Last edited by Destry (2015-07-23 08:23:25)
Re: Global inventory for Txp content
Forgot to mention… Some entries may seem odd, like an FAQ that relocates to wiki pages. That’s because the original effort kicked off at least three years ago, and things have changed. We’ll just fix them as we go.
With regard to docs, specifically, the current location will be wiki URLs, the new location would be wherever a Jekyll page ends up being. We need headway on that before it will really make sense. Right Phil?
Re: Global inventory for Txp content
Destry wrote #293410:
With regard to docs, specifically, the current location will be wiki URLs, the new location would be wherever a Jekyll page ends up being. We need headway on that before it will really make sense. Right Phil?
Destry, what was the reasoning behind leaving the Wiki concept behind?
Would moving to let’s say DokuWiki, which is flat file based – easier to keep synced to a repository, be an option?
We Love TXP . TXP Themes . TXP Tags . TXP Planet . TXP Make
Re: Global inventory for Txp content
hcgtv wrote #293411:
Destry, what was the reasoning behind leaving the Wiki concept behind?
MediaWiki bloat, mostly, then later a more-or-less collective decision to move it to Github in some respect. I will admit, after many years of “managing” .net and making decisions for it, I am relieved to be free of it (as platform manager). I agree with the decision that fewer docs of better quality are what’s needed at this point, so a wiki and mob contribution are not really necessary. I was recently informed of the docs direction when I made an ill-timed suggestion to use a Github wiki (which could still work). .net will now be obsolete, seemingly, and that’s probably not a bad thing. Fewer domains would be better. Anyway, Phil is your man for further docs tech/platform questions, and that other thread is probably a better place.
Re: Global inventory for Txp content
/ crickets chirping /
Re: Global inventory for Txp content
The head post of this thread has been re-written for perspective and process. Hopefully that helps.