Docs site - changes not propagating; search broken

gaekwad · 2025-08-12 20:04:21

Edit: issues resolved.

docs.textpattern.com

Our docs site is a little sick. The long-standing Jekyll instance that’s worked ‘reasonably OK’ for some years – minus the search feature on Safari, and a few other things – has been caught in the crossfire of an update and has ceased to work ‘reasonably OK’.

Changes made to the docs repo are not being carried across to the site, and search is broken. Neither of these things is a) as expected; b) desirable; or c) trivial to fix.

There is a medium term plan to shift our docs to a hosted service (see github.com/textpattern/textpattern.github.io/issues/212 and github.com/textpattern/textpattern.github.io/issues/218 for background reading), and when the current Jekyll site returns to working ‘reasonably OK’ or better, that will become top priority.

Thank you for your patience while I ~~loudly curse and flail wildly at~~ resolve the Jekyll stuff. I will post here when things are resolved.

Last edited by gaekwad (2025-08-13 20:21:01)

bici · 2025-08-13 02:44:47

is this not an issue of concern?
GitHub to merge in Microsoft AI
Read more at: IndiaTimes

gaekwad · 2025-08-13 05:20:15

bici wrote #340137:

is this not an issue of concern?
GitHub to merge in Microsoft AI

No. At least not at the present time.

Textpattern uses Git to maintain our various piles of code. Git replaced Subversion, which was a common theme for other open source collaborative projects.

Our Git platform is GitHub. Our Subversion platform was Google. Our Git setup is largely portable in the sense that we can switch providers if the need arises.

To the best of my knowledge, there is no AI / LLM-generated code in our code base aside from some translated international language strings (aka Textpacks).

To the best of my knowledge, there are no plans to introduce any AI / LLM-generated code into the code base.

I can only speak for myself when I say I actively avoid using any AI / LLM platforms in all aspects of life. This is my own decision for ideological reasons. Anyone else who wishes to use AI / LLMs – that’s outside my remit.

In an ideal world, we’d have a fleet of power-efficient rack servers in datacenters dotted across continents with self-hosted source control systems, mirrors, redundant backups, load balancers, CDNs, automated build systems etc, all provided by sustainable means.

Thanks to the DigitalOcean open source program, we have a generous annual grant that we can put towards servers in datacenters in three continents, with backups and various useful things like demo sites. In a typical year, we use all our server credit and we top-up some $ to keep the lights on. It’s not a huge amount of top-up $ but it all needs to be carefully considered given the project finances.

Our docs were previously living on a MediaWiki instance on a donated Solaris (?) server at Joyent. Then we switched to GitHub Pages, which is Jekyll. Then we had some limitations with GitHub Pages Jekyll and moved to self hosted Jekyll, which seemed to be largely OK. Then it started misbehaving.

I want to move docs to a third party service that can make pretty, searchable docs pages. See the two issues in the opening post for background. When docs are working, I can switch off the Jekyll docs server and recoup ~$100 a year in server credit to assign elsewhere.

When we are at the stage where we can automate formatted content into Textpattern, we can have Textpattern docs powered by Textpattern. Until then we’re leaning on other providers, with our open source project cap in hand.

I will fix docs search when I figure out what’s broken in Ruby / its Gem system / its version management system / Jekyll, and I suspect the repo changes not being propagated is somehow related.

Then I’ll get docs onto a better platform. Then old server seppuku can happen, and we’ll be a step closer to a better setup.

bici · 2025-08-13 06:00:33

thank-you for a very detailed explanation.
And thanks for all the work you do. grateful.

jakob · 2025-08-13 06:44:13

bici wrote #340139:

thank-you for a very detailed explanation.
And thanks for all the work you do. grateful.

+1 on that!

gaekwad wrote #340136:

while I ~~loudly curse and flail wildly at~~ resolve …

And I thought that was just me. Phew!

gaekwad · 2025-08-13 08:17:50

jakob wrote #340140:

And I thought that was just me. Phew!

Not just you!

Bloke · 2025-08-13 10:02:08

gaekwad wrote #340138:

When we are at the stage where we can automate formatted content into Textpattern…

So what do we need, specifically to fulfil this? How much more work do we need on the importer to make it happen? I want to design out as many pain points as possible for you, because you do so much for us keeping the ecosystem running.

I guess in moving to Txp docs, we also lose community GitHub updates, unless we can find a way to sync any repo changes to Txp with some script/hook glue.

I’m more than happy to invest some time doing this if we can nail down exactly how we want it to work.

gaekwad · 2025-08-13 10:15:24

Bloke wrote #340143:

So what do we need, specifically to fulfil this? How much more work do we need on the importer to make it happen? I want to design out as many pain points as possible for you, because you do so much for us keeping the ecosystem running.

Broad strokes shopping list (with pre-emptive apologies for any terseness, I’m low on sleep – please assume best intentions):

A route in for pre-formatted content that can be imported into Textpattern on a site install, schema tbd. The automated site build process would grab files from a repo [branch / subdirectory] and drop this content into the Textpattern ‘inbox’, then Textpattern builds articles, sections, tags, overall structure.
A way for repo content to be translated into this format, ideally automatically with a CI / CD pipeline on commits & PR merges. I suspect Pandoc might be our pal for this.

I guess in moving to Txp docs, we also lose community GitHub updates, unless we can find a way to sync any repo changes to Txp with some script/hook glue.

Not necessarily. We can steward changes in whatever Git platform we’re using (GitHub until that explodes), and those changes get imported into the docs site Textpattern instance. That site – like the themes and plugins sites – would be rebuilt on the regular.

Or…

The docs site can be more permanent, with a shell script to essentially splat all articles (and corresponding scaffold), reset the IDs, and reinstall from the ‘inbox’ material. That separates content from presentation, and is less sledgehammer-esque overall.

I’m more than happy to invest some time doing this if we can nail down exactly how we want it to work.

Thank you. I am confident having a CMS with an input hopper would be a selling point for the static site generator crowd looking for something a bit less complex.

Last edited by gaekwad (2025-08-13 21:28:56)

Bloke · 2025-08-13 11:36:59

If it’s just transmogrifying input to Txp, that’s easy™.

Comparatively.

I’ve imported WP to Txp with relative ease recently. But that was a database, so it’s more a case of field mapping than anything else.

With the docs as they are, we have two main issues:

It’s in Jekyll format which is a bastardised evil cousin of Markdown, and has a half-assed system of includes and such-like. It needs converting, because ..
… long-term, Jekyll is an awful markup language and I don’t think we want to keep docs in that format.

The pragmatist in me thinks that simply iterating over some sitemap and fetching each current live page as html, and running it through Pandoc to generate Textile would be the smart move. But that loses the notion of “common attributes” and the various other includes, essentially making everything “flat”. So we’d need to unpick that and rebuild the common elements by hand. Which, for 140-odd tag doc pages, is a bit of a ballache. And all this is a one-off development effort just to get the content into a format we can work with.
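
A rough sketch of that fetch-and-convert loop, in Python. It assumes the live site exposes a standard sitemap.xml and that Pandoc is installed on the build box; the function names and the flat `--` filename layout are invented for illustration. Pandoc does read HTML and write Textile, so the only real call out is `pandoc --from=html --to=textile`.

```python
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen

# Namespace used by the standard sitemap protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL listed in a standard sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def output_path(url: str, out_dir: Path) -> Path:
    """Map a page URL to a flat .textile filename (hypothetical layout)."""
    slug = urlparse(url).path.strip("/").replace("/", "--") or "index"
    return out_dir / f"{slug}.textile"


def convert_page(url: str, out_dir: Path) -> Path:
    """Fetch one live page and pipe its HTML through Pandoc to Textile."""
    html = urlopen(url).read()
    target = output_path(url, out_dir)
    subprocess.run(
        ["pandoc", "--from=html", "--to=textile", "--output", str(target)],
        input=html,
        check=True,  # stop the run if Pandoc rejects a page
    )
    return target
```

Calling `convert_page` for each URL from `sitemap_urls` would give one Textile file per live page, navigation and footer chrome included, so some trimming of the fetched HTML would likely be needed first.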

Assuming we can do that and Pandoc makes some decent conversion effort, the second leg is to create a new repo made from the Txp content in a flat-file layout so the (Textiled) content can be edited in git for version control.

And the final layer is some importer that then needs to suck that into Txp either on a schedule, or as a commit hook, and populate its database.

Our importer currently works on XML as its input, which is stupendously flexible but comes with the slight irritation that it’s not very human friendly (when editing in the repo), it’s super finicky and the tiniest foul in a commit renders a file useless, and whatever we use to spit out the flat file format needs to embed the Textiled content into a strict XML template format that we can import into Txp.
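
One way to take the sting out of the XML, sketched in Python under loud assumptions: people edit plain Textile files in the repo, and the XML is generated at build time so nobody hand-writes it. The importer's real schema isn't shown in this thread, so the `<article>`, `<title>` and `<body>` names below are placeholders, not the actual format.

```python
import xml.etree.ElementTree as ET


def article_xml(title: str, section: str, textile_body: str) -> str:
    """Wrap one Textile document in a made-up <article> envelope.

    ElementTree escapes &, < and > in text and attribute values,
    so hand-edited Textile cannot break the generated XML.
    """
    article = ET.Element("article", {"section": section})
    ET.SubElement(article, "title").text = title
    ET.SubElement(article, "body", {"format": "textile"}).text = textile_body
    return ET.tostring(article, encoding="unicode")


def is_well_formed(xml_text: str) -> bool:
    """Cheap pre-import check: does the file parse at all?"""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True
```

A check like `is_well_formed` could also run in CI for any XML that does live in the repo, so a stray character fails the commit rather than the import.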

With regards import, splat and repave is certainly simplest, for the sake of a few moments of downtime that we can mitigate with a Please Standby message in the templates during the rebuild. But that requires a schedule rather than a commit hook approach, as we can’t very well rebuild the entire database every time someone makes a commit. Something to consider at least.

Merging is harder as it needs a known “key” field (which we don’t have at the moment in Jekyll) so it knows whether the incoming data is update or insert. So if someone creates a new file in the repo – a new doc page in a given section – we need to consider how we’re going to inject the new ID back into the repo after Txp assigns the ID to the inserted record. That’s non-trivial.
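
A possible way round the write-back problem, offered as an assumption rather than anything the importer does today: use the repo-relative file path as the key and store it on the Txp side (in a custom field, say), so the auto-assigned ID never has to travel back into the repo. The Python below only sketches the insert / update / delete decision; `plan_merge` and its inputs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    inserts: list[str]       # keys present in the repo but not on the site
    updates: dict[str, int]  # key -> existing article ID to overwrite
    deletes: list[int]       # IDs whose source file has gone from the repo


def plan_merge(repo_keys: set[str], site_ids: dict[str, int]) -> Plan:
    """Decide insert/update/delete from a stable key such as the file path."""
    known = set(site_ids)
    return Plan(
        inserts=sorted(repo_keys - known),
        updates={key: site_ids[key] for key in sorted(repo_keys & known)},
        deletes=sorted(i for key, i in site_ids.items() if key not in repo_keys),
    )
```

The catch is that renaming a file looks like a delete plus an insert, so the article would get a fresh ID on rename.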

A complete content rebuild also doesn’t allow for inter-page links to be made easily, as the IDs are going to change each time the site content is rebuilt.

If we are clear and confine all content – even includes – to the Content area (e.g. includes are built from article snippets in a hidden section) then we don’t have the headache of template / Page management in the docs repo. That can be a separate repo (or branch?!) which we can use to rebuild the templates every so often.

Okay, I’ve upgraded my estimation. It’s not easy™. But it’s certainly not impossible.

Last edited by Bloke (2025-08-13 11:44:19)

Bloke · 2025-08-13 11:47:14

P.S. at the back of my mind is also to think about ways to keep this as generic as possible so the import mechanism side of things can be reused for other purposes (the static site example you cite is a good use case). That requires thinking about assets like images and files which we don’t use (much?) in the docs.

Last edited by Bloke (2025-08-13 11:47:56)

gaekwad · 2025-08-13 11:47:30

Bloke wrote #340145:

With the docs as they are, we have two main issues:

It’s in Jekyll format which is a bastardised evil cousin of Markdown, and has a half-assed system of includes and such-like. It needs converting, because ..

… long-term, Jekyll is an awful markup language and I don’t think we want to keep docs in that format.

+1 on both.

But that loses the notion of “common attributes” and the various other includes, essentially making everything “flat”. So we’d need to unpick that and rebuild the common elements by hand. Which, for 140-odd tag doc pages, is a bit of a ballache. And all this is a one-off development effort just to get the content into a format we can work with.

I volunteer as tribute for the ball ache. Not a sentence I was expecting today…

Easily done as a one-off process, as you say, and worth the time & energy. Besides, that’s what music and caffeine’s for.

Assuming we can do that and Pandoc makes some decent conversion effort, the second leg is to create a new repo made from the Txp content in a flat-file layout so the (Textiled) content can be edited in git for version control.

+1. Or even a subdirectory for the existing docs repo, it’s trivial to lift & shift a dir during the site build preflight, plus we’d have the commit history for the old-old docs should we need it.

And the final layer is some importer that then needs to suck that into Txp either on a schedule or as a commit hook and populate its database. Our importer currently works on XML as its input, which is stupendously flexible but comes with the slight irritation that it’s not very human friendly (when editing in the repo), it’s super finicky and the tiniest foul in a commit renders a file useless, and whatever we use to spit out the flat file format needs to embed the Textiled content into a strict XML template format that we can import into Txp.

This could be learning-by-doing territory. I don’t know if Pandoc does XSLT in any sensible fashion, but it feels like there’s a route in by some means.

With regards import, splat and repave is certainly simplest, for the sake of a few moments of downtime that we can mitigate with a Please Standby message in the templates during the rebuild. But that requires a schedule rather than a commit hook approach, as we can’t very well rebuild the entire database every time someone makes a commit.

Plenty of headroom on a server for an every-15-mins rebuild; the whole process is already documented and it’s working well elsewhere. We’re not overwhelmed with docs commits, so we could even make it hourly.

Merging is harder as it needs a known “key” field (which we don’t have at the moment in Jekyll) so it knows whether the incoming data is update or insert.

This is presumably avoided by nuke-and-repave? If so, let’s nuke-and-repave.

If we are clear and confine all content – even includes – to the Content area (e.g. includes are built from articles in a hidden section) then we don’t have the headache of template management. That can be a separate repo which we can use to rebuild the templates every so often.

See above re: subdirectory rather than own repo (purely for less to maintain) – viable?

Okay, I’ve upgraded my estimation. It’s not easy™. But it’s certainly not impossible.

Thanks, Bloke. I really appreciate this.

Bloke · 2025-08-13 11:52:08

gaekwad wrote #340147:

I volunteer as tribute for the ball ache.

Saint.

Or even a subdirectory for the existing docs repo

That could work.

This could be learning-by-doing territory. I don’t know if Pandoc does XSLT in any sensible fashion, but it feels like there’s a route in by some means.

Yes. Last time I used it I was impressed with its results.

[ID issue] is presumably avoided by nuke-and-repave? If so, let’s nuke-and-repave.

Yes, it’s a non-issue if we rebuild every time.
