Textpattern CMS support forum

#1 2019-10-03 12:27:35

gaekwad
Server grease monkey
From: People's Republic of Cornwall
Registered: 2005-11-19
Posts: 4,220
GitHub

Request for comment: (re)submit article to Wayback Machine after Save

I’ve gone back and forth over this for years, and since I am not a coder it’s been languishing in my “To Do One Day When I Can Figure It Out” work box. I’ve wiped some dust off that folder to present the following for discussion and gauge interest. Life’s too short to sit on this any longer.

Heavy caveat: I really don’t make good use of Textpattern plugins. I’ve been burned a few times by plugins being abandoned by former Textpattern users, so I tread extremely carefully when even talking about plugins.

Scenario: a public website running on Textpattern, articles are fully public, no restrictions on access (i.e. no password protection, articles are discoverable via links, article lists link to articles, syndication works as intended, nothing preventing good bots accessing the site etc). At the moment, if an article is published on this site, the Internet Archive Wayback Machine may eventually find it, index it, and after some time it will become available for viewing in their mirror. Articles that are around for a while will get found, and time really helps: generally, the longer a URL is available, the better the chance of it being indexed. Please note: I’m excluding very busy sites from this broad-strokes generalisation as they are indexed far more often than lesser-known sites. There are ways to manually submit a link to the Wayback Machine, should it a) not exist in their database, or b) need updating to a later revision.

Example: I will pick an arbitrary forum discussion…let’s use forum.textpattern.com/viewtopic.php?id=50378 for this. Right now, this URL is not in the Wayback Machine index, according to that URL’s status page:

https://web.archive.org/web/*/https://forum.textpattern.com/viewtopic.php?id=50378

That link tells me I can use a special URL to submit that link to the indexer:

https://web.archive.org/save/https://forum.textpattern.com/viewtopic.php?id=50378

…so, https://web.archive.org/save/ with an apparently unescaped URL appended. I can curl that special URL, and that adds the linked forum article to the Wayback Machine database. That was easy. The full curl output is linked here in a safe-for-work gist, but the notable part (trimmed for brevity and clarity) is right at the bottom:

<!--
     FILE ARCHIVED ON 12:07:24 Oct 3, 2019 AND RETRIEVED FROM THE
     INTERNET ARCHIVE ON 12:07:25 Oct 3, 2019.
     JAVASCRIPT APPENDED BY WAYBACK MACHINE, COPYRIGHT INTERNET ARCHIVE.

     ALL OTHER CONTENT MAY ALSO BE PROTECTED BY COPYRIGHT (17 U.S.C.
     SECTION 108(a)(3)).
-->

I literally just added a page to the archive. Neat, huh? At the time of writing, it hasn’t immediately updated the original index page, but that may just take more time. For Textpattern users with immutable content, it’s a simple way to have a backup taken at a given point in time, and for authors who have content that’s fluid (or: immutable content until it’s actually not immutable any longer, thank you very much), there are timestamped versions available.
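For anyone who would rather script this than type the curl command, here is a minimal Python sketch of the same idea. The two URL prefixes are the ones quoted above; the function names, the User-Agent string and the timeout are my own assumptions for illustration, not anything the Wayback Machine documents or requires.

```python
from urllib.request import Request, urlopen

# Prefixes taken from the URLs quoted in this post.
WAYBACK_SAVE = "https://web.archive.org/save/"
WAYBACK_INDEX = "https://web.archive.org/web/*/"


def save_url(page_url: str) -> str:
    """Build the 'save this page' URL: the prefix with the page URL appended unescaped."""
    return WAYBACK_SAVE + page_url


def index_url(page_url: str) -> str:
    """Build the status/index URL that lists known snapshots of the page."""
    return WAYBACK_INDEX + page_url


def submit(page_url: str, timeout: float = 30.0) -> int:
    """Request the save URL and return the HTTP status code.

    The User-Agent value is a hypothetical label for this sketch.
    urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    so callers may want to catch that.
    """
    request = Request(save_url(page_url), headers={"User-Agent": "wayback-submit-sketch"})
    with urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    topic = "https://forum.textpattern.com/viewtopic.php?id=50378"
    print(save_url(topic))
    print(index_url(topic))
```

Running the file only prints the two URLs; calling submit() is what actually hits the network, the same way the curl command does.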

So, taking this a step further: if the Save button can be armed to notify the Wayback Machine of an article publication or update, there’s effectively a backup copy ready in the wings. If an article gets popular and appears on whatever social networks drive lots of traffic these days, the site will get busy and may buckle under the number of connections (you see this a lot with WordPress, by virtue of it being so widespread – a previously OK WordPress site will spit out an “Error establishing a database connection” message when the database flips out), but at least another copy exists on better infrastructure.

If this functionality were to materialise, whether in a plugin or core, it would be very useful. A global on / off setting and individual article on / off override would be great. This seems to work for comments just fine.

Over to you. What do you think?

#2 2019-10-03 12:47:13

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,419
Website GitHub

Re: Request for comment: (re)submit article to Wayback Machine after Save

Publishing stuff to the archive is a neat idea. It’s a cinch to do from the admin side with a few-line plugin. Assumptions:

  • You only push it once, not automatically each time you save/update – or have it as some ‘opt-in’ additional save parameter like an Archive checkbox alongside the Save button.
  • You only push it when the Status is Live/Sticky.
  • The URL isn’t subsequently modified by hand.
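To make those assumptions concrete, here is a rough sketch of the decision as a single check, written in Python purely for illustration (a real Textpattern plugin would be PHP). The Article fields, the status names as strings and the should_archive function are all hypothetical; none of them are Textpattern's actual data model or API.

```python
from dataclasses import dataclass

# Statuses treated as public for this sketch (names assumed, not Textpattern's internal codes).
PUBLIC_STATUSES = {"live", "sticky"}


@dataclass
class Article:
    url: str
    status: str
    archive_requested: bool         # hypothetical 'Archive' checkbox next to Save
    already_archived: bool = False  # hypothetical flag set after the first push


def should_archive(article: Article) -> bool:
    """Push only when opted in, not pushed before, and the article is Live or Sticky."""
    return (
        article.archive_requested
        and not article.already_archived
        and article.status.lower() in PUBLIC_STATUSES
    )
```

The third assumption (the URL is not changed by hand afterwards) is not something code like this can check at save time, so it stays a convention.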

Alternatively (or additionally as part of the same plugin) it would be just as easy to add a public hover button so that logged-in users could browse the site and choose to snapshot articles at will on a case-by-case basis. Think in terms of the way rss_article_edit (if you’ve ever used that) allows you to jump to the Write panel from the public site to edit the article you’re viewing if you spot a typo or something.

If we can nail down a suitable workflow, I can see this being useful and fairly easy to implement.


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

#3 2019-10-03 12:50:02

etc
Developer
Registered: 2010-11-11
Posts: 5,134
Website GitHub

Re: Request for comment: (re)submit article to Wayback Machine after Save

Hi Pete.

I had a plugin (etc_ping?) that was able to ping user-definable URLs on article post/save. It could work as you describe, but has no individual on/off switch.

#4 2019-10-03 13:06:24

gaekwad
Server grease monkey
From: People's Republic of Cornwall
Registered: 2005-11-19
Posts: 4,220
GitHub

Re: Request for comment: (re)submit article to Wayback Machine after Save

Bloke wrote #319514:

Assumptions:

  • You only push it once, not automatically each time you save/update – or have it as some ‘opt-in’ additional save parameter like an Archive checkbox alongside the Save button.

Assuming the Archive checkbox is checked to submit the initial publication on Save, would it be unchecked automatically after this first ping? An external link (with adornment saying as much) to the Wayback Machine page for a given article might be useful, though there’s latency with the initial submission so that might cause confusion if it’s not explained clearly.

  • You only push it when the Status is Live/Sticky.

Yes, +1.

  • The URL isn’t subsequently modified by hand.

Caveat emptor, I think. The article is submitted with its URL at that point in time, so any subsequent change to it is on the author. If the author factors in redirects from the old article to the new one, the WB will follow them and it may sort itself out.

Alternatively (or additionally as part of the same plugin) it would be just as easy to add a public hover button so that logged-in users could browse the site and choose to snapshot articles at will on a case-by-case basis. Think in terms of the way rss_article_edit (if you’ve ever used that) allows you to jump to the Write panel from the public site to edit the article you’re viewing if you spot a typo or something.

Yes, I can think of a few use cases where this might be useful. Batch submit from the articles list, also — perhaps from the dropdown menu. Should consider a rate limit with that, if so.
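A rough idea of what a rate-limited batch submit could look like, sketched in Python. The five-second gap is an arbitrary placeholder rather than a limit published by the Internet Archive, and submit_batch plus its parameters are hypothetical names for illustration.

```python
import time
from typing import Callable, Iterable, List, Tuple


def submit_batch(
    urls: Iterable[str],
    submit: Callable[[str], int],
    min_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, int]]:
    """Submit each URL in turn, pausing min_interval seconds between requests.

    'submit' is whatever function performs one submission and returns a
    status code; 'sleep' can be swapped out so the pacing is testable.
    """
    results: List[Tuple[str, int]] = []
    for position, url in enumerate(urls):
        if position > 0:
            sleep(min_interval)  # wait before every request except the first
        results.append((url, submit(url)))
    return results
```

A fixed pause is the simplest possible limiter; anything smarter (back-off on errors, a daily cap) would build on the same loop.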

If we can nail down a suitable workflow, I can see this being useful and fairly easy to implement.

Excellent, thank you very much!

#5 2019-10-03 13:08:21

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,419
Website GitHub

Re: Request for comment: (re)submit article to Wayback Machine after Save

I think a general-purpose ‘ping the world’ plugin would have benefit. Think about this:

  1. A pref (textarea or series of configurable text boxes) that allows you to specify URL endpoints – one per line – in which you could inject replacement content from the article, like {URL} or {Title}.
  2. When the Save button is hit (or ping box is checked or whatever triggers the plugin) and status is first set Live, the plugin goes through the pref line by line, injecting the relevant content for each one and firing off a curl() request, collecting the response codes and reporting them.
  3. Pre-configured examples for the Wayback machine and any others we might like to bundle up that could be enabled and tweaked.
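The three steps above can be sketched roughly like this. It is Python for illustration only; a real plugin would be PHP and would hook into Textpattern's save event. The function names, the dictionary keys, the example.com endpoint and the choice to percent-encode {Title} while leaving {URL} raw (since the Wayback save URL appears to take it unescaped) are all assumptions.

```python
from typing import Callable, Dict, List, Tuple
from urllib.parse import quote


def expand_endpoints(pref_text: str, article: Dict[str, str]) -> List[str]:
    """Turn the pref (one endpoint template per line) into concrete URLs."""
    endpoints = []
    for line in pref_text.splitlines():
        template = line.strip()
        if not template:
            continue  # ignore blank lines in the pref
        url = template.replace("{URL}", article["url"])
        url = url.replace("{Title}", quote(article["title"], safe=""))
        endpoints.append(url)
    return endpoints


def ping_all(endpoints: List[str], fetch: Callable[[str], int]) -> List[Tuple[str, int]]:
    """Request each endpoint and collect (endpoint, response code) pairs for reporting."""
    return [(endpoint, fetch(endpoint)) for endpoint in endpoints]


if __name__ == "__main__":
    pref = "https://web.archive.org/save/{URL}\nhttps://example.com/notify?t={Title}\n"
    article = {"url": "https://example.org/articles/1/hello", "title": "Hello World"}
    for endpoint in expand_endpoints(pref, article):
        print(endpoint)
```

Passing the fetch function in keeps the sketch independent of any particular HTTP client, and makes it easy to dry-run the pref without contacting anything.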

As well as pinging the Wayback machine, people could use it for workflows like the way postmaster can add stuff to MailChimp via its API. Or dispatch a post title/link to social media via their API. If all the Auth stuff (access token/API key) can be sent in the request, that might be possible. If the endpoints were somehow linked to status, and someone set up a listening service/API, they could send out notifications to editors that Draft content is waiting approval with a link to the article and its title.

Thinking out loud really. But ‘notify-on-publish’ seems quite a useful facility for a host of applications.

Last edited by Bloke (2019-10-03 13:11:20)


#6 2019-10-03 13:08:40

gaekwad
Server grease monkey
From: People's Republic of Cornwall
Registered: 2005-11-19
Posts: 4,220
GitHub

Re: Request for comment: (re)submit article to Wayback Machine after Save

etc wrote #319515:

I had a plugin (etc_ping?) that was able to ping user-definable URLs on article post/save. It could work as you describe, but has no individual on/off switch.

I’ll have a look, thank you! I’ve literally avoided plugins from anyone not on the core dev team, which I know makes me a bit of a strange use case, but I don’t have the skills to create my own implementations of things I want / need.

#7 2019-10-03 13:12:32

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,419
Website GitHub

Re: Request for comment: (re)submit article to Wayback Machine after Save

gaekwad wrote #319519:

I’ve literally avoided plugins from anyone not on the core dev team

Hehe, there is only one plugin you need, ever: etc_query. It can do everything!


#8 2019-10-03 13:12:44

gaekwad
Server grease monkey
From: People's Republic of Cornwall
Registered: 2005-11-19
Posts: 4,220
GitHub

Re: Request for comment: (re)submit article to Wayback Machine after Save

Bloke wrote #319518:

I think a general-purpose ‘ping the world’ plugin would have benefit. Think about this:

  1. A pref (textarea or series of configurable text boxes) that allows you to specify URL endpoints – one per line – in which you could inject replacement content from the article, like {URL} or {Title}.
  2. When the Save button is hit (or ping box is checked or whatever triggers the plugin) and status is first set Live, the plugin goes through the pref line by line, injecting the relevant content for each one and firing off a curl() request, collecting the response codes and reporting them.

As well as pinging the Wayback machine, people could use it for workflows like the way postmaster can add stuff to MailChimp via its API. Or dispatch a post title/link to social media via their API. If all the Auth stuff (access token/API key) can be sent in the request, that might be possible. If the endpoints were somehow linked to status, and someone set up a listening service/API, they could send out notifications to editors that Draft content is waiting approval with a link to the article and its title.

I really like this. Could anything be cribbed from the spam blocklist functionality? Whether or not there’s scope for a webhook to facilitate two-way comms…dunno, possibly too far a reach right now.
