Textpattern CMS support forum

#1 2020-01-17 10:19:12

gaekwad
Server grease monkey
From: People's Republic of Cornwall
Registered: 2005-11-19
Posts: 4,137
GitHub

Sites outage, January 17th 2020

Overview
Textpattern sites were offline from 0630 to 1015 UTC today. They are now back online.

I am investigating exactly when they went down, and why. I will update this thread as I find out more.

Last edited by gaekwad (2020-01-17 10:40:22)

#2 2020-01-17 10:46:53

gaekwad
Server grease monkey
From: People's Republic of Cornwall
Registered: 2005-11-19
Posts: 4,137
GitHub

Re: Sites outage, January 17th 2020

When I checked the PHP 7.3 error log, it was a zero-byte file. The php.ini setting that points PHP at a custom error log was set incorrectly, and as a result logging was turned off. This has been fixed.
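For anyone who wants to catch this kind of misconfiguration earlier, here is a minimal Python sketch that reads php.ini text and flags the two directives involved, log_errors and error_log. It is an illustration, not the check used on the Textpattern servers. It assumes simple "key = value" lines with no inline comments.

```python
# Minimal sketch: flag php.ini settings that would stop PHP errors reaching a log file.
# Assumes simple "key = value" lines, ";" comment lines, and no inline comments.
from __future__ import annotations


def parse_php_ini(text: str) -> dict[str, str]:
    """Return the last value seen for each directive (later lines override earlier ones)."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comment lines and [section] headers.
        if not line or line.startswith(";") or line.startswith("["):
            continue
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def logging_problems(settings: dict[str, str]) -> list[str]:
    """List reasons why PHP errors might not reach a custom error log."""
    problems = []
    if settings.get("log_errors", "").lower() not in ("on", "1", "true", "yes"):
        problems.append("log_errors is not enabled")
    if not settings.get("error_log"):
        problems.append("error_log is not set")
    return problems
```

For example, pass it the contents of the PHP-FPM php.ini (on Debian-style systems often /etc/php/7.3/fpm/php.ini, but check yours); an empty list means both directives look sane.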

On some other servers I maintain, I have implemented a scheduled Nginx + PHP service restart for when something is not working correctly (i.e. if you’re seeing a 502 error, bounce the services and alert me that the site is down). This works very well elsewhere, and before our recent outages I was intending to implement it in spring as part of the 2020 server build-out.

Given two outages within a week, I’ll bring this forward and start work on it tonight after work, which should give us much more automated resiliency.

Edits: words and numbers.

Last edited by gaekwad (2020-01-17 18:24:54)

#3 2020-01-17 11:36:27

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 9,011
Website GitHub Mastodon Twitter

Re: Sites outage, January 17th 2020

Hi pete,

I did experience the outage this morning. I meant to write, but I had one meeting after the next. I’m glad that you are on top of it, and thankful for all the work you are doing.


Yiannis
——————————
NeMe | hblack.art | EMAP | A Sea change | Toolkit of Care
I do my best editing after I click on the submit button.

#4 2020-01-17 12:35:17

jakob
Admin
From: Germany
Registered: 2005-01-20
Posts: 4,595
Website

Re: Sites outage, January 17th 2020

gaekwad wrote #321167:

On some other servers I maintain, I have implemented a scheduled Nginx + PHP service restart if something is not working correctly (i.e if you’re seeing the 503 error, bounce the services and alert me that it’s down).

Just FYI, the error I was getting was 502: Bad Gateway.

colak wrote #321172:

I’m glad that you are on top of it, and thankful for all the work you are doing.

I second that 👍


TXP Builders – finely-crafted code, design and txp

#5 2020-01-17 18:23:50

gaekwad
Server grease monkey
From: People's Republic of Cornwall
Registered: 2005-11-19
Posts: 4,137
GitHub

Re: Sites outage, January 17th 2020

jakob wrote #321176:

Just FYI, the error I was getting was 502: Bad Gateway.

I meant 502, typo on my part – I was rushing to get it all back online.

Anything in the 50x range usually means PHP is misbehaving with Nginx, or vice versa. Now that PHP logging is working properly (ahem), I should have a better idea of what’s happening, so I can diagnose it properly next time.
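To make the 50x codes a little less opaque, here is a small Python lookup of their most common causes when PHP runs behind Nginx via FastCGI (PHP-FPM, for example). It is a rough guide under that assumption, not a diagnosis of this outage.

```python
# Rough guide: common causes of Nginx 5xx responses when PHP sits behind Nginx via FastCGI.
LIKELY_CAUSES = {
    502: "Bad Gateway: Nginx got no valid response from PHP (PHP down, socket refused, worker crashed)",
    503: "Service Unavailable: often Nginx rate or connection limiting (limit_req/limit_conn default to 503)",
    504: "Gateway Timeout: PHP did not answer within Nginx's FastCGI read timeout",
}


def likely_cause(status: int) -> str:
    """Return a short hint for a status code; other 5xx codes get a generic hint."""
    if status in LIKELY_CAUSES:
        return LIKELY_CAUSES[status]
    if 500 <= status <= 599:
        return "server error: check both the Nginx error log and the PHP error log"
    return "not a server error"
```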

#6 2020-01-17 20:45:02

gaekwad
Server grease monkey
From: People's Republic of Cornwall
Registered: 2005-11-19
Posts: 4,137
GitHub

Re: Sites outage, January 17th 2020

OK, I’ve installed the restart-if-down scripts. This should cut unexpected downtime to a few minutes in most cases.

For anyone interested: a cron task runs every 5 minutes and uses curl to check the headers of a loopback-only website for a status report. If the site is up (200 OK), it does nothing. If the response is anything else, it gracefully restarts the web server and PHP processes.
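The cron job described above could look roughly like this Python sketch. It is not the actual script: the loopback URL (http://127.0.0.1/status) and the systemd unit names (php7.3-fpm, nginx) are hypothetical. Like curl -I, it sends a HEAD request and treats anything other than 200 OK, including no response at all, as down.

```python
# Sketch of a restart-if-down health check, meant to be run by cron every 5 minutes.
# Hypothetical assumptions: the loopback-only status URL and the systemd unit names.
from __future__ import annotations

import http.client
import subprocess
import sys
import urllib.error
import urllib.request

STATUS_URL = "http://127.0.0.1/status"  # hypothetical loopback-only status site
SERVICES = ["php7.3-fpm", "nginx"]      # hypothetical unit names; PHP first, then Nginx


def fetch_status(url: str, timeout: float = 10.0) -> int | None:
    """Send a HEAD request (like curl -I) and return the HTTP status, or None."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered with an error status, e.g. 502
    except (OSError, http.client.HTTPException):
        return None  # refused, timed out, or a garbled response: treat as down


def needs_restart(status: int | None) -> bool:
    """Only 200 OK counts as healthy; anything else triggers a restart."""
    return status != 200


def restart_services(services: list[str]) -> None:
    """Restart each unit in order; raises CalledProcessError if systemctl fails."""
    for name in services:
        subprocess.run(["systemctl", "restart", name], check=True)


def main() -> int:
    """Check once; print an alert line and restart if the site is not 200 OK."""
    status = fetch_status(STATUS_URL)
    if not needs_restart(status):
        return 0
    # Cron usually mails any output to the crontab owner, which serves as the alert.
    print(f"health check got {status!r}; restarting {', '.join(SERVICES)}", file=sys.stderr)
    restart_services(SERVICES)
    return 1
```

To run it every 5 minutes, call main() from a small wrapper scheduled with a */5 * * * * crontab entry. The post describes the restart as graceful; where a unit supports it, systemctl reload is the gentler alternative to restart, though it may not clear a hung PHP process.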

#7 2020-01-17 21:29:29

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,270
Website GitHub

Re: Sites outage, January 17th 2020

Clever.


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

#8 2020-01-17 21:45:34

gaekwad
Server grease monkey
From: People's Republic of Cornwall
Registered: 2005-11-19
Posts: 4,137
GitHub

Re: Sites outage, January 17th 2020

Bloke wrote #321202:

Clever.

It’s one of a few things I thought up that I’m really quite pleased with. Took me a while to iron out the kinks, and I’m sure I can make it more efficient, but it seems to work.

#9 2020-01-17 21:48:00

gaekwad
Server grease monkey
From: People's Republic of Cornwall
Registered: 2005-11-19
Posts: 4,137
GitHub

Re: Sites outage, January 17th 2020

Aside: my username at gmail.com is open for reports if things are broken. It’s a different address from my usual forum email, but it’s easier to remember. Assume I don’t know that something is broken; a notification/alert email is always welcome.

#10 2020-01-21 17:01:08

gaekwad
Server grease monkey
From: People's Republic of Cornwall
Registered: 2005-11-19
Posts: 4,137
GitHub

Re: Sites outage, January 17th 2020

I’ve boosted the resources allocated to PHP on our sites, which should further reduce the possibility of outages.
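The post doesn't say which settings were changed. If PHP runs under PHP-FPM (an assumption), the usual knob is pm.max_children, and a common rule of thumb sizes it as the memory set aside for PHP divided by the average worker's memory use. A minimal Python sketch of that arithmetic:

```python
# Rule-of-thumb sizing for PHP-FPM's pm.max_children (an illustration, not this server's config).
def suggest_max_children(ram_for_php_mb: float, avg_worker_mb: float) -> int:
    """How many average-sized PHP-FPM workers fit in the memory reserved for PHP."""
    if ram_for_php_mb <= 0 or avg_worker_mb <= 0:
        raise ValueError("memory figures must be positive")
    # Round down so the workers never exceed the budget; always allow at least one.
    return max(1, int(ram_for_php_mb // avg_worker_mb))
```

For example, 1024 MB reserved for PHP with workers averaging 40 MB gives 25. These figures are illustrative only.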

Please let me know if things are exploding in flames and screaming (or, less dramatically, not working as you expect).
