Sorry for the downtime! Unfortunately our secondary firewall took over for some reason, and haproxy failed to properly come up.
I’ll be scheduling a maintenance window in the next few days to do some further digging, so I can make sure this is fully resolved.
As Canadian as it gets :) Apologizing for the free service you offer.
Don’t sweat it, everyone here appreciates the effort put in to run this, even if they don’t reach out and express it.
I abhor the fact that status.lemmy.ca says anything other than 100% uptime.
During your next maintenance put in a second, secret uptime counter and once it hits a reasonable amount of time swap them, nobody will notice!
Uptime counter failover, indistinguishable from a clock.
The high work ethic is plain to see. Nice debug and recovery.
Even hyperscaler cloud service providers don’t aim for 100%, don’t sweat it
I had the shakes, deetees, drools and bends for a while.
This is the only instance I’m not banned on.
.
.
.
./s
Oh hold on, let me go fix that…
Wearing my Ops hat (it’s what I do for a living, can’t help it), I have a few questions, and I’m more than happy to assist depending on the answers.
- What TZ are you in?
- Are you open to volunteer support staff in other TZ?
- What monitoring solutions are you using? Nagios, Zabbix, etc.
- PagerDuty has a free tier. Do you have alerting and escalation set up?
- Do you have runbooks for the associated issues that come up?
- Could access be limited for a volunteer support staff?
I woke up just before 7am EDT and noticed the status page references PDT, so I can only assume you’re based out of BC. Given that Canada spans so many TZs, it may be worthwhile investigating a more mature support model to give you some breathing room?
- Myself, Otter and the server are all in Vancouver. Smorks is out east and mp3 is in the middle of all of us. Between us we actually have pretty good coverage most days.
- Yes, if someone has professional SRE experience and wants to help out I’m open to it. There’s very little on-going maintenance as things just run smoothly 99.999% of the time, but some additional eyes can’t hurt and if someone wants to build new stuff there’s things I’d like to do =)
- Betterstack at the moment, combined with some custom health check scripts which write to our Discord, and an improperly configured alert config on my phone that didn’t wake me up.
- Oh neat, I didn’t realize pagerduty had a free tier, I’ll check it out.
- Nope. There haven’t been issues that come up. Except for this one stupid opnsense issue, everything has been amazingly stable.
- To some degree yes.
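
For anyone curious what a health check script that writes to Discord could look like, here is a rough sketch. It is not the actual lemmy.ca script: the URLs, function names and message text are all made up for illustration, and it only assumes the standard Discord webhook behaviour of accepting a JSON body with a `content` field.

```python
import json
import urllib.request

# Hypothetical placeholders, not lemmy.ca's real endpoints.
CHECK_URL = "https://example.invalid/health"
WEBHOOK_URL = "https://discord.com/api/webhooks/<id>/<token>"


def is_healthy(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with opener(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def build_alert(url):
    """Build the JSON body for a Discord webhook message."""
    return json.dumps({"content": f"Health check failed for {url}"}).encode("utf-8")


def send_alert(webhook_url, body, opener=urllib.request.urlopen):
    """POST the alert body to the webhook and return the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # An explicit User-Agent; the name is an arbitrary assumption.
            "User-Agent": "healthcheck-sketch/0.1",
        },
        method="POST",
    )
    with opener(req, timeout=5.0) as resp:
        return resp.status


def run_check(check_url=CHECK_URL, webhook_url=WEBHOOK_URL,
              opener=urllib.request.urlopen):
    """Check once; alert Discord on failure. Returns True when healthy."""
    if is_healthy(check_url, opener=opener):
        return True
    send_alert(webhook_url, build_alert(check_url), opener=opener)
    return False
```

Calling `run_check()` from cron or a systemd timer every minute or so would be one way to use it. The `opener` parameter is only there so the logic can be exercised without touching the network.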
Not admin team, but I think I remember them saying the servers are in Vancouver.
I’m personally not willing to be on call but I was up around the same time and willing to help if they need it. I’ve been a Linux Admin for 13+ years and have my own k8s cluster in the basement.
I also don’t mind the downtime. Thank you guys for everything you’re doing!
Thank you for keeping the lights on!
All good. Made me fall asleep easier last night as I didn’t doom scroll. Haha
wb
You need that SRE team you said you don’t have. :)
Glad to see you back, posted about it on !fediverse@lemmy.zip https://lemmy.dbzer0.com/post/39687739?scrollToComments=true
Love you. Mwaaaaah! 😘
more like
nshaproxy