Omega Owners Forum

Author Topic: OOF Outage  (Read 4224 times)

TheBoy

  • Administrator
  • *****
  • Offline Offline
  • Gender: Male
  • Brackley, Northants
  • Posts: 105847
  • I Like Lockdown
    • Whatever Starts
    • View Profile
OOF Outage
« on: 08 October 2017, 12:35:38 »

Unfortunately we suffered a bizarre incident last night that caused everything here to fail...  ...including the alerting system, which meant it went undetected until early this morning.

Clearly we are back up and running, but we will have to go down again for an hour or so later to resync the databases - not quite sure when, as I still have a lot to do getting all the infrastructure up and running.


It looks like we had a near-simultaneous failure of 2 of our 3 storage solutions, one being a hyperconverged, high-availability one, and the other being a standalone SAN solution. I'm trying to work back through the logs to see if it genuinely was 3 simultaneous storage server failures (unlikely), 2 simultaneous hypervisor failures (unlikely), or something common that I haven't yet found (likely).


Sorry, as always, for the outage, and the fact you've probably had to talk to wives/partners or go down the pub!
Logged
Grumpy old man

Bigron

  • Omega Baron
  • *****
  • Offline Offline
  • Gender: Male
  • Witham, Essex
  • Posts: 4808
    • Omega 2.6 V6 Auto '51 Reg
    • View Profile
Re: OOF Outage
« Reply #1 on: 08 October 2017, 12:56:13 »

I missed you, obviously, but considering how long the Forum has been online and how few failures there have been, you do a great job. Thanks.  :y 8)

Ron.
Logged

STEMO

  • Guest
Re: OOF Outage
« Reply #2 on: 08 October 2017, 13:17:25 »

You shouldn’t really be hyperconverging on a Sunday, Jaime, try sitting in the bath till you go all wrinkly, much more relaxing.  ;D
Logged

b4ndit

  • Omega Knight
  • *****
  • Offline Offline
  • Gender: Male
  • chester
  • Posts: 1827
  • Retired
    • VW Phaeton
    • View Profile
Re: OOF Outage
« Reply #3 on: 08 October 2017, 13:24:53 »

I missed you, obviously, but considering how long the Forum has been online and how few failures there have been, you do a great job. Thanks.  :y 8)

Ron.
I agree, sterling job :y
Logged

Rods2

  • Omega Lord
  • *****
  • Offline Offline
  • Gender: Male
  • Sandhurst Berkshire
  • Posts: 7604
    • 1999 3.0 Elite Estate
    • View Profile
Re: OOF Outage
« Reply #4 on: 08 October 2017, 14:32:48 »

Well done for getting it sorted, :y :y :y not the sort of stress you need, especially at the weekend. :( :( :(
Logged
US Fracking and Saudi Arabia defending its market share = The good news of an oil glut, lower and lower prices for us and squeaky bum time for Putin!

VXL V6

  • Omega Lord
  • *****
  • Offline Offline
  • Gender: Male
  • Solihull
  • Posts: 9810
    • 530D M Sport, Elite 3.2
    • View Profile
Re: OOF Outage
« Reply #5 on: 08 October 2017, 17:36:05 »

Nice one  :y
Logged

Lizzie Zoom

  • Omega Lord
  • *****
  • Offline Offline
  • Gender: Female
  • South
  • Posts: 7370
    • Omega 3.2 V6 ELITE 2003
    • View Profile
Re: OOF Outage
« Reply #6 on: 08 October 2017, 18:15:14 »

We owe you again TB! :-* :-* :y :y
Logged

TheBoy

  • Administrator
  • *****
  • Offline Offline
  • Gender: Male
  • Brackley, Northants
  • Posts: 105847
  • I Like Lockdown
    • Whatever Starts
    • View Profile
Re: OOF Outage
« Reply #7 on: 08 October 2017, 18:21:25 »

The outage to resync the databases will likely be early tomorrow morning now, which hopefully will minimise inconvenience :)
Logged
Grumpy old man

BazaJT

  • Omega Lord
  • *****
  • Offline Offline
  • Gender: Male
  • SLady bitshorpe N.Lincs.
  • Posts: 9086
    • Omega 3 litre Elite
    • View Profile
Re: OOF Outage
« Reply #8 on: 08 October 2017, 18:42:20 »

I wondered where it'd gone. Good job someone knows what they're doing :y Don't know how you do it for the money :D ;D
Logged

Rods2

  • Omega Lord
  • *****
  • Offline Offline
  • Gender: Male
  • Sandhurst Berkshire
  • Posts: 7604
    • 1999 3.0 Elite Estate
    • View Profile
Re: OOF Outage
« Reply #9 on: 08 October 2017, 18:58:26 »

I wondered where it'd gone. Good job someone knows what they're doing :y Don't know how you do it for the money :D ;D

I think it's a labour of love, sweat and tears. ::) ::) ::)
Logged
US Fracking and Saudi Arabia defending its market share = The good news of an oil glut, lower and lower prices for us and squeaky bum time for Putin!

BazaJT

  • Omega Lord
  • *****
  • Offline Offline
  • Gender: Male
  • SLady bitshorpe N.Lincs.
  • Posts: 9086
    • Omega 3 litre Elite
    • View Profile
Re: OOF Outage
« Reply #10 on: 08 October 2017, 19:02:01 »

Mind you it could be TB's cull list that's overloading the system in the first place ;D
Logged

Shackeng

  • Omega Lord
  • *****
  • Offline Offline
  • Gender: Male
  • Ramsbury
  • Posts: 7762
    • 3.2 Elite 2.0 TitX Mondeo
    • View Profile
Re: OOF Outage
« Reply #11 on: 08 October 2017, 19:32:02 »

As soon as it went off line, I said to myself, I bet that's the hyperconvergence again. Great to be proved right. ::) ::) ::)
Logged

Rods2

  • Omega Lord
  • *****
  • Offline Offline
  • Gender: Male
  • Sandhurst Berkshire
  • Posts: 7604
    • 1999 3.0 Elite Estate
    • View Profile
Re: OOF Outage
« Reply #12 on: 08 October 2017, 21:42:47 »

As soon as it went off line, I said to myself, I bet that's the hyperconvergence again. Great to be proved right. ::) ::) ::)

It was down last night when I got in just before 11pm and the same this morning at about 8:30am and the thought went through my mind, does he know it's down and then gas bottles and garages. :o :o :o
Logged
US Fracking and Saudi Arabia defending its market share = The good news of an oil glut, lower and lower prices for us and squeaky bum time for Putin!

TheBoy

  • Administrator
  • *****
  • Offline Offline
  • Gender: Male
  • Brackley, Northants
  • Posts: 105847
  • I Like Lockdown
    • Whatever Starts
    • View Profile
Re: OOF Outage
« Reply #13 on: 09 October 2017, 17:30:40 »

does he know it's down and then gas bottles and garages. :o :o :o
That very thing was mentioned at work on one of our conf calls, as one of the guys couldn't make it due to having leccy and gas meters changed.  "TB is good with gas and leccy" was the smart alec comment   >:(
Logged
Grumpy old man

Lazydocker

  • Omega Queen
  • *****
  • Offline Offline
  • Gender: Male
  • Woodbridge, Suffolk
  • Posts: 18848
  • Constantly Bullied by a certain Admin
    • View Profile
Re: OOF Outage
« Reply #14 on: 09 October 2017, 17:46:10 »

does he know it's down and then gas bottles and garages. :o :o :o
That very thing was mentioned at work on one of our conf calls, as one of the guys couldn't make it due to having leccy and gas meters changed.  "TB is good with gas and leccy" was the smart alec comment   >:(

Well, to be fair :-X :-X ::) :D
Logged
Whatever it is... I didn't do it

TheBoy

  • Administrator
  • *****
  • Offline Offline
  • Gender: Male
  • Brackley, Northants
  • Posts: 105847
  • I Like Lockdown
    • Whatever Starts
    • View Profile
Re: OOF Outage
« Reply #15 on: 09 October 2017, 17:56:31 »

Cough. That was over 4yrs ago. Ancient history. To be forgotten by all.
Logged
Grumpy old man

Lazydocker

  • Omega Queen
  • *****
  • Offline Offline
  • Gender: Male
  • Woodbridge, Suffolk
  • Posts: 18848
  • Constantly Bullied by a certain Admin
    • View Profile
Re: OOF Outage
« Reply #16 on: 09 October 2017, 18:31:12 »

Cough. That was over 4yrs ago. Ancient history. To be forgotten by all.

Of course... I'll never mention it again. Or tree shaped air fresheners :-X ::) :D
Logged
Whatever it is... I didn't do it

Bigron

  • Omega Baron
  • *****
  • Offline Offline
  • Gender: Male
  • Witham, Essex
  • Posts: 4808
    • Omega 2.6 V6 Auto '51 Reg
    • View Profile
Re: OOF Outage
« Reply #17 on: 09 October 2017, 18:39:45 »

???
Logged

TheBoy

  • Administrator
  • *****
  • Offline Offline
  • Gender: Male
  • Brackley, Northants
  • Posts: 105847
  • I Like Lockdown
    • Whatever Starts
    • View Profile
Re: OOF Outage
« Reply #18 on: 09 October 2017, 18:53:05 »

Cough. That was over 4yrs ago. Ancient history. To be forgotten by all.

Of course... I'll never mention it again. Or tree shaped air fresheners :-X ::) :D
Punishment will be swift if you do...
Logged
Grumpy old man

biggriffin

  • Omega Lord
  • *****
  • Offline Offline
  • huntingdon, Hoof'land
  • Posts: 9740
    • Vectra in a posh frock.
    • View Profile
Re: OOF Outage
« Reply #19 on: 09 October 2017, 20:53:43 »

Cough. That was over 4yrs ago. Ancient history. To be forgotten by all.

Of course... I'll never mention it again. Or tree shaped air fresheners :-X ::) :D
Punishment will be swift if you do...
.

Mmm newbie docker again
Logged
Hoof'land storeman.

Lazydocker

  • Omega Queen
  • *****
  • Offline Offline
  • Gender: Male
  • Woodbridge, Suffolk
  • Posts: 18848
  • Constantly Bullied by a certain Admin
    • View Profile
Re: OOF Outage
« Reply #20 on: 09 October 2017, 22:34:43 »

Cough. That was over 4yrs ago. Ancient history. To be forgotten by all.

Of course... I'll never mention it again. Or tree shaped air fresheners :-X ::) :D
Punishment will be swift if you do...
.

Mmm newbie docker again

Probably... Been a while :-X ::)  ;D
Logged
Whatever it is... I didn't do it

TheBoy

  • Administrator
  • *****
  • Offline Offline
  • Gender: Male
  • Brackley, Northants
  • Posts: 105847
  • I Like Lockdown
    • Whatever Starts
    • View Profile
Re: OOF Outage
« Reply #21 on: 11 October 2017, 18:08:50 »

Moving the webserver and primary database off the flash storage onto a traditional HDD RAID 10 storage device.

There should be no outage (it's moving right now), but obviously spinning media is slower than flash, so page load times may increase slightly.


This is to run diags on the flash storage that crashed at the weekend.
Logged
Grumpy old man

Shackeng

  • Omega Lord
  • *****
  • Offline Offline
  • Gender: Male
  • Ramsbury
  • Posts: 7762
    • 3.2 Elite 2.0 TitX Mondeo
    • View Profile
Re: OOF Outage
« Reply #22 on: 11 October 2017, 18:29:45 »

^^^
Wot he said. ::) ::) ::)
Logged

Migv6 le Frog Fan

  • Omega Queen
  • *****
  • Offline Offline
  • Gender: Male
  • Webs End.
  • Posts: 11734
  • Nicole's Papa
    • 3.2 Elite. Boxster. C1.
    • View Profile
Re: OOF Outage
« Reply #23 on: 11 October 2017, 18:33:34 »

I would have done exactly the same.  :y



 ;D ;D
Logged
Women are like an AR35. lovely things, but nobody really understands how they work.

TheBoy

  • Administrator
  • *****
  • Offline Offline
  • Gender: Male
  • Brackley, Northants
  • Posts: 105847
  • I Like Lockdown
    • Whatever Starts
    • View Profile
Re: OOF Outage
« Reply #24 on: 16 October 2017, 14:18:50 »

Blimey, WTF is happening lately  >:(

One of the hypervisors has sorta crashed, following a minor health alert this morning.  What should happen is if the hardware health is suspect, all the load moves over to another, healthy hypervisor.  In this case it's just shat its pants.  The VMs are still running, but unmanageable.

So, no choice but to power cycle the underlying hardware, which will ungracefully power cycle the VMs running on it.


From an OOF perspective, this includes the primary webserver, the primary database, and the secondary database (which affinity rules say should never be on the same hypervisor as the primary, so something else has gone wrong there).


I shall do this when I finish work today.
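
For anyone wondering what the affinity rule mentioned above amounts to, here is a rough sketch in Python. It is not what the hypervisor actually runs; the VM names, host names and the function `find_anti_affinity_violations` are all made up for illustration. The idea is simply that VMs in the same anti-affinity group should never share a host, so any host holding two or more of them is flagged.

```python
from collections import defaultdict


def find_anti_affinity_violations(placement, anti_affinity_groups):
    """Return (host, sorted VM names) for each host holding 2+ VMs of one group.

    placement: dict mapping VM name -> host name (hypothetical data)
    anti_affinity_groups: list of sets of VM names that must be kept apart
    """
    violations = []
    for group in anti_affinity_groups:
        vms_by_host = defaultdict(list)
        for vm in group:
            host = placement.get(vm)
            if host is not None:  # ignore VMs that are not placed anywhere
                vms_by_host[host].append(vm)
        for host, vms in vms_by_host.items():
            if len(vms) > 1:
                violations.append((host, sorted(vms)))
    return violations


# Hypothetical layout resembling the situation described: everything on one host.
placement = {
    "web-primary": "hv1",
    "db-primary": "hv1",
    "db-secondary": "hv1",
}
rules = [{"db-primary", "db-secondary"}]

violations = find_anti_affinity_violations(placement, rules)
print(violations)  # [('hv1', ['db-primary', 'db-secondary'])]
```

A check like this only reports the problem; it says nothing about why the secondary ended up next to the primary in the first place.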
Logged
Grumpy old man

Entwood

  • Omega Queen
  • *****
  • Offline Offline
  • Gender: Male
  • North Wiltshire
  • Posts: 19566
  • My Old 3.2 V6 Elite (LPG)
    • Audi A6 Allroad 3.0 DTI
    • View Profile
Re: OOF Outage
« Reply #25 on: 16 October 2017, 14:26:19 »

Blimey, WTF is happening lately  >:(

One of the hypervisors has sorta crashed, following a minor health alert this morning.  What should happen is if the hardware health is suspect, all the load moves over to another, healthy hypervisor.  In this case it's just shat its pants.  The VMs are still running, but unmanageable.

So, no choice but to power cycle the underlying hardware, which will ungracefully power cycle the VMs running on it.


From an OOF perspective, this includes the primary webserver, the primary database, and the secondary database (which affinity rules say should never be on the same hypervisor as the primary, so something else has gone wrong there).


I shall do this when I finish work today.

Good luck, and thank you in advance   :y :y
Logged

TheBoy

  • Administrator
  • *****
  • Offline Offline
  • Gender: Male
  • Brackley, Northants
  • Posts: 105847
  • I Like Lockdown
    • Whatever Starts
    • View Profile
Re: OOF Outage
« Reply #26 on: 16 October 2017, 17:43:29 »

Panic over - managed to finally get sufficient control of the bastid hypervisor to allow all the VMs to be moved gracefully, and then rebooted the hypervisor :)

No outage needed. Happy days.  Sometimes simple victories give me a real smile :)
Logged
Grumpy old man

Lizzie Zoom

  • Omega Lord
  • *****
  • Offline Offline
  • Gender: Female
  • South
  • Posts: 7370
    • Omega 3.2 V6 ELITE 2003
    • View Profile
Re: OOF Outage
« Reply #27 on: 16 October 2017, 17:45:33 »

Panic over - managed to finally get sufficient control of the bastid hypervisor to allow all the VMs to be moved gracefully, and then rebooted the hypervisor :)

No outage needed. Happy days.  Sometimes simple victories give me a real smile :)

Does that mean TB your cull is postponed?? :-\ :-\ ;D ;D ;)
Logged

TD

  • Omega Knight
  • *****
  • Offline Offline
  • Gender: Male
  • Swindon
  • Posts: 1235
    • Nowt!
    • View Profile
Re: OOF Outage
« Reply #28 on: 16 October 2017, 17:49:46 »

Panic over - managed to finally get sufficient control of the bastid hypervisor to allow all the VMs to be moved gracefully, and then rebooted the hypervisor :)

No outage needed. Happy days.  Sometimes simple victories give me a real smile :)

You need to upgrade from win95  ;) ;D ;D

Well done TB  :y
Logged

TheBoy

  • Administrator
  • *****
  • Offline Offline
  • Gender: Male
  • Brackley, Northants
  • Posts: 105847
  • I Like Lockdown
    • Whatever Starts
    • View Profile
Re: OOF Outage
« Reply #29 on: 16 October 2017, 18:03:03 »

Panic over - managed to finally get sufficient control of the bastid hypervisor to allow all the VMs to be moved gracefully, and then rebooted the hypervisor :)

No outage needed. Happy days.  Sometimes simple victories give me a real smile :)

Does that mean TB your cull is postponed?? :-\ :-\ ;D ;D ;)
No. I'll just be happier executing it.
Logged
Grumpy old man

TheBoy

  • Administrator
  • *****
  • Offline Offline
  • Gender: Male
  • Brackley, Northants
  • Posts: 105847
  • I Like Lockdown
    • Whatever Starts
    • View Profile
Re: OOF Outage
« Reply #30 on: 16 October 2017, 18:05:30 »

Panic over - managed to finally get sufficient control of the bastid hypervisor to allow all the VMs to be moved gracefully, and then rebooted the hypervisor :)

No outage needed. Happy days.  Sometimes simple victories give me a real smile :)

You need to upgrade from win95  ;) ;D ;D

Well done TB  :y
I suspect part of the issue is quite the opposite, going too cutting edge.  Already been bitten this month by applying a patch that kills the hypervisor if running Intel-based 10GE cards...  ...fortunately in that case the hypervisor had no VMs on it due to the planned patching :).
Logged
Grumpy old man

Lizzie Zoom

  • Omega Lord
  • *****
  • Offline Offline
  • Gender: Female
  • South
  • Posts: 7370
    • Omega 3.2 V6 ELITE 2003
    • View Profile
Re: OOF Outage
« Reply #31 on: 16 October 2017, 18:23:36 »

Panic over - managed to finally get sufficient control of the bastid hypervisor to allow all the VMs to be moved gracefully, and then rebooted the hypervisor :)

No outage needed. Happy days.  Sometimes simple victories give me a real smile :)

Does that mean TB your cull is postponed?? :-\ :-\ ;D ;D ;)
No. I'll just be happier executing it.

Ooooooo! You are so heartless TB! :o :o :o

 ;D ;D ;D ;D ;D ;D ;)
Logged