Windows Reliability - Yeah Right...

Morning, Grumpy Admin here. Yesterday I took a break from my boring documentation and other project work. That work is also why there haven’t been many blog posts lately: I have to actually earn a living, and blogging doesn’t really pay the bills, as much as I enjoy sharing with you guys.

Well, I decided to take a break when I was asked a question that allowed me to do some geeky stuff again… The question was quite a simple one.

Is there an easy way to get a record of all the system crashes?

I thought about this over a coffee before jumping in with an instant response, as I wanted the best solution.  Of course there is the option of filtering log events and checking for crash dumps and parsing the dump file date/times. But then I remembered a hidden little gem. The Windows Reliability Monitor.

There are quite a few administrators out there who don’t know that, since Windows Vista, Microsoft have been capturing a lot of performance and reliability data for a Windows installation. Typically on a client desktop OS this data collection is enabled by default. Remember this small but important fact.

This reliability information can sometimes be a godsend when troubleshooting issues. Even better, this was a feature developed when the GUI was king, before the paradigm shift back towards the command prompt/PowerShell!
So you can access all this wonderful information from a nice GUI. Now how do you load this GUI, I hear you asking…. Well, that is simple: type the following inside a “run” dialog box

Perfmon /rel

Which is neat really, don’t you think? There are options like saving the reliability history, showing events etc… and a timeline as well, broken down into dates. Very easy and intuitive to use!

What’s more, this reliability data is exposed through WMI as well… and as you know, you can access, tinker with and poll WMI classes and the returned data via PowerShell’s Get-WmiObject.

So may I introduce to you Win32_ReliabilityRecords

Using the power of the shell, we can get the records and dump them to screen just to prove it works!
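Something along these lines should do the trick. Treat it as a rough sketch: the class name is the real one, but trimming to the first five records and the choice of columns are just my assumptions to keep the output readable.

```powershell
# Grab every reliability record from WMI and keep them in memory.
$data = Get-WmiObject -Class Win32_ReliabilityRecords

# Show only the first five so the console does not scroll forever.
$data | Select-Object -First 5 |
    Format-List TimeGenerated, SourceName, EventIdentifier, Message
```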

 

People who know me will know that the next thing I like to do is run a Get-Member command on the object to see what we have to play with. This will give me the fields that are available to me. I like to think I have some noggin about me, but I don’t know everything, and I am a strong believer that if you know how to get the answers you can figure stuff out yourself. As you will see, if I am unsure I will run a Get-Help, or run a command with the /? switch… no point swimming in the dark. If you’re not sure, ask for help or google it!

So now we know we can grab this data, let’s dump all the records to screen, simply to see what the data looks like, so we can make an informed choice about whether we need to filter the results.
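For the record, the Get-Member step looks roughly like this. A minimal sketch, assuming you only care about the properties (the -MemberType filter is my addition to cut down the noise):

```powershell
# Fetch the records, then list the properties each record exposes.
$data = Get-WmiObject -Class Win32_ReliabilityRecords
$data | Get-Member -MemberType Property
```

Among the properties listed you should find EventIdentifier, Message, ProductName, SourceName and TimeGenerated, which are the ones worth filtering on.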

WOW – a screen of text! At least it isn’t a sea of red, cause that would just make me grumpy!

Let’s perform a very quick

$data.count

 

This count shows us that there are 3584 records in the returned data! We aren’t filtering anything. I think that rather than using a -Filter when returning the data from the WMI query, we should return everything, keep it in memory, and then do other stuff with it!

So I am going to use a basic ForEach with conditional If statements to look for records whose EventIdentifier has a value of 6008, which matches event ID 6008 in the event logs. We can do some code like this

 

# Pull all reliability records into memory
$data = Get-WmiObject Win32_ReliabilityRecords

ForEach ($record in $data)
{
    # 6008 = unexpected shutdown
    If ($record.EventIdentifier -eq 6008)
    {
        Write-Host "Message:       " $record.Message
        Write-Host "Event" $record.EventIdentifier
        Write-Host "Product Name:  " $record.ProductName
        Write-Host " "
    }
}

Forgive the Write-Host. This is just proof of concept code, as ever with most of the code in these blog posts! Quick and dirty to get the job done and demo the reliability monitor stuff. If you do feel very strongly about it, feel free to comment and tell me how many puppies I’ve killed!

So event ID 6008 relates to unexpected system shutdowns. This answers the initial question, but while we are here let’s dive a bit deeper into this… basically I really don’t want to finish up this documentation and this is a great excuse!
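If you want the crashes as proper objects with a readable date rather than Write-Host text, here is one way it could be done. This is only a sketch: Select-UnexpectedShutdown is a made-up helper name of mine, not a built-in cmdlet, and I am assuming TimeGenerated comes back as the usual WMI date string (something like 20150312093015.000000-000).

```powershell
function Select-UnexpectedShutdown {
    # Hypothetical helper: passes through only records with
    # EventIdentifier 6008 and adds a readable timestamp.
    param(
        [Parameter(ValueFromPipeline = $true)]
        $Record
    )
    process {
        if ($Record.EventIdentifier -eq 6008) {
            # Parse the first 14 characters (yyyyMMddHHmmss) of the WMI
            # date string; fractional seconds and the offset are ignored.
            $culture = [System.Globalization.CultureInfo]::InvariantCulture
            $stamp = [datetime]::ParseExact($Record.TimeGenerated.Substring(0, 14), 'yyyyMMddHHmmss', $culture)

            New-Object PSObject -Property @{
                When    = $stamp
                Source  = $Record.SourceName
                Message = $Record.Message
            }
        }
    }
}
```

On a real box you would feed it the live records, for example: Get-WmiObject Win32_ReliabilityRecords | Select-UnexpectedShutdown | Sort-Object When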

We can also filter by the content of the Message

For example,

Let’s do the following, in true grumpy admin style!

$data.Message | get-member

We can see it’s just a string so we can use neat PowerShell features like

StartsWith

For example, let’s modify our code above to do something more English-like. I don’t know about you, but I don’t go around remembering all the Event IDs! If you do, you’re a geek!

 

# Pull all reliability records into memory
$data = Get-WmiObject Win32_ReliabilityRecords

ForEach ($record in $data)
{
    # Keep only records whose message begins with "Windows Installer"
    If ($record.Message.StartsWith("Windows Installer"))
    {
        Write-Host "Message:       " $record.Message
        Write-Host "Event" $record.EventIdentifier
        Write-Host "Product Name:  " $record.ProductName
        Write-Host " "
    }
}
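The same filter can also be written as a pipeline. A minimal sketch, and the columns picked at the end are just my choice:

```powershell
# Same idea as the loop above, using the pipeline and -like instead of
# the StartsWith() method.
Get-WmiObject -Class Win32_ReliabilityRecords |
    Where-Object { $_.Message -like 'Windows Installer*' } |
    Select-Object TimeGenerated, ProductName, Message
```

One reason to prefer -like here: calling StartsWith() on a record whose Message happens to be empty throws an error, while -like simply returns false.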

If we run this you can see all the Windows Installer events. Quite handy, I think, and kinda neat. Your mileage is only limited by your code and your If statements. You have the raw records… so you can do whatever you want…. Go wild!!! I did… I used Write-Host 😛

As you can tell from the screenshots, I did this on my local work laptop as I knew there would be lots of events to view and I also knew that Windows 7 machines have this feature turned on by default.

Let’s move over to Windows Server 2012 and see if we can do the same…

Let’s get that nice GUI up from Perfmon /rel and enjoy our performance and reliability data.

Hang on, it’s dead and boring… nothing to see here… what is going on?????

Ok, maybe the GUI has been deprecated. Let’s try the PowerShell code from the top of this post to grab the reliability records.

Oh, this isn’t good, is it! Provider load failure 🙁 Guess it doesn’t work on Server 2012 by default or out of the box 🙁

The answer to the problem seems logical to me: we need to turn this Windows feature on again. There might be reasons why it is disabled and not configured on servers. But meh! I couldn’t find any decent reason not to turn it on, so I figured why the hell not. If you know a reason why not, share that with everyone! There are a few steps, so let’s go through them now.

The first thing we need to do is enable the RacTask scheduled task on your servers if you want to collect/analyse the reliability data. This task runs every so often to parse the data and format it! So it needs to be enabled.

Grumpy Admin always tries to do things the easiest way. For this I will use the GUI MMC console for Task Scheduler. Launch the Task Scheduler MMC snap-in and then navigate to Microsoft/Windows/RAC. If you don’t see anything, you need to make sure hidden tasks are displayed from the View menu!


Then right click and hit ENABLE

Right click and hit run!

As you can see from this message, it takes ages to parse the entries, so I will keep going with the other steps that need to be done to get this feature up and running, and hope that the GUI data drops in before I post this blog!

I am going to reboot and apply some Windows patches to the test box in a moment to generate some events anyway. Hopefully when we reboot and rerun perfmon /rel we will see the entries in the reliability monitor, proving that part of the feature is now working!

Next we need to get the WIN32 provider working so we can get our script to do its stuff! Exciting stuff.

There is a policy setting that turns this feature on as well, so let’s create and modify a GPO and see if we can enable the WIN32 Reliability provider 🙂
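If you would rather not click through the GUI, the same enable and run steps could be done from PowerShell. This is a sketch under a couple of assumptions: an elevated prompt, the ScheduledTasks cmdlets that ship with Server 2012, and the task being called RacTask under \Microsoft\Windows\RAC\ as it appears in Task Scheduler.

```powershell
# Run from an elevated PowerShell prompt on Server 2012 or later.
$taskPath = '\Microsoft\Windows\RAC\'
$taskName = 'RacTask'

# Enable the task, then kick it off once by hand.
Enable-ScheduledTask -TaskPath $taskPath -TaskName $taskName
Start-ScheduledTask -TaskPath $taskPath -TaskName $taskName

# Check the state afterwards; anything other than Disabled is good news.
Get-ScheduledTask -TaskPath $taskPath -TaskName $taskName |
    Select-Object TaskName, State
```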

 

Let’s link this new GPO to our test OU which is where our test server happens to be sitting thanks to some cunning drag and dropping in AD users and computers.

After a quick gpupdate /force and gpresult /r

to confirm the setting is being applied, let’s retest our PowerShell and see if it works. Damn it, it still didn’t work. What is going on! We have faith that we have enabled this provider, but it isn’t loading.

Well this is where the old school Windows Administrators will know what is going on. The DLL file which provides the WMI provider functions isn’t loaded or registered. A quick google and we can work out using the MSDN page that the DLL file is called RACwmiprov.dll.

https://msdn.microsoft.com/en-us/library/ee706632%28v=vs.85%29.aspx

Right, let’s try and register the actual DLL file, hoping it is still in Windows and wasn’t removed. To do this we shall use our trusted friend

regsvr32.exe

regsvr32 RACwmiprov.dll

Bingo! Let’s retest our Get-WmiObject statement against the now registered class.
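A quick way to retest without drowning in output is to wrap the query in a try/catch and only print a count. Just a sketch, and the wording of the messages is mine:

```powershell
try {
    # -ErrorAction Stop turns a provider load failure into a catchable error.
    $count = @(Get-WmiObject -Class Win32_ReliabilityRecords -ErrorAction Stop).Count
    Write-Output "Provider loaded, $count records returned."
}
catch {
    Write-Output "Provider still failing: $($_.Exception.Message)"
}
```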

We can now use our reliability checking script/code on Windows Server 2012 as well!

And let’s recheck our perfmon /rel GUI and see if it’s now working or still waiting for stuff to happen in the background!

There, it’s working. Perhaps Windows lied and needs this DLL registered for the GUI to work. Grumpy Admin is far too lazy to bother waiting and testing that out, especially as it is working now!

So the answer to my boss’s question, can I get the latest system crashes on a server, is yes, from both client and server, very easily with PowerShell. Yes, I had to do some additional steps to get the feature working on Server 2012, but what is a day without a bit of GPO and a bit of DLL registering?

Hazzy