Tuesday, February 28, 2017

SCOM 2016 - The Curious Case of the Missing Agent Patch List Property and Static Agent Version Value

Last week, Microsoft released the second update rollup (UR2) for SCOM 2016, and a common trend I've noticed with these URs is that the Patch List property is missing from the Agents by Version view in the Monitoring workspace of the console.


This is a bug with the SCOM 2016 agent and a bit of an annoyance when deploying update rollups, as it's handy to know which agents need to be upgraded and which ones don't.

A quick check in the Agent Managed view of the Administration workspace will show a version for the agent, but this version won't update to reflect any new UR. The following image shows the default SCOM 2016 agent version even though I deployed UR1 to this environment months ago...


Now, if you're thinking that after an update all agents drop into the Pending Management view of the Administration workspace and patiently wait until you're ready to upgrade them, you'd be wrong. Unfortunately, depending on how you deploy the update rollup (e.g. with non-admin permissions, by manual installation, etc.), there's a good chance that some, if not all, of these agents will not appear in Pending Management and you'll end up with something similar to this...


So now, your only option in the console for upgrading these agents is to run a series of bulk Repair jobs from the Agent Managed view against all of them and hope for the best that every agent has been upgraded successfully. This is not a fun process, and I really miss having a central view of all my agent versions directly in the console.

Thankfully Microsoft's Kevin Holman (SCOM Deity and all-round awesome community contributor) has created the new SCOM Agent Version Addendum Management Pack to help address this exact problem!

This management pack disables the built-in discovery for Microsoft.SystemCenter.DiscoverHealthServiceProperties (display name 'Discover Health Service Properties') and replaces it with a new script-based discovery that attempts to retrieve the actual update rollup agent version from a DLL file in the agent installation path.

Straight after I import this new MP, the agent version in the Agent Managed view changes to reflect the versions actually installed (8.0.10931.0 is the version of the UR1 agents I currently have running), and after I've deployed UR2, I can select those agents for a Repair job as shown in the image below...


When the Repair job has completed, the version changes to show that my agents have now been updated to UR2 as shown here:


I love this MP as it adds some much-needed functionality to the Agent Managed view within the console. An extra bonus is that it also works perfectly on SCOM 2012 R2!
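With the real agent versions visible, working out which agents still need a Repair job is a straightforward version comparison. Here's a minimal Python sketch, assuming you've copied agent names and versions (for example, from the Agent Managed view) into a dictionary. The function names, agent names and target build number are hypothetical placeholders, not anything from the MP and not a real UR2 build number. Comparing the version parts as integers avoids the trap of plain string comparison, where '8.0.9' would sort after '8.0.10'.

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Convert a dotted version string such as '8.0.10931.0' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def agents_needing_upgrade(agents: Dict[str, str], target_version: str) -> List[str]:
    """Return the sorted names of agents reporting a version older than target_version."""
    target = parse_version(target_version)
    return sorted(name for name, version in agents.items() if parse_version(version) < target)


if __name__ == "__main__":
    # Hypothetical inventory: agent names and the target build are placeholders.
    inventory = {
        "agent01.contoso.local": "8.0.10931.0",  # the UR1 build mentioned in this post
        "agent02.contoso.local": "8.0.10999.0",  # made-up newer build
    }
    print(agents_needing_upgrade(inventory, "8.0.10950.0"))  # hypothetical target build
```

The resulting list is the set of agents you'd select for the next bulk Repair job.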

If you want to know more, check out Kevin Holman's blog post here and you can download it directly from the TechNet Gallery here.

Enjoy!

Wednesday, February 22, 2017

SCOM 2016 Update Rollup 2 (UR2) Now Available

Today, Microsoft released a new Update Rollup (UR2) for SCOM 2016.


This update contains twenty documented fixes, and the following are of particular interest to me (based on what I've come across on customer sites so far):

  • When alerts are closed from the Alerts view after you run a Search, the closed Alerts still appear in the View when the Search is cleared.
  • Groups disappear from Group view after they are added to a Distributed Application.
  • When the maintenance mode option for the dependency monitor is set to “Ignore,” and the group (consisting of the server to which this dependency monitor is targeted) is put in Maintenance mode, the state of the monitor changes to critical and does not ignore maintenance mode.
  • Because of a rare scenario of incorrect computation of configuration and overrides, some managed entities may go into an unmonitored state. This behavior is accompanied by 1215 events that are written to the Operations Manager log.

You can see the full list of fixes from the official UR2 knowledge base article here.

To get access to this update, you can choose to either manually download it from the Microsoft Update Catalog here or you can use Windows Update to pull down the update automatically to your SCOM 2016 environment.

Whichever method you choose to deploy this update, make sure to read through the full installation instructions, as there are some manual tasks to carry out once the update has been applied to each SCOM role. If you're not confident, I'd always recommend waiting for Microsoft's Kevin Holman to add his walk-through post for this UR to his blog here.

Finally, this update is one part of a larger UR2 release covering other products in the System Center 2016 suite. If you've deployed additional components of the suite alongside SCOM, you might be interested in checking out the updates now available for DPM 2016, SCSM 2016, SPF 2016 and SCVMM 2016.

Full details of all the fixes in the main System Center 2016 UR2 downloads can be viewed at:



Tuesday, February 14, 2017

Scandinavian SCOM Solutions with a Global Reach

A few months before the Christmas break, I had the pleasure of being invited over to the excellent SCOM Day event in Sweden to present a session and hang out with some of my friends from the Scandinavian region.


The event was organised by Approved Consulting in Gothenburg and the target audience had a mix of IT administrators, consultants and senior IT managers. This was my first time visiting Sweden, and from the venue to the food, the craft beers and, of course, the people, it was a really enjoyable experience.

While I was over there, I had the chance to sit down with Approved CEO Jonas Lenntun and go through some of the solutions they offer to complement System Center and OMS. I was already aware of the free community SCOM Health Check Report they released a couple of years ago (if you haven’t tried this out yet, then download it from here):


Free solutions like this for SCOM are always good and the Health Check Report delivers an excellent overview of the health of your SCOM deployments - showing you information about the top alerts, events, performance counters, discoveries and even state changes along with database space usage and grooming history.

IT Service Analytics from Approved

Another cool solution that Jonas and the guys have been working on is their new IT Service Analytics platform. This plug-and-play solution enables organisations to analyse the IT services they monitor with SCOM and then forecast potential issues – well before they occur. If you’ve deployed Service Manager (SCSM) or even Microsoft’s new Operations Management Suite (OMS), then the IT Service Analytics platform can pull data from any combination of SCOM, SCSM and OMS to give you an even deeper analysis of your IT estate.

Here’s an overview taken from their blog on how it works:

By optimizing and combining data from System Center Operations Manager, Microsoft OMS and System Center Service Manager into one holistic data model, you are able to put the IT service in focus. This allows you to extract, correlate and predict information about IT Service Management processes for things like event, capacity, availability, incident and change management.

We utilize most of the Microsoft Business Intelligence tools, such as SQL Server, SSIS, SSAS, R and SSRS. This allows our analytical platform to seamlessly blend with your System Center installation and tap software and hardware resources that are readily available.



Taking it for a Test Drive

Earlier this week I had a chance to take the IT Analytics platform for a test drive and my first impression is that it’s an awesome reporting tool to have in your locker to help with troubleshooting and predictive analysis.

From the home screen, you can choose from a wide range of pre-built reports with information about alerts, capacity management, events, configuration changes and IT service overviews to name just a few.


One of the reports I really like is the Services report. Clicking this tile from the main reports window brings me to the Service Overview shown in the following image:


This report gives me a 30-day availability overview of all the IT services that I have modelled and monitored in my SCOM environment along with information about alerts, change tracking, capacity and predictive event risks.

Here’s a description of what the information in each of the report columns means:

  • Goal – Has the SLA goal been met or not? IT Services that have met their SLA will be displayed as green instead of red (in this demo environment, I’ve sorted the column to display all SLAs that haven’t been met).
  • Service – The name of the IT service.
  • Availability – Displays the last 12 months of the IT service availability.
  • Percentage – The SLA percentage that has been reached. The upwards arrow means that the SLA has reached a better result than the previous month.
  • Failures – The number of outages for the service during this period.
  • Downtime – Displays the number of minutes the service has been unavailable for the month.
  • Alerts – The number of alerts that have been generated by the service during this defined report period. The arrow shows decreasing or increasing compared to last month.
  • Events – The number of events that have been generated by the service during this period. The arrow shows decreasing or increasing compared to last month.
  • Change Tracking – The number of changes made to servers or other components of the service.
  • Capacity Risks – Shows if there are risks with capacity, such as a server running out of free memory based on the usage.
  • Event Risks – Shows if there are any predicted events for the service.
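The report doesn't spell out how the Percentage and Downtime columns relate, so treat this as an assumption: a common way to derive an availability figure is the share of minutes in the period during which the service was up. Here's a minimal Python sketch of that calculation over a 30-day period; the function names are hypothetical and this is not Approved's documented formula.

```python
MINUTES_PER_DAY = 24 * 60


def availability_percent(downtime_minutes: float, period_days: int = 30) -> float:
    """Percentage of the reporting period during which the service was up."""
    total_minutes = period_days * MINUTES_PER_DAY
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime must be between zero and the length of the period")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def sla_met(downtime_minutes: float, goal_percent: float, period_days: int = 30) -> bool:
    """True when the availability over the period reaches the SLA goal."""
    return availability_percent(downtime_minutes, period_days) >= goal_percent
```

As a sense check, a 30-day period has 43,200 minutes, so a 99.9% goal leaves room for only about 43 minutes of downtime in the month.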

Identifying Bottlenecks

When I drill into a particular IT Service from the Service Overview report, I get a more targeted Service Details report with a number of informational tiles and a Top N view of common KPIs like % CPU, % Memory and % Disk Space used.

The Bottlenecks tile sparked my interest here so I clicked this one first…


This brought me deeper to the following view – where I could see that two of my servers in this IT service were displaying potential bottlenecks.


Clicking into the server with two potential bottlenecks identified, I was then presented with a performance chart that showed a very high percentage of bandwidth used on a new network adapter we recently installed into the server to support DPM backups. The performance chart also confirms for me that although my network adapter spiked on and off for the past few days (no doubt when backup jobs are running), the overall average performance of it seems fine and it’s projected to stay around the 10% utilisation mark for the next few months.


The other potential bottleneck that was identified relates to the % Free Disk Space of a logical disk on the Hyper-V server. I can see from the chart that in the past year, the free disk space on this logical disk has fluctuated from approx. 30% free to a minimum value of less than 1%. The chart looks ahead a few months and predicts that the best I can hope for (assuming I leave things as they are) is no more than 7% free disk space.
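Approved describe the platform as built on Microsoft BI tooling, including R, and I don't know which forecasting model it uses. Purely to illustrate the idea of projecting a trend forward, here's a minimal Python sketch of a swapped-in technique: an ordinary least-squares straight-line fit over monthly free-space samples, extrapolated to a future month and clamped to 0-100%. It is not Approved's algorithm, and the function names and any sample data you feed it are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project_free_space(months: Sequence[float], free_pct: Sequence[float], future_month: float) -> float:
    """Extrapolate % free disk space to future_month, clamped to the 0-100 range."""
    slope, intercept = fit_line(months, free_pct)
    return max(0.0, min(100.0, slope * future_month + intercept))
```

A straight line is the crudest possible forecast: it ignores the spikes and cleanups visible in a real free-space chart, which is exactly why a dedicated analytics platform is more useful than a back-of-the-envelope projection.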


Predictive Alerts

Back at the Service Details report, I can click the Events tile shown in the image below to give me an Events Report with a heads-up on the forecasted events and alerts that are likely to occur in my environment within the next 24 hours.


All Alert and Event reports have built-in filters for every chart to give you a more scoped analysis view of what's going on. From the Event Report shown in the image below, I can see there are some predicted alerts and events that I need to pay attention to.


Drilling further into the predicted alert value for a particular monitored object, I’m presented with an ‘IIS 8 Web Server is unavailable’ alert that’s been predicted, along with the number of times it has happened over the last month. I can also see the time of day the alerts usually show up; in this example, they typically occur around 6am every day.
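I don't know how the platform builds its time-of-day view, but the underlying idea can be sketched simply: bucket the times an alert was raised by hour and pick the busiest bucket. Here's a minimal Python illustration, assuming you already have the alert timestamps as `datetime` objects; the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Optional


def busiest_hour(raised_at: Iterable[datetime]) -> Optional[int]:
    """Return the hour of day (0-23) in which the most alerts were raised, or None."""
    counts = Counter(ts.hour for ts in raised_at)
    if not counts:
        return None
    hour, _ = counts.most_common(1)[0]
    return hour
```

Bucketing by `ts.weekday()` instead of `ts.hour` gives the day-of-week breakdown that the Events report also shows.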


If I go back to the previous view and click into the Events tile, I can see it’s broken down into three sections.

The first section is a summary where you can see information on the top hosts, data channels, rules, management packs etc. which are generating the most events. In the image below, we can see that the server generating the most events is SEGOTSQL01. The grey bar in the middle displays last month’s value. You can also see that this server alone has generated 88% of all events for the current period.
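A figure like "88% of all events" is simply a host's event count divided by the total. Here's a minimal Python sketch of that share calculation, assuming you have one host name per event; the function name is hypothetical, and this only illustrates the arithmetic, not how the report itself is built.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_event_sources(hosts: Iterable[str], n: int = 3) -> List[Tuple[str, float]]:
    """Return up to n hosts raising the most events, each with its % share of the total."""
    counts = Counter(hosts)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(host, 100.0 * count / total) for host, count in counts.most_common(n)]
```

For example, with made-up sample data of 88 events from SEGOTSQL01 and 12 from a hypothetical SEGOTWEB01, SEGOTSQL01 comes out on top with an 88% share.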


The middle section of this report displays the time and day of the week that the events are generated.


The final section of this report gives us an insight into both the last 30 days and the last 12 months for how events are being generated.


Custom Reports

It's easy to create your own custom reports and export them to Power BI or Microsoft Excel in a matter of minutes. Here's a nice example of one such custom exported report...


Licensing

I mentioned earlier that I love free solutions for SCOM and when I quizzed Jonas on how much this awesome offering costs to license, I was delighted to hear that Approved have decided to release it for free! They do require a one-off nominal setup and training fee, but aside from that, there are no other limitations on the platform.

Summary

If you're interested in deploying these free solutions into your SCOM environment, then use the contact info here to get in touch with the team at Approved. For more information on the IT Analytics platform, take a read of some blog posts written by well-known SCOM community blogger Daniel Örneling here and here.



Monday, January 30, 2017

Update Rollup 12 (UR12) Just Released for SCOM 2012 R2

Today, Microsoft released Update Rollup 12 for SCOM 2012 R2. This update contains a decent number of fixes along with some new enhancements for both Windows and cross-platform monitoring scenarios.


A full list of all the fixes and enhancements can be seen here:

https://support.microsoft.com/en-us/help/3209587/system-center-2012-r2-om-ur12


I've yet to deploy this update into my lab but I'm particularly intrigued by this one:

  • Because of incorrect computations of configuration and overrides, some managed entities go into an unmonitored state. This behavior is accompanied by event 1215 errors that are logged in the Operations Manager log.

Over the last couple of years, I've often noticed managed entities going into an unmonitored state after applying overrides or changing distributed application configurations, so it'll be interesting to see if this update sorts out the issue.
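One simple way to check whether the update makes a difference is to count event 1215 entries in the Operations Manager log before and after applying it. Here's a minimal Python sketch, assuming the log has been exported to CSV with an `Id` column, for example with `Get-WinEvent -LogName 'Operations Manager' | Select-Object Id, TimeCreated | Export-Csv opsmgr-events.csv -NoTypeInformation -Encoding UTF8`. The file name and function name are hypothetical.

```python
import csv
from typing import Iterable


def count_event_id(csv_lines: Iterable[str], event_id: int = 1215) -> int:
    """Count rows in an exported event-log CSV whose Id column equals event_id."""
    reader = csv.DictReader(csv_lines)
    return sum(1 for row in reader if (row.get("Id") or "").strip() == str(event_id))


if __name__ == "__main__":
    # "utf-8-sig" copes with the byte-order mark that -Encoding UTF8 writes in Windows PowerShell.
    with open("opsmgr-events.csv", newline="", encoding="utf-8-sig") as handle:
        print(count_event_id(handle))
```

Running the same export over comparable time windows before and after the update gives a rough like-for-like comparison.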

As should be the case for everyone deploying this update, test it in non-production environments first, and be sure to read through Kevin Holman's excellent step-by-step guide to understand the order in which to apply the update and the additional manual steps that are needed:

https://blogs.technet.microsoft.com/kevinholman/2017/01/30/ur12-for-scom-2012-r2-step-by-step/


Friday, January 27, 2017

KB 3216755 contains a fix for Windows Server 2016 monitoring with SCOM

I've just noticed a new Microsoft KB article (KB3216755) that points to an update that contains a fix for some scenarios where monitoring Windows Server 2016 or Windows 10 might fail.


I haven't come across the exact scenario (yet) that this fix applies to, but it's useful to know there's an update that can help if you run into problems.

Also, be sure to take a read of the 'Known Issues' section towards the end of the KB article where it states:

"The Cluster Service may not start automatically on the first reboot after applying the update."

The workaround for this known issue is either to use PowerShell to start the Cluster Service on the node you've applied the update to, or simply to reboot the node once more.

Check out the KB article here for more info:

https://support.microsoft.com/en-us/help/4011347/windows-10-update-kb3216755


Wednesday, December 7, 2016

SCOM - The Topology Widget, Visio and a suped-up HD display!

Recently, I ran into an issue while creating some dashboards in the SCOM console for a customer and I thought it might be worth sharing.

Normally I use the Topology Widget to light up an image file that I initially put together using Visio and the end-result typically turns out something like this…


The difference this time though was that I’ve been using a new Windows 10 laptop that has some pretty awesome specs and a kick-ass HD display. The downside of having a laptop with Windows 10 and these specs is that application scaling becomes a nightmare and there’s a whole merry-go-round of custom tweaks that I needed to make when I started using it so as to deliver an experience where I don’t need a giant magnifying glass to work!

Here's how I have my Windows 10 laptop scaling settings configured (notice the 250% size setting):


With these scaling settings in place on the new laptop, I went about my business by first creating a new dashboard image in Visio and then saving it as a PNG file before finally importing the file into SCOM.

When I worked my way through configuring the Topology Widget wizard to map my custom IT services (Distributed Applications) onto the image, the dashboard disappointingly turned out like this...


The problem with this dashboard view is that its grainy quality and tiny health state icons make it hard to read and understand. I've created hundreds of these dashboard views in the past and this was the first time that I've encountered a problem like this so it was time to dig a little deeper to find the solution.

The first thing I tried was to copy the problematic PNG file to another SCOM environment and create a new Topology Widget dashboard there. In this separate environment, the grainy image and tiny health state icons were still there so the problem pointed to an issue with the PNG file.

Another test I tried was to import a completely different dashboard PNG file that I knew worked fine in another customer's environment, and thankfully this displayed as expected. With this validation, I was confident that I was dealing with an issue with either the original problematic PNG or the Visio diagram that I created the PNG from.

As I traced back through my steps, I opened the Visio file that I created this dashboard in and clicked the Save As option from the File menu to save it as a new PNG. When I did this, I was presented with the following PNG Output Options window:


Notice the default Resolution and Size settings Visio 2016 selects for me when I go to save a new PNG file. I figured that due to the 250% display scaling option that my laptop was configured with, these settings were creating the PNG file at too high a resolution for SCOM to work with.

I went back to the original problematic PNG file, checked its Image Properties and could see that it was 2044 x 1548 pixels, as shown here:
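If you have several dashboard images to check, opening Image Properties for each one gets tedious. A PNG file stores its width and height as two big-endian 32-bit integers in the IHDR chunk, which always comes straight after the 8-byte PNG signature, so a few lines of Python can read them. This is a minimal sketch; the function name and the `dashboard.png` file name are hypothetical.

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> Tuple[int, int]:
    """Return (width, height) from the IHDR chunk at the start of a PNG file's bytes."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width and height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file with a leading IHDR chunk")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


if __name__ == "__main__":
    with open("dashboard.png", "rb") as handle:
        print(png_size(handle.read(24)))
```

Only the first 24 bytes are needed, so this stays fast even when scanning a whole folder of images.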


When I checked the other dashboard PNG file that I knew worked (and which I created on my old laptop), I could see that it was configured to use a much lower pixel size.

So, back to the Visio diagram of my new dashboard and this time, when I clicked the Save As option from the File menu, I manually configured the PNG Output settings to use a resolution of Source and a pixel size of 1123 x 794 as shown in this image...
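For what it's worth, 1123 x 794 is exactly what an A4 landscape page (297 x 210 mm) works out to at 96 DPI, the baseline that Windows treats as 100% display scaling. If you'd rather calculate a pixel size than guess one, here's a minimal Python sketch, assuming your Visio page size is known in millimetres. The function names are hypothetical, and the link between display scaling and Visio's default output size is my inference from this troubleshooting, not documented behaviour.

```python
from typing import Tuple

MM_PER_INCH = 25.4
BASE_DPI = 96  # Windows treats 96 DPI as 100% display scaling


def page_pixels(width_mm: float, height_mm: float, dpi: float = BASE_DPI) -> Tuple[int, int]:
    """Pixel dimensions of a page of the given size rendered at dpi."""
    return round(width_mm / MM_PER_INCH * dpi), round(height_mm / MM_PER_INCH * dpi)


def dpi_for_scaling(scale_percent: float) -> float:
    """Effective DPI for a Windows display scaling percentage (e.g. 250 -> 240.0)."""
    return BASE_DPI * scale_percent / 100
```

Calling `page_pixels(297, 210)` gives the 1123 x 794 size used above, while the same page at the 250% scaling DPI comes out roughly two and a half times larger in each dimension.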


When I imported this new leaner version of the PNG file back into the same Topology Widget, I finally got the results I was looking for where the health state and image quality were far easier on the eye.

Hopefully this easy fix helps someone else out with their future SCOM dashboard creations!


Tuesday, November 29, 2016

The Most Useful SCOM Article on the Web Just Got an Update!

As anyone who's ever worked with SCOM will know, it's a fairly heavy and complex product to get your head around at first and the larger the environment to be monitored, the more administration and troubleshooting tasks you'll need to teach yourself.


Way back when I started working with SCOM, I quickly found myself lost in a myriad of blog posts and TechNet articles searching for help on how to extract information from the SQL databases to help me better understand the problems I was experiencing.

The one thing that kept coming up trumps for me in my searches time and time again was Kevin Holman's 'Useful Operations Manager 2007 SQL Queries' post. This post brought together a virtual treasure chest of SQL queries that a 'non-SQL admin' like me could easily copy and paste into a SQL Management Studio window for instant information or configuration changes in my customers' SCOM environments.

It was probably the first SCOM reference on the web that I saved as a favourite into my web browser and was always a location that I'd tell new SCOM admins to go check out and bookmark.

As the title of Kevin's post suggests, it was originally put together nine years ago as a central repository of SQL queries for SCOM 2007. When System Center 2012 and ultimately 2016 came around, these queries still worked with the newer releases of SCOM but there was often some confusion from people trying to understand if they only worked with SCOM 2007.

So to address this, just recently Kevin took the time to archive his original 2007-named post and create a new one titled simply 'SCOM SQL Queries'.


Not only has he renamed the post but he has also formatted it in a way that all queries are now much easier to read from and copy/paste as required.

Check out the new location for what is most likely the most useful SCOM article on the web here:

https://blogs.technet.microsoft.com/kevinholman/2016/11/11/scom-sql-queries/


Tuesday, November 22, 2016

Experts Live NL 2016

Today I've just finished up presenting my last public conference session of 2016 at the awesome Experts Live conference in the Netherlands.


This is my second year attending Experts Live NL, and the attendee and speaker count already seems to have grown significantly in that short space of time.

My presentation this year was titled 'Hacking OMS with your OpsMgr Skills' and is an extension of the session that I co-presented with my good friend Cameron Fuller at System Center Universe 2016 in August.

The original idea and title for this session was all Cameron's and with his blessing, I put my own spin on the content to ensure that Experts Live attendees were treated to a significantly different version of the one we delivered previously at SCU. Also, with the vast number of changes and feature additions that we've now become accustomed to with OMS, there was much to show on the day.

My session was the first to open after the keynote and it was refreshing to see the room filled with a large number of current OpsMgr users waiting to hear how to advance their skillsets with OMS.


(Photo credit Pedro van Vliet)

When my presentation was done, I took some time to hang out with old friends and to network with the attendees and various booth vendors around the event.


All in all, Experts Live NL was a fitting close to a hectic few months of traveling and presenting. I'm now looking forward to refocusing my attention on my poor neglected blog and bringing some useful posts to the community over the coming months!


Wednesday, October 19, 2016

Important SCOM 2016 and 2012 R2 Updates!

If, like me, you've jumped aboard the SCOM 2016 bandwagon and started deploying the recently released GA version to your production environments, then you'll need to be aware of two very important updates that should be applied ASAP.

The first one is Update Rollup 1 for SCOM 2016:

https://support.microsoft.com/en-us/kb/3190029

Microsoft have recommended deploying this update rollup immediately after deploying the initial SCOM 2016 GA build, as it contains fixes for a number of issues that were recently highlighted by users of the Technical Preview 5 release.

The next update is better identified as a patch (KB3200006) that Microsoft needed to quickly release in response to a widespread spate of console crashes on both SCOM 2016 and 2012 R2.

People are understandably frustrated at these crashes as you can read from here and here.

You can get access to the new patch that (hopefully) fixes this problem from the following link:

https://support.microsoft.com/en-us/kb/3200006

Hopefully this helps people out and feel free to use the comments section below (or add your thoughts to the TechNet forums mentioned above) if this patch doesn't solve the console crash issue for you.


Thursday, October 6, 2016

Updated: SCOM 2016 & 2012 R2 Prerequisites Script

Last year when I was starting work on my new Getting Started with Operations Manager book, I needed a PowerShell script that would help me deploy the SCOM 2016 and 2012 R2 prerequisites without fail every time.


The script was a derivative of an earlier SCOM 2012 SP1 script that I published a few years back and it worked fine up until the download link for the ReportViewer prerequisite changed to support Windows Server 2016. I had it on my to-do list to update this script to reflect the new download link but before I got around to it, I noticed that my good friend (and the tallest Dutch guy I know) Oskar Landman had taken my original script and added his scripting magic to it!

Oskar's updated script now has interactive prompts to check which version of SCOM you're installing and whether or not you are deploying the Web Console role (which requires the most prerequisites) - awesome!



Taking your inputs from those prompts, it will then go and download the SQLSysClrTypes and ReportViewer prerequisites to a folder of your choice, install them and then deploy all required roles and features based on your input - nice!
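Oskar's script itself is PowerShell and I haven't reproduced it here. Purely to illustrate the decision flow described above, here's a minimal Python sketch that turns the two prompt answers into an ordered plan of steps without executing anything. The function name, step wording and download folder are hypothetical, and the two IIS feature names are an illustrative subset only: a real Web Console install needs more features than this, and this is not what the script actually runs.

```python
from typing import List

SUPPORTED_VERSIONS = {"2012 R2", "2016"}

# Illustrative subset only: the Web Console needs more IIS/.NET features than these two.
WEB_CONSOLE_FEATURES = ["Web-Server", "Web-Asp-Net45"]


def plan_prerequisites(scom_version: str, web_console: bool, download_dir: str) -> List[str]:
    """Return an ordered list of human-readable setup steps (nothing is executed)."""
    if scom_version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported SCOM version: {scom_version}")
    steps = [
        f"Download SQLSysClrTypes for SCOM {scom_version} to {download_dir}",
        f"Download ReportViewer for SCOM {scom_version} to {download_dir}",
        # The CLR types are installed first, as ReportViewer expects them to be present.
        "Install SQLSysClrTypes",
        "Install ReportViewer",
    ]
    if web_console:
        steps += [f"Add Windows feature {name}" for name in WEB_CONSOLE_FEATURES]
    return steps
```

Separating the planning step from the execution step like this makes it easy to review what would happen before anything is actually downloaded or installed.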

You can review Oskar's original blog post about his work on this script here.

The updated script can be downloaded from its original TechNet Gallery location here: