Month: January 2015

System Center Operations Manager – EventID 20070, 21016 & 20000

Recently we had a problem with our OpsMgr server: all the agents stopped reporting but still appeared green in the console, and all that was logged was event ID 20000 on the management server and event ID 20070 on the server with the agent. You get no alerts and no further clues as to why this is happening. While searching for a fix I found several blog posts about certificate issues, OpsMgr Gateway servers and firewall issues. But nowhere does it say what to do when this hits all or some of your agents that are part of the domain, have been working, and suddenly stopped with no apparent errors.

If the above is not enough to drive you mad, reinstalling agents, adding new agents or flushing the health cache won’t help either. So let’s start from the beginning.

Start with the server that is being monitored. If you have this issue on multiple servers, pick one and look at that one; don’t look at all the servers in one go. On that server you will find event IDs 20070 and 21016.

[Screenshots: event ID 20070 and event ID 21016 on the agent]

This tells us that the agent runs into problems AFTER authentication when communicating with the management server. OK, so one good thing: the agent can find the server and gets a response of some sort.
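To pull these events without clicking through Event Viewer, something like the following can be run on the monitored server. It is a minimal sketch that assumes the agent writes to the standard "Operations Manager" event log; the limit of 20 events is an arbitrary choice.

```powershell
# Run on the monitored server.
# Assumption: the agent logs to the 'Operations Manager' event log.
$filter = @{
    LogName = 'Operations Manager'
    Id      = 20070, 21016
}

# Show the most recent matching events with their full message text.
# Get-WinEvent raises an error when nothing matches, so that case is silenced.
Get-WinEvent -FilterHashtable $filter -MaxEvents 20 -ErrorAction SilentlyContinue |
    Format-List TimeCreated, Id, ProviderName, Message
```

The same filter with Id set to 20000 can be run on the management server.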

On the management server you get event ID 20000.

And this tells us that the agent should be in a pending approval state. OK, so off to the console, check pending agents and: 0, none, zip, nada. Well, the agent has been working, so let’s check under Agents Managed.


Well, this is strange: the agent should be pending approval but isn’t, and in the console it is all green and says everything is OK, yet we still get no alerts from it.
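The same two console checks can be done from PowerShell. This is a small sketch that assumes it is run in the Operations Manager Shell on the management server, where a connection to the management group already exists.

```powershell
# Load the SCOM cmdlets (already loaded in the Operations Manager Shell).
Import-Module OperationsManager

# Agents with a pending action, for example waiting for approval.
# Empty output means nothing is pending.
Get-SCOMPendingManagement

# Health state of every agent as the management group sees it.
Get-SCOMAgent |
    Sort-Object DisplayName |
    Format-Table DisplayName, HealthState -AutoSize
```

In the situation described here, the first command returns nothing and the second shows the agents as healthy, which matches what the console displays.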

Everything then seems OK: agents are green, DNS is working, the management server is up and running, all services are running and the connection to the SQL DB is fine. So here is the kicker: there has been a change from OpsMgr 2007 to OpsMgr 2012. In 2007 and earlier versions you can put machines into maintenance mode. They then temporarily stop reporting so you can do maintenance work, reboots and other fixes. In 2012 this has changed and maintenance mode can now be applied to any object.

This means an agent can be up and running and the management server can be up and running, but IIS on the monitored server is in maintenance mode, so IIS won’t give you any alerts. The same thing can happen on the management server: the server is up, but the port that accepts connections is in maintenance mode and thus does not accept any connections or generate any alerts.

So how do we find this, since the console sure ain’t telling us anything about it and the event log isn’t giving off any clues?
PowerShell to the rescue! There are a couple of commands related to SCOM maintenance mode that can be used. First off is Get-ScomMaintenanceMode, which lists the objects in maintenance mode, but only by their GUIDs, and you might have servers that should be in maintenance mode. Instead we use Get-ScomMonitoringObject combined with a small filter that looks like this.

Get-ScomMonitoringObject | Where-Object { $_.InMaintenanceMode -eq $true }


This will give you a list of servers that have some or all of their objects in maintenance mode. Now you can start to figure out which of these shouldn’t be there. Next up, since none of my servers needed to be in maintenance mode, we run another small PowerShell command to end maintenance mode for everything.

Get-ScomMaintenanceMode | Set-ScomMaintenanceMode -EndTime (Get-Date) -Comment "Remove from maintenance mode"


This gets all maintenance mode entries and sets their end time to “now”. Afterwards you can run Get-SCOMMaintenanceMode to verify that no objects are left in maintenance mode. After this, all the agents will start reporting in again.
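If some servers are supposed to stay in maintenance mode, ending it for everything is too blunt. The sketch below first lists the remaining entries with a readable name instead of only the GUID, then ends maintenance mode for a single object. The server name is a hypothetical placeholder, and the property names assume the maintenance window objects returned by the OperationsManager module in 2012.

```powershell
Import-Module OperationsManager

# Calculated column that resolves the GUID of each entry to a display name.
$objectName = @{
    Name       = 'Object'
    Expression = { (Get-SCOMClassInstance -Id $_.MonitoringObjectId).DisplayName }
}

# List the current maintenance mode entries in readable form.
Get-SCOMMaintenanceMode |
    Select-Object $objectName, StartTime, ScheduledEndTime, User, Comments

# End maintenance mode for one server only.
# 'server01.contoso.com' is a hypothetical name; replace it with your own.
Get-SCOMClassInstance -Name 'server01.contoso.com' |
    Get-SCOMMaintenanceMode |
    Set-SCOMMaintenanceMode -EndTime (Get-Date) -Comment 'Remove from maintenance mode'
```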


Operations Manager – Clear Healthservice

I was recently at a customer where we installed Operations Manager, and while installing we noticed some strange behavior in some of the agents. As it turned out, there was an old OpsMgr installation with the same management group name as the one we had chosen.

This meant that the agents always reverted back to the old server. To stop this, all references in AD were removed and the new server published new AD integration information. This is however not enough, as you also need to clear the agent health cache and remove the old management group information from the registry.

To make this easier I wrote a script that uses remote PowerShell to connect to the computer or computers, stop the health service, clear the health cache, remove the registry entries and then start the health service again.

The script can be downloaded here and is pretty straightforward. To run the script, just enter the computer name of the server where the agent is installed and the health cache will be reset. When used with the -ClearGroups switch it will also remove any references in the registry to all management groups.
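For readers who want to see the general shape of such a script, here is a rough sketch of the steps described above. It is not the downloadable script itself. The registry paths and the "Health Service State" folder name are assumptions based on a default OpsMgr 2012 agent installation, so verify them on one test machine before using anything like this more widely.

```powershell
# Illustrative sketch only; not the author's clear-healthcache.ps1.
param(
    [Parameter(Mandatory = $true)]
    [string[]] $ComputerName,

    [switch] $ClearGroups
)

Invoke-Command -ComputerName $ComputerName -ArgumentList $ClearGroups.IsPresent -ScriptBlock {
    param([bool] $ClearGroups)

    # Assumed registry locations; check them against your agent version.
    $setupKey  = 'HKLM:\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Setup'
    $groupKeys = @(
        'HKLM:\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Agent Management Groups',
        'HKLM:\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Management Groups'
    )

    # Locate the health cache folder under the agent install directory.
    $installDir = (Get-ItemProperty -Path $setupKey).InstallDirectory
    $stateDir   = Join-Path $installDir 'Health Service State'

    Stop-Service -Name HealthService -Force

    # Clear the health cache.
    if (Test-Path $stateDir) {
        Remove-Item -Path (Join-Path $stateDir '*') -Recurse -Force
    }

    # Optionally remove every management group reference from the registry.
    if ($ClearGroups) {
        foreach ($key in $groupKeys) {
            if (Test-Path $key) {
                Get-ChildItem -Path $key | Remove-Item -Recurse -Force
            }
        }
    }

    Start-Service -Name HealthService
}
```

Saved under a hypothetical name such as Reset-AgentCache.ps1, it would be called as .\Reset-AgentCache.ps1 -ComputerName server01 -ClearGroups, and it requires PowerShell remoting to be enabled on the target servers.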

For more information on the script use get-help clear-healthcache.ps1

Please leave a comment if you find something that is not working or if you just like and use the script.