System Center Operations Manager – EventID 20070, 21016 & 20000

Recently we had a problem with our OpsMgr server where all the agents stopped reporting but still appeared green in the console. All that was logged was event ID 20000 on the management server and event ID 20070 on the server with the agent. You get no alerts and no further clues as to why this is happening. While searching for a fix I saw several blog posts about certificate issues, OpsMgr gateway servers and firewall issues. But nowhere does it say what to do when this hits all or some of your agents that are part of the domain, have been working, and suddenly stopped with no apparent errors.

If the above is not enough to drive you mad, reinstalling agents, adding new agents or flushing the health cache won't help either. So let's start from the beginning.

Start with the server that is being monitored. If you have this issue on multiple servers, pick one and look at that one; don't look at all servers in one go. On the agent you will find event IDs 20070 and 21016.

[Screenshots: event 20070 and event 21016 on the agent]

This tells us that the agent runs into problems AFTER authentication while communicating with the management server. OK, so one good thing: the agent can find the server and gets a response of some sort.
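If you would rather pull these events with PowerShell than click through Event Viewer, a minimal sketch could look like the one below. It assumes it is run locally on the monitored server and that the agent writes to the default "Operations Manager" event log; the 24-hour window is an arbitrary choice, not something from the original troubleshooting.

```powershell
# Assumption: run on the monitored server, agent logs to "Operations Manager".
$filter = @{
    LogName   = 'Operations Manager'
    Id        = 20070, 21016
    StartTime = (Get-Date).AddHours(-24)   # arbitrary look-back window
}

# Get-WinEvent raises an error when nothing matches, so keep it quiet.
Get-WinEvent -FilterHashtable $filter -ErrorAction SilentlyContinue |
    Sort-Object TimeCreated |
    Select-Object TimeCreated, Id, ProviderName, Message |
    Format-List
```

Swapping the Id list for 20000 and running it on the management server should show the other side of the conversation.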

On the management server you have event ID 20000.
[Screenshot: event 20000 on the management server]

And this tells us that the agent should be in a pending approval state. OK, so off to the console, check Pending Management and: 0, none, zip, nada. Well, the agent has been working, so let's check under Agent Managed.

[Screenshot: the agent listed as healthy under Agent Managed]

Well, this is strange. The agent should be in pending approval but isn't, and when we look at the agent it is all green and says everything is OK, yet we still get no alerts from it.

So everything seems OK: agents are green, DNS is working, the management server is up, all services are running and the connection to the SQL database is fine. Here is the kicker: there has been a change from OpsMgr 2007 to OpsMgr 2012. In 2007 and earlier versions you could put machines into maintenance mode. They then temporarily stop reporting so you can do maintenance work, reboots and other fixes. In 2012 this has changed and maintenance mode can now be applied to any object. This means an agent can be up and running and the management server can be up and running, but if IIS on the monitored server is in maintenance mode, IIS won't give you any alerts. The same thing can happen on the management server: the server is up, but the port that accepts connections is in maintenance mode and thus neither accepts connections nor generates alerts.
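The same two console checks can be done from the Operations Manager Shell. This is only a sketch: it assumes the OperationsManager module is available and that the session is already connected to the management group, and 'server01*' is a hypothetical agent name.

```powershell
Import-Module OperationsManager

# Agents waiting for approval; returns nothing when the pending list is empty.
Get-SCOMPendingManagement

# Health state the management group reports for the agent.
# 'server01*' is a placeholder, replace it with the server you picked.
Get-SCOMAgent |
    Where-Object { $_.DisplayName -like 'server01*' } |
    Select-Object DisplayName, HealthState
```

In the situation described here you would expect an empty pending list and a healthy-looking agent, which is exactly what makes the problem hard to spot.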

So how do we find this, since the console sure isn't telling us anything and the event log isn't giving off any clues?
PowerShell to the rescue! There are a couple of commands related to SCOM maintenance mode that can be used. First off is Get-SCOMMaintenanceMode. This will list all objects in maintenance mode, but only by their GUIDs, and you might have servers that should be in maintenance mode. Instead we use Get-SCOMMonitoringObject combined with a small filter that looks like this.

Get-SCOMMonitoringObject | Where-Object { $_.InMaintenanceMode -eq $true }

[Screenshot: output of Get-SCOMMonitoringObject filtered on InMaintenanceMode]

This will give you a list of servers that have some or all of their objects in maintenance mode. Now you can start to figure out which of these shouldn't be there.

Next up, since none of my servers needed to be in maintenance mode, we run another small PowerShell command to reset maintenance mode.

Get-SCOMMaintenanceMode | Set-SCOMMaintenanceMode -EndTime (Get-Date) -Comment "Remove from maintenance mode"

[Screenshot: output of the Set-SCOMMaintenanceMode command]

This gets all objects in maintenance mode and sets the end time of their maintenance mode to "now". Afterwards you can run Get-SCOMMaintenanceMode again to verify that no objects are left in there. After this all the agents will start reporting in again.
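If some of your objects are supposed to stay in maintenance mode, ending everything in one go is too blunt. The sketch below is one possible variation, not what I ran: it resolves each maintenance entry's GUID to a readable name and then ends maintenance mode only for the entries that match a filter. The 'mgmt01*' pattern is a hypothetical name, and the session is assumed to be connected to the management group.

```powershell
Import-Module OperationsManager

# Resolve each maintenance entry's object GUID to a display name.
$windows = Get-SCOMMaintenanceMode | ForEach-Object {
    $instance = Get-SCOMClassInstance -Id $_.MonitoringObjectId
    New-Object PSObject -Property @{
        Name   = $instance.DisplayName
        Class  = $instance.FullName
        Ends   = $_.ScheduledEndTime
        Window = $_          # keep the original entry for Set-SCOMMaintenanceMode
    }
}

# Review what is currently in maintenance mode.
$windows | Format-Table Name, Class, Ends -AutoSize

# Hypothetical filter: only end maintenance mode for objects named mgmt01*.
$toEnd = $windows | Where-Object { $_.Name -like 'mgmt01*' }
foreach ($entry in $toEnd) {
    Set-SCOMMaintenanceMode -MaintenanceModeEntry $entry.Window `
        -EndTime (Get-Date) -Comment 'Removed from maintenance mode'
}
```

Looking at the table before running the last loop gives you a chance to leave planned maintenance alone.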

/Peter
