Understanding the business and its requirements
In addition to understanding the various roles and how they are installed, as well as the performance, capacity sizing, and availability requirements for System Center 2012 R2 Operations Manager, it is equally important to understand what the business requires from its monitoring solution.
Without a clearly defined set of requirements, you run the risk of not implementing high availability on the roles in highest demand, not implementing those roles at all, or focusing on monitoring areas of the business that provide no value.
Getting ready
The following information should give you a good idea of the areas to explore and the questions to take back to the business, so that you can seek answers from those involved in the decision-making processes.
How to do it...
The following information will provide you with areas of thought when discussing the monitoring requirements with the business.
Availability/percentage uptime required
Are you mandated to provide a five nines (99.999 percent) service or in reality can you provide a 98 percent uptime service? Most organizations like the sound of a five nines service, but in reality when they see the costs and controls associated with obtaining this uptime, requirements are often re-thought.
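To make the gap between these targets concrete when talking to the business, the arithmetic can be sketched as follows. This is a minimal illustration in plain Python, not something that Operations Manager provides; the function name `allowed_downtime_minutes` is hypothetical and the calculation assumes a 365-day year of continuous service.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored for simplicity


def allowed_downtime_minutes(uptime_percent, period_minutes=MINUTES_PER_YEAR):
    """Return the maximum downtime, in minutes, that an uptime target permits."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_minutes * (100 - uptime_percent) / 100


if __name__ == "__main__":
    # Compare a five nines target with less demanding targets.
    for target in (99.999, 99.9, 98.0):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {minutes:,.1f} minutes "
              f"({minutes / 60:,.1f} hours) of downtime per year")
```

Under these assumptions, five nines leaves a little over five minutes of downtime per year, whereas 98 percent leaves roughly 175 hours, which is just over seven days.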
Try gathering information regarding your key systems and their priority. Once ranked, work with the business to agree on an individual uptime percentage for each application rather than a single figure for all of them, as some may be less critical to the business and therefore shouldn't have the same level of high availability and expense associated with them.
Rather than concentrating only on the time that an application should be up, also work with the business to identify the periods during which the application can be taken out of service for planned maintenance. This helps maximize the percentage uptime by allowing you to schedule work within that application's maintenance window. It also lets you track the two types of downtime separately and so provide accurate metrics: unplanned downtime, which lowers uptime, and planned downtime, which is agreed in advance.
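One way to keep the two types of downtime apart when reporting is sketched below. It assumes a common convention in which planned maintenance is removed from the agreed service time, so that only unplanned outages reduce the uptime percentage; confirm with the business that this is the convention they want. The `Outage` class and `availability_report` function are hypothetical names used for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    minutes: float
    planned: bool  # True if the outage fell inside an agreed maintenance window


def availability_report(period_minutes, outages):
    """Summarize planned and unplanned downtime and the resulting uptime percentage."""
    planned = sum(o.minutes for o in outages if o.planned)
    unplanned = sum(o.minutes for o in outages if not o.planned)
    # Assumption: agreed service time excludes planned maintenance.
    service_minutes = period_minutes - planned
    if service_minutes <= 0:
        raise ValueError("planned downtime covers the whole reporting period")
    uptime_percent = 100 * (service_minutes - unplanned) / service_minutes
    return {
        "planned_minutes": planned,
        "unplanned_minutes": unplanned,
        "uptime_percent": uptime_percent,
    }


if __name__ == "__main__":
    # Illustrative figures for a 30-day month; not taken from a real system.
    month = 30 * 24 * 60
    report = availability_report(month, [Outage(240, True), Outage(43, False)])
    print(report)
```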
Cost of downtime
Establishing the cost of downtime means getting the business to state what an outage of the application actually costs it.
Is it a financial loss, such as a stock exchange or mining corporation may see if a critical system is down? Maybe it's a loss of productivity or reputation, or even a loss of life in the case of systems used within hospitals.
Whatever the cost of downtime may be, knowing this in advance as you start designing your monitoring solution will enable you to focus on priorities and develop targeted reports that represent the costs, highlighting areas that are doing well and others that need investment.
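Where the cost is financial and the business can put an hourly figure on it, a simple ranking can help decide which applications to prioritize. The following is only a sketch: the function names, application names, and figures are all hypothetical, and costs such as reputation or safety cannot be reduced to a number in this way.

```python
def estimated_downtime_cost(cost_per_hour, downtime_hours):
    """Return the estimated cost of an outage, given an hourly cost agreed with the business."""
    if cost_per_hour < 0 or downtime_hours < 0:
        raise ValueError("cost_per_hour and downtime_hours must not be negative")
    return cost_per_hour * downtime_hours


def rank_by_downtime_cost(applications):
    """Rank applications by estimated downtime cost, highest first.

    applications maps a name to a (cost_per_hour, downtime_hours) tuple.
    """
    costs = {
        name: estimated_downtime_cost(rate, hours)
        for name, (rate, hours) in applications.items()
    }
    return sorted(costs.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical applications and figures, for illustration only.
    sample = {
        "trading-platform": (50000, 1.5),
        "intranet-portal": (500, 6),
    }
    for name, cost in rank_by_downtime_cost(sample):
        print(f"{name}: estimated cost {cost:,.0f}")
```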
Services within the monitoring scope
Beyond simply deciding to deploy agents to monitor servers, you must also consider which business services are within the scope of monitoring. As part of this, you need to ensure that individual components (servers, network, applications, and so on) are accounted for and that the solution is scaled to support them.
With these business services to be monitored, questions also arise regarding any specific SLAs for performance and availability that may need to be set up against the services, along with any reports that may be needed.
This requires you to take into account not only the scale but also the extra work involved in the creation and maintenance of your services.
Financial penalties
Alongside knowing the cost of downtime, you should also know whether there are specific areas of the infrastructure that, if down, will cause business-specific financial penalties so that these can again be prioritized for monitoring.
Resource metering – showback/chargeback
In addition to ensuring that you are monitoring key systems that may cause expenses to the business if problems aren't quickly identified, you may also need to capture areas within your business that earn revenue.
As multi-tenancy or even just the requirement to recoup costs from individual parts of the organization grows ever more important, you should start gathering information related to how much capital was expended on your infrastructure and how that can be equated to costs for individual resource usage of the components of that infrastructure.
Typically, you would assign costs to CPU, memory, storage, and networking utilization.
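The idea of turning metered utilization into a showback or chargeback figure can be sketched as follows. The rates, units, and names such as `RATES` and `usage_charge` are assumptions made for illustration; real rates would be derived from your own capital and running costs, and this is not how any particular product calculates charges.

```python
# Hypothetical unit rates; derive real figures from your own infrastructure costs.
RATES = {
    "cpu_core_hours": 0.03,
    "memory_gb_hours": 0.01,
    "storage_gb_months": 0.05,
    "network_gb": 0.02,
}


def usage_charge(usage, rates=RATES):
    """Return the total charge for one tenant's metered usage.

    usage maps a resource name to the quantity consumed in the period.
    """
    unknown = set(usage) - set(rates)
    if unknown:
        raise KeyError(f"no rate defined for: {sorted(unknown)}")
    return sum(quantity * rates[resource] for resource, quantity in usage.items())


if __name__ == "__main__":
    # Illustrative monthly usage for a single department.
    finance_usage = {
        "cpu_core_hours": 1440,
        "memory_gb_hours": 5760,
        "storage_gb_months": 500,
        "network_gb": 120,
    }
    print(f"Finance department charge: {usage_charge(finance_usage):.2f}")
```

For showback, the same figure is simply reported to the department; for chargeback, it is invoiced.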
Capacity planning
This is not so much an area in which to gather specific information as one in which to understand, from the business, the level of utilization at which it requires foresight for capacity planning, whether for the purchasing of new equipment or the redistribution of workloads.
For example, would the business like to know when free drive space is down to 40 percent or only when it reaches 20 percent? Are they happy with the utilization of server memory at 80 percent or 95 percent?
Having this information on hand will help with the initial tuning of your new monitoring environment and the creation of any forecasting reports.
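Once the business has agreed its thresholds, it can help to write them down in a form that is easy to review before tuning begins. The sketch below is plain Python rather than Operations Manager configuration; the function name and the default values of 20 percent free disk space and 80 percent memory utilization are assumptions taken from the example questions above.

```python
def capacity_alerts(free_disk_percent, memory_used_percent,
                    disk_free_warn=20.0, memory_used_warn=80.0):
    """Return a list of capacity warnings for one server.

    The default thresholds are placeholders to be agreed with the business.
    """
    alerts = []
    if free_disk_percent <= disk_free_warn:
        alerts.append(
            f"Free disk space {free_disk_percent}% is at or below {disk_free_warn}%")
    if memory_used_percent >= memory_used_warn:
        alerts.append(
            f"Memory utilization {memory_used_percent}% is at or above {memory_used_warn}%")
    return alerts


if __name__ == "__main__":
    # A server with 15% free disk and 90% memory in use breaches both thresholds.
    for message in capacity_alerts(15, 90):
        print(message)
```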
How it works...
By gathering this information from the outset, before implementing your SCOM design, you can understand exactly what the business is trying to achieve and how the SCOM implementation can best achieve that.
For example, if the business has no requirement to monitor access to files and systems, then implementing the ACS roles may be a waste of resources that could be better used elsewhere. Similarly, if the business decides it has 50 applications that require extensive distributed applications to be created and monitored, then be sure to scale the number of management servers appropriately.
Another area to consider is other systems and their integration. For example, does information regarding NetFlow data from another system need to be fed into SCOM or does SCOM need to output information into a Service Desk tool such as System Center 2012 R2 Service Manager?
These interactions, along with normal notifications and other subscriptions, can again place load on the solution and must be taken into consideration.