Sizing and Architecture – The Science of Capacity Planning
Hi folks, I decided to write an article about Capacity Planning because it has always been one of my passions. I have 14 years of consulting experience across major accounts in EMEA and I have been involved in hundreds of ECM-related IT projects. I have found adequate capacity planning mechanisms in just a few of those, and as if by magic, those were/are the most successful and longest-lasting projects.
What is Capacity Planning in an ECM context
Capacity Planning is the science and art of estimating the space, computer hardware, software and connection infrastructure resources that will be needed over some future period of time. It is a means of predicting the types, quantities, and timing of the critical resource capacities an infrastructure needs in order to meet accurately forecasted workloads.
Capacity planning is critical to the success of any ECM implementation. Predicting and sizing a system is impossible without a good understanding of user behavior, as well as of the complete deployment architecture, including network topology and database configuration.
A high-level description of my concept of capacity planning is given below. Basically, a good capacity planning mechanism/application implements each of the phases outlined here, following the customized Peak Period Methodology explained in the next section.
The capacity planning approach that I'm referring to takes place after general deployment, so prior to it you will need a good sizing exercise to define the initial architectural requirements for your ECM platform.
In this article we assume a fully deployed production environment on which to focus our capacity planning efforts.
Peak Period Methodology
I consider the Peak Period Methodology the most efficient way to implement a capacity planning strategy, as it gathers vital performance information while the system is under the most load/stress. At its core, the peak period methodology collects and analyzes data during a configurable peak period. This allows the application to estimate the number of CPUs, the memory and the cluster nodes required in the different layers of the application to support a given expected load.
The peak period may be an hour, a day, 15 minutes or any other period over which utilization statistics are collected. Initial assumptions may be estimated from business requirements or from specific benchmarks of a similar implementation.
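To make the idea concrete, here is a minimal sketch of finding the peak period: given utilization samples taken at a fixed interval, slide a window of the configured peak-period length over them and keep the busiest one. The function and sample values are illustrative, not part of any product.

```python
def peak_period(samples, window):
    """Return (start_index, average) of the busiest window.

    samples: utilization readings taken at a fixed interval
    window:  number of consecutive samples forming the peak period
    """
    if window > len(samples):
        raise ValueError("window larger than sample set")
    best_start, best_avg = 0, float("-inf")
    for start in range(len(samples) - window + 1):
        avg = sum(samples[start:start + window]) / window
        if avg > best_avg:
            best_start, best_avg = start, avg
    return best_start, best_avg

# Example: CPU utilization (%) sampled every 15 minutes
cpu = [20, 25, 30, 80, 90, 85, 40, 30]
start, avg = peak_period(cpu, window=3)  # 3 samples = a 45-minute peak period
```

The same windowing works for any of the metrics below; only the unit changes.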
In my personal approach to ECM capacity planning, I focus my efforts on six key layers, obtaining specific metrics during a defined peak period:
- Web server machines (Apache/web servers for static content)
HTTP hits/sec – useful for measuring the load on the web servers.
- Application server machines holding the client application
Page views/sec – understand the throughput of the client applications.
- Application server machines holding the ECM server (Alfresco/FileNet/Documentum/SharePoint)
Transactions/sec – understand the throughput of the ECM server.
- LDAP server machines
Activities (reads)/sec – understand the throughput of the LDAP servers.
- Database server machines (Oracle)
Database transactions/sec – useful for measuring the load on the database servers.
- Network
KB/sec – a measure of the raw volume of data received and sent by the application; useful for measuring the load on the network and machine network cards.
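One simple way to carry these six layers through collection and analysis is a small record per layer. This is just a sketch of a possible schema; the field and layer names are my own, not a fixed standard.

```python
from dataclasses import dataclass

@dataclass
class LayerMetric:
    layer: str         # which of the six layers this belongs to
    metric: str        # the unit collected during the peak period
    peak_value: float  # highest value observed in the peak period

# One record per layer, filled in by the collector during the peak period
metrics = [
    LayerMetric("web",     "http_hits_per_sec",    0.0),
    LayerMetric("app",     "page_views_per_sec",   0.0),
    LayerMetric("ecm",     "transactions_per_sec", 0.0),
    LayerMetric("ldap",    "reads_per_sec",        0.0),
    LayerMetric("db",      "db_tx_per_sec",        0.0),
    LayerMetric("network", "kb_per_sec",           0.0),
]
```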
On top of that, I also collect a very important metric on the main application client: the response time (the time taken for the client application to respond to a user request). The values I take into consideration for capacity are:
A.R.T – Average response time
M.R.T – Maximum response time
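Both values fall straight out of the response-time samples collected during the peak period; a minimal sketch (the sample values are invented for illustration):

```python
def response_time_stats(response_times):
    """Compute A.R.T (average) and M.R.T (maximum) from response
    times, in seconds, collected during the peak period."""
    if not response_times:
        raise ValueError("no samples collected")
    art = sum(response_times) / len(response_times)
    mrt = max(response_times)
    return art, mrt

samples = [0.5, 1.0, 1.5, 3.5, 1.0]   # seconds, per user request
art, mrt = response_time_stats(samples)  # art = 1.5, mrt = 3.5
```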
How to implement Capacity Planning?
I normally use a collector agent that collects the necessary data from the various sources during the defined peak period. The collector runs daily and stores its data in Elasticsearch for peak period analysis. The more data gets into Elasticsearch over the application life cycle, the more accurate the capacity predictions become, because they represent the "real" application usage during the defined peak period.
The collector agent uses ZooKeeper to store important information and definitions regarding repositories, machines, the peak period definition, URLs and other environment-related constants.
To minimize the impact on overall system performance, the collector executes every day at a chosen time (outside business hours). That is configured at the OS level (using crontab or similar).
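The storage step of such a collector can be sketched with nothing more than Elasticsearch's document REST API (`POST /<index>/_doc`). The index name, field names and URL below are assumptions for illustration, not a fixed schema.

```python
import json
import urllib.request
from datetime import datetime, timezone

def build_peak_doc(layer, metric, value, peak_start):
    """Build one Elasticsearch document describing a peak-period
    measurement. Field names here are illustrative only."""
    return {
        "layer": layer,
        "metric": metric,
        "value": value,
        "peak_start": peak_start,
        "collected_at": datetime.now(timezone.utc).isoformat(),
    }

def store_doc(es_url, index, doc):
    """POST the document to Elasticsearch's /<index>/_doc endpoint."""
    req = urllib.request.Request(
        f"{es_url}/{index}/_doc",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

doc = build_peak_doc("ecm", "transactions_per_sec", 42.7, "2014-06-01T10:00:00Z")
# store_doc("http://localhost:9200", "capacity-2014.06", doc)  # needs a live cluster
```

A daily crontab entry pointed at a script like this gives the out-of-hours schedule described above.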
Integrating Capacity Planning with monitoring systems
This approach is designed to integrate with most existing system monitoring software, such as HP OpenView and JavaMelody. I'm currently implementing this approach to perform capacity planning on Alfresco installations, and I have integrated it with our existing open-source monitoring stack (great job by Miguel Rodiguez from Alfresco support).
Capacity Planning Troubleshooting
Gathering this data also plays an important role in troubleshooting. In the capacity planning implementations I've seen, analyzing capacity data has been crucial when troubleshooting an application.
Data Analysis to predict architecture changes
By performing regular analysis of our capacity planning data, we know exactly when and how we need to scale our architecture. This plays a very important role when modeling and sizing the architecture for future business requirements.
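As one possible sketch of "knowing when", a straight-line trend fitted through the daily peak values can estimate how many days remain before a scaling threshold is crossed. The threshold and figures below are invented for illustration; real analysis would be done over the data accumulated in Elasticsearch.

```python
def days_until_threshold(daily_peaks, threshold):
    """Fit a least-squares line through daily peak utilization values
    and estimate how many days remain until the threshold is crossed.
    Returns None if the trend is flat or decreasing."""
    n = len(daily_peaks)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_peaks) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_peaks))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    if slope <= 0:
        return None  # load is not growing; no scaling date to predict
    intercept = mean_y - slope * mean_x
    # Solve intercept + slope * day = threshold, relative to the last sample
    day_at_threshold = (threshold - intercept) / slope
    return max(0.0, day_at_threshold - (n - 1))

# Peak CPU % grows roughly one point per day; at 70% we plan to add a node
peaks = [60, 61, 62, 63, 64, 65]
remaining = days_until_threshold(peaks, threshold=70)  # about 5 days left
```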
What’s next
This coming September and October I will be speaking at the Alfresco Summit in San Francisco and London on how to appropriately size an Alfresco installation. The presentation will also include relevant information about capacity planning and the implementation of this approach in a real-life scenario. Consider yourself invited to join the Alfresco Summit and to attend my presentation: http://summit.alfresco.com/2014-speakers/luis-cabaceira
Until then, all the best. One Love.
Luis