In a human body the heart and the brain are the 2 most important organs, if those are not performing well, nothing else is. The Alfresco database and filesystem where the content store resides are the brain and heart of Alfresco. Those are the 2 most important layers of every Alfresco architecture.

Get to know the Alfresco Database throughput

If your project will have lots of concurrent users and operations or the number/estimate number of documents is very big (> 1M)) you need to be informed about your database throughput.

Most common throughput factor of the database is transactions per second.

DB performance in a transactional system are usually the underlying database files and the log file. Both are factors because they require disk I/O, which is slow relative to other system resources such as CPU.

In the worst-case scenario, in big alfresco databases with a big number of documents ):

Database access is truly random and the database is too large for any significant percentage of it to fit into the cache, resulting in a single I/O per requested key/data pair.

Both the database and the log are on a single disk. This means that for each transaction, the AlfrescoDB is potentially performing several filesystem operations:

Disk seek to database file
Database file read
Disk seek to log file
Log file write
Flush log file information to disk
Disk seek to update log file metadata (for example, inode information)
Log metadata write
Flush log file metadata to disk

Faster Disks normally can help on such sittuations but there are lots of ways (scale up, scale out) to increase transactional throughput.

In Alfresco the default RDBMS configurations are normally not suitable for large repositories deployments and may result into:

I/O bottlenecks in the RDBMS throughput
Excessive queue for transactions due to overload of connections
On active-active cluster configurations, excessive latency

Alfresco Database treads pool.

Most Java application servers have higher default settings for concurrent access, and this, coupled with other threads in Alfresco (non-HTTP protocol threads, background jobs, etc.) can quickly result in excessive contention for database connections within Alfresco, manifesting as poor performance for users.

If tomcat is being considerer this value is normally 275. The setting is called db.pool.max and should be added to your alfresco-global.properties (db.pool.max=275).

http://docs.oracle.com/cd/E17076_04/html/programmer_reference/transapp_throughput.html

How to calculate the size of your Alfresco database

All operations in Alfresco require a database connection, the database performance plays a crucial role on your Alfresco environment. It’s vital to have the database properly sized and tuned for your specific use case.

To size your alfresco database in terms of space we’ve done a series of tests by creating content (and metadata) on an empty repository and analyzing the database growth.

Be aware that:

Content is not stored in the database but is directly stored on the disk
Database size is un-affected by size of the documents or the document’s content
Database size is affected by the number/type of metadata fields of the document

Hi all, back with another Alfresco related post, this time to show you how to size your alfresco database in terms of space.

The following factors are relevant to calculate the approximate size for an Alfresco database

Number of meta data fields
Permissions
Number of folders
Number of documents
Number of versions
Number of users

I’ve made a series of tests where i could verify how the Alfresco database grows.

I’ve made a bulk import with the following data.

Document creation method	In-place bulk upload
Number of Documents Ingested	148
Total Size of Documents	929.14 MB
Number of metadata fields per document	13 fields
Total number of metadata fields	1924

The table below shows the types of documents and its average sizes

Document Type	Extension	Average Size (KB)
MS Word Document	.doc	1024
Excell Sheet	.xls	800
Pdf document	.pdf	10240
PowerPoint presentation	.ppt	5120
Jpg image	.jpg	2048

Checking the diagram below we can see that the database indexes grow more than the data itself. By observing the growth of the database size we’ve concluded that the average metadata field occupation on the Alfresco database is approximately 5.5 K per metadata field

Also interesting is to verify the tables that grow in size(KB) after the content ingestion. Note that we are not applying any permission.

To size your database appropriately you must ask the right questions whose answers will help you to determine the database sizing.

Estimated number of users in Alfresco
Estimated number of groups in Alfresco
Estimated number of documents on the first year
Documents growth rate
Average number of versions per document
Average number of meta-data fields per document
Estimated number of folders
Average number of meta-data fields per folder
Estimated number of concurrent users
Folder based permissions (inherited to child documents)?

Database sizing formulas

Consider to following figures to determine your approximate database size.

- DV = Average number of document versions

- F = Estimated number of folders

- FA = Estimated number of folder metadata fields (standard + custom)

- D = Number of Documents * DV – Estimated number of documents including the versions

- DA =Estimated number of documents metadata fields (standard + custom)

The number of records on specific alfresco tables is calculated as follows:

- Number of records on alf_node (TN = F + D * DV)

- Number of records records on node_properties (TNP = F * FA + D * DA)

- Number of records records on node_status = (TNS = F + D)

- Number of records records on alf_acl_member= (TP = D) assuming permission will be set at the folder level and inherited

The approx. number of records in the database will be TRDB = TN + TNP + TNS + TP

The following formula is based on the number of database records. On our benchmarks we’ve observed that each database record takes about 4.5k of db space.

Formula #1 Database size = TRDB * 4.5K

Alternatively, we can base our calculations on the number of metadata fields of the documents, considering 5.5k for each metadata field and use the following formula.

Formula #2 Database size = (D * DA + F * FA) * 5.5K

The 2 formulas provided are only approximations on the size that your database will need and are based on benchmarks executed against a vanilla Alfresco version 4.2.2.

If we wish to consider users and groups add consider 2k for each user and 5k for each group.

Note that the formulas are not taking in consideration additional space for logging, rollback, redolog, etc.

Tuning your Alfresco database

In Alfresco the default RDBMS configurations are normally not suitable for large repositories deployments and may result into:

• Wrong or improper support for ACID transaction properties

• I/O bottlenecks in the RDBMS throughput

• Excessive queue for transactions due to overload of connections

• On active-active cluster configurations, excessive latency

Considering that your database layer will be used under concurrent load I’ve come up with a set of hints that will contribute to maximize your Alfresco database performance.

Database Thread pool configuration

A default Alfresco instance is configured to use up to a maximum of forty (40) database connections. Because all operations in Alfresco require a database connection, this places a hard upper limit on the amount of concurrent requests a single Alfresco instance can service (i.e. 40), from all protocols.

It’s recommended to increase the maximum size of the database connection pool to at least [number of application server worker threads] + 75. If tomcat is being considerer this value is normally 275. The setting is called db.pool.max and should be added to your alfresco-global.properties (db.pool.max=275).

After increasing the size of the Alfresco database connection pool, you must also increase the number of concurrent connections your database can handle, to at least the size of the Alfresco connection pool. Alfresco recommends configuring at least 10 more connections to the database than is configured into the Alfresco connection pool, to ensure that you can still connect to the database even if Alfresco saturates its connection pool.

Database Validation query

By default Alfresco does not periodically validate each database connection retrieved from the database connection pool. Validating connections is, however, very important for long running Alfresco servers, since there are various ways database connections can unexpectedly be closed (for example by transient network glitches and database server timeouts). Enabling periodic validation of database connections involves adding the db.pool.validate.query property to alfresco-global.properties and the query is specific for your database type.

Database	Value for db.pool.validate.query
MySQL[1]	SELECT 1
PostgreSQL	SELECT VERSION()
Oracle	SELECT 1 FROM DUAL

Database Scaling

Alfresco relies largely on a fast and highly transactional interaction with the RDBMS, so the health of the underlying system is vital. Considering our existing customers, the biggest running repositories are under Oracle (most of them RAC).

If your project will have lots of concurrent users and operations, consider an active-active database cluster with at least 2 machines. This can be achieved using Oracle RAC or a Mysql based solution with haproxy[1] (opensource solution) or a commercial solution like MariaDB[2] or Percona[3].

[1]http://haproxy.1wt.eu

[2]https://mariadb.org

[3]http://www.percona.com

You can use either solution, depending also you the knowledge that you have in-house. The golden rule is that the response time from the DB in general, should be around 4ms or lower. At this layer i don’t recommend to use any virtualization technology. The database servers should be physical servers.

The high availability and scalability for database is vendor dependent and should be addressed with the chosen vendor to achieve the maximum performance possible.

Database Monitoring

Monitoring your database performance is very important as it can detect some possible performance problems or scaling needs.

I’ve identified the following targets that should be monitored and analysed on a regular base.

Transactions
Number of Connections
Slow Queries
Query Plans
Critical DM database queries ( # documents of each mime type, … )
Database server health (cpu, memory, IO, Network)
Database sizing statistics (growth, etc)
Peak Period of resource usage
Indexes Size and Health

I hope this post can help you to understand the importance of the Alfresco database and that you can make use of it on your sizing exercise. Stay tuned for more Alfresco related posts.

All the best, One love,

Luis

Sizing and tuning your Alfresco Database

Get to know the Alfresco Database throughput

How to calculate the size of your Alfresco database

Database sizing formulas

Formula #1 Database size = TRDB * 4.5K

Formula #2 Database size = (D * DA + F * FA) * 5.5K

Tuning your Alfresco database

Database Thread pool configuration

Database Validation query

Database Scaling

Database Monitoring

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112