What exactly RPO means

Igor Nemy
2 min read · Jun 29, 2015

During any Disaster Recovery (DR) planning you probably work with SLA terms like RTO and RPO. They are very common and extremely important notions that can simplify the search for the right method or tool to meet your DR requirements. These abbreviations stand for Recovery Time Objective and Recovery Point Objective, which of course you know. But I have found that far from every specialist who takes part in DR negotiations fully understands what exactly RPO means and how it actually works.

So, in short, RPO is measured in time and indicates how much data you can tolerate losing in a failure or disaster. Most people think that if they set the replication or backup job frequency to, for example, 1 hour, they will get an RPO of 1 hour. Unfortunately, this is a MISLEADING thought! The RPO you actually get also depends on the time it takes to transfer the changed data, so it can be much longer than 1 hour. You may ask, WHY? OK, I will explain with an example.

The replication/backup job is set to run every hour. Our dataset, i.e. a combination of data blocks (or any other granular objects), is constantly changing over time. Not fast, but it changes. And our business requirement is to be able to recover data that is no older than the given 1-hour interval. In every 1-hour interval! For every data block! See? No? OK, read on.

Here is an illustration. Say our data block was committed at 10:00am and the replication job took 50 minutes to transfer this point to the recovery site. Thus at 10:50am we have a recovery point reflecting the 10:00am state, and it has only 10 minutes left before it goes stale and no longer meets our RPO rule. So we must immediately run another replication job, at lightning speed, to fully transfer the last 50 minutes of changes to that block to the recovery site within the next 10 minutes. And no longer! If the transfer for the next replication job takes longer than 10 minutes, for example 40 minutes, we will have an RPO violation between 11:00am and 11:30am, because the first recovery point goes stale at 11:00am and the second recovery point arrives only at 11:30am.

RPO timeline
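The same timeline can be checked with a few lines of Python. This is only a minimal sketch of the reasoning above, not how any particular replication tool works: the function name `rpo_violations` and the representation of a recovery point as a `(captured_at, available_at)` pair are my own assumptions for illustration.

```python
from datetime import datetime, timedelta


def rpo_violations(points, rpo):
    """Return (start, end) windows in which the newest recovery point
    available at the recovery site is older than the RPO.

    points: list of (captured_at, available_at) tuples, ordered by arrival.
            This representation is a hypothetical one, used for illustration.
    rpo:    timedelta, the agreed Recovery Point Objective.
    """
    violations = []
    for (captured, available), (_, next_available) in zip(points, points[1:]):
        # The point becomes stale once the state it holds is older than the RPO.
        stale_at = captured + rpo
        if next_available > stale_at:
            # If the point arrived already stale, the gap starts at its arrival.
            violations.append((max(stale_at, available), next_available))
    return violations


day = datetime(2015, 6, 29)
points = [
    # state of 10:00am, available at the recovery site at 10:50am
    (day.replace(hour=10, minute=0), day.replace(hour=10, minute=50)),
    # state of 10:50am, transfer takes 40 minutes, available at 11:30am
    (day.replace(hour=10, minute=50), day.replace(hour=11, minute=30)),
]

for start, end in rpo_violations(points, timedelta(hours=1)):
    print(f"RPO violated from {start:%H:%M} to {end:%H:%M}")
```

With the numbers from the example, this reports a single violation window from 11:00 to 11:30. If the second transfer finished by 11:00am, the list would be empty.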

So why do I need to know how often replication actually runs if the replication algorithm takes care of this, you may ask? The answer, as usual, lies in proper planning of the DR design, and in this case in network sizing: the channel between the protected and recovery sites needs enough bandwidth to fully meet the replication requirement.
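To get a feeling for the bandwidth side, here is a rough back-of-the-envelope sketch. It assumes the time left for the next transfer is simply the RPO minus the duration of the previous transfer, as in the 10:00am example. The 3 GB of changed data is a made-up number, and the helper names `transfer_window` and `required_mbps` are hypothetical. Real tools add compression, deduplication and protocol overhead, so treat the result as a starting point only.

```python
from datetime import timedelta


def transfer_window(rpo, previous_transfer_time):
    """Time left for the next transfer before the current recovery point goes stale."""
    window = rpo - previous_transfer_time
    if window <= timedelta(0):
        raise ValueError("previous transfer already took as long as the RPO or longer")
    return window


def required_mbps(changed_bytes, window):
    """Average throughput, in megabits per second, needed to move
    changed_bytes within the given window."""
    return changed_bytes * 8 / window.total_seconds() / 1_000_000


# RPO of 1 hour, previous transfer took 50 minutes: 10 minutes are left.
window = transfer_window(timedelta(hours=1), timedelta(minutes=50))

# Assumed amount of changed data for illustration only: 3 GB.
changed_bytes = 3 * 10**9

print(f"window: {window}, required: {required_mbps(changed_bytes, window):.0f} Mbit/s")
```

Under these assumptions the channel has to sustain about 40 Mbit/s for those 10 minutes, even though the same 3 GB spread over a full hour would need far less.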

BTW, ask your vendor representative to clarify exactly how their replication tool deals with RPO in your environment.
