Disaster Recovery for the OpenEdge Database (Part 1)
Being able to recover quickly when a disaster strikes is key for any running production system. Let’s talk through the details of disaster scenarios.
Normal Operations
During normal, everyday operation of the OpenEdge database, changes are made in memory and eventually written to disk. The active state of the database is made up of three parts: the database on disk, the database in memory, and the changes recorded in the Before Image (BI) file, commonly referred to as the Recovery Log in other database systems.
During a normal shutdown, all open transactions are rolled back and all the modified database blocks that reside in memory are written to disk. These writes are “standard” writes that go to the Operating System (OS) Buffer Cache. There is a risk that the power goes out before the data is written from the OS Buffer Cache to physical media, which is why the BI file is needed to verify that all the database blocks are current.
Database Crashes
When the database crashes due to an error, or when an “Emergency Shutdown” is executed, the database blocks in memory are purposefully not written to disk. If the error that was found would corrupt the database, it is better to leave the database on disk untouched and let Crash Recovery make the database intact. All open transactions remain open, and the transaction table is written to the BI file to be dealt with during Crash Recovery.
Crash Recovery Process
Crash Recovery is performed any time the database is opened for operations. This could be starting a server on the database, a single-user session, truncate bi, an offline backup, or many other utility operations. Its purpose is to make sure the database is physically intact before letting any new operations happen on the database.
Crash Recovery has three phases: Cluster Fix-Up, Physical Redo, and Logical Undo.
Cluster Fix-Up walks the BI Cluster Ring to make sure the Next and Previous pointers are correct. If any are broken, they get repaired.
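To make the idea of walking a ring and repairing pointers concrete, here is a minimal sketch in Python. It is a toy model, not the OpenEdge implementation: the Cluster class, its fields, and the fix_up_ring function are hypothetical names, and the sketch assumes the Next pointers can be trusted so that only broken Previous pointers need repair.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Cluster:
    """Hypothetical stand-in for a BI cluster header (not the real layout)."""
    cluster_id: int
    next_id: int
    prev_id: int


def fix_up_ring(ring: Dict[int, Cluster], start_id: int) -> int:
    """Walk the ring once via Next pointers and repair Previous pointers.

    Assumes the Next pointers are intact. Returns the number of repairs made.
    """
    repaired = 0
    current = ring[start_id]
    # One full lap: visit exactly as many links as there are clusters.
    for _ in range(len(ring)):
        nxt = ring[current.next_id]
        if nxt.prev_id != current.cluster_id:
            nxt.prev_id = current.cluster_id  # repair the broken back-pointer
            repaired += 1
        current = nxt
    return repaired
```

A real fix-up would also have to cope with damaged Next pointers; the sketch only shows the shape of the walk.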
Physical Redo reads the BI file and extracts “notes” from it, which are used to make sure the database blocks are up to date. Every change to the database creates a note describing that change. The note contains the block being changed, the version number of that block, information on how to make the change, and in some cases information on how to revert it. Physical Redo reads a note, gets the block number and version number from it, then reads that database block and checks that the block version number is equal to or higher than the version in the note. If the block version shows that the block is not current, the note is reapplied, which updates the block version in the process. At the end of Physical Redo, all the database blocks are current.
Logical Undo is the process of reverting any changes made by any open transaction. During a normal shutdown all transactions are backed out as part of that process, so typically there is nothing to be done in this phase. However, if the database was shut down abnormally, due to an error or an Emergency Shutdown, there may be active transactions in the database that need to be resolved before any new changes are allowed on the database.
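The version check at the heart of Physical Redo can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual OpenEdge code: Block, Note, and physical_redo are hypothetical names, and the sketch assumes the version stored in a note is the version the block carries once that change has been applied.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Block:
    """Hypothetical database block: a version number plus some contents."""
    version: int
    data: dict


@dataclass
class Note:
    """Hypothetical BI note describing one change to one block."""
    block_no: int
    version: int                   # assumed: block version after this change
    redo: Callable[[dict], None]   # how to make the change again


def physical_redo(notes: List[Note], blocks: Dict[int, Block]) -> int:
    """Replay notes in order; reapply only those the block has not seen."""
    reapplied = 0
    for note in notes:
        block = blocks[note.block_no]
        # Block is current if its version is equal to or higher than the note's.
        if block.version < note.version:
            note.redo(block.data)
            block.version = note.version
            reapplied += 1
    return reapplied
```

For example, if a block on disk is at version 1 and the BI file holds notes for versions 1, 2, and 3 of that block, the first note is skipped and the last two are reapplied, leaving the block at version 3.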
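Backing out open transactions can be pictured with another small Python sketch. Again this is a toy model with hypothetical names (LogNote, logical_undo) and a simplified log format of begin, update, and commit entries; the real BI note format is not described in this article.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Set


@dataclass
class LogNote:
    """Hypothetical log entry: kind is "begin", "update", or "commit"."""
    txn_id: int
    kind: str
    key: str = ""
    old_value: Any = None  # value to restore if the change is undone


def logical_undo(log: List[LogNote], db: Dict[str, Any]) -> Set[int]:
    """Revert every update made by a transaction that never committed."""
    committed = {n.txn_id for n in log if n.kind == "commit"}
    open_txns = {n.txn_id for n in log if n.kind == "begin"} - committed
    # Undo in reverse order so the oldest value is restored last.
    for note in reversed(log):
        if note.kind == "update" and note.txn_id in open_txns:
            db[note.key] = note.old_value
    return open_txns
```

After a normal shutdown the set of open transactions is empty, so the loop changes nothing, which matches the “nothing to be done” case described above.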
Configuration Risks
By default the database runs in Reliable mode (-R), in which writes to the BI file go directly to disk and writes to the data files go to the OS Buffer Cache. This guarantees that Crash Recovery will work. However, Reliable mode can be overridden with other parameters.
Running with No Crash Protection (-r). This mode changes the writes to the BI file from unbuffered, meaning written directly to disk, to buffered, meaning written to the OS Buffer Cache. There are times when this makes sense operationally. An example would be loading large amounts of data into the database. Provided you could repeat the load in the event of a system failure, it may make sense to run with No Crash Protection. The risk of running in this mode is that if the OS crashes or the power fails, the database is sure to be corrupt.
Running with No Integrity (-i). Like running with No Crash Protection (-r), No Integrity also performs all writes buffered, but it also reduces the amount of information written to the BI file. The BI file contents are limited to the information needed for the Physical Redo phase of Crash Recovery and contain no information for the Logical Undo phase. This means that if the database shuts down abnormally, the database will be corrupt. The only time the No Integrity option should be used is during the load phase of a dump and load.
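The difference between a buffered and an unbuffered write can be shown at the operating system level with a short Python sketch. This is only an analogy and not how OpenEdge performs its I/O: append_note and its reliable flag are hypothetical, and os.fsync is used here as a stand-in for forcing data out of the OS Buffer Cache to physical media.

```python
import os


def append_note(path: str, payload: bytes, reliable: bool = True) -> None:
    """Append payload to a log file.

    reliable=True  : force the data to physical media before returning.
    reliable=False : leave the data in the OS buffer cache; it can be lost
                     if the OS crashes or the power fails.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, payload)
        if reliable:
            os.fsync(fd)  # block until the OS reports the data is on disk
    finally:
        os.close(fd)
```

Both calls return successfully and the file reads back the same while the system stays up. The difference only shows when the machine goes down before the cache is flushed, which is exactly the window the -r trade-off exposes.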
Conclusion
The OpenEdge Database is designed to protect itself in the event of a failure. Basic failures like errors, Operating System crashes, or power failures are overcome by the Crash Recovery Process. However, bad configuration choices can make a bad situation worse. It is best to consult with an expert when making any kind of configuration change on the database.
Look for Disaster Recovery for the OpenEdge Database (Part 2) to see how you can configure your system to survive worse failures like corruption, missing files, missing filesystems, and more.