Disaster Recovery for the OpenEdge Database (Part 2)
Being able to recover quickly when disaster strikes is key for any running production system. Let’s talk through the details of disaster scenarios.
In Disaster Recovery for the OpenEdge Database (Part 1), the discussion was about how the OpenEdge database prevents corruption if or when an error occurs, and how it recovers, provided no part of the database is missing. Part 2 of the series discusses how to recover the database when part of it is missing (a physical problem) or when an errant program has modified data that it shouldn't have (a logical problem).
Backups
The most basic part of any Disaster Recovery strategy is running backups. Backups can be done online with the OpenEdge probkup utility or offline with any utility including the OpenEdge probkup utility.
I often get asked whether I can use a VMware snapshot, Veeam backups, or another type of Operating System backup while the database is running. The answer in general is NO. However, this is not a hard no: there are things you can do to allow an Operating System backup of a running OpenEdge database, which we will get to later.
Offline Backup using OpenEdge probkup
This is the simplest approach; however, the database is unavailable to application users for as long as the backup takes. The process is simple:
- Perform Crash Recovery to make sure the database is intact
- Start from the first area and backup the blocks in order up to the high-water mark of the area
- Continue with all the other areas until all blocks are backed up.
- Mark the database as backed up and exit
The result will be a single file (or set of files) that can be used with the OpenEdge prorest utility to restore a database.
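The offline steps above can be sketched as a small Python model. This is a toy illustration of the order of operations, not how probkup is implemented: the `Area` and `Database` classes, the `high_water_mark` field, and the list-of-tuples backup image are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Area:
    """A hypothetical storage area: block contents plus a high-water mark."""
    name: str
    blocks: list
    high_water_mark: int  # blocks at or beyond this index have never been used


@dataclass
class Database:
    areas: list
    needs_crash_recovery: bool = False
    backed_up: bool = False


def crash_recovery(db: Database) -> None:
    # Stand-in for the redo/undo pass described in Part 1.
    db.needs_crash_recovery = False


def offline_backup(db: Database) -> list:
    """Return the backup image as an ordered list of (area, block number, contents)."""
    # Step 1: make sure the database is intact before copying anything.
    if db.needs_crash_recovery:
        crash_recovery(db)
    image = []
    # Steps 2-3: walk the areas in order, copying blocks up to each high-water mark.
    for area in db.areas:
        for block_no in range(area.high_water_mark):
            image.append((area.name, block_no, area.blocks[block_no]))
    # Step 4: mark the database as backed up.
    db.backed_up = True
    return image


db = Database(
    areas=[Area("Schema", ["s0", "s1", "empty"], 2), Area("Data", ["d0", "d1"], 2)],
    needs_crash_recovery=True,
)
image = offline_backup(db)
print(image)  # [('Schema', 0, 's0'), ('Schema', 1, 's1'), ('Data', 0, 'd0'), ('Data', 1, 'd1')]
```

Note that the unused block past the Schema area's high-water mark never makes it into the image, which is one reason a backup can be smaller than the database files.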
Online Backup using OpenEdge probkup
Online backup is more complex than the offline version, but it is tried and true: it has been available since version 6.3 of Progress/OpenEdge, released back in 1991. An online backup is a snapshot of the database as of the moment the backup starts. The process is as follows:
- Connect to the database
- Pause all update activity
- Backup the Before Image file (later versions only backup the Active Before Image Clusters)
- Let update activity continue
- Start from the first area and backup the blocks in order up to the high-water mark of the area
- Continue with all the other areas until all blocks are backed up.
- However, if a database block is going to be updated, that block is backed up first and only then allowed to be updated. This preserves the snapshot in time
- Mark the database as backed up and exit
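The copy-on-write rule in the online steps is what keeps the image consistent while users keep working. Below is a minimal sketch of that rule, assuming a database is just a dict of block number to contents; `OnlineBackupSketch`, `active_bi_notes`, and the `concurrent_updates` hook that simulates other users are all hypothetical and are not part of probkup.

```python
class OnlineBackupSketch:
    """Toy copy-on-write backup: the image holds every block as it was when the backup began."""

    def __init__(self, blocks: dict, active_bi_notes: list):
        # Steps 1-4: with updates paused, copy the active BI data, then let updates resume.
        self.bi_copy = list(active_bi_notes)
        self.blocks = blocks   # live database: block number -> contents
        self.image = {}        # backup image: block number -> contents at backup start
        self.order = []        # order in which blocks were written to the image

    def _copy(self, block_no: int) -> None:
        # Each block is written to the image at most once.
        if block_no not in self.image:
            self.image[block_no] = self.blocks[block_no]
            self.order.append(block_no)

    def update(self, block_no: int, contents: str) -> None:
        # A user update: back the block up first if the scan has not reached it yet.
        self._copy(block_no)
        self.blocks[block_no] = contents

    def run(self, concurrent_updates: dict) -> dict:
        # Steps 5-6: scan blocks in order. `concurrent_updates` is a hypothetical hook mapping
        # a scan position to (block, new contents) updates other users make at that moment.
        for block_no in sorted(self.blocks):
            for target, contents in concurrent_updates.get(block_no, []):
                self.update(target, contents)
            self._copy(block_no)
        return self.image


live = {0: "a", 1: "b", 2: "c", 3: "d"}
backup = OnlineBackupSketch(live, active_bi_notes=["note-41", "note-42"])
image = backup.run({1: [(3, "d-updated"), (0, "a-updated")]})
print(image)         # {0: 'a', 3: 'd', 1: 'b', 2: 'c'}
print(backup.order)  # [0, 3, 1, 2]
```

Block 3 is written to the image out of turn because a user touched it before the scan got there, and the update to block 0 goes straight through because block 0 was already saved. Either way the image still shows the data as it was when the backup started.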
It is worth noting that many people believe an online backup flushes all the contents out of the Database Buffer Pool (-B). This is not the case. Every other process places a Database Block it reads at the front (most recently used end) of the Least Recently Used (LRU) chain. The OpenEdge probkup utility instead places the block at the end of the LRU chain, so that buffer is the first to be reused and is overwritten by the next block probkup reads. The rest of the buffer pool is left alone.
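The difference between the two placements is easy to see in a simplified model. The sketch below assumes a single LRU chain built on `collections.OrderedDict`; OpenEdge's real buffer management is more involved, and the `BufferPool` class and its method names are hypothetical.

```python
from collections import OrderedDict


class BufferPool:
    """Simplified single-chain LRU pool. First key = LRU end (evicted next), last key = MRU end."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.frames = OrderedDict()

    def _make_room(self) -> None:
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)  # evict the block at the LRU end

    def read(self, block: int) -> None:
        """Ordinary reader: the block moves to the MRU end."""
        if block in self.frames:
            self.frames.move_to_end(block, last=True)
            return
        self._make_room()
        self.frames[block] = True

    def backup_read(self, block: int) -> None:
        """probkup-style reader: a newly read block goes to the LRU end, so it is evicted next."""
        if block in self.frames:
            return  # assumption: a block already cached keeps its current position
        self._make_room()
        self.frames[block] = True
        self.frames.move_to_end(block, last=False)


working_set = (1, 2, 3, 4)
normal = BufferPool(capacity=4)
backup = BufferPool(capacity=4)
for b in working_set:
    normal.read(b)
    backup.read(b)

# Scan ten blocks the application never uses.
for b in range(100, 110):
    normal.read(b)
    backup.backup_read(b)

print(list(normal.frames))  # [106, 107, 108, 109]: working set flushed
print(list(backup.frames))  # [109, 2, 3, 4]: only one frame recycled
```

With normal placement the scan pushes the whole working set out; with LRU-end placement the scan keeps recycling a single frame.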
The resulting file(s) differ from an offline backup: the online backup includes Before Image data, and the blocks are likely not in sequential order, because blocks that were about to be updated were backed up out of turn. The OpenEdge prorest utility knows how to handle both offline and online backups without any special parameters.
Online Backup using an Operating System command, a VMware Snapshot, Veeam, etc.
It is possible to take an online backup without the OpenEdge probkup utility, but care has to be taken. OpenEdge Crash Recovery depends on a protocol called Write Ahead Logging (WAL). This is not unique to OpenEdge; in fact, every database system with built-in crash recovery depends on WAL. So what is WAL?
Write Ahead Logging means that the Before Image (Recovery Log) data describing a change is written to disk BEFORE the changed database block. From Part 1 of this series, we learned that Recovery Log notes contain the version number of the block and how to update it from one version to the next. During Physical Redo, if the database block version is higher than the note version, the note is skipped because it has already been applied. During Logical Undo, however, what happens if the database block has a higher version number than the note needed to undo the change? The undo cannot occur, and crash recovery fails. When crash recovery fails, the database is unusable; it has to be thrown away and a backup restored. This is the risk with an ordinary copy of a running database: the files are copied at different moments while updates continue, so the copy can contain database blocks that are newer than the Before Image data copied with them.
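The version check can be sketched in a few lines. This is a toy model under stated assumptions, not OpenEdge's note format: each hypothetical `Note` moves one block from `version` to `version + 1`, and undo is assumed to run after redo, as in Part 1.

```python
from dataclasses import dataclass


class CrashRecoveryError(Exception):
    """Recovery cannot continue; the copy must be discarded and a backup restored."""


@dataclass
class Block:
    version: int
    value: int


@dataclass
class Note:
    """Hypothetical recovery note: moves a block from `version` to `version + 1`
    by changing its value from `old` to `new`."""
    block_no: int
    version: int
    old: int
    new: int


def redo(blocks: dict, note: Note) -> None:
    block = blocks[note.block_no]
    if block.version > note.version:
        return  # block already contains this change: skip the note
    if block.version < note.version:
        raise CrashRecoveryError(f"block {note.block_no}: earlier notes are missing")
    block.value, block.version = note.new, block.version + 1


def undo(blocks: dict, note: Note) -> None:
    # After redo, the block is at least at version + 1; anything newer was never logged.
    block = blocks[note.block_no]
    if block.version > note.version + 1:
        raise CrashRecoveryError(
            f"block {note.block_no}: version {block.version} is newer than the log"
        )
    block.value, block.version = note.old, note.version


log = [Note(block_no=7, version=3, old=10, new=20)]  # one uncommitted change

# Consistent copy: redo the change, then roll it back.
good = {7: Block(version=3, value=10)}
for note in log:
    redo(good, note)
for note in reversed(log):
    undo(good, note)
print(good[7])  # Block(version=3, value=10)

# Inconsistent copy: block 7 reached version 5 on disk, but the 4 -> 5 note was never logged.
bad = {7: Block(version=5, value=30)}
failure = None
try:
    for note in log:
        redo(bad, note)  # skipped: the block is already past this note
    for note in reversed(log):
        undo(bad, note)  # the block is too new to roll back
except CrashRecoveryError as err:
    failure = str(err)
print(failure)  # block 7: version 5 is newer than the log
```

Redo tolerates a block that is ahead of the log by skipping notes, but undo has no way to reverse a change it has no note for, which is exactly why a WAL-violating copy cannot be recovered.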
How can you do an Operating System or other backup while preserving Write Ahead Logging?
OpenEdge has a quiet point utility, proquiet, that flushes all Before Image blocks and database blocks to disk. It also pauses all update activity, while read activity continues uninterrupted. During this quiet point, an Operating System backup can run, mirrored disks can be split, or VMware or Veeam snapshots can be taken. Once the backup is complete, the quiet point is disabled and database updates resume.
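A backup script around the quiet point should guarantee that updates are released even when the snapshot step fails. Here is a minimal Python sketch of that pattern. The `proquiet <db> enable` / `proquiet <db> disable` argument order is an assumption to verify against the OpenEdge documentation for your release, and `backup_at_quiet_point`, the database path, and the snapshot callback are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], None]


def run_checked(argv: Sequence[str]) -> None:
    """Run a command, raising subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(list(argv), check=True)


def backup_at_quiet_point(db: str, take_snapshot: Callable[[], None],
                          run: Runner = run_checked) -> None:
    """Hold a quiet point around an external backup and always release it.

    ASSUMPTION: `proquiet <db> enable|disable` is the argument order; check the
    OpenEdge documentation for your release before relying on it.
    """
    # Flush BI and database blocks to disk and pause updates; reads carry on.
    run(["proquiet", db, "enable"])
    try:
        # OS copy, mirror split, or VMware/Veeam snapshot while updates are paused.
        take_snapshot()
    finally:
        # Release the quiet point even if the snapshot fails, so updates can resume.
        run(["proquiet", db, "disable"])


# Dry run with a recording runner instead of real commands.
calls: List[List[str]] = []


def record(argv: Sequence[str]) -> None:
    calls.append(list(argv))


backup_at_quiet_point("/db/prod", lambda: calls.append(["snapshot"]), run=record)
print(calls)
# [['proquiet', '/db/prod', 'enable'], ['snapshot'], ['proquiet', '/db/prod', 'disable']]


def failing_snapshot() -> None:
    raise RuntimeError("snapshot failed")


failed_calls: List[List[str]] = []
try:
    backup_at_quiet_point("/db/prod", failing_snapshot,
                          run=lambda argv: failed_calls.append(list(argv)))
except RuntimeError:
    pass
print(failed_calls[-1])  # ['proquiet', '/db/prod', 'disable']
```

The `try`/`finally` is the important design choice: a snapshot tool that hangs or errors should never leave production updates paused.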
Backup Frequency and Retention
Now that we have established how to back up the database, the next questions are how often to back it up and how long to keep the backups. There is no hard and fast rule, but typically people back up the database daily and keep a month's worth of backups. It is important to copy the backups to a different machine from the one holding the database. What happens if the disaster is that the production machine has died and cannot be fixed? If the production database AND the backups were on that machine and nowhere else, there is nothing to restore.
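The "daily backups, keep a month" rule, plus the off-machine copy check, can be expressed as a small pruning helper. This sketch assumes each backup is identified by its date and that you can list which dates already exist on the second machine; `split_by_retention` and `missing_offsite` are hypothetical helpers, not OpenEdge tools.

```python
from datetime import date, timedelta


def split_by_retention(backup_dates, today, keep_days=30):
    """Split backup dates into (keep, prune); backups from the last `keep_days` days are kept."""
    cutoff = today - timedelta(days=keep_days)
    keep = sorted(d for d in backup_dates if d > cutoff)
    prune = sorted(d for d in backup_dates if d <= cutoff)
    return keep, prune


def missing_offsite(kept, offsite_dates):
    """Kept backups that have no copy on another machine yet."""
    return [d for d in kept if d not in offsite_dates]


today = date(2024, 3, 31)
daily = [today - timedelta(days=i) for i in range(40)]  # 40 nightly backups
keep, prune = split_by_retention(daily, today)
print(len(keep), len(prune))  # 30 10
print(keep[0], prune[-1])     # 2024-03-02 2024-03-01

offsite = set(keep[:-1])  # last night's copy has not left the machine yet
print(missing_offsite(keep, offsite))  # [datetime.date(2024, 3, 31)]
```

Checking the off-machine copy explicitly matters because a backup that exists only on the production machine offers no protection against losing that machine.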
What happens when the production machine is no longer available? Is data lost? You can find a new machine, install OpenEdge, the application, and all their components, and then restore the database from the last good backup. What does this mean? It means you will lose data. How much data is lost depends on how much time passed between the last backup you could restore and the time of the failure.
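The exposure is simple arithmetic once you remember that an online backup is a snapshot as of the moment it starts, and that a backup still running when the failure hits cannot be used. The sketch below assumes every listed backup is restorable and survived the failure; `data_loss_window` and the nightly schedule are hypothetical.

```python
from datetime import datetime, timedelta


def data_loss_window(backups, failure):
    """Return (restore point, lost time) for the newest backup that finished before `failure`.

    `backups` holds (start, end) pairs for online backups. The snapshot is taken as of the
    start, so the start is the restore point. Assumption: every listed backup is restorable
    and a copy survived the failure.
    """
    usable = [start for start, end in backups if end <= failure]
    if not usable:
        raise ValueError("no backup finished before the failure")
    restore_point = max(usable)
    return restore_point, failure - restore_point


nightly = [(datetime(2024, 5, d, 1, 0), datetime(2024, 5, d, 2, 0)) for d in (8, 9, 10)]

# Failure in the afternoon: everything since 01:00 that morning is lost.
print(data_loss_window(nightly, datetime(2024, 5, 10, 16, 30))[1])  # 15:30:00

# Failure during the 01:00 backup: that backup is unusable, so a further day is lost.
print(data_loss_window(nightly, datetime(2024, 5, 10, 1, 30))[1])   # 1 day, 0:30:00
```

With daily backups the worst case is a little over a day of lost work, which is the gap Part 3 sets out to close.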
In Disaster Recovery Part 3, we will talk about ways to minimize the amount of data lost when the database needs to be restored.