Being able to recover quickly when disaster strikes is key for any running production system. Let’s talk through the details of disaster scenarios.
In Disaster Recovery for the OpenEdge Database (Part 1), the discussion was about how the OpenEdge database prevents corruption when an error occurs and how it recovers, provided no part of the database is missing. Part 2 in the series discusses how to recover the database when part of it is missing (a physical problem) or when an errant program has modified data that it shouldn’t have (a logical problem).
The most basic part of any Disaster Recovery strategy is running backups. Backups can be done online with the OpenEdge probkup utility or offline with any utility including the OpenEdge probkup utility.
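As a rough sketch of the two probkup modes, the helper below only builds the command line and does not run anything. The database name `sports` and the target path are hypothetical, and the argument order (`online` first, then database, then target) is my understanding of the utility, so check it against the probkup reference for your release.

```python
import shlex


def probkup_command(db_name, target, online=False):
    """Build a probkup argument list; nothing is executed here."""
    args = ["probkup"]
    if online:
        # Leave this out for an offline backup (database shut down).
        args.append("online")
    args += [db_name, target]
    return args


# Hypothetical database name and backup target, for illustration only.
print(shlex.join(probkup_command("sports", "/backups/sports.bck", online=True)))
print(shlex.join(probkup_command("sports", "/backups/sports.bck")))
```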
I often get asked whether a VMware snapshot, a Veeam backup or some other type of Operating System backup can be used while the database is running. The answer in general is NO. It is not a hard no, though: there are things you can do to allow an Operating System backup of a running OpenEdge database, but we will get to that later.
Online backup is more complex than offline backup in OpenEdge, but it is tried and true: online backup has been available since version 6.3 of Progress/OpenEdge, released back in 1991. The online backup utility provides a snapshot of the database as of the moment the online backup starts. The process is as follows.
It is possible to do an online backup without using the OpenEdge probkup utility, but care has to be taken. The OpenEdge crash recovery system depends on a protocol called Write Ahead Logging (WAL). This is not unique to OpenEdge; in fact, every database system that has crash recovery built in depends on the WAL protocol. What is WAL?
Write Ahead Logging means that the Before Image file (Recovery Log) data is written to disk BEFORE the corresponding updates to the database. From Part 1 of this series, we learned that Recovery Log notes contain the version number of the block and describe how to update the block from one version to another. During Physical Redo, if the database block version is higher than the note version, the note is skipped because it has already been applied. But during Logical Undo, what happens if the database block has a higher version number than the note that is needed to undo the change? This is the situation a copy of the files can capture when a database block reaches disk ahead of its Recovery Log note. The undo cannot occur, and crash recovery fails. When crash recovery fails, the database is unusable; it has to be thrown away and a backup restored.
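To make the version check concrete, here is a toy model in Python. It is not how OpenEdge implements recovery: the `Block`, `Note` and `recover` names are made up for illustration, there is only one block, every note is treated as belonging to an uncommitted transaction, and undo simply rolls the version back. It only shows why a block that is newer than the log cannot be undone.

```python
from dataclasses import dataclass


class RecoveryError(Exception):
    """Raised when the toy crash recovery cannot complete."""


@dataclass
class Block:
    version: int
    value: str


@dataclass
class Note:
    version: int  # block version this note was written against
    before: str   # value to restore on undo
    after: str    # value to apply on redo


def recover(block, log):
    # Physical redo, oldest note first.
    for note in log:
        if block.version > note.version:
            continue  # block already contains this change, skip the note
        if block.version < note.version:
            raise RecoveryError("redo: a note is missing from the log")
        block.value, block.version = note.after, note.version + 1
    # Logical undo, newest note first (all notes assumed uncommitted).
    for note in reversed(log):
        if block.version != note.version + 1:
            raise RecoveryError("undo: block is newer than the log")
        block.value, block.version = note.before, note.version
    return block


n1 = Note(1, "balance=100", "balance=90")
n2 = Note(2, "balance=90", "balance=80")

# WAL respected: both notes are on disk, the block lags behind at version 2.
print(recover(Block(2, "balance=90"), [n1, n2]))

# WAL violated: the block reached disk at version 3, but only n1 was logged.
try:
    recover(Block(3, "balance=80"), [n1])
except RecoveryError as err:
    print("crash recovery failed:", err)
```

In the first case the block ends up back at its original value. In the second, the change from version 2 to version 3 has no note in the log, so there is nothing to undo it with and recovery stops.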
OpenEdge has a quiet point utility that flushes all the Before Image blocks and database blocks to disk. It also pauses all update activity, while read activity continues uninterrupted. During this quiet point, an Operating System backup can be run, mirrored disks can be split, or VMware or Veeam snapshots can be taken. Once the backup is complete, the quiet point can be disabled and database updates can resume.
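The enable, snapshot, disable sequence can be sketched as a small Python wrapper. Treat it as a sketch under assumptions: the `proquiet <db> enable` and `proquiet <db> disable` argument forms are how I understand the command and should be confirmed against the proquiet reference for your release, and `take_snapshot` is a hypothetical callback standing in for whatever OS backup, mirror split or VM snapshot you use. The demo at the bottom is a dry run that only records the steps.

```python
import subprocess


def snapshot_with_quiet_point(db_name, take_snapshot, run=subprocess.run):
    """Enable a quiet point, run take_snapshot(), then always disable it."""
    # Assumed command form; confirm against the proquiet reference.
    run(["proquiet", db_name, "enable"], check=True)
    try:
        take_snapshot()  # hypothetical callback: OS backup, mirror split, VM snapshot
    finally:
        # Runs even if the snapshot fails, so updates are not left paused.
        run(["proquiet", db_name, "disable"], check=True)


if __name__ == "__main__":
    # Dry run: record the steps instead of executing anything.
    steps = []
    snapshot_with_quiet_point(
        "sports",
        take_snapshot=lambda: steps.append("take snapshot"),
        run=lambda cmd, check: steps.append(" ".join(cmd)),
    )
    print("\n".join(steps))
```

The `try`/`finally` is the important part of the design: if the snapshot step throws, the quiet point is still disabled, so a failed backup does not leave the database with updates paused.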
Now that we have established how to back up the database, the next question is how often to back it up and how long to keep those backups. There is no hard and fast rule for this, but typically people back up the database daily and keep a month’s worth of backups. It is important to copy the backups to a different machine than the one holding the database. What happens if the disaster is that the production machine has died and cannot be fixed? If the production database AND the backups were on this machine and nowhere else, there is no opportunity to restore the backup.
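A daily-backup, keep-a-month policy with an off-machine copy might look something like the sketch below. The 30-day default, the function names and the idea of an offsite directory (for example, a mount exported by another machine) are all assumptions for illustration, not an OpenEdge feature.

```python
import shutil
from datetime import date, timedelta
from pathlib import Path


def expired(backup_dates, today, keep_days=30):
    """Return the backup dates that fall outside the retention window."""
    cutoff = today - timedelta(days=keep_days)
    return [d for d in backup_dates if d < cutoff]


def copy_offsite(backup_file, offsite_dir):
    """Copy a backup to a directory that lives on another machine."""
    source = Path(backup_file)
    dest = Path(offsite_dir) / source.name
    shutil.copy2(source, dest)
    # Cheap sanity check: the copy must be the same size as the original.
    if dest.stat().st_size != source.stat().st_size:
        raise OSError(f"offsite copy of {source.name} is incomplete")
    return dest


# Example: on 10 February, a 1 January backup is past the 30-day window.
print(expired([date(2024, 1, 1), date(2024, 1, 25)], today=date(2024, 2, 10)))
```

Only prune a local backup once its offsite copy has been verified; otherwise the retention job can delete the last usable copy.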
What happens when the production machine is no longer available? Is data lost? You can find a new machine, install OpenEdge, the application and all their components, and then restore the database from the last good backup. What does this mean? It means you will lose data. How much data is lost depends on how much time has passed between the last backup you could restore and the time of the failure.
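The size of that window is simple arithmetic. The timestamps below are made up: a backup taken at 01:00 and a failure at 16:30 the same day.

```python
from datetime import datetime


def data_loss_window(last_good_backup, failure_time):
    """Time span of updates that a restore from backup cannot bring back."""
    if failure_time < last_good_backup:
        raise ValueError("failure cannot precede the backup being restored")
    return failure_time - last_good_backup


# Hypothetical times: nightly backup at 01:00, machine lost at 16:30.
window = data_loss_window(datetime(2024, 3, 4, 1, 0), datetime(2024, 3, 4, 16, 30))
print(window)  # 15:30:00
```

With backups alone, everything written in those fifteen and a half hours is gone, and if the most recent backup turns out to be unusable, the window grows by a full backup interval.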
In Disaster Recovery Part 3, we will talk about ways to minimize the amount of data lost when the database needs to be restored.