Don’t Let Historical Data Sink Your OpenEdge Performance
Archiving and Deleting Data
Databases are growing larger and larger. What seemed like a large database 20 years ago is now typical. It is no longer uncommon to see OpenEdge databases that are 1 terabyte or larger.
Operating environments have been able to keep up with this growth, but at a cost, both financial and in terms of risk. The larger the database, the higher the chance of something going wrong with it, and the longer it takes to return the environment to normal.
Problems with large databases
There are several issues to be aware of for large databases.
1) Performance-related issues. The typical application has a "working set" of data that it needs to operate: customer lookups, item and inventory lookups, and so on. All of this fits in memory, but then someone runs an unbounded report, and it inadvertently reads in all the historical data from 20 years ago, flushing the working set out of the database buffer pool. With regular archiving and deleting, this unbounded process would not have as much historical data to read. Many customers use a reporting database, fed from the OpenEdge Pro2 product, for historical reporting.
2) Backup space and time. Larger databases take more disk space to back up and more time to do the backup. Operational performance suffers when backup duration extends into operational hours.
3) Longer recovery times. In the event of an incident, the recovery time will be dictated by how long it takes to back up the existing database and how long it takes to restore a previous backup of the database. With large databases this is measured in dozens of hours.
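To illustrate the first point, the difference between a query that respects the working set and one that flushes it can be a single WHERE clause. A minimal sketch, assuming a hypothetical order table with an order-date field:

```abl
/* Unbounded: reads every row, including decades of history,
   flushing the working set out of the database buffer pool. */
FOR EACH order NO-LOCK:
    /* report on order */
END.

/* Bounded: touches only the recent data the report actually needs. */
FOR EACH order NO-LOCK
    WHERE order.order-date >= TODAY - 90:
    /* report on order */
END.
```

Archiving protects you from the first form; until then, bounding every report query is the next best defense.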
How to manage your database size
Determine what data is required to run the business. Some teams may demand access to all historical data. Question them, and offer solutions that move data out of the production database. One possible solution is an archive database: restore a copy of a backup to a different location, and when data is archived out of production, move the archived data to the archive database.
The QAD application has archive utilities built into the application for just this purpose.
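If you are rolling your own archive process, the core of it can be sketched in a few lines of ABL. This assumes a hypothetical production table invoice with an inv-date field, and an archive database connected under the logical name arch that contains an identical invoice table; all of those names are placeholders for your own schema:

```abl
/* Sketch: copy rows older than the cutoff to the archive database,
   then delete them from production. */
DEFINE VARIABLE dCutoff AS DATE NO-UNDO.
dCutoff = TODAY - 7 * 365.  /* keep roughly seven years in production */

FOR EACH invoice EXCLUSIVE-LOCK
    WHERE invoice.inv-date < dCutoff:
    CREATE arch.invoice.
    BUFFER-COPY invoice TO arch.invoice.
    DELETE invoice.
END.
```

Because each iteration of the FOR EACH is its own transaction, the copy and the delete for a given row commit together: a row is never deleted from production without its archive copy being committed.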
Manually Archiving data
What is the fastest way to delete data from a table? This knowledge base article shows the following code:
https://community.progress.com/s/article/P36834
DEFINE VARIABLE iRecord AS INTEGER NO-UNDO.
DEFINE VARIABLE iBatch  AS INTEGER NO-UNDO INITIAL 10000.

OUTER:
DO WHILE TRUE TRANSACTION:
    FOR EACH <YourTableNameGoesHere> EXCLUSIVE-LOCK:
        iRecord = iRecord + 1.
        DELETE <YourTableNameGoesHere>.
        /* Commit the current transaction (iteration) and proceed with
           the next. This restarts the FOR EACH block at the new
           first record. */
        IF (iRecord MODULO iBatch) EQ 0 THEN NEXT OUTER.
    END.
    LEAVE.
END.
The above code works well for tables that have no unique indexes and fewer than 5 million rows to be deleted. For tables that do not meet those criteria, a simpler approach will be much faster:
FOR EACH <YourTableNameGoesHere> EXCLUSIVE-LOCK:
    DELETE <YourTableNameGoesHere>.
END.
The reason the second approach is better is unique index deleted placeholders: each time the batched code restarts the FOR EACH, the scan has to work past the placeholders left behind by the keys already deleted, so successive batches get slower and slower. There is another article about this subject here.
Post Archive Activities
Archiving should be done regularly: monthly, quarterly, or at worst annually. If archiving has not been done for years or decades, then the initial archive will be time-consuming, and the resulting database will need maintenance.
A massive archive will require a dump and load of the tables involved to reclaim the unused space. Just because you deleted rows does not mean the database shrank: a dump/truncate/load is required to return the space.
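The dump and load for a single table can be done as an ASCII dump from ABL, sketched below (the binary dump via proutil, or the Data Dictionary dump, are the usual alternatives for large tables). The table name invoice is hypothetical, and the truncate step itself is performed with the database utilities while the table's storage area is not in use:

```abl
/* Dump the table to a .d file. */
OUTPUT TO VALUE("invoice.d").
FOR EACH invoice NO-LOCK:
    EXPORT invoice.
END.
OUTPUT CLOSE.

/* ...truncate the storage area with the database utilities, then reload... */
INPUT FROM VALUE("invoice.d").
REPEAT:
    CREATE invoice.
    IMPORT invoice.
END.
INPUT CLOSE.
```

The REPEAT block is the idiomatic reload pattern: when IMPORT hits end-of-file it raises ENDKEY, which undoes the final iteration, so the last CREATE does not leave an empty record behind.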
A regular archive is not as drastic an event, but after the archive an index compact is required to remove deleted placeholders and remove empty space in the index B-tree. For more information on this see this article.
Conclusion
Archiving data is just one of the many activities required for a well-performing application environment. Regular monitoring and periodic maintenance are required. Stale systems run the risk of problems like poor performance or unplanned outages. This article addresses one aspect of the required maintenance. If in doubt, ask a professional.
OpenEdge System Health Check
Is your Progress OpenEdge database running optimally? Could an unknown issue be waiting to damage or bring down your production database?