How to analyse and repair a corrupted database
<< Database corruption | Database technology articles | Transaction options explained >>
How to analyse and repair a corrupted database
By IBPhoenix writers February 2010 - Copyright IBPhoenix.com
You have to try very hard to corrupt a Firebird database — it’s designed to survive the hard knocks that break databases in other systems. You will know a database has been corrupted if you cannot connect to it or back it up and a message in the firebird log, or from gbak –b[ackup]
, tells you that there is corruption, or some other message reports a checksum error.
This topic describes the steps you need to take, primarily using the command-line tools gfix
and gbak
, to try to identify some kinds of corruption and repair the damage. Be aware, however, that there are certain kinds of severe corruption that this procedure cannot fix.
Recognising possible corruption
Evidence of possible corruption, although rare enough, might appear during the database validation that you do as part of your routine housekeeping to detect minor anomalies and recycle misallocated space. You might have run a validation to check whether some misbehaviour could be associated with structural damage, perhaps prompted by one of the following circumstances:
- A corrupt database or consistency check error appearing in the
firebird.log
or in a client application - A backup that ends abnormally
- Power failure or brownout without UPS protection or with suspected UPS failure
- Suspected or system-reported hard disk, network, or memory faults
- A database shadow taking over from a dead database after a disk crash
- A production database is about to be moved to another platform or storage system
- Suspected compromise of the network or database by malicious attack
Preparing for analysis and repair
If you suspect you have a corrupt database, it is important to follow a proper sequence of recovery steps in order to avoid further corruption. Because some of the steps make irreversible changes to structures within the database, it is essential to isolate the original database file from these operations and work with a file copy.
Take the database off-line as soon as evidence of a possible corruption appears and isolate the file to prevent any further connections. If it is not practicable to move the file then rename it.
Important: Never attempt a recovery task by operating directly on the production database file.
1. Get the database off-line.
- Skip this step if the server is not running or is unable to accept connections.
- If the server is running and is still accepting connections then ask all users to cancel their work and log out. If necessary, use
gfix -force
with a sufficient timeout to let them try to get off gracefully. - If logged-in users are unable to log out, the last resort will be to stop the server (Superserver or Superclassic) or kill the processes (Classic).
CAUTION: If the Superserver or Superclassic server is serving other databases, be sure to get those databases off-line before you stop the server.
2. Make a working copy of the database. Once the database is off-line, take a file copy of it using a reliable file system utility and store it to disk in the location where you intend to work with it. Take note:
- Do not locate the copy in the same directory as the suspect database.
- If you suspect hard disk damage, make at least two copies and place them on a separate physical drive. Work on one and keep the other as a fall-back.
- Make sure enough space is available to accommodate the copied file.
- Do not start the analysis and repair steps until you are certain the copying operation has finished.
- Make sure you have sufficient disk space available to both create a backup and restore the backup to a new version of the databases.
3. Make an alias for the working datababase in aliases.conf
. For example,
tempdb.fdb = /var/databases/copy1.fdb
4. Set the environment variables ISC_USER
and ISC_PASSWORD
.
gfix
and gbak
, requiring repeated logins as SYSDBA, it will save some typing.
SET
command in the command shell or the specialised applet under Control Panel -> Adminstration Tools -> Advanced.
ISC_USER=SYSDBA ISC_PASSWORD=masterkey
- Take the long road: skip this step and be prepared to do the typing; or
- Use Superuser or Administrator privileges to masquerade as SYSDBA. This is possible on Linux and configurable on Windows if the server is Firebird 2.1 or higher.
Now you should be ready to begin the analysis and repair procedure.
Steps for recovery using command-line tools
For our purposes here, let's assume that
- you are set up so that you don't have to supply login credentials
- you have aliased your working copy (in
aliases.conf
) ascopy1.fdb
The steps described below involve use of privileged options of the command-line tools gfix
and gbak
. They will not work if you do not have SYSDBA or equivalent privileges. This procedure can mend some forms of corruption and return the database to a state wherein it becomes usable. It usually results in some data loss, since its strategy is to disable and remove troublesome records and even pages, in some situations.
1. Check for database corruption. The gfix
switches –v[alidate]
and –f[ull]
are used first, to check record and page structures. This checking process reports corrupt structures and releases unassigned record fragments or orphan pages (i.e., pages that are allocated but unassigned to any data structures).
fix -v -full copy1.fdb
–n[o update]
can be used with –v
, to validate and report on corrupt or misallocated structures without attempting to fix them:
fix -v -n copy1.fdb
–i[gnore]
switch to have the validation ignore them:
fix -v -n -i copy1.fdb
- If errors were reported, the next step is for you.
- If the outcome of all this is that no errors were reported, skip the next two steps: attempting to mend databases without broken structures can actually create breakages.
2. Mend corrupted pages
gfix
validation reports damage to data, the next step is to mend (or repair) the database by putting those structures out of the way.
–m[end]
switch marks corrupt records as unavailable, so they will be skipped during a subsequent backup.
–f[ull]
switch to request mend for all corrupt structures and an –i[gnore]
switch to bypass checksum errors during the mend.
fix -mend -full -ignore copy1.fdb
fix -m -f -i copy1.fdb
3. Validate after –mend
–mend
command completes its work, again do a full validation to check whether any corrupt structures remain:
gfix -v -full copy1.fdb
4. Clean and recover the database
gbak
, even if errors are still being reported. Include the –v[erify]
switch to watch progress. In its simplest form, the backup command would be (all in one command):
bak -b -v -i copy1.fdb [pathto]copy1.fbk
gbak
:
gbak
to fail. If this happens, run it again, adding the –g
switch to signify no garbage collection:
gbak -b -v -i -g copy1.fdb{path}copy1a.fbk
–l[imbo]
switch:
gbak -b -v -i -g -l {path}copy1a.fbk
5. Restore the cleaned backup as a new database.
–v[erify]
switch to watch what is being restored:
gbak -create -v [pathto]copy1a.fbk [pathto]reborn.fdb
gbak -c
with other switches to eliminate the sources of these problems. For example:
–i[nactive]
switch will eliminate problems with damaged indexes, by restoring without activating any indexes. Afterwards, you can activate the indexes manually, one at a time, until the problem index is found.
–o[ne_at_a_time]
switch will restore and commit each table, one by one, allowing you restore good tables and bypass the problem ones.
6. Validate the restored database.
–n[o_update]
switch:
gfix -v -full [pathto]reborn.fdb
If identifiable damage to page and record structures were found and fully eliminated by this procedure, you should now be able to log in to the "reborn" database.
All that remains now is to store the good backup in a safe place, move the repaired copy back to its proper directory and rename it.
If the preceding steps do not work, but you are still able to access the copy of the corrupt database, you may still be able to transfer table structures and data from the damaged database to a new one, a technique known as data pumping. Several free and commercial tools are available. Data pumping on pre-Firebird 2 servers can also be done using the QLI (Query Language Interpreter) command line tool that can be found in Firebird's /bin
directory.
See also:
Database validation
Backup database
Restore database
Database repair
Database properties
Database corruption
Firebird database housekeeping utility
GFIX
GBAK
and GSPLIT
Forced writes
Firebird for the database expert: Episode 3 - On Disk Consistency
Preventing data loss
Alternative database repair methods
Detect and avoid database errors
Tracking down crashes on Win32 systems
Tracking down crashes on Linux
back to top of page
<< Database corruption | Database technology articles | Transaction options explained >>