The Rhapsody Integration Engine includes data validation functionality, which runs as part of the Rhapsody start-up process and enables the engine to recover from power failures. It also repairs and recovers data damaged by OS-level write errors.
If Rhapsody detects on start-up that the previous shutdown was unclean (caused by a power failure, for example), it performs data validation on the Rhapsody message store to ensure data integrity.
The stages of the start-up process that relate to data validation are as follows:
Unclean Shutdown Detection
Rhapsody can take several minutes to shut down completely, particularly if Out->In communication points are blocking while waiting for a response. On Windows®, the Rhapsody Service Monitor makes it difficult to restart the engine before shutdown has finished, but on Unix® it is possible to restart Rhapsody before the previous shutdown has completed.
Rhapsody can be shut down cleanly in one of the following ways:
- Rhapsody IDE - Navigate to Rhapsody>Server Actions>Stop Rhapsody Engine...
- Rhapsody Service Monitor (Windows® only) - Right-click the Rhapsody Service Monitor on your taskbar, then select Stop Rhapsody Service.
- Unix® only - Enter the command ./rhapsody.sh stop
- Soft process kill (do not use a hard process kill: kill -9).
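Why a soft kill is safe and a hard kill is not comes down to standard POSIX signal behavior rather than anything specific to Rhapsody. The following is a generic, Unix-only Python illustration, not Rhapsody code; the handler on_sigterm and the list cleanup_ran are hypothetical names. It shows that a process can install a handler for SIGTERM, the signal kill sends by default, and use it to shut down cleanly, whereas the operating system rejects any handler for SIGKILL, the signal sent by kill -9:

```python
import os
import signal

cleanup_ran = []


def on_sigterm(signum, frame):
    # Hypothetical cleanup hook: where a service would finish in-flight work
    # and delete its "running" marker before exiting.
    cleanup_ran.append(signum)


# A soft kill (`kill <pid>`) sends SIGTERM, which a process may handle.
signal.signal(signal.SIGTERM, on_sigterm)
os.kill(os.getpid(), signal.SIGTERM)  # send SIGTERM to this process

# A hard kill (`kill -9 <pid>`) sends SIGKILL, which cannot be handled:
# the OS rejects any attempt to install a handler for it.
try:
    signal.signal(signal.SIGKILL, on_sigterm)
    sigkill_catchable = True
except OSError:
    sigkill_catchable = False

print("SIGTERM handled:", cleanup_ran == [signal.SIGTERM])
print("SIGKILL catchable:", sigkill_catchable)
```

A process terminated with SIGKILL runs no cleanup code at all. Any marker files it maintains therefore stay on disk, and that is the condition unclean shutdown detection relies on.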
When Rhapsody starts, the initialization service creates a file called Rhapsody\rhapsody\data\engine.running. Just before the initialization service terminates during a clean shutdown, it deletes that file. If the engine.running file is already present when Rhapsody starts, the previous shutdown was unclean.
The presence of the engine.running file causes data validation to occur during start-up. Because potentially corrupt files are validated on demand, the Management Console shows no visual indication of this process.
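The detection logic above amounts to a sentinel file. The following Python sketch is hypothetical. Only the file name engine.running comes from this documentation; on_startup, on_clean_shutdown and the data_dir parameter are illustrative names. It is not how Rhapsody is actually implemented:

```python
import tempfile
from pathlib import Path

SENTINEL_NAME = "engine.running"  # file name taken from the description above


def on_startup(data_dir: Path) -> bool:
    """Return True if the previous run ended uncleanly, then mark this run as started."""
    sentinel = data_dir / SENTINEL_NAME
    unclean = sentinel.exists()  # still present => the last shutdown never completed
    sentinel.touch()
    return unclean


def on_clean_shutdown(data_dir: Path) -> None:
    """Remove the sentinel as the last step of an orderly shutdown."""
    (data_dir / SENTINEL_NAME).unlink(missing_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp)
    results = [on_startup(data)]      # first start: nothing left over
    on_clean_shutdown(data)
    results.append(on_startup(data))  # previous stop was clean
    # no on_clean_shutdown() here: simulates a power failure
    results.append(on_startup(data))
print(results)  # [False, False, True]
```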
Data Validation
In versions prior to Rhapsody 6.3.0, validation was performed on all files when Rhapsody was first started after an unclean shutdown. As of Rhapsody 6.3.0, a file that is marked as open is validated when Rhapsody first accesses it, and a file that is never accessed is never validated. This avoids unnecessary validation of files that are no longer in use.
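The on-demand behavior described above can be sketched as a small wrapper that validates a file the first time it is accessed, and only if it was marked as open. This is a hypothetical Python illustration, not Rhapsody's implementation. The class OnDemandValidator, the is_marked_open and validate callbacks, and the file paths are all invented for the example:

```python
from typing import Callable, Set


class OnDemandValidator:
    """Validate a file on its first access, and only if it was marked as open."""

    def __init__(self, is_marked_open: Callable[[str], bool],
                 validate: Callable[[str], None]) -> None:
        self._is_marked_open = is_marked_open
        self._validate = validate
        self._accessed: Set[str] = set()  # files accessed since start-up

    def access(self, path: str) -> None:
        if path not in self._accessed:
            self._accessed.add(path)      # later accesses skip the check entirely
            if self._is_marked_open(path):
                self._validate(path)
        # ...the caller's normal read/write of path would follow here


# Hypothetical files that were open at the time of the unclean shutdown.
marked = {"queue/input/178/data", "logs/log.index"}
validated = []
v = OnDemandValidator(marked.__contains__, validated.append)
v.access("logs/log.index")         # first access of a marked file: validated
v.access("logs/log.index")         # second access: not validated again
v.access("config/settings.store")  # never marked: never validated
# "queue/input/178/data" is marked but never accessed, so it is never validated.
print(validated)  # ['logs/log.index']
```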
While Rhapsody is running and processing messages, it frequently writes to files on disk. Only files that are open for writing at the time of an unclean shutdown are at risk of corruption; files not being accessed at that moment are unaffected. For this reason, whenever Rhapsody has a file open, it also creates an associated marker file, named with the same filename as the file and the .mk file extension. When Rhapsody has finished using the file, it closes the file and removes the corresponding marker file. Consequently, after an unclean shutdown, every file that was open at the time still has an associated marker file. On the next engine start-up, the Data Validation Service validates a file on first access only if it detects an associated marker file.
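The marker-file lifecycle can be sketched minimally as follows. The sketch assumes the marker is named by appending .mk to the full file name (for example log.index.mk); the text above does not say whether .mk replaces the original extension. The helpers marker_for, needs_validation and open_tracked are hypothetical, and the sketch is illustrative rather than Rhapsody's implementation:

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import IO, Iterator


def marker_for(path: Path) -> Path:
    # Assumption: ".mk" is appended to the full file name (log.index -> log.index.mk).
    return path.with_name(path.name + ".mk")


def needs_validation(path: Path) -> bool:
    """A leftover marker means the file may have been open during a crash."""
    return marker_for(path).exists()


@contextmanager
def open_tracked(path: Path, mode: str = "ab") -> Iterator[IO]:
    """Open path with a marker file present for exactly as long as it is open."""
    marker = marker_for(path)
    marker.touch()                      # marker exists before the first write
    try:
        with open(path, mode) as f:
            yield f
    finally:
        # Runs only if the process survives; after a power failure the marker stays.
        marker.unlink(missing_ok=True)


states = []
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "log.index"
    with open_tracked(store) as f:
        f.write(b"record")
        states.append(needs_validation(store))  # True: file is open
    states.append(needs_validation(store))      # False: closed cleanly
print(states)  # [True, False]
```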
To force validation of any other file, create a corresponding marker file in the same directory.
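To force validation, you create an empty marker file next to the data file. On Unix®, running touch on the marker path has the same effect. The Python sketch below is hypothetical: force_validation is an invented helper, and it assumes the marker name is the data file name with .mk appended:

```python
import tempfile
from pathlib import Path


def force_validation(data_file: Path) -> Path:
    """Create an empty marker beside data_file and return the marker path.

    Assumption: the marker name is the data file name with ".mk" appended.
    """
    marker = data_file.with_name(data_file.name + ".mk")
    marker.touch(exist_ok=True)  # harmless if the marker is already there
    return marker


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "log.index"
    data_file.write_bytes(b"existing data")
    marker = force_validation(data_file)
    created = (marker.name, marker.exists(), marker.stat().st_size)
print(created)  # ('log.index.mk', True, 0)
```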
B-tree Validation and Repair
Rhapsody has an internal b-tree service, which is responsible for creating all the b-trees used by other components of the engine. When an open b-tree is repaired on first access, log entries are written at the TRACE level:
TRACE [ DefaultQuartzScheduler_Worker-4] [rhapsody.btree.impl.BTreeServiceImpl] Explicitly repairing btree 'E:\data\idgeneration\idGenerator.store'...
TRACE [ DefaultQuartzScheduler_Worker-3] [rhapsody.btree.impl.BTreeServiceImpl] Took 3ms to repair btree 'E:\data\idgeneration\idGenerator.store'.
TRACE [ DefaultQuartzScheduler_Worker-4] [rhapsody.btree.impl.BTreeServiceImpl] Explicitly repairing btree 'E:\data\logs\log.index'...
TRACE [ DefaultQuartzScheduler_Worker-3] [rhapsody.btree.impl.BTreeServiceImpl] Took 3ms to repair btree 'E:\data\logs\log.index'.
...
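These entries are written at TRACE level, so they appear only when that level is enabled. To audit the repairs, you can match the two messages above with regular expressions. This Python sketch assumes the exact message wording shown in the excerpt, and the helper names are hypothetical. Because the patterns are searched rather than anchored, extra fields such as timestamps on real log lines are simply ignored:

```python
import re
from typing import Dict, Set

# Patterns for the two TRACE messages shown in the log excerpt above.
STARTED = re.compile(r"Explicitly repairing btree '([^']+)'")
FINISHED = re.compile(r"Took (\d+)ms to repair btree '([^']+)'")


def btree_repair_times(log_text: str) -> Dict[str, int]:
    """Map each b-tree path to the repair time (ms) reported for it."""
    return {path: int(ms) for ms, path in FINISHED.findall(log_text)}


def unfinished_repairs(log_text: str) -> Set[str]:
    """B-trees whose repair started but has no matching 'Took ...' line."""
    return set(STARTED.findall(log_text)) - set(btree_repair_times(log_text))


SAMPLE = r"""
TRACE [ DefaultQuartzScheduler_Worker-4] [rhapsody.btree.impl.BTreeServiceImpl] Explicitly repairing btree 'E:\data\idgeneration\idGenerator.store'...
TRACE [ DefaultQuartzScheduler_Worker-3] [rhapsody.btree.impl.BTreeServiceImpl] Took 3ms to repair btree 'E:\data\idgeneration\idGenerator.store'.
TRACE [ DefaultQuartzScheduler_Worker-4] [rhapsody.btree.impl.BTreeServiceImpl] Explicitly repairing btree 'E:\data\logs\log.index'...
TRACE [ DefaultQuartzScheduler_Worker-3] [rhapsody.btree.impl.BTreeServiceImpl] Took 3ms to repair btree 'E:\data\logs\log.index'.
"""

times = btree_repair_times(SAMPLE)
unfinished = unfinished_repairs(SAMPLE)
for path, ms in times.items():
    print(f"{ms:>5} ms  {path}")
print("unfinished:", unfinished or "none")
```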
Input Queue Validation and Repair
Rhapsody repairs broken input queues during start-up, because corruption in the head, tail or archive pointers can effectively stop a communication point from working. Input queue validation is performed only after any open transactions have been rolled back during start-up, because in some cases the transaction rollback itself can cause the problem, depending on exactly when the OS decided to flush each data store to disk.
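Any input queue repair results in a log message. In the following excerpt, input queue validation begins only after transaction recovery has completed: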
INFO [ DefaultQuartzScheduler_Worker-3] [.rhapsody.transaction.TransactionService] Beginning transaction recovery...
INFO [ DefaultQuartzScheduler_Worker-3] [.rhapsody.transaction.TransactionService] Recovered 0 transactions.
INFO [ DefaultQuartzScheduler_Worker-3] [.rhapsody.transaction.TransactionService] Transaction service initialised.
INFO [ DefaultQuartzScheduler_Worker-2] [tence.spi.AbstractComponentTrackingIndex] Validating live tracking index at 'E:\data\tracking\communicationPointTrackingLiveV3.idx'...
INFO [ DefaultQuartzScheduler_Worker-2] [tence.spi.AbstractComponentTrackingIndex] Validating live tracking index at 'E:\data\tracking\routeTrackingLiveV3.idx'...
INFO [ DefaultQuartzScheduler_Worker-2] [king.persistence.spi.TrackingStatusIndex] Validating tracking status index...
INFO [ DefaultQuartzScheduler_Worker-1] [rhapsody.queue.input.file.FileInputQueue] Validating input queue in 'E:\data\queue\input\178'...
INFO [ DefaultQuartzScheduler_Worker-1] [rhapsody.queue.input.file.FileInputQueue] Validating input queue in 'E:\data\queue\input\294'...
...
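A log like this one can be checked for the required ordering, with queue validation starting only after transaction recovery. The following is a hypothetical Python sketch. It assumes the message wording shown in the excerpt, and the helper names are illustrative:

```python
import re
from pathlib import PureWindowsPath
from typing import List, Optional

QUEUE_LINE = re.compile(r"Validating input queue in '([^']+)'")
RECOVERED_LINE = re.compile(r"Recovered (\d+) transactions?\.")
RECOVERY_DONE = "Transaction service initialised."


def validated_queue_ids(log_text: str) -> List[str]:
    """Last path component of each validated input queue directory, e.g. '178'."""
    # PureWindowsPath accepts both backslash and forward-slash separators.
    return [PureWindowsPath(p).name for p in QUEUE_LINE.findall(log_text)]


def recovered_transactions(log_text: str) -> Optional[int]:
    m = RECOVERED_LINE.search(log_text)
    return int(m.group(1)) if m else None


def queues_validated_after_recovery(log_text: str) -> bool:
    """True if the first queue validation is logged after transaction recovery completes."""
    done = log_text.find(RECOVERY_DONE)
    first = QUEUE_LINE.search(log_text)
    return done != -1 and first is not None and done < first.start()


SAMPLE = r"""
INFO [ DefaultQuartzScheduler_Worker-3] [.rhapsody.transaction.TransactionService] Beginning transaction recovery...
INFO [ DefaultQuartzScheduler_Worker-3] [.rhapsody.transaction.TransactionService] Recovered 0 transactions.
INFO [ DefaultQuartzScheduler_Worker-3] [.rhapsody.transaction.TransactionService] Transaction service initialised.
INFO [ DefaultQuartzScheduler_Worker-1] [rhapsody.queue.input.file.FileInputQueue] Validating input queue in 'E:\data\queue\input\178'...
INFO [ DefaultQuartzScheduler_Worker-1] [rhapsody.queue.input.file.FileInputQueue] Validating input queue in 'E:\data\queue\input\294'...
"""

ids = validated_queue_ids(SAMPLE)
recovered = recovered_transactions(SAMPLE)
ordered = queues_validated_after_recovery(SAMPLE)
print(ids, recovered, ordered)  # ['178', '294'] 0 True
```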
Transaction Recovery
Transactions are used to keep the data store consistent as messages flow through the system. During startup, Transaction Recovery ensures that the data stores are in a consistent state by checking that all transactions are either fully committed or rolled back.
A progress indicator displays the number of records recovered and the current stage of the Transaction Recovery process:
- Initialization.
- Scanning transaction records.
- Deserializing transaction records.
- Recovering transaction records.
- Cleaning up transaction records.
- Completion (this stage is also reached if Transaction Recovery fails; in that case, check the Rhapsody log).
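Conceptually, ensuring that every transaction is either fully committed or rolled back is the classic redo/undo recovery technique. The following Python sketch uses that generic technique, not Rhapsody's internal record format. The Write record with its before-image, the committed set and the report callback are all assumptions; only the stage names mirror the list above:

```python
from enum import Enum
from typing import Callable, Dict, List, NamedTuple, Optional, Set


class Stage(Enum):
    """Progress stages, named after the list above."""
    INITIALIZATION = "Initialization"
    SCANNING = "Scanning transaction records"
    DESERIALIZING = "Deserializing transaction records"
    RECOVERING = "Recovering transaction records"
    CLEANING_UP = "Cleaning up transaction records"
    COMPLETION = "Completion"


class Write(NamedTuple):
    """Hypothetical log record: one key update with its before-image."""
    tx: int
    key: str
    old: Optional[str]  # value before the write; None means the key did not exist
    new: str


def recover(log: List[Write], committed: Set[int], store: Dict[str, str],
            report: Callable[[Stage, int], None]) -> int:
    """Redo committed writes, undo uncommitted ones; return the number of records processed."""
    report(Stage.INITIALIZATION, 0)
    report(Stage.SCANNING, len(log))       # a real engine would read records from disk
    report(Stage.DESERIALIZING, len(log))  # ...and decode them here
    recovered = 0
    for w in log:                          # redo pass, oldest first
        if w.tx in committed:
            store[w.key] = w.new
            recovered += 1
    for w in reversed(log):                # undo pass, newest first
        if w.tx not in committed:
            if w.old is None:
                store.pop(w.key, None)
            else:
                store[w.key] = w.old
            recovered += 1
    report(Stage.RECOVERING, recovered)
    log.clear()                            # the store is consistent; records can go
    report(Stage.CLEANING_UP, recovered)
    report(Stage.COMPLETION, recovered)
    return recovered


# Crash state: tx 1 committed but its write to "a" never reached the store;
# tx 2 never committed but its write to "b" did reach the store.
store = {"a": "1", "b": "x"}
log = [Write(1, "a", "1", "2"), Write(2, "b", None, "x")]
stages: List[Stage] = []
n = recover(log, {1}, store, lambda stage, count: stages.append(stage))
print(n, store)  # 2 {'a': '2'}
```

The undo pass runs newest first, so when several uncommitted writes touch the same key, the oldest before-image is the one left in the store.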
Configuration Validation and Repair
Validation and repair of the configuration are performed as part of the start-up process. Unlike the other data stores, the configuration structures cannot always be completely repaired. Depending on the type of corruption, Rhapsody may be unable to start up; in this case, restore the configuration from a backup before starting Rhapsody again.
The main error that Rhapsody recovers from automatically occurs when the most recent change-sets in the configuration history are missing or corrupt. If this is detected during start-up, Rhapsody discards the invalid change-sets and leaves the configuration in a consistent state, logging a warning such as the following:
WARN [ DefaultQuartzScheduler_Worker-1] [figuration.persistence.spi.ChangeHistory] Error occurred getting configuration for revision 236 - it will be rolled back.
com.orchestral.rhapsody.configuration.persistence.ConfigurationPersistenceException: IO error getting set of changes in revision 236
    at com.orchestral.rhapsody.configuration.persistence.spi.ChangeHistory.getChangeSet(ChangeHistory.java:314)
    at com.orchestral.rhapsody.configuration.persistence.spi.ChangeHistory.<init>(ChangeHistory.java:168)
    at com.orchestral.rhapsody.configuration.persistence.spi.VersionedConfigurationServiceImpl.doActivate(VersionedConfigurationServiceImpl.java:124)
    at com.orchestral.rhapsody.configuration.persistence.spi.VersionedConfigurationServiceImpl$1.run(VersionedConfigurationServiceImpl.java:104)
    at com.orchestral.rhapsody.taskscheduler.spi.TaskExecutor.execute(TaskExecutor.java:40)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:529)
Caused by: java.io.IOException: Location '1319936' does not contain correct record type. Type=UNUSED, Expected=START
    at com.orchestral.rhapsody.blobstore.file.FileReadableBlob.readBlock(FileReadableBlob.java:189)
    at com.orchestral.rhapsody.blobstore.file.FileReadableBlob.<init>(FileReadableBlob.java:71)
    at com.orchestral.rhapsody.blobstore.file.FileBlobStore.getReadableBlob(FileBlobStore.java:121)
    at com.orchestral.rhapsody.blobstore.file.rolling.RollingFileBlobStore.getReadableBlob(RollingFileBlobStore.java:158)
    at com.orchestral.rhapsody.configuration.persistence.spi.ChangeHistory.getChangeSet(ChangeHistory.java:301)
    ...
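The recovery rule for the configuration history is to roll back invalid change-sets only at its newest end. The following hypothetical Python illustration sketches that rule. recover_history, HistoryCorrupt and the looks_valid check are invented. Treating damage in the middle of the history as unrecoverable is an assumption, chosen to be consistent with the note that some corruption requires a restore from backup:

```python
from typing import Callable, List, Optional


class HistoryCorrupt(Exception):
    """Damage that cannot be repaired automatically: restore from a backup."""


def recover_history(revisions: List[Optional[bytes]],
                    is_valid: Callable[[bytes], bool]) -> List[bytes]:
    """Roll back missing or corrupt change-sets at the end of the history.

    revisions[i] holds the change-set for revision i + 1 (None if missing).
    """
    def ok(cs: Optional[bytes]) -> bool:
        return cs is not None and is_valid(cs)

    end = len(revisions)
    while end > 0 and not ok(revisions[end - 1]):
        end -= 1                      # discard the newest invalid change-set
    for i in range(end):
        if not ok(revisions[i]):      # damage before a valid revision
            raise HistoryCorrupt(f"revision {i + 1} is invalid")
    return [cs for cs in revisions[:end] if cs is not None]


def looks_valid(cs: bytes) -> bool:
    # Stand-in integrity check; a real store would verify record types or checksums.
    return cs.startswith(b"CS")


kept = recover_history([b"CS1", b"CS2", b"garbage", None], looks_valid)
print(len(kept))  # 2

try:
    recover_history([b"CS1", b"garbage", b"CS3"], looks_valid)
    unrecoverable = False
except HistoryCorrupt:
    unrecoverable = True
print(unrecoverable)  # True
```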