The purpose of this section is to provide an approach to troubleshooting Archive Cleanup problems in Rhapsody for Rhapsody Support staff.
Overview
Rhapsody’s Archive Cleanup processes files in the Datastore and determines which ones can be removed safely without losing any live or archived messages.
Archive Cleanup process starts by retrieving a set of the message metafiles that currently exist in Rhapsody. This set of metafiles is then filtered through the queues, the message tables (error and hold queues), and the live message tracking store to determine which of these files are still live and/or contain messages that have completed processing within the configurable Archive Cleanup period. During this process, any queue files that only refer to messages beyond the cleanup period are added to the list of files to delete.
The filtering process identifies a set of message metafiles that are no longer required and therefore eligible for cleanup. Once Archive Cleanup has identified the files to be removed it will attempt to delete them.
Archive Cleanup Problems
On occasion, Archive Cleanup is unable to delete the files from the Datastore. As a result, customers often report the following problems:
- Archive Cleanup is unable to remove the files.
- Rhapsody is not releasing space from the Datastore.
- Archive Cleanup is not working.
- Rhapsody is running out of disk space.
- Available disk space for Datastore has been slowly decreasing.
Archive Cleanup Troubleshooting
Archive Cleanup can operate incorrectly for a wide variety of reasons. Thus, Archive Cleanup problems are usually handled on a case by case basis to identify the root cause. However, there are some general steps you can take to narrow down the root cause.
Initial Steps
Messages are not eligible for cleanup if:
- The message is live on an input queue, output queue, or processing queue. This normally is the case for very recent messages (in other words, those that have not yet completed processing), or can be caused by a stopped communication point or route with a backlog of messages.
- The message is in the Error Queue or Hold Queue. Messages in these queues are still considered live by Rhapsody and so cannot be cleaned up. The Error Queue and Hold Queue should be checked regularly and any messages on these queues dealt with (in other words, fixed and reprocessed, or removed).
- Message tracking is still waiting for a response for the message. In this case, the message itself has been sent, but no response has yet been received. This is only likely to ever be an issue when very large response timeouts (in other words, in the order of days rather than minutes) are used with message tracking.
- The message is present on an input queue archive, but was received after the archive cleanup cut-off time.
- The message is present on an output queue archive, but was sent after the archive cleanup cut-off time. Tracked messages are added to the output queue archive every time that they are sent, and so are not eligible for cleanup unless the last send before the Archive Cleanup cut-off time.
The first two conditions are the most likely reasons for Rhapsody not removing specific message meta files during an Archive Cleanup run. Therefore, before going through the effort of obtaining a trace log for Archive Cleanup and analyzing it, check for:
- Stopped communication points or routes with waiting messages.
- Messages on the error or hold queues.
Enable TRACE Logging
Trace logging can be enabled by adding the following lines to the log4j.properties
file:
log4j.logger.com.orchestral.rhapsody.cleanup=TRACE,CLEANUP log4j.appender.CLEANUP=org.apache.log4j.RollingFileAppender log4j.appender.CLEANUP.File=log/cleanup-log.txt log4j.appender.CLEANUP.MaxBackupIndex=9 log4j.appender.CLEANUP.MaxFileSize=5MB log4j.appender.CLEANUP.layout=org.apache.log4j.PatternLayout log4j.appender.CLEANUP.layout.ConversionPattern=%d %5.5p [%32.32t] [%40.40c] %m%n
# [1] Logs all the Archive Cleanup events log4j.logger.com.orchestral.rhapsody.cleanup.CleanupService=TRACE,ArchiveCleanupAppender log4j.appender.ArchiveCleanupAppender=org.apache.log4j.RollingFileAppender log4j.appender.ArchiveCleanupAppender.File=log/archiveCleanupLog/cleanup-log.txt log4j.appender.ArchiveCleanupAppender.MaxBackupIndex=9 log4j.appender.ArchiveCleanupAppender.MaxFileSize=5MB log4j.appender.ArchiveCleanupAppender.encoding=UTF-8 log4j.appender.ArchiveCleanupAppender.layout=org.apache.log4j.PatternLayout log4j.appender.ArchiveCleanupAppender.layout.ConversionPattern=%d{DATE} [%-5p] {%t} (%c) %m%n # [2] Logs only info about deleted files log4j.logger.ArchiveCleanup=TRACE,DeletedFilesAppender log4j.additivity.ArchiveCleanup=false log4j.appender.DeletedFilesAppender=org.apache.log4j.RollingFileAppender log4j.appender.DeletedFilesAppender.File=log/archiveCleanupLog/deleted-files-log.txt log4j.appender.DeletedFilesAppender.MaxBackupIndex=9 log4j.appender.DeletedFilesAppender.MaxFileSize=5MB log4j.appender.DeletedFilesAppender.layout=org.apache.log4j.PatternLayout log4j.appender.DeletedFilesAppender.layout.ConversionPattern=%d %5.5p [%32.32t] [%40.40c] %m%n
Interpreting Archive Cleanup TRACE Logs
The following example is a sample archive cleanup trace log (key lines are in bold):
2014-06-26 15:00:00,031 INFO [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Starting cleanup of data older than Jun 12, 2014 3:00:00 PM
2014-06-26 15:00:00,047 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Candidate files for cleanup from MessageService
2014-06-26 15:00:00,047 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] 105093
2014-06-26 15:00:00,047 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] 105092
2014-06-26 15:00:00,047 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] 105095
2014-06-26 15:00:00,047 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] 105094
2014-06-26 15:00:00,047 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] 105089
2014-06-26 15:00:00,047 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] 105088
2014-06-26 15:00:00,047 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] 105091
2014-06-26 15:00:00,047 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] 105090
2014-06-26 15:00:00,047 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] 105101
2014-06-26 15:00:00,047 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] 105100
2014-06-26 15:00:00,047 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] 105103
2014-06-26 15:00:00,047 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] 105102
2014-06-26 15:00:00,047 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] 105097
2014-06-26 15:00:00,047 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] 105096
2014-06-26 15:00:00,047 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] 105099
2014-06-26 15:00:00,047 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] 105098
2014-06-26 15:00:00,047 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] 105108
2014-06-26 15:00:00,047 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] 105109
2014-06-26 15:00:00,047 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] 105110
2014-06-26 15:00:00,047 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] 105111
….
….
….
2014-06-26 15:00:00,265 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Filtering candidates for input queues
2014-06-26 15:00:00,437 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] file 104799 removed: used by queue 1049
2014-06-26 15:00:00,437 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] file 104798 removed: used by queue 1049
2014-06-26 15:00:00,437 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] file 105125 removed: used by queue 1049
2014-06-26 15:00:00,437 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] file 105124 removed: used by queue 1049
2014-06-26 15:00:00,437 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] file 105036 removed: used by queue 1049
2014-06-26 15:00:00,437 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] file 107340 removed: used by queue 1049
2014-06-26 15:00:00,437 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] file 105037 removed: used by queue 1049
2014-06-26 15:00:00,437 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] file 106524 removed: used by queue 1049
2014-06-26 15:00:00,437 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] file 107101 removed: used by queue 1049
2014-06-26 15:00:00,437 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] file 106853 removed: used by queue 1049
2014-06-26 15:00:00,437 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] file 106852 removed: used by queue 1049
…
…
…
2014-06-26 15:00:23,307 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Filtering candidates for routes
2014-06-26 15:00:23,307 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Filtering candidates for output queues
2014-06-26 15:00:25,740 INFO [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Marked 1 queue files for deletion
2014-06-26 15:00:30,139 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] file 105093 removed: used by queue 296
2014-06-26 15:00:30,139 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] file 105092 removed: used by queue 296
2014-06-26 15:00:30,139 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] file 105095 removed: used by queue 296
2014-06-26 15:00:30,139 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] file 105094 removed: used by queue 296
2014-06-26 15:00:30,139 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] file 105089 removed: used by queue 296
2014-06-26 15:00:30,139 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] file 105088 removed: used by queue 296
2014-06-26 15:00:30,139 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] file 105091 removed: used by queue 296
2014-06-26 15:00:30,139 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] file 105090 removed: used by queue 296
…
…
…
2014-06-26 15:00:30,233 INFO [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Marked 1 queue files for deletion
2014-06-26 15:00:44,772 INFO [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Marked 1 queue files for deletion
2014-06-26 15:00:47,518 INFO [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Marked 1 queue files for deletion
2014-06-26 15:01:12,727 INFO [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Marked 1 queue files for deletion
2014-06-26 15:01:19,623 INFO [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Marked 1 queue files for deletion
2014-06-26 15:01:22,493 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Checking candidates in error queue
2014-06-26 15:01:22,493 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Checking candidates in hold queue
2014-06-26 15:01:22,493 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Filtering candidates for message tracking
2014-06-26 15:01:22,493 INFO [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Cleaning up message bodies and meta information
2014-06-26 15:01:22,524 INFO [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Deleting 993 unused files.
2014-06-26 15:01:22,899 INFO [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Cleaning up message events
2014-06-26 15:01:23,086 INFO [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Cleaning up message property indexes
2014-06-26 15:01:23,429 INFO [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Cleaning up message tracking indexes
2014-06-26 15:01:23,429 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Scanning tracking status index to find eligible files for cleanup.
2014-06-26 15:01:23,523 INFO [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Cleaning up old log entries
2014-06-26 15:01:23,523 INFO [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Cleaning up throughput statistics
2014-06-26 15:01:23,523 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Cleaning up statistics files from the last week
2014-06-26 15:01:23,523 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Skipping statistics file 7110 as it has already been cleaned up for this period
2014-06-26 15:01:23,523 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Skipping statistics file 7111 as it has already been cleaned up for this period
2014-06-26 15:01:23,523 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Skipping statistics file 7112 as it has already been cleaned up for this period
2014-06-26 15:01:23,523 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Skipping statistics file 7113 as it has already been cleaned up for this period
…
…
2014-06-26 15:01:23,523 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Cleaning up statistics files from beyond the last month
2014-06-26 15:01:23,523 TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Moved 0 statistics entries to the permanent statistics store
2014-06-26 15:01:23,523 INFO [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Will delete 0 statistics files
2014-06-26 15:01:23,523 INFO [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Deleting 24 unused files.
2014-06-26 15:01:23,538 INFO [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] Cleanup complete. There is now 41,393MB free disk space (10,729MB cleaned up).
Filtering Candidates for Input and Output Queues
The log entries indicate that metafile 104799 has been removed from the list of candidates eligible for cleanup because it is in use by input communication point with id 1049. Similarly, the metafile 105093 has been removed from the list of candidates eligible for cleanup because it is in use by output communication point with id 29:
Filtering candidates for input queues file 104799 removed: used by queue 1049 Filtering candidates for output queues file 105093 removed: used by queue 296
The phrase "in use" has two meanings here. Either a communication point or a route with the respective ID is:
- Holding the live messages. This means communication point or route queues have live messages on them, or
- Holding the archived messages. This is how usually it is. This means the communication point or route is holding messages within the Archive Cleanup period. For example, if the Archive Cleanup period is 28 days, the messages were on this communication point or route within the last 28 days. This is why the queues appear empty despite the log saying that a component is holding the metafile.
This log file does not contain the names of the communication points or routes holding the references to the metafiles. The component names can be determined using the available ids from the routes or communication points pages in the Management Console.
Additional Steps
For some cases, general troubleshooting steps are unlikely to help identify the root cause of the Archive Cleanup issues. In that case, there are additional steps you can take, depending on the version of Rhapsody.
Rhapsody Versions 5.4 and Later
An Archive Cleanup diagnostic tool was introduced in Rhapsody 5.4 for troubleshooting a situation in which the disk is out-of-space and Archive Cleanup does not free up any space. The Management Console enables you to trigger a cleanup with tracing enabled or simulate a cleanup with tracing. Refer to Archive Cleanup Diagnosis for details on Archive Cleanup diagnostic tool.
Rhapsody Versions Before 5.4
To determine the root cause in the earlier versions of Rhapsody, specifically Rhapsody 3.4, 4.1, 5.1, and 5.2:
- Enable Archive Cleanup trace logging.
- Obtain a directory listing of the data directory using
dir *.* /S > list_before.txt
, which writes the file names in the specified directory and all subdirectories into a text file. - Run the Archive Cleanup.
- Obtain a directory listing of the data directory again after the Archive Cleanup using
dir *.* /S > list_after.txt
. - Restart Engine and repeat steps 1 through 4.
- Collecting the required trace logs and directory listings before and after restart.
- Execute the steps in Path A or Path B.
Path A
- Compare the data directory listings obtained in Steps 2 and 4 above before and after Engine restart.
- Search for any metafiles that are identified by Archive Cleanup process as candidates for cleanup before and after the restart:
TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] 107019
but do not have the following entry that removes the metafile from the list of eligible files for cleanup:
TRACE [ecution Thread - Archive Cleanup] [chestral.rhapsody.cleanup.CleanupService] file 107019 removed: used by queue 1792
- The existence of such metafiles suggests that Rhapsody identifies the files correctly for cleanup before and after the restart, but for some reason removes them only after the Engine restart. In this case, one needs to determine why Rhapsody does not remove these files. Use Process Monitor to check whether it is some other process or Rhapsody itself that is holding the locks.
Path B
Check if there are any metafiles in
<Rhapsody Install Directory>\rhapsody\data\messagestore\meta
that are older than Archive Cleanup retention period.If so, find the oldest (latest/highest) meta id.
Look up the oldest metafile id in the Archive Cleanup trace log file to identify which component is causing that metafile to not get removed because the log will contain lines like:
file <oldest meta file id> removed: used by queue
<component id>
. Check for the component in the Management Console using the<component id>
to verify if there are any live messages on the component.If there are any live messages, then release them from the queues. Please note that releasing the messages from the queues will not instantly make them eligible for cleanup. Releasing the messages from the queues only means now the messages are moved to archives. So unless the Archive Cleanup window has passed the messages won’t be eligible for cleanup. This leads to the case where the identified component has empty queues (despite the
file xxxx removed: used by queue yyyy
entry in the cleanup log) which means the messages are potentially still within the archive cleanup period and will be cleaned up only after the cleanup windows has passed, making them eligible for cleanup. If the oldest metafile id cannot be found in the trace log at all, then use tools like Process Monitor to identify what is holding the locks on the metafile. If a third party process like an Anti-virus or Backup or Replication software is holding the locks, then kill the process to release the locks. If Rhapsody itself is holding the locks, it is most likely a bug causes the meta and body files to leak, preventing from cleaned up. This will require an Engine restart to allow cleanup of leaked meta and body files. This bug was identified in Rhapsody 5.4.1, fixed in Rhapsody 5.5.0 and backported to 5.4.3 service pack.
Corrupt Queues
In rare cases, one or more queue files in the Datastore can become corrupted due to unclean shutdowns or environment-related issues, preventing the Archive Cleanup process from operating correctly. Queue file corruption corrupts the timestamps on the queue files, thereby making the files ineligible for cleanup. In this case, the only current solution is to recreate the communication point or route corresponding to the corrupt queue.