The default Rhapsody settings provide good performance for most situations and hardware. Performance tuning is typically needed only for large solutions with extremely high performance requirements.

A common difficulty with performance testing is that an issue in one area of the system often causes a backlog of messages in another. To assist with performance testing, Rhapsody has built-in performance monitoring that can be activated to identify the routes and filters using the greatest system resources. This is the recommended first step in identifying processing and resource bottlenecks. Refer to Performance Statistics in Monitoring to learn how to turn on monitoring.

Performance Problem Categories

Rhapsody configurations are typically limited by the following issues:

CPU Processing

CPU processing limits are typically identified by the processor being pegged at 100%. This indicates that the limiting factor is the amount of work required to process the messages.

Cause:

  • Large messages.
  • Complex messages requiring heavy processing.
  • Compute-intensive transformations, for example encryption, applied to a message.
  • A large volume of messages (typically in combination with another point on this list).

Resolution:

  • Use the Performance Monitoring tools to identify the most time-consuming filters. Attempt to minimize the processing through these filters and consider combining multiple filters.
  • Reduce message parsing. Parsing messages is expensive, especially if multiple definitions are applied to a single message (see the sketch after this list).
  • Add additional load-balanced servers.
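
As an illustration of the parsing cost, the following sketch is plain Java (not Rhapsody's filter API) showing the "parse once, reuse" idea: the parsed form of a message is built lazily and cached so that several processing steps can share it instead of each re-parsing the raw text.

    import java.util.Arrays;
    import java.util.List;

    // Hypothetical illustration: cache the parsed form of a message so that
    // multiple processing steps reuse one parse instead of re-parsing the text.
    public class ParsedMessageCache {

        private final String rawText;   // the message as received
        private List<String> segments;  // lazily built parsed representation

        public ParsedMessageCache(String rawText) {
            this.rawText = rawText;
        }

        // Parse on first access only; later callers reuse the cached result.
        public List<String> getSegments() {
            if (segments == null) {
                // Stand-in for an expensive parse (e.g. building a full message tree).
                segments = Arrays.asList(rawText.split("\r"));
            }
            return segments;
        }

        public static void main(String[] args) {
            ParsedMessageCache msg = new ParsedMessageCache("MSH|^~\\&|...\rPID|1|...");
            // Two processing steps inspecting the message share a single parse.
            System.out.println("Segments: " + msg.getSegments().size());
            System.out.println("First segment: " + msg.getSegments().get(0));
        }
    }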

Disk Access

Rhapsody depends on fast disk access for reading and writing messages and other data. If the disk is under heavy load, performance suffers.

This can normally be monitored by looking at the disk queue, which shows how many I/O operations are currently waiting for the disk to become available. As a general rule, an Average Disk Queue Length greater than 2 per hard disk for extended periods of time is considered undesirable; for example, on a four-disk array, a sustained queue length above 8 indicates the disks cannot keep up.

Cause:

  • Slow disk hardware.
  • Slow connection to disk hardware (controller, network and so on).
  • Conflicts with non-Rhapsody applications.
  • Excessive use of indexed properties.
  • Incorrect or outdated driver software.

Resolution:

  • Span the Rhapsody archives across multiple disks.
  • Reduce the number of indexed message properties for each message.
  • Reduce the number of disk-intensive applications on the Rhapsody server.
  • Install Rhapsody's data store on its own disk partition, or ideally on its own disk drive.

Filter Queries

If a Rhapsody filter uses an external resource, the time required to access that resource can significantly affect system performance. These scenarios typically show low disk and CPU activity while a backlog forms.

A common scenario is a Rhapsody database filter query. While the filter is awaiting a response from the database, the route execution thread is blocked. If the default of 10 route execution threads is in use, it is possible for all of them to be waiting on database lookups.
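
The blocking behaviour can be reproduced outside Rhapsody with a plain Java thread pool. In this sketch (an illustration only, not Rhapsody code), a pool of 10 worker threads stands in for the route execution threads and a two-second sleep stands in for a slow database lookup; once all 10 workers are blocked, the remaining messages simply queue.

    import java.util.concurrent.Executors;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    // Illustration only: 10 "route execution" threads all blocked on slow lookups.
    public class RouteThreadStarvation {

        public static void main(String[] args) throws InterruptedException {
            ThreadPoolExecutor routeThreads =
                    (ThreadPoolExecutor) Executors.newFixedThreadPool(10);

            // Submit 50 messages; each one performs a slow external lookup.
            for (int i = 0; i < 50; i++) {
                routeThreads.submit(() -> {
                    try {
                        Thread.sleep(2000); // stand-in for a 2-second database query
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }

            // While all 10 threads wait on lookups, the rest of the work backs up.
            Thread.sleep(500);
            System.out.println("Active threads:  " + routeThreads.getActiveCount());
            System.out.println("Messages queued: " + routeThreads.getQueue().size());

            routeThreads.shutdown();
            routeThreads.awaitTermination(2, TimeUnit.MINUTES);
        }
    }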

Cause:

  • High-latency communication to a database or another external system.

Resolution:

  • Convert external lookups to communication points where possible.
  • Optimize database queries for performance (see the sketch after this list).
  • Increase the number of route execution threads in the engine (though this may not always resolve the issue).
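
Where the lookup must stay in a filter, keeping the query itself cheap and bounded helps. The sketch below is plain JDBC with hypothetical table and connection details; it shows a parameterised lookup on an indexed column with a query timeout, so a slow database cannot hold a route thread indefinitely.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    // Hypothetical JDBC lookup: parameterised query on an indexed column,
    // with a timeout so a slow database cannot block the caller indefinitely.
    public class PatientLookup {

        public static String findMrn(String patientId) throws SQLException {
            String url = "jdbc:postgresql://dbhost/emr";  // assumed connection details
            String sql = "SELECT mrn FROM patients WHERE patient_id = ?";

            try (Connection conn = DriverManager.getConnection(url, "user", "password");
                 PreparedStatement stmt = conn.prepareStatement(sql)) {
                stmt.setQueryTimeout(5);   // seconds; fail fast rather than block
                stmt.setString(1, patientId);
                try (ResultSet rs = stmt.executeQuery()) {
                    return rs.next() ? rs.getString("mrn") : null;
                }
            }
        }
    }

In practice, a pooled DataSource rather than a new connection per message also avoids the cost of reconnecting on every lookup.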

Communication Point Queries

Unlike filters, communication points have their own threads to process messages, so they do not tie up the route execution threads while awaiting responses. However, if the communication point is configured as Out->In, messages can still back up while awaiting responses.

This problem is typically recognized by messages queuing up to be sent on an Out->In communication point.

Cause:

  • High-latency communication to a database or another external system.

Resolution:

  • Optimize the external system to see why the response is slow.
  • Determine whether the communication point can be converted to a mode other than Out->In. For example, is synchronous processing required?

FIFO Configuration

Rhapsody routes can be configured to use one of several First-In, First-Out (FIFO) message ordering options. By default, a route requires that messages leave the route in the same order in which they were received.

If some messages on a route take much longer to process than others, the FIFO rules can delay other messages from leaving the route. In these situations, it may be better to split the message processing across multiple routes, or to modify the FIFO rules for the route.
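
The effect of strict ordering can be seen in a small re-sequencing sketch (plain Java, not Rhapsody's implementation): messages may finish processing out of order, but none can leave until every earlier message has left, so one slow message holds back everything behind it.

    import java.util.HashMap;
    import java.util.Map;

    // Illustration of strict FIFO release: results are held until all earlier
    // messages have been released, so one slow message delays those behind it.
    public class FifoResequencer {

        private final Map<Long, String> completed = new HashMap<>();
        private long nextToRelease = 0;

        // Called when a message finishes processing (possibly out of order).
        public synchronized void onProcessed(long sequence, String result) {
            completed.put(sequence, result);
            // Release as many in-order results as are now available.
            while (completed.containsKey(nextToRelease)) {
                System.out.println("Released message " + nextToRelease
                        + ": " + completed.remove(nextToRelease));
                nextToRelease++;
            }
        }

        public static void main(String[] args) {
            FifoResequencer route = new FifoResequencer();
            route.onProcessed(1, "fast");  // finished first, but must wait...
            route.onProcessed(2, "fast");  // ...as must this one...
            route.onProcessed(0, "slow");  // ...until slow message 0 completes.
        }
    }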

Memory

The Java™ Virtual Machine (JVM) that Rhapsody runs in can be allocated an amount of memory that differs from the memory available on the machine. Rhapsody grows its memory use up to this limit rather than using all the available memory.
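
One way to see the difference between the JVM's allocation and the machine's physical memory is to query the JVM itself. The snippet below is plain Java and prints the heap limit (typically set with the standard -Xmx JVM option) alongside what is currently committed and in use.

    // Plain Java: the JVM's heap limit is independent of the machine's total RAM.
    public class HeapReport {

        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            long mb = 1024 * 1024;

            System.out.println("Max heap (limit the JVM will grow to): "
                    + rt.maxMemory() / mb + " MB");
            System.out.println("Committed heap (currently reserved):   "
                    + rt.totalMemory() / mb + " MB");
            System.out.println("Used heap:                             "
                    + (rt.totalMemory() - rt.freeMemory()) / mb + " MB");
        }
    }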

The JVM periodically triggers a garbage collection to recover memory when required. During garbage collection, the performance of the engine decreases due to the higher processing load. Refer to Java HotSpot Garbage Collection for details.

If Rhapsody's memory allocation is too low, garbage collection runs more frequently, which can affect performance.

If Rhapsody's memory allocation is too high, each garbage collection has far more memory to reclaim when it does run, and processing can stop for several seconds while memory is freed. A very large allocation may also compete with the operating system's own use of memory, causing excessive paging in the machine's virtual memory system.
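
To judge whether garbage collection is actually costing noticeable time (a heap that is too small collects often; one that is too large collects rarely but for longer), the JVM's standard management beans report cumulative collection counts and times. A minimal sketch:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    // Plain Java: report how often each garbage collector has run and how much
    // time it has spent, using the JVM's standard management interface.
    public class GcReport {

        public static void main(String[] args) {
            for (GarbageCollectorMXBean gc :
                    ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.println(gc.getName()
                        + ": collections=" + gc.getCollectionCount()
                        + ", total collection time=" + gc.getCollectionTime() + " ms");
            }
        }
    }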

Communication Point Limits

Each communication point has a limit on the number of messages it can send and/or receive. This limit is determined by the communication point's protocol, its configuration, and various system limits; for example, disk access limits directory communication points.

Cause:

  • Stressing the machine's limits for a given communication point's protocol.

Resolution:

  • Spread the load over multiple communication points.
  • Add additional load-balanced servers.