Chapter 11. Scheduling Tasks and Events

Table of Contents
11.1. Scheduler Configuration
11.2. Configuring Persistent Schedules
11.3. Schedule Examples
11.4. Service Implementer Notes
11.5. Scanning Data to Trigger Tasks

OpenIDM enables you to schedule reconciliation and synchronization tasks. You can also use scheduling to trigger scripts, collect and run reports, trigger workflows, perform custom logging, and so forth.

OpenIDM supports cron-like syntax to schedule events and tasks, based on expressions supported by the Quartz Scheduler (bundled with OpenIDM).

If you use configuration files to schedule tasks and events, you must place the schedule files in the openidm/conf directory. By convention, OpenIDM uses file names of the form schedule-schedule-name.json, where schedule-name is a logical name for the scheduled operation, for example, schedule-reconcile_systemXmlAccounts_managedUser.json. There are several example schedule configuration files in the openidm/samples/schedules directory.

You can configure OpenIDM to pick up changes to scheduled tasks and events dynamically, during initialization and also at runtime. For more information, see Changing the Configuration.

In addition to the fine-grained scheduling facility, you can perform a scheduled batch scan for a specified date in OpenIDM data, and then automatically execute a task when this date is reached. For more information, see Section 11.5, “Scanning Data to Trigger Tasks”.

11.1. Scheduler Configuration

Schedules are configured through JSON objects. The schedule configuration involves two types of files:

  • The openidm/conf/scheduler.json file, which configures the overall scheduler service

  • One openidm/conf/schedule-schedule-name.json file for each configured schedule

The scheduler service configuration file (openidm/conf/scheduler.json) governs the configuration for a specific scheduler instance, and has the following format:

{
 "threadPool" : {
     "threadCount" : "10"
 },
 "scheduler" : {
     "instanceId" : "scheduler-example",
     "executePersistentSchedules" : "true",
     "instanceTimeout" : "60000",
     "instanceRecoveryTimeout" : "60000",
     "instanceCheckInInterval" : "10000",
     "instanceCheckInOffset" : "0"
 },
 "advancedProperties" : {
     "org.quartz.scheduler.instanceName" : "OpenIDMScheduler"
 }
}
  

Some of the optional properties are not in the default configuration file and are used specifically in the context of clustered OpenIDM instances.

The properties in the scheduler.json file relate to the configuration of the Quartz Scheduler.

  • threadCount specifies the maximum number of threads that are available for the concurrent execution of scheduled tasks.

  • instanceId can be any string, but must be unique for all schedulers working as if they are the same 'logical' Scheduler within a cluster.

  • instanceTimeout specifies the number of milliseconds that must elapse with no check-ins from a scheduler instance before it is considered to have timed out or failed. Default: 60000 (60 seconds). When this timeout is reached, the instance is considered to be in a "recovery" state.

  • instanceRecoveryTimeout specifies the number of milliseconds that must elapse while an instance is in the "recovery" state (meaning that it has failed and another instance is now attempting to recover its acquired triggers) before the scheduler instance recovery is considered to have failed. Default: 60000 (60 seconds).

  • instanceCheckInInterval specifies the period (in milliseconds) after which an instance checks in to indicate that it has not timed out or failed. Default: 10000 (10 seconds).

  • instanceCheckInOffset specifies an offset (in milliseconds) that can be used to shift the check-in events of instances, to prevent all instances from accessing the repository simultaneously (if the instances are started simultaneously and have the same check-in intervals). This offset can help to minimize MVCC warnings from multiple instances simultaneously trying to recover the same failed instance. Default: 0.

  • executePersistentSchedules allows you to disable persistent schedule execution for a specific node. If this parameter is set to false, the Scheduler Service will support the management of persistent schedules (CRUD operations) but it will not execute any persistent schedules. The value of this property can be a string or boolean and is true by default.

  • advancedProperties (optional) enables you to configure additional properties for the Quartz Scheduler.

For details of all the configurable properties for the Quartz Scheduler, see the Quartz Scheduler Configuration Reference.
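As an illustration of the clustering properties, the following sketch shows what the scheduler.json file on the second node of a two-node cluster might look like. The instanceId value and the 5000 millisecond offset are assumptions chosen for this example, not defaults. The offset is half the check-in interval, so that, if both nodes are started at the same time, this node checks in midway between the check-ins of a first node that keeps the default offset of 0.

```json
{
 "threadPool" : {
     "threadCount" : "10"
 },
 "scheduler" : {
     "instanceId" : "scheduler-node2",
     "executePersistentSchedules" : "true",
     "instanceTimeout" : "60000",
     "instanceRecoveryTimeout" : "60000",
     "instanceCheckInInterval" : "10000",
     "instanceCheckInOffset" : "5000"
 }
}
```

The optional advancedProperties object is omitted here. All other values are the ones shown in the format example.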

Each schedule configuration file, openidm/conf/schedule-schedule-name.json, has the following format:

{
 "enabled"             : true,
 "persisted"           : false,
 "concurrentExecution" : false,
 "type"                : "cron",
 "startTime"           : "(optional) time",
 "endTime"             : "(optional) time",
 "schedule"            : "cron expression",
 "misfirePolicy"       : "optional, string",
 "timeZone"            : "(optional) time zone",
 "invokeService"       : "service identifier",
 "invokeContext"       : "service specific context info",
 "invokeLogLevel"      : "(optional) debug"
}

The schedule configuration properties are defined as follows:

enabled

Set to true to enable the schedule. When this property is set to false, OpenIDM considers the schedule configuration dormant, and does not allow it to be triggered or executed.

If you want to retain a schedule configuration, but do not want it used, set enabled to false for task and event schedulers, instead of changing the configuration or cron expressions.

persisted (optional)

Specifies whether the schedule state should be persisted or stored in RAM. Boolean (true or false), false by default. For more information, see Section 11.2, “Configuring Persistent Schedules”.

concurrentExecution

Specifies whether multiple instances of the same schedule can run concurrently. Boolean (true or false), false by default. With the default setting, a new instance of a scheduled task is not launched before the previously launched instance of the same task has completed. For example, under normal circumstances you would want a liveSync operation to complete its execution before the same operation was launched again. To allow multiple instances of the same schedule to run concurrently, set this parameter to true. The behavior of "missed" scheduled tasks is governed by the misfirePolicy.

type

Currently OpenIDM supports only cron.

startTime (optional)

Used to start the schedule at some time in the future. If this parameter is omitted, empty, or set to a time in the past, the task or event is scheduled to start immediately.

Use ISO 8601 format to specify times and dates (YYYY-MM-DDThh:mm:ss).

endTime (optional)

Specifies the time at which scheduling ends. After this time, the task or event is no longer triggered.

schedule

Takes cron expression syntax. For more information, see the CronTrigger Tutorial and Lesson 6: CronTrigger.

misfirePolicy

For persistent schedules, this optional parameter specifies the behavior if the scheduled task is missed, for some reason. Possible values are as follows:

  • fireAndProceed. The first execution of a missed schedule is immediately executed when the server is back online. Subsequent executions are discarded. After this, the normal schedule is resumed.

  • doNothing. All missed schedules are discarded and the normal schedule is resumed when the server is back online.

timeZone (optional)

If not set, OpenIDM uses the system time zone.

invokeService

Defines the type of scheduled event or action. The value of this parameter can be one of the following:

  • "sync" for reconciliation

  • "provisioner" for LiveSync

  • "script" to call some other scheduled operation defined in a script

invokeContext

Specifies contextual information, depending on the type of scheduled event (the value of the invokeService parameter).

The following example invokes reconciliation.

{
    "invokeService": "sync",
    "invokeContext": {
        "action": "reconcile",
        "mapping": "systemLdapAccount_managedUser"
    }
}

For a scheduled reconciliation task, you can define the mapping in one of two ways:

  • Reference a mapping by its name in sync.json, as shown in the previous example. The mapping must exist in the openidm/conf/sync.json file.

  • Add the mapping definition inline by using the "mapping" property, as shown in the example in Alternative Mappings.

The following example invokes a LiveSync action.

{
    "invokeService": "provisioner",
    "invokeContext": {
        "action": "liveSync",
        "source": "system/OpenDJ/__ACCOUNT__"
    }
}

For scheduled LiveSync tasks, the "source" property follows OpenIDM's convention for a pointer to an external resource object and takes the form system/resource-name/object-type.

The following example invokes a script, which prints the string Hello World to the OpenIDM log (/openidm/logs/openidm0.log.X) each minute.

{
    "invokeService": "script",
    "invokeContext": {
        "script": {
            "type": "text/javascript",
            "source": "java.lang.System.out.println('Hello World');"
        }
    }
}

Note that these are sample configurations only. Your own schedule configuration will differ according to your specific requirements.

invokeLogLevel (optional)

Specifies the level at which the invocation will be logged. Particularly for schedules that run very frequently, such as LiveSync, the scheduled task can generate significant output to the log file, and the log level should be adjusted accordingly. The default schedule log level is info. The value can be set to any one of the SLF4J log levels:

  • "trace"

  • "debug"

  • "info"

  • "warn"

  • "error"

  • "fatal"
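The properties described in this section can be combined into a single schedule file. The following sketch runs the Hello World script once a minute for a limited period and logs each invocation at the debug level. The dates and the choice of log level are illustrative assumptions, as is the idea of saving the file under a name such as openidm/conf/schedule-hello_world.json.

```json
{
    "enabled": true,
    "persisted": false,
    "concurrentExecution": false,
    "type": "cron",
    "startTime": "2030-01-01T00:00:00",
    "endTime": "2030-01-31T00:00:00",
    "schedule": "0 0/1 * * * ?",
    "invokeService": "script",
    "invokeContext": {
        "script": {
            "type": "text/javascript",
            "source": "java.lang.System.out.println('Hello World');"
        }
    },
    "invokeLogLevel": "debug"
}
```

The cron expression fires at second 0 of every minute. Because timeZone is not set, the start and end times would be interpreted in the system time zone.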

11.2. Configuring Persistent Schedules

By default, scheduling information, such as schedule state and details of the schedule execution, is stored in RAM. This means that such information is lost when OpenIDM is restarted. The schedule configuration itself (defined in the openidm/conf/schedule-schedule-name.json file) is not lost when OpenIDM is shut down, and normal scheduling continues when the server is restarted. However, there are no details of missed schedule executions that should have occurred during the period the server was unavailable.

You can configure schedules to be persistent, which means that the scheduling information is stored in the internal repository rather than in RAM. With persistent schedules, scheduling information is retained when OpenIDM is shut down. Any previously scheduled jobs can be rescheduled automatically when OpenIDM is restarted.

Persistent schedules also enable you to manage scheduling across a cluster (multiple OpenIDM instances). When scheduling is persistent, a particular schedule will be executed only once across the cluster, rather than once on every OpenIDM instance. For example, if your deployment includes a cluster of OpenIDM nodes for high availability, you can use persistent scheduling to start a reconciliation action on only one node in the cluster, instead of starting several competing reconciliation actions on each node.

You can use persistent schedules with the default OrientDB repository, or with the MySQL repository (see Installing a Repository For Production).

To configure persistent schedules, set the "persisted" property to true in the schedule configuration file (schedule-schedule-name.json).

If OpenIDM is down when a scheduled task is set to occur, one or more executions of that schedule might be missed. To specify what OpenIDM should do when scheduled tasks are missed, set the misfirePolicy in the schedule configuration file. Possible values are as follows:

  • fireAndProceed. The first execution of a missed schedule is immediately executed when the server is back online. Subsequent executions are discarded. After this, the normal schedule is resumed.

  • doNothing. All missed schedules are discarded and the normal schedule is resumed when the server is back online.
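For example, a nightly reconciliation in a clustered deployment could be declared as a persistent schedule, so that it runs on only one node. The following is a sketch rather than a shipped sample: the cron expression (02:00 every day) and the choice of doNothing are assumptions made for illustration, and the mapping name is the one used in the earlier reconciliation example.

```json
{
    "enabled": true,
    "persisted": true,
    "concurrentExecution": false,
    "misfirePolicy": "doNothing",
    "type": "cron",
    "schedule": "0 0 2 * * ?",
    "invokeService": "sync",
    "invokeContext": {
        "action": "reconcile",
        "mapping": "systemLdapAccount_managedUser"
    }
}
```

With doNothing, a run that is missed while the server is down is discarded, and reconciliation next runs at the following scheduled time. Setting misfirePolicy to fireAndProceed instead would execute one missed reconciliation as soon as the server is back online.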

11.3. Schedule Examples

The following example shows a schedule for reconciliation that is not enabled. When enabled ("enabled" : true), reconciliation runs every 30 minutes, starting on the hour.

{
    "enabled": false,
    "persisted": false,
    "type": "cron",
    "schedule": "0 0/30 * * * ?",
    "invokeService": "sync",
    "invokeContext": {
        "action": "reconcile",
        "mapping": "systemLdapAccounts_managedUser"
    }
}

The following example shows a schedule for LiveSync that is not enabled. When enabled, LiveSync runs every 15 seconds, starting at the beginning of the minute. The schedule is persisted, that is, stored in the internal repository rather than in memory. If one or more LiveSync executions are missed as a result of OpenIDM being unavailable, the first missed execution of the LiveSync action is executed when the server is back online. Subsequent missed executions are discarded. After this, the normal schedule is resumed.

{
    "enabled": false,
    "persisted": true,
    "misfirePolicy" : "fireAndProceed",
    "type": "cron",
    "schedule": "0/15 * * * * ?",
    "invokeService": "provisioner",
    "invokeContext": {
        "action": "liveSync",
        "source": "system/ldap/account"
    }
}

11.4. Service Implementer Notes

Services that can be scheduled implement ScheduledService. The service PID is used as a basis for the service identifier in schedule definitions.

11.5. Scanning Data to Trigger Tasks

In addition to the fine-grained scheduling facility described previously, OpenIDM provides a task scanning mechanism. The task scanner enables you to perform a batch scan for a specified date in OpenIDM data, on a scheduled interval, and then to execute a task when this date is reached. When the task scanner identifies a condition that should trigger the task, it can invoke a script created specifically to handle the task.

For example, the task scanner can scan all managed/user objects for a "sunset date" and can invoke a script that executes a sunset task on the user object when this date is reached.

11.5.1. Configuring the Task Scanner

The task scanner is essentially a scheduled task that queries a set of managed users. The task scanner is configured in the same way as a regular scheduled task, in a schedule configuration file named schedule-task-name.json, with the "invokeService" parameter set to "taskscanner". The "invokeContext" parameter defines the details of the scan, and the task that should be executed when the specified condition is triggered.

The following example defines a scheduled scanning task that triggers a sunset script. This sample configuration file is provided in the OpenIDM delivery as openidm/samples/taskscanner/conf/schedule-taskscan_sunset.json. To use this sample file, you must copy it to the openidm/conf directory.

{
    "enabled" : true,
    "type" : "cron",
    "schedule" : "0 0 * * * ?",
    "invokeService" : "taskscanner",
    "invokeContext" : {
        "waitForCompletion" : false,
        "maxRecords" : 2000,
        "numberOfThreads" : 5,
        "scan" : {
            "object" : "managed/user",
            "_queryId" : "scan-tasks",
            "property" : "sunset/date",
            "condition" : {
                "before" : "${Time.now}"
            },
            "taskState" : {
                "started" : "sunset/task-started",
                "completed" : "sunset/task-completed"
            },
            "recovery" : {
                "timeout" : "10m"
            }
        },
        "task" : {
            "script" : {
                "type" : "text/javascript",
                "file" : "script/sunset.js"
            }
        }
    }
}

The "invokeContext" parameter takes the following properties:

"waitForCompletion" (optional)

This property specifies whether the task should be performed synchronously. Tasks are performed asynchronously by default (with waitForCompletion set to false). A task ID (such as {"_id":"354ec41f-c781-4b61-85ac-93c28c180e46"}) is returned immediately. If this property is set to true, tasks are performed synchronously and the ID is not returned until all tasks have completed.

"maxRecords" (optional)

The maximum number of records that can be processed. This property is not set by default so the number of records is unlimited. If a maximum number of records is specified, that number will be spread evenly over the number of threads.

"numberOfThreads" (optional)

By default, the task scanner runs in a multi-threaded manner, that is, numerous threads are dedicated to the same scanning task run. Multithreading generally improves the performance of the task scanner. The default number of threads for a single scanning task is ten. To change this default, set the "numberOfThreads" property.
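As a sketch of how the three properties above interact, the following variant of the sample schedule performs the scan synchronously and caps the work done per run. The values are assumptions chosen to make the arithmetic easy: with a maxRecords of 1000 spread evenly over 10 threads, each thread would handle about 100 records. The scan and task sections are unchanged from the sample.

```json
{
    "enabled" : true,
    "type" : "cron",
    "schedule" : "0 0 * * * ?",
    "invokeService" : "taskscanner",
    "invokeContext" : {
        "waitForCompletion" : true,
        "maxRecords" : 1000,
        "numberOfThreads" : 10,
        "scan" : {
            "object" : "managed/user",
            "_queryId" : "scan-tasks",
            "property" : "sunset/date",
            "condition" : {
                "before" : "${Time.now}"
            },
            "taskState" : {
                "started" : "sunset/task-started",
                "completed" : "sunset/task-completed"
            },
            "recovery" : {
                "timeout" : "10m"
            }
        },
        "task" : {
            "script" : {
                "type" : "text/javascript",
                "file" : "script/sunset.js"
            }
        }
    }
}
```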

"scan"

Defines the details of the scan. The following properties are defined:

"object"

Defines the object type against which the query should be performed.

"_queryId"

Specifies the query that is performed. The queries that can be set here are defined in the database configuration file (either conf/repo.orientdb.json or conf/repo.jdbc.json).

"property"

Defines the object property against which the range query is performed.

"condition" (optional)

Indicates the conditions that must be matched for the defined property.

In the previous example, the scanner scans for users for whom the property sunset/date is set to a value prior to the current timestamp at the time the script is executed.

You can use these fields to define any condition. For example, if you wanted to limit the scanned objects to a specified location, say, London, you could formulate a query to compare against object locations and then set the condition to be:

            "condition" : {
                "location" : "London"
            },

For time-based conditions, the "condition" property supports macro syntax, based on the Time.now object (which fetches the current time). You can specify any date/time in relation to the current time, using the + or - operator, and a duration modifier. For example: "before": "${Time.now + 1d}" would return all user objects whose sunset/date is before tomorrow (current time plus one day). You must include space characters around the operator (+ or -). The duration modifier supports the following unit specifiers:

  • s - second

  • m - minute

  • h - hour

  • d - day

  • M - month

  • y - year

"taskState"

Indicates the fields that are used to track the status of the task.

  • "started" specifies the field that stores the timestamp for when the task begins.

  • "completed" specifies the field that stores the timestamp for when the task completes its operation.

"recovery" (optional)

Specifies a configurable timeout, after which the task scanner process ends. In a scenario with clustered OpenIDM instances, there might be more than one task scanner running at a time. A task cannot be executed by two task scanners at the same time. When one task scanner "claims" a task, it indicates that the task has been started. That task is then unavailable to be claimed by another task scanner and remains unavailable until the end of the task is indicated. In the event that the first task scanner does not complete the task by the specified timeout, for whatever reason, a second task scanner can pick up the task.
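As a sketch of how the scan properties fit together, the following fragment of the invokeContext object adapts the sample configuration so that the scanner also picks up users whose sunset date falls within the next day. Only the condition differs from the sample, and the one-day look-ahead is an illustrative choice.

```json
{
    "scan" : {
        "object" : "managed/user",
        "_queryId" : "scan-tasks",
        "property" : "sunset/date",
        "condition" : {
            "before" : "${Time.now + 1d}"
        },
        "taskState" : {
            "started" : "sunset/task-started",
            "completed" : "sunset/task-completed"
        },
        "recovery" : {
            "timeout" : "10m"
        }
    }
}
```

Note the space characters on either side of the + operator in the macro. The recovery timeout of 10m is the value used in the sample, which suggests that the timeout accepts the same unit specifiers as the condition macro.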

"task"

Provides details of the task that is performed. Usually, the task is invoked by a script, whose details are defined in the "script" property:

  • "type" - the type of script. Currently, only JavaScript is supported.

  • "file" - the path to the script file.

The script file takes at least two objects (in addition to the default objects that are provided to all OpenIDM scripts):

  • "input", which is the individual object that is retrieved from the query (in the example, this is the individual user object)

  • "objectID", which is a string that contains the full identifier of the object. The objectID is useful for performing updates with the script, as it allows you to target the object directly, for example: openidm.update(objectID, input['_rev'], input);

A sample script file is provided in openidm/samples/taskscanner/script/sunset.js. To use this sample file, you must copy it to the openidm/script directory. The sample script marks all user objects that match the specified conditions as "inactive". You can use this sample script to trigger a specific workflow, or any other task associated with the sunset process. For more information about using scripts in OpenIDM, see the Scripting Reference.

11.5.2. Managing Scanning Tasks Over REST

You can trigger, cancel, and monitor scanning tasks over the REST interface, using the REST endpoint http://localhost:8080/openidm/taskscanner.

11.5.2.1. Triggering a Scanning Task

The following REST command executes a task named "taskscan_sunset". The task itself is defined in a file named openidm/conf/schedule-taskscan_sunset.json.

$ curl \
 --header "X-OpenIDM-Username: openidm-admin" \
 --header "X-OpenIDM-Password: openidm-admin" \
 --request POST \
 "http://localhost:8080/openidm/taskscanner?_action=execute&name=schedule/taskscan_sunset"
   

By default, a scanning task ID is returned immediately when the task is initiated. Clients can make subsequent calls to the task scanner service, using this task ID to query its state and to call operations on it.

For example, the scanning task initiated previously would return something similar to the following, as soon as it was initiated:

{"_id":"edfaf59c-aad1-442a-adf6-3620b24f8385"}

To have the scanning task complete before the ID is returned, set the waitForCompletion property to true in the task definition file (schedule-taskscan_sunset.json). You can also set the property directly over the REST interface when the task is initiated. For example:

$ curl \
 --header "X-OpenIDM-Username: openidm-admin" \
 --header "X-OpenIDM-Password: openidm-admin" \
 --request POST \
 "http://localhost:8080/openidm/taskscanner?_action=execute&name=schedule/taskscan_sunset&waitForCompletion=true"
    

11.5.2.2. Canceling a Scanning Task

You can cancel a scanning task by sending a REST call with the cancel action, specifying the task ID. For example, the following call cancels the scanning task initiated in the previous section.

$ curl \
 --header "X-OpenIDM-Username: openidm-admin" \
 --header "X-OpenIDM-Password: openidm-admin" \
 --request POST \
 "http://localhost:8080/openidm/taskscanner/edfaf59c-aad1-442a-adf6-3620b24f8385?_action=cancel"
    

The output for a scanning task cancelation request is similar to the following, but on a single line:

    {"_id":"edfaf59c-aad1-442a-adf6-3620b24f8385",
     "action":"cancel",
     "status":"SUCCESS"}
    

11.5.2.3. Listing Scanning Tasks

You can display a list of scanning tasks that have completed, and those that are in progress, by running a RESTful GET on "http://localhost:8080/openidm/taskscanner". The following example displays all scanning tasks.

$ curl \
 --header "X-OpenIDM-Username: openidm-admin" \
 --header "X-OpenIDM-Password: openidm-admin" \
 --request GET \
 "http://localhost:8080/openidm/taskscanner"
    

The output of such a request is similar to the following, with one item for each scanning task. The output appears on a single line, but has been indented here, for legibility.

{"tasks": [
    {
      "_id": "edfaf59c-aad1-442a-adf6-3620b24f8385",
      "progress": {
        "state": "COMPLETED",
        "processed": 2400,
        "total": 2400,
        "successes": 2400,
        "failures": 0
      },
      "started": 1352455546149,
      "ended": 1352455546182
    }
  ]
}
    

Each scanning task has the following properties:

_id

The ID of the scanning task.

progress

The progress of the scanning task, summarized in the following fields:

  • state - the overall state of the task: INITIALIZED, ACTIVE, COMPLETED, CANCELLED, or ERROR

  • processed - the number of processed records

  • total - the total number of records

  • successes - the number of records processed successfully

  • failures - the number of records that could not be processed

started

The time at which the scanning task started.

ended

The time at which the scanning task ended.

The number of processed tasks whose details are retained is governed by the "openidm.taskscanner.maxcompletedruns" property in the conf/boot.properties file. By default, the last one hundred completed tasks are retained.