# Changes for 4.2.2

## Changes for central agents

* FileRouter agent:
  * expire "no source" transfers quickly to free the queue; fixes the issue discussed in https://github.com/dmwm/PHEDEX/issues/1072
  * adjust route cost calculations: increase the default staging penalty from 30 min to 7 days, to favor source replicas available on disk over those on MSS nodes; fix for https://github.com/dmwm/PHEDEX/issues/1100
  * implement the --staging-latency option, which allows overriding the default staging penalty

## Changes for Site Agents

* FTS3 backend:
  * transfer job priorities are now propagated to FTS; see the discussion in https://github.com/dmwm/PHEDEX/issues/1068
  * pass PhEDEx metadata to FTS tasks for Hadoop monitoring; see the discussion in https://github.com/dmwm/PHEDEX/issues/1085

## Configuration files

* added an example environment setup for FTS client tools installed on CVMFS
* improved config file processing; see https://github.com/dmwm/PHEDEX/issues/1101

## Changes for Schema

* enable/fix help in the privilege granting script and OracleCoreBlock.sql

## Changes for packaging and deployment

* Added p5-crypt-ssleay as an external dependency; see details in https://github.com/dmwm/PHEDEX/issues/1065

Changes for 4.2.1

Changes for Site Agents
- FileDownload agent: the FTS3 backend now properly applies the -default-link-pending limit. Fix for #1063
- FileDownload agent: the FTS3 backend now recognizes files in the CANCELED state in FTS3. Fix for #1061

Changes for 4.2.0

Changes for all Agents
- An agent unable to identify itself via 'ps' will terminate with a fatal error, to avoid going into an unrecoverable state. The Watchdog agent should then restart it - fix for #1038 (the previous fix in 4.1.8 did not work)
- Suppress a warning during Watchdog agent startup - fix for #1021

Changes for Site Agents
- FileDownload agent: add support for an FTS3 backend using the fts3-client CLI -- https://twiki.cern.ch/twiki/bin/view/CMS/PhedexAdminDocsUsingFTS3Backend
- FileDownload agent: add support for Circuit awareness -- http://www.evernote.com/l/AqlhTDGpmUdGe4q3OkXdY5xs0EslGYs1RLg/
- BlockDownloadVerify agent: add gfal Namespace (using gfal2-utils) -- requires gfal2.10 or newer by default; gfal2.9 or older versions are supported with the option "--gfalv 29"
- BlockDownloadVerify agent: change eos Namespace to use the "eos -b" command
- Modernized config templates in PHEDEX/Custom/Templates

Changes for Central Agents
- BlockAllocator agent: no longer blocks subscription updates via the website and datasvc - fix for #1032
- PerfMonitor agent: query performance optimization - fix for #1051

Changes for Schema
- Fixes in privilege granting script

Changes in core libraries
- Add Tests module and corresponding scripts directory
- Fix SSL connection issues in PhEDEx UserAgent

Changes for 4.1.8

Changes for all Agents
- An agent unable to identify itself via 'ps' will terminate with a fatal error, to avoid going into an unrecoverable state. The Watchdog agent should then restart it - fix for #1012

Changes for Central Agents
- PerfMonitor agent: wait 30 minutes before considering empty timebins in link parameter calculation, to give all agents time to update link statistics and to avoid treating a busy link as empty in routing - fix for #772
- FileRouter agent: don't extend transfer paths for transfers that don't have any replica of the file along the path, to allow rerouting if the replica was deleted in the meantime - fix for #792

Changes for Site Agents
- FileDownload agent: add FTS3 state machine to Transfer backend

Changes for Core libraries
- Inject library: for newly injected files on Disk nodes, set the file replica state to 'staged' to get proper routing - fix for #1024

Changes for 4.1.7

Changes for Site Agents
- BlockDownloadVerify agent: use a cache of TrivialFileCatalogue rules to avoid going back to the DB on every test - fix for #1006

Changes for 4.1.5

Changes for all Agents
- DB disconnection code enhanced to proactively disconnect from the DB at the end of a cycle if the Connection Life (specified in the DBParam) is shorter than the time to the next cycle. This should solve the issue of too many handles being kept open to the database

Changes for Central Agents
- BlockAllocator: fix for #946, "Don't remove move flag from block-level subscriptions if there are still unsubscribed T1_Disk replicas left"

Changes for Site Agents
- FileDownload agent: set LAST_WORK to zero instead of 'now', so on the first cycle the agent will think it hasn't done anything. Otherwise it waits 4 hours before finally disconnecting from the database. This has no effect at all on the agent behaviour, but is kinder to the DB.
- Added FileDownloadGFALVerify and FileDownloadGFALDelete template scripts using gfal2 libraries
- Removed obsolete DBS2 consistency/invalidation scripts, and dropped the dependency on the DBS2 client in the PHEDEX-micro toolkit

Changes for 4.1.4

Changes to Schema and Schema-related tools
- Schema/OraclePrivs.sh: allow updates on certain fields to enable by-hand fixes for operators
- Schema/OracleResetAll.sql: correct the manner in which views and materialised views are dropped
- Schema/OracleCoreFunctions.sql: update function definitions for Oracle 12, mostly by adding 'authid' fields. Also stub the proc_delete_node procedure, which is obsolete.
- Schema/OracleCoreTriggers.sql: comment out the tr_dps_block_dataset definition. It doesn't compile, and the results aren't used anywhere anyway.
- Schema/tnsnames.ora: updated from the official location
- Schema/OracleCoreTopo.sql: back out some changes made by Ricky to clean the schema. Too much code still depends on them, so removing them broke new deployments in testbeds. Comments mark where the code is backed out (look for 'TW').
- Schema/OracleInit.sql: move the call to OracleCoreFunctions, so the dependencies are resolved correctly

Changes for all Agents / Tools
- perl_lib/PHEDEX/Core/Config/Factory.pm: LIMIT field correctly turned into an array ref if it was a string; correct handling of limits when they have not yet been calculated
- perl_lib/PHEDEX/Core/JobManager.pm: fix bug: use of $self->Log instead of $self->Logmsg

Changes for Site Agents / Tools
- Custom/Template/FileDownloadDelete: correct example for exit code
- perl_lib/PHEDEX/File/Stager/Agent.pm: correct reporting of failed stage commands
- Toolkit/LoadTest/LoadTestFileInfoToInjectXML: enable creation of LoadTestSource datasets for T1_*_Disk endpoints (#942)
- Utilities/StorageConsistencyCheck: allow the user to give a site-name or node-name (#948)
- perl_lib/PHEDEX/File/Stager/Agent.pm: fix reporting of stager command in case of failure
- perl_lib/PHEDEX/CLI/UserAgent.pm: improve handling of certificates, remove dependency on PHEDEX::Core::Logging
- perl_lib/PHEDEX/Core/SQL.pm: add getSiteReplicasByName function
- perl_lib/PHEDEX/Core/Config/Factory.pm: correct handling of 'limit' parameters

Changes for Central Agents
- perl_lib/PHEDEX/BlockDelete/Agent.pm: delete from _Disk when making moves elsewhere (#958, #931)
- perl_lib/PHEDEX/RequestAllocator/SQL.pm: set LongReadLen to 1 million, not 10K
- perl_lib/PHEDEX/RequestAllocator/Core.pm: handle T0 correctly w.r.t. _Disk and moves. This is cosmetic, since there is no T0_*_Disk node, but if there ever is one, it should work.

Changes for 4.1.3
- Initial release for SLC5 and SLC6. The SLC6 release is not yet supported; use it at your own risk. Please file any bug reports or issues at https://github.com/dmwm/PHEDEX/issues/new instead of in Savannah.

Changes for central agents
- Fix for #101405: Move Subscription not removing files from Disk

Changes for PhEDEx 4.1.2

Changes for site agents
- Fix for #99534: update DBH of Catalogue object before every test (to avoid using an expired DB connection)

Changes for PhEDEx 4.1.1

Changes for Schema and DB Config
- Add PHEDEX/Schema/OracleDropRole.sh script to drop obsolete roles
- Revoke delete privileges on t_xfer_replica for site roles (Sav #95710)
- Rename central agent roles from CERN to CENTRAL
- Also drop views when resetting DB schema
- Add new func_tablespace_used_space and func_table_used_space functions to generate reports
- Add new procedures proc_add_node and proc_delete_node to manage nodes and the corresponding partitions
- Grant public execution privileges on NOW, GMTIME, VERSION functions

Changes for central agents
- BlockAllocator, FilePump, InfoStatesClean: print debug statistics for latency SQLs
- FileRouter: fix alert in BlockArrive SQL preventing update of routing stats tables (Sav #96911)
- FileRouter: add support for reading link performance stats from external sources
- BlockAllocator: don't wait for block replica creation and activation before creating the block latency entry (to fix skipped file transfers in latency records for some blocks)
- RequestAllocator: fix missing subscriptions for re-evaluated requests (Sav #97148)

Changes for site agents
- Update all site agents to use the new Agent core library
- WatchdogLite: new agent to monitor the health of the Watchdog. It doesn't connect to the DB, to make sure that it will continue to run in almost any situation.
- Watchdog: send notifications to new WatchdogLite agent
- Watchdog: fix minor bug where it fails to pick up contact with an agent that was already running when the watchdog starts
- FileRemove: enable creation of dropbox directories, allow update of config file on a reload
- FileDownload: remove debugging hooks from transfer Manager
- BlockDownloadVerify: fix warning for undefined checksum value
- BlockDownloadVerify: keep TFC rules from DB in cache
- DropTMDBPublisher: DEPRECATED

Changes for Namespace framework
- Framework: pass LFN as argument instead of PFN in all namespace plugins, performing lfn2pfn conversion internally. Update namespaces accordingly.
- Framework: move all common options (VERBOSE etc.) from plugins to the base class, providing functions to set/get the options in the plugins. Update namespaces accordingly.
- Framework: remove obsolete Namespace.pm
- Framework: add base Cache.pm class
- Framework: pass CATALOGUE and PROTOCOL to plugins in constructor
- dcache Namespace: add storage accounting functions to dcache namespace
- srm Namespace: fix unrecognized permissions
- srm Namespace: add support for lcg-util backend, activated with --use_lcgutil
- eos Namespace: add support for directory caching

Changes for operator tools
- Add PHEDEX/Utilities/GroupNew to create new user_groups
- Updated PHEDEX/Utilities/setool and PHEDEX/Utilities/BlockConsistencyCheck to the changes in the Namespace framework
- Update MakeDailyReport to use the new func_tablespace_used_space and func_table_used_space DB functions, so that the script can run with Reader privileges

Changes for core libraries
- Replace old Agent.pm and AgentLite.pm in all agents with the new modular PHEDEX::Core::Agent library
- PHEDEX::Core::Agent::Dropbox - deprecate relayDrop function
- PHEDEX::Core::Agent::Dropbox - allow skipping workdir creation (LOAD_DROPBOX_WORKDIRS=0)
- PHEDEX::Core::Agent::Dropbox - change OUTDIR to OUTBOX, create TASKDIR and ARCHIVEDIR
- PHEDEX::Core::Config - by default, reject agents not found in config file
- PHEDEX::Core::Catalogue - provide uniform access to TFC from database or local file
- PHEDEX::Core::DB - escape characters in role passwords
- PHEDEX::Core::Identity - fix bug where DN was recorded instead of CERTIFICATE when certificate is too long.
- PHEDEX::Core::SQL - modify getNodes to work also with DMWMMON schema

Changes for custom templates
- Update FTS mapfile template
- Update PHEDEX/Custom/Template/gfal_prestage-status.py to adapt to the change of gfal request status codes introduced in recent versions of the UI

Changes for dependencies
- Update dependencies to HG1210d tag
- For full details see https://svnweb.cern.ch/trac/CMSDMWM/ticket/3991

Changes for PhEDEx 4.1.0

Changes for Schema and DB Config
- New latency monitoring tables (see https://twiki.cern.ch/twiki/bin/view/CMS/PhedexProjLatency for details):
  - t_dps_block_latency for blocks currently in transfer
  - t_xfer_file_latency for files currently in transfer
  - t_log_block_latency for historical block records
  - t_log_file_latency for historical file records
- t_dps_block_dest: introduce new block destination state values for blocks that cannot be routed
- t_status_block_request: new table used to aggregate at block level the file routing requests in t_xfer_request
- t_status_block_path: add column time_arrive for the predicted block arrival time, based on the file arrival times calculated by FileRouter and recorded in t_xfer_path
- t_status_block_arrive: new table to record the predicted arrival time for blocks, or provide a reason for blocks that cannot complete
- Updates to tnsnames.ora and sqlnet.ora Oracle config files

Changes for Central Agents
- All central agents migrated to new AgentLite package
- FilePump: populate new t_xfer_file_latency table
- BlockAllocator: populate new t_dps_block_latency table
- InfoStatesClean: migrate latency logs for completed blocks from t_dps_block_latency to t_log_block_latency and from t_xfer_file_latency to t_log_file_latency. Clean up latency logs older than 30 days from t_log_file_latency.
- BlockAllocator: change replica block subscriptions to move if dataset subscription is changed to move (fix for #91423)
- FilePump: auto-export Buffer->MSS tasks, to solve the most common cause of FilePump alerts (fix for #79042)
- FilePump: force file request deactivation when any task in the transfer path fails, to force re-routing also for transfers to T1 MSS nodes (fix for #63923)
- FileRouter: populate new t_status_block_path.time_arrive column, new t_status_block_request table, new t_status_block_arrive table
- FileRouter: set t_dps_block_dest.state=-2 for blocks that cannot be activated due to missing incoming links
- FileRouter: set t_dps_block_dest.state=-1 for blocks that cannot be activated because the destination queue is full
- FileRouter: don't revalidate expired paths (fix for #85117)
- FileRouter: don't extend expiration time for failed transfers (fix for #86943)
- FileRouter: convert 6-level path priority back to 3-level subscription priority when populating t_status_block_path, to fix double-counting and wrong priority assignment in the Activity::Routing web page
- InfoStatesClean: commit at the end of the cycle, instead of in the middle of the next cycle
- PerfMonitor: add 30-minute offset to calculation of t_adm_link_param.rate/latency, to ensure that all agents have time to update stats (fix for #92143)
- LoadTestInjector: report alerts to Watchdog agent properly

Changes for Site Agents
- FileDownload: do not override -batch-files with (-default)-link-active-files in job splitting if -batch-files is lower (fix for #88176)
- FileDownload: log (glite-transfer-)status command invocation errors (fix for #55491)
- FileDownload: properly set status of lost file transfers to "abandoned", to avoid accumulating lost transfers in the agent and eventually blocking the queue (fix for #62173)
- FileMSSMigrate: do not auto-export Buffer->MSS tasks, now handled by the central FilePump agent. NOTE: This update becomes MANDATORY for all T0/T1s after the central FilePump agent is updated (fix for #79042).
- FileMSSMigrate: wait until tasks are set to states 'export' (by FilePump) and 'inxfer' (by FileMSSMigrate itself) before checking migration status. This change has no effect on transfers themselves, but it ensures that all xfer workflow events are logged correctly also for Buffer->MSS transfers.
- BlockDownloadVerify: report status to DB and Watchdog regularly, so that the agent doesn't appear "DOWN" while executing long-running tests
- BlockDownloadVerify: deprecate tmdb Namespace
- BlockDownloadVerify: add support for eos Namespace
- Watchdog: properly receive ping messages from agents
- Watchdog: properly support single limit settings in the configuration

Changes for Operator Tools
- phedex CLI documentation fix
- phedex CLI: remove obsolete VERBOSE argument from inject command

Core library changes
- PHEDEX::Core::AgentLite : new base package for PhEDEx Agents, with support for modular plugins
- PHEDEX::Core::Agent : fix crash in updateAgentStatus when node was not set
- PHEDEX::Core::Agent : cleanly suppress statistics printout if STATISTICS_INTERVAL is not set
- PHEDEX::Core::DB : print SQL statement in case of failure in dbprep
- PHEDEX::Core::Identity : truncate certificates longer than 4k characters to fit in the DB column
- PHEDEX::Core::JobManager : make heartbeat interval configurable via the KEEPALIVE variable
- PHEDEX::Core::SQL : escape underscore characters in LIKE clauses
- PHEDEX::Core::Util : allow specification of field type in flat2tree

Changes for dependencies
- Update architecture to slc5_amd64_gcc461
- Update dependencies to HG1204f tag
- Includes update of Oracle client to 11.2.0.3 and of oracle-env
- For full details see https://svnweb.cern.ch/trac/CMSDMWM/ticket/3331

Changes for PhEDEx 4.0.1

Changes for Schema and DB config
- Add missing foreign key constraints and indexes on destination node in 4.0 subscription tables
- Add OPS* roles with necessary write privileges for operations (e.g. FileDeleteTMDB) but no admin privileges
- Add devdb11 (Oracle 11gR2 development DB) in tnsnames.ora

Changes for Central Agents
- BlockAllocator: trigger reactivation of inactive block replicas after direct deletion, see Sav #59403
- BlockAllocator: update files/bytes counts for entries in latency log table for active blocks (Sav #79857)
- BlockDeactivate: wait before deactivating recently deleted blocks, to let BlockMonitor update the block replicas (partial fix for Sav #79639)
- FileRouter: reduce expiration time for requests with no valid transfer path (#75456). This will cause the FileRouter to check more often whether the agents/links for this transfer have been (re-)enabled.
- PerfMonitor: fixes in link latency calculation that caused the link latency to be either 0 or overestimated in the link_param table, causing the FileRouter to use incorrect values for routing (#83051 and #83102)
- RequestAllocator: re-evaluate pending wildcard requests, to make sure that the request size preview shown in the request approval page is up-to-date
- FileIssue: don't issue transfer tasks to/from T0/T1 Buffer nodes if there is a pending deletion task for that file on the connected MSS node (#65239)

Changes for Site Agents
- FileDownload: add transfer backend for FDT transfers
- FileDownload: when fetching new tasks from the DB, set their status to 'inxfer' in bulk (should increase agent efficiency, and protect central agents from #82112)
- FileDownload: enable caching mechanism for TFC rules (to avoid fetching the rules from the DB for every transfer, also increasing agent efficiency and protecting central agents from #82112)
- FileDownload: trim the central part of the transfer log before uploading it to the DB if the logfile is longer than 100k characters
- FileDownload: add glite-transfer-submit error message to transfer log in case of failure in FTS submission
- FileDownload: add -max-tasks option to set the maximum number of transfer tasks to fetch from DB (default 15000)
- FileDownload: fix to prevent losing tasks after an agent restart (#62299)
- FileDownload: crash fix in FTS backend when trying to submit FTS jobs with 0 tasks
- FileDownload: fix for using wrong space-tokens in FTS jobs (#82373)
- FileDownload: crash fix for FTS backend when copyjob file cannot be written (#73735)
- FileDownload: FTS backend: immediately fetch more tasks from the DB when all current tasks are done
- FileDownload: SRM backend - added newline at end of copyjob file
- FileDownload: transfer jobs that cannot be monitored are now correctly abandoned after the timeout
- BlockDownloadVerify: use a more aggressive clean-up of the namespace cache
- BlockDownloadVerify: in the dcache namespace, pass the PRELOAD library correctly from the agent configuration
- Watchdog: agent name is now properly registered as "Watchdog" instead of "AgentFactory"
- Watchdog: report status of Watchdog Agent to DB every 30 minutes
- Watchdog: print alert to log file in case of DB connection error
- Watchdog: remove "mail" plugin for report notification (duplicate of "email")
- Watchdog: added support for options "-summary_interval", "-report_plugin", "-notify_plugin" in the config file

Changes for Operator Tools
- Add phedex CLI support for the ErrorLogSummary datasvc API
- Fix space-token handling in phedex CLI for the LFN2PFN datasvc API
- Print the actual error code and message in the event of errors

Core library changes
- PHEDEX::Core::JobManager : do not die if the JobManager cannot open the job logfile; simply write to STDOUT instead
  - Fixes some crashes, including a FileDownload agent crash reported in Sav #73735
- PHEDEX::Core::Util : fix for an "uninitialized variable" warning
- PHEDEX::RequestAllocator::Core : fix parsing of arguments for 'flat' data format (used by website)

Changes for dependencies
- Update dependencies to HG1109b tag

Changes for PhEDEx 4.0.0

Changes for Schema
- See details in https://twiki.cern.ch/twiki/bin/view/CMS/PhedexProjSubscriptions
- All subscriptions are now tracked internally at block level
  - t_dps_subscription table replaced with t_dps_subs_dataset, t_dps_subs_block, t_dps_subs_param
- Support for time_start in transfer requests
  - If time_start is set, only blocks created after time_start will be subscribed to the destination
- Add NOT NULL constraint to user-group in transfer requests and subscriptions
  - Existing subscriptions with undefined group will be assigned to the deprecated-undefined group
  - Cannot assign new subs to deprecated-* groups
- Add t_req_size, t_req_replica tables to store the status of a request (not yet used)
- Add dataset_name column to v_dbs_block_replica view provided for DBS data location
- Add a schema_version function to return the version of the schema
- Update OracleInit.sql deployment script to new schema
- Update to latest tnsnames.ora
- Documentation updates

Changes for Central Agents
- Updates to BlockAllocator, BlockMonitor, BlockDelete, RequestAllocator agents for new schema
- BlockAllocator changes:
  - Support for time-based subscriptions
  - Always track all subscriptions at block level
  - Create/update/delete block destinations from block-level subscriptions
  - Propagate dataset subscription parameters to linked block subscriptions
  - Evaluate dataset subscription state from state of linked block subscriptions
  - Propagate dataset subscription suspensions to linked block subscriptions
  - Fix for move flag never removed if block is inactive
  - Track move flag at block level; remove after one week and mark sub done if no unsubscribed replicas exist
  - Support for subscriptions with finite suspension time; suspension is automatically cleared when the time has passed
- Additional BlockDelete fixes:
  - Fix for Savannah #71759: wait until Buffer replica removal before marking deletion as completed at T1
  - Fix for Savannah #65239: prevent creation of file deletion tasks when Buffer replica has active transfers
  - Remove cleanup of old file deletion tasks (already performed after block deletion completion)
- Additional RequestAllocator agent fixes:
  - check for duplicate subscriptions when inserting, instead of trying to insert them and catching unique constraint errors
  - also skip block subs that would be created by BlockAllocator anyway
- Additional updates to RequestAllocator package (used by website/dataservice):
  - added/updated/replaced several methods to support new schema in website/dataservice
  - added deleteSubscription to support dataset unsubscription (can be used to remove dataset-level sub without removing block-level subs)
  - disallow time-based move/delete requests
  - disallow requests/subscriptions that would result in a custodiality change of the subscription
  - disallow changing the move flag in a subscription
  - disallow transfer/deletion requests to Buffer nodes in Prod
  - allow data defined by the output of getExistingRequestData in validateRequest
  - allow moves to T1s when data is already subscribed to T[01], excluding T[01] from source node deletion lists
- BlockDownloadVerifyInjector: add table aliases to SQL (needed with Oracle 10.2.0.5)

Changes for Site Agents
- FileDownload
  - FTS backend: removed obsolete -myproxy and -passfile options
  - FTS backend: added support for FTS checksumming, enabled with option -checksum
    - if available, the adler32 checksum stored in TMDB is passed to FTS to be verified against the adler32 checksum calculated by the destination SRM after transfer
  - All backends: fix for agent crash when max-active < batch-files in config options
- BlockDownloadVerify agent
  - Disallow the use of deprecated parameters: --storagemap, --use_srm, --use_rfdir
  - Add option -protocol to the BDV agent
  - Reworked usage help to show namespace-specific usage help when the --namespace and --help options are given together
  - Reject tests immediately instead of retrying when they are not allowed for the given namespace (fix for Savannah #65919)
  - Cleanup of rfio namespace documentation
  - Allow cksum test; namespace plugins already coded (using adler32) for castor, posix, dpm and dcache technologies
  - Added support for checksum from disk server in castor plugin, with fallback to old tape checksum method if not present
  - Other updates to dpm namespace plugins, now fully functional
  - Add table aliases to SQL (needed with Oracle 10.2.0.5)
  - Set file/block test status to "Indeterminate" for checksum tests if there is no adler32 in TMDB
- Watchdog agent
  - Fix for Savannah #73896: watchdog will check if pid file is missing to avoid starting duplicate agents
  - Will not send alerts about missing agents on the first cycle, to give agents time to register themselves
  - Add summary report
    - Every cycle, a statistics summary of all running agents is sent to the admin. It needs the email address of the admin, which it gets from sitedb. By default, the cycle is every 24h
    - Report can be sent by any given method: email, logfile, rss, etc.; these are plugins which can be coded as needed
    - Currently supported plugins: logfile, mail
    - Also provided an additional UNSUPPORTED twitter plugin, which requires manual installation of the external Net::Twitter::Lite client
- Migrate and Stager agents: removed obsolete Custom agents; now only generic FileStager and FileMSSMigrate are supported

Changes for Operator Tools
- PHEDEX/Utilities/DBSCheck
  - fix for path of Toolkit/DBS scripts
- PHEDEX/Utilities/NodeRemove
  - update for new schema
- PHEDEX/Utilities/phedex
  - change target datasvc URL to https
  - add a --target argument, to bypass the usual format/instance/api structure and provide an arbitrary datasvc url
  - allow a global "nocache" argument
  - change user-agent string to conform to CMS conventions
  - filereplicas command: support the full set of command-line arguments
  - subscribe command:
    - add support for time-start in new schema
    - add --no-mail option to suppress request email
    - change '--request-only' to '--(no-)auto-approve'
  - add 'updaterequest' command
- PHEDEX/Toolkit/Request/TMDBInject
  - handle duplicate file injection errors

Core library changes
- DB.pm: fix for execute_array always returning 0 instead of the number of statements executed
- Inject.pm: set verbosity off by default
- Inject.pm: fail on duplicate file injection instead of ignoring it silently, returning the first duplicate file
- Loader.pm: evaluate namespaces recursively, to allow the loader to work with structured module hierarchies
- SQL.pm: order groups by name in getGroups
- Timing.pm: add support for YYYYMMDDZhhmm[ss] format in dates

Changes for PhEDEx 3.3.2

Dependency Changes
- Drop dependency of PHEDEX agent distributions on dbs-client, dls-client
  - PHEDEX-micro distribution adds the dependency on dbs-client for DBS-TMDB consistency checks
- Update dependencies to latest version of COMP/CMSDIST packages
  - external+base+REV16
  - oracle: 11.2.0.1.0p1 -> 11.2.0.1.0p2
    - fix for the 64bit libocci/selinux problem
  - oracle-env: 24.0 -> 25.0
    - Update to latest tnsnames.ora
    - Add sqlnet.ora config file to avoid creation of ~/oradiag_$USER directories with oracle 11 clients
  - python 2.4.2 -> 2.6.4
  - cms+dbs-client+DBS_2_1_0_patch_2

Changes for Site Agents
- Watchdog
  - change the default receive-buffer in the watchdog agent from 1KB to 1MB. This should stop the occasional message from being truncated
- BlockDownloadVerify
  - Set requests to Error when a test is abandoned after 10 failures
  - Recover requests left in Active state from previous failed runs
  - Add stop hook which will upload to the DB any request in memory before the Agent is stopped
  - Add global caching container for all technologies
  - Remove old namespace methods and make new plugin technologies mandatory
  - posix technology is the default Namespace
  - use the global cache provided by the agent if it exists, for all technologies but pnfs
  - posix and dCache caching mechanisms ON by default
  - added Chimera dump parsing option for dCache technology
  - added support for gzipped Chimera dump files
  - Test queue is sorted by dataset ID, after test priority and test expiration time
    - Should increase cache efficiency by ensuring that files in the same dataset are tested consecutively
  - Tests for more recent blocks are executed first
  - dbs consistency tests are now deprecated and will be rejected by the agent
- BlockDBSPopulate
  - Agent was obsolete since the CAF DBS descoping and has been removed from the release

Changes for Central Agents
- FileRouter
  - Re-activate requests even if over $WINDOW_SIZE. This avoids queue blockage if a priority window overflows for a node (#66119). Print a warning when a priority window has overflowed.
  - Log total re-activated/newly-activated bytes as well as files

Changes for all Agents
- call stop() before closing the database connection, in case the agent cleanup needs it
- Loader.pm: transparent change to reduce the size of the error logfile for the data-service

Changes for Operator Tools
- FileDeleteTMDB: additional debugging output
- DBSCheck: update to use new version of DBS client scripts
- Toolkit/DBS:
  - Removed from PHEDEX agent distributions. Now only packaged with PHEDEX-micro
  - Unused DBS client scripts removed
  - Useful scripts updated to work with python 2.6/DBS_CLIENT_2_1_0
  - Added --size option to print out file size in DBSgetLFNsFromBlock
- Utilities:
  - Various operator tools added to PHEDEX-micro distribution

Changes for Schema
- OracleInit.sql: reorder schema deployment

Changes for PhEDEx 3.3.1

Dependency Changes
- Oracle: 10.2.0.3 => 11.2.0.1p1
- Fix architecture lookup in oracle dependencies which broke deployment on 'athlon' systems

Changes for Site Agents
- BlockDownloadVerify:
  - Add --dbs_url option, for specifying which DBS to execute dbs consistency checks on
  - Bugfix: avoid hanging transactions by properly committing 'Active' state
  - Optimization: bulk insert of test results
  - Optimization: dynamic queue filling to optimize duty cycle
  - More performance-related output when PHEDEX_DEBUG is enabled
  - Optional caching mechanism for dcache and posix plugins
  - posix plugin uses 'ls' instead of 'stat'
  - implemented checksum test for posix plugin
  - updated help messages for all plugins

Changes for Central Agents
- BlockActivate: checks consistency of activations and rolls back any inconsistent ones
- BlockDownloadVerifyInjector:
  - Fix "Indeterminate" test status
  - Submit migration tests to Buffer node instead of MSS
  - For logging, correctly count tests injected when not using the dropbox
- BlockDBSPopulate: migrate blocks in read-write mode in order to migrate parentage
- BlockMonitor: lock block replica table for the duration of the update in order to prevent block activation/deactivation in the middle
- PerfMonitor: SQL optimization

Operator Tools
- OraclePrivs.sh updated to give "flashback" privileges to CERN role
- FileDeleteTMDB bugfix to be compatible with new BlockActivate logic: now expands collapsed blocks before doing file-level invalidation
- RouterSuspend: supports wildcard node names

Changes for Developers
- Included MakePerlDocs.pl, for building developer documentation for agents and schema

Changes for PhEDEx 3.3.0

Dependency Changes
- DBI: 1.5.0 => 1.609
- DBD::Oracle: 1.17 => 1.23
- POE: 1.003 => 1.287
- Log::Log4perl: 1.16 => 1.26
- Log::Dispatch: 2.21 => 2.26
- Log::Dispatch::FileRotate: 1.17 => 1.19
- XML::Parser: 2.34 => 2.36

Changes for Site Agents
- FileDownload FTS backend:
  - Add the 'Hold' FTS job state
  - Prevent flooding logs with warnings about unknown job states
- FileRemove:
  - Optimizations to allow faster processing of large deletion queues. Latencies now ought to be dominated by the time to execute the deletion command.
  - Add "dstats" line to the log files for each file deletion, giving relevant timestamps and deletion latencies
  - Add --timeout option for timing out deletion commands
- FileMSSMigrate:
  - provide file checksum to migration checking routine
  - allow default Castor routine to check migration of non-custodial data
- BlockDownloadVerify
  - Add --queue_length argument, to tweak the number of tasks to fetch from the DB per cycle
  - Rework test retry logic to work better with large queues. File existence tests are allowed to fail up to 10 times and are retried with a lower priority than untried tests.
  - Correctly expire requests (#58378)
  - Correctly set the interval to check (#58223)
- Watchdog
  - New --limit argument, for restarting agents when memory or CPU thresholds are reached.
  - Master: correctly ignore "stop" when the agent was never started
  - TMDBInject: updated SQL to protect against a bad query execution plan

Changes for Central Agents
  - File Routing:
    - Allow path-probing probability to be configured with --probe-chance (#51482)
    - Fix priority mismatch when adding router statistics to t_history_link_stats (#47059)
    - Change "priority window" accounting so that it is no longer a strict 10 TB per priority. Requests which have more than 5 routing attempts no longer count against the queue limit, giving new files an opportunity to be routed. This prevents a destination from being blocked when the request queue has been completely filled by unroutable or untransferable files. (#32517)
    - Put "forgotten blocks" into a special suspension state so that the router can forget about them for a while. Blocks which have over 50 attempts for every file and were activated at least 30 days ago are put into this suspension state, deactivating them for 15 days.
    - Better rounding of filesize when calculating cost prototypes, preventing small files from having an exaggerated latency. (#54285)
    - Failed routings have their request set to a distinct inactive state in the database. Totals of failed routings are written to the logs. The list of failed routings is no longer stored in memory.
    - In the slow flush (every 30 minutes), make 3 invalid paths valid for every source/destination pair. This prevents the router from starving a link completely, guaranteeing a minimum number of queued transfers for each destination. (#47060)
    - Increase the rate threshold for expiration time extension from 1 kB/s to 0.5 MB/s. Also, do not extend the expiration time if the estimated latency over a link is greater than 3 days. This matches the extension threshold with the nominal rate and maximum latency values used for path prototype calculation, and decreases the latency until re-routing for failing links.
    - Deactivate expired requests before removing paths with deactivated requests. This reduces re-routing latency by 0.5 hours.
    - No longer add a random delay to file re-routing after expiration.
    - Randomize path choice when all link parameters are equal, instead of falling back to perl's hash ordering.
    - Correctly count paths per block for t_status_block_path (#30860)
  - PerfMonitor:
    - Fix calculation of recent link performance. First, require at least 1 hour of pending queue statistics before giving a link a rate of "0", in order to give the system enough time to allow a transfer attempt. Second, correctly join the link _stats and _events tables in a way that doesn't require simultaneous entries in each. (#56009)
    - Increase the minimum time window for observing statistics from 1 hour to 2 hours. This enables the "at least 1 hour pending" change above.
    - Optimize slow SQL for "closing" t_history_link_stats
    - Monitor missing files/bytes in t_history_dest
    - Log inserted/updated row counts
  - FilePump: logical replicas are now removed after a deletion by FilePump instead of FileRemove. This reduces lock contention in the database. The logical replica is now deleted *after* the deletion job has finished, instead of before.
  - BlockMonitor: rewritten to optimize and eliminate ORA-08177 errors. The entire DB state is now updated in one transaction, meaning that the block replica table contains a complete and consistent snapshot of the database state at one moment in time. Slight change to monitoring of newly subscribed blocks: the statistics are now updated before the block is re-expanded by BlockActivate. This fixes #58516.
  - BlockActivate: only activates blocks for which node_files equals the total number of files in the block. A mixture of is_active states is allowed.
  - BlockDelete: optimize SQL to reduce cycle times (#59722)
  - InvariantMonitor: change all log output to "Alert". Everything this agent prints indicates some problem.
  - LoadTest generation: set default protocol to srmv2 instead of srm in srm.sh

Operator Tools
  - RouterSuspend: view/unsuspend the "forgotten blocks" now suspended by the FileRouter

Changes for Developers
  - All calls to &dbexec and &dbbindexec now return their execution time.
  - Add PHEDEX_DB_SIM_FAIL environment variable for simulating random SQL statement failures.

Changes for PhEDEx 3.2.10
  - Dependency Change: dbs-client DBS_2_0_6_patch5 -> DBS_2_0_9_patch4.
  - Dependency Change: dls-client DLS_1_1_0 -> DLS_1_1_2
  - Consistent dependency platform for SLC4 and SLC5 builds. Build with PKGTOOLS V00-11-00 (from V00-09-02)
  - Move from 'cms' to 'comp' repository.
  - BlockAllocator: fix bug which prevented properly updating "done" blocks

Changes for PhEDEx 3.2.9

This is a minor release, fixing a few bugs in 3.2.6. All sites should upgrade to this version.

Changes for Site Agents
  - FileDownload
    - fix bug in protection for the rare case of badly-defined tasks. Previously the agent would crash; now it skips the task and moves on. Genuine tasks will be retried in time, so no transfers will be lost.
  - BlockVerify
    - fix bug in handling of TFC lookup when using SRM(2) and the new Namespace framework.
  - FileRemove
    - add an extra commit() to release row-locks early. This prevents the agent from blocking central agents because of these locks.
    - add --no-retry option. The behaviour of FileRemove changed in 3.2.0, such that failed deletions are retried forever. If you do not want that, you can use this flag to revert to the previous one-shot behaviour. Note that if your deletion script returns a non-zero value, the deletion is considered to have failed. If the deletion succeeds, or the file was not there in the first place, your file-deletion script should return zero.
    - change in format/content of deletion-job logfiles. The previous log file name format was $time.$node.$fileid.log; now the $time is omitted, i.e. it is simply $node.$fileid.log. This may affect any scripts that you have for analysing these files. The contents may also differ: if multiple attempts are needed to delete a single file, all attempts are recorded in the same file.

Changes for central/all agents
  - BlockAllocator
    - fix bug relating to suspension of requests
  - Core JobManager will no longer abandon a job that cannot be killed. This is a change in behaviour, and only affects agents that run processes which do not respond to kill -KILL. It means that your agent may now block, whereas previously it would have abandoned the job and moved on. In practice, if this condition is hit, you have a bigger problem to fix anyway. This condition is now logged more clearly.

Changes for PhEDEx 3.2.6

This is a minor release, fixing a few bugs in 3.2.1 and adding a few small features. The watchdog agent is now considered production-ready, and you are encouraged to use it. (N.B. There are no versions 3.2.2 -> 3.2.5)

Changes for Site Agents
  - Watchdog
    - no longer needs the '-config' argument; it uses the config file it was started from instead
    - the watchdog reads the configuration file on startup, and aborts if it cannot do so successfully, since it cannot start any agents in that case.
    - PHEDEX_NOTIFICATION_PORT does not need to be exported in the config-file environment, merely defined, like other parameters.
    - the watchdog aborts if no PHEDEX_NOTIFICATION_PORT is defined. It cannot function correctly without one.
    - the watchdog backs off from trying to start an agent if it fails three times in a row. It waits an hour, then tries three more times, ad infinitum.

Changes for central/all agents
  - The Oracle error 'Can not safely replay action' now causes the connection to be dropped and re-created. In the past this was not the case. This may help in some cases of agents getting stuck (notably the BlockDownloadVerify agent, during DB interventions).
  - BlockAllocator: Optimized. Memory use and cycle time drastically reduced.

Changes for utilities/tools
  - Utilities/Master no longer depends on external Perl modules. This dependency was accidentally introduced when the 'checkdb' argument was added in 3.2.1, and is now fixed.
  - new tool, Utilities/CheckDBConnection. Takes '-db $PHEDEX_DBPARAM' as arguments, and checks that your connection to TMDB is operational. This is used internally by Utilities/Master to implement the checkdb option, but can also be used standalone. It requires the full PhEDEx environment for access to external Perl modules.
  - new tool, Utilities/ping-watchdog.pl. Takes '--config $CONFIG_FILE', attempts to extract the UDP port used to communicate with the watchdog, and sends a message to it. '--help' shows more options. Use this to check that your config file is set correctly for your agents to communicate with the watchdog agent.

Changes for PhEDEx 3.2.1

Changes for Site Agents
  - FileDownload
    - pre-validation is now used by default if '--validate' is given. In 3.2.0 a bug caused pre-validation to happen only if it was explicitly specified on the command line. This restores the intended behaviour.
  - FileRemove
    - several minor bug-fixes related to handling deletions asynchronously. In 3.2.0, the agent could lose tasks if stopped/started while doing work, and could slow down asymptotically if it could not complete its work in one cycle. This release corrects that, and restores the expected behaviour.
    - note to sites: specify '-jobs ' in your configuration to allow deletion tasks to run in parallel. This may improve throughput. 2-5 is probably a good choice.
    - do not set '-limit' too high; it will slow down the agent. The agent executes the next cycle sooner if it had a full load of work in the current cycle, so throughput should be improved.
  - new agent: watchdog
    - this agent is optional; you are not required to run it.
    - this agent watches other agents and restarts them if they are down. Configuration is similar to that of normal agents:

        ### AGENT LABEL=watchdog PROGRAM=Utilities/AgentFactory.pl
         -db         ${PHEDEX_DBPARAM}
         -node       ${PHEDEX_NODE}
         -config     ${PHEDEX_CONFIG_FILE}
         -agent_list download
         -agent_list download-migrate
         -agent_list remove
         -agent_list ...

    - The watchdog only watches the agents it is configured for, checking every 90 seconds. If an agent is down, it is restarted.
    - You must set PHEDEX_NOTIFICATION_PORT=nnnn in your 'common' environment. The watchdog then receives alert and fatal messages from all the other agents, as well as informational messages (startup, shutdown, and hourly process-level statistics). These are all re-printed in the watchdog logfile, so you can check this one logfile to see the health of all your agents. If you run multiple sets of agents on one vobox, choose a different port value per configuration.
    - If an agent does not send a notification to the watchdog within 75 minutes, the watchdog prints a warning to that effect in its logfile. If an agent does not send a notification for 2 hours, the watchdog assumes it is stuck, explicitly kills it, and restarts it.
    - The watchdog agent may be started or stopped at any time, regardless of which agents are running at that time. Start or stop it with Utilities/Master, in the usual manner. You can start it on its own, in which case it will start all the agents it knows about, or you can start all the agents by default, and the watchdog will figure it out. Stopping the watchdog will _not_ stop the agents it monitors.
    - if you stop an agent explicitly using Utilities/Master, the agent no longer removes the stop-file on exit. This lets the watchdog know that you have stopped this agent, so it will not restart it. To restart the agent, either remove the stop file from the state directory and let the watchdog take care of it, or use Utilities/Master to start it.
    - You may be tempted to set agents to DEFAULT=off, so that only the watchdog is started by default. If you do this, you will have to stop your other agents explicitly, by name, because they cannot be stopped by any other means. That is probably not what you want.
    - The watchdog agent monitors the configuration file itself, and re-reads it if you change it. This allows you to add or remove agents from the list of monitored agents without restarting the watchdog itself. N.B. If you import other configuration files in the top-level config file, these are _not_ monitored for changes! Only the top-level config file is monitored. You can 'touch' it to force a re-read if you change a lower-level configuration file.
    - Recommended practice: leave all agents configured as they are, add them to the configuration of the watchdog, and stop/start your agents in the usual way. If you want an agent to be down, stop it with Utilities/Master, then restart it with Utilities/Master when you are done.

Changes for Site Tools:
  - Master: Check DB connectivity by default when starting agents. Use '--checkdb' to check connectivity without starting an agent, or '--nocheckdb' to inhibit the checks

Changes for PhEDEx 3.2.0

Changes for Site Agents:
  - FileDownload:
    - Options for constraining the number of parallel transfers on a per-link basis or by transfer state are now available for all backends, not only FTS. This includes --link-pending-files, --max-active-files, --link-active-files, and --default-link-active-files.
    - --prepare-jobs is obsolete. Pre-transfer tasks (pre-validate, pre-delete) are now queued and executed with a transfer batch, instead of immediately when fetched. This should allow smoother operation of the FileDownload agent.
    - Pre/post transfer tasks (validate, delete) are now put into a priority queue to optimize for faster transfer job completion.
    - FTS backend: timeout for glite-transfer-submit
    - FTS backend: recover from errors when glite-transfer-status gives unexpected output (#46379)
    - FTS backend: group batches by SE-to-SE pairs, as required for FTS jobs
    - SRM backend: new -syntax option, with support for BestMan srmcp syntax
  - BlockDownloadVerify:
    - Add --namespace argument for loading a plug-in from the new namespace plug-in framework. This framework contains plug-ins for querying or interacting with anything that provides a file namespace (e.g. storage system, TMDB). Plug-ins exist for castor, dcache, dpm, pnfs, posix, rfio, srm, and tmdb.
    - --storage-mapping is now optional. If not present, the TFC saved to TMDB is used instead.
  - FileStager: new generic staging agent, based on Custom/Castor/FileStager, which accepts arbitrary scripts for issuing staging submission and status commands from a list of PFNs. Templates added for submission/status scripts for castor, dcache and gfal-based stage-ins.

Changes for Site Tools:
  - ErrorQuery: --lfn option to list the LFNs affected by the errors found
  - Master: add 'clean' command for removing state directories

Bugfixes for Central Agents:
  - protect against file deletion vs. file transfer race conditions
  - FileIssue: fix duplicate error messages for indeterminate destination PFN
  - FileIssue: optimization: only check for TFC changes at 5-minute intervals, instead of for every task
  - LoadTestInject: change generated LFN structure (non-LoadTest07 compatibility mode only) to reduce the number of files that will appear in the same directory.
  - BlockMonitor: prevent "updating flags" log lines for all non-subscribed + collapsed blocks

Changes for Admin Tools:
  - FileDeleteTMDB: fix deletion of empty blocks when provided a list of block names
  - TMDBInject:
    - update to use "version 2.0" XML and bring code in sync with the data service "inject" API.
    - supports injecting multiple checksum types (cksum: and adler32:)
    - note: has lost the ability to inject replicas at non-source nodes.

Documentation:
  - Update ftsmap templates
  - Remove out-of-date README documentation
  - Remove obsolete configuration templates

Changes for Developers:
  - Refactor PHEDEX::Core::JobManager to use POE::Component::Child, bringing PhEDEx command-management into the POE realm
  - Refactor FileDownload and its backends to use POE events to manage their workflow and state transfer, greatly simplifying the logic

* Changes in PhEDEx 3.1.3

Bugfixes for Site Agents:
  - BlockDownloadVerify:
    - fix crash with un-'prepare'd DB handle
    - replace srm with srmv2 protocol

* Changes in PhEDEx 3.1.2

Bugfixes for Site Agents:
  - FileDownload:
    - pre-validate false-positive bug fixed. A pre-validation script which was killed by a signal (either from an external command or a hangup signal due to the job timing out) would incorrectly be evaluated as a success, resulting in inconsistencies.
    - return codes fixed. 3.1.1 overloaded report-code=-2 xfer-code=-2, which could mean either a vetoed transfer or a lost task. Now vetoed transfers are -86,-2 and lost tasks are -2,-2 as always.
    - fix a bug where expired tasks would still be sent to the transfer backend
    - custodial flag now sent to post-validation as well as pre-validation
    - expired tasks are now logged in the log files, but not in the database error tables. This prevents the database error tables (visible from the web page) from being filled with expiration errors for a link
  - BlockDownloadVerify:
    - add a -preload argument, the value of which sets the LD_PRELOAD environment variable for dCache interactions.
    - protect against crashes due to transient database problems

Bugfixes for Central Agents:
  - fix bug where group changes were not applied to inactive blocks
  - BlockDBSPopulate:
    - configurable timeout for commands
    - fix bug where deletion/retransfer issued oscillating deletion/migration commands

Changes for Tools:
  - InspectPhedexLog: fix for bug #37073: trailing whitespace removed from classification
  - FileDeleteTMDB:
    - redefine file invalidation syntax from "-nodes all" to an explicit -invalidate option
    - error-check arguments relating to -invalidate and -bulk
    - separately look up containers in order to allow deletion of empty datasets/blocks
  - BlockConsistencyCheck:
    - -preload argument, as in BlockDownloadVerify
    - -incomplete-blocks option; by default only completed blocks are checked

* Changes in PhEDEx 3.1.1

Bugfixes for Site Agents
  - FileDownload: 'execute_array failed' error when uploading the error report fixed. This prevented all agents from uploading error reports and resulted in even successful transfer tasks being thrown away.
  - FileRemove documentation corrected; -storagemap is no longer a valid option
  - Custom migration agents (FileCastorMigrate, FiledCacheMigrate, FileSRMMigrate) removed, as they are obsolete in favor of FileMSSMigrate.

Bugfixes for All Agents
  - 'uninitialized value ... Logging.pm line 107' warning fixed

* Changes in PhEDEx 3.1.0

This is a major upgrade over the 3.0.x series of PhEDEx, and is a mandatory upgrade. There is a new schema which implements custodiality and user-group management. Old agents and tools are not necessarily backwards compatible; some agents have new or deprecated options, so your configurations need updating. T1s need to consider their TFCs carefully in order to implement correct handling of custodial data.

Changes for All Agents
  - allow easier debugging by optionally sending all error messages and alerts to a UDP port, specified by PHEDEX_NOTIFICATION_HOST and PHEDEX_NOTIFICATION_PORT in the environment. Alternatively, NOTIFICATION_HOST and NOTIFICATION_PORT can be specified per agent. You will need a UDP client to listen for the messages; we do not yet provide one in PhEDEx.

Changes for Central Agents
  - FileIssue no longer issues the PFNs for the source and destination of transfers. These are instead calculated in the download agents themselves, and FileIssue simply checks whether the transfer is possible.

Changes for Site Agents
  - FileMSSMigrate now takes an optional --auto-migrate-noncust argument, which takes no value. The default is false; specifying this argument causes the agent to assume that all non-custodial data is 'migrated' automatically, and to flag it as such. Also, the checkFileInMSS function signature has changed to checkFileInMSS($pfn,$is_custodial), i.e. there is an extra argument to denote custodiality (values 1 or 0).
  - FileRemove no longer takes '-storagemap'; instead it gets its catalogue from TMDB. This ensures consistency across all agents, so they all use the same version of a site's catalogue. Note that FileRemove is agnostic to custodiality. This is probably not a problem.
  - FileDownload now supports pre-validation of transfers, using the same validation script for pre- and post-transfer validation. The pre-validation case is distinguished by the 'pre' string passed as the transfer status. Several new arguments are added to control the behaviour:
        -no-pre-validate    disable pre-validation
        -no-pre-delete      disable pre-deletion
        -util-jobs=i        max number of jobs to run in parallel
        -util-timeout=i     timeout per job
        -prepare-jobs=i     number of jobs to queue for preparation
    FileDownload also calculates the PFNs for source and destination, instead of taking them as input. This allows rapid response to changes to a site's TFC, and allows immediate termination of data transfer from a site by removing its TFC from TMDB. Equivalently, stopping your export agent will soon mark the TFC as stale, and transfers will halt. FileDownload also passes a custodiality flag to the pre-validate script, which now takes to_pfn, filesize, checksum, and is_custodial (0|1).
  - FTS backend no longer takes a global spacetoken option. Spacetokens are expected to be retrieved from the TFC only. Failure to open an FTS mapfile is now a fatal error.

Changes for Tools
  - new Utilities/ShowAgents, shows the status of agents at sites
  - FileDeleteTMDB: remove whitespace around searches
  - InspectPhedexLog: bugfixes for XML format and start/end time calculations
  - TestCatalogue: Add -C option for custodiality (y|n). This is needed for the round-trip PFN->LFN->PFN lookup, since knowledge of custodiality is lost in the PFN->LFN lookup.
  - phedex: correct case-setting for keys in different formats; friendlier error messages unless -debug is set.
  - phedex subscribe: add -custodial, -group, and -request-only options

* Changes in PhEDEx 3.0.7

Changes for Site Agents
  - FileDownload FTS backend:
    - Fix 3.0.6 bug which allowed unrestrained submission of transfer jobs
    - Count "Submitted" state against pending and active totals
    - Mark all jobs as active as long as the monitor knows about them. This effectively makes "agent lost the transfer" errors impossible.
    - Allow -jobs to be set, if the user wants to limit the number of jobs directly. The default is 0, meaning unlimited.
    - Fix logged FTS password bug
  - BlockDownloadVerify: reduce cycle time to 30 seconds

Changes for Tools
  - InspectPhedexLog: allow in-line repetition for srm regexp
  - FileDeleteTMDB: escape underscore characters when using wildcards in LFN/Block name
  - MakeDailyReport: avoid using separate events to compute ratios in the transfer quality columns

Deployment Changes
  - Fix Oracle libs installation problem on athlon architectures
  - Explicitly set execution bit where needed

* Changes in PhEDEx 3.0.6

Changes For All Agents
  - process statistics are now included in all agents, making it possible to spot memory leaks etc. The interval for reporting statistics is controlled by STATISTICS_INTERVAL, and the level of detail by STATISTICS_DETAIL. Look for AGENT_STATISTICS in your logfiles.

Changes for Site Agents
  - FileDownload agent bugfixes...
    - for 'agent lost the transfer'. The short story is that the switch to asynchronous glite-* commands was not fully debugged in 3.0.4, and now it is. Hopefully.
    - the debug printout of the agent-dump at the beginning of the job did not censor the FTS password (which had been read from disk by the time the dump was made). Now it is properly censored. General advice: don't give your logfiles to anyone you don't trust!
    - pre-delete hook now restored, as it was in the 2.x series of PhEDEx.
    - submission errors are now correctly reported, instead of being mangled.
    - debug printout streamlined to make future problem-solving a little easier
    - new backend option, '--job-awol', takes an integer number of seconds. If the monitoring is unable to determine the status of a job after this length of time, the job is abandoned. This is _not_ a timeout on the glite-status command; it is the total time period for which successful monitoring of a given job has failed, however often that may be. It is set to zero (inactive) by default; if you want to gain experience with it, start with a high value (3600?).
    - per-glite-command options can now be passed in the configuration file. E.g.: -glite-options Submit='options to glite-submit'.

Changes for Tools
  - Utilities/phedex packaged for the first time, along with the PHEDEX::CLI modules it relies on. This is the PhEDEx command-line tool; it is not yet intended for general use, but we need to learn how to package it.

* Changes in PhEDEx 3.0.5

Changes For All Agents
  - Logging is now done using the Log4Perl module. A default configuration file is used, which makes this change transparent. By defining a new $PHEDEX_LOG4PERL_CONFIG variable one can override this default file and change the logging format or define new log files. Changes to the logging configuration take effect dynamically when HUP is sent to the agent process, or when the agents are restarted. See perl_lib/PHEDEX/Core/phedex_logger.conf for more details and guidelines.

Bug Fixes For Central Agents
  - RequestAllocator: duplicate key error fixed
  - BlockAllocator: Oracle bug workaround; now correctly deletes block-level subscriptions when a dataset-level subscription is made
  - BlockDownloadVerifyInjector: historic use of wrong date formats now corrected
  - BlockDBSPopulate: fix for DBS not updating on deletion, and increased DBS migration timeout
  - PerfMonitor: correctly report "Resident Data" for plots

Changes for Tools
  - Toolkit/DBS/DBSInvalidateFile: added block invalidation
  - Toolkit/DBS/DBSMigrateDataset: check if block is already migrated; clean up SEs
  - Utilities/BlockConsistencyCheck: fixed documentation
  - Utilities/BlockDownloadVerify-injector.pl: add --incomplete option to allow taking all blocks; the default is now to take only completed blocks. Add --force option to allow injecting duplicate test-requests
  - Utilities/MakeDailyReport: improve memory usage
  - Utilities/StorageConsistencyCheck: fixed documentation

New Tools
  - Utilities/ProcMon.pl: watch all processes belonging to a given user for activity above a threshold. Use it for spotting runaway cronjobs or shells belonging to a given user

* Changes in PhEDEx 3.0.4

Changes for Site Agents
  - FTS backend now calls glite* commands asynchronously to prevent failed commands from blocking the agent
  - FileDownload: faster reconnection attempt when an error has made the database handle invalid

New Central Agent: RequestAllocator
  - This agent expands requests for data with wildcards (only '*' is supported) into the matching subscriptions. This means that one can make a request for /A/*/* and get all of primary dataset 'A' forever, or until the request re-evaluation is cancelled.

Changes for Central Agents
  - FileIssue: Improve block completion by explicitly ordering tasks by block. Centrally, tasks are now ordered by priority, then request creation (newer first), then block id (approximately newer first)
  - BlockDeactivate: fix default HOLDOFF time
  - BlockDBSPopulate: bugfix to correctly fetch deleted blocks
  - BlockDLSUpdate: prevent cached records of deletions from preventing DLS update in the case of a retransfer followed by another deletion
  - BlockDelete: allow immediate re-deletions by removing file deletion requests for completed block deletions

Changes for Utilities
  - TMDBInject: change hardcoded T0_CERN* protection to T0_CH_CERN for new naming convention
  - BlockConsistencyCheck: make -lfnlist optional, document -lfnlist
  - ErrorSiteQuery, InspectPhedexLogs: can now produce XML output
  - fts-transfer.pl: now uses asynchronous glite* commands
  - ErrorQuery: add new patterns

Schema Changes
  - Add foreign key to source request to subscriptions and block deletions
  - In tnsnames.ora, synchronize cms_transfermgmt* accounts with official CERN tnsnames.ora
  - Fix bugs in schema deployment script

* Changes in PhEDEx 3.0.3

This is another minor upgrade over 3.0.2. There are no significant changes in functionality, but a number of small bugs have been fixed. This affects in particular the logging information available and the functionality of some of the utilities.

Changes for all Agents
  - PHEDEX::Core::Logging fixed to support procedural calls everywhere, so non-agent scripts are no longer broken (TMDBInject et al.)
  - PHEDEX::Core::DB fixed to use procedural access to Logging.pm, for the same reason
  - PHEDEX::Core::Agent fixed to suppress a spurious '1' printed at the beginning of the line when PHEDEX_DEBUG is set
  - BlockDownloadVerify: a bug in the 'use_srm' option has been fixed

Changes in the FTS backend
  - Further minor improvements in logging information

Changes in other tools
  - InspectPhedexLog error-patterns improved
  - new tools, Utilities/ErrorQuery and Utilities/ErrorSiteQuery. See the built-in help for details

* Changes in PhEDEx 3.0.2

This is a minor upgrade over 3.0.0. No significant changes in functionality have been made; some small bugs have been fixed.

* Changes in the FTS backend
  - spurious warnings about orphans have been suppressed
  - logging information sent to TMDB has been improved. This is necessary for the log-mining to be able to produce sensible results.

* Changes in PhEDEx 3.0.0

This release is a mandatory upgrade. The main changes with respect to the pre-release series are a number of minor fixes and enhancements.

* Changes in the FTS backend
  - option to specify the maximum number of active files per link
  - options to pass the name of a myproxy server, an FTS password file, or an SRM spacetoken to the FTS server.
  - job-state is saved and recovered across agent restarts
  - improvements to the logging, making more relevant information available.
  - verified that the SRM backend was not broken by these changes

* Changes in PhEDEx 3.0.0.pre13
  - Fixed a bug with the wrong version of FTS.pm in pre12.

* Changes in PhEDEx 3.0.0.pre12

Changes for All Agents
  - minor typos/bugfixes w.r.t. 3.0.0.pre7 corrected.
  - other agents which had not yet been 'consolidated' have been split out into separate modules, so at least the basic consolidation is done. This makes it possible to run them all as multiple agents per process, rather than some still requiring their own process just because their modules had not been split out.

* Changes in PhEDEx 3.0.0.pre7

Changes for All Agents
  - new Agent infrastructure based on the Perl Object Environment (http://poe.perl.org/). This introduces a new dependency, and a new RPM is provided to satisfy it. Installation should proceed normally.
  - not all agents are using the POE framework yet; there is a POEAgent.pm for those that do. POEAgent.pm should be a drop-in replacement for Agent.pm for all agents; this simply hasn't been tested everywhere yet.
  - running multiple agents in a single process is now possible with no changes to the agent beyond using the POE framework. Agents in the same process can share a database connection or not, configurable on a per-agent basis. Configuration-file syntax is compatible with the existing syntax; only the semantics change. We are testing this at CERN; documentation will follow.

Changes for Site Agents
  - new FTS backend. This is a significant rewrite of the original. The SRM backend is unchanged. The new agent is in Toolkit/Transfer/FileDownload2; this is a temporary measure until it is fully tested. It has the following features:
    - per-link monitoring of transfers, with statistics and an 'isBusy' function calculated per link. This makes it practical to once again run only one FileDownload agent per site, instead of one per link.
    - asynchronous file-based monitoring, not synchronous job-based. FTS servers are monitored at a fixed rate, independent of the number of jobs submitted, so the FTS server will never be overloaded.
    - per-file and per-job callbacks allow the FTS backend to react to state changes as soon as they become known, not only after job completion.
    - the number of jobs is no longer an inflexible limit. The process uses the actual link-state to determine whether it is safe to submit more transfers.
  - eliminates the use of the transfer-wrapper and ftscp scripts, which will greatly reduce the number of processes running on a busy VOBOX.
  - uses the POEAgent infrastructure.

Unfinished features
- FTS backend
  - the FTS backend does not support SRM yet.
  - nor does it remember jobs in the queue across stops/restarts.
  - some effort is doubtless needed for optimisation.
- SRM backend
  - we need confirmation that nothing has been broken.
- Multiple agents per process
  - agent reporting in the Component::Status page is not yet perfect.
  - better monitoring is needed to make sure that one agent does not starve another of CPU (POE is cooperative multi-tasking, not pre-emptive).

* Changes in PhEDEx 2.6.3

Bug Fixes For Central Agents
- BlockDLSUpdate: Fix bug in update of deleted blocks which resulted in them not being reported to DLS

Changes for Site Agents / Tools
- TMDBInject: Fix problem where some character data was injected as type NCHAR, resulting in extremely slow injections
- InspectPhedexLog: extended date regexp, minor changes in output (per-site rates, better formatting)
- BlockConsistency: Fix forced expiry which prevented tests from being performed
- BlockDownloadVerify-report.pl: provide a sensible default for age, update help
- StorageConsistencyCheck: added an option to check an LFN list against the replicas in a given storage element
- Master: on 'stop', wait for processes to be terminated before exiting

T0 / CAF
- Special code for supporting the CAF transfers added: Utilities/stagercp, Utilities/wrapper_rfcp, PHEDEX::Transfer::Stager
- BlockDBSPopulate added: New agent that migrates blocks to a target DBS (usually local) once the block is completely transferred to the node (SE).
Et Cetera
- Toolkit/DBS/DBSInvalidateFile: Added option to change the status of a whole block
- Toolkit/DBS/DBSMigrateDataset added: Helper script to migrate a dataset between DBS instances

* Changes in PhEDEx 2.6.2

Bug Fixes for Site Agents
- Fixed the "misconfigured -accept" bug in FileDownload, which locked up the entire system if an -accept or -ignore flag contained nodes that do not exist in the database.
- Bad -nodes, -accept, or -ignore arguments are now a fatal error, killing the agent
- Better argument checking for state and logfile directories; missing directories are now a fatal error

Changes for All Agents
- The PHEDEX_DEBUG environment variable can be set to turn on DEBUG output for agents. For all agents, at least the cycle time (for idle()) and the sleep time will be printed to the logs.
- The PHEDEX_VERBOSE environment variable can be set, for future use. Most agents don't have special verbosity levels at this time.

New Features for Central Agents
- Block latency logging done by BlockAllocator

Bug Fixes for Central Agents
- BlockDelete ensures distinct results from move-triggered deletions

* Changes in PhEDEx 2.6.1
- No changes; just a fix to the dependencies configuration

* Changes in PhEDEx 2.6.0

Code Consolidation
- BlockActivate, BlockDeactivate, BlockMonitor, BlockAllocator, BlockDLSUpdate split into Perl modules
- Toolkit/Common code deprecated; all dependent code now uses the PHEDEX::Core::* library

New Features for Site Tools
- ftscp updated for use with SRM v2.2 (-token option added)
- Custom/Template/FileDownloadSRMVerify updated for use with SRM v2.2 (option -d made compatible)
- Utilities/StorageConsistencyCheck and Utilities/BlockConsistencyCheck work with single LFNs or lists of LFNs from the command line.
Bug Fixes for Site Tools
- Master can now be called from an absolute path
- FileDownload purges tasks before filling the backend, as an optimization
- Removed the unnecessary -accept and -ignore options from FileRemove

Schema and Central Agents
- BlockMonitor works in groups of blocks, saving memory
- BlockDeactivate works in groups of blocks and deactivates more frequently than before
- Fixes in BlockDeactivate mean LoadTestCleanup will be more effective

Dependencies
- Dependency on the srmcp package removed; a new v2 srmcp package will be made available as a separate install
- dls 1_0_3 -> dls-client 1_0_4

* Changes in PhEDEx 2.5.4.2

New Features for Site Tools
- /Toolkit/Transfer/FileRemove changes
  - New '-limit' option limits file deletion attempts per minute. The default is 100.
  - Delete TMDB replica information from the buffer node when deleting for an MSS node.
- Utilities/Master: rewritten according to Consolidation project guidelines, including POD documentation. The changes are fully backwards-compatible, with a few new features:
  - use '--dummy' on the command line to get a printout of the action, for all actions (not just 'start')
  - use the 'jobcount' command to get the total number of jobs that will be run in a given configuration.
- Utilities/BlockDownloadVerify-report.pl now offers filters on the age of the reported test and on block name

Schema and Central agents
- Toolkit/Verify/BlockDownloadVerifyInjector now cleans t_dvs_files of very old checks, to prevent the table from growing too big.
- Toolkit/Workflow/BlockAllocator and Toolkit/Workflow/BlockDelete completely rewritten:
  - Improved logging messages about work that was done
  - Bugs in handling of deletions/moves fixed
  - Moves are now always done on the block level. Moves of datasets trigger deletion of the source blocks when the replication is complete.
  - Moves now only remove source blocks for sources with no subscription. The web interface handles the request and removal of the source subscription.
  - Moves will not delete data from a T1.
  - Moves "expire" 1 week after replication is done. Finished or expired move subscriptions automatically turn into replica subscriptions.
- Rewritten schema for web requests

* Changes in PhEDEx 2.5.4.1

New Features for Site Tools
- BlockDownloadVerify agent released for the first time. For now it only does namespace checks of file size and of migration at Castor sites, plus a placeholder for DBS checking, which will follow soon
- BlockDownloadVerify and associated tools moved to the new Perl module structure. PERL5LIB must be set correctly; the RPM installation should take care of this automatically
- InspectPhedexLog: expired transfers are no longer counted in the success rate. Added an option to inspect files with multiple failures (-m flag); this helps to identify reasons for long transfer tails that lie outside the PhEDEx algorithms (resource-related problems)

Schema and Central agents
- BlockDLSUpdate and BlockDownloadVerifyInject now both trigger insertion of test requests, either on block closure (BlockDLSUpdate) or when files are not moving on the LAN or WAN (BlockDownloadVerifyInject)
- LoadTestInject agent now only throttles injections when there is enough data to transfer at the rate for 6 hours
- BlockDLSUpdate agent remembers successfully issued dls-add commands and does not re-execute them every cycle. Additionally, log output is reduced by saving only one log file per SE name / block ID pair instead of one log file per error (timestamped).
- BlockDLSUpdate now uses routines from the BlockDownloadVerify::Core module, so it too needs PERL5LIB to be set up correctly

Bug Fixes
- InspectPhedexLog fix for start/end dates when running over multiple logs

* Changes in PhEDEx 2.5.3.6

New Features for Site Tools
- No changes wrt. PhEDEx 2.5.3.5

Schema and Central agents
- FileIssue does destination-match instead of remote-match for TFC rules
- FileRouter random probing fixed for the case where the link is of the worst possible quality
- Centralised LoadTest injection and management added

Operator's Tools:
- No changes wrt. PhEDEx 2.5.3.5

Bug Fixes
- No changes wrt. PhEDEx 2.5.3.5

* Changes in PhEDEx 2.5.3.5

New Features for Site Tools
- LoadTest07Inject is now packaged with PhEDEx, with new features with respect to the other versions floating around. By default this version closes blocks once they reach the user-specified file limit (default 100) and marks datasets as "transient". This allows the blocks to collapse when they are finished, and another agent (LoadTestCleanup) will incrementally clean up finished blocks.
- New LoadTest file creation scripts
- Addition of the FileMSSMigrate agent, which combines separate backend scripts (Castor, dCache) for file migration into one agent. This is now the officially supported way of running a migration agent.
- Template of FileDownloadSRMVerify: added code compatibility with SRM 2.2 systems
- InspectPhedexLog
  - Canceled/Expired files treated separately.
  - Simple histograms on a per-error-class basis

Schema and Central agents
- FileRouter more strictly enforces transfer window limits
- FileRouter now fills priority windows to WINDOW_SIZE per priority level. WINDOW_SIZE can be specified by a command-line argument
- FileRouter observes the 'is_active' column in the links table, allowing us to enable and disable links easily
- FileCastorStager.New is renamed to FileCastorStager. There is no FileCastorStager.New anymore!

Operator's Tools:
- FileDeleteTMDB is made more robust, with optional bulk deletion and easier-to-use block expansion
- DDTLinkManage for enabling and disabling links
- BlockConsistencyCheck code cleaned up, SQL improved, some output moved to STDERR to keep STDOUT cleaner, online help improved.
Bug Fixes
- Minor documentation updates in tools and agents
- FileDownload does a sanity check for node existence in the DB

* Changes in PhEDEx 2.5.3.4

Schema and Central agents
- FileRouter now keeps a status table of paths from source to destination, aggregated per block. These statistics will be available via the web and can show the current routing status, the number of transfer attempts per block, and the time a block has been waiting to be transferred.
- FileRouter no longer writes multi-WAN-hop paths to the DB.
- InfoStateClean now keeps the 100 most recent errors per link

New Features for Site Tools
- ftscp: Added the ability to specify a MyProxy server by adding the flag '-m=myproxy-fts.cern.ch'. Made -passfile optional, to adapt to the passwordless proxy delegation mode of FTS2.0
- BlockConsistencyCheck: Allow an explicit Migration check, as opposed to an implicit check by selecting SE. Better printout. Support LFN and autoBlock options.

Bug Fixes
- ftscp: Fixed a small bug in the error message when invalid arguments are given. Fixed argument parsing of the -mode option
- StorageConsistencyCheck: Fixed reading of STDIN
- InspectPhedexLogs: Fixed problem of ignored date options. Fixed bug in histogram printout. Started parsing of validate failures for errors without details.

* Changes in PhEDEx 2.5.3.3

Bug fixes
- Fix status checking in ftscp script to handle Finished FTS state

New tools
- Utilities/StorageConsistencyCheck and Utilities/BlockConsistencyCheck for comparisons against TMDB

* Changes in PhEDEx 2.5.3.2

Bug fixes
- Fix exit code of ftscp to handle Finished FTS state
- Fixes to FileRouter to stop dead path build-up

Dependencies
- dbs_client 1_0_5
- dls 1_0_1

* Changes in PhEDEx 2.5.3

General changes:
- Updates to ftscp script to support new FTS2.0 states and to support multiple endpoints, currently via a static map file.
- Updated FileDownload to reflect changes in ftscp
- Updated FileRemove to not activate blocks
- Updated InspectPhedexLog
- Added InvariantMonitor to central agents to detect problems and inconsistencies in the database
- Improved reporting from UtilsDB
- Reduced number of database connections in DBSCheck

Web site changes:
- Not released in RPMs. Added Data Manager to the site

Schema changes:
- Changes necessary for the updates to the website
- Removal of site/user information (now stored in SiteDB)

* Changes in PhEDEx 2.5.2

General changes:
- Prevent expiration of tasks in the queue for well-working links, defined as links averaging more than 1 MB/s.
- Partial fix to the memory leak in FileDownload (reduced leakage).
- Fixed an issue with partial paths being left over in TMDB.
- Improved argument checking for backends.
- Preparation for a proper FTS backend that allows per-file tracking. Full implementation scheduled for 2.5.3.
- PerfMonitor can now handle already-compressed data when compressing.
- TMDBInject will ignore T0 when called via CERN-SE. Injection into T0 will be done by providing the node name explicitly.
- BlockDLSUpdate can now handle DBS-2.
- Improved FileExporter config in template files.
- Documentation updates for some agents.

Web site changes:
- Not released in RPMs.

Schema changes:
- Improved agent feedback to TMDB.

* Changes in PhEDEx 2.5.1

General changes:
- BlockDLSUpdate now supports block de-registration in DLS.
- Infrastructure agents now make the system more reactive to dead links and to sites recovering from previous errors.
- Corrected some issues with the template configuration files.
- FileDownloadDelete returns a proper exit code.
- Prevent the gLite environment from overwriting the proxy location.
- FileDownloadVerify now supports DPM via dpns-ls.
- TestInstallation script updated to support version 2.5.x.
- Bug fixes in FileExporter and PerfMonitor.
- Reduce DB load caused by FileRemove agents.
Web site changes:
- Not released in RPMs.

Schema changes:
- Prevent closed loops between FilePump and FileIssue agents.
- Update table privilege settings.

* Changes in PhEDEx 2.5.0.1

General changes:
- Correct -accept agent option handling.
- Stop BlockDelete from being chatty.
- Improvements to template config files.
- Removal of obsolete scripts and templates.

Web site changes:
- Not released in RPMs.

Schema changes:
- Combine all site registrations into a single script.
- Relax constraints on identities.
- Update table privilege settings.

* Changes in PhEDEx 2.5.0

The changes for this release are too numerous to list in detail. Hence, what follows is a brief overview of the main features and improvements.

General changes:
- Transfer workflow optimized, so that sites need to run fewer agents. Most sites will only need three agents: FileDownload, FileRemove and FileExport.
- Individual agents can run and complete tasks for multiple sites.
- New agent work scheduling, leading to:
  - Reduced DB load.
  - Reduced number of parallel DB connections.
- Agent shutdown doesn't interrupt running transfers and allows draining of the task queue.
- Improved fair share, taking into account link performance and preventing starvation of slower links.
- Logs of failed transfers accessible via web page.
- Transfer prioritization.
- Only one cool-off, with reduced cycle time. No resetting needed anymore.
- PhEDEx now keeps a record of the SE names for each node.
- BlockDLSUpdate can run centrally for multiple sites.
- Subscriptions now possible for full datasets and individual blocks.
- Improved robustness of agent communication with TMDB, preventing agents from blocking each other.
- PhEDEx now benefits from running on a DB cluster.
- Tools for automatic synchronization with DBS.

Web site changes:
- Subscription allocation, modification and deletion via web page for site administrators authenticated by grid certificates.
- Improved status and summary web pages, including better plotting.
- New sites can now sign up via the web site.
- Various improvements to the layout of the web interface (e.g. positioning of scroll bars).

Schema changes:
- General performance improvements by adding indexes and partitioning tables.
- Improved data consistency by adding constraints to most of the columns.
- t_xfer_* tables changed to reflect new transfer status tracking, which avoids table updates in order to prevent DB locks.
- Reorganized status tables t_history_* and t_status_* and put them in a separate module.
- Improved performance of OracleStatsUpdate.
- Renamed administrative tables t_node and t_link* to t_adm_*.

* Changes in PhEDEx 2.4.2

General changes:
- Corrected FileDeleteTMDB deletion of local files.
- DropLoadBalance updated; the agent did not work at all.

Web site changes:
- Corrected "interesting rows" selection on replica pages.
- Subscription page now points to replica page.

Schema changes:
- TNS service names updated to allow use of streaming extensions required by database applications other than PhEDEx. PhEDEx service definitions enable the TCP keep-alive option for improved detection of lost database connections.

* Changes in PhEDEx 2.4.1

General changes:
- BlockDLSUpdate now accepts, as the argument to '-se', an ASCII text list containing pairs of nodes and storage elements. The old form of one '-node' and one '-se' is still supported.
- FileDeleteTMDB now allows sites to delete replicas from their site. Detailed instructions can be found in README-DeleteReplica.txt.
- FileRouter now suppresses multihop transfers.

Web site changes:
- Website was rewritten.
- Extended documentation on how to set up the PhEDEx web service can be found in Documentation/Webconfig.

Schema changes: none.

* Changes in PhEDEx 2.4.0

General changes:
- New BlockDLSUpdate agent to add fully transferred blocks to DLS. Please see README-Deployment.txt for configuration details.
- FileCastorStager.New has an option to throttle back stage-in to avoid Castor overload. Use option "-protect" with an external script to detect overload.
- FileDeleteTMDB updated to also work by blocks, to accept wildcards for node names, to remove files from the storage and to show more progress feedback. The script now acquires the required locks to protect data integrity.
- Default site configuration templates (Custom/Template) updated.
- Minor bug fixes to the SRM stager agent, daily report generation, and the block monitoring and deactivation agents.
- T0FeedTest testbed to generate and feed large file samples.

Web site changes: none.

Schema changes:
- tnsnames.ora updated to include other CMS databases.

* Changes in PhEDEx 2.3.12

General changes:
- The DropTMDBPublisher -node option is now optional. If the agent is started without this option, and therefore without a default node to inject files into, each drop is required to have a non-empty PhEDEx-Nodes.txt. This prevents a production system malfunction from accidentally registering files for the wrong node in TMDB.
- FileRouter now routes for nodes running just export agents but no download agents. This allows a site to participate as a source.
- New FileDeleteTMDB for removing replicas and files from TMDB. Includes many features from Brian Bockelman's retransfer tools.

Web site changes:
- PHP scripts now hard-code the TTF font file path to avoid issues when the AFS web site is automatically recreated from CVS.

* Changes in PhEDEx 2.3.11

General changes:
- Added documentation for running in the LCG VOBOX environment.
- Obsolete DropMTCCPublisher updated from the last used version.
- DropTMDBPublisher now takes "Options.txt" in the drop, which can include "strict", "!strict", "verbose", "!verbose" options to TMDBInject. DropTMDBPublisher now requires the data file name to end in .xml (previously .xml and .txt).
- TMDBInject now adds the DLS attribute for a DBS.
- FileRouter routing threshold increased from 1 TB to 5 TB.
  The transfer expiry time is now automatically extended by two hours on links performing well enough. That is, transfers expire at the set limit only if there is no recent progress.
- Added a missing table name alias in BlockActivate.
- NodeNew now only creates new nodes. New NodeRemove.
- LinkNew, LinkRemove are new tools to administer links.

Web site changes:
- Preliminary work on updated transfer request handling.
- Infrastructure for future web site changes.
- Better coexistence of multiple schema versions.
- Cumulative transfer plots better match the transfer rate page.
- Considerably reduced memory usage in the PHP scripts.
- Bugs fixed in TTF font handling for newer web servers.

Schema changes:
- t_dps_dbs table now includes DLS contact as well.
- Preliminary work on updated transfer requests (t_req_*).
- Preliminary work on administrative tables (t_adm_*).
- Various typo and interference fixes in *.sql files.

* Changes in PhEDEx 2.3.10

General changes:
- Various bug fixes and improvements to the dCache migration agent.
- Download agent now records the error message printed into the logs ("alert:" line) in the transfer state table, so remote sites can see the details of why files are in error states.
- File router now requires a live download agent at the destination and a live export agent at the source to consider a link usable.
- File router now routes files by earliest estimated time of arrival, aka "dynamic routing". The time estimates are based on recent transfer rate performance.
- Performance monitor now guarantees "closed" link performance statistics: for links with previous non-null statistics, a row of null performance data is inserted to guarantee a drop to "unused" state. Also, if there is no pending traffic on a link, the transfer rate is reset to null (as opposed to zero).

Web site changes:
- State detail and transfer rate pages allow node selection.
- The transfer states are now shown as names instead of numbers.
- Plot pages show one plot at a time, and plot names are better.
- PHP scripts doing plots now consume massively less memory.
- Transfer state detail page now offers a link to the files in each state combination. For files in transfer this shows the UTC time of the transfer start, and for files in error states, up to 4000 characters of the (last) full error message as printed to the download agent logs at the destination site.

Schema changes:
- t_xfer_state adds column 'last_error' for the first 4000 characters of the last error message (if any).
- sqlplus login script sets the variable substitution character to ':'.

* Changes in PhEDEx 2.3.9

General changes:
- Further improvements to the site configuration template. There are now better defaults and much more extensive documentation.
- Continued bug fixes and tuning for time-out handling. Agents now remember which subprocesses they have timed out, to ensure these really get the intended grace time to exit.
- Agents now redirect input from /dev/null instead of closing it.
- DropTMDBPublisher updated for TMDBInject changes.
- File router no longer accesses t_link_param, as part of the migration to the new link performance estimates. Previously failed transfers are given slightly more time on each subsequent attempt (8-12 hours instead of the previous 8 hours of transfer time).
- FTS transfers no longer abort on "Hold" status. A spurious status will lead to cancelling the job, so we don't leave behind junk.

Web site changes:
- Rate monitoring page now reports estimated link rate and latency.
- Tier-2 workshop documents added to the list of documents.
- PHP scripts given even more memory.

Schema changes:
- Updated Oracle server details.
- A trigger now removes the matching file request when a new replica is created. This should avoid the race condition between router and recycler where we deleted both the file request and the file replica.
- t_link_param recreated with values for estimated link performance.
- t_link_histogram now records the then-current link parameters.

* Changes in PhEDEx 2.3.8

General changes:
- Considerably improved site configuration template.
- Database handling changes:
  - Try harder to detect dead / unusable database connections.
  - When LogSQL is "on", print null values nicely.
- Download agent changes:
  - Post-transfer deletes on failed transfers weren't working; fixed.
  - Timed-out transfers now get 30 seconds to exit per signal sent.
- Improvements to a configuration using a pure SRM interface:
  - Download verification script trusts the exit code from the copy, because checking download success using SRM was too expensive. Now also passes certificate proxy options to the srm-* commands.
  - Deletion script only deletes after a failed transfer, no longer before every transfer, as the latter was too expensive and post-failure handling was considered good enough. Now also passes certificate proxy options to the srm-* commands.
  - Migration agent rewritten.
  - Improvements to the stage-in agent.
  - Main export agent provided.
- TestCatalogue did not correctly verify PFN-to-LFN; corrected.

Web site changes:
- Status browser updated for hosting with cmsdoc + mod_perl. The transfer rate table now shows only one time span at a time. Fixed the quoting of magic characters (<, >, etc.) in the database report.
- Various tweaks to the plots to better lay out the legend.
- The legend in the transfer queue page is now corrected to include only the items actually included in the plot.
- The PHP plotting scripts are now hosted on cmsdoc.cern.ch.
- Instructions for hosting the web site.

Schema changes:
- t_authorisation.role_name and t_info_agent_status.agent extended.
- Performance data (t_*_histogram) older than about two days is now automatically compacted to time series in 1-hour bins; newer data remains in the current 5-minute bins.

* Changes in PhEDEx 2.3.7

General changes:
- Template site configuration updated.
- Download agent now defaults to 30 concurrent jobs.
- Better handling of subprocesses on download agent exit.
- File router and monitoring tweaked not to slow each other down.
- ftscp now handles timeout cancellation (= signal handling) and sleeps longer as the download takes longer. The final FTS transfer status is now dumped to the output, i.e. the PhEDEx logs. ftscp ignores up to five glite-* server-side errors to avoid dying from intermittent gLite server-side database connection pooling problems.

Web site changes:
- Rate page now allows source and/or destination to be filtered.
- CERN network plots removed from the rate page.
- Minor HTML improvements here and there.
- Plots no longer have a black frame.

Schema changes:
- Nodes now have a new column 'kind' for administrative purposes.

* Changes in PhEDEx 2.3.6

General changes:
- Various improvements to ftscp.
- Preliminary SRM-only agents for stage-in and tape migration.
- Job manager changes for the Castor stager agent: subprocess output now receives "log" prefixing only if explicitly requested.
- Derek's catalogue test tool, slightly adapted.

Web site changes:
- PHP scripts allowed more memory so link stats can be plotted.

Schema changes: none.

* Changes in PhEDEx 2.3.5

General changes:
- Correction in handling -ignore/-accept options. If a wildcard was used, not all matching nodes were filtered.
- Download agent also applies the -ignore/-accept options when marking files wanted. This makes no functional difference, but is more correct.

Schema changes: none.

* Changes in PhEDEx 2.3.4

General changes:
- Download agent improvements:
  - Log files are now prefixed by the agent "boot time", so logs from different runs of the agent remain separate.
  - Globus backend got the same logging improvements as SRM previously.
- Block management agents:
  - Suspended blocks are now reactivated correctly if the suspend time (time_suspend_until) is reset to null, as opposed to just expiring.
- Minor improvements to the status web browser.
  - The browser plots are now by destination by default; by link was too much information for the default overall view. The by-link display is still recommended when filtering for a site, as it shows all incoming and outgoing transfers in one view.
  - Plot legend font size shrinks if many sites are included.
- Remove rare warnings in daily stats generation.

Schema changes:
- INT2R TNS entries adjusted.
- Further review of indexes.
- t_xfer_replica authorisations corrected.
- Certain tables are now partitioned by node. The partitions are automatically added and removed by triggers on insert into or delete from t_node.

* Changes in PhEDEx 2.3.3

General changes:
- Download agent improvements:
  - The agent now uses from_node to determine to_pfn, so the trivial file catalogue can offer different destination PFNs based on the source site using "destination-match". This now better reflects the export side: both download source and destination use a compatible convention.
  - Error logging has changed. The log file for a failed transfer is now mentioned in the file transfer failure alert, not in a separate message. The -validate etc. utility commands log their output directly to the main agent log, not to separate log files. The agent now removes logs older than three days when the agent is idle (was 15 minutes of idle time). The log for the transfer command now includes the full command line and the exit code. The log file prefixing scheme should no longer lose newlines.
  - Subprocess queue is no longer overfilled. This should reduce the number of instances of time_transfer exceeding the timeout setting; the transfer still runs only for the timeout period, and the rest is queue wait.
  - Stopping the agent now correctly runs transfer validation.
- File routing improvements:
  - The agent now routes only for nodes which have recently run a download agent. This prevents the creation of routes for sites which are down, and therefore excessive transfer expiration.
  - File routing requests are now removed if the underlying dataset subscription has been suspended or removed.
  - Idle transfer requests with no file request are now removed. This prevents unnecessary transfers when routing changes, and automatically removes transfers when a subscription is suspended.
  - If a file has already reached the destination and has then been removed, routes are now recreated automatically. (See schema changes.)
  - Prevent race conditions between new file injection and routing.
- Minor improvements to the status web browser.
  - Faster loading transfer rate page.
  - Transfer rate sorts by more values for better ordering.
  - Plot of pending transfer queue is now correct.

Schema changes:
- Additional trigger to reinsert transfer requests for removed replicas.
- Corrected time in trigger to insert transfer requests for new files.
- File routing tables enforce access only at CERN.

* Changes in PhEDEx 2.3.2

General changes:
- Update of "ftscp".
- Download agent errors with -ignore/-accept options fixed.
- Master now supports the "kill" option, and can "renice" agents.
- Database deadlocks with download/export agents should now be solved.
- Improvements and minor fixes in the file routing algorithms.
- Many performance optimisations and streamlining in various agents. Anything involving node filters (-nodes, -ignore, -accept) was reimplemented to be friendlier to database performance.
- Minor updates to the web pages and plots.
- Some new tests have been included.

Schema changes:
- Database index review. Many indexes have been added and removed.
- t_dps_block_dest has a new 'state' column.
- New trigger tr_xfer_file_insert, which automatically inserts files for active block destinations into t_xfer_request.
- t_info_* tables now have constraints.