Google Search Appliance Connector Manager
Release Notes
This document contains the release notes for Google Search Appliance
Connector Manager. The following sections describe the release in
detail and provide information that supplements the main documentation.
See the Issues Tab on the Code Site for the current list of known
issues and work-arounds.
Web Site: http://code.google.com/p/google-enterprise-connector-manager/
Release 3.2.10, 24 March 2015
=============================
Introduction
------------
This is a patch release that fixes a few small problems discovered in
the previous release. Most users do not need to upgrade. Connector
developers and customers manually deploying connectors with Tomcat 8
are encouraged to upgrade.
Issues Fixed Since 3.2.6
------------------------
18248756 - Increase the default traversal batch timeout from 30 minutes
to 2 hours.
Note: The default value of the traversal.time.limit property
is set in applicationContext.xml, but it may be overridden in
applicationContext.properties.
18969469 - NDC and MDC logging context strings were empty under Tomcat 8.
Note: When manually upgrading Connector Manager, this fix
requires the new META-INF/context.xml file from the
connector-manager.war file.
Version Compatibility
---------------------
This version of the Connector Manager requires Java 6 JRE or newer.
For connector developers, Subversion 1.8 and Java 8 are supported.
Release 3.2.6, 25 April 2014
==============================
Introduction
------------
This is a patch release that fixes a few small problems discovered in the
previous release. Users of previous releases are encouraged to upgrade.
Issues Fixed Since 3.2.4
------------------------
14303102 - The DatabaseConnectionPool utility class did not detect
dead connections to SQL Server. This class now requires a
JDBC driver that supports the isValid method of
java.sql.Connection, part of the JDBC 4.0 specification in
Java 6. The embedded H2 database has a compliant driver.
For Oracle or SQL Server, a compliant driver must be
configured. This class is used by the AD Groups connector
starting with version 3.2.6, and by the Lotus Notes
connector since version 2.8.4.
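The dead-connection check above relies on Connection.isValid(int), added in JDBC 4.0. The following is a minimal sketch of such a check, not the DatabaseConnectionPool source; the class and method names are hypothetical. A driver compiled before JDBC 4.0 typically fails with AbstractMethodError when isValid is called, which is why a compliant driver is required.

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Hypothetical helper: validates a pooled connection before reuse. */
class ConnectionCheck {
  static boolean isAlive(Connection conn, int timeoutSeconds) {
    if (conn == null) {
      return false;
    }
    try {
      // isValid (JDBC 4.0) asks the driver to positively verify that the
      // connection is still usable, waiting at most timeoutSeconds.
      return !conn.isClosed() && conn.isValid(timeoutSeconds);
    } catch (SQLException e) {
      // A driver error during validation means the connection is unusable.
      return false;
    } catch (AbstractMethodError e) {
      // Pre-JDBC 4.0 drivers do not implement isValid at all.
      return false;
    }
  }
}
```

A pool built this way would discard a connection for which isAlive returns false and open a fresh one in its place.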
Version Compatibility
---------------------
This version of the Connector Manager requires Java 6 JRE or newer.
The embedded JUnit JAR file was upgraded from version 3 to version 4.8.2.
The embedded EasyMock JAR files were upgraded from version 3.0 to version
3.2.
Deprecated Features
-------------------
The following util package feature has been deprecated and is
targeted for removal in a future release.
* The IOExceptionHelper class. Replaced in Java 6 by the
IOException(String, Throwable) constructor.
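On Java 6 and newer the root cause can be passed straight to the IOException constructor, so the helper is no longer needed. A minimal sketch; the class and method names are hypothetical.

```java
import java.io.IOException;

/** Hypothetical connector class showing the Java 6 replacement for IOExceptionHelper. */
class RepositoryReader {
  /** Wraps a lower-level failure in an IOException, keeping it as the cause. */
  static IOException wrap(String message, Throwable cause) {
    // IOException(String, Throwable) was added in Java 6. On Java 5 the
    // equivalent was new IOException(message) followed by initCause(cause).
    return new IOException(message, cause);
  }
}
```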
Release 3.2.4, 16 January 2014
==============================
Introduction
------------
This is a patch release that fixes a few small problems discovered in the
previous release. Users of previous releases are encouraged to upgrade.
Issues Fixed Since 3.2.2
------------------------
11718150 - GetDocumentContent was returning the google:aclinheritfrom:fragment
property as metadata. This property is intended for
internal use, and should not have been provided to the GSA.
12176042 - EncryptedPropertyPlaceholderConfigurer should not try to
decrypt the empty string. This could prevent Tomcat startup
because the default H2 database password is empty.
12576455 - Add support for crawl-immediately and crawl-once feed record
properties. You can use these properties with a document filter,
such as the AddPropertyFilter.
Version Compatibility
---------------------
This version of the Connector Manager requires Java 6 JRE or newer.
Removed Features
-----------------
Support for the gsa.admin.requiresPrefix configuration property has been
removed. If the property is specified, it will be ignored.
Only GSA versions prior to v5.0.4 used this feature, and they were never
supported by the connectors.
Release 3.2.2, 24 October 2013
==============================
Introduction
------------
This is a patch release that fixes a few small problems discovered in the
previous release. Users of previous releases are encouraged to upgrade.
Summary of Changes
------------------
Fix issue 7766078: Reduce the default feed backlog configuration parameters.
The default feed.backlog.ceiling dropped from 10000 to 4000.
The default feed.backlog.floor remains 1000.
The default feed.backlog.interval dropped from 900 to 120
seconds.
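The ceiling and floor behave as a pair of thresholds. The sketch below assumes, without confirming it against the Connector Manager source, that feeding pauses when the GSA's reported backlog rises above feed.backlog.ceiling and resumes only after it drops below feed.backlog.floor, with the backlog re-checked every feed.backlog.interval seconds. The class name is hypothetical.

```java
/** Hypothetical gate that pauses feeding while the GSA backlog is too large. */
class FeedBacklogGate {
  private final int ceiling; // pause when the backlog rises above this
  private final int floor;   // resume only once the backlog drops below this
  private boolean paused;

  FeedBacklogGate(int ceiling, int floor) {
    if (floor > ceiling) {
      throw new IllegalArgumentException("floor must not exceed ceiling");
    }
    this.ceiling = ceiling;
    this.floor = floor;
  }

  /** Records the latest backlog reading and reports whether feeding may proceed. */
  synchronized boolean mayFeed(int backlog) {
    if (paused) {
      if (backlog < floor) {
        paused = false;
      }
    } else if (backlog > ceiling) {
      paused = true;
    }
    return !paused;
  }
}
```

With the 3.2.2 defaults (ceiling 4000, floor 1000), the gap between the two values keeps the feeder from flapping between paused and running when the backlog hovers near a single threshold.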
Fix issue 7881428: Support more unregistered Office media types from SharePoint.
application/vnd.ms-excel.macroEnabled.12
application/vnd.ms-powerpoint.macroEnabled.12
application/vnd.ms-word.macroEnabled.12
Fix issue 9534168: Protect against invalid Traversal Rate configuration. Negative
values would confuse the HostLoadManager's traversal rate
calculations.
Fix issue 10253530: H2 1.2 Database corruption. This release updates the
embedded H2 Database to version 1.3.173. For more
information, see http://www.h2database.com .
Fix issue 10373520: Roles were misinterpreted in DENY principals. This problem
was introduced by the fix to issue 10263958 in 3.2.0.
Fix issue 10404326: A NullPointerException would be thrown out of the
SkipDocumentFilter if the source document supplied
a null value for the property being checked.
Fix issue 10845203: Remove URL-encoding from document IDs for delete URLs
in the diffing package. The DeleteDocumentHandle class
was violating the contract for the DocumentHandle
interface, where the value returned by getDocumentId()
"must match the value returned by calling
getDocument().findProperty(PROPNAME_DOCID)."
Fix issue 11341445: A diffing connector could get into a loop deleting all
of the snapshot files on each pass, and therefore
feeding everything to the GSA for indexing on each
pass. At least the two most recent snapshot files are
now saved.
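The retention rule can be expressed as "never delete the newest two snapshots, whatever the caller asks for". A minimal sketch, assuming snapshots are identified by increasing numbers; the class and method are hypothetical and not the SnapshotStore API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical pruning rule for numbered snapshot files. */
class SnapshotPruner {
  static final int MIN_KEEP = 2;

  /** Returns the snapshot numbers that may be deleted, oldest first. */
  static List<Long> deletable(List<Long> snapshotNumbers, int keep) {
    int k = Math.max(keep, MIN_KEEP); // always retain at least the two newest
    List<Long> sorted = new ArrayList<Long>(snapshotNumbers);
    Collections.sort(sorted);
    if (sorted.size() <= k) {
      return new ArrayList<Long>();
    }
    return new ArrayList<Long>(sorted.subList(0, sorted.size() - k));
  }
}
```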
Version Compatibility
---------------------
This version of the Connector Manager requires Java 6 JRE or newer.
Diffing connectors that use non-URL-safe docids and expect the
DeleteDocumentHandle class to URL-encode the docids will have to be modified
to use consistent URL-encoding in the connector or to use URL-safe docids.
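One way to meet this requirement is to encode each docid exactly once, in the connector, and use that encoded string everywhere (as the PROPNAME_DOCID value and as the handle's document ID). A minimal sketch using the JDK's URLEncoder; the class name is hypothetical, and note that URLEncoder produces form encoding, so spaces become "+".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/** Hypothetical helper: encode docids once, at the source, and reuse the result. */
class DocIdCodec {
  static String toUrlSafe(String rawId) {
    try {
      return URLEncoder.encode(rawId, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new AssertionError(e); // every JVM is required to support UTF-8
    }
  }

  static String fromUrlSafe(String safeId) {
    try {
      return URLDecoder.decode(safeId, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new AssertionError(e);
    }
  }
}
```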
Deprecated Features
-------------------
The following SPI features have been deprecated and are targeted for
removal in a future release. These features were never fully implemented
or functional in any version of the Connector Manager or GSA.
* The SPI LocalDocumentStore, which would have been available via
the SPI ConnectorPersistentStore interface. The LocalDocumentStore
was never fully implemented and ConnectorPersistentStore has always
returned null from getLocalDocumentStore(). The following SPI constants
associated with the LocalDocumentStore have also been deprecated:
* SpiConstants.PERSISTABLE_ATTRIBUTES
* SpiConstants.PROPNAME_MANAGER_SHOULD_PERSIST
* SpiConstants.PROPNAME_CONNECTOR_INSTANCE
* SpiConstants.PROPNAME_CONNECTOR_TYPE
* SpiConstants.PROPNAME_PRIMARY_FOLDER
* SpiConstants.PROPNAME_TIMESTAMP
* SpiConstants.PROPNAME_MESSAGE
* SpiConstants.PROPNAME_SNAPSHOT
* SpiConstants.PROPNAME_CONTAINER
* SpiConstants.PROPNAME_PERSISTED_CUSTOMDATA_1
* SpiConstants.PROPNAME_PERSISTED_CUSTOMDATA_2
* SpiConstants.PROPNAME_FEEDID
The following diffing package feature has been deprecated and is
targeted for removal in a future release.
* SnapshotStore.getOldestSnapsotToKeep [the method name is misspelled]
Release 3.2.0, 13 August 2013
=============================
Introduction
------------
This is a patch release that fixes a few small problems discovered in the
previous release. Users of previous releases are encouraged to upgrade.
Summary of Changes
------------------
Fix issue 8161087: Lister/Retriever connectors may now supply metadata at
crawl-time, rather than feed-time, for GSA 7.2 and newer.
This avoids issue 6781122, wherein the internal metadata
extracted from documents is overwritten by the external
metadata when the document is sent again in a feed.
Fix issue 8256145: Passwords longer than 9 characters were incorrectly
encrypted when using the command-line EncryptPassword
tool. This typically manifested as failures logging
into content repositories for the connectors.
Fix issue 9415184: MimeTypeDetector would inadvertently attempt to open
some of the supplied files when it was attempting to
determine the MIME type by filename extension only.
Since the supplied names were rarely local files,
the MimeTypeDetector would quickly become backlogged
on blocked file I/O. The use of the underlying third
party MimeUtil library was altered to ensure that no
attempt would be made to open the named file.
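The Connector Manager uses the MimeUtil library for this. Purely to illustrate name-only detection, the JDK's URLConnection.guessContentTypeFromName consults a file-name map and never opens the named file. This is a swapped-in technique, not the Connector Manager's implementation; the class name is hypothetical.

```java
import java.net.URLConnection;

/** Hypothetical name-only MIME guesser; performs no file I/O. */
class NameOnlyMimeGuess {
  /** Guesses a MIME type from the file name alone, or returns fallback if unknown. */
  static String guess(String name, String fallback) {
    // Looks up the extension in the JDK's file-name map; the file is never opened.
    String type = URLConnection.guessContentTypeFromName(name);
    return (type != null) ? type : fallback;
  }
}
```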
Fix issue 10263958: ACL Principals with a peeker role were not removed from
the feed. This would inadvertently grant full read
access to users and groups that had only peeker access.
This first appeared in Connector Manager version 3.0.0,
but only if used with GSA version 7.0 and newer. This
affected the SharePoint connector and any third-party
connectors that implemented the peeker role.
Version Compatibility
---------------------
This version of the Connector Manager requires Java 6 JRE or newer.
The use of a Java 5 runtime is no longer supported.
Deprecated Features
-------------------
The following SPI features have been deprecated and are targeted for
removal in a future release. These features were never fully implemented
or functional in any version of the Connector Manager or GSA.
* Roles in ACLs ('reader', 'writer', 'owner', 'peeker').
This includes the SpiConstants.RoleType enum and the associated Role
Properties prefixes SpiConstants.GROUP_ROLES_PROPNAME_PREFIX, and
SpiConstants.USER_ROLES_PROPNAME_PREFIX. Roles were never used, and
were stripped from ACL entries if supplied.
* SpiConstants.PROPNAME_CONTENTURL. This was never used.
* SpiConstants.PROPNAME_SECURITYTOKEN. This was never used.
Release 3.0.8, 08 May 2013
==========================
Introduction
------------
This is a patch release that fixes a few small problems discovered in the
previous release. Users of previous releases are encouraged to upgrade.
Summary of Changes
------------------
Fix Issue 6513938: Allow the connector to specify a document's content encoding
with a google:contentencoding SPI property. This allows the
connector to return content that has already been encoded in
one of the supported content encodings (e.g. base64binary).
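For example, a connector might Base64-encode the content bytes itself and declare the encoding with this property. This sketch uses java.util.Base64 (Java 8 and newer) and a plain Map as a stand-in for the connector's document properties. Only the google:contentencoding name and the base64binary value come from this note; the google:content key and the class name are assumptions for illustration.

```java
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a document whose content is already Base64-encoded. */
class PreEncodedContent {
  static Map<String, String> properties(byte[] rawContent) {
    Map<String, String> props = new HashMap<String, String>();
    // Encode once in the connector and declare the encoding that was applied.
    props.put("google:content", Base64.getEncoder().encodeToString(rawContent));
    props.put("google:contentencoding", "base64binary");
    return props;
  }
}
```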
Fix issue 6734754: GSA cannot handle null Schedule. The GSA, while claiming
to support null or empty schedules, actually does not.
If a connector does not have a schedule, the getConnectorStatus
servlet will supply a disabled schedule with the default load,
retryDelay, and interval.
Fix issue 7040140: The GetDocumentContent servlet should provide Last-Modified
date in HttpResponse.
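The Last-Modified header uses the HTTP date format (RFC 1123 style, always in GMT). A minimal sketch of producing that string from a millisecond timestamp; the class name is hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical formatter for HTTP date headers such as Last-Modified. */
class HttpDates {
  static String toHttpDate(long millis) {
    // SimpleDateFormat is not thread-safe, so a new instance is created per call.
    SimpleDateFormat fmt =
        new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);
    fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
    return fmt.format(new Date(millis));
  }
}
```

Inside a servlet, HttpServletResponse.setDateHeader("Last-Modified", millis) produces the same format without manual formatting.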
Fix issue 7928861: Avoid unordered snapshots in the snapshot files of Diffing
Connectors.
Fix Issue 8078850: Add a configuration option to AclPropertyFilter to specify
a domain for the users in Principals.
Fix issue 8207127: Reduce the memory footprint of DocumentHandle
serialization for Diffing connectors.
Fix issue 8237465: Connector instantiation would fail with non-ASCII
characters in the advanced properties XML.
Fix Issue 8394155: SkipDocumentFilter and ModifyPropertyFilter would throw
NullPointerException if the target property has a null
string value.
Fix Issue 8461333: Add google:authmethod. Added an optional, single-valued
string SPI property, google:authmethod, to specify the
authentication method. Users can override the default
(httpbasic) by adding an AddPropertyFilter to set
google:authmethod in a connector's advanced configuration.
Fix issue 8592252: GetDocumentContent now optionally supplies a Content-Length
header in its servlet response. This adds a new SPI property,
google:contentlength, that the connector may supply,
which specifies the length of the document content, in bytes.
If google:contentlength is supplied by the connector,
that value is used in the Content-Length HTTP header in
the servlet response. This feature only applies to
connectors with a Retriever implementation.
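A minimal sketch of how a servlet might derive the header value from the google:contentlength property, treating a missing or malformed value as "no header". The class name and the Map stand-in for document properties are hypothetical.

```java
import java.util.Map;

/** Hypothetical helper for the optional Content-Length response header. */
class ContentLengthHeader {
  /** Returns the length to send, or -1 when no usable google:contentlength is present. */
  static long contentLength(Map<String, String> props) {
    String value = props.get("google:contentlength");
    if (value == null) {
      return -1;
    }
    try {
      long length = Long.parseLong(value.trim());
      return (length >= 0) ? length : -1;
    } catch (NumberFormatException e) {
      return -1; // ignore a malformed value rather than send a bad header
    }
  }
}
```

A caller would set the header, for example with response.setHeader("Content-Length", String.valueOf(length)), only when the result is not -1.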
Release 3.0.4, 19 November 2012
===============================
Introduction
------------
This is a patch release that fixes a few small problems discovered in the
previous release. Users of previous releases are encouraged to upgrade.
Summary of Changes
------------------
Fix Issue 7341404: Diffing connectors ignored Retry Delay setting. Diffing
connectors will now schedule incremental traversals
according to the Retry Delay setting specified on the
connector configuration page.
Fix Issue 7364602: Inaccessible LDAP server kills Connector Manager.
Loading the connector instances asynchronously now moves
their creation out of the Connector Manager servlet StartUp
code path, so an individual connector instance timing out
will not cause the Connector Manager to fail start-up.
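As a hedged illustration of the idea only, the sketch below submits each connector loader to a thread pool and returns immediately, so a loader blocked on an unreachable LDAP server delays only its own Future. The class name and the Callable-based loaders are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical loader: instantiates connectors off the servlet start-up thread. */
class AsyncConnectorLoader {
  private final ExecutorService pool = Executors.newCachedThreadPool();

  /** Submits every loader and returns at once; a slow loader delays only its own Future. */
  List<Future<String>> loadAll(List<Callable<String>> loaders) {
    List<Future<String>> pending = new ArrayList<Future<String>>();
    for (Callable<String> loader : loaders) {
      pending.add(pool.submit(loader));
    }
    return pending;
  }

  /** Stops loading and interrupts any loader that is still blocked. */
  void shutdown() {
    pool.shutdownNow();
  }
}
```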
Fix Issue 7409537: Support more unregistered Microsoft Office 2007 media types:
"application/vnd.ms-excel.12",
"application/vnd.ms-powerpoint.12",
"application/vnd.ms-word.12".
Fix Issue 7554816: Support DENY with flattened ACLs in SharePoint connector.
This change adds a TraversalContext.supportsDenyAcls method
that is distinct from the supportsInheritedAcls method.
Previously DENY ACL support was implied if inherited ACLs
were supported. This change allows DENY ACLs to be sent
to GSA 7.x, even if feed.disable.inherited.acls property
is set.
Fix Issue 7584684: GsaFeedConnection needs to invalidate the cached DTD when
switching GSAs. If a running connector manager is
reregistered with a different version of the GSA, it
would continue to assume the capabilities of the original
GSA version, based on an inspection of the feed DTD.
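The fix amounts to keying the cached DTD to the GSA it came from. A minimal sketch; FeedDtdCache and its DtdFetcher interface are hypothetical stand-ins for GsaFeedConnection and its HTTP request.

```java
/** Hypothetical cache of the feed DTD, keyed by the GSA host it came from. */
class FeedDtdCache {
  /** Fetches the DTD from a given GSA; stands in for the real HTTP request. */
  interface DtdFetcher {
    String fetch(String gsaHost);
  }

  private final DtdFetcher fetcher;
  private String cachedHost;
  private String cachedDtd;

  FeedDtdCache(DtdFetcher fetcher) {
    this.fetcher = fetcher;
  }

  /** Returns the cached DTD, refetching it whenever the target GSA changes. */
  synchronized String getDtd(String gsaHost) {
    if (cachedDtd == null || !gsaHost.equals(cachedHost)) {
      cachedDtd = fetcher.fetch(gsaHost);
      cachedHost = gsaHost;
    }
    return cachedDtd;
  }
}
```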
Release 3.0.2, 17 October 2012
==============================
Introduction
------------
This is a patch release that fixes a few small problems discovered in the
previous release. Users of previous releases are encouraged to upgrade.
Summary of Changes
------------------
Fix Issue 7166051: Deny HTTP HEAD requests from legacy authorization, which
would always permit access to secure documents.
Fix Issues 7312517, 7343328: Multiple cases of NullPointerException
thrown from the diffing connector SnapshotStore.
Fix Issue 7343330: BasicChecksumGenerator.getDigest() throws NullPointerException.
Fix Issue 7369686: Diffing connectors ignore the configured schedule intervals.
Diffing connectors would continue to run, consuming
resources and traversing the repositories when outside
of a scheduled traversal interval.
Release 3.0.0, 13 September 2012
================================
Introduction
------------
This is a patch release that fixes a few small problems discovered in the
previous release. Users of previous releases are encouraged to upgrade.
Summary of Changes
------------------
Fix Issue 4765123: Set a logging context in the diffing package. This provides
more informative logging from the various diffing connector
threads.
Fix Issue 6299907: Connector Manager restart resets the logging level.
Fix Issue 6441063: Set the feed.contenturl.prefix advanced configuration
property when Connector Manager is registered with the GSA.
Fix Issue 6449767: Initial diffing connector snapshot fails in Java 7.
Fix Issue 6861210: Diffing fileSystemMonitorsByName is not cleared when
DocumentSnapshotRepositoryMonitorManagerImpl is stopped.
Fix Issue 6867242: Diffing connector may leak DocumentSnapshotRepositoryMonitor
threads when toggling the "Disable traversal" check box, and the
connector may go into an inconsistent state.
Fix Issue 6942176: CheckpointAndChangeQueue throws NullPointerException.
Fix Issue 6996468: Provide advanced configuration option to not use inherited
ACLs with GSA v7.0. This is provided as a workaround for
GSA issue 6969557, where inherited ACLs do not work correctly
with Distributed Crawl and Serve.
Release 2.8.10, 28 November 2012
================================
Introduction
------------
This is a patch release that fixes a few small problems discovered in the
previous release. Users of previous releases are encouraged to review
the changes below to determine whether to upgrade.
Summary of Changes
------------------
Fix Issue 4765123: Set a logging context in the diffing package. This provides
more informative logging from the various diffing connector
threads.
Fix Issue 5599305: Retry Connector startup if instantiation fails. The
FileSystem Connector fails Connector bean instantiation if
the file share is off-line. The other connectors that use
the Diffing package (LDAP and Database) can suffer similar
failures. This fix allows a failed Connector instantiation
to be retried after a period, in hopes that any transient
errors may have been corrected.
Fix Issue 6299907: Connector Manager restart resets the logging level.
Fix Issue 6861210: Diffing fileSystemMonitorsByName is not cleared when
DocumentSnapshotRepositoryMonitorManagerImpl is stopped.
Fix Issue 6867242: Diffing connector may leak DocumentSnapshotRepositoryMonitor
threads when toggling the "Disable traversal" check box, and the
connector may go into an inconsistent state.
Fix Issue 6942176: CheckpointAndChangeQueue throws NullPointerException.
Fix Issues 7312517, 7343328: Multiple cases of NullPointerException
thrown from the diffing connector SnapshotStore.
Fix Issue 7341404: Diffing connectors ignored Retry Delay setting. Diffing
connectors will now schedule incremental traversals
according to the Retry Delay setting specified on the
connector configuration page.
Fix Issue 7343330: BasicChecksumGenerator.getDigest() throws
NullPointerException.
Fix Issue 7364602: Inaccessible LDAP server kills Connector Manager.
Loading the connector instances asynchronously now moves
their creation out of the Connector Manager servlet StartUp
code path, so an individual connector instance timing out
will not cause the Connector Manager to fail start-up.
Fix Issue 7369686: Diffing connectors ignore the configured schedule intervals.
Diffing connectors would continue to run, consuming
resources and traversing the repositories when outside
of a scheduled traversal interval.
Fix Issue 7409537: Support more unregistered Microsoft Office 2007 media types:
"application/vnd.ms-excel.12",
"application/vnd.ms-powerpoint.12",
"application/vnd.ms-word.12".
Release 2.8.6, 5 May 2012
=========================
Introduction
------------
This is a patch release that fixes a few small problems discovered in the
previous release. Users of previous releases are encouraged to review
the changes below to determine whether to upgrade.
Summary of Changes
------------------
* Fix Issue 6305209 - Text conversion fails on PDF files when skipping
the content. Handle PDF documents that are zero-length or too long more
gracefully. Rather than skip the document entirely, feed a stub
document with just the document's title, if available.
* Fix file system connector code site issue 32 - Initial snapshot fails
in Java 7 with error "two snapshots with the same number". Note that
Java 7 is not officially supported.
* Differentiate between no password and empty-string password in the
user authentication servlet.
* Remove the google:feedid property from records in the feeds.
Release 2.8.4, 23 February 2012
===============================
Introduction
------------
This is a patch release that fixes a few small problems discovered in the
previous release. Users of the previous release are encouraged to upgrade.
Summary of Changes
------------------
* Fix Issue 5973714: Exceptions thrown while prefetching the Authorization
Manager on connector startup would leave the connector instance in an
inconsistent state.
* Fix Issue 5723358 - Escape special characters in user and group names
returned in an Authentication response. User or group names that contained
certain characters that have special meaning in XML syntax would cause
failures reading the Authentication response.
* Fix Issues 5370948 and 5481676 - Better recovery from FeedExceptions.
If submitting a feed to the GSA fails for some reason, the Connector
Manager retries the feed after 15 minutes. But first, the Connector
Manager would test the GSA to verify that it is accepting feeds.
Unfortunately, that test would actually kill a functioning GSA
feedergate, disabling feeds for a short period of time while it
restarts. Effectively, the feed problem recovery strategy would
kill feeds every 15 minutes. This problem only affects GSA version
6.12. This fix avoids the problem, using a slightly different strategy
to check for GSA feed availability.
* Address Issue 5382030 - If Flexible Authorization is misconfigured
to use connector authorization with a credential group which has no
authentication rules defined, the GSA sends a null Identity to the
Connector Manager during Authorization. This was handled poorly
by most Connectors. Although Issue 5382030 is actually a problem
with the Security Manager, the Connector Manager now considers a null
Identity to be an error, and returns an error status code to the
GSA.
* Adds rudimentary GData configuration for Connectors. The new
googleFeedHost property supplied to Connectors may be used to
access the GData interface on the GSA. This should be considered,
at best, a temporary solution. This change also removes the
googleWorkDir and googleConnectorWorkDir properties from the saved
properties files, to avoid problems when moving connector instances
to a different directory. The properties still appear in the Properties
objects in the SPI.
* Adds several improvements to the Document Filters.
The ModifyPropertyFilter adds support for modifying the CONTENT
property of text documents (determined according to MIME type).
A SkipDocumentFilter can force a document to be skipped (or not)
based upon the presence/absence of a specific Property, or based
upon a match on one of the values of that property.
The JavaDoc documentation for the Document Filters has been
improved, including example configurations.
* Various improvements in diagnostic logging.
* Fix Issues 233, 5028655, 6019938 - Fix logic bug in diffing where
recovery-files' age comparison was broken. This could lead to the
connector resending the same files again after Tomcat was restarted.
* Fix Issue 232 - A small memory leak in ThreadPool would leak
QueryTraversers (and all the objects they held).
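The fix for Issue 5723358 above concerns names such as "R&D <admins>", which break an XML response unless escaped. A minimal sketch of escaping a name before writing it into XML text or an attribute; the class name is hypothetical, and &#39; is used for the apostrophe for the HTML-compatibility reason given under Issue 227 in release 2.6.6.

```java
/** Hypothetical escaper for user and group names written into an XML response. */
class XmlEscape {
  static String escape(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '&': sb.append("&amp;"); break;
        case '<': sb.append("&lt;"); break;
        case '>': sb.append("&gt;"); break;
        case '"': sb.append("&quot;"); break;
        case '\'': sb.append("&#39;"); break;
        default: sb.append(c);
      }
    }
    return sb.toString();
  }
}
```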
Version Compatibility
---------------------
The diffing library has a change affecting diffing connectors (File System,
LDAP, and Database). The method for assigning file name extensions to
recovery files has changed. This change causes no issues migrating forward to
this release, but reverting to an earlier release after running 2.8.4 requires
diffing connectors to be reset.
Release 2.8.2, 10 October 2011
==============================
Introduction
------------
This is a patch release that fixes a few small problems discovered in the
previous release. Users of the previous release are encouraged to upgrade.
Summary of Changes
------------------
* Enable editing of Connector Advanced configuration XML from the
GSA Admin console.
* Adds support for Flexible Authorization for Connectors.
* Improves MIME type recognition for many Microsoft Office file formats
when using the third-party mime-util detector.
* Reduces the likelihood of search result authorization timeouts.
Release 2.8.0, 08 July 2011
===========================
Introduction
------------
This release has significant infrastructure changes, fixes several
problems, and adds several new utility classes to the connector SPI
for the benefit of connector developers.
Summary of Changes
------------------
* Issue 104 - Servlet to dynamically change logging levels. New servlets
getConnectorLogLevel, setConnectorLogLevel, getFeedLogLevel and
setFeedLogLevel allow the connector administrator to adjust the connector
and feed logging verbosity without shutting down the connector.
These are especially useful in conjunction with the existing
getConnectorLogs and getFeedLogs servlets.
* Issue 168 - Make Base64 encode/decode available to the connector developer.
Base64, Base64FilterInputStream, and Base64ChecksumGenerator are several
of the new utility classes made available to connector developers.
* Issue 199 - SPI enhancement to expose JDBC to the connector developer.
Enabled by the new ConnectorPersistentStore SPI interface. Connectors
that wish to be given access to the JDBC DataSource should implement the
ConnectorPersistentStoreAware interface in their Connector implementation.
Given the DataSource, the connector developer may also take advantage
of several new database access utility classes, such as JdbcDatabase,
DatabaseConnectionPool, and DatabaseResourceBundle.
Note that ConnectorPersistentStore.getLocalDocumentStore() is disabled
in this release.
* Fixed Issue 4062256 - Failure to delete snapshot files would throw
IllegalStateException. This affected the File System, LDAP, and
Database Connectors.
* Fix Issue 4524076 - Backward compatibility issue in diffing connectors
for recovery files. This affected the File System, LDAP, and
Database Connectors.
* Fix Issues 4581062, 4613042 - Add configurable diffing connector delay
interval after each scan: 'introduceDelayAfterEachScan'. This should
relieve some of the continuous file system scanning behaviour in the
File System Connector.
* The SPI AuthenticationManager and AuthenticationResponse classes have
been enhanced to allow the connector to return repository local groups
for a user.
* A new pagerank document property is now supported.
The SpiConstants.PROPNAME_PAGERANK property allows the connector to
recommend a pagerank (0-100) for the document if it matches queries.
For more information on pagerank see:
http://code.google.com/apis/searchappliance/documentation/610/feedsguide.html#defining_the_xml
* Fixed a minor problem that prevented connectors from running in the
JBoss Application Server. See the JBoss deployment wiki page:
http://code.google.com/p/google-enterprise-connector-manager/wiki/JBossCM
* Added support for these Microsoft Office 2007 and later media types:
- application/vnd.ms-outlook
- application/vnd.ms-excel.sheet.12
- application/vnd.ms-powerpoint.presentation.12
- application/vnd.ms-word.document.12
* Added support for Secure Sockets Layer (SSL) feeds to the Google Search
Appliance. At the present time, SSL feeds must be manually configured.
For additional details, see the Advanced Configuration wiki page:
http://code.google.com/p/google-enterprise-connector-manager/wiki/AdvancedConfiguration
* New Document Filters utility package additions to the SPI for use by
connector developers, connector administrators, and systems integrators.
Document filters act to transform their source Document's Properties.
Document filters can add, remove, or modify a document's properties,
including the document content. Properties in which the filter has
no interest are passed through unmodified. A document filter might
even throw a SkippedDocumentException to prevent a document from being
fed to the Google Search Appliance.
Multiple document filters may be chained together, forming
a transformational document processing pipeline. Similar to a
Unix command pipeline, the filters are linked together, each using
the previous one as its source Document.
For more information see the Document Filters wiki page:
http://code.google.com/p/google-enterprise-connector-manager/wiki/DocumentFilters
* New additions to the SPI:
- ConnectorPersistentStore - Provides access to the LocalDatabase
- ConnectorPersistentStoreAware - Advertises that the Connector
wishes access to the LocalDatabase
- DatabaseResourceBundle - Vendor-specific SQL language translations
- LocalDatabase - Provides access to the configured JDBC DataSource and
DatabaseResourceBundles
- SpiConstants.PROPNAME_PAGERANK
For additional information, please refer to the JavaDoc at:
http://google-enterprise-connector-manager.googlecode.com/svn/docs/javadoc/2.8.0/index.html
* New utility package additions to the SPI for use by connector developers:
(available in package com.google.enterprise.connector.util)
- Base64 - Base64 encode/decode utility
- Base64DecoderException
- Base64FilterInputStream - InputStream filter that Base64 encodes
data read from its input
- ChecksumGenerator - Interface for checksum generators
- BasicChecksumGenerator - Generates MD2, MD5, SHA-1, SHA-256, SHA-384
and SHA-512 message digest checksums of data from an InputStream
- Base64ChecksumGenerator - Derived from BasicChecksumGenerator, but
returns Base64 encoded checksums
- Clock - an interface for getting the time; useful to replace for testing
- SystemClock - a Clock implementation using System.currentTimeMillis()
- EofFilterInputStream - InputStream filter that avoids a read at
end-of-file problem with Apache Commons IO AutoCloseInputStream
- IOExceptionHelper - creates IOExceptions with a root cause on Java 5
- UniqueIdGenerator - Interface for producing unique IDs
- UuidGenerator - UniqueIdGenerator implementation based on UUID
- XmlParseUtil - utility methods for parsing XML data
- SAXParseErrorHandler
For additional information, please refer to the JavaDoc at:
http://google-enterprise-connector-manager.googlecode.com/svn/docs/javadoc/2.8.0/index.html
* New utility database package additions to the SPI for use by connector
developers:
(available in package com.google.enterprise.connector.util.database)
- JdbcDatabase - database info, utilities for creating and maintaining
database tables for connector instances.
- DatabaseConnectionPool - a pool of connections to the JDBC DataSource
- DatabasePropertyResourceBundle - DatabaseResourceBundles implemented
as properties files
- DatabaseResourceBundleManager - loads DatabaseResourceBundles
For additional information, please refer to the JavaDoc at:
http://google-enterprise-connector-manager.googlecode.com/svn/docs/javadoc/2.8.0/index.html
* New utility diffing package addition to the SPI provides a snapshot
diffing connector framework for use by connector developers:
(available in package com.google.enterprise.connector.util.diffing)
- Change, ChangeQueue, ChangeSource
- CheckpointAndChange, CheckpointAndChangeQueue
- DeleteDocumentHandle, DeleteDocumentHandleFactory
- DocumentHandle, DocumentHandleFactory
- DiffingConnector, DiffingConnectorTraversalManager
- DiffingConnectorCheckpoint, DiffingConnectorDocumentList
- DocIdUtil
- FilterReason
- GenericDocument
- DocumentSink, LoggingDocumentSink
- DocumentSnapshot, DocumentSnapshotFactory
- DocumentSnapshotRepositoryMonitor
- DocumentSnapshotRepositoryMonitorManager
- DocumentSnapshotRepositoryMonitorManagerImpl
- MonitorCheckpoint
- SnapshotRepository, SnapshotRepositoryRuntimeException
- SnapshotStore, SnapshotStoreException
- SnapshotReader, SnapshotReaderException
- SnapshotWriter, SnapshotWriterException
- TraversalContextManager
For additional information, please refer to the JavaDoc at:
http://google-enterprise-connector-manager.googlecode.com/svn/docs/javadoc/2.8.0/index.html
* New utility database and diffing testing package additions to the SPI
provide test classes for use by connector developers:
(available in package com.google.enterprise.connector.util.database.testing)
- TestJdbcDatabase
- TestLocalDatabase
- TestResourceClassLoader
(available in package com.google.enterprise.connector.util.diffing.testing)
- FakeDocumentSnapshotRepositoryMonitorManager
- FakeTraversalContext
- TestDirectoryManager
* The Connector Manager now ships with several new third party JARs.
The connector developer may find these functionally useful, however
they should note that these are now distributed with the Connector
Manager and the connectors should take care not to replace them
with older or incompatible versions.
- commons-cli.jar v1.2 http://commons.apache.org/cli
- eproperties.jar v1.1.0 http://code.google.com/p/eproperties
- h2.jar v1.2.147 http://www.h2database.com
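The core idea behind a snapshot diffing framework can be sketched as a single merge pass over two snapshots sorted by docid. This sketch is hypothetical: it uses in-memory sorted maps from docid to checksum and plain string change records, and does not use the diffing classes listed above (DocumentSnapshot, Change, SnapshotReader, and so on).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

/** Hypothetical diff of two snapshots, each a docid -> checksum map in natural key order. */
class SnapshotDiff {
  static List<String> diff(SortedMap<String, String> oldSnap,
                           SortedMap<String, String> newSnap) {
    List<String> changes = new ArrayList<String>();
    Iterator<Map.Entry<String, String>> oldIt = oldSnap.entrySet().iterator();
    Iterator<Map.Entry<String, String>> newIt = newSnap.entrySet().iterator();
    Map.Entry<String, String> o = next(oldIt);
    Map.Entry<String, String> n = next(newIt);
    // Walk both sorted sequences once, like the merge step of a merge sort.
    while (o != null || n != null) {
      int cmp = (o == null) ? 1 : (n == null) ? -1 : o.getKey().compareTo(n.getKey());
      if (cmp < 0) {          // only in the old snapshot: deleted
        changes.add("DELETE " + o.getKey());
        o = next(oldIt);
      } else if (cmp > 0) {   // only in the new snapshot: added
        changes.add("ADD " + n.getKey());
        n = next(newIt);
      } else {                // in both: updated only if the checksum changed
        if (!o.getValue().equals(n.getValue())) {
          changes.add("UPDATE " + n.getKey());
        }
        o = next(oldIt);
        n = next(newIt);
      }
    }
    return changes;
  }

  private static <T> T next(Iterator<T> it) {
    return it.hasNext() ? it.next() : null;
  }
}
```

Because the merge assumes both inputs share the same sort order, out-of-order entries in a snapshot would produce spurious ADD and DELETE records in this sketch.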
Release 2.6.10, 04 February 2011
================================
Introduction
------------
This is an internal release for the Connector Manager on-board the GSA,
not for general use.
Release 2.6.6, 7 December 2010
===============================
Introduction
------------
This is a patch release that fixes a few small problems discovered in the
previous release. New additions have been made to the connector SPI.
Users of the previous release are encouraged to upgrade.
Summary of Changes
------------------
* Issue 217 - Schedule intervals that include midnight were treated as
empty and ignored.
* Issue 224 - Fixes a potential loss of information about exceptions in
the connector logs.
* Issue 225 - Fixes a series of problems with the ImportExport utility.
* Issue 227 - Use &#39; instead of &apos; when escaping single quotes,
for HTML compatibility. The use of &apos; could lead to errors when
configuring connector instances using Internet Explorer.
* Improved log messages when free memory is low, when the feeds are
paused due to a backlog on the GSA, and when constructing a new
connector instance throws an exception while starting a new traversal
batch.
* New additions to the SPI:
o SpiConstants.RESERVED_PROPNAME_PREFIX
o SpiConstants.PROPNAME_FOLDER
o SpiConstants.PROPNAME_LOCK
o UrlValidator class
o UrlValidatorException class
The UrlValidator class is in a new com.google.enterprise.connector.util
package. This package will be used for utility classes that are not
part of or related to the spi package, but which connector
implementers might find useful.
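The reason behind Issue 227 is that &apos; is an XML entity that HTML 4 does not define, whereas the numeric character reference &#39; is understood by HTML parsers generally. The minimal sketch below shows an attribute-value escaper that follows this rule; the class and method names are hypothetical and are not part of the Connector Manager or its SPI.

```java
// Hypothetical helper illustrating the Issue 227 rule: escape single quotes
// as the numeric reference &#39; rather than the XML-only entity &apos;.
class HtmlAttributeEscaper {

  /** Escapes a string for use inside a quoted HTML attribute value. */
  static String escape(String value) {
    StringBuilder sb = new StringBuilder(value.length());
    for (int i = 0; i < value.length(); i++) {
      char c = value.charAt(i);
      switch (c) {
        case '&':  sb.append("&amp;");  break;
        case '<':  sb.append("&lt;");   break;
        case '>':  sb.append("&gt;");   break;
        case '"':  sb.append("&quot;"); break;
        case '\'': sb.append("&#39;");  break; // not &apos;
        default:   sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Prints: value="O&#39;Brien &amp; Sons"
    System.out.println("value=\"" + escape("O'Brien & Sons") + "\"");
  }
}
```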
Release 2.6.4, 16 September 2010
================================
Introduction
------------
This is an internal release for the Connector Manager on-board the GSA,
not for general use.
Summary of Changes
------------------
* Servlet access to the Connector Manager on-board the GSA has been
largely locked down. The getConfiguration and getConnectorLogs
servlets remain accessible for the benefit of connector administrators
and support personnel.
Release 2.6.2, 09 September 2010
================================
Introduction
------------
This is an internal release for the Connector Manager on-board the GSA,
not for general use.
Release 2.6.0, 14 June 2010
============================
Introduction
------------
This is a patch release that fixes a few small problems discovered in the
previous release. Users of the previous release are encouraged to upgrade.
Summary of Changes
------------------
* Issue 189 - Fixes a problem with the HostLoadManager that would
impose artificial traversal delays. The delays, although short,
would occur frequently. This fix allows the Connector Manager to
more accurately adhere to the configured documents per minute
traversal rate.
* Issue 194 - Enhances the HostLoadManager to more accurately adhere
to the configured documents per minute traversal rate, while allowing
the Connector to occasionally exceed that rate to improve efficiency.
* Issue 220 - Fixes the representation of multiple-valued metadata supplied
by the Connector as it is fed to the Google Search Appliance. This
corrects a problem with parametric navigation.
* Fixed a problem where restarting a traversal from the beginning
of a Connector's Repository or changing a Connector's traversal
schedule might not take immediate effect.
* Added a 'traversal.enabled' property, which may be set in the Connector
Manager's applicationContext.properties, to enable or disable
traversals and feeds for all Connector instances in the Connector
Manager. Disabling traversal is desirable when configuring a
Connector Manager deployment that only authorizes search results.
This feature is designed for turning off traversal for replica
Connector Managers in a clustered, load-balanced, or fail-over
environment. Traversals are enabled by default.
* Added an EncryptPassword command line utility that can be used by
an administrator to encrypt passwords that will be manually added to
Connector Manager or Connector properties files. For details, see
the EncryptPassword wiki page at:
http://code.google.com/p/google-enterprise-connector-manager/wiki/EncryptPassword
* Enhanced support for the Apache JULI LogManager and FileHandler.
* Corrected a minor issue with regard to the naming of log file
archives that are generated by the GetConnectorLogs servlet.
* Enhanced logging support for the testing environment by adding a
connector-manager/testdata/config/logging.properties file.
* Fixed the Connector Manager web application web.xml file to more
closely follow its DTD.
* Fixed a Daylight Saving Time bug in one of the tests. This issue
affected the test only, not the production Connector Manager.
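Issues 189 and 194 describe keeping traversal close to a configured documents-per-minute rate while tolerating occasional overshoot. The sketch below shows one way a per-minute budget of this kind could work, assuming a fixed one-minute window; DocsPerMinuteBudget and its methods are hypothetical and do not reproduce the actual HostLoadManager code.

```java
// Hypothetical per-minute document budget, loosely modeled on the behavior
// described for Issues 189 and 194. Not the actual HostLoadManager code.
class DocsPerMinuteBudget {
  private final int maxDocsPerMinute;
  private long windowStartMillis = -1;
  private int docsInWindow = 0;

  DocsPerMinuteBudget(int maxDocsPerMinute) {
    this.maxDocsPerMinute = maxDocsPerMinute;
  }

  /** Returns how many documents a new batch may request now (0 means wait). */
  synchronized int permittedBatchSize(int preferredBatchSize, long nowMillis) {
    rollWindow(nowMillis);
    int remaining = maxDocsPerMinute - docsInWindow;
    return Math.max(0, Math.min(preferredBatchSize, remaining));
  }

  /**
   * Records documents actually processed. The count may exceed what was
   * permitted; the overshoot just uses up the rest of this minute's budget.
   */
  synchronized void recordProcessed(int count, long nowMillis) {
    rollWindow(nowMillis);
    docsInWindow += count;
  }

  // Starts a fresh one-minute window once the current one has expired.
  private void rollWindow(long nowMillis) {
    if (windowStartMillis < 0 || nowMillis - windowStartMillis >= 60000L) {
      windowStartMillis = nowMillis;
      docsInWindow = 0;
    }
  }

  public static void main(String[] args) {
    DocsPerMinuteBudget budget = new DocsPerMinuteBudget(1000);
    System.out.println(budget.permittedBatchSize(500, 0L));      // 500
    budget.recordProcessed(700, 1000L);                          // batch overshot
    System.out.println(budget.permittedBatchSize(500, 2000L));   // 300
    System.out.println(budget.permittedBatchSize(500, 61000L));  // 500 (new minute)
  }
}
```

A batch that returns more documents than permitted simply exhausts the current minute's budget, and the next minute starts fresh; this is one way to permit the brief overshoot that Issue 194 mentions.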
Release 2.4.4, 05 February 2010
===============================
Introduction
------------
This is a patch release that fixes a few small problems discovered in the
previous release. Users of the previous release are encouraged to upgrade
immediately.
Summary of Changes
------------------
* Issue 209: Changed the default feed time zone from UTC to the
local time zone. Added a new configuration property, feed.timezone,
that may be set in applicationContext.properties if the local time
zone should not be used.
The 'feed.timezone' property defines the default time zone used
for Date metadata values for Documents. A null or empty string
indicates that the local time zone of the machine running the
Connector Manager should be used. Standard TimeZone identifiers
may be specified. For example:
feed.timezone=America/Los_Angeles
If a standard TimeZone identifier is unavailable, then a custom
TimeZone identifier can be constructed as +/-hours[minutes] offset
from GMT. For example:
# GMT + 10 hours
feed.timezone=GMT+10
# GMT + 6 hours, 30 minutes
feed.timezone=GMT+0630
# GMT - 8 hours, 0 minutes
feed.timezone=GMT-0800
This modification has compatibility implications when upgrading;
refer to the Version Compatibility section, below.
* Issue 211: Moved common Connector Manager configuration properties
from applicationContext.xml to applicationContext.properties.
This makes it much easier to upgrade to a newer version of the
Connector Manager, while preserving properties that had been
customized by the administrator.
This modification has compatibility implications when upgrading;
refer to the Version Compatibility section, below.
* Issue 212: Fixes "IOException: Attempted read on closed stream."
exceptions that might be generated if a Connector uses the
Apache Commons AutoCloseInputStream to provide document content
to the Connector Manager. Currently, only the SharePoint Connector
seems to have been affected by this problem.
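The feed.timezone values listed under Issue 209 follow the ID syntax of java.util.TimeZone. Assuming the property value is passed to TimeZone.getTimeZone() (an assumption these notes do not confirm), the sketch below shows how each example resolves; resolveFeedTimeZone is a hypothetical helper. Note that TimeZone.getTimeZone() silently returns GMT for an ID it cannot parse, so a misspelled value produces no error.

```java
import java.util.TimeZone;

// Shows how the feed.timezone examples resolve with java.util.TimeZone.
// resolveFeedTimeZone is a hypothetical helper, not Connector Manager code.
class FeedTimeZoneDemo {

  /** A null or empty value means "use the local time zone of this machine". */
  static TimeZone resolveFeedTimeZone(String propertyValue) {
    if (propertyValue == null || propertyValue.trim().isEmpty()) {
      return TimeZone.getDefault();
    }
    // Unrecognized IDs fall back to GMT without any error.
    return TimeZone.getTimeZone(propertyValue.trim());
  }

  public static void main(String[] args) {
    String[] ids = {"America/Los_Angeles", "GMT+10", "GMT+0630", "GMT-0800"};
    for (String id : ids) {
      TimeZone tz = resolveFeedTimeZone(id);
      // getRawOffset() is the standard offset from GMT, ignoring daylight saving.
      System.out.println(id + " -> " + tz.getID()
          + " (raw offset " + tz.getRawOffset() / 60000 + " minutes)");
    }
  }
}
```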
Version Compatibility
---------------------
The time zone change has the most dramatic impact when upgrading.
Previous versions of Connector Manager would convert all date/time
values provided by the Connector to UTC dates when supplying the date
to the GSA. This had undesired consequences for users performing
date-range searches: expected search results could be missing because
the adjusted date-stamps pushed the documents into the previous or next
day. For example, a document modified at 8:00PM PST today in California
has a calendar date of tomorrow when adjusted to UTC, so a search for
documents modified today won't find it.
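The date shift in the example above can be reproduced in a few lines. This sketch formats a single instant, 8:00 PM PST on an arbitrarily chosen winter date (15 January 2010, so daylight saving time does not apply), as a calendar date in both America/Los_Angeles and UTC; the class name is illustrative only.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

// One instant, two calendar dates, depending on the time zone used to format it.
class UtcDateShiftDemo {

  static String calendarDate(Date instant, String timeZoneId) {
    SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
    fmt.setTimeZone(TimeZone.getTimeZone(timeZoneId));
    return fmt.format(instant);
  }

  public static void main(String[] args) {
    // 8:00 PM on 15 January 2010 in California (PST, UTC-8).
    Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("America/Los_Angeles"));
    cal.clear();
    cal.set(2010, Calendar.JANUARY, 15, 20, 0, 0);
    Date modified = cal.getTime();

    System.out.println(calendarDate(modified, "America/Los_Angeles")); // 2010-01-15
    System.out.println(calendarDate(modified, "UTC"));                 // 2010-01-16
  }
}
```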
The Issue 209 change assumes date values supplied to the GSA are
local time, unless otherwise specified. This allows date-range
queries to function as expected when the Document Repository,
the Connector Manager, and the Search Appliance are in the same
time zone (or near enough time zones that 'normal working hours'
significantly overlap).
Upgrading an existing deployment to this new Connector Manager will
result in all newly indexed content feeding date values in the
local time zone, whereas all previously indexed content will have
UTC date values. The connector administrator may choose to handle
this inconsistency in one of several ways:
1) Do nothing. This may be desirable if date-range searches
for older materials need not be accurate to the day; or if
re-indexing all content is untenable; or if the Repository,
Connector Manager, Search Appliance, and search users are
widely dispersed across time zones.
2) Set the feed.timezone property in applicationContext.properties
to GMT. If your local time zone differs significantly from GMT,
then date-range searches will continue to be unreliable.
However, all dates in the index will be consistently inaccurate.
3) Re-index all content fed by Connectors. This is the preferred
solution if date-range searches are required to be accurate to
the day, or if re-indexing the content is not an onerous task.
The mechanics of re-indexing depend upon the GSA and Connector
Manager versions. Please consult the appropriate documentation
for details.
4) Set the feed.timezone property in applicationContext.properties
to a value other than the local time zone or GMT. This would
be appropriate if the Connector Manager and/or GSA were in
significantly different time zones than the Repository and/or
search users. The options to re-index or not still apply.
--
As a result of the modifications for Issue 211, the Connector Manager
applicationContext.xml and applicationContext.properties files have
changed significantly. Consequently, an installation upgraded by simply
dropping in the v2.4.4 JAR files will not function properly.
An in-place upgrade must include the applicationContext.xml file
as well.
If the connector administrator has made no modifications to
applicationContext.xml, then a drop-in update of just the
Connector Manager v2.4.4 JAR files and the applicationContext.xml
file over an existing v2.4.x installation should proceed uneventfully.
If using the GCI Connector Installer v2.4.4 to upgrade or following the
procedures as described in the UpdatePatchRelease wiki page (see below),
then installation of the applicationContext.xml file will be automatic.
However, if the connector administrator has made modifications to
applicationContext.properties, those modifications must be
re-applied after installation. Because applicationContext.xml has been
restructured, it may be difficult to merge the differences.
Before upgrading, the administrator should make back-up copies of the
existing applicationContext.xml and applicationContext.properties files.
These may be used as reference when modifying the newer versions of
these files.
Once the backup files have been made, follow the instructions for applying
an update as described in the UpdatePatchRelease wiki page (see below),
with two exceptions: do not copy the old applicationContext.properties file
over the new one (step 5 in the instructions), and do not restart Tomcat
yet (step 6). If using the GCI Connector Installer v2.4.4 to upgrade,
shut down Tomcat after the upgrade completes, before re-applying the
modifications.
Copy the few properties that were set in the old applicationContext.properties
to the new one. The old properties file contains only a half-dozen or so
properties, and they are clearly documented in the new properties file.
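When comparing the backup against the new file, a small throwaway program can list which old settings still need to be carried over. The following is a hypothetical helper, not a tool shipped with the Connector Manager: it reports every property set in the old file whose value is missing from, or different in, the new file. Keys that appear only as comments in the new file are reported too, since java.util.Properties ignores comment lines.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper: lists properties from the backed-up file whose values
// are absent from, or differ in, the new file, so they can be re-applied by hand.
class PropertiesDiff {

  static Map<String, String> toReapply(Properties oldProps, Properties newProps) {
    Map<String, String> result = new TreeMap<String, String>(); // sorted by key
    for (String key : oldProps.stringPropertyNames()) {
      String oldValue = oldProps.getProperty(key);
      if (!oldValue.equals(newProps.getProperty(key))) {
        result.put(key, oldValue);
      }
    }
    return result;
  }

  static Properties load(String path) throws IOException {
    Properties props = new Properties();
    InputStream in = new FileInputStream(path);
    try {
      props.load(in);
    } finally {
      in.close();
    }
    return props;
  }

  // Usage: java PropertiesDiff <old applicationContext.properties> <new applicationContext.properties>
  public static void main(String[] args) throws IOException {
    Map<String, String> diff = toReapply(load(args[0]), load(args[1]));
    for (Map.Entry<String, String> e : diff.entrySet()) {
      System.out.println(e.getKey() + "=" + e.getValue());
    }
  }
}
```

The printed lines are for reference only; review each one against the comments in the new file before copying it over.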
Next, reapply your old applicationContext.xml modifications.
Most of the properties an administrator would wish to change will now
be set in applicationContext.properties rather than the XML file.
For instance, rather than modifying the constructor-arg for the
TraversalTimeLimitSecondsDefault bean in the applicationContext.xml file,
the administrator should set the traversal.time.limit property in the
applicationContext.properties file. This process may be tedious, but
the v2.4.4 applicationContext.properties file is well commented, so
the appropriate modifications should be clear.
Finally, once the appropriate modifications have been made to the
new applicationContext.properties file, the Tomcat server may be
restarted as described in the UpdatePatchRelease wiki page. Once
the server is restarted, the administrator is encouraged to examine
the Connector Manager logs to check for any configuration errors.
For additional details, please refer to the Connector Manager wiki
page describing how to manually install an update or patch release:
http://code.google.com/p/google-enterprise-connector-manager/wiki/UpdatePatchRelease
Release 2.4.2, 11 January 2010
==============================
Introduction
------------
This is a patch release that fixes a few small problems discovered in the
previous release. Users of the previous release are encouraged to upgrade
immediately.
Summary of Changes
------------------
* Issue 3: Changed MockJcrRepository.login() to throw LoginException as
described by the Repository Interface, rather than return a null Session.
The scope of this change is limited to the quality control unit tests.
* Issue 197: Added Microsoft Office 2007 OpenXML media types to the set of
supportedMimeTypes in applicationContext.xml.
* Issue 204: Handle XML CDATA end markers "]]>" embedded in the form
snippet returned by ConnectorType.getConfigForm() and
ConnectorType.getPopulatedConfigForm().
* Issue 206: Auto-Disabled connectors could not be re-enabled. This
problem would have been encountered if the connector administrator
entered a negative value for the Retry Delay in a connector's
configuration, and then tried to re-enable traversal after it had been
automatically disabled at the end of the traversal. This is the
preferred mechanism to "catch up", indexing documents that have been
added or changed since a "run-once" traversal finished.
* Issue 207: Fixes OutOfMemoryError when submitting feeds over a slow
feed connection or to a busy Search Appliance. If the GSA is slow to
accept feeds, then too many feeds could be queued up waiting to be sent,
each with an attached 10MB buffer of data. Memory allocation would
grow with each new feed created until the heap was exhausted.
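Issue 204 concerns form snippets that contain the CDATA terminator "]]>". A common technique for embedding such text in a CDATA section is to close the section just before each terminator and open a new one, so that no "]]>" ever appears inside a section. The sketch below illustrates that general technique; it is not necessarily the exact change made for Issue 204, and CdataWrapper is a hypothetical name.

```java
// Embeds arbitrary text in CDATA by splitting every "]]>" across two
// sections: "]]" ends the first section, ">" begins the next one.
class CdataWrapper {

  static String wrapInCdata(String text) {
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
  }

  public static void main(String[] args) {
    String snippet = "<script>if (a[b[0]]>1) {}</script>";
    // Prints: <![CDATA[<script>if (a[b[0]]]]><![CDATA[>1) {}</script>]]>
    System.out.println(wrapInCdata(snippet));
  }
}
```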
Release 2.4.0, 20 November 2009
===============================
Introduction
------------
This is an upgrade release with some enhancements. Users of previous
releases are encouraged to upgrade. It also contains some new features.
Users of previous releases should check the "Version Compatibility" section
below for instructions on how to use existing data with this new release.
This release focuses primarily on improving the performance and robustness
of the document traversal and feed process. Informal measurements show
feeding documents to the GSA 3 to 10 times faster than version 2.0.0.
Within the Connector Manager, these gains were achieved by:
- Reducing the number of feeds sent to the GSA by generating larger feeds
with more documents per feed.
- Sending compressed document content to newer GSAs that support compressed
content feeds (GSA v6.2).
- Traversing the repository in larger swaths.
- Reducing traversal work redundancy.
Performance analysis also identified bottlenecks in several of the
individual Connectors; these were addressed in the Connectors themselves
(see each Connector's Release Notes for details).
Summary of Changes
------------------
* Issue 106: Added support for feeding multiple documents to the GSA
in a single feed file. Previously, the Connector Manager would create
a new feed file for each Document. This was inefficient and could
result in slow feed performance. The Connector Manager will now
accumulate feed data into a single feed per connector traversal.
Once that feed exceeds a set size or when the traversal batch
completes, the feed is wrapped up and sent to the GSA. The default
maxFeedSize is 10MB, and is configurable in applicationContext.xml.
An early version of this feature was made available in the v2.0.2
release. This release provides the full implementation of the feature.
* Issue 111: A catch-all for small performance-related issues, including
a faster Base64 encoder, larger I/O buffer sizes, reduced data copying,
and processing additional records from a returned traversal DocumentList,
even if doing so would exceed the supplied batch hint and host load
constraints. Several other items were spun off into separate Issues.
* Issue 117: Fixes a problem where some Date meta-data fields are being
incorrectly formatted for non-English locales. The RFC 822 specification
explicitly states that month and day names are specified in English.
Previous releases would translate them to the current locale.
* Issue 124: Throttle feeds to GSAs that seem to be falling behind
in feed processing. GSA revisions 5.2.0.G28 and later allow the
Connector Manager to query the backlog of unprocessed feed files.
This feature is used to throttle back the document feed if the GSA has
fallen behind processing outstanding feed items. The Connector Manager
will periodically poll the GSA, asking for the count of unprocessed
feed items (the backlog count). If the backlog count exceeds a
configured ceiling, the feed is paused. The feed resumes once the
backlog count drops below a floor value. The floor, ceiling, and
poll interval are configurable by editing the FeedBacklogFloor,
FeedBacklogCeiling, and FeedBacklogCheckIntervalSeconds bean definitions
in applicationContext.xml.
* Issue 141: Replaces many of the home-rolled multi-threading constructs
with newer java.util.concurrent technologies available in Java 1.5.
* Issue 143: Adds an 'excluded' set to the mime type map. This allows
administrators to specify a set of document types that should be
excluded during traversals. Neither their content, nor their meta-data
should be fed to the GSA. Note that not all Connectors yet support
this feature.
* Issue 153: Adds support for compressing the document content data
in Content Feeds. This reduces the size of the feed file sent to
the GSA. Compressed Content Feeds are supported in GSA versions
6.2 and above. The Connector Manager automatically detects whether
the GSA feed host supports compressed feeds and provides either
compressed or uncompressed data accordingly.
* Issue 164: Enhanced the SimpleProperty class, adding a single-value
constructor. This should make it much easier for Connectors to
use this class.
* Issue 171: Moves the traversal schedule check into a synchronized
block, eliminating the risk of using a stale schedule.
* Issue 172: Corrects a problem when shutting down the Connector Manager
after a feed error is encountered.
* Issue 173: Properly format the HTTP feed request packets sent to
the GSA. The HTTP protocol explicitly specifies the use of MS-DOS
style CR-LF line endings.
* Issue 174: Fixes a unit test failure for non-English locales.
* Issue 175: Fixes a NullPointerException that would occur if the
Connector Manager was not an authorized feed client of the GSA
it was attempting to feed.
* Issue 177: Fixes an IOException thrown on startup if a GSA
feed host is not defined.
* Issue 178: Cleans up handling of legacy Connector traversal
schedule strings exchanged between the GSA and the Connector Manager.
* Issue 182: Submit feeds to the GSA in a separate thread. In the
case where a traversal batch generates multiple feed files, a
full feed file is submitted to the GSA in a separate thread, while
the traversal thread builds the next feed file. This overlaps
I/O, adding better concurrency. One thread is focused on I/O
between the Connector Manager and the GSA, the other thread
is focused on I/O between the Connector Manager and the document
Repository.
* Issue 187: Fix a problem that would add a redundant and unnecessary
log message once per second during connector traversals.
* Issue 188: Adds simple implementations of the SPI callback
interfaces, SimpleConnectorFactory and SimpleTraversalContext.
These make it easier for Connector Developers to create tests
that use these features.
* Issue 190: Fixes a regression in handling non-lowercase connector
names. Although fixed in release v1.3.0, this got broken again
in release v2.0.0. For details, see:
http://code.google.com/p/google-enterprise-connector-manager/wiki/LowerCaseConnectorNames
* Issue 193: Adds code to check for the presence of a "password" attribute in
the element sent within the Authorization query. If the
attribute is present it is now read and stored in the parsed
AuthenticationIdentity object.
* Issue 196: Connections between the Connector Manager and the feed reader
used when the Connector Manager pushes feeds are now explicitly being
closed.
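The Issue 124 floor and ceiling form a simple hysteresis: the feed pauses when the backlog exceeds the ceiling and stays paused until the backlog drops below the floor, which avoids toggling on every poll. A minimal sketch of that rule follows; FeedBacklogThrottle is hypothetical, and the floor and ceiling values in main are arbitrary examples, not the defaults in applicationContext.xml.

```java
// Floor/ceiling hysteresis for the feed backlog check (Issue 124).
// Only the pause-above-ceiling / resume-below-floor rule comes from the notes.
class FeedBacklogThrottle {
  private final int floor;
  private final int ceiling;
  private boolean paused = false;

  FeedBacklogThrottle(int floor, int ceiling) {
    if (floor > ceiling) {
      throw new IllegalArgumentException("floor must not exceed ceiling");
    }
    this.floor = floor;
    this.ceiling = ceiling;
  }

  /** Updates the state from the latest backlog count polled from the GSA. */
  boolean shouldPause(int backlogCount) {
    if (!paused && backlogCount > ceiling) {
      paused = true;   // the GSA has fallen too far behind
    } else if (paused && backlogCount < floor) {
      paused = false;  // the backlog has drained enough to resume
    }
    return paused;
  }

  public static void main(String[] args) {
    FeedBacklogThrottle throttle = new FeedBacklogThrottle(50, 500);
    int[] polls = {100, 600, 300, 40, 200};
    for (int count : polls) {
      // 100 -> feeding, 600 -> paused, 300 -> paused, 40 -> feeding, 200 -> feeding
      System.out.println(count + " -> " + (throttle.shouldPause(count) ? "paused" : "feeding"));
    }
  }
}
```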
Version Compatibility
---------------------
Connector authors should note that the changes for Issue 143 ('excluded'
mime type) subtly change the meaning of values returned by the method
TraversalContext.mimeTypeSupportLevel(String). Previously, values
less than or equal to zero were considered 'unsupported'. With this
release, values equal to zero are considered 'unsupported', while
values less than zero are considered 'excluded'.
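The three-way interpretation of mimeTypeSupportLevel(String) described above can be summarized in code. The enum and helper below are illustrative only and are not part of the connector SPI.

```java
// Interprets a mimeTypeSupportLevel(String) result under the rules
// introduced in release 2.4.0. Illustrative only; not SPI code.
class MimeSupport {

  enum Level { SUPPORTED, UNSUPPORTED, EXCLUDED }

  static Level classify(int supportLevel) {
    if (supportLevel > 0) {
      return Level.SUPPORTED;
    } else if (supportLevel == 0) {
      return Level.UNSUPPORTED;  // before 2.4.0, negative values also meant this
    } else {
      return Level.EXCLUDED;     // neither content nor metadata should be fed
    }
  }

  public static void main(String[] args) {
    for (int level : new int[] {8, 0, -1}) {
      System.out.println(level + " -> " + classify(level));
    }
  }
}
```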
Connector authors and administrators should note that the changes related
to Issue 111 alter the behavior of the Connector Manager regarding DocumentList
objects that contain more Documents than was specified by the batchHint
(as provided to TraversalManager.setBatchHint(int)). Previously, the
Connector Manager would process no more than batchHint number of
Documents from the returned DocumentList. The current release will
continue to process the DocumentList until it is exhausted, or the
number of documents exceeds twice the batchHint, or the traversal
time limit is reached. This could result in the Connector Manager
processing up to twice as many documents from each batch. The current
host load management does not take this into consideration, so in
certain instances, traversal rates may exceed the configured host
load for brief periods. This load management issue will be corrected
in a subsequent release.
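The batch rule described above can be modeled in a few lines. In this simplified sketch, a plain Iterator stands in for the SPI DocumentList and processBatch is a hypothetical method; it is not the Connector Manager's traversal code.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Keeps consuming documents until the list is exhausted, 2 * batchHint
// documents have been processed, or the traversal deadline has passed.
class BatchLimitDemo {

  static int processBatch(Iterator<String> documents, int batchHint,
                          long deadlineMillis) {
    int processed = 0;
    int maxDocuments = 2 * batchHint;
    while (processed < maxDocuments
        && System.currentTimeMillis() < deadlineMillis
        && documents.hasNext()) {
      documents.next();  // a real traversal would feed the document here
      processed++;
    }
    return processed;
  }

  public static void main(String[] args) {
    List<String> docs = Arrays.asList("d1", "d2", "d3", "d4", "d5", "d6", "d7");
    long deadline = System.currentTimeMillis() + 60000L;
    // batchHint 3: stops after 6 of the 7 documents returned.
    System.out.println(processBatch(docs.iterator(), 3, deadline)); // 6
    // batchHint 5: the list is exhausted first, so all 7 are processed.
    System.out.println(processBatch(docs.iterator(), 5, deadline)); // 7
  }
}
```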
Known Issues
------------
Connector administrators should note that the changes related to Issue 111
can result in traversal rates that may exceed the configured host load.
For most of the current Google-supplied Connectors this has little
impact. For instance, the additional documents returned by the Livelink
and Documentum connectors are deleted documents, which pull no content
from the repository. However, the SharePoint Connector might regularly
exceed the configured load. In this case, the administrator may wish
to reduce the configured load to compensate.
Release 2.0.4, 09 October 2009
===============================
Introduction
------------
This is a patch release that fixes a few small problems discovered in the
previous release. Users of previous releases are encouraged to upgrade if they
are pushing large documents or using the ImportExport utility.
Summary of Changes
------------------
* r2259: This change fixes an error in handling big documents. The code for
supplying the alternate content (title or space) for documents that exceed
the maximum document size was broken. The alternate content was the right
size, but was read into the wrong location of the I/O buffer.
* r2263: Fixed a series of failures with the ImportExport utility on
the branch.
Release 2.0.2, 02 Sept 2009
============================
Introduction
------------
This is a patch release that addresses performance issues and
fixes several small problems discovered in the previous release.
Users of previous releases are encouraged to upgrade.
Summary of Changes
------------------
* Issue 162: Fixed a NullPointerException in logging XmlFormatter.
This fix now allows use of the XmlFormatter without requiring a
format to be specified.
* Fixed a NullPointerException in context logging on shutdown.
* Issue 106: Added support for feeding multiple documents to the GSA
in a single feed file. Previously, the Connector Manager would create
a new feed file for each Document. This was inefficient and could
result in slow feed performance. The Connector Manager will now
accumulate feed data into a single feed per connector traversal.
Once that feed exceeds a set size or when the traversal batch
completes, the feed is wrapped up and sent to the GSA. The default
maxFeedSize is 10MB, and is configurable in applicationContext.xml.
* Issue 180: Increased the default size of traversal batches from 100
documents to 500 documents. This provides improved efficiency in most
connectors. The size of traversal batches can now be configured
by setting the batchSize property of the HostLoadManager bean
in applicationContext.xml.
* Fixed a rounding error in the HostLoadManager documents-per-minute
computation.
* The Connector Manager now enforces the FileSizeLimitInfo.maxDocumentSize
configuration property. This property sets the maximum size of a Document's
content and is specified in applicationContext.xml. The default value is
30 megabytes. When constructing a document feed, if the number of bytes
read from the Document's content exceeds the maxDocumentSize, the
content will be discarded. The Document's meta-data will be supplied in
the feed, but its content will not.
It is still preferable for TraversalContextAware Connector implementations
to make use of the maxDocumentSize supplied in TraversalContext to avoid
feeding the content of large documents. If the Connector knows in advance
that a Document's content will exceed the maximum size, it can avoid the
Connector Manager pulling megabytes of data from the Repository, only to
have it discarded.
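The maxDocumentSize rule can be sketched as a bounded read: stop as soon as more than maxDocumentSize bytes have arrived and discard what was read, so that only the metadata is fed. ContentSizeLimit is hypothetical, this is not the Connector Manager's actual feed code, and closing the stream is left to the caller.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Reads a document's content, returning null if it exceeds maxDocumentSize
// (meaning: drop the content and feed only the metadata).
class ContentSizeLimit {

  static byte[] readWithinLimit(InputStream content, long maxDocumentSize)
      throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[8192];
    long total = 0;
    int n;
    while ((n = content.read(buffer)) != -1) {
      total += n;
      if (total > maxDocumentSize) {
        return null;  // too large: stop reading and discard the content
      }
      out.write(buffer, 0, n);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] small = new byte[100];
    byte[] large = new byte[1000];
    System.out.println(readWithinLimit(new ByteArrayInputStream(small), 500).length); // 100
    System.out.println(readWithinLimit(new ByteArrayInputStream(large), 500));        // null
  }
}
```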
Release 2.0.0, 01 June 2009
============================
Introduction
------------
This is an upgrade release with some enhancements. Users of previous
releases are encouraged to upgrade. It also contains some new features.
Users of previous releases should check the "Version Compatibility" section
below for instructions on how to use existing data with this new release.
Summary of Changes
------------------
* Issues 79, 127, 134, 135: Connector Traversal Schedule improvements
that allow traversals to be disabled, paused, resumed, or to be empty.
* Issue 94: Enhanced documentation with regards to the meaning of
a null return value from DocumentList.checkpoint().
* Issue 107: The Connector instance name is now added to log messages,
making it much easier to troubleshoot problems with multiple Connector
instances. This has configuration implications for manual deployments
and for upgrades of existing deployments. See the Compatibility section below.
* Issues 111, 119, 142: Infrastructure upgrades to Java 5 language features,
faster Base64 encoding in feeds, and deployment using Apache
Tomcat 6.0.18 and Spring Framework 2.5.6.
* Issue 114: Changes to connectorInstance.xml make it easier to add new
configuration parameters and to change the default values of existing
ones in the future, even if a customer has a customized configuration.
* Issue 122: The Connector's name is now passed to the connector via its
configuration properties so that connectors may know their own name.
* Issues 126, 145: Enhanced reliability and error handling in the
AuthenticationManager and AuthorizationManager.
* Issue 128: Invalid characters are now either removed or properly quoted
in the Feed XML attribute values.
* Issues 131, 148: "Domain" gets preserved when the Connector Manager creates
an AuthenticationIdentity for the Connector. This has compatibility
implications for connector writers. See Compatibility section below.
* Issues 133, 152: Fixed problems that would corrupt the Connector's
configuration or leave a partially constructed connector directory
on disk if an error occurred when creating a new Connector instance.
* Issues 146, 149: The Connector Manager servlet now serves a simple main
page rather than returning a 404; this may be useful for connectivity tests.
Version Compatibility
---------------------
Connector authors should note the Issues 131 and 148 changes to the
AuthenticationIdentity Interface, adding the Domain element and getDomain()
method. This AuthenticationIdentity is supplied to the Connector's
AuthenticationManager and AuthorizationManager. Older versions
of the Google Search Appliance do not supply the domain, so connectors often
required a configuration setting that specified the Windows domain used.
Connectors that implemented this work-around should continue to do so for the
indefinite future. However, if the connector receives a domain from the GSA,
that domain should be used in preference to any locally configured domain.
For additional details, please see:
http://code.google.com/p/google-enterprise-connector-manager/issues/detail?id=131
http://code.google.com/p/google-enterprise-connector-manager/issues/detail?id=148
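The guidance above, preferring a GSA-supplied domain over a configured one, amounts to a simple fallback. In this hypothetical helper, the first argument would typically come from AuthenticationIdentity.getDomain() and the second from the connector's own configuration; the class and method names are illustrative only.

```java
// Prefers the domain supplied by the GSA; falls back to the locally
// configured domain for older GSAs that send none. Hypothetical helper.
class DomainChooser {

  static String effectiveDomain(String domainFromGsa, String configuredDomain) {
    if (domainFromGsa != null && domainFromGsa.trim().length() > 0) {
      return domainFromGsa.trim();
    }
    return configuredDomain;  // work-around for GSAs that do not send a domain
  }

  public static void main(String[] args) {
    System.out.println(effectiveDomain("CORP", "LEGACY"));  // CORP
    System.out.println(effectiveDomain(null, "LEGACY"));    // LEGACY
    System.out.println(effectiveDomain("", "LEGACY"));      // LEGACY
  }
}
```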
The Issue 107 changes provide enhanced logging capabilities that make it
easier to troubleshoot installations with multiple Connector instances.
This feature requires the configuration of a custom log formatter.
The Google Connectors Installer will automatically configure this properly.
However, manual installations will require two small changes to enable this
feature:
- In $TOMCAT_HOME/webapps/connector-manager/WEB-INF/classes/logging.properties,
change the FileHandler.formatter to one of our custom formatters:
java.util.logging.FileHandler.formatter=com.google.enterprise.connector.logging.SimpleFormatter
or
java.util.logging.FileHandler.formatter=com.google.enterprise.connector.logging.XmlFormatter
- If $TOMCAT_HOME/webapps/connector-manager/WEB-INF/classes/logging.properties
specifies using the java.util.logging.FileHandler, then you must add the
new connector-logging.jar file to the system classpath that Tomcat uses at
startup. Tomcat ignores the CLASSPATH environment variable and builds a
custom classpath using the $TOMCAT_HOME/bin/setclasspath.sh or
$TOMCAT_HOME/bin/setclasspath.bat scripts. Modify these scripts, adding
connector-logging.jar to the CLASSPATH constructed. For instance:
CLASSPATH="$CLASSPATH":"$BASEDIR"/webapps/connector-manager/WEB-INF/lib/connector-logging.jar
Release 1.3.2, Apr 07, 2009
===========================
Introduction
------------
This is an upgrade release that addresses some problems discovered since
the 1.3.0 release. It also adds enhanced security measures that
obfuscate configuration data communicated between the Google Search
Appliance and the Connector Manager, and that prevent hijacking of a content feed.
These enhanced security measures impact Connector Manager administrators
and Connector Developers. For details see the Version Compatibility
section below. Users of previous releases are encouraged to upgrade.
Summary of Changes
------------------
* Fixed a concurrency issue with the HostLoadManager.
* Fix Issue 137: Obfuscate configuration data transported between the
Google Search Appliance and the Connector Manager.
This has compatibility implications for connector
writers. See the Compatibility section below.
* Fix Issue 138: Prevent hijacking a Connector Manager Content Feed
using a "set once latch". This makes it difficult to
implement man-in-the-middle attacks that observe
document content as it is being fed to the GSA.
Redirecting a content feed now requires manual intervention.
This has compatibility implications for Connector Manager
administrators. See the Compatibility section below.
* Fix Issue 139: A followup change to Issue 137 that corrects a problem
filtering forms that have embedded scripts, but no
sensitive data.
* Fix Issue 140: Fix a problem where the GsaFeedConnection did not
handle OutOfMemoryErrors correctly, resulting in a
loop feeding the same content indefinitely.
Version Compatibility
---------------------
The two security enhancements (Issues 137 and 138) have compatibility
issues for connector authors and administrators.
The Connector Manager now obfuscates the connector configuration form data
sent between the GSA and Connector Manager. This prevents sensitive
configuration settings (such as ECM account passwords) from being transmitted
in the clear. However, to accomplish this, the Connector Manager must parse
the configuration forms returned by the ConnectorType getPopulatedConfigForm(),
getConfigForm(), and validateConfig() methods. These forms must now be
well-formed XHTML fragments. For additional details, please see:
http://code.google.com/p/google-enterprise-connector-manager/issues/detail?id=137
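Because these forms must now be well-formed XHTML fragments, connector developers may want to check their form snippets in unit tests. The sketch below uses the JDK's built-in DOM parser and wraps the fragment in a dummy root element, since a snippet may contain several top-level elements. It is a hypothetical test aid, not the check the Connector Manager itself performs. HTML named entities such as &nbsp; are not predefined in XML and fail this check; numeric references such as &#160; pass.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical test aid: reports whether a configuration form snippet parses
// as well-formed XML once wrapped in a dummy root element.
class XhtmlFragmentCheck {

  static boolean isWellFormed(String fragment) {
    try {
      DocumentBuilder builder =
          DocumentBuilderFactory.newInstance().newDocumentBuilder();
      // DefaultHandler rethrows fatal errors instead of printing them to stderr.
      builder.setErrorHandler(new DefaultHandler());
      builder.parse(new InputSource(
          new StringReader("<root>" + fragment + "</root>")));
      return true;
    } catch (Exception e) {
      // Any parser configuration, parse, or I/O failure counts as not well-formed.
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(isWellFormed(
        "<tr><td>User</td><td><input name=\"user\"/></td></tr>"));  // true
    System.out.println(isWellFormed(
        "<tr><td>User<td><input name=user></tr>"));                 // false
  }
}
```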
The Connector Manager now has a latch that prevents redirecting a content
feed without manual intervention by an administrator. Connector administrators
must now clear a flag in the Connector Manager properties before configuring
the Connector Manager to feed a different Search Appliance. For details see:
http://code.google.com/p/google-enterprise-connector-manager/issues/detail?id=138
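The "set once latch" idea can be pictured with a small in-memory model. This
is a hypothetical sketch: the class and method names are invented, and the
real Connector Manager keeps its flag in its properties rather than in
memory. The first Search Appliance recorded is accepted; a different one is
refused until an administrator clears the latch.

```java
// Hypothetical in-memory model of a "set once latch" guarding the feed target.
class FeedTargetLatch {
  private String gsaHost;   // Search Appliance the feed is bound to
  private boolean locked;   // becomes true once the first target is recorded

  /** Returns true if feeding to the given host is allowed. */
  synchronized boolean setTarget(String host) {
    if (!locked) {
      gsaHost = host;
      locked = true;
      return true;
    }
    // Once locked, only the originally recorded host is accepted.
    return gsaHost.equals(host);
  }

  /** Models the manual step an administrator performs to clear the flag. */
  synchronized void clear() {
    locked = false;
    gsaHost = null;
  }
}
```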
Release 1.3.1, Mar 16, 2009
===========================
Introduction
------------
This is an upgrade release that addresses some problems discovered since
the 1.3.0 release. Users of previous releases are encouraged to upgrade.
Summary of Changes
------------------
* Fix Issue 110: Multiple threading issues, including a race condition,
a deadlock condition, and failures when handling hanging
traversers. This should fix a large number of traversals
that stop and never resume. Some timeouts were lengthened
as well, so some long-running tasks should no longer be
considered "hung" and are likely to complete successfully.
* Fix Issue 123: Properties, including passwords, were inadvertently logged.
* Fix Issue 130: Wrong error message displayed if error status is returned
from the SetConnectorConfig servlet.
Release 1.3.0, Jan 13, 2009
===========================
Introduction
------------
This is an upgrade release with some enhancements and new features. Users
of previous releases are encouraged to upgrade, and should check the
"Version Compatibility" section below for instructions on how to use
existing data with this new release.
Summary of Changes
------------------
* Fix Issue 9: Miscellaneous minor issues in mock pusher
* Fix Issue 49: Connector instance properties are stored before they are validated.
* Fix Issue 52: ConnectorFactory interface in the Javadoc is not documented
* Fix Issue 57: AuthorizationResponse.equals should use the valid field
* Fix Issue 58: site: operator doesn't work with connector feeds
* Fix Issue 59: Add a canonical title property
* Fix Issue 62: TraversalContextAware interface is not used
* Fix Issue 72: Multiple problems with exception handling in QueryTraverser.runBatch
* Fix Issue 78: Fix config storage
* Fix Issue 80: Implement ConnectorFactory
* Fix Issue 81: Encrypt property with 'password' case insensitive
* Fix Issue 85: Property accesses are not documented clearly
* Fix Issue 100: Sending documents with a search url causes exception for smb paths.
* Fix Issue 102: Tomcat process may not exit on Linux
* Fix Issue 105: Allow retrieval of CM and/or Connector config by the GSA
* Fix Issue 108: Fix exception handling in DocPusher
* Fix Issue 109: Latest svn connector-manager builds (post 1.1) do not set googleworkdir
and googleconnectorworkdir in connector.properties file
* Fix Issue 112: Use PropertyPlaceholderConfigurer for feedLoggingLevel property
* Fix Issue 116: restartConnectorTraversal is broken
Version Compatibility
---------------------
Connector Names Should Not Contain Upper Case Alphabetics
---------------------------------------------------------
Newer releases of the Google Search Appliance require that
Connector names contain only lower case alphabetic characters.
Numeric digits, dashes, and underscores are still allowed,
with the previously documented limitations; however, upper case
alphabetics should no longer be used. Problems using upper
case characters in Connector names first appeared in GSA
version 5.0.4.G22, showing only minor inconsistencies in
crawl diagnostics. In GSA version 5.2, search authorizations
fail.
Unfortunately, the new lower case connector name limitation
is not enforced when creating Connectors on the GSA
Connector Administration page.
Connector Manager version 1.3.0 makes small concessions to
this issue. For instance, new Connectors have their name
lower cased at the time of creation.
The Connector Manager does not, however, change the case
of existing Connectors or migrate existing Connectors to
the new lower case form. Doing so would lead to inconsistent
search results and make previously indexed content inaccessible.
If you have existing Connector instances with mixed case or
upper case names, you must change them before upgrading to
GSA version 5.2, or existing content fed from that Connector
will become unsearchable.
Further details and instructions may be found at:
http://code.google.com/p/google-enterprise-connector-manager/wiki/LowerCaseConnectorNames
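A connector name can be checked against the allowed character set before it
is created. This sketch is an assumption: it encodes only the rule stated
above (lower case letters, digits, dashes, and underscores) and none of the
other previously documented limitations, and normalizeNewName() only
approximates the lower-casing that 1.3.0 applies to new Connectors.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helpers for the lower case connector name rule.
class ConnectorNames {
  // Lower case ASCII letters, digits, underscore, and dash ('-' is literal
  // at the end of the character class). Other limits are not modeled.
  private static final Pattern ALLOWED = Pattern.compile("[a-z0-9_-]+");

  static boolean isValid(String name) {
    return ALLOWED.matcher(name).matches();
  }

  // Approximates lower-casing a name at creation time. Locale.ENGLISH avoids
  // locale-specific surprises such as the Turkish dotless i.
  static String normalizeNewName(String name) {
    return name.toLowerCase(Locale.ENGLISH);
  }
}
```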
Connector Manager SPI Changes that Affect Connector Implementation
------------------------------------------------------------------
Connectors created using previous 1.0.x and 1.1.0 versions of this
product may not be compatible with this version. There have been
several small changes to the SPI, some additional functionality made
available for the Connectors, and some clarification of flow of control.
The following changes to the Connector Manager may have direct impact on
existing connectors:
ConnectorFactory for use by ConnectorType.validateConfig()
----------------------------------------------------------
A ConnectorFactory is provided to ConnectorType.validateConfig(),
which may use it to construct Connector instances for the purpose
of validation. The ConnectorFactory uses the same mechanism to
create the Connector instance that the ConnectorManager uses to
create the "Normal" running instances. However, the instances
created by the ConnectorFactory are considered transient - they
are not scheduled for traversal or used to authorize search
results.
For additional information see:
http://code.google.com/p/google-enterprise-connector-manager/issues/detail?id=80
http://code.google.com/p/google-enterprise-connector-manager/source/browse/trunk/projects/connector-manager/source/java/com/google/enterprise/connector/spi/ConnectorFactory.java
For an example of ConnectorFactory use:
http://code.google.com/p/google-enterprise-connector-otex/source/browse/trunk/projects/otex-core/source/java/com/google/enterprise/connector/otex/LivelinkConnectorType.java
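The following sketch shows the general shape of using a ConnectorFactory
inside validateConfig(). The SPI types are replaced by minimal, hypothetical
stand-ins (the real interfaces in com.google.enterprise.connector.spi have
more methods and possibly different signatures), and the validate() helper,
which returns null on success or an error string, is a simplification of
building a ConfigureResponse.

```java
import java.util.Map;

// Simplified, hypothetical stand-ins for the SPI types.
class RepositoryException extends Exception {
  RepositoryException(String message) { super(message); }
}

interface Connector {}

interface ConnectorFactory {
  Connector makeConnector(Map<String, String> config) throws RepositoryException;
}

class ExampleConnectorType {
  // Returns null if the configuration produced a usable instance,
  // otherwise an error message for the administrator.
  static String validate(Map<String, String> config, ConnectorFactory factory) {
    try {
      // Transient instance: never scheduled for traversal or authorization.
      Connector probe = factory.makeConnector(config);
      // A real connector would now exercise the instance, e.g. by logging
      // in to the repository; that step is omitted from this sketch.
      return null;
    } catch (RepositoryException e) {
      return "Invalid configuration: " + e.getMessage();
    }
  }
}
```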
ConnectorType.validateConfig() May Return a Modified Configuration
------------------------------------------------------------------
ConnectorType.validateConfig() may now return a modified configuration
in the ConfigureResponse if desired. That modified configuration
will be saved and used to create the running connector instance.
For additional information see:
http://code.google.com/p/google-enterprise-connector-manager/source/browse/trunk/projects/connector-manager/source/java/com/google/enterprise/connector/spi/ConnectorType.java
http://code.google.com/p/google-enterprise-connector-manager/source/browse/trunk/projects/connector-manager/source/java/com/google/enterprise/connector/spi/ConfigureResponse.java
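A sketch of a validateConfig() that hands back an adjusted configuration.
ConfigureResponse is replaced by a hypothetical stand-in holding a message,
a form snippet, and a configuration map, and the serverUrl property is
invented for illustration. The success conventions used here (null for
"valid and unchanged", a null message plus a modified map for "valid but
adjusted") are assumptions; ConfigureResponse.java describes the actual
contract.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the SPI ConfigureResponse.
class ConfigureResponse {
  final String message;
  final String formSnippet;
  final Map<String, String> configData;

  ConfigureResponse(String message, String formSnippet,
                    Map<String, String> configData) {
    this.message = message;
    this.formSnippet = formSnippet;
    this.configData = configData;
  }
}

class UrlNormalizingConnectorType {
  // Strips a trailing slash from a hypothetical "serverUrl" property and
  // returns the adjusted configuration so that it is saved and used.
  static ConfigureResponse validateConfig(Map<String, String> config) {
    String url = config.get("serverUrl");
    if (url == null || url.isEmpty()) {
      return new ConfigureResponse("serverUrl is required", null, null);
    }
    if (url.endsWith("/")) {
      Map<String, String> modified = new HashMap<>(config);  // copy, don't mutate input
      modified.put("serverUrl", url.substring(0, url.length() - 1));
      return new ConfigureResponse(null, null, modified);
    }
    return null;  // valid and unchanged
  }
}
```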
Exception Handling in TraversalManager, DocumentList, Document, Property, Value
-------------------------------------------------------------------------------
The handling of Exceptions thrown during document traversal and feeding
has been greatly improved. In the previous releases, exceptions thrown
during traversal would often result in loops or hangs, usually halting
traversal progress. Connectors should only ever throw RepositoryExceptions
out of these interfaces; however, we now provide a new subclass of
RepositoryException, called RepositoryDocumentException, that is handled
differently. In short, throwing a RepositoryDocumentException will
force the Connector Manager to skip the document currently being processed,
proceeding to the next one. Throwing a RepositoryException will instruct
the Connector Manager to abandon the current batch of documents and
retry later. The Connector must also properly handle a call to
DocumentList.checkpoint() after an exception is thrown.
For more information, see:
http://code.google.com/p/google-enterprise-connector-manager/issues/detail?id=72
http://code.google.com/p/google-enterprise-connector-manager/issues/detail?id=108
http://code.google.com/p/google-enterprise-connector-manager/source/browse/trunk/projects/connector-manager/source/java/com/google/enterprise/connector/spi/DocumentList.java
http://code.google.com/p/google-enterprise-connector-manager/source/browse/trunk/projects/connector-manager/source/java/com/google/enterprise/connector/spi/Document.java
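The two exception paths can be illustrated with a simplified batch loop.
This is a hypothetical sketch, not the Connector Manager's QueryTraverser:
the DocumentList, Document, Property, and Value calls are collapsed into a
single invented DocSource.nextDocument() method, and the exception classes
are stand-ins for the SPI ones. Because RepositoryDocumentException extends
RepositoryException, it must be caught first. The sketch does not model the
follow-up call to DocumentList.checkpoint().

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical stand-ins for the SPI exception hierarchy.
class RepositoryException extends Exception {
  RepositoryException(String message) { super(message); }
}

class RepositoryDocumentException extends RepositoryException {
  RepositoryDocumentException(String message) { super(message); }
}

// Hypothetical document source: returns the next document ID, or null at end.
interface DocSource {
  String nextDocument() throws RepositoryException;
}

class BatchRunner {
  /** Feeds one batch; returns the IDs fed, or null if the batch was abandoned. */
  static List<String> runBatch(DocSource source) {
    List<String> fed = new ArrayList<>();
    while (true) {
      String doc;
      try {
        doc = source.nextDocument();
      } catch (RepositoryDocumentException e) {
        continue;      // skip only the current document and keep going
      } catch (RepositoryException e) {
        return null;   // abandon the whole batch; it is retried later
      }
      if (doc == null) {
        return fed;    // end of batch
      }
      fed.add(doc);
    }
  }
}
```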
Returning Null DocumentList vs. Empty DocumentList from TraversalManager
------------------------------------------------------------------------
Previous versions of the Connector Manager handled a null return value
and an empty DocumentList [non-null, but zero items] returned from
TraversalManager.startTraversal() and TraversalManager.resumeTraversal()
identically. This version of the Connector Manager makes a subtle
differentiation between the two. A null return value is interpreted
as before: no new content is available for indexing, so the Connector
Manager sleeps for a few minutes and tries again. A returned empty
DocumentList is interpreted differently: although no suitable documents
have been found yet, the
Connector is performing a rather time-consuming search looking for
appropriate content. The Connector Manager will call checkpoint()
and reschedule the Connector for an immediate call to resumeTraversal().
This allows the Connector to time-slice or monitor a time-consuming
search for content without running afoul of the Connector Manager
time-out of work threads. Connectors that return an empty DocumentList,
when they should be returning null, will effectively run in a busy loop.
For more information, see:
http://code.google.com/p/google-enterprise-connector-manager/issues/detail?id=72
For an example of a Connector that uses this model, see the
LivelinkTraversalManager.listNodes() method at:
http://code.google.com/p/google-enterprise-connector-otex/source/browse/trunk/projects/otex-core/source/java/com/google/enterprise/connector/otex/LivelinkTraversalManager.java
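The distinction reduces to a three-way decision on the value returned by
startTraversal() or resumeTraversal(). In this hypothetical sketch a
java.util.List stands in for DocumentList (the real DocumentList is
iterator-style, so "empty" means nextDocument() returns null on the first
call), and the Action names are invented labels for the behaviors described
above.

```java
import java.util.List;

class TraversalScheduling {
  // Invented labels for the Connector Manager behaviors described above.
  enum Action { SLEEP_THEN_RETRY, CHECKPOINT_AND_RESUME_NOW, FEED_DOCUMENTS }

  static Action onBatchReturned(List<?> documentList) {
    if (documentList == null) {
      return Action.SLEEP_THEN_RETRY;           // nothing new; wait a few minutes
    }
    if (documentList.isEmpty()) {
      return Action.CHECKPOINT_AND_RESUME_NOW;  // connector is still searching
    }
    return Action.FEED_DOCUMENTS;
  }
}
```

A connector that returns an empty list when it really has nothing to do
lands in the second branch on every call, which is the busy loop the notes
warn about.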
New "google:title" Property
---------------------------
The named link that the GSA presents in search results is usually
a title or headline that the GSA extracts from the document content.
At this time, the GSA does not make use of other metadata supplied
by the Connector to display this title, so if the feed has no content
or the GSA cannot extract a meaningful title from the supplied
content, it instead displays the URL to the document in the search
result. Unfortunately, the URLs of documents from Connector Feeds
are usually uninformative to the viewer.
The Connector Manager has created a new canonical metadata field,
"google:title", defined as SpiConstants.PROPNAME_TITLE. At this
point, the GSA makes no special use of this field. However, if
the Connector Manager receives a metadata and content feed with
no actual "google:content" field, it will create stub content
consisting of an HTML title fragment. This causes current
GSA versions to display that title in the search results.
In the future the GSA may make more direct use of the google:title
field, so even if your Connector does provide content, it should
still present the document name/title/headline/subject as google:title.
For more information see:
http://code.google.com/p/google-enterprise-connector-manager/issues/detail?id=59
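The stub-content idea can be sketched as follows. The exact markup the
Connector Manager generates is not given in these notes, so the
stubContent() format below is an assumption; what it illustrates is that the
title should be HTML-escaped before being placed in the stub.

```java
// Hypothetical sketch of stub content built from a google:title value.
class TitleStub {
  static final String PROPNAME_TITLE = "google:title";  // SpiConstants.PROPNAME_TITLE

  // Escapes the characters that would otherwise break the HTML fragment.
  static String escapeHtml(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (char c : s.toCharArray()) {
      switch (c) {
        case '&': sb.append("&amp;"); break;
        case '<': sb.append("&lt;"); break;
        case '>': sb.append("&gt;"); break;
        case '"': sb.append("&quot;"); break;
        default: sb.append(c);
      }
    }
    return sb.toString();
  }

  // Assumed stub format: a minimal document whose only content is the title.
  static String stubContent(String title) {
    return "<html><title>" + escapeHtml(title) + "</title></html>";
  }
}
```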
TraversalContext and TraversalContextAware
------------------------------------------
The Connector Manager now provides a TraversalContext implementation
to Connectors so that they may better determine what types of
document content to provide during a traversal. Connectors may use
the information provided by the TraversalContext to limit content
provided for indexing, based upon document size or MIME type.
For instance, the Connector might use TraversalContext information to:
- Provide a Document with metadata and full content.
- Provide a Document with metadata but supply content in an
alternate format (such as HTML or PDF).
- Provide a Document with metadata and summarized content.
- Provide a Document with metadata but no content.
- Skip a Document entirely.
If a Connector's TraversalManager also implements the
com.google.enterprise.connector.spi.TraversalContextAware interface,
the Connector Manager will then call the setTraversalContext()
method, supplying a TraversalContext for the Connector to use,
before calling any methods in the TraversalManager interface.
If a TraversalContext is provided, the Connector's TraversalManager
may then use it to tailor its Document feed. For instance, the
TraversalContext could be used to determine whether or not to
supply a "google:content" property for a Document, based upon
the document size or MIME type. Note that the TraversalContext
interface has changed slightly from its previous (unimplemented)
version.
For additional information, see:
http://code.google.com/p/google-enterprise-connector-manager/wiki/TraversalContext
http://code.google.com/p/google-enterprise-connector-manager/issues/detail?id=78
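A connector-side decision based on TraversalContext might look like the
sketch below. The interface is a minimal stand-in; the method names
maxDocumentSize() and mimeTypeSupportLevel() and the meaning given to the
support levels (negative to exclude, zero for unsupported, positive for
supported) are assumptions to verify against the TraversalContext wiki page.
The three outcomes cover a subset of the options listed above.

```java
// Hypothetical stand-in modeled on TraversalContext; signatures are assumed.
interface TraversalContext {
  long maxDocumentSize();
  int mimeTypeSupportLevel(String mimeType);
}

class ContentPolicy {
  enum Choice { FULL_CONTENT, METADATA_ONLY, SKIP }

  static Choice choose(TraversalContext ctx, String mimeType, long size) {
    int level = ctx.mimeTypeSupportLevel(mimeType);
    if (level < 0) {
      return Choice.SKIP;            // assumed: negative level means "exclude"
    }
    if (level == 0 || size > ctx.maxDocumentSize()) {
      return Choice.METADATA_ONLY;   // feed metadata, omit google:content
    }
    return Choice.FULL_CONTENT;
  }
}
```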
Connector Configuration Storage
-------------------------------
This version of the Connector Manager moves the stored Connector
schedule and traversal state (checkpoint) from the Java Preferences
to files stored in the Connector instance directory (found under
$TOMCAT_HOME/webapps/connector-manager/WEB-INF/connectors). This
is the same directory in which the Connector's configuration properties
file and optional connectorInstance.xml file are stored.
The presence of these two additional files is unlikely to affect the
Connectors. The files are named $CONNECTOR_NAME_schedule.txt and
$CONNECTOR_NAME_state.txt, where $CONNECTOR_NAME is the name of the
Connector instance.
For more information, see:
http://code.google.com/p/google-enterprise-connector-manager/issues/detail?id=78
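Given a Connector instance directory, the two new file names follow directly
from the pattern above. This sketch takes the instance directory as a
parameter rather than guessing its exact location under WEB-INF/connectors;
the class and method names are hypothetical.

```java
import java.io.File;

// Hypothetical helpers locating the per-instance schedule and state files.
class ConnectorStateFiles {
  static File scheduleFile(File instanceDir, String connectorName) {
    return new File(instanceDir, connectorName + "_schedule.txt");
  }

  static File stateFile(File instanceDir, String connectorName) {
    return new File(instanceDir, connectorName + "_state.txt");
  }
}
```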
Password Encryption with EncryptedPropertyPlaceholderConfigurer
---------------------------------------------------------------
All properties in the Connector's configuration properties file
whose property key contains the substring "password" (case-insensitive
match) are now encrypted by default. In the past, only properties
with the key "Password" were encrypted. Connectors using the
EncryptedPropertyPlaceholderConfigurer are unlikely to notice the
change.
The names of future new configuration properties should be chosen
accordingly. For instance, this now allows a Connector to maintain
separate passwords for different repository services. However,
the Livelink Connector configuration now has an encrypted boolean
property, because the property's name happens to contain the
substring "password".
For more information, see:
http://code.google.com/p/google-enterprise-connector-manager/issues/detail?id=81
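The key-matching rule amounts to a case-insensitive substring test. This is
a sketch of the rule only, not the EncryptedPropertyPlaceholderConfigurer
implementation; the example keys used with it are hypothetical.

```java
import java.util.Locale;

// Sketch of the "contains 'password', any case" rule for encrypted keys.
class PasswordKeys {
  static boolean isEncryptedKey(String key) {
    // Locale.ENGLISH keeps lower-casing independent of the default locale.
    return key.toLowerCase(Locale.ENGLISH).contains("password");
  }
}
```

Under this rule, keys such as "Password" and "proxyPassword" are encrypted,
and so is a boolean key like a hypothetical "usePasswordAuth", which is the
side effect described above for the Livelink Connector.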
SMB Search URLs
---------------
Previous versions of the Connector Manager would reject
google:searchurl metadata that used the "smb:" scheme for
the URL. This has been fixed.
For additional information, see:
http://code.google.com/p/google-enterprise-connector-manager/issues/detail?id=100
AuthorizationResponse.equals() and AuthorizationResponse.hashCode()
-------------------------------------------------------------------
The AuthorizationResponse.equals() and AuthorizationResponse.hashCode()
methods have been changed to include the AuthorizationResponse.valid
member in the computations. In previous versions of the Connector
Manager, only the AuthorizationResponse.docid member was used in
AuthorizationResponse.equals() and AuthorizationResponse.hashCode().
The change is subtle, but AuthorizationResponse instances
{ "1234", true } and { "1234", false } are now considered unequal,
where they would have been considered equal in the past.
For more information, see:
http://code.google.com/p/google-enterprise-connector-manager/issues/detail?id=57
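The effect of the change can be seen with a simplified stand-in class
(AuthzResponse is a hypothetical name, not the SPI's AuthorizationResponse)
whose equals() and hashCode() cover both fields. One practical consequence:
a HashSet can now hold both { "1234", true } and { "1234", false } as
distinct entries.

```java
import java.util.Objects;

// Hypothetical stand-in: equality and hashing use both docid and valid.
final class AuthzResponse {
  private final boolean valid;
  private final String docid;

  AuthzResponse(boolean valid, String docid) {
    this.valid = valid;
    this.docid = docid;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof AuthzResponse)) return false;
    AuthzResponse other = (AuthzResponse) o;
    return valid == other.valid && Objects.equals(docid, other.docid);
  }

  @Override
  public int hashCode() {
    // Must use the same fields as equals() to keep the contract.
    return Objects.hash(docid, valid);
  }
}
```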
Release 1.1.0, Jun 30, 2008
===========================
Introduction
------------
This is an upgrade release with some enhancements and new features. Users
of previous releases are encouraged to upgrade, and should check the
"Version Compatibility" section below for instructions on how to use
existing data with this new release.
Summary of Changes
------------------
* Fix Issue 25: Can't recrawl an existing connector
* Fix Issue 35: Better handling of null QueryTraversalManager
* Fix Issue 41: Typo in ConnectorManagerGetServlet documentation
* Fix Issue 45: DocPusher uses a logger named WorkQueue
* Fix Issue 47: Deleting a connector may leave behind connector state
* Fix Issue 53: ServletUtil methods have unused StringBuffers
* Fix Issue 60: Connector instances can be used after being deleted
* Fix Issue 63: Editing a connector instance has no effect until Tomcat is restarted
* Fix Issue 66: TraversalWorkQueueItem.doWork calls connectorFinishedTraversal too often
* Fix Issue 67: Scheduler runs traversals faster than configured traversal rate
* Fix Issue 69: Remove keystore warning from logs as it is harmless and confuses everyone
* Fix Issue 70: Allow GSA to request log files from connector manager
* Fix Issue 71: InstanceInfo and InstanceMap are missing spaces in log messages
* Fix Issue 74: Log an error if GSA is rejecting feeds from connector(s)
* Fix Issue 75: HTTP 400 error thrown for correct feeds
* Fix Issue 76: Last Modified date that is sent to the GSA should be in
YYYY-MM-DD (ISO 8601) format
* Fix Issue 89: Try to avoid calling Connector classes from an interrupted thread
* Fix Issue 90: Enable the WorkQueueItem.timeout to be configurable
* Fix Issue 91: When Registering a running Connector Manager to the GSA, it
shuts down the TraversalScheduler and it is not restarted
* Added support for deleting documents. See r757 for details.
* Added new feed logger as an alternative to the teedFeedFile. See r783 for details.
* Added support for extracting Connector Manager log files from the GSA. See r832 for details.
* Changed default retry interval. See r781 for details.
* Changed SPI related to SimpleDocument. See r824 for details.
* Improved logging.
Version Compatibility
---------------------
Connectors created using previous 1.0.x versions of this product may not be
compatible with this version. In particular, if SimpleDocument is used within
the Connector there are SPI changes that need to be made. Also, the Connector
Manager will now check all Document implementations for the new optional
SpiConstants.PROPNAME_ACTION property, which Document implementations must handle gracefully.
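One way for a Document implementation to handle the new lookup gracefully is
to return null for any property it does not carry, rather than throwing. The
sketch below is hypothetical: MapDocument is an invented map-backed class
that returns String values instead of SPI Property objects, and the
"google:action" string is assumed to be the value of
SpiConstants.PROPNAME_ACTION.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical map-backed document that tolerates lookups of absent properties.
class MapDocument {
  // Assumed value of SpiConstants.PROPNAME_ACTION.
  static final String PROPNAME_ACTION = "google:action";

  private final Map<String, String> props = new HashMap<>();

  MapDocument set(String name, String value) {
    props.put(name, value);
    return this;
  }

  // Returns null when the property is absent instead of throwing.
  String findProperty(String name) {
    return props.get(name);
  }
}
```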
Release 1.0.3, Dec 07, 2007
===========================
Introduction
------------
This is a maintenance release that improves quality, reliability, and
performance without adding any new functionality. All users of
previous 1.0.x releases should upgrade to this release.
Summary of Changes
------------------
* Fix Issue 64 where the googleWorkDir property has an extra file separator
* Fix Issue 65 added code to ProductionManager to check for
AuthenticationManager before using it
* Fix Bug 922427 two copies of feed appearing in Teed Feed File
* Fix Issue 40 pass language parameter down to getPopulatedConfigForm()
* Fix Bug 950013 where Unicode characters did not display correctly (????)
when adding a new connector.
* Added argument to PrefsStore constructor to enable use of systemNode rather
than userNode as the root node.
Version Compatibility
---------------------
Connectors created using version 1.0.1 or 1.0.2 of this product may be used
with this version.
Release 1.0.2, Oct 03, 2007
===========================
Introduction
------------
This is the first full release of this product. See the product
website for a complete description.
Summary of Changes
------------------
* Fix Issue 50: Password encryption re-encrypts strings multiple times
* Moved and updated README and RELEASE_NOTES files
Version Compatibility
---------------------
Connectors created using version 1.0.1 of this product may be used
with this version.
Release 1.0.1, Sep 25, 2007
===========================
Introduction
------------
This is an early access release for wide evaluation and usage. Your
feedback is important to us. Keep in mind that we are continuing to
work on the Connector Manager and things may change in the future.
Summary of Changes
------------------
* Two SPI changes
* Some code cleanup
* Changed build files to compile for Java 1.4
* Fix Bug 749919 AdminConsole GUI: Connector error message
Version Compatibility
---------------------
Since the SPI has changed for this release, Connectors written to the
previous SPI available in Release 1.0.0 will have to be recompiled.
Release 1.0.0, Aug 16, 2007
===========================
Introduction
------------
This is an early access release for evaluation and usage by select
partners. Your feedback is important to us.