# mongoosastic-ts

Mongoosastic-ts is a [mongoose](http://mongoosejs.com/) plugin that can automatically index your models into [elasticsearch](https://www.elastic.co/).

- [Installation](#installation)
- [Setup](#setup)
- [Indexing](#indexing)
  - [Saving a document](#saving-a-document)
  - [Removing a document](#removing-a-document)
  - [Indexing nested models](#indexing-nested-models)
  - [Indexing mongoose references](#indexing-mongoose-references)
  - [Indexing an existing collection](#indexing-an-existing-collection)
  - [Bulk indexing](#bulk-indexing)
  - [Filtered indexing](#filtered-indexing)
  - [Indexing on demand](#indexing-on-demand)
  - [Unindexing on demand](#unindexing-on-demand)
  - [Truncating an index](#truncating-an-index)
  - [Restrictions](#restrictions)
    - [Auto indexing](#auto-indexing)
    - [Search immediately after es-indexed event](#search-immediately-after-es-indexed-event)
- [Mapping](#mapping)
  - [Geo mapping](#geo-mapping)
    - [Indexing a geo point](#indexing-a-geo-point)
    - [Indexing a geo shape](#indexing-a-geo-shape)
  - [Creating mappings on-demand](#creating-mappings-on-demand)
- [Queries](#queries)
  - [Hydration](#hydration)

## Installation

The latest version of this package tracks the latest `elasticsearch` and `mongoose` packages as closely as possible.

```bash
npm install -S mongoosastic-ts
```

## Setup

### Model.plugin(mongoosastic, options)

Options are:

- `index` - the index in Elasticsearch to use. Defaults to the pluralization of the model name.
- `type` - the type this model represents in Elasticsearch. Defaults to the model name.
- `esClient` - an existing Elasticsearch `Client` instance.
- `hosts` - an array of hosts Elasticsearch is running on.
- `host` - the host Elasticsearch is running on.
- `port` - the port Elasticsearch is running on.
- `auth` - the authentication needed to reach the Elasticsearch server, in the standard format of 'username:password'.
- `protocol` - the protocol the Elasticsearch server uses. Defaults to http.
- `hydrate` - whether or not to look up results in mongodb before returning them.
- `hydrateOptions` - options to pass into the hydrate function.
- `bulk` - size and delay options for bulk indexing.
- `filter` - the function used for filtered indexing.
- `transform` - the function used to transform the serialized document before indexing.
- `populate` - an Array of Mongoose populate options objects.
- `indexAutomatically` - allows indexing after model save to be disabled for when you need finer control over when documents are indexed. Defaults to true.
- `customProperties` - an object detailing additional properties which will be merged onto the type's default mapping when `createMapping` is called.
- `saveOnSynchronize` - triggers the Mongoose save (and pre-save) method when synchronizing a collection/index. Defaults to true.

To have a model indexed into Elasticsearch, simply add the plugin.

```typescript
const mongoose = require('mongoose'),
  mongoosastic = require('mongoosastic-ts'),
  Schema = mongoose.Schema;

const User = new Schema({
  name: String,
  email: String,
  city: String,
});

User.plugin(mongoosastic);
```

By default, this uses the pluralization of the model name as the index and the model name itself as the type. So if you create a new User object and save it, you can see it by navigating to http://localhost:9200/users/user/\_search (this assumes Elasticsearch is running locally on port 9200).

By default, all fields are indexed into Elasticsearch. This can be wasteful, since the document is then duplicated between mongodb and Elasticsearch, so consider indexing only certain fields by specifying `es_indexed` on the fields you want to store:

```typescript
const User = new Schema({
  name: { type: String, es_indexed: true },
  email: String,
  city: String,
});

User.plugin(mongoosastic);
```

In this case only the name field will be indexed for searching.
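
The effect of `es_indexed` can be pictured as a field filter applied before a document is sent to Elasticsearch. The sketch below is only an illustration of that idea, not the plugin's actual implementation: `FieldOptions`, `SchemaDefinition`, and `pickIndexedFields` are hypothetical names, and the schema is simplified to a flat object of field options.

```typescript
// Hypothetical, simplified field options: only `es_indexed` comes from this README.
interface FieldOptions {
  es_indexed?: boolean;
}

type SchemaDefinition = Record<string, FieldOptions>;

// Returns the subset of `doc` that would be indexed.
// If no field is flagged with `es_indexed`, every schema field is kept,
// which mirrors the "index everything by default" behavior described above.
function pickIndexedFields(
  schema: SchemaDefinition,
  doc: Record<string, unknown>
): Record<string, unknown> {
  const flagged = Object.keys(schema).filter((key) => schema[key]?.es_indexed === true);
  const keys = flagged.length > 0 ? flagged : Object.keys(schema);

  const out: Record<string, unknown> = {};
  for (const key of keys) {
    if (key in doc) out[key] = doc[key];
  }
  return out;
}

const userSchema: SchemaDefinition = {
  name: { es_indexed: true },
  email: {},
  city: {},
};

const indexed = pickIndexedFields(userSchema, {
  name: 'John',
  email: 'john@example.com',
  city: 'Paris',
});
// indexed → { name: 'John' }
```

With no field flagged, the same helper returns every schema field, which is why flagging a single field is enough to switch a model to selective indexing.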

Now, by adding the plugin, the model gains a new method called `search`, which can be used to make simple to complex searches. The `search` method accepts [standard Elasticsearch query DSL](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-queries.html).

```typescript
const results = await User.search({
  query_string: {
    query: 'john',
  },
});
// results here
```

To connect to more than one host, you can use an array of hosts.

```typescript
MyModel.plugin(mongoosastic, {
  hosts: ['localhost:9200', 'anotherhost:9200'],
});
```

You can also re-use an existing Elasticsearch `Client` instance.

```typescript
const esClient = new elasticsearch.Client({ host: 'localhost:9200' });
MyModel.plugin(mongoosastic, {
  esClient: esClient,
});
```

## Indexing

### Saving a document

Indexing takes place after the document is saved in mongodb and is a deferred process. You can detect the end of indexing by listening for the `es-indexed` event.

```typescript
await doc.save();
/* Document indexing in progress */
doc.on('es-indexed', function (err, res) {
  if (err) throw err;
  /* Document is indexed */
});
```

### Removing a document

Removing a document, or unindexing, takes place when a document is removed by calling `.remove()` on a mongoose Document instance. You can detect the end of unindexing by listening for the `es-removed` event.

```typescript
await doc.remove();
/* Document unindexing in the background */
doc.on('es-removed', function (err, res) {
  if (err) throw err;
  /* Document is unindexed */
});
```

Note that use of `Model.remove` does not involve mongoose documents, as outlined in the [documentation](http://mongoosejs.com/docs/api.html#model_Model.remove). Therefore, the following will not unindex the document.

```typescript
await MyModel.remove({ _id: doc.id });
/* doc remains in Elasticsearch cluster */
```

### Indexing Nested Models

To index nested models, refer to the following example.

```typescript
const Comment = new Schema({
  title: String,
  body: String,
  author: String,
});

const User = new Schema({
  name: { type: String, es_indexed: true },
  email: String,
  city: String,
  comments: { type: [Comment], es_indexed: true },
});

User.plugin(mongoosastic);
```

### Elasticsearch [Nested datatype](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html)

Since Elasticsearch by default takes arrays and flattens them into objects, it can be hard to write queries that need to maintain the relationships between objects in the array. To change this behavior, change the Elasticsearch type from `object` (the mongoosastic default) to `nested`.

```typescript
const Comment = new Schema({
  title: String,
  body: String,
  author: String,
});

const User = new Schema({
  name: { type: String, es_indexed: true },
  email: String,
  city: String,
  comments: {
    type: [Comment],
    es_indexed: true,
    es_type: 'nested',
    es_include_in_parent: true,
  },
});

User.plugin(mongoosastic);
```

### Indexing Mongoose References

To index mongoose references, refer to the following example.

```typescript
const Comment = new Schema({
  title: String,
  body: String,
  author: String,
});

const User = new Schema({
  name: { type: String, es_indexed: true },
  email: String,
  city: String,
  comments: {
    type: Schema.Types.ObjectId,
    ref: 'Comment',
    es_schema: Comment,
    es_indexed: true,
    es_select: 'title body',
  },
});

User.plugin(mongoosastic, {
  populate: [{ path: 'comments', select: 'title body' }],
});
```

In the schema you'll need to provide the `es_schema` field, which is the referenced schema. By default every field of the referenced schema will be mapped. Use the `es_select` field to pick just specific fields.

`populate` is an array of options objects you normally pass to [Model.populate](http://mongoosejs.com/docs/api.html#model_Model.populate).

### Indexing An Existing Collection

Already have a mongodb collection that you'd like to index using this plugin?
No problem! Simply call the `synchronize` method on your model to open a mongoose stream and start indexing documents individually.

```typescript
// Todo example with async / await promise
const BookSchema = new Schema({
  title: String,
});
BookSchema.plugin(mongoosastic);

const Book = mongoose.model('Book', BookSchema);
const stream = Book.synchronize();
let count = 0; // `let`, because the counter is incremented below

stream.on('data', function (err, doc) {
  count++;
});
stream.on('close', function () {
  console.log('indexed ' + count + ' documents!');
});
stream.on('error', function (err) {
  console.log(err);
});
```

You can also synchronize a subset of documents based on a query:

```typescript
const stream = Book.synchronize({ author: 'Arthur C. Clarke' });
```

You can specify synchronization options as well:

```typescript
const stream = Book.synchronize({}, { saveOnSynchronize: true });
```

Options are:

- `saveOnSynchronize` - triggers the Mongoose save (and pre-save) method when synchronizing a collection/index. Defaults to the global `saveOnSynchronize` option.

### Bulk Indexing

You can also specify `bulk` options on the plugin, which makes it use Elasticsearch's bulk indexing API. This causes the `synchronize` function to use bulk indexing as well.

Mongoosastic will wait 1 second (or the specified delay) until it has 1000 docs (or the specified size) and then perform bulk indexing.

```typescript
BookSchema.plugin(mongoosastic, {
  bulk: {
    size: 10, // preferred number of docs to bulk index
    delay: 100, // milliseconds to wait for enough docs to meet size constraint
  },
});
```

### Filtered Indexing

You can specify a filter function to index a model into Elasticsearch only under specific conditions.

The filter function must return `true` for documents that should not be indexed into Elasticsearch.

```typescript
const MovieSchema = new Schema({
  title: { type: String },
  genre: { type: String, enum: ['horror', 'action', 'adventure', 'other'] },
});

MovieSchema.plugin(mongoosastic, {
  filter: function (doc) {
    return doc.genre === 'action';
  },
});
```

Instances of the Movie model with 'action' as their genre will not be indexed into Elasticsearch.

### Indexing On Demand

You can index on demand using the `index` function.

```typescript
const dude = await Dude.findOne({ name: 'Jeffrey Lebowski' });

dude.awesome = true;
dude
  .index()
  .then((res) => {
    console.log("egads! I've been indexed!");
  })
  .catch((err) => {
    console.error('error in indexing');
  });
```

The index method takes 2 arguments:

- `options` (optional) - {index, type} - the index and type to publish to. Defaults to the standard index and type that the model was set up with.
- `callback` - callback function to be invoked when the document has been indexed.

Note that indexing a model does not mean it will be persisted to mongodb. Use save for that.

### Unindexing on demand

You can remove a document from the Elasticsearch cluster by using the `unIndex` function.

```typescript
doc.unIndex(function (err) {
  console.log("I've been removed from the cluster :(");
});
```

The unIndex method takes 2 arguments:

- `options` (optional) - {index, type} - the index and type to publish to. Defaults to the standard index and type that the model was set up with.
- `callback` - callback function to be invoked when the model has been unindexed.

### Truncating an index

The static method `esTruncate` deletes all documents from the associated index. Combined with `synchronize()`, it can be useful in integration tests, for example when each test case needs a clean index in Elasticsearch.

```typescript
GarbageModel.esTruncate(function (err) {
  /* ... */
});
```

### Restrictions

#### Auto indexing

Mongoosastic-ts auto-indexes documents through mongoose's [middleware](http://mongoosejs.com/docs/middleware.html) feature.
Auto indexing is triggered by `document.save`, `Model.findOneAndUpdate`, `Model.insertMany`, `document.remove`, and `Model.findOneAndRemove`, but not by `Model.remove` or `Model.update`.

When calling `findOneAndUpdate`, pass the `new: true` option so that mongoosastic-ts receives the updated values in its post hook.

#### Search immediately after es-indexed event

> Elasticsearch by default refreshes each shard every 1s, so the document will be available to search 1s after indexing it.

The `es-indexed` event means that Elasticsearch has received the index request. If you want to search for the document, wait about 1s before doing so. See [Document not found immediately after it is saved](https://github.com/elastic/elasticsearch-js/issues/231).

## Mapping

Schemas can be configured to have special options per field. These match the existing [field mapping configurations](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html) defined by Elasticsearch, the only difference being that they are all prefixed by "es\_".

> Boost has been removed in Elasticsearch 8; the example below is for versions < 8.

For example, if you wanted to index a book model and have the boost for title set to 2.0 (giving it greater priority when searching), you'd define it as follows:

```typescript
const BookSchema = new Schema({
  title: { type: String },
  author: { type: String, es_null_value: 'Unknown Author' },
  publicationDate: { type: Date, es_type: 'date' },
});
```

This example uses a few other mapping fields, such as `null_value` and `type` (which overrides whatever the schema type is, useful if you want stronger typing such as float).

There are various mapping options that can be defined in Elasticsearch. Check out [https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html) for more information.
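
The "es\_" prefix convention can be pictured as a simple key rewrite: options that start with `es_` are passed to Elasticsearch with the prefix removed. The sketch below only illustrates that convention; `toFieldMapping` is a hypothetical helper, not a function exported by mongoosastic-ts, and it deliberately ignores how the plugin infers Elasticsearch types from mongoose types.

```typescript
// Hypothetical helper: converts the `es_`-prefixed options of one schema field
// into an Elasticsearch-style field mapping by stripping the prefix.
// Options without the prefix (such as the mongoose `type`) are ignored here.
function toFieldMapping(fieldOptions: Record<string, unknown>): Record<string, unknown> {
  const prefix = 'es_';
  const mapping: Record<string, unknown> = {};
  for (const key of Object.keys(fieldOptions)) {
    if (key.startsWith(prefix)) {
      mapping[key.slice(prefix.length)] = fieldOptions[key];
    }
  }
  return mapping;
}

const authorMapping = toFieldMapping({ type: String, es_null_value: 'Unknown Author' });
// authorMapping → { null_value: 'Unknown Author' }

const dateMapping = toFieldMapping({ type: Date, es_type: 'date' });
// dateMapping → { type: 'date' }
```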

Here are examples of the currently possible definitions in mongoosastic-ts:

```typescript
// Used as the nested schema below (declared first so it exists when referenced).
const SubSchema = new Schema({
  field1: { type: String },
  field2: { type: String },
});

const ExampleSchema = new Schema({
  // String (core type)
  string: { type: String },

  // Number (core type)
  number: { type: Number, es_type: 'integer' },

  // Date (core type)
  date: { type: Date, es_type: 'date' },

  // Array type
  array: { type: Array, es_type: 'string' },

  // Object type
  object: {
    field1: { type: String },
    field2: { type: String },
  },

  // Nested type
  nested: [SubSchema],

  // Multi field type
  multi_field: {
    type: String,
    es_type: 'multi_field',
    es_fields: {
      multi_field: { type: 'string', index: 'analyzed' },
      untouched: { type: 'string', index: 'not_analyzed' },
    },
  },

  // Geo point type
  geo: {
    type: String,
    es_type: 'geo_point',
  },

  // Geo point type with lat_lon fields
  geo_with_lat_lon: {
    geo_point: {
      type: String,
      es_type: 'geo_point',
      es_lat_lon: true,
    },
    lat: { type: Number },
    lon: { type: Number },
  },

  // Geo shape type
  geo_shape: {
    coordinates: [],
    type: { type: String },
    geo_shape: {
      type: String,
      es_type: 'geo_shape',
      es_tree: 'quadtree',
      es_precision: '1km',
    },
  },

  // Special feature: specify a cast method to pre-process the field before indexing it
  someFieldToCast: {
    type: String,
    es_cast: function (value) {
      return value + ' something added';
    },
  },
});
```

### Geo mapping

Before indexing any geo-mapped data (or calling `synchronize`), the mapping must be created manually with `createMapping` (see [Creating mappings on-demand](#creating-mappings-on-demand)).

Note that the name of the field containing the ES geo data must start with 'geo\_' to be recognized as such.
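
The ordering requirement above (mapping first, then synchronization) can be wrapped in a small helper. This is a minimal sketch under stated assumptions: `SyncStream`, `MappableModel`, and `createMappingThenSync` are hypothetical names introduced here, and the interfaces only describe the parts of the model surface this README mentions (`createMapping` returning a promise, `synchronize` returning a stream with `data`, `close`, and `error` events).

```typescript
// Hypothetical, minimal view of the stream returned by `synchronize`.
interface SyncStream {
  on(event: string, listener: (...args: any[]) => void): unknown;
}

// Hypothetical, minimal view of a model with the plugin applied.
interface MappableModel {
  createMapping(settings?: object): Promise<unknown>;
  synchronize(query?: object): SyncStream;
}

// Creates the mapping first, then synchronizes the collection.
// Resolves with the number of 'data' events seen once the stream closes.
async function createMappingThenSync(model: MappableModel): Promise<number> {
  await model.createMapping();

  return new Promise<number>((resolve, reject) => {
    let count = 0;
    const stream = model.synchronize();
    stream.on('data', () => {
      count++;
    });
    stream.on('close', () => resolve(count));
    stream.on('error', (err: Error) => reject(err));
  });
}
```

A caller would use it as `const total = await createMappingThenSync(GeoModel);`. Because `createMapping` is described elsewhere in this README as reporting an error when the mapping already exists, callers may want to catch that rejection before deciding whether to synchronize.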

#### Indexing a geo point

```typescript
const geo = new GeoModel({
  /* … */
  geo_with_lat_lon: { lat: 1, lon: 2 },
  /* … */
});
```

#### Indexing a geo shape

```typescript
const geo = new GeoModel({
  /* … */
  geo_shape: {
    type: 'envelope',
    coordinates: [[3, 4], [1, 2]], /* Arrays of coord : [[lon,lat],[lon,lat]] */
  },
  /* … */
});
```

A mapping, indexing, and searching example for geo shapes can be found in test/geo-test.js.

For example, you can retrieve the list of documents whose shape contains a specific point (or polygon...):

```typescript
const geoQuery = {
  match_all: {},
};

const geoFilter = {
  geo_shape: {
    geo_shape: {
      shape: {
        type: 'point',
        coordinates: [3, 1],
      },
    },
  },
};

try {
  const res = await GeoModel.search(geoQuery, { filter: geoFilter });
  // Elasticsearch results are here
} catch (err) {
  // error in search
}
```

### Creating Mappings On Demand

Creating the mapping is a **one time operation** and **should be called manually**.

A BookSchema as an example:

```typescript
const BookSchema = new Schema({
  title: { type: String },
  author: { type: String, es_null_value: 'Unknown Author' },
  publicationDate: { type: Date, es_type: 'date' },
});

BookSchema.plugin(mongoosastic);
const Book = mongoose.model('Book', BookSchema);
Book.createMapping({
  analysis: {
    analyzer: {
      content: {
        type: 'custom',
        tokenizer: 'whitespace',
      },
    },
  },
}).then((mapping) => {
  // do neat things here
});
```

This feature is still a work in progress. As of this writing, you'll have to manage whether or not you need to create the mapping; mongoosastic-ts makes no assumptions and simply attempts to create the mapping. If the mapping already exists, an Exception detailing such will be populated in the `err` argument.

## Queries

The full query DSL of Elasticsearch is exposed through the search method.
For example, if you wanted to find all people between ages 21 and 30:

```typescript
try {
  const people = await Person.search({
    range: {
      age: {
        from: 21,
        to: 30,
      },
    },
  });
  // elasticsearch results here
  // all the people who fit the age group are here!
} catch (e) {
  // error in searching in elasticsearch
}
```

See the Elasticsearch [Query DSL](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html) docs for more information.

You can also specify query options like [sorts](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html#search-request-sort):

```typescript
Person.search(
  {
    /* ... */
  },
  { sort: 'age:asc' }
)
  .then((people) => {
    // sorted results
  })
  .catch((err) => {
    // error
  });
```

And also [aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html):

```typescript
Person.search(
  {
    /* ... */
  },
  {
    aggs: {
      names: {
        terms: {
          field: 'name',
        },
      },
    },
  }
)
  .then((results) => {
    // results.aggregations holds the aggregations
  })
  .catch((err) => {
    // err
  });
```

Options for queries must adhere to the [typescript elasticsearch driver specs](https://www.elastic.co/guide/en/elasticsearch/client/typescript-api/current/api-reference.html#api-search).

### Raw queries

A full Elasticsearch query object can be provided to mongoosastic-ts through the `.esSearch()` method. This can be useful when paging results. The query to be provided wraps the query object provided to the `.search()` method and accepts the same options:

```typescript
const rawQuery = {
  from: 60,
  size: 20,
  query: /* query object as in .search() */
};

Model.esSearch(rawQuery, options, cb);
```

For example:

```typescript
Person.esSearch({
  from: 60,
  size: 20,
  query: {
    range: {
      age: {
        from: 21,
        to: 30,
      },
    },
  },
})
  .then((res) => {
    // only the 61st to 80th ranked people who fit the age group are here!
  })
  .catch((err) => {
    // error in search
  });
```

### Hydration

By default, objects returned from a search are the objects as they are stored in Elasticsearch. This is useful when only what was indexed needs to be displayed (think a list of results), while the actual mongoose object contains the full data shown when viewing one of the results.

However, if you want the results to be actual mongoose objects, you can provide {hydrate:true} as the second argument to a search call.

```typescript
User.search({ query_string: { query: 'john' } }, { hydrate: true }, function (err, results) {
  // results here
});
```

You can also pass in a `hydrateOptions` object with information on how to query for the mongoose object.

```typescript
User.search(
  { query_string: { query: 'john' } },
  {
    hydrate: true,
    hydrateOptions: { select: 'name age' },
  },
  function (err, results) {
    // results here
  }
);
```

Original Elasticsearch result data can be kept with the `hydrateWithESResults` option. Documents are then enhanced with an `_esResult` property.

```typescript
User.search(
  { query_string: { query: 'john' } },
  {
    hydrate: true,
    hydrateWithESResults: true,
    hydrateOptions: { select: 'name age' },
  },
  function (err, results) {
    // results here
    results.hits.hits.forEach(function (result) {
      console.log('score', result._id, result._esResult._score);
    });
  }
);
```

By default the `_esResult._source` document is skipped. It can be added with the option `hydrateWithESResults: {source: false}`.

Note that using hydrate is somewhat slower, as it performs an Elasticsearch query and then queries mongodb for all the ids returned from the search result.

You can also make hydration the default by providing it as a plugin option (along with default hydrate options):

```typescript
const User = new Schema({
  name: { type: String, es_indexed: true },
  email: String,
  city: String,
});

User.plugin(mongoosastic, { hydrate: true, hydrateOptions: { lean: true } });
```
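
The two-step lookup behind hydration (search Elasticsearch, then fetch the matching documents by id) can be sketched with in-memory data. This is an illustration only, not the plugin's code: `Hit`, `StoredDoc`, `HydratedDoc`, and `hydrateHits` are hypothetical names, the lookup is synchronous for brevity, and keeping the Elasticsearch ranking order is an assumption of this sketch.

```typescript
// Minimal shape of a search hit, using Elasticsearch's `_id` and `_score` fields.
interface Hit {
  _id: string;
  _score: number;
}

// Hypothetical stand-in for a stored mongoose document.
interface StoredDoc {
  _id: string;
  [field: string]: unknown;
}

interface HydratedDoc extends StoredDoc {
  _esResult?: Hit;
}

// Replaces each hit with the stored document that has the same id.
// `findByIds` stands in for the second query against the database.
function hydrateHits(
  hits: Hit[],
  findByIds: (ids: string[]) => StoredDoc[],
  withEsResults = false
): HydratedDoc[] {
  // One lookup for all ids returned by the search.
  const docsById = new Map<string, StoredDoc>();
  for (const doc of findByIds(hits.map((hit) => hit._id))) {
    docsById.set(doc._id, doc);
  }

  const hydrated: HydratedDoc[] = [];
  for (const hit of hits) {
    const doc = docsById.get(hit._id);
    if (!doc) continue; // indexed in Elasticsearch but missing from the database
    hydrated.push(withEsResults ? { ...doc, _esResult: hit } : { ...doc });
  }
  return hydrated;
}

const database: StoredDoc[] = [
  { _id: 'a', name: 'John' },
  { _id: 'b', name: 'Johnny' },
];
const hits: Hit[] = [
  { _id: 'b', _score: 2 },
  { _id: 'gone', _score: 1.5 },
  { _id: 'a', _score: 1 },
];
const lookup = (ids: string[]) => database.filter((doc) => ids.indexOf(doc._id) !== -1);

const hydrated = hydrateHits(hits, lookup, true);
// hydrated holds documents 'b' then 'a', each carrying its hit in `_esResult`
```

The extra database round trip is the cost mentioned above: every hydrated search performs one Elasticsearch query plus one lookup for the returned ids.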