openapi: 3.0.3 info: title: Sketch Engine - API documentation description: "An **application programming interface** (API) is a set of rules and\ \ protocols that allow different software applications to communicate with each\ \ other.\n \n\nIn the context of Sketch Engine, the API provides\ \ a standardized way for developers to access and use Sketch Engine's language\ \ data and text analysis \n tools in their own software applications.\ \ It is useful for anyone who needs to work with text, for tasks\ \ such as searching collocations, \n generating word lists\ \ and keywords, and building text corpora (databases of written language).\ \ With the API, developers can integrate these features \n \ \ into their own applications and create custom text analysis tools.\n\n \ \ \n\nThis **API documentation** outlines the Sketch Engine endpoints\ \ used mainly for working with corpora, including their creation, compilation\ \ and various \n functions such as word sketches, concordances,\ \ etc. The documentation describes the **requests** and responses of API calls,\ \ with most responses provided in \n either **JSON** or **plain\ \ text** format.\n\n \n\nYou can try every endpoint by **authenticating**\ \ with your **API key**, clicking **Try it out** on the endpoint you want to use,\ \ filling in the requested parameters, \n and executing the\ \ query.\n\n \n\nIt is **recommended** to use your **Sketch\ \ Engine API key** for **authentication** when calling the endpoints; otherwise\ \ the call may fail. 
\n The **key** can be retrieved\ \ from the Sketch Engine dashboard by following these steps: select **More options**\ \ (upper right corner), then click on **My Account**.\n\n \n\ \n**Last update:** `2nd April 2024`" version: 2.0.0 termsOfService: https://www.sketchengine.eu/terms-of-use/ contact: name: Support url: https://www.sketchengine.eu/contact-us/ externalDocs: description: Former API documentation url: https://www.sketchengine.eu/documentation/api-documentation/ servers: <!--#if expr="$HTTP_X_FORWARDED_PROTO" -->- url: <!--#echo var="HTTP_X_FORWARDED_PROTO" -->://<!--#echo var="HTTP_HOST" --> <!--#else -->- url: http://<!--#echo var="HTTP_HOST" --><!--#endif --> tags: - name: Corpus Search description: A variety of tools that search and analyse words or texts in the corpus and generate statistics about them. - name: Corpora description: Retrieves information about a corpus. Also creates, compiles and deletes a corpus. - name: Documents description: Uploads new documents and deletes documents from a corpus. Adds and edits document metadata. - name: Filesets description: Creates or deletes folders with documents (filesets) or shows their content. - name: Languages description: Retrieves the list of languages. - name: Somefiles description: Uploads or updates aligned multilingual files to build a parallel corpus. - name: Templates description: Corpus template management. - name: Users description: Retrieves information about user accounts. 
paths: /search/corp_info: get: operationId: getCorpInfo parameters: - $ref: '#/components/parameters/001_corpname' - $ref: '#/components/parameters/002_usesubcorp' - $ref: '#/components/parameters/006_subcorpora' - $ref: '#/components/parameters/003_gramrels' - $ref: '#/components/parameters/004_corpcheck' - $ref: '#/components/parameters/005_registry' - $ref: '#/components/parameters/007_struct_attr_stats' - $ref: '#/components/parameters/008_format' tags: - Corpus Search summary: Statistics and information about the whole corpus. description: "-" responses: '200': description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/01_corp_info' /search/wordlist: get: operationId: getWordList parameters: - $ref: '#/components/parameters/001_corpname' - $ref: '#/components/parameters/010_wlattr' - $ref: '#/components/parameters/002_usesubcorp' - $ref: '#/components/parameters/011_wlnums' - $ref: '#/components/parameters/072_wlmaxfreq' - $ref: '#/components/parameters/012_wlminfreq' - $ref: '#/components/parameters/014_wlpat' - $ref: '#/components/parameters/015_wlsort' - $ref: '#/components/parameters/019_wlblacklist' - $ref: '#/components/parameters/073_include_nonwords' - $ref: '#/components/parameters/091_relfreq' - $ref: '#/components/parameters/092_reldocf' - $ref: '#/components/parameters/018_wlfile' - $ref: '#/components/parameters/071_wlicase' - $ref: '#/components/parameters/013_wlmaxitems' - $ref: '#/components/parameters/093_wlpage' - $ref: '#/components/parameters/008_format' - $ref: '#/components/parameters/074_random' - $ref: '#/components/parameters/089_wltype' - $ref: '#/components/parameters/063_ngrams_n' - $ref: '#/components/parameters/087_ngrams_max_n' - $ref: '#/components/parameters/112_nest_ngrams' - $ref: '#/components/parameters/057_simple_n' - $ref: '#/components/parameters/088_usengrams' tags: - Corpus Search summary: A list of word frequencies from the specified corpus. 
description: This method can be used for generating frequency lists of all tokens, lemmas, word forms etc. or for retrieving frequencies of specific items. Regular expressions can be used for detailed criteria. responses: '200': description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/02_wordlist' /search/struct_wordlist: get: operationId: getStructWordList parameters: - $ref: '#/components/parameters/001_corpname' - $ref: '#/components/parameters/010_wlattr' - $ref: '#/components/parameters/090_wlstruct_attr1' - $ref: '#/components/parameters/102_wlstruct_attr2' - $ref: '#/components/parameters/103_wlstruct_attr3' - $ref: '#/components/parameters/011_wlnums' - $ref: '#/components/parameters/072_wlmaxfreq' - $ref: '#/components/parameters/012_wlminfreq' - $ref: '#/components/parameters/013_wlmaxitems' - $ref: '#/components/parameters/014_wlpat' - $ref: '#/components/parameters/015_wlsort' - $ref: '#/components/parameters/019_wlblacklist' - $ref: '#/components/parameters/073_include_nonwords' - $ref: '#/components/parameters/091_relfreq' - $ref: '#/components/parameters/092_reldocf' - $ref: '#/components/parameters/071_wlicase' - $ref: '#/components/parameters/093_wlpage' - $ref: '#/components/parameters/008_format' - $ref: '#/components/parameters/074_random' - $ref: '#/components/parameters/089_wltype' tags: - Corpus Search summary: Provides a list of frequencies in the specified corpus. Offers more flexibility. description: The difference from the wordlist is that this endpoint lets you customize how the results are displayed. 
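# Illustrative request for /search/struct_wordlist (a sketch, not part of the specification).
# Assumptions: the query parameter names mirror the $ref keys listed above
# (corpname, wlattr, wlstruct_attr1, wlmaxitems); all values are hypothetical placeholders.
#
#   GET /search/struct_wordlist?corpname=<corpus>&wlattr=<attribute>&wlstruct_attr1=<structure.attribute>&wlmaxitems=<n>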
responses: '200': description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/12_struct_wordlist' /search/concordance: get: operationId: getConcordance parameters: - $ref: '#/components/parameters/001_corpname' - $ref: '#/components/parameters/041_q' - $ref: '#/components/parameters/127_concordance_query_queryselector' - $ref: '#/components/parameters/128_concordance_query_iquery' - $ref: '#/components/parameters/129_concordance_query_cql' - $ref: '#/components/parameters/130_concordance_query_lemma' - $ref: '#/components/parameters/131_concordance_query_char' - $ref: '#/components/parameters/132_concordance_query_word' - $ref: '#/components/parameters/133_concordance_query_phrase' - $ref: '#/components/parameters/002_usesubcorp' - $ref: '#/components/parameters/025_lpos' - $ref: '#/components/parameters/077_default_attr' - $ref: '#/components/parameters/058_attrs' - $ref: '#/components/parameters/078_refs' - $ref: '#/components/parameters/079_attr_allpos' - $ref: '#/components/parameters/080_viewmode' - $ref: '#/components/parameters/081_cup_hl' - $ref: '#/components/parameters/082_structs' - $ref: '#/components/parameters/083_fromp' - $ref: '#/components/parameters/084_pagesize' - $ref: '#/components/parameters/085_kwicleftctx' - $ref: '#/components/parameters/086_kwicrightctx' - $ref: '#/components/parameters/134_errcorr_switch' - $ref: '#/components/parameters/135_cup_err_code' - $ref: '#/components/parameters/136_cup_err' - $ref: '#/components/parameters/137_cup_corr' - $ref: '#/components/parameters/040_json' - $ref: '#/components/parameters/039_asyn' - $ref: '#/components/parameters/008_format' tags: - Corpus Search summary: Concordance - shows the search word or phrase in context. 
description: "The concordance allows complex criteria for searching the corpus.\ \ The queries can combine any data, metadata and annotations found in the\ \ corpus.\n\n `For a basic concordance, it is enough to use just the corpname and\ \ q parameters.`" responses: '200': description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/06_concordance' /search/fullref: get: operationId: getFullRef parameters: - $ref: '#/components/parameters/001_corpname' - $ref: '#/components/parameters/111_pos' summary: Returns all metadata of one concordance line. description: Displays all available text types (metadata) related to the specific KWIC (hit) defined by its position in the corpus. tags: - Corpus Search responses: '200': description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/14_fullref' /search/widectx: get: operationId: getWideCtx parameters: - $ref: '#/components/parameters/001_corpname' - $ref: '#/components/parameters/111_pos' - $ref: '#/components/parameters/138_hitlen' - $ref: '#/components/parameters/082_structs' - $ref: '#/components/parameters/139_detail_left_ctx' - $ref: '#/components/parameters/140_detail_right_ctx' summary: Returns an extended context of the KWIC in a specific concordance line. description: This is the equivalent of clicking the KWIC in one concordance line, which displays a popup with an extended context. tags: - Corpus Search responses: '200': description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/16_widectx' /search/freqml: get: operationId: getFreqMl tags: - Corpus Search summary: "Calculates frequencies of words, lemmas\u2026 in the concordance." description: The frequency of any [positional attribute](https://www.sketchengine.eu/my_keywords/positional-attribute/) such as word forms, lemmas, tags can be counted with this method. Structure attributes (metadata/text types) can also be counted. 
parameters: - $ref: '#/components/parameters/001_corpname' - $ref: '#/components/parameters/100_ml1attr' - $ref: '#/components/parameters/101_ml1ctx' - $ref: '#/components/parameters/141_ml2attr' - $ref: '#/components/parameters/142_ml2ctx' - $ref: '#/components/parameters/143_ml3attr' - $ref: '#/components/parameters/144_ml3ctx' - $ref: '#/components/parameters/145_ml4attr' - $ref: '#/components/parameters/146_ml4ctx' - $ref: '#/components/parameters/147_ml5attr' - $ref: '#/components/parameters/148_ml5ctx' - $ref: '#/components/parameters/149_ml6attr' - $ref: '#/components/parameters/150_ml6ctx' - $ref: '#/components/parameters/041_q' - $ref: '#/components/parameters/002_usesubcorp' - $ref: '#/components/parameters/044_fmaxitems' - $ref: '#/components/parameters/094_fpage' - $ref: '#/components/parameters/095_group' - $ref: '#/components/parameters/096_showpoc' - $ref: '#/components/parameters/097_showreltt' - $ref: '#/components/parameters/098_showrel' - $ref: '#/components/parameters/099_freqlevel' - $ref: '#/components/parameters/040_json' - $ref: '#/components/parameters/045_freq_sort' - $ref: '#/components/parameters/008_format' responses: '200': description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/11_freqml' /search/freq_distrib: get: operationId: getFregDistrib parameters: - $ref: '#/components/parameters/001_corpname' - $ref: '#/components/parameters/118_res' - $ref: '#/components/parameters/025_lpos' - $ref: '#/components/parameters/077_default_attr' - $ref: '#/components/parameters/058_attrs' - $ref: '#/components/parameters/082_structs' - $ref: '#/components/parameters/078_refs' - $ref: '#/components/parameters/079_attr_allpos' - $ref: '#/components/parameters/080_viewmode' - $ref: '#/components/parameters/120_fc_lemword_window_type' - $ref: '#/components/parameters/121_fc_lemword_wsize' - $ref: '#/components/parameters/122_fc_lemword_type' - $ref: '#/components/parameters/125_fc_pos_window_type' - $ref: 
'#/components/parameters/123_fc_pos_wsize' - $ref: '#/components/parameters/124_fc_pos_type' - $ref: '#/components/parameters/040_json' - $ref: '#/components/parameters/119_normalize' - $ref: '#/components/parameters/008_format' summary: Provides the distribution of hits in the corpus description: "-" tags: - Corpus Search responses: '200': description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/13_freq_distrib' /search/freqdist: get: operationId: getFreqDist description: "-" parameters: - $ref: '#/components/parameters/166_corpname_freqdist' - $ref: '#/components/parameters/010_wlattr' - $ref: '#/components/parameters/162_diaattr' - $ref: '#/components/parameters/163_sse' - $ref: '#/components/parameters/164_threshold' - $ref: '#/components/parameters/161_ctx' - $ref: '#/components/parameters/167_wordlist' - $ref: '#/components/parameters/165_json_freqdist' tags: - Corpus Search summary: Utility for web interface only. Provides relative frequency data for wordlist graphs within a specific time period, only for trend corpora. responses: '200': description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/19_freqdist' /search/collx: get: operationId: getCollx description: "-" summary: Computes collocation candidates from a concordance. 
tags: - Corpus Search parameters: - $ref: '#/components/parameters/001_corpname' - $ref: '#/components/parameters/041_q' - $ref: '#/components/parameters/002_usesubcorp' - $ref: '#/components/parameters/046_cattr' - $ref: '#/components/parameters/053_csortfn' - $ref: '#/components/parameters/052_cbgrfns' - $ref: '#/components/parameters/047_cfromw' - $ref: '#/components/parameters/048_ctow' - $ref: '#/components/parameters/049_cminfreq' - $ref: '#/components/parameters/050_cminbgr' - $ref: '#/components/parameters/051_cmaxitems' - $ref: '#/components/parameters/154_json_collx' responses: '200': description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/10_collx' /search/subcorp: get: operationId: getSubCorp parameters: - $ref: '#/components/parameters/001_corpname' - $ref: '#/components/parameters/054_subcname' - $ref: '#/components/parameters/155_create' - $ref: '#/components/parameters/055_delete' - $ref: '#/components/parameters/156_q_subcorp' - $ref: '#/components/parameters/157_struct' - $ref: '#/components/parameters/160_json_subcorp' - $ref: '#/components/parameters/008_format' tags: - Corpus Search summary: Get a list of subcorpora in the corpus or create/delete a subcorpus. description: There are two ways to create a subcorpus in Sketch Engine, either from `text types` => json parameter (the corpus must be annotated for text types) or from `concordances` => q + struct parameters. responses: '200': description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/07_subcorp' /search/subcorpus_rename: get: operationId: subcorpusRename description: "-" parameters: - $ref: '#/components/parameters/001_corpname' - $ref: '#/components/parameters/158_subcorp_id' - $ref: '#/components/parameters/159_new_subcorp_name' tags: - Corpus Search summary: Rename subcorpus. 
responses: '200': description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/17_subcorpus_rename' /search/subcorp_info: get: operationId: subcorpusInfo description: "-" parameters: - $ref: '#/components/parameters/001_corpname' - $ref: '#/components/parameters/054_subcname' tags: - Corpus Search summary: Statistics about the subcorpus. responses: '200': description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/18_subcorp_info' /search/extract_keywords: get: operationId: getExtractKeywords parameters: - $ref: '#/components/parameters/001_corpname' - $ref: '#/components/parameters/056_ref_corpname' - $ref: '#/components/parameters/002_usesubcorp' - $ref: '#/components/parameters/057_simple_n' - $ref: '#/components/parameters/018_wlfile' - $ref: '#/components/parameters/019_wlblacklist' - $ref: '#/components/parameters/115_attr' - $ref: '#/components/parameters/060_alnum' - $ref: '#/components/parameters/061_onealpha' - $ref: '#/components/parameters/151_minfreq_extract_keywords' - $ref: '#/components/parameters/152_maxfreq_extract_keywords' - $ref: '#/components/parameters/062_max_keywords' - $ref: '#/components/parameters/073_include_nonwords' - $ref: '#/components/parameters/104_icase' - $ref: '#/components/parameters/014_wlpat' - $ref: '#/components/parameters/153_addfreqs' - $ref: '#/components/parameters/092_reldocf' - $ref: '#/components/parameters/088_usengrams' - $ref: '#/components/parameters/063_ngrams_n' - $ref: '#/components/parameters/087_ngrams_max_n' - $ref: '#/components/parameters/008_format' tags: - Corpus Search summary: Identifies keywords, key n-grams, key collocations and terms. description: Keywords, key n-grams, key collocations and terms are identified by comparing the focus corpus (or a subcorpus) to a reference corpus (or a subcorpus). It is the equivalent of using the Keywords and terms tool or using the key option in n-grams or the word sketch. 
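# Illustrative request for /search/extract_keywords (a sketch, not part of the specification).
# Assumptions: the query parameter names mirror the $ref keys listed above
# (corpname, ref_corpname, attr, max_keywords); all values are hypothetical placeholders.
#
#   GET /search/extract_keywords?corpname=<focus_corpus>&ref_corpname=<reference_corpus>&attr=<attribute>&max_keywords=<n>
#
# The focus corpus is compared against the reference corpus, as the description above states.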
responses: "200": description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/08_extract_keywords' /search/textypes_with_norms: get: operationId: getTextTypesWithNorms description: "-" parameters: - $ref: '#/components/parameters/001_corpname' summary: Returns a list of text types with values. tags: - Corpus Search responses: '200': description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/15_textypes_with_norms' /search/attr_vals: get: operationId: getAttrVals parameters: - $ref: '#/components/parameters/001_corpname' - $ref: '#/components/parameters/020_avattr' - $ref: '#/components/parameters/021_avpat' - $ref: '#/components/parameters/023_avfrom' - $ref: '#/components/parameters/022_avmaxitems' - $ref: '#/components/parameters/104_icase' - $ref: '#/components/parameters/008_format' tags: - Corpus Search summary: Utility for web interface only. A list of values for a given structure attribute (avattr). description: Not to be used outside the web interface. Replaced by a more powerful wordlist. responses: '200': description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/09_attr_vals' /ca/api/corpora: get: operationId: getCorpora description: "-" tags: - Corpora summary: Returns a list of all corpora accessible to you. responses: "200": description: '`OK`' content: application/json: schema: type: object properties: data: type: array items: $ref: '#/components/schemas/03_corpora_list' post: operationId: createCorpus tags: - Corpora summary: Creates a new user corpus. description: Creates a new empty corpus. Use **Documents** endpoints (and optionally **Filesets**) to add data to the corpus. requestBody: description: "Set the language, corpus name and corpus description. \n\n -\ \ `info` => The additional information for a newly created corpus. (string)\ \ \n\n - `language_id` => Language iso-code. `ISO 639-1`. (string) \n\n\ \ - `name` => Unique `corpus name` for a newly created corpus. 
(string)" content: application/json: schema: $ref: '#/components/schemas/01_corpora_request' required: true responses: "201": description: '`Created`' content: application/json: schema: type: object properties: data: $ref: '#/components/schemas/04_corpora_single' /ca/api/corpora/{corpusId}: parameters: - $ref: '#/components/parameters/01_corpus_id' get: operationId: getCorpus description: "-" tags: - Corpora summary: Retrieves a user corpus. responses: "200": description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/04_corpora_single' "403": description: '`Forbidden`' content: application/json: schema: $ref: '#/components/schemas/27_forbidden_normal' "404": description: '`Not Found`' content: application/json: schema: $ref: '#/components/schemas/57_not_found_404' put: operationId: updateCorpus description: "-" tags: - Corpora summary: Updates a user corpus. requestBody: description: " - `expert_mode` => Set to **True** if you are hard-core. (boolean)\ \ \n\n - `name` => Corpus name. **Given by user**. (string) \n\n - `info`\ \ => Additional info about corpus. (string) \n\n - `document_order` =>\ \ Can be set to enforce document order within the corpus. (list of integers)\ \ \n\n - `structures` => Available structures or tags in the corpus. Structures\ \ like **s** (sentence), **g** (glue), **doc** (document).(list) \n\n \ \ - `name` => Structure name. Example: **s**. (string) \n\n - `attributes`\ \ => A list of used attributes in corpus. (list) \n\n - `name` => The\ \ name of used attribute. (string) \n\n - `file_structure` => The structure\ \ in which individual documents should be wrapped. Usually **doc**. (string)\ \ \n\n - `onion_structure` => The structure for deduplication. Usually\ \ **p** (paragraph) or **Null** (no deduplication). (string) \n\n - `docstructure`\ \ => Structure in which individual documents should be wrapped. Usually\ \ **doc**. (string) \n\n - `sketch_grammar_id` => Name of sketch grammar\ \ file. 
Used for sketch grammar querying. A sketch grammar is a series of rules\ \ written in the CQL query language that search for collocations in a text\ \ corpus and categorize them according to\_their grammatical relations.\ \ Example: **preloaded/english-penn_tt-3.3.wsdef.m4**. (string) \n\n -\ \ `term_grammar_ir` => Name of term grammar file. Term grammar tells Sketch\ \ Engine which words and phrases should be identified as terms. Example: **/corpora/wsdef/english-penn_tt-terms-3.1.termdef.m4**.\ \ (string)" content: application/json: schema: $ref: '#/components/schemas/07_corpus_update' responses: "200": description: '`OK`' content: application/json: schema: type: object properties: data: $ref: '#/components/schemas/04_corpora_single' "403": description: '`Forbidden`' content: application/json: schema: $ref: '#/components/schemas/27_forbidden_normal' "404": description: '`Not Found`' content: application/json: schema: $ref: '#/components/schemas/57_not_found_404' delete: operationId: deleteCorpus description: "-" tags: - Corpora summary: Deletes a user corpus. responses: "204": description: '`No Content`' content: {} "403": description: '`Forbidden`' content: application/json: schema: $ref: '#/components/schemas/27_forbidden_normal' "404": description: '`Not Found`' content: application/json: schema: $ref: '#/components/schemas/57_not_found_404' /ca/api/corpora/{corpusId}/can_be_compiled: parameters: - $ref: '#/components/parameters/01_corpus_id' post: operationId: checkCompilable description: "-" tags: - Corpora summary: Checks if the corpus fulfills all conditions to be compiled. (RPC) requestBody: description: ' In this documentation, an empty request is mostly used with **RPC style** methods, where the content of a request is not needed (in most cases). RPC style endpoints focus on performing **one action** (a procedure or command), which is simpler than with **REST API**-based endpoints. It is not as scalable as REST API style. 
RPC is mostly used with the HTTP methods GET (to fetch information) and POST (for everything else); in the CA API it is used with the POST method. ' content: application/json: schema: $ref: '#/components/schemas/05_empty_request' responses: "200": description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/05_can_be_compiled' "401": description: '`Unauthorized`' content: application/json: schema: $ref: '#/components/schemas/20_unauthorized' "403": description: '`Forbidden`' content: application/json: schema: $ref: '#/components/schemas/21_forbidden' "404": description: '`Not Found`' content: application/json: schema: $ref: '#/components/schemas/45_not_found_RPC' /ca/api/corpora/{corpusId}/get_progress: parameters: - $ref: '#/components/parameters/01_corpus_id' post: operationId: getCompilationProgress description: "-" tags: - Corpora summary: Retrieves the current progress of the corpus compilation. (RPC) requestBody: description: ' In this documentation, an empty request is mostly used with **RPC style** methods, where the content of a request is not needed (in most cases). RPC style endpoints focus on performing **one action** (a procedure or command), which is simpler than with **REST API**-based endpoints. It is not as scalable as REST API style. RPC is mostly used with the HTTP methods GET (to fetch information) and POST (for everything else); in the CA API it is used with the POST method. 
' content: application/json: schema: $ref: '#/components/schemas/05_empty_request' required: true responses: "200": description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/06_get_progress' "403": description: '`Forbidden`' content: application/json: schema: $ref: '#/components/schemas/21_forbidden' "404": description: '`Not Found`' content: application/json: schema: $ref: '#/components/schemas/45_not_found_RPC' /ca/api/corpora/{corpusId}/compile: parameters: - $ref: '#/components/parameters/01_corpus_id' post: operationId: compileCorpus description: "-" tags: - Corpora summary: Performs the corpus compilation. (RPC) requestBody: description: ' `Structures` or `structure attributes` in corpus which should be compiled. Usually: `all`. (string) ' content: application/json: schema: $ref: '#/components/schemas/02_compile_request' required: true responses: "200": description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/69_rpc_style' "400": description: '`Bad Request`' content: application/json: schema: $ref: '#/components/schemas/29_bad_request_RPC_9' "401": description: '`Unauthorized`' content: application/json: schema: $ref: '#/components/schemas/23_unauthorized_rpc' "403": description: '`Forbidden`' content: application/json: schema: $ref: '#/components/schemas/21_forbidden' "404": description: '`Not Found`' content: application/json: schema: $ref: '#/components/schemas/45_not_found_RPC' /ca/api/corpora/{corpusId}/logs/{logName}: parameters: - $ref: '#/components/parameters/01_corpus_id' - $ref: '#/components/parameters/05_logname' get: operationId: getCompilationLog tags: - Corpora summary: Show the compilation log file '.log' of the corpus. description: If **logName** == **last.log** it shows the latest version of the log file. responses: "200": description: '`OK`' content: text/plain; charset=utf-8: schema: description: The log file of the corpus. 
type: string "401": description: '`Unauthorized`' "403": description: '`Forbidden` (you need `read` permission)' "404": description: '`Not Found`' "405": description: '`Method Not Allowed`' /ca/api/corpora/{corpusId}/download: parameters: - $ref: '#/components/parameters/01_corpus_id' - $ref: '#/components/parameters/09_format' - $ref: '#/components/parameters/10_file_structure' - $ref: '#/components/parameters/11_aligned' get: operationId: getCorpusSource tags: - Corpora summary: Downloads the documents from which the corpus was created (the source files). description: 'Example call: **https://app.sketchengine.eu/ca/api/corpora/{corpusId}/download?format=vert&file_structure=doc**.' responses: "200": description: '`OK`' content: text/plain: schema: type: string "400": description: '`Bad Request` Examples: `ALIGNED_NOT_FOUND`, `ALIGNED_FORBIDDEN`, `INVALID_FORMAT`.' "401": description: '`Unauthorized`' "403": description: '`Forbidden`' "404": description: '`Not Found`' /ca/api/corpora/{corpusId}/cancel_job: parameters: - $ref: '#/components/parameters/01_corpus_id' post: operationId: cancelJob description: "-" tags: - Corpora summary: Cancels running tasks (e.g. compilation) related to the corpus. (RPC) requestBody: description: ' In this documentation, an empty request is mostly used with **RPC style** methods, where the content of a request is not needed (in most cases). RPC style endpoints focus on performing **one action** (a procedure or command), which is simpler than with **REST API**-based endpoints. It is not as scalable as REST API style. RPC is mostly used with the HTTP methods GET (to fetch information) and POST (for everything else); in the CA API it is used with the POST method. 
' content: application/json: schema: $ref: '#/components/schemas/05_empty_request' responses: "200": description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/69_rpc_style' "401": description: '`Unauthorized`' content: application/json: schema: $ref: '#/components/schemas/23_unauthorized_rpc' "403": description: '`Forbidden`' content: application/json: schema: $ref: '#/components/schemas/21_forbidden' /ca/api/corpora/compile_aligned: post: operationId: compileAlignedCorpus description: "-" tags: - Corpora summary: Compiles a parallel corpus (consisting of two or more aligned corpora). (RPC) requestBody: description: "List of corpus IDs used in aligned compilation. \n\n - `corpus_ids`\ \ => A list of **corpus IDs** of the multilingual corpora. (integer) \n\n -\ \ `structures` => Specifies whether **all** structures should be used during\ \ compilation (in that case it should contain just **all**) or only some\ \ of them. (string) " content: application/json: schema: $ref: '#/components/schemas/03_corpus_ids' responses: "200": description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/69_rpc_style' "400": description: '`Bad Request`' content: application/json: schema: $ref: '#/components/schemas/28_bad_request_RPC_8' "403": description: '`Forbidden`' content: application/json: schema: $ref: '#/components/schemas/21_forbidden' /ca/api/corpora/align: post: operationId: segmentAlign tags: - Corpora summary: Creates segments representing the same line in two languages in a parallel corpus. (RPC) description: Run if documents inserted into the corpus are not aligned. requestBody: description: "- `alignstruct` => According to which structure the document\ \ should be aligned. Usually, **\\<s>**. (string) \n\n - `auto` => **True**,\ \ when documents are not compiled. Sketch Engine will align them automatically.\ \ (boolean) \n\n - `corpus_ids` => A list of **corpus IDs** of the multilingual\ \ corpora. 
(integer) " content: application/json: schema: $ref: '#/components/schemas/04_align_req' responses: "200": description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/69_rpc_style' "400": description: '`Bad Request`' content: application/json: schema: $ref: '#/components/schemas/17_bad_request_RPC_1' "401": description: '`Unauthorized`' content: application/json: schema: $ref: '#/components/schemas/23_unauthorized_rpc' "403": description: '`Forbidden`' content: application/json: schema: $ref: '#/components/schemas/21_forbidden' /ca/api/corpora/{corpusId}/documents: parameters: - $ref: '#/components/parameters/01_corpus_id' - $ref: '#/components/parameters/07_fileset_id_query' get: operationId: getAllDocuments description: "-" tags: - Documents summary: Retrieves a list of all documents in the corpus. responses: "200": description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/13_documents_get' "401": description: '`Unauthorized`' content: application/json: schema: $ref: '#/components/schemas/20_unauthorized' "403": description: '`Forbidden` (you need `view` permission).' content: application/json: schema: $ref: '#/components/schemas/27_forbidden_normal' "404": description: '`Not Found` (HTML response).' post: operationId: createNewDocument description: "-" tags: - Documents summary: Uploads a new document. parameters: - $ref: '#/components/parameters/01_corpus_id' - $ref: '#/components/parameters/07_fileset_id_query' - $ref: '#/components/parameters/18_wait_with_tagging' requestBody: description: File to upload. content: multipart/form-data: schema: properties: file: type: string description: File to upload. 
format: binary responses: "201": description: '`Created`' content: application/json: schema: $ref: '#/components/schemas/14_documents_post' "400": description: '`Bad Request`' content: application/json: schema: $ref: '#/components/schemas/30_bad_request_10' "401": description: '`Unauthorized`' content: application/json: schema: $ref: '#/components/schemas/20_unauthorized' "403": description: '`Forbidden` (you need `upload` permission).' content: application/json: schema: $ref: '#/components/schemas/27_forbidden_normal' "404": description: '`Not Found` (HTML response).' put: operationId: updateDocumentMetadata description: "-" tags: - Documents summary: Edits the metadata of a document. requestBody: description: " - `id` => Unique numeric `document ID`. (integer) \n\n -\ \ `metadata` => Pairs of `attribute_name`:`value`." content: application/json: schema: $ref: '#/components/schemas/11_doc_metadata' responses: "200": description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/13_documents_get' "400": description: '`Bad Request`' content: application/json: schema: $ref: '#/components/schemas/31_bad_request_11' "401": description: '`Unauthorized`' content: application/json: schema: $ref: '#/components/schemas/20_unauthorized' "403": description: '`Forbidden` (you need `edit` permission).' content: application/json: schema: $ref: '#/components/schemas/27_forbidden_normal' "404": description: '`Not Found` (HTML response).' /ca/api/corpora/{corpusId}/documents/{documentId}: parameters: - $ref: '#/components/parameters/01_corpus_id' - $ref: '#/components/parameters/02_document_id' get: operationId: getDocument description: "-" tags: - Documents summary: Retrieves a specific document. 
responses: "200": description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/13_documents_get' "401": description: '`Unauthorized`' content: application/json: schema: $ref: '#/components/schemas/20_unauthorized' "403": description: '`Forbidden` (you need `view` permission).' content: application/json: schema: $ref: '#/components/schemas/27_forbidden_normal' "404": description: '`Not Found`' content: application/json: schema: $ref: '#/components/schemas/57_not_found_404' put: operationId: updateDocument description: "-" tags: - Documents summary: Updates a document in the corpus. requestBody: description: " - `filename_display` => Name of the document. (string) \n\n -\ \ `id` => Unique numeric **document ID** to identify individual documents.\ \ (integer) \n\n - `inProgress` => Indicates whether the currently edited\ \ document is in use. (boolean) \n\n - `isArchive` => Indicates whether the\ \ updated document is an archive such as a .zip file (created with an archive manager).\ \ (boolean) \n\n - `metadata` => Metadata of the document, for example additional\ \ attributes and values. \n\n - `parameters` => Parameters for plaintext\ \ extraction. \n\n - `encoding` => Encoding standard of the document.\ \ Usually **UTF-8**. (string) \n\n - `justext_stoplist` => The list of words,\ \ in a specified language, that are unimportant from an NLP point\ \ of view. (string) \n\n - `permutation` => Changes the order of columns\ \ (applies only to **type=vert**). (integer) \n\n - `tmx_lang` => Language of the TMX\ \ (Translation Memory eXchange) document used for parallel\ \ corpus creation. (string) \n\n - `tmx_struct` => Alignment structure\ \ to be used for multilingual documents; **align** is the most commonly used structure.\ \ It is used to distinguish segments, i.e. to tell which sentence is in which language and\ \ to put sentences with the same meaning into one segment. (string) \n\n\ \ - `tmx_untranslated` => Placeholder for empty segments in multilingual\ \ documents.
These are segments that have no counterpart in the second language\ \ of the parallel corpus. (string) \n\n - `type` => File format (.csv, .doc,\ \ .docx, .htm, .html). (string) \n\n - `unlegalese` => Converts **all-caps**\ \ text to **normal case**. (boolean) \n\n - `temporary` => Whether the document\ \ is temporary. (boolean) \n\n - `word_count` => Total number of **words**\ \ (tokens minus punctuation etc.) in the document. (integer) \n\n - `vertical_progress`\ \ => Progress of **vertical file** creation. (integer) \n\n - `vertical_error`\ \ => The error that occurred while creating the vertical file. If the creation\ \ was successful, the value is **Null**. (string)" content: application/json: schema: $ref: '#/components/schemas/09_doc_put_req' responses: "200": description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/13_documents_get' "400": description: '`Bad Request`' content: application/json: schema: $ref: '#/components/schemas/31_bad_request_11' "401": description: '`Unauthorized`' content: application/json: schema: $ref: '#/components/schemas/20_unauthorized' "403": description: '`Forbidden` (you need `edit` permission).' content: application/json: schema: $ref: '#/components/schemas/27_forbidden_normal' "404": description: '`Not Found`' content: application/json: schema: $ref: '#/components/schemas/57_not_found_404' delete: operationId: deleteDocuments tags: - Documents summary: Deletes one or more documents from the corpus. description: 'To delete multiple documents, separate the document IDs with commas.
Example call: `https://app.sketchengine.eu/ca/api/corpora/{corpusId}/documents/{documentId_1},{documentId_2}`' responses: "204": description: '`No Content`' content: {} "400": description: '`Bad Request`' content: application/json: schema: $ref: '#/components/schemas/31_bad_request_11' "401": description: '`Unauthorized`' content: application/json: schema: $ref: '#/components/schemas/20_unauthorized' "403": description: '`Forbidden` (you need `delete` permission).' content: application/json: schema: $ref: '#/components/schemas/27_forbidden_normal' "404": description: '`Not Found`' content: application/json: schema: $ref: '#/components/schemas/57_not_found_404' /ca/api/corpora/{corpusId}/documents/{documentId}/preview: parameters: - $ref: '#/components/parameters/01_corpus_id' - $ref: '#/components/parameters/02_document_id' post: operationId: updateDocumentParameters tags: - Documents summary: Updates document parameters. (RPC) description: 'Updates parameters such as `File Type`, `Encoding`, etc.' requestBody: description: " - `auto_paragraphs` => Automatically inserts paragraph breaks\ \ (**\\<p>**) in place of blank lines. (string) \n\n - `encoding` => Encoding\ \ standard of the document. Usually **UTF-8**. (string) \n\n - `justext_stoplist`\ \ => The list of words, in a specified language, that are unimportant from\ \ an NLP point of view. (string) \n\n - `permutation` => Changes the order\ \ of columns (applies only to **type=vert**). \n\n - `tmx_lang` => Language of the TMX\ \ (Translation Memory eXchange) document used for parallel\ \ corpus creation. (string) \n\n - `tmx_struct` => Alignment structure\ \ to be used for multilingual documents; **align** is the most commonly used structure.\ \ It is used to distinguish segments, i.e. to tell which sentence is in which language and\ \ to put sentences with the same meaning into one segment. (string) \n\n\ \ - `tmx_untranslated` => Placeholder for empty segments in multilingual\ \ documents.
These are segments that have no counterpart in the second language\ \ of the parallel corpus. (string) \n\n - `type` => File format (.csv, .doc,\ \ .docx, .htm, .html). (string) \n\n - `unlegalese` => Converts **all-caps**\ \ text to **normal case**. (boolean) " content: application/json: schema: $ref: '#/components/schemas/10_doc_preview' responses: "200": description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/15_doc_preview' "403": description: '`Forbidden`' content: application/json: schema: $ref: '#/components/schemas/21_forbidden' /ca/api/corpora/{corpusId}/documents/{documentId}/original: parameters: - $ref: '#/components/parameters/01_corpus_id' - $ref: '#/components/parameters/02_document_id' get: operationId: getDocumentOriginal description: "-" tags: - Documents summary: Downloads a corpus file in its original format (the format that was uploaded). This method cannot be simulated in this online documentation. responses: "200": description: '`OK`' content: application/octet-stream: schema: type: string format: binary description: The document was downloaded successfully in its original format. "401": description: '`Unauthorized`' "403": description: '`Forbidden` (you need `view` permission).' "404": description: '`Not Found` (HTML response).' "405": description: '`Method Not Allowed`' /ca/api/corpora/{corpusId}/documents/{documentId}/plaintext: parameters: - $ref: '#/components/parameters/01_corpus_id' - $ref: '#/components/parameters/02_document_id' get: operationId: getDocumentPlaintext description: "-" tags: - Documents summary: Retrieves 1 KB of data in plaintext format. More than 1 KB can be loaded. responses: "206": description: '`Partial Content`' content: text/plain; charset=utf-8: schema: description: Document in plaintext format. type: string "401": description: '`Unauthorized`' "403": description: '`Forbidden` (you need `view` permission).' "404": description: '`Not Found` (HTML response).'
"405": description: '`Method Not Allowed`' "416": description: '`Range Not Satisfiable`' /ca/api/corpora/{corpusId}/documents/{documentId}/vertical: parameters: - $ref: '#/components/parameters/01_corpus_id' - $ref: '#/components/parameters/02_document_id' get: operationId: getDocumentVertical description: "-" tags: - Documents summary: Retrieves 1 KB of data in vertical format. More than 1 KB can be loaded. responses: "206": description: '`Partial Content`' content: text/plain; charset=utf-8: schema: description: Document in vertical format. type: string "401": description: '`Unauthorized`' "403": description: '`Forbidden`' "404": description: '`Not Found` (HTML response).' "405": description: '`Method Not Allowed`' "416": description: '`Range Not Satisfiable`' /ca/api/corpora/{corpusId}/documents/{documentId}/expand_archive: parameters: - $ref: '#/components/parameters/01_corpus_id' - $ref: '#/components/parameters/02_document_id' post: operationId: expandArchive description: "-" tags: - Documents summary: Expands a ZIP file (if the corpus files were uploaded as a ZIP archive). Expanding is not necessary for the corpus to work. (RPC) requestBody: description: ' In this documentation, an empty request is mostly used with **RPC style** methods, where the request usually needs no content. RPC style endpoints focus on performing **one action** (a procedure or command) and are simpler than **REST API**-based endpoints. They are not as scalable as the REST API style. RPC is mostly used with the HTTP methods GET (to fetch information) and POST (for everything else); in the CA API, it is used with the POST method.
' content: application/json: schema: $ref: '#/components/schemas/05_empty_request' responses: "200": description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/16_rpc_expand_archive' "400": description: '`Bad Request`' content: application/json: schema: $ref: '#/components/schemas/33_bad_request_13' "403": description: '`Forbidden` (you need `edit` permission).' content: application/json: schema: $ref: '#/components/schemas/21_forbidden' "404": description: '`Not Found` (HTML response).' /ca/api/corpora/{corpusId}/documents/{documentId}/cancel_job: parameters: - $ref: '#/components/parameters/01_corpus_id' - $ref: '#/components/parameters/02_document_id' post: operationId: cancelDocumentJob tags: - Documents summary: Cancels the running task that is directly related to the document. (RPC) description: An example task that can be canceled is `uploading file`. requestBody: description: ' In this documentation, an empty request is mostly used with **RPC style** methods, where the request usually needs no content. RPC style endpoints focus on performing **one action** (a procedure or command) and are simpler than **REST API**-based endpoints. They are not as scalable as the REST API style. RPC is mostly used with the HTTP methods GET (to fetch information) and POST (for everything else); in the CA API, it is used with the POST method. ' content: application/json: schema: $ref: '#/components/schemas/05_empty_request' responses: "200": description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/69_rpc_style' "403": description: '`Forbidden` (you need `edit` permission).'
content: application/json: schema: $ref: '#/components/schemas/21_forbidden' "404": description: '`Not Found`' content: application/json: schema: $ref: '#/components/schemas/45_not_found_RPC' /ca/api/corpora/{corpusId}/documents/{documentId}/get_progress: parameters: - $ref: '#/components/parameters/01_corpus_id' - $ref: '#/components/parameters/02_document_id' post: operationId: getProgress tags: - Documents summary: Shows the current progress of the running task related to documents. (RPC) description: Used for tasks such as `uploading` new files that will be used to create or expand corpora. requestBody: description: ' In this documentation, an empty request is mostly used with **RPC style** methods, where the request usually needs no content. RPC style endpoints focus on performing **one action** (a procedure or command) and are simpler than **REST API**-based endpoints. They are not as scalable as the REST API style. RPC is mostly used with the HTTP methods GET (to fetch information) and POST (for everything else); in the CA API, it is used with the POST method. ' content: application/json: schema: $ref: '#/components/schemas/05_empty_request' responses: "200": description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/06_get_progress' "403": description: '`Forbidden` (you need `view` permission for the specified corpus).' content: application/json: schema: $ref: '#/components/schemas/21_forbidden' "404": description: '`Not Found`' content: application/json: schema: $ref: '#/components/schemas/45_not_found_RPC' /ca/api/corpora/{corpusId}/filesets: parameters: - $ref: '#/components/parameters/01_corpus_id' - $ref: '#/components/parameters/19_compile_when_finished' get: operationId: getFileSets description: "-" tags: - Filesets summary: Lists the "subdirectories", i.e. unzipped archives or WebBootCaT runs for a given corpus.
responses: "200": description: '`OK`' content: application/json: schema: type: object properties: data: type: array items: $ref: '#/components/schemas/58_fileset' "401": description: '`Unauthorized`' content: application/json: schema: $ref: '#/components/schemas/20_unauthorized' "403": description: '`Forbidden` (you need permission to `upload` to the specified corpus).' content: application/json: schema: $ref: '#/components/schemas/27_forbidden_normal' "404": description: '`Not Found`' content: application/json: schema: $ref: '#/components/schemas/57_not_found_404' post: operationId: createFileSet tags: - Filesets summary: Creates a new fileset using the web crawler. description: Used when creating or expanding a corpus. requestBody: description: "Parameters for improving web crawler accuracy. \n\n -\ \ `bl_max_total_kw` => **Blacklist max total keyword**. A web page\ \ or document will be discarded if it contains more words from the denylist\ \ (blacklist) than this limit. (integer) \n\n - `bl_max_unique_kw` => **Blacklist\ \ max unique keyword**. A web page or document will be discarded\ \ if it contains more unique words from the denylist (blacklist) than this\ \ limit. (integer) \n\n - `black_list` => A whitespace-separated list\ \ of **blocked words**, i.e. words you do not want to see in your future corpus.\ \ (string) \n\n - `input_type` => Input type the web crawler will work\ \ with. Example: **urls**. (string) \n\n - `max_cleaned_file_size` => Web\ \ pages and documents with a size **over** this limit (**in kB**) after cleaning will be\ \ ignored. (integer) \n\n - `max_file_size` => Web pages and documents\ \ with a size **over** this limit (**in kB**) will be ignored. (integer)\ \ \n\n - `min_cleaned_file_size` => Web pages and documents **smaller**\ \ than this limit (**in kB**) after cleaning will be ignored. Cleaning involves\ \ conversion to plain text, removing boilerplate text (e.g.
navigation menus,\ \ legal text, disclaimers and other repetitive content). (integer) \n\n\ \ - `min_file_size` => Web pages and documents with a **size below** this\ \ limit (**in kB**) will be ignored. (integer) \n\n - `name` => Texts will\ \ be organized into a corpus folder with this name. (string) \n\n - `seed_word`\ \ => A list of words used to choose the URLs to be searched.\ \ (string) \n\n - `white_list` => A whitespace-separated list of\ \ allowed words, i.e. words you want to see in your future corpus. (list of strings)\ \ \n\n - `wl_min_kw_ratio` => **Whitelist minimal keywords ratio**. A\ \ web page or document will be included only if the percentage of allowlist\ \ words compared to total words is higher than this limit. (integer) \n\n\ \ - `wl_min_total_kw` => **Whitelist minimal total keywords**. A\ \ web page or document will be included only if it contains more words from\ \ the allowlist (whitelist) than this limit. (integer) \n\n - `wl_min_unique_kw`\ \ => **Whitelist minimal unique keywords**. A web page or document\ \ will be included only if it contains more unique words from the allowlist (whitelist)\ \ than this limit. (integer)" content: application/json: schema: $ref: '#/components/schemas/13_filesets_creation' required: true responses: "201": description: '`Created`' content: application/json: schema: type: object properties: data: $ref: '#/components/schemas/61_fileset_creation' "400": description: '`Bad Request`' content: application/json: schema: $ref: '#/components/schemas/34_bad_request_14' "401": description: '`Unauthorized`' content: application/json: schema: $ref: '#/components/schemas/20_unauthorized' "403": description: '`Forbidden` (you need permission to `upload` to the specified corpus).'
content: application/json: schema: $ref: '#/components/schemas/27_forbidden_normal' "404": description: '`Not Found`' content: application/json: schema: $ref: '#/components/schemas/57_not_found_404' /ca/api/corpora/{corpusId}/filesets/{filesetId}: parameters: - $ref: '#/components/parameters/01_corpus_id' - $ref: '#/components/parameters/06_fileset_id' get: operationId: getFileSet description: "-" tags: - Filesets summary: Returns information about a specific "subdirectory". responses: "200": description: '`OK`' content: application/json: schema: type: object properties: data: $ref: '#/components/schemas/58_fileset' "401": description: '`Unauthorized`' content: application/json: schema: $ref: '#/components/schemas/20_unauthorized' "403": description: '`Forbidden` (you need permission to `view`).' content: application/json: schema: $ref: '#/components/schemas/27_forbidden_normal' "404": description: '`Not Found`' content: application/json: schema: $ref: '#/components/schemas/57_not_found_404' delete: operationId: deleteFileSet description: "-" tags: - Filesets summary: Deletes subdirectory containing document (for creating corpus). responses: "204": description: '`No Content`' content: {} "400": description: '`Bad Request`' content: application/json: schema: $ref: '#/components/schemas/32_bad_request_12' "401": description: '`Unauthorized`' content: application/json: schema: $ref: '#/components/schemas/20_unauthorized' "403": description: '`Forbidden` (you need permission to `delete`).' content: application/json: schema: $ref: '#/components/schemas/27_forbidden_normal' "404": description: '`Not Found`' content: application/json: schema: $ref: '#/components/schemas/57_not_found_404' /ca/api/corpora/{corpusId}/filesets/{filesetId}/cancel_job: parameters: - $ref: '#/components/parameters/01_corpus_id' - $ref: '#/components/parameters/06_fileset_id' post: operationId: cancelFileSetJob tags: - Filesets summary: Cancel running task. 
(RPC) description: 'Example: cancel the downloading of data from websites by the web crawler.' requestBody: description: ' In this documentation, an empty request is mostly used with **RPC style** methods, where the request usually needs no content. RPC style endpoints focus on performing **one action** (a procedure or command) and are simpler than **REST API**-based endpoints. They are not as scalable as the REST API style. RPC is mostly used with the HTTP methods GET (to fetch information) and POST (for everything else); in the CA API, it is used with the POST method. ' content: application/json: schema: $ref: '#/components/schemas/05_empty_request' responses: "200": description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/69_rpc_style' "401": description: '`Unauthorized`' content: application/json: schema: $ref: '#/components/schemas/23_unauthorized_rpc' "403": description: '`Forbidden` (you need `upload` permission).' content: application/json: schema: $ref: '#/components/schemas/21_forbidden' "404": description: '`Not Found`' content: application/json: schema: $ref: '#/components/schemas/45_not_found_RPC' /ca/api/corpora/{corpusId}/filesets/{filesetId}/get_progress: parameters: - $ref: '#/components/parameters/01_corpus_id' - $ref: '#/components/parameters/06_fileset_id' post: operationId: getFileSetProgress tags: - Filesets summary: Shows the current progress of a running task related to filesets. (RPC) description: For example, a task such as `downloading content from web` when creating a corpus with the web crawler. requestBody: description: ' In this documentation, an empty request is mostly used with **RPC style** methods, where the request usually needs no content. RPC style endpoints focus on performing **one action** (a procedure or command) and are simpler than **REST API**-based endpoints. They are not as scalable as the REST API style.
RPC is mostly used with the HTTP methods GET (to fetch information) and POST (for everything else); in the CA API, it is used with the POST method. ' content: application/json: schema: $ref: '#/components/schemas/05_empty_request' responses: "200": description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/59_filesets_get_progress' "403": description: '`Forbidden` (you need `view` permission).' content: application/json: schema: $ref: '#/components/schemas/21_forbidden' "404": description: '`Not Found`' content: application/json: schema: $ref: '#/components/schemas/45_not_found_RPC' /ca/api/languages: get: operationId: getLanguages description: "-" tags: - Languages summary: Retrieves a list of all languages. responses: "200": description: '`OK`' content: application/json: schema: type: object properties: data: type: array items: $ref: '#/components/schemas/63_language' /ca/api/somefiles: post: operationId: uploadAligendDocuments description: "-" tags: - Somefiles summary: Uploads aligned documents for creating a parallel corpus. requestBody: description: Aligned multilingual file (usually a '.tmx' file). content: multipart/form-data; boundary={boundary}: schema: description: Aligned multilingual file (usually a '.tmx' file). type: string responses: "201": description: '`Created`' content: application/json: schema: $ref: '#/components/schemas/78_somefiles_post' "400": description: '`Bad Request`' content: application/json: schema: $ref: '#/components/schemas/40_bad_request_20' /ca/api/somefiles/{somefileId}: parameters: - $ref: '#/components/parameters/08_somefile_id' get: operationId: getAlignedDocuments description: "-" tags: - Somefiles summary: Retrieves the metadata of a specific multilingual file.
responses: "200": description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/78_somefiles_post' "404": description: '`Not Found`' content: application/json: schema: $ref: '#/components/schemas/57_not_found_404' put: operationId: updateAlignedDocs description: "-" tags: - Somefiles summary: Updates multilingual file metadata. requestBody: description: " - `corpora` \n\n - `guessed_language_code` \n\n - `language_id`\ \ => Language iso-code. **ISO 639-1**. (string) \n\n - `name` => Language\ \ name in **English**. (string) " content: application/json: schema: $ref: '#/components/schemas/19_somefiles_put' responses: "200": description: '`OK`' content: application/json: schema: type: object properties: data: $ref: '#/components/schemas/12_corpora_single_full' "400": description: '`Bad Request`' content: application/json: schema: $ref: '#/components/schemas/33_bad_request_13' "404": description: '`Not Found`' content: application/json: schema: $ref: '#/components/schemas/57_not_found_404' /ca/api/tagsets/{templateId}: parameters: - $ref: '#/components/parameters/03_template_id' get: operationId: getUserTemplate description: "-" tags: - Templates summary: Retrieves details of specified user template / tagset. responses: "200": description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/79_template' "401": description: '`Unauthorized`' content: application/json: schema: $ref: '#/components/schemas/20_unauthorized' "403": description: '`Forbidden` (you need permission to `read`).' content: application/json: schema: $ref: '#/components/schemas/27_forbidden_normal' "404": description: '`Not Found`' content: application/json: schema: $ref: '#/components/schemas/57_not_found_404' put: operationId: updateUserTemplate description: "-" tags: - Templates summary: Updates specified user template/tagset. requestBody: description: " - `id` => Alphanumeric **template/tagset ID**. 
The terms **tagset**\ \ and **template** are interchangeable. (string) \n\n - `name` => Name\ \ of the **template/tagset** file. (string) \n\n - `owner_id` => Unique numeric\ \ owner ID (usually you). Null if the tagset/template is preloaded. (integer)\ \ \n\n - `owner_name` => Tagset/template owner name (usually you). Null if the tagset/template\ \ is preloaded. (string) \n\n - `has_pipeline` => Vertical creation\ \ is supported. False for legacy templates. (boolean) \n\n - `has_tags`\ \ => Morphological tagging is supported. (boolean) \n\n - `has_lemmas`\ \ => Lemmatization is supported. (boolean) \n\n - `static_attributes` =>\ \ A list of attributes which can appear in the corpus. \n\n - `structures`\ \ => A list of used structures. Examples: \\<s>, \\<g>. \n\n - `tagsetdoc`\ \ => URL leading to the template/tagset documentation. (string) \n\n - `content`\ \ => Content of the tagset. (string) \n\n - `default_sketchgrammar_id` => Despite\ \ the name, this is not an ID but the filename of the preselected sketch grammar\ \ (.m4 format). (string) \n\n - `default_termgrammar_id` => Despite the name,\ \ this is not an ID but the filename of the preselected term grammar (.m4 format).\ \ (string) \n\n - `sharing` => List. \n\n - `users` => The ID of the user\ \ you share the template with. \n\n - `id` => The ID of the group you share the template\ \ with." content: application/x-www-form-urlencoded; charset=UTF-8: schema: $ref: '#/components/schemas/80_template_put' responses: "200": description: '`OK`' content: application/json: schema: type: object properties: data: $ref: '#/components/schemas/79_template' "400": description: '`Bad Request`' content: application/json: schema: $ref: '#/components/schemas/33_bad_request_13' "401": description: '`Unauthorized`' content: application/json: schema: $ref: '#/components/schemas/20_unauthorized' "403": description: '`Forbidden` (you need to be `owner` of the template).'
content: application/json: schema: $ref: '#/components/schemas/27_forbidden_normal' "404": description: '`Not Found`' content: application/json: schema: $ref: '#/components/schemas/57_not_found_404' delete: operationId: deleteUserTemplate description: "-" tags: - Templates summary: Deletes user template. responses: "204": description: '`No Content`' content: {} "400": description: '`Bad Request`' content: application/json: schema: $ref: '#/components/schemas/33_bad_request_13' "401": description: '`Unauthorized`' content: application/json: schema: $ref: '#/components/schemas/20_unauthorized' "403": description: '`Forbidden` (you need to be `owner` of the template).' content: application/json: schema: $ref: '#/components/schemas/27_forbidden_normal' "404": description: '`Not Found`' content: application/json: schema: $ref: '#/components/schemas/57_not_found_404' /ca/api/users/me/get_used_space: post: operationId: "getUsedSpace" description: "-" tags: - Users summary: Retrieves the user's current word space state (default 1 000 000 words). (RPC) requestBody: description: ' In this documentation, an empty request is mostly used with **RPC style** methods, where the request usually needs no content. RPC style endpoints focus on performing **one action** (a procedure or command) and are simpler than **REST API**-based endpoints. They are not as scalable as the REST API style. RPC is mostly used with the HTTP methods GET (to fetch information) and POST (for everything else); in the CA API, it is used with the POST method. ' content: application/json: schema: $ref: '#/components/schemas/05_empty_request' responses: "200": description: '`OK`' content: application/json: schema: $ref: '#/components/schemas/81_get_used_space' "401": description: '`Unauthorized`' content: application/json: schema: $ref: '#/components/schemas/23_unauthorized_rpc' components: parameters: 001_corpname: in: query name: corpname schema: type: string required: true description: Corpus name.
To query your own corpus (e.g. username john, corpus mycorpus), use the value `user/john/mycorpus`. example: preloaded/bnc2_tt21 002_usesubcorp: in: query name: usesubcorp schema: type: string description: The name of the `subcorpus`. The default value, an empty string, refers to the entire corpus. Examples for `preloaded/bnc2_tt21` are `Written Academic` and `1960-1974`. 003_gramrels: in: query name: gramrels schema: type: integer enum: - 0 - 1 description: A list of grammatical relations from the corresponding `sketch grammar`. 004_corpcheck: in: query name: corpcheck schema: type: integer enum: - 0 - 1 description: Results of the last corpcheck (if available in the compilation log). 005_registry: in: query name: registry schema: type: integer enum: - 0 - 1 description: The content of the registry file (registry_dump and registry_text). 006_subcorpora: in: query name: subcorpora schema: type: integer enum: - 0 - 1 description: A parameter to obtain the list of subcorpora and their sizes (e.g. number of tokens, words). 007_struct_attr_stats: in: query name: struct_attr_stats schema: type: integer enum: - 0 - 1 description: The lexicon sizes of structure attributes. 008_format: in: query name: format schema: type: string enum: - json - xml - csv - tsv - txt - xls description: The `format` of the output. An empty value is interpreted as `JSON`. Not every endpoint supports all formats. 010_wlattr: in: query name: wlattr schema: type: string example: lc required: true description: "Sets the corpus attribute you want to work with. Some corpora\ \ may have specific additional attributes.\n\n Basic examples:\n - word\n\ \ - lc\n - lemma\n - lemma_lc\n - lempos\n - lempos_lc\n - tag\n - pos\n\n\ \ For the list of available word list attributes, see **/ca/api/corpora/{corpusId}**" 011_wlnums: in: query name: wlnums schema: type: string default: frq enum: - frq - docf - arf description: 'The type of frequency.
The values stand for: `frq` -> absolute or raw frequency, `docf` -> document frequency, `arf` -> average reduced frequency.' 012_wlminfreq: in: query name: wlminfreq schema: type: integer description: Sets the minimum frequency limit. Items with a lower frequency will not be included. 013_wlmaxitems: in: query name: wlmaxitems schema: type: integer description: Sets the number of items to be returned in the API response. It is not limited for user corpora; preloaded corpora may have some limitations. This parameter is often used with wlpage to help with pagination in frontend development. example: 100 014_wlpat: in: query name: wlpat schema: type: string description: Sets a regex to filter the results. Relevant only in a simple wordlist. example: .* 015_wlsort: in: query name: wlsort schema: type: string enum: - frq - docf description: Sets the sorting of the results. The default is `frq`, i.e. by absolute frequency. Docf means document frequency. 018_wlfile: in: query name: wlfile schema: type: string description: Defines the allow list (formerly known as whitelist), the list of words which should be included in the list. See also `wlblacklist`. 019_wlblacklist: in: query name: wlblacklist schema: type: string description: A deny list (formerly known as blacklist) is a list of items that should be excluded from the result. The values should be separated by a newline symbol (without commas between values). In the URL, the newline symbol is `%0A`. 020_avattr: in: query name: avattr schema: type: string required: true example: bncdoc.alltyp description: Selects a structure attribute (referred to as text type in the web interface). Corpora differ in the number of structure attributes and in their values. You can find them in the response of the `corpus_info` method in `freqttattrs` or `subcorpattrs` keys. Not all of them have attributes to show. 021_avpat: in: query name: avpat schema: type: string description: A regex to filter the results.
An empty string defaults to `.*` (match everything). 022_avmaxitems: in: query name: avmaxitems schema: type: integer description: The number of items to return. 023_avfrom: in: query name: avfrom schema: type: integer description: The starting index from which to return the results. 025_lpos: in: query name: lpos schema: type: string enum: - -n - -v - -j - -a - -d - -i description: The part of speech of the lemma. The specific values depend on the corpus. If the corpus contains the `lempos` attribute and `lpos` is empty, the result defaults to the most frequent part of speech of the lemma. 039_asyn: in: query name: asyn schema: type: integer enum: - 0 - 1 description: Switches the asynchronous processing on/off. ON = partial results are returned as soon as the first page is filled with results. OFF = results are returned only after the search is completed. Normally, ON is used in the web interface and OFF when using the API. 040_json: in: query name: json schema: type: object example: {"concordance_query":[{"queryselector":"iqueryrow","iquery":"test"}]} description: "An optional way of **wrapping parameters**. It is possible to send\ \ all parameters via this parameter only.\n\n The most frequent uses are:\n\ \n `queryselector`: To select the query type. Supported options are: **cqlrow**,\ \ **iqueryrow**, **lemmarow**, **charrow**, **phraserow**, **wordrow**. For\ \ more information see concordance_query parameters.\n\n `iquery`: Use with\ \ `iqueryrow`. \n\n `cql`: Use with `cqlrow`.\n\n `lemma`: Use with `lemmarow`.\n\ \n `lpos`: The part of speech of the lemma.\n\n `qmcase`: Sets the attribute\ \ to its lowercased version, i.e. the data are extracted from a lowercased\ \ version of the corpus. It is used for case-insensitive analysis.
1 = case sensitive,\ \ 0 = lowercased corpus/case insensitive.\n\n\n`char`: Use with `charrow`.\n\ \n `phrase`: Use with `phraserow`.\n\n `word`: Use with `wordrow`.\n\n\n`name`:\ \ \n\n `pnfilter`: \n\n `inclkwic`: \n\n `filfpos`: \n\n `filtpos`: \n\n `desc`:\ \ \n\n `q`: \n\n" 041_q: in: query name: q schema: type: string example: q[lemma="test"] description: The CQL query. Regexes are supported for the `lemma`, `phrase` and `word` types. The `iquery` type supports simplified wildcards (see concordance_query[iquery]). If you send `concordance_query` via the `json` parameter, you do not have to use this parameter. 044_fmaxitems: in: query name: fmaxitems schema: type: integer default: 50 description: The number of items in one response. 045_freq_sort: in: query name: freq_sort schema: type: string default: frq enum: - frq - rel description: The identifier of the sorted column. Use `frq` (default) to sort by frequency. 046_cattr: in: query name: cattr schema: type: string default: word description: The [positional attribute](https://www.sketchengine.eu/my_keywords/positional-attribute/) (lemma, word form etc.) used in the computation. 047_cfromw: in: query name: cfromw schema: type: integer default: -5 description: The left boundary of the window in which the collocations should be identified. Defined by the token position left or right of KWIC. 048_ctow: in: query name: ctow schema: type: integer default: 5 description: The right boundary of the window in which the collocations should be identified. Defined by the token position left or right of KWIC. 049_cminfreq: in: query name: cminfreq schema: type: integer default: 5 description: The minimum frequency of the token in the corpus. 050_cminbgr: in: query name: cminbgr schema: type: integer default: 3 description: The minimum frequency of the token in the window defined by `cfromw` and `ctow`.
051_cmaxitems: in: query name: cmaxitems schema: type: integer default: 50 description: Sets the maximum number of items in the response. 052_cbgrfns: in: query name: cbgrfns schema: type: string description: "Defines the types of statistics (association measures) to be computed.\n\ \n`t` -> T-score\n\n`m` -> MI\n\n`3` -> MI3\n\n`l` -> log likelihood\n\n`s`\ \ -> min. sensitivity\n\n`p` -> MI.log_f\n\n`r` -> relative freq.\n\n`f` ->\ \ absolute freq.\n\n`d` -> logDice.\n\n To send a single value, type it directly,\ \ for example `t`. To send several values, write them as a list, for example `[\"t\",\"\ m\",\"d\",\"3\",\"l\",\"s\",\"p\"]`." 053_csortfn: in: query name: csortfn schema: type: string default: m enum: - t - m - "3" - l - s - p - r - f - d description: The function according to which the result is sorted. 054_subcname: in: query name: subcname schema: type: string required: true description: Name of the subcorpus. 055_delete: in: query name: delete schema: type: integer default: 0 enum: - 0 - 1 description: Set to `1` if the subcorpus should be deleted. Only user subcorpora can be deleted. Nothing will be deleted if left empty (the default value is `0`). 056_ref_corpname: in: query name: ref_corpname schema: type: string required: true description: The name of the reference corpus. It must have been processed in the same way (the same attributes, the same term grammar). 057_simple_n: in: query name: simple_n schema: type: string default: "1" enum: - "1" - "0" description: The smoothing parameter for [simple maths](https://www.sketchengine.eu/documentation/simple-maths/). 058_attrs: in: query name: attrs schema: type: string default: word description: A comma-delimited list of attributes that are returned together with each token. Examples of attributes are `word`, `lc`, `lemma`, `tag`, etc. 060_alnum: in: query name: alnum schema: type: integer default: 1 enum: - 1 - 0 description: Limits the results to items containing only alphanumeric characters.
061_onealpha: in: query name: onealpha schema: type: integer default: 1 enum: - 1 - 0 description: Limits the results to items containing at least one alphanumeric character. Words such as 16-year-old or 3D will be included. 062_max_keywords: in: query name: max_keywords schema: type: integer default: 100 description: The number of items to be returned in the response. 063_ngrams_n: in: query name: ngrams_n schema: type: integer enum: - 2 - 3 - 4 - 5 description: The minimum n-gram length. Usually used with `ngrams_max_n` and `usengrams`. 071_wlicase: in: query name: wlicase schema: type: integer default: 0 enum: - 1 - 0 description: Sets the case sensitivity of the corpus, i.e. the data are extracted from a lowercased version of the corpus. It is used for case insensitive analysis. The value "1" means case sensitive, "0" means case insensitive. 072_wlmaxfreq: in: query name: wlmaxfreq schema: type: integer description: Sets the maximum frequency limit in the wordlist. 073_include_nonwords: in: query name: include_nonwords schema: type: integer enum: - 1 - 0 description: Includes or excludes nonwords in the result. Nonwords are tokens which do not start with a letter of the alphabet (e.g. numbers, punctuation). The regex to match the nonwords is `[^[:alpha:]].*`. Certain specialized corpora may use their own specific definition of nonwords. 074_random: in: query name: random schema: type: integer enum: - 1 - 0 description: Indicates whether the wordlist is created from the first 10 million lines of the corpus only. `1` = yes, `0` = the wordlist is created from the whole corpus. deprecated: true 077_default_attr: in: query name: default_attr schema: type: string description: The attribute applied to tokens in the query which do not have an attribute specified explicitly as part of the query.
078_refs: in: query name: refs schema: type: string example: =bncdoc.alltyp description: The text type (metadata) for which statistics should be calculated from the concordance. The default is `bncdoc.alltyp` (all available text types are included). Text types (attributes and their values) differ between corpora. You can find them in the response of the `corpus_info` method under the `freqttattrs` or `subcorpattrs` keys. Not all of them have attributes to show. 079_attr_allpos: in: query name: attr_allpos schema: type: string default: all enum: - kw - all description: Determines which tokens will be returned with additional attributes defined in `attrs`. `kw` will add the attributes to the KWIC only. `all` will return them with all tokens. 080_viewmode: in: query name: viewmode schema: type: string enum: - sen - kwic description: Switches between sentence view and the KWIC view. `sen` returns complete sentences without trimming them. `kwic` returns the KWIC view with the query in the centre and some context left and right. 081_cup_hl: in: query name: cup_hl schema: type: string enum: - q - e - c - b description: "Only used with error-annotated corpora. It determines what should\ \ be highlighted. It is set to 'q' for corpora without error annotation. Meaning\ \ of individual options:\n\n - `q` -> to highlight the query result.\n\n - `e`\ \ -> to highlight errors.\n\n - `c` -> to highlight corrections.\n\n - `b`\ \ -> to highlight both errors and corrections.\n\n An example of such a corpus\ \ is `preloaded/enwiki_error_sample_sentences`." 082_structs: in: query name: structs schema: type: string example: s,g description: A list of comma-delimited structures (=structure tags) that should be included in the result. 083_fromp: in: query name: fromp schema: type: integer example: 1 description: The number of the page that should be returned. 084_pagesize: in: query name: pagesize schema: type: integer example: 20 description: The number of lines in the concordance.
085_kwicleftctx: in: query name: kwicleftctx schema: type: string example: 100# description: The size of the left context in KWIC view. Number of tokens. 086_kwicrightctx: in: query name: kwicrightctx schema: type: string example: 100# description: The size of the right context in KWIC view. Number of tokens. 087_ngrams_max_n: in: query name: ngrams_max_n schema: type: integer enum: - 2 - 3 - 4 - 5 - 6 description: The maximum n-gram length. The maximum is `6`. 088_usengrams: in: query name: usengrams schema: type: integer enum: - 0 - 1 description: Indicates whether n-grams should be extracted instead of simple keywords. 089_wltype: in: query name: wltype schema: type: string enum: - simple - struct_wordlist example: simple description: Sets the format of the output. It is always set to `simple`; for `struct_wordlist`, there is a separate endpoint called `struct_wordlist`. deprecated: true 090_wlstruct_attr1: in: query name: wlstruct_attr1 schema: type: string enum: - word - lemma - tag - lempos description: Sets the attributes used for generating the wordlist. Up to 3 attributes are allowed (see `wlstruct_attr2` and `wlstruct_attr3`). Some corpora may contain additional specific attributes. required: true 091_relfreq: in: query name: relfreq schema: type: integer enum: - 1 - 0 description: Includes the relative frequency of each item in the result. 092_reldocf: in: query name: reldocf schema: type: integer enum: - 1 - 0 description: Calculates the document frequency for each item in the result. Must be used with `addfreqs` set to `docf`. 093_wlpage: in: query name: wlpage schema: type: integer description: Selects the page of the response. The number of items per page is set by the `wlmaxitems` parameter. 094_fpage: in: query name: fpage schema: type: integer description: The number of the response batch (page). The number of items in each batch is specified by `fmaxitems`.
095_group: in: query name: group schema: type: integer enum: - 1 - 0 description: If there are several attributes (e.g. `ml1attr`, `ml2attr`), the results can be grouped by the first column/attribute. 096_showpoc: in: query name: showpoc schema: type: integer enum: - 1 - 0 description: Includes the percentage of the concordance in the result. 097_showreltt: in: query name: showreltt schema: type: integer enum: - 1 - 0 description: Includes the `relative in text types` value in the result. 098_showrel: in: query name: showrel schema: type: integer enum: - 1 - 0 description: Includes the relative frequency in the result. 099_freqlevel: in: query name: freqlevel schema: type: integer enum: - 1 - 2 - 3 - 4 - 5 - 6 description: The number of attributes for which the frequencies should be counted. 100_ml1attr: in: query name: ml1attr required: true schema: type: string example: word description: Used to count the frequency of [positional attributes](https://www.sketchengine.eu/my_keywords/positional-attribute/) or structure attributes (metadata/text types) of any token in the concordance. A maximum of 6 attributes is allowed (e.g. `ml2attr`, `ml3attr`). At least one attribute is required. 101_ml1ctx: in: query name: ml1ctx required: true schema: type: string example: -1<0 description: " Position of the selected attribute in the concordance. **Minus**\ \ means **left** context (-1<0). **Plus** means **right** context (6>0). **At\ \ least one attribute is required, the others are optional.** Every attribute\ \ (ml1attr, ml2attr, etc.) needs its **own** context position (e.g. if 3\ \ attributes are selected, three context positions **need to** be set: ml1ctx,\ \ ml2ctx, ml3ctx).\n\n\n**Positions can be referenced as follows:**\n\n `integer\ \ number` - where **0** is the first token in **KWIC**, **-1** the rightmost\ \ token in the left context etc.\n\n `1:x` - where **x** is one of the corpus\ \ structures (e.g.
\u201Cdoc\u201D or \u201Cs\u201D if the corpus has the\ \ particular markup). Its meaning is the first token in the structure, except\ \ when it is the right boundary of a range - then it is the last token in\ \ the structure. Also, other numbers can be used, e.g. -2:x, 3:x, etc. (-1\ \ is the same as 1 with meaning \u201Cstructure containing KWIC\u201D)\n\n\ \ `a<0` - where **a** stands for a position reference as described in the\ \ first two points with meaning '**a** positions before/after the first KWIC\ \ position' (so this is equivalent to **a**)\n\n `a>0` - where **a** stands\ \ for the same position reference with meaning 'positions before/after the\ \ last KWIC position'\n\n In the previous two points, if **0** is substituted\ \ with a natural number **k**, it means 'before/after **k**-th collocation'\ \ instead of 'before/after KWIC'. Collocations are special token groups in\ \ the context that can be added using positive filters (see below).\n\n\n\ `Ranges` can be referenced as a~b where **a**, **b** stand for token identifiers\ \ as above. Examples of positions and ranges:\n\n `-1<0` - rightmost token\ \ in the left context\n\n `3>0` - third token in the right context\n\n `0>0` -\ \ last token in KWIC\n\n `0<0` - first token in KWIC\n\n `0<0~0>0` - range\ \ of KWIC\n\n `-1<0~1>0` - range of KWIC with one token from the left context\ \ and one from the right context\n\n `1:s` - first token in the sentence containing\ \ KWIC (or its first token)\n\n `1:s>0` - first token in the sentence containing\ \ KWIC (or its last token)\n\n `0<1` - first token of the first-added collocation.\n\ \n\n`Examples:`\n\n sword/ **1>0~3>0**\n\n slemma/\ \ **0<0~0>0**\n\n sword/i **-1**\n\n sword/ **0** word/ir **-1<0** tag/r **-2<0**\n\ \n " 102_wlstruct_attr2: in: query name: wlstruct_attr2 schema: type: string enum: - word - lemma - tag - lempos description: An additional optional attribute that should be included in the result.
103_wlstruct_attr3: in: query name: wlstruct_attr3 schema: type: string enum: - word - lemma - tag - lempos description: An additional optional attribute that should be included in the result. 104_icase: in: query name: icase schema: type: integer enum: - 1 - 0 description: Switches to the `lc` attribute, i.e. the lower-cased version of the corpus to allow case insensitive searching. `1` means that case sensitivity is off, `0` means it is on. 111_pos: in: query name: pos schema: type: integer example: 10336 description: The position of the first token of KWIC in the corpus. 112_nest_ngrams: in: query name: nest_ngrams schema: type: integer enum: - 1 - 0 description: N-grams which are sub-ngrams of a longer n-gram will be grouped together with the longer n-gram. Nesting only works when `ngrams_n` and `ngrams_max_n` have different values. 115_attr: in: query name: attr schema: type: string default: word enum: - lemma - word - TERM - WSCOLLOC description: "Switches between the computation of keywords, terms, key n-grams\ \ and key collocations. With keywords and n-grams, it also sets the attribute\ \ to be used for the computation.\n\n For keywords, set to the required attribute,\ \ usually `lc`, `word` or `lemma`.\n\n For n-grams, set to the required attribute\ \ and set `usengrams`, `ngrams_n` and `ngrams_max_n`.\n\n For terms, set the\ \ attribute to `TERM`.\n\n For collocations (word sketch triples, equivalent\ \ of using the Word Sketch with AS A LIST option in the web interface), set\ \ the attribute to `WSCOLLOC`. Consider using `wlpat`.
" 118_res: in: query name: res schema: type: integer example: 50 description: '' 119_normalize: in: query name: normalize schema: type: integer example: 0 description: '' 120_fc_lemword_window_type: in: query name: fc_lemword_window_type schema: type: string example: both description: '' 121_fc_lemword_wsize: in: query name: fc_lemword_wsize schema: type: integer example: 5 description: '' 122_fc_lemword_type: in: query name: fc_lemword_type schema: type: string example: all description: '' 123_fc_pos_wsize: in: query name: fc_pos_wsize schema: type: integer example: 5 description: '' 124_fc_pos_type: in: query name: fc_pos_type schema: type: string example: all description: '' 125_fc_pos_window_type: in: query name: fc_pos_window_type schema: type: integer example: 5 description: '' 127_concordance_query_queryselector: in: query name: concordance_query[queryselector] schema: type: string enum: - iquery - cqlrow - lemmarow - charrow - wordrow - phraserow description: The query type. You can send it directly or via the `json` parameter, the results are the same. 128_concordance_query_iquery: in: query name: concordance_query[iquery] schema: type: string description: "Only works when `queryselector` is set to `iqueryrow`. Type a\ \ word or phrase.\n\n These special wildcards are supported .\n\nUse the `asterisk\ \ (*)` for any number of unspecified characters. Use a `question mark (?)`\ \ for exactly one unspecified character. Use the `pipe (|)` to include more\ \ than one word or phrase. Use `two hyphens (--)` to find a word which is\ \ hyphenated, non-hyphenated or spelt as two separate words." 129_concordance_query_cql: in: query name: concordance_query[cql] schema: type: string description: Only works when `queryselector` is set to `cqlrow`. Type the query using the [cql](https://www.sketchengine.eu/documentation/corpus-querying/) query language. 
130_concordance_query_lemma: in: query name: concordance_query[lemma] schema: type: string description: Only works when `queryselector` is set to `lemmarow`. Type the lemma. Regex is supported. 131_concordance_query_char: in: query name: concordance_query[char] schema: type: string description: Only works when `queryselector` is set to `charrow`. Type the characters that the tokens should contain. Regex is supported. 132_concordance_query_word: in: query name: concordance_query[word] schema: type: string description: Only works when `queryselector` is set to `wordrow`. Type the word form. Regex is supported. 133_concordance_query_phrase: in: query name: concordance_query[phrase] schema: type: string description: Only works when `queryselector` is set to `phraserow`. Type the phrase. Regex is supported. 134_errcorr_switch: in: query name: errcorr_switch schema: type: string enum: - corr - err description: (Only for error-annotated corpora.) Determines what should be highlighted. `corr` means **correction** and `err` means **error**. An **example** of such a corpus is `preloaded/enwiki_error_sample_sentences`. 135_cup_err_code: in: query name: cup_err_code schema: type: string enum: - .* - lexicosemantic - punct - spelling - style - typographical - unclassified description: (Only for error-annotated corpora.) Determines which error type to highlight. An example of such a corpus is `preloaded/enwiki_error_sample_sentences`. 136_cup_err: in: query name: cup_err schema: type: string description: (Only for error-annotated corpora.) An error token to search for. 137_cup_corr: in: query name: cup_corr schema: type: string description: (Only for error-annotated corpora.) A correction token to search for. 138_hitlen: in: query name: hitlen schema: type: integer example: 1 description: Only used by the web interface. Indicates the number of tokens that should be highlighted in red.
139_detail_left_ctx: in: query name: detail_left_ctx schema: type: integer example: 50 description: Size of the left context in tokens. 140_detail_right_ctx: in: query name: detail_right_ctx schema: type: integer example: 50 description: Size of the right context in tokens. 141_ml2attr: in: query name: ml2attr schema: type: string example: word description: Used to count the frequency of [positional attributes](https://www.sketchengine.eu/my_keywords/positional-attribute/). Just like `ml1attr` but optional. 142_ml2ctx: in: query name: ml2ctx schema: type: string example: -1<0 description: Position of the selected attribute in the concordance. Minus means left context (-1<0). Plus means right context (6>0). Just like `ml1ctx` but optional. 143_ml3attr: in: query name: ml3attr schema: type: string example: word description: Used to count the frequency of [positional attributes](https://www.sketchengine.eu/my_keywords/positional-attribute/). Just like `ml1attr` but optional. 144_ml3ctx: in: query name: ml3ctx schema: type: string example: -1<0 description: Position of the selected attribute in the concordance. Minus means left context (-1<0). Plus means right context (6>0). Just like `ml1ctx` but optional. 145_ml4attr: in: query name: ml4attr schema: type: string example: word description: Used to count the frequency of [positional attributes](https://www.sketchengine.eu/my_keywords/positional-attribute/). Just like `ml1attr` but optional. 146_ml4ctx: in: query name: ml4ctx schema: type: string example: -1<0 description: Position of the selected attribute in the concordance. Minus means left context (-1<0). Plus means right context (6>0). Just like `ml1ctx` but optional. 147_ml5attr: in: query name: ml5attr schema: type: string example: word description: Used to count the frequency of [positional attributes](https://www.sketchengine.eu/my_keywords/positional-attribute/). Just like `ml1attr` but optional.
148_ml5ctx: in: query name: ml5ctx schema: type: string example: -1<0 description: Position of the selected attribute in the concordance. Minus means left context (-1<0). Plus means right context (6>0). Just like `ml1ctx` but optional. 149_ml6attr: in: query name: ml6attr schema: type: string example: word description: Used to count the frequency of [positional attributes](https://www.sketchengine.eu/my_keywords/positional-attribute/). Just like `ml1attr` but optional. 150_ml6ctx: in: query name: ml6ctx schema: type: string example: -1<0 description: Position of the selected attribute in the concordance. Minus means left context (-1<0). Plus means right context (6>0). Just like `ml1ctx` but optional. 151_minfreq_extract_keywords: in: query name: minfreq schema: type: string description: Sets the minimum frequency of the item. example: auto 152_maxfreq_extract_keywords: in: query name: maxfreq schema: type: integer description: Sets the maximum frequency of the item. 153_addfreqs: in: query name: addfreqs schema: type: string default: docf description: Sets the kind of frequency that should be calculated. When used with `reldocf`, it is set to `docf` to calculate the document frequency. 154_json_collx: in: query name: json schema: type: object example: {"concordance_query":[{"queryselector":"iqueryrow","iquery":"test"}],"cbgrfns":["t","m","3","l","s","p","r","f","d"]} description: "An optional way of **wrapping parameters**. It is possible to send\ \ all relevant parameters via this parameter only. It is classic JSON format.\n\ \n The most frequent uses are:\n\n `queryselector`: To select the query type.\ \ Supported options are: **cqlrow**, **iqueryrow**, **lemmarow**, **charrow**,\ \ **phraserow**, **wordrow**.\n\n `iquery`: Use with `iqueryrow`. \n\n `cql`:\ \ Use with `cqlrow`.\n\n `lemma`: Use with `lemmarow`.\n\n `lpos`: The part\ \ of speech of the lemma.\n\n `qmcase`: Sets the attribute to its lowercased\ \ version, i.e.
the data are extracted from a lowercased version of the corpus.\ \ It is used for case insensitive analysis. 1 = case sensitive, 0 = lowercased\ \ corpus/case insensitive.\n\n `char`: Use with `charrow`.\n\n `phrase`: Use\ \ with `phraserow`.\n\n `word`: Use with `wordrow`.\n\n `cbgrfns`: Defines the types of statistics (association measures)\ \ to be computed. To send more than one value, pass them as a list. See example." 155_create: in: query name: create schema: type: integer default: 0 enum: - 0 - 1 description: Set to `1` if a new subcorpus should be created. The subcorpus will not be created if left empty (the default value is `0`). 156_q_subcorp: in: query name: q schema: type: string example: >- q=alemma,[lc="test" | lemma_lc="test"] description: "Query for creating a subcorpus from a concordance. \n\nThe search\ \ criteria are specified within brackets following a prefix like `alemma`\ \ or `aword`. This prefix often indicates the type of linguistic search (e.g.,\ \ lemma-based, word-based). The criteria within the brackets can include checks\ \ for specific words, lemmas, parts of speech and more, using operators like\ \ | (OR), & (AND), and regular expressions.
\n\nThe lists of available attributes and\ \ POS tags for a specific corpus can be obtained via `/search/corp_info`.\n\n\ \n`Examples:`\n\n\n - Simple word or lemma search in the BNC corpus: \n\n\ \ **q=alemma,[lc=\"test\" | lemma_lc=\"test\"]**\n\n\n - Search for nouns\ \ with the lemma 'test', case-insensitive: \n\n **q=alemma,[lempos_lc=\"\ (test)-n\"]**\n\n\n - Search for verbs with the lemma 'test' in a case-sensitive manner:\ \ \n\n **q=alemma,[lempos=\"(test)-v\"]**\n\n\n - Search for the specific\ \ word form 'test' in a case-sensitive manner: \n\n **q=aword,[word=\"test\"\ ]**\n\n\n - Search for the numeral '1955': \n\n **q=alemma,[word=\"\ 1955\" & tag=\"CD\"]**\n\n\n - Regex-based search for words containing the\ \ character 'h': \n\n **q=alemma,[word=\".\\*h.\\*\"]**\n\n\n - Complex\ \ search involving the lemma 'book' followed by one to three tokens, then a\ \ verb: \n\n **q=alemma,[lemma=\"book\"][]{1,3}[tag=\"V.\\*\"]**" 157_struct: in: query name: struct schema: type: string example: doc description: The corpus structure that should be used in the new subcorpus. Used with the concordance type of subcorpus. The list of structures can be obtained via `/search/corp_info`. 158_subcorp_id: in: query name: subcorp_id schema: type: string required: true description: The name of the subcorpus you want to rename. 159_new_subcorp_name: in: query name: new_subcorp_name schema: type: string required: true description: A new name for the subcorpus. 160_json_subcorp: in: query name: json schema: type: object example: sca_bncdoc.alltyp: - Spoken context-governed description: "Used to specify text types for a subcorpus. Takes a JSON object\ \ as input, where the key-value pairs define the specific attributes. The\ \ attributes can vary depending on the corpus.\n\n When using the json parameter\ \ in a query, you can define a JSON object with one or more attributes.
Each\ \ attribute can have a single value or an array of values.\n\n\nThe list of\ \ available text types for a specific corpus can be obtained via `/search/corp_info`;\ \ just add `sca_` in front of the name (see examples).\n\n\n`Examples:`\n\n\n\ \ - To create a subcorpus based on a specific spoken text type from the BNC\ \ corpus: \n\n **{\"sca_bncdoc.alltyp\":[\"Spoken context-governed\"]}**\n\ \n\n - To filter texts from the BNC corpus that are either spoken context-governed\ \ or spoken demographic: \n\n **{\"sca_bncdoc.alltyp\":[\"Spoken context-governed\"\ ,\"Spoken demographic\"]}**\n\n\n - To select texts from the BNC corpus from\ \ a specific time period (1960-1974): \n\n **{\"sca_bncdoc.alltim\":[\"\ 1960-1974\"]}**\n\n\n - To create a subcorpus with texts from specific authors\ \ and time periods, along with regional specifications: \n\n **{\"sca_bncdoc.author\"\ : [\"Author1\",\"Author2\",...],\"sca_bncdoc.alltim\": [\"1985-1993\",\"1975-1984\"\ ], \"sca_bncdoc.wripp\": [\"UK (unspecific)\",\"Ireland\"]}**\n\n\n - To filter\ \ texts from the enTenTen corpus based on domain and topic: \n\n **{\"\ sca_doc.tld\":[\"org\",\"com\"], \"sca_doc.topic\": [\"arts\",\"beauty & fashion\"\ ,\"cars & bikes\",\"culture & entertainment\"]}**\n\n\n - For a user-specific\ \ corpus, filtering based on document ID and filename: \n\n **{\"sca_doc.id\"\ :[\"file29173711\"],\"sca_doc.filename\":[\"Filename.pdf\"]}**" 161_ctx: in: query name: ctx schema: type: string example: 0~0>0 description: " **Minus** means **left** context (-1<0). **Plus** means **right**\ \ context (6>0).\n\n\n**Positions can be referenced as follows:**\n\n `integer\ \ number` - where **0** is the first token, **-1** the rightmost token in\ \ the left context etc.\n\n `1:x` - where **x** is one of the corpus structures\ \ (e.g. \u201Cdoc\u201D or \u201Cs\u201D if the corpus has the particular\ \ markup).
Its meaning is the first token in the structure, except when it\ \ is the right boundary of a range - then it is the last token in the structure.\ \ Also, other numbers can be used, e.g. -2:x, 3:x, etc. (-1 is the same as\ \ 1 with meaning \u201Cstructure containing searched word\u201D)\n\n `a<0`\ \ - where **a** stands for a position reference as described in the first\ \ two points with meaning '**a** positions before/after the first searched\ \ word position' (so this is equivalent to **a**)\n\n `a>0` - where **a**\ \ stands for the same position reference with meaning 'positions before/after\ \ the last searched word position'\n\n In the previous two points, if **0**\ \ is substituted with a natural number **k**, it means 'before/after **k**-th\ \ collocation' instead of 'before/after KWIC'. Collocations are special token\ \ groups in the context that can be added using positive filters (see below).\n\ \n\n`Ranges` can be referenced as a~b where **a**, **b** stand for token identifiers\ \ as above. Examples of positions and ranges:\n\n `-1<0` - rightmost token\ \ in the left context\n\n `3>0` - third token in the right context\n\n `0>0` -\ \ last token\n\n `0<0` - first token\n\n `0<0~0>0` - range\n\n `-1<0~1>0`\ \ - range with one token from the left context and one from the right context\n\ \n `1:s` - first token in the sentence containing searched word (or its first\ \ token)\n\n `1:s>0` - first token in the sentence containing searched word\ \ (or its last token)\n\n `0<1` - first token of the first-added collocation.\n\ \n\n`Examples:`\n\n sword/ **1>0~3>0**\n\n slemma/\ \ **0<0~0>0**\n\n sword/i **-1**\n\n sword/ **0**\n\n word/ir **-1<0**\n\n\ \ tag/r **-2<0**\n\n " 162_diaattr: in: query name: diaattr required: true schema: type: string example: doc.month description: A diachronic attribute to be selected. Available attributes **can differ** between corpora. Examples are **doc.year** or **doc.month**.
163_sse: in: query name: sse schema: type: string enum: - "1" - "0" required: true description: '`1` = display results during calculation, `0` = display results after all data has been calculated (can take quite a lot of time).' 164_threshold: in: query name: threshold schema: type: string example: "0.05" required: true description: Determines which periods are included in the results. It signifies the percentage above the average size, acting as a **limit**. When the relative frequency (`rel_frq`) exceeds this limit, it is discarded (moved to the **removed_freqdist** object). 165_json_freqdist: in: query name: json schema: type: object example: wordlist: - the - ',' - \. - to - and - of - a - in - '"' - for - that - is - it - 'on' - with - as - was - i - you - at - this - be - are - from - have - by - ':' - he - has - '''s' - \) - but - \( - we - an - they - will - '''' - not - his - said - or - their - can - n't - more - "\u2013" - your - one - who description: "An optional way of **wrapping parameters**. It is possible to send\ \ all relevant parameters via this parameter only. It is classic JSON format.\n\ \n `wordlist`: words for which the relative frequency should be counted.\n\ \n" 166_corpname_freqdist: in: query name: corpname schema: type: string required: true description: Corpus name. To query your own corpus (e.g. username john, corpus mycorpus), use the value `user/john/mycorpus`. example: preloaded/trends_et_3 167_wordlist: in: query name: wordlist description: 'A list of words for which the relative frequency should be counted. No example is set here because it is already set in the `json` parameter. Example: [''the'',''a'',''lion''].' schema: type: string 01_corpus_id: name: corpusId in: path description: Numeric corpus ID. For corpora querying. required: true schema: type: integer 02_document_id: name: documentId in: path description: Document ID. For document querying.
required: true schema: type: integer 03_template_id: name: templateId in: path description: 'Numeric template ID. Preloaded templates do not have an ID, but you can query them by their name. Example: `UNIVERSAL_3`.' required: true schema: type: string 05_logname: name: logName in: path description: Name of the log file. The name 'last.log' returns the newest log for the corpus. example: last.log required: true schema: type: string 06_fileset_id: name: filesetId in: path description: ID of the file subdirectory. If set to 0, it returns the top-level folder of documents, so if you have a web corpus with folders web1 and web2, it will return web1. required: true schema: type: integer 07_fileset_id_query: name: fileset_id in: query description: ID of the file subdirectory. **0** stands for the default document directory with the name `upload`. schema: type: integer 08_somefile_id: name: somefileId in: path description: Alphanumeric multilanguage file ID required: true schema: type: string 09_format: name: format in: query description: File format in which the corpus should be downloaded. Only three formats are supported. required: true schema: type: string example: txt, vert, tmx 10_file_structure: name: file_structure in: query description: 'The contents of each file will be enclosed in an XML-like structure of the specified name with the filename as its id attribute and the URL (if available) as the url attribute. If empty, document boundaries will be lost. Example: `doc`.' schema: type: string 11_aligned: name: aligned in: query description: Required when you want to download parallel corpora, i.e. **when format == tmx**. Specify the name of the aligned corpus. schema: type: string 18_wait_with_tagging: name: wait_with_tagging in: query description: Delays tagging by the given number of seconds. schema: type: integer 19_compile_when_finished: name: compile_when_finished in: query description: Starts compiling the corpus after the web crawler finishes downloading content from the internet.
schema: type: integer schemas: 01_corp_info: type: object properties: wposlist: description: A list of WPOS (`Word Part Of Speech`). Presented as pairs of a WPOS name and a regular expression matching the WPOS tags. type: array items: type: array items: type: string example: >- ["adjective","J.*"] description: At [tagsets](https://www.sketchengine.eu/tagsets/) you can find the meaning of the POS tags used for 55 languages. Make sure you select the correct language. lposlist: description: A list of LPOS (`Lemma Part Of Speech`). Presented as pairs of an LPOS name and an LPOS tag. Used in Concordance forms. type: array items: type: array items: type: string example: >- ["adjective","-j"] description: Other examples can be [ "adverb", "-a" ], [ "conjunction", "-c" ], [ "noun", "-n" ] etc. To see all pairs of LPOSLIST, execute the endpoint via the `Try it Out button`. wsposlist: description: Has the same format as LPOSLIST but WSPOSLIST is used in Word Sketch and Thesaurus forms. type: array items: type: array items: type: string example: >- ["adjective","-j"] description: Same as in LPOSLIST. To see all pairs of WSPOSLIST, execute the endpoint via the `Try it Out button`. attributes: description: A list of objects containing detailed information about attributes occurring in the specified corpus. type: array items: type: object properties: name: type: string example: lempos_lc description: Name of attribute. Lempos_lc = Lemma part of speech lowercase. id_range: type: integer example: 524493 description: The number of distinct values of the given attribute in the corpus. Each value is counted only once even if it appears in the corpus many times. label: type: string example: lempos(lowercase) description: An extra description. dynamic: type: string example: utf8lowercase description: Represents the rule according to which the attribute should be derived from the original attribute. The attribute `lempos_lc` is derived from `lempos` to save disk space etc.
[Read more](https://www.sketchengine.eu/documentation/corpus-configuration-file-all-features/#Dynamicattributes). fromattr: type: string example: lempos description: The name of the attribute this attribute is derived from. Empty string if the attribute is not derived from any. structs: description: A list of structures in the corpus. type: array items: type: string name: type: string example: British National Corpus (BNC) description: The full name of the corpus. lang: type: string example: English description: The language of the corpus. infohref: type: string example: https://www.sketchengine.eu/british-national-corpus/ description: A URL with more information about the corpus. Empty string if none. info: type: string example: "A balanced English corpus of samples of a written and spoken language\ \ of British English from the later part of the 20th century (1969\u2013\ 1994). The spoken part is accompanied by audio recordings." description: More information about the corpus. encoding: type: string example: UTF-8 description: The character encoding used in the corpus. tagsetdoc: type: string example: https://www.sketchengine.eu/english-treetagger-pipeline-2/ description: A URL with more information about the POS tagger used in the corpus, such as the meanings of POS tags, a comparison with other tagsets for the specified language etc. defaultattr: type: string example: lc description: The default attribute for the corpus. Usually `word` or `lc`. starattr: type: string unicameral: type: boolean example: false description: A boolean value indicating if the corpus is unicameral (not distinguishing between upper and lower case). righttoleft: type: boolean example: false description: A boolean value indicating if the language of the corpus is written from right to left. errsetdoc: type: string wsattr: type: string example: lempos_lc description: The attribute name for which word sketches are computed, e.g. `lempos`.
wsdef: type: string example: /corpora/wsdef/english-penn_tt-3.1.wsdef.m4 description: A path to the word sketch grammar definition file used in the corpus. termdef: type: string example: /corpora/wsdef/english-penn_tt-terms-3.1.termdef.m4 description: A path to the term grammar definition file used in the corpus. diachronic: description: A list of diachronic subcorpora. A diachronic corpus is a corpus with timestamps that make it possible to watch the development of the language over time. type: array items: type: string example: bncdoc.year aligned: description: A list of aligned corpora names. `Example used here is from different corpus because BNC corpus is not parallel.` type: array items: type: array items: type: string example: a_czech description: Just the corpname of the aligned corpora. Parallel corpora support just two languages (corpora). aligned_details: description: Shown only if the specified corpus is parallel. A list of dictionaries containing detailed information about each aligned subcorpus. type: array items: type: object properties: name: type: string example: Example_1 description: The name of the aligned corpus. language_name: type: string example: Czech description: The language of the aligned corpus. Wposlist: type: array items: type: object properties: n: type: string example: noun description: The name of the part of speech category v: type: string example: k1.* description: A regex matching the category Lposlist: type: array items: type: object properties: n: type: string example: noun description: Name of part of speech category v: type: string example: -n description: The shortcut representing the part of speech category. has_case: type: boolean description: Indicates whether the language of the aligned corpus differentiates between upper case and lower case. has_lemma: type: boolean description: Indicates whether the language of the aligned corpus has lemmas or not.
tagsetdoc: type: string example: https://www.sketchengine.eu/tagset-reference-for-czech description: URL with more detailed information. freqttattrs: description: A list of attributes (text types) that will be used for Frequency. Text types are metadata attached to the corpus structures. You can access them via Sketch Engine dashboard -> Corpus Info -> Text Type Analysis. type: array items: type: string example: '["bncdoc.alltyp", "bncdoc.alltim", "bncdoc.author", "bncdoc.wripp", "bncdoc.sporeg", "bncdoc.scgdom", "bncdoc.wridom", "bncdoc.spolog", "event.desc", "bncdoc.wrimed", "bncdoc.year", "bncdoc.genre"]' subcorpattrs: description: A list of subcorpus attributes for the corpus. type: array items: type: string example: '["bncdoc.alltyp", "bncdoc.alltim", "bncdoc.author", "bncdoc.wripp", "bncdoc.sporeg", "bncdoc.scgdom", "bncdoc.wridom", "bncdoc.spolog", "event.desc", "bncdoc.wrimed", "bncdoc.year", "bncdoc.genre"]' shortref: type: string example: =bncdoc.alltyp description: The attribute of a structure to display as a default reference in the left-hand column of a concordance. The syntax is like `=structure.attribute`, e.g. `=doc.id` for displaying only the value of `doc.id`. docstructure: type: string example: bncdoc description: The structure that is treated as an individual document. Usually `doc`. newversion: type: string example: '' description: Information about the new version of the corpus, if available. Empty string if not. structures: description: A list of structures appearing in the corpus. type: array items: type: object properties: name: type: string example: head description: The name of the structure. label: type: string description: Extra information about the structure. Empty string if none. attributes: description: More detailed information. type: array items: type: object properties: name: type: string example: rend description: Name of attribute. label: type: string example: '' description: Extra information about the attribute. Empty string if none.
dynamic: type: string description: Dynamic (derived) attribute. Empty string if none. [Read more](https://www.sketchengine.eu/documentation/corpus-configuration-file-all-features/#Dynamicattributes) fromattr: type: string description: The name of the attribute this attribute is derived from. Empty string if none. size: type: integer example: 5 description: Number of occurrences. size: type: integer example: 14868944 description: Number of occurrences of the `head` structure in this case. is_error_corpus: type: boolean example: false description: A boolean value indicating if the corpus is an error corpus (not compiled etc.). structctx: type: string example: '' description: The structural context for the corpus. Empty string if none. deffilerlink: type: boolean example: false description: A boolean value indicating if the default filter link is enabled. defaultstructs: description: A list of default structures for the corpus. type: array items: type: string wsttattrs: type: string description: The text types for which the word highlights are computed. [Read more](https://www.sketchengine.eu/find-x-word-highlights/) terms_compiled: type: boolean example: true description: A boolean value indicating if the terms file is compiled. compiled: type: string example: 06/30/2017 07:34:25 description: The date of compilation in the format `mm/dd/yyyy hh:mm:ss`. gramrels: type: array items: type: string sizes: type: object properties: tokencount: type: string example: '112345722' wordcount: type: string example: '96134547' doccount: type: string example: '4054' description: Document counter. parcount: type: string example: '1514906' description: Paragraph counter. sentcount: type: string example: '6052190' description: Sentence counter. normsum: type: string example: '96134547' description: Wordcount after normalization. alsizes: description: A list of tuples containing sizes of aligned corpora.
type: array items: type: string registry_dump: type: string description: The registry dump for the corpus (detailed information about corpus setting), if the registry parameter is set. registry_text: type: string description: The registry text for the corpus (detailed information about corpus setting), if the registry parameter is set. subcorpora: type: array items: type: object properties: n: type: string example: Test description: The name of the subcorpus. name: type: string example: Test description: The name of the subcorpus. user: type: integer example: 1 description: Indicates whether the subcorpus was created by the user. tokens: type: integer example: 271454 description: Number of tokens in the subcorpus. relsize: type: number example: 0.24162379765559744 description: The size of the subcorpus as a percentage of the total corpus size. words: type: integer example: 232283 description: Number of words in the subcorpus. struct: type: string example: s query: type: string example: Q:q[lc="dog" | lemma_lc="dog"] api_version: type: string example: 5.62.3 description: Current API version. manatee_version: type: string example: 2.36.7-SkE-2.219.2 description: Current version of Manatee. request: description: A summary of the parsed query parameters used in this endpoint call. These parameters are all documented at the beginning of every endpoint box (after you expand the endpoint). type: object properties: subcorpora: type: string example: '1' struct_attr_stats: type: string example: '1' corpname: type: string example: preloaded/bnc2_tt21 02_wordlist: type: object properties: new_maxitems: type: integer example: 20000 wllimit: type: integer example: 1000 description: Word list limit, the number of words to be displayed. lastpage: type: integer example: 0 note: type: string example: You are allowed to see only 1000 items. description: An additional note to displayed results. total: type: integer example: 165953 description: Number of displayed items (word - frequency).
totalfrq: type: integer example: 111680004 description: Sum of all frequencies. items: description: A result list. type: array items: type: object properties: str: type: string example: the description: The word for which the frequency has been calculated. frq: type: integer example: 6054939 description: The word frequency. relfreq: type: number example: 53895.59026 description: The relative word frequency. Relative frequency is a way of expressing how often something happens compared to other events or items in a given group. wlattr_label: type: string example: word(lowercase) frtp: type: string example: frequency description: Frequency type. Other possible values can be `average reduced frequency`, `document frequency`, `score`. api_version: type: string example: 5.62.3 manatee_version: type: string example: 2.36.7-SkE-2.219.2 request: description: A summary of the parsed query parameters used in this endpoint call. These parameters are all documented at the beginning of every endpoint box.
type: object properties: wlminfreq: type: string example: '5' random: type: string example: '0' include_nonwords: type: string example: '1' wltype: type: string example: simple wlmaxitems: type: string example: '20000' wlsort: type: string example: frq wlicase: type: string example: '1' wlpage: type: string example: '1' reldocf: type: string example: '1' wlpat: type: string example: .* relfreq: type: string example: '1' wlattr: type: string example: lc wlmaxfreq: type: string example: '0' corpname: type: string example: preloaded/bnc2_tt21 06_concordance: type: object properties: Lines: type: array items: type: object properties: toknum: type: integer example: 10336 hitlen: type: integer example: 1 Refs: type: array items: type: string example: Written books and periodicals Tbl_refs: type: array items: type: string example: Written books and periodicals Left: type: array items: type: object properties: strc: type: string example: <s> Kwic: type: array items: type: object properties: str: type: string example: dogs coll: type: integer example: 1 Right: type: array items: type: object properties: str: type: string example: </s> Links: type: array items: type: string linegroup: type: string example: _ linegroup_id: type: integer example: 0 fromp: type: integer example: 1 concsize: type: integer example: 12087 concordance_size_limit: type: integer example: 10000 Sort_idx: type: array items: type: string righttoleft: type: boolean example: false Aligned_rtl: type: array items: type: string numofcolls: type: integer example: 0 finished: type: integer example: 1 fullsize: type: integer example: 12087 relsize: type: number example: 107.59 q: type: array items: type: string example: q[lc=\"dog\" | lemma_lc=\"dog\"] Desc: type: object properties: op: type: string example: Query arg: type: string example: '[lc=\"dog\" | lemma_lc=\"dog\"]' nicearg: type: string example: dog rel: type: number example: 107.59 size: type: integer example: 12087 tourl: type: string example: 
q=q%5Blc%3D%22dog%22+%7C+lemma_lc%3D%22dog%22%5D port: type: integer example: 0 gdex_scores: type: array items: type: string sc_strcts: type: array items: type: array items: type: string example: bncdoc api_version: type: string example: 5.63.1 manatee_version: type: string example: 2.36.7-SkE-2.221 request: type: object properties: concordance_query: type: array items: type: object properties: queryselector: type: string example: iqueryrow iquery: type: string example: dog corpname: type: string example: preloaded/bnc2_tt21 kwicleftctx: type: string example: 100# structs: type: string example: s,g viewmode: type: string example: sen attr_allpos: type: string example: all fromp: type: string example: '1' json: type: string example: '{\"concordance_query\":[{\"queryselector\":\"iqueryrow\",\"iquery\":\"dog\"}]}' kwicrightctx: type: string example: 100# refs: type: string example: =bncdoc.alltyp cup_hl: type: string example: q attrs: type: string example: word pagesize: type: string example: '20' 07_subcorp: type: object properties: subcname: type: string example: Australian domain .au SubcorpList: type: array items: type: object properties: n: type: string example: Australian domain .au name: type: string example: Australian domain .au user: type: integer example: 0 api_version: type: string example: 5.63.1 manatee_version: type: string example: 2.36.7-SkE-2.221 request: type: object properties: corpname: type: string example: preloaded/ententen13_tt2_1 08_extract_keywords: type: object properties: keywords: type: array items: type: object properties: item: type: string example: "galsk\xFD" score: type: number example: 2411.16 frq1: type: integer example: 2 frq2: type: integer example: 512 rel_frq1: type: number example: 4291.8457 rel_frq2: type: number example: 0.78041 query: type: string example: "[lemma=\\\"galsk\xFD\\\"]" referece_corpus_name: type: string example: Slovak Web 2011 (skTenTen11) reference_corpus_size: type: integer example: 656067998 
reference_subcorpus_size: type: integer example: 656067998 subcorpus_size: type: integer example: 466 corpus_size: type: integer example: 466 total: type: integer example: 175 totalfrq1: type: integer example: 466 totalfrq2: type: integer example: 250525622 wllimit: type: integer example: 1000 note: type: string example: '' api_version: type: string example: 5.63.1 manatee_version: type: string example: 2.36.7-SkE-2.221 request: type: object properties: alnum: type: string example: '1' maxfreq: type: string example: '0' minfreq: type: string example: '1' wlpat: type: string example: .* attr: type: string example: lemma keywords: type: string example: '1' ref_corpname: type: string example: preloaded/sktenten11_rft1 simple_n: type: string example: '1' k_attr: type: string example: lemma include_nonwords: type: string example: '0' reldocf: type: string example: :"0" icase: type: string example: '1' onealpha: type: string example: '1' max_keywords: type: string example: '1000' corpname: type: string example: user/matuskostka1/aaaaa_slovak 09_attr_vals: type: object properties: query: type: string example: .* description: The regular expression from query parameter `avpat`. suggestions: type: array items: type: string example: "[ \"Cookson, Neil Andrew\", \u2026 ]" description: Suggestions for avattr `bncdoc.author`. no_more_values: type: boolean example: false description: Represent if the `suggestion` list is complete. 
api_version: type: string example: 5.63.1 manatee_version: type: string example: 2.36.7-SkE-2.221 request: type: object properties: avpat: type: string example: .* avmaxitems: type: string example: '15' ajax: type: string example: '1' corpname: type: string example: preloaded/bnc2_tt21 avfrom: type: string example: '0' icase: type: string example: '1' avattr: type: string example: bncdoc.author 10_collx: type: object properties: Head: type: array items: type: object properties: n: type: string example: Cooccurrence count s: type: string example: f style: type: string example: ' style="word-wrap: break-word; width: 5em;"' Items: type: array items: type: object properties: str: type: string example: Belvin freq: type: integer example: 7 coll_freq: type: integer example: 5 Stats: type: array items: type: object properties: s: type: string example: '2.64537' n: type: string example: t pfilter: type: string example: q=P-5+5+1+%5Bword%3D%22Belvin%22%5D nfilter: type: string example: q=N-5+5+1+%5Bword%3D%22Belvin%22%5D lastpage: type: integer example: 0 wllimit: type: integer example: 1000 concsize: type: integer example: 22685 Desc: type: array items: type: object properties: op: type: string example: Query arg: type: string example: '[lemma="test"]' nicearg: type: string example: test rel: type: number example: 202.22 size: type: integer example: 22685 tourl: type: string example: q=q%5Blemma%3D%22test%22%5D api_version: type: string example: 5.63.1 manatee_version: type: string example: 2.36.7-SkE-2.223.6 request: description: A summary of the parsed query parameters used in this endpoint call. These parameters are all documented at the beginning of every endpoint box (after you expand the endpoint).
type: object properties: csortfn: type: string example: m corpname: type: string example: preloaded/bnc2 q: type: string example: q[lemma="test"] 11_freqml: type: object properties: fcrit: type: string example: fcrit=word%2Fe+-1%3C0+lemma%2Fe+-1%3C0 FCrit: type: array items: type: object properties: fcrit: type: string example: word/e -1<0 lemma/e -1<0 Blocks: type: array items: type: object properties: Head: type: array items: type: object properties: n: type: string example: word s: type: integer example: 0 id: type: string example: word/e total: type: integer example: 1700 totalfrq: type: integer example: 12087 Items: type: array items: type: object properties: Word: type: array items: type: object properties: n: type: string example: the frq: type: integer example: 2621 rel: type: integer example: 0 reltt: type: integer example: 0 norm: type: integer example: 0 fbar: type: integer example: 301 relbar: type: integer example: 0 freqbar: type: integer example: 0 pfilter: type: string example: ;q=p-1%3C0+-1%3C0+0+%5Bword%3D%22the%22%5D;q=p-1%3C0+-1%3C0+0+%5Blemma%3D%22the%22%5D nfilter: type: string pfilter_list: type: array items: type: array items: type: string example: p-1<0 -1<0 0 [word="the"] poc: type: number example: 21.684454372466284 fpm: type: number example: 23.329771292938062 paging: type: integer example: 1 concsize: type: integer example: 12087 fullsize: type: integer example: 14297 Desc: type: array items: type: object properties: op: type: string example: Query arg: type: string example: '[lc="dog" | lemma_lc="dog"]' nicearg: type: string example: dog rel: type: number example: 107.59 size: type: integer example: 12087 tourl: type: string example: q=q%5Blc%3D%22dog%22+%7C+lemma_lc%3D%22dog%22%5D numofcolls: type: integer example: 0 hitlen: type: integer example: 1 wllimit: type: integer example: 1000 lastpage: type: integer example: 0 ml: type: boolean example: true api_version: type: string example: 5.63.12 manatee_version: type: string example: 
2.36.7-SkE-2.223.6 request: description: A summary of the parsed query parameters used in this endpoint call. These parameters are all documented at the beginning of every endpoint box. type: object properties: concordance_query: type: array items: type: object properties: queryselector: type: string example: iqueryrow iquery: type: string example: dog format: type: string example: json fpage: type: string example: '1' showpoc: type: string example: '1' freqlevel: type: string example: '2' group: type: string example: '1' freq_sort: type: string example: freq ml1ctx: type: string example: -1<0 showreltt: type: string example: '1' ml2attr: type: string example: lemma ml1attr: type: string example: word ml2ctx: type: string example: -1<0 fmaxitems: type: string example: '5000' corpname: type: string example: preloaded/bnc2_tt21 showrel: type: string example: '1' 12_struct_wordlist: type: object properties: fcrit: type: string example: fcrit=lemma%2Fe+0+word%2Fe+0+lempos%2Fe+0 FCrit: type: array items: type: object properties: fcrit: type: string example: lemma/e 0 word/e 0 lempos/e 0 Blocks: type: array items: type: object properties: Head: type: array items: type: object properties: n: type: string example: lemma s: type: integer example: 0 id: type: string example: lemma/e total: type: integer example: 77 totalfrq: type: integer example: 13931 Items: type: array items: type: object properties: Word: type: array items: type: object properties: n: type: string example: dog frq: type: integer example: 6829 rel: type: integer example: 0 reltt: type: integer example: 0 norm: type: integer example: 0 fbar: type: integer example: 301 relbar: type: integer example: 0 freqbar: type: integer example: 0 pfilter: type: string example: ;q=p0+0+0+%5Blemma%3D%22dog%22%5D;q=p0+0+0+%5Bword%3D%22dog%22%5D;q=p0+0+0+%5Blempos%3D%22dog-n%22%5D nfilter: type: string pfilter_list: type: array items: type: array items: type: string example: p0 0 0 [lemma="dog"] poc: type: number
example: 47.765265440302166 fpm: type: number example: 60.78558113676994 paging: type: integer example: 1 concsize: type: integer example: 14297 fullsize: type: integer example: 14297 Desc: type: array items: type: object properties: op: type: string example: Query arg: type: string example: '[lemma_lc="(dog.*)"]' nicearg: type: string example: (dog.*) rel: type: number example: 127.26 size: type: integer example: 14297 tourl: type: string example: q=q%5Blemma_lc%3D%22%28dog.%2A%29%22%5D numofcolls: type: integer example: 0 hitlen: type: integer example: 1 wllimit: type: integer example: 1000 lastpage: type: integer example: 1 ml: type: boolean example: true api_version: type: string example: 5.63.12 manatee_version: type: string example: 2.36.7-SkE-2.223.6 request: description: A summary of the parsed query parameters used in this endpoint call. These parameters are all documented at the beginning of every endpoint box. type: object properties: wlmaxfreq: type: string example: '0' wlpage: type: string example: '1' random: type: string example: '0' wlstruct_attr1: type: string example: lemma wltype: type: string example: struct_wordlist fmaxitems: type: string example: '20000' wlpat: type: string example: (dog.*) wlnums: type: string example: frq wlattr: type: string example: lemma_lc wlicase: type: string example: '1' wlmaxitems: type: string example: '20000' wlstruct_attr3: type: string example: lempos relfreq: type: string example: '1' include_nonwords: type: string example: '1' wlsort: type: string example: frq wlstruct_attr2: type: string example: word corpname: type: string example: preloaded/bnc2_tt21 reldocf: type: string example: '1' wlminfreq: type: string example: '5' 13_freq_distrib: type: object properties: dots: type: array items: type: object properties: frq: type: integer example: 64 description: '' pos: type: integer example: 0 description: '' beg: type: integer example: 72053 description: '' end: type: integer example: 2475660 description:
'' granularity: type: integer example: 50 description: '' api_version: type: string example: 5.63.12 manatee_version: type: string example: 2.36-7-SkE-2.223.6 request: type: object properties: concordance_query: type: array items: type: object properties: queryselector: type: string example: lemmarow lemma: type: string example: cat lpos: type: string example: -n qmcase: type: integer example: 0 structs: type: string example: s,g fc_lemword_type: type: string example: all attrs: type: string example: word json: type: string example: '{"concordance_query":[{"queryselector":"lemmarow","lemma":"cat","lpos":"-n","qmcase":false}]}' res: type: string example: '50' fc_lemword_window_type: type: string example: both normalize: type: string example: '0' format: type: string example: json attr_allpos: type: string example: all fc_pos_type: type: string example: all fc_pos_wsize: type: string example: '5' refs: type: string example: =bncdoc.alltyp viewmode: type: string example: sen lpos: type: string example: -n corpname: type: string example: preloaded/bnc2_tt21 default_attr: type: string example: lemma fc_lemword_wsize: type: string example: '5' fc_pos_window_type: type: string example: both 14_fullref: type: object properties: Refs: type: array items: type: object properties: name: type: string example: Token number id: type: string example: '#' val: type: string example: '6270887' bncdoc_id: type: string example: J1C bncdoc_author: type: string example: ===NONE=== bncdoc_year: type: string example: ===NONE=== bncdoc_title: type: string example: '[Leeds United e-mail list]' bncdoc_info: type: string example: '[Leeds United e-mail list]. 
Sample containing about 41810 words of unpublished miscellanea (domain: leisure)' bncdoc_allava: type: string example: Ownership has not been claimed bncdoc_alltim: type: string example: 1985-1993 bncdoc_alltyp: type: string example: Written miscellaneous bncdoc_genre: type: string example: W_email u_who: type: string example: '' s_audio: type: string example: ===NONE=== api_version: type: string example: 5.63.12 manatee_version: type: string example: 2.36.7-SkE-2.223.6 request: type: object properties: corpname: type: string example: preloaded/bnc2_tt21 pos: type: string example: '6270887' 15_textypes_with_norms: type: object properties: Blocks: type: array items: type: object properties: Line: type: array items: type: object properties: name: type: string example: bncdoc.alltyp label: type: string example: Text type attr_doc: type: string example: '' attr_doc_label: type: string example: '' Values: type: array items: type: object properties: v: type: string example: Spoken context-governed xcnt: type: integer example: 757 Normlist: type: array items: type: object properties: n: type: string example: freq label: type: string example: Document counts api_version: type: string example: 5.63.12 manatee_version: type: string example: 2.36.7-SkE-2.223.6 request: type: object properties: corpname: type: string example: preloaded/bnc2_tt21 16_widectx: type: object properties: wrapdetail: type: string example: <p> deletewrap: type: boolean example: true content: type: array items: type: object properties: str: type: string example: ', in Hungarian also means "majority of", "its belongings", "its goods", "its best portion", a type of pork, and may also be incorrectly identified as an agglutination of a frequent abbreviation in mailing lists.' 
class: type: string example: '' leftlink: type: string example: pos=831745;detail_left_ctx=110;detail_right_ctx=50 rightlink: type: string example: pos=831745;detail_left_ctx=50;detail_right_ctx=110 pos: type: integer example: 831745 maxcontent: type: integer example: 200 api_version: type: string example: 5.63.12 manatee_version: type: string example: 2.36.7-SkE-2.223.6 request: type: object properties: detail_left_ctx: type: string example: '50' corpname: type: string example: preloaded/ententen21_tt31 hitlen: type: string example: '1' pos: type: string example: '831745' detail_right_ctx: type: string example: '50' structs: type: string example: s,g 17_subcorpus_rename: type: object properties: status: type: string example: OK corpus: type: string example: preloaded/bnc2_tt31 subcorp_id2name: type: object properties: test: type: string example: test_2 api_version: type: string example: 5.66.5 manatee_version: type: string example: 2.36.7-SkE-2.225.6 request: type: object properties: subcorp_id: type: string example: test new_subcorp_name: type: string example: test_2 corpname: type: string example: preloaded/bnc2_tt31 18_subcorp_info: type: object properties: subcorp: type: string example: test_4 corpsize: type: integer example: 112338376 subcsize: type: integer example: 112338376 api_version: type: string example: 5.66.5 manatee_version: type: string example: 2.36.7-SkE-2.225.6 request: type: object properties: subcname: type: string example: test_2 corpname: type: string example: preloaded/bnc2_tt31 19_freqdist: type: object properties: lc: type: string example: the freqdist: type: object properties: 2021-11: description: The name is variable according selected period (wlattr). 
type: object properties: frq: type: integer example: 679638 rel_frq: type: number format: float example: 3022.91023457566 period_size: type: number format: float example: 224829038 removed_freqdist: type: object properties: 2023-08: description: The key name varies according to the selected period (wlattr). type: object properties: frq: type: integer example: 0 rel_frq: type: number format: float example: 0 period_size: type: number format: float example: 429451774.0 average_norm: type: number format: float example: 136780187.25 norm_limit: type: number format: float example: 6839009.362500001 01_corpora_request: description: Request for the POST method to set `name`, `language`, `tagset`, and additional information for the corpus. type: object properties: info: type: string description: The additional information for a newly created corpus. example: Example description of user corpus. language_id: type: string description: Language ISO code (`ISO 639-1`). example: en name: type: string description: Unique `corpus name` for a newly created corpus. example: Example corpus tagset_id: type: string description: Name of the tagset used. example: TT_ENG_V3 02_compile_request: type: object properties: structures: type: string description: '`Structures` and `structure attributes` in the corpus which should be compiled. Usually: `all`.' example: all 03_corpus_ids: type: object properties: corpus_ids: type: array items: type: integer description: A list of `Corpus ID` values of multilingual corpora. example: - 842464 - 842463 04_align_req: type: object properties: alignstruct: type: string description: The structure according to which the documents should be aligned. Usually, `/<s>`. example: s auto: type: boolean description: True, when documents are not compiled. Sketch Engine will align them automatically. example: true corpus_ids: type: array items: type: integer description: A list of `Corpus ID` values of the multilingual corpus. The IDs in the example do not exist.
example: - 842464 - 842463 05_empty_request: type: object description: 'In this documentation, an empty request is mostly used with `RPC style` methods, where request content is not needed (in most cases). RPC-style endpoints focus on `performing` a single action (a procedure or command), which is simpler than with REST API-based endpoints, but RPC is not as scalable as the REST API style. RPC is mostly used with HTTP: GET (to fetch information) and POST (for everything else). In the CA API, it is used with the POST HTTP method.' 07_corpus_update: type: object description: All possible parameters that can be changed in a user corpus. In a corpus update `you don't have to use all parameters`, just the parameters you change. properties: expert_mode: type: boolean description: Set to `True` if you are hard-core. example: false name: type: string description: Corpus name. `Given by user`. example: Example corpus 2 info: type: string description: Additional info about corpus. example: Example description of user corpus 2 document_order: description: Can be set to enforce document order within the corpus. type: array items: type: integer lang_filter: type: boolean example: true structures: description: Available structures or tags in the corpus. Structures like `s` (sentence), `g` (glue), `doc` (document). type: array items: type: object properties: name: type: string description: 'Structure name. Example: `s`' attributes: description: A list of used attributes in corpus. type: array items: type: object properties: name: type: string description: The name of used attribute. file_structure: type: string description: The structure in which individual documents should be wrapped. Usually `doc`. example: doc onion_structure: type: string description: The structure for deduplication. Usually `p` (paragraph), `doc` or `Null` (no deduplication). example: doc docstructure: type: string description: Structure in which individual documents should be wrapped. Usually `doc`.
example: doc sketch_grammar_id: type: string description: "`Sketch grammar ID` (name of sketch grammar file). For sketch\ \ grammars querying. Sketch grammar is a series of rules written in the\ \ CQL query language that search for collocations in a text corpus and\ \ categorize them according to\_their grammatical relations. Example:\ \ `preloaded/english-penn_tt-3.3.wsdef.m4`." example: preloaded/english-penn_tt-3.3.wsdef.m4 term_grammar_id: type: string description: '`Term grammar ID` (name of term grammar file). Term grammar tells Sketch Engine which words and phrases should be identified as terms. Example: `/corpora/wsdef/english-penn_tt-terms-3.1.termdef.m4`.' example: preloaded/english-penn_tt-terms-3.1.termdef.m4 09_doc_put_req: type: object properties: filename_display: type: string description: Name of the document. id: type: integer description: Unique numeric `document ID` to identify individual documents. inProgress: type: boolean description: Represents whether the currently edited document is in use. isArchive: type: boolean description: Indicates whether the updated document is an archive such as .zip (created with an archive manager). metadata: type: object description: Metadata of document. For example, additional `attributes and values`. parameters: type: object description: Parameters for plaintext extraction. properties: encoding: type: string description: Encoding standard of the document. Usually, `UTF-8`. justext_stoplist: type: string description: Represents the list of unimportant words, in a specified language, from an NLP point of view. permutation: type: array items: type: integer description: Changes the order of columns (applies only to `type=vert`). tmx_lang: type: string description: TMX (translation memory exchange). Language of document used for parallel corpus creation. tmx_struct: type: string description: Alignment structure to be used for multilingual documents; `align` is the most used structure.
It is used to distinguish which sentence is in which language and to put sentences with the same meaning into one segment. tmx_untranslated: type: string description: Placeholder for empty segments in multilingual documents, i.e. segments that have no counterpart in the second language of the parallel corpus. type: type: string description: File format (.csv, .doc, .docx, .htm, .html etc.). unlegalese: type: boolean description: Convert `all-caps` text to `normal case`. temporary: type: boolean description: Whether the document is temporary. word_count: type: integer description: Total number of `words` (tokens minus punctuation etc.) in document. vertical_progress: type: integer description: Progress of `vertical file` creation. vertical_error: type: string description: The error that occurred while creating the vertical file. If the creation was successful, the value is `Null`. 10_doc_preview: type: object properties: auto_paragraphs: type: string description: Automatically insert paragraph breaks (`\<p>`) in place of blank lines. encoding: type: string description: Encoding standard of the document. Usually `UTF-8`. justext_stoplist: type: string description: Represents the list of unimportant words, in a specified language, from an NLP point of view. permutation: type: array items: type: integer description: Changes the order of columns (applies only to `type=vert`). tmx_lang: type: string description: TMX (translation memory exchange). Language of document used for parallel corpus creation. tmx_struct: type: string description: Alignment structure to be used for multilingual documents; `align` is the most used structure. It is used to distinguish which sentence is in which language and to put sentences with the same meaning into one segment. tmx_untranslated: type: string description: Placeholder for empty segments in multilingual documents, i.e. segments that have no counterpart in the second language of the parallel corpus.
type: type: string description: File format (`.csv`, `.doc`, `.docx`, `.htm`, `.html` etc.). unlegalese: type: boolean description: Convert `all-caps` text to `normal case`. 11_doc_metadata: type: array items: type: object properties: id: type: integer description: Unique numeric `document ID`. metadata: type: object description: Pairs of `attribute_name`:`value`. 13_filesets_creation: type: object properties: bl_max_total_kw: type: integer description: 'Stands for: `blacklist max total keyword`. Means that a web page or document will be discarded if it contains more words from the denylist (blacklist) than this limit.' bl_max_unique_kw: type: integer description: 'Stands for: `blacklist max unique keyword`. Means that a web page or document will be discarded if it contains more unique words from the denylist (blacklist) than this limit.' black_list: type: string description: A list (separated by whitespaces) of `blocked words`, words you don't want to see in your future corpus. input_type: type: string description: 'Input type the web-crawler will work with. Example: `urls`' max_cleaned_file_size: type: integer description: Web pages and documents with a size `over` this limit (`in kB`) after cleaning will be ignored. max_file_size: type: integer description: Web pages and documents with a size `over` this limit (`in kB`) will be ignored. min_cleaned_file_size: type: integer description: Web pages and documents `smaller` than this limit (`in kB`) after cleaning will be ignored. Cleaning involves conversion to plain text, removing boilerplate text (e.g. navigation menus, legal text, disclaimers and other repetitive content). min_file_size: type: integer description: Web pages and documents with a `size below` this limit (`in kB`) will be ignored. name: type: integer description: Texts will be organized into a corpus folder `with this name`. seed_words: description: A list of words according to which the `URLs` were chosen to be searched.
type: array items: type: string white_list: type: string description: A list (separated by whitespaces) of `allowed words`, words you want to see in your future corpus. wl_min_kw_ratio: type: integer description: 'Stands for: `whitelist minimal keywords ratio`. Means that a web page or document will be included only if the `percentage` of allowlist words compared to total words is `higher` than this limit.' wl_min_total_kw: type: integer description: 'Stands for: `whitelist minimal total keywords`. Means that a web page or document will be included only if it contains `more words` from the `allowlist` (whitelist) than this limit.' wl_min_unique_kw: type: integer description: 'Stands for: `whitelist minimal unique keywords`. Means that a web page or document will be included only if it contains `more unique words` from the `allowlist` (whitelist) than this limit.' 19_somefiles_put: type: object properties: corpora: type: object properties: guessed_language_code: type: object properties: language_id: type: string description: Language iso-code. `ISO 639-1`. name: type: string description: Language name in `English`. 03_corpora_list: type: object properties: id: description: Unique numeric `corpus ID` for corpus building. type: integer owner_id: description: Unique numeric `owner ID` (usually you). type: integer owner_name: description: Corpus `owner name` (usually you). type: string corpname: description: Unique `corpus name` for corpus querying. type: string language_id: description: Language iso-code. `ISO 639-1`. type: string language_name: type: string description: Language name in `English`. tagset_id: type: integer description: '`Tagset ID`. A tagset is a list of part-of-speech tags (POS tags) for a specified language. They are `preselected` to the most relevant one and can be changed only in user corpora. `Tagsets` can also be referred to as `templates`.' sketch_grammar_id: type: string description: "`Sketch grammar ID`.
Sketch grammar is a series of rules written\ \ in the CQL query language that search for collocations in a text corpus\ \ and categorize them according to\_their grammatical relations." term_grammar_id: type: string description: '`Term grammar ID`. Term grammar tells Sketch Engine which words and phrases should be identified as terms.' sizes: type: object description: Corpus sizes. `Null` if corpus is not compiled. properties: doccount: type: integer description: Total number of `documents` in corpus. parcount: type: integer description: Total number of `paragraphs` in corpus. sentcount: type: integer description: Total number of `sentences` in corpus. wordcount: type: integer description: Total number of `words` (tokens minus punctuation etc.) in corpus. tokencount: type: integer description: Total number of `tokens` in corpus. created: type: string description: Date and time of corpus creation in format `YYYY-MM-DD HH:MM:SS`. needs_recompiling: type: boolean description: '`True` if corpus documents have been altered since last compilation.' user_can_read: type: boolean description: Corpus can be queried by a `specific user`. Ignore all corpora where this is false. user_can_refer: type: boolean description: Corpus can be used as a `reference corpus` even by anonymous users. user_can_upload: type: boolean description: Corpus is owned by you or shared with you. You can upload documents to it. user_can_manage: description: Corpus is owned by you or shared with you with `full privileges`. type: boolean is_shared: type: boolean description: True if corpus is shared with other users. new_version: type: string description: If set, the old corpus is deprecated in favor of a new one. name: type: string description: Corpus name. `Given by user.` info: type: string description: Additional info about corpus. aligned: description: List of other corpora (corpus ID) within the `same` multi-lingual set (parallel corpus).
type: array items: type: string docstructure: type: string description: Structure in which individual documents should be wrapped. Usually `doc`. 04_corpora_single: type: object properties: id: description: Unique numeric `corpus ID` for corpus building. type: integer owner_id: description: Unique numeric `owner ID` (usually you). type: integer owner_name: description: Corpus `owner name` (usually you). type: string corpname: description: Unique `corpus name` for corpus querying. type: string language_id: description: Language iso-code. `ISO 639-1`. type: string language_name: description: Language name in `English`. type: string sketch_grammar_id: description: "`Sketch grammar ID` (name of sketch grammar file). Sketch\ \ grammar is a series of rules written in the CQL query language that\ \ search for collocations in a text corpus and categorize them according\ \ to\_their grammatical relations. Example: `preloaded/english-penn_tt-3.3.wsdef.m4`." type: string term_grammar_id: description: '`Term grammar ID` (name of term grammar file). Term grammar tells Sketch Engine which words and phrases should be identified as terms. Example: `/corpora/wsdef/english-penn_tt-terms-3.1.termdef.m4`.' type: string sizes: description: Corpus sizes. `Null` if corpus is not compiled. type: object properties: doccount: type: integer description: Total number of `documents` in corpus. parcount: type: integer description: Total number of `paragraphs` in corpus. sentcount: type: integer description: Total number of `sentences` in corpus. wordcount: type: integer description: Total number of `words` (tokens minus punctuation etc.) in corpus. tokencount: type: integer description: Total number of `tokens` in corpus. created: type: string description: Date and time of corpus creation in format `YYYY-MM-DD HH:MM:SS`. needs_recompiling: description: True if corpus documents have been altered since last compilation.
type: boolean user_can_read: type: boolean description: Corpus can be queried by a `specific user`. Ignore all corpora where this is false. user_can_refer: type: boolean description: Corpus can be used as a `reference corpus` even by anonymous users. user_can_upload: type: boolean description: Corpus is owned by you or shared with you, and you can upload documents to it. user_can_manage: description: Corpus is owned by you or shared with you with `full privileges`. type: boolean is_shared: type: boolean description: '`True` if corpus is shared with other users.' new_version: type: string description: If set, the old corpus is deprecated in favor of a new one. name: description: Corpus name. `Given by user`. type: string info: description: Additional info about corpus. type: string aligned: description: Other corpora within the `same` multi-lingual set (parallel corpus). type: array items: type: string docstructure: description: Structure in which individual documents should be wrapped. Usually `doc`. type: string is_error_corpus: description: Current state of corpus. type: boolean attrlist: description: 'Attributes appearing in corpus documents. Attributes like: `word`, `tag`, `lempos`, `pos`, `lemma`, etc.' type: array items: type: string tagset_id: description: Tagset ID. A tagset is a list of part-of-speech tags (POS tags) for a specified language. They are `preselected` to the most relevant one and can be changed only in user corpora. The terms `tagset` and `templates` are interchangeable. type: integer reference_corpus: description: Default reference corpus for `keyword extraction`. type: string progress: description: 'Compilation status: `0` if not compiled, `100` if compiled successfully, `-1` if failed, otherwise in progress.' type: integer error: description: Last compilation error, if any; otherwise `None`. type: string document_count: description: The number of documents the corpus was built from.
type: integer can_be_upgraded: description: '`True` if corpus template is outdated and can be upgraded. The terms `tagset` and `templates` are interchangeable.' type: boolean available_structures: description: All `structures`/`attributes` that appear in corpus documents. type: array items: type: object properties: name: type: string description: Structure name. freq: type: integer description: Frequency of structure. attributes: type: array items: type: string description: List of used attributes. file_structure: description: The structure in which individual documents should be wrapped. Usually `doc`. type: string onion_structure: description: The structure for deduplication. Usually `p` (paragraph) or `Null` (no deduplication). type: string expert_mode: description: Set to `True` if you are hard-core. type: boolean document_order: description: Not mandatory. Can be set to enforce document order within the corpus. type: array items: type: integer use_all_structure: description: Use `all` structures available in corpus. type: boolean structures: description: Available `structures` or `tags` in the corpus. Structures like `s` (sentence), `g` (glue), `doc` (document). type: array items: type: object properties: name: type: string description: Structure name. attributes: description: A list of used attributes in corpus. type: array items: type: object properties: name: type: string description: The name of used attribute. 05_can_be_compiled: type: object properties: result: type: object properties: can_be_compiled: type: boolean description: True if the corpus does not contain any potential error that could break compilation. reason: type: string description: 'Description of the problem that prevents compilation, or Null if there is none. Example: `QUOTA_EXCEEDED` or `EMPTY`.' error: type: string description: Unexpected server error, or Null if there is none.
06_get_progress: type: object properties: result: type: object properties: progress: type: integer description: 'Compilation status: `0` if not compiled, `100` if compiled successfully, `-1` if failed, otherwise in progress.' error: type: string example: "" description: Problem description, or Null if there is none. error: type: string example: "" description: Unexpected server error, or Null if there is none. 12_corpora_single_full: type: object properties: id: description: Unique numeric `corpus ID` for corpus building. type: integer owner_id: description: Unique numeric `owner ID` (usually you). type: integer owner_name: description: Corpus `owner name` (usually you). type: string corpname: description: Unique `corpus name` for corpus querying. type: string language_id: description: Language iso-code. `ISO 639-1`. type: string language_name: description: Language name in `English`. type: string sketch_grammar_id: description: "`Sketch grammar ID` (name of sketch grammar file). Sketch\ \ grammar is a series of rules written in the CQL query language that\ \ search for collocations in a text corpus and categorize them according\ \ to\_their grammatical relations. Example: `preloaded/english-penn_tt-3.3.wsdef.m4`." type: string term_grammar_id: description: '`Term grammar ID` (name of term grammar file). Term grammar tells Sketch Engine which words and phrases should be identified as terms. Example: `/corpora/wsdef/english-penn_tt-terms-3.1.termdef.m4`.' type: string sizes: description: Corpus sizes. `Null` if corpus is not compiled. type: object properties: doccount: type: integer description: Total number of `documents` in corpus. parcount: type: integer description: Total number of `paragraphs` in corpus. sentcount: type: integer description: Total number of `sentences` in corpus. wordcount: type: integer description: Total number of `words` (tokens minus punctuation etc.) in corpus. tokencount: type: integer description: Total number of `tokens` in corpus.
is_sgdev: type: boolean description: TODO is_featured: type: boolean description: TODO access_level: type: boolean description: TODO access_on_demand: type: boolean description: TODO terms_of_use: type: string description: TODO sort_to_end: type: boolean description: TODO tags: type: array items: type: string description: TODO created: type: string description: Date and time of corpus creation in format `YYYY-MM-DD HH:MM:SS`. needs_recompiling: description: True if corpus documents have been altered since last compilation. type: boolean user_can_read: type: boolean description: Corpus can be queried by a `specific user`. Ignore all corpora where this is false. user_can_refer: type: boolean description: Corpus can be used as a `reference corpus` even by anonymous users. user_can_upload: type: boolean description: Corpus is owned by you or shared with you, and you can upload documents to it. user_can_manage: description: Corpus is owned by you or shared with you with `full privileges`. type: boolean is_shared: type: boolean description: '`True` if corpus is shared with other users.' new_version: type: string description: If set, the old corpus is deprecated in favor of a new one. name: description: Corpus name. `Given by user`. type: string info: description: Additional info about corpus. type: string wsdef: description: 'Default word sketch definition. Example: `/corpora/wsdef/serbian-multext-rft1-1.0.wsdef.txt`.' type: string termdef: description: Default term definition. type: string diachronic: description: Whether this corpus develops over time to keep track of changes in vocabulary, grammar and language usage. If so, the time period the corpus covers. type: string aligned: description: Other corpora within the `same` multi-lingual set (parallel corpus). type: array items: type: string docstructure: description: Structure in which individual documents should be wrapped. Usually `doc`. type: string is_error_corpus: description: Current state of corpus.
type: boolean attrlist: description: 'Attributes appearing in corpus documents. Attributes like: `word`, `tag`, `lempos`, `pos`, `lemma`, etc.' type: array items: type: string tagset_id: description: Tagset ID. A tagset is a list of part-of-speech tags (POS tags) for a specified language. They are `preselected` to the most relevant one and can be changed only in user corpora. The terms `tagset` and `templates` are interchangeable. type: integer reference_corpus: description: Default reference corpus for `keyword extraction`. type: string progress: description: 'Compilation status: `0` if not compiled, `100` if compiled successfully, `-1` if failed, otherwise in progress.' type: integer error: description: Last compilation error, if any; otherwise `None`. type: string document_count: description: The number of documents the corpus was built from. type: integer can_be_upgraded: description: '`True` if corpus template is outdated and can be upgraded. The terms `tagset` and `templates` are interchangeable.' type: boolean available_structures: description: All `structures`/`attributes` that appear in corpus documents. type: array items: type: object properties: name: type: string description: Structure name. freq: type: integer description: Frequency of structure. attributes: type: array items: type: string description: List of used attributes. file_structure: description: The structure in which individual documents should be wrapped. Usually `doc`. type: string onion_structure: description: The structure for deduplication. Usually `p` (paragraph) or `Null` (no deduplication). type: string expert_mode: description: Set to `True` if you are hard-core. type: boolean document_order: description: Not mandatory. Can be set to enforce document order within the corpus. type: array items: type: integer use_all_structure: description: Use `all` structures available in corpus. type: boolean structures: description: Available `structures` or `tags` in the corpus.
Structures like `s` (sentence), `g` (glue), `doc` (document). type: array items: type: object properties: name: type: string description: Structure name. attributes: description: A list of used attributes in corpus. type: array items: type: object properties: name: type: string description: The name of used attribute. 13_documents_get: type: array items: type: object properties: id: type: integer description: Unique numeric `document ID` to identify individual documents from which the corpus was created. filename_display: type: string description: The name of the document. parameters: description: Parameters for plaintext extraction. type: object properties: type: type: string description: 'File format. Possible formats: `.csv`, `.doc`, `.docx`, `.htm`, `.html` etc.' encoding: type: string description: Encoding standard of the document. Usually `UTF-8`. tmx_lang: type: string description: TMX (translation memory exchange). Language of document used for parallel corpus creation. tmx_struct: type: string description: Alignment structure to be used for multilingual documents; `align` is the most used structure. It is used to distinguish which sentence is in which language and to put sentences with the same meaning into one segment. unlegalese: type: boolean description: Convert `all-caps` text to `normal case`. permutation: type: array items: type: integer description: Changes the order of columns (applies only to `type=vert`). auto_paragraphs: type: string description: Automatically insert paragraph breaks (\<p>) in place of blank lines. justext_stoplist: type: string description: Represents the list of unimportant words, in a specified language, from an NLP point of view. tmx_untranslated: type: string description: Placeholder for empty segments in multilingual documents, i.e. segments that have no counterpart in the second language of the parallel corpus. temporary: type: boolean description: Whether the document is temporary.
word_count: type: integer description: Total number of `words` (tokens minus punctuation etc.) in document. vertical_progress: type: integer description: Progress of `vertical file` creation. vertical_error: description: The error that occurred while creating the vertical file. If the creation was successful, the value is `Null`. type: string metadata: type: object description: Metadata of document. For example, additional `attributes and values`. 14_documents_post: type: array items: type: object properties: id: type: integer description: Unique numeric `document ID` to identify individual documents from which the corpus was created. filename_display: type: string description: The name of the document. parameters: description: Parameters for plaintext extraction. type: object properties: type: type: string description: 'File format. Possible formats: `.csv`, `.doc`, `.docx`, `.htm`, `.html` etc.' tmx_struct: type: string description: Alignment structure to be used for multilingual documents; `align` is the most used structure. It is used to distinguish which sentence is in which language and to put sentences with the same meaning into one segment. tmx_untranslated: type: string description: Placeholder for empty segments in multilingual documents, i.e. segments that have no counterpart in the second language of the parallel corpus. unlegalese: type: boolean description: Convert `all-caps` text to `normal case`. justext_stoplist: type: string description: Represents the list of unimportant words, in a specified language, from an NLP point of view. tmx_lang: type: string description: TMX (translation memory exchange). Language of document used for parallel corpus creation. permutation: type: array items: type: integer description: Changes the order of columns (applies only to `type=vert`). temporary: type: boolean description: Whether the document is temporary. word_count: type: integer description: Total number of `words` (tokens minus punctuation etc.) in document.
vertical_progress: type: integer description: Progress of `vertical file` creation. vertical_error: description: The error that occurred while creating the vertical file. If the creation was successful, the value is `Null`. type: string metadata: description: Metadata of document. For example, additional `attributes and values`. type: object 15_doc_preview: type: object properties: result: type: string description: Preview of a few lines (1 kB) from the file the corpus was created from. error: type: string description: Unexpected server error, or Null if there is none. 16_rpc_expand_archive: type: object properties: result: type: integer description: Returns fileset ID. error: type: string description: Unexpected server error, or Null if there is none. 57_not_found_404: type: object properties: error: type: string example: PreloadedCorpus/UserCorpus/Document/Tagset/Grammar/GdexConf matching query does not exist. 45_not_found_RPC: type: object properties: result: type: boolean description: Result of a successfully finished request, otherwise Null. error: type: string example: PreloadedCorpus/UserCorpus/Document/Tagset/Grammar/GdexConf/SiteLicence matching query does not exist. 17_bad_request_RPC_1: type: object properties: result: type: boolean description: Result of a successfully finished request, otherwise Null. example: true error: type: string description: 'Examples: `READ_ONLY`, `INVALID_CORPUS_IDS`. You do not have the required permissions, or the IDs you entered are not correct.' 28_bad_request_RPC_8: type: object properties: result: type: boolean description: Result of a successfully finished request, otherwise Null. error: type: string description: 'Examples: `QUOTA_EXCEEDED`, `READ_ONLY`, `INVALID_CORPUS_IDS`, `CORPUS_BUSY`.' 29_bad_request_RPC_9: type: object properties: result: type: boolean description: Result of a successfully finished request, otherwise Null. error: type: string description: 'Examples: `QUOTA_EXCEEDED`, `READ_ONLY`.'
30_bad_request_10: type: object properties: error: type: string description: 'Examples: `QUOTA_EXCEEDED`, `READ_ONLY`, `CORPUS_BUSY`, `DAILY_TAGGING_EXCEEDED`, `INVALID_URL`, `NO_DATA`.' 31_bad_request_11: type: object properties: error: type: string description: 'Examples: `READ_ONLY`, `CORPUS_BUSY`, `INVALID_METADATA`.' 32_bad_request_12: type: object properties: error: type: string description: 'Examples: `READ_ONLY`, `CORPUS_BUSY`.' 33_bad_request_13: type: object properties: error: type: string description: 'Examples: `READ_ONLY`.' 34_bad_request_14: type: object properties: error: type: string description: 'Examples: `READ_ONLY`, `QUOTA_EXCEEDED`, `DAILY_TAGGING_EXCEEDED`, `CORPUS_BUSY`.' 40_bad_request_20: type: object properties: error: type: string description: 'Examples: `READ_ONLY`, `QUOTA_EXCEEDED`, `DAILY_TAGGING_EXCEEDED`, `NO_DATA`.' 27_forbidden_normal: type: object properties: error: type: string description: 'Example: `Permission denied`. You do not have the required permissions for the specified corpus, document, template or other resource. Permissions include read, manage, edit, delete, superuser, etc.' 21_forbidden: type: object properties: result: type: boolean description: Result of a successfully finished request, otherwise Null. error: type: string description: 'Example: `Permission denied`. You do not have the required permissions for the specified corpus, document, template or other resource. Permissions include read, manage, edit, delete, superuser, etc.' 20_unauthorized: type: object properties: error: type: string description: 'Example: `Unauthorized`. You need to authorize first using the API key from Sketch Engine.' 23_unauthorized_rpc: type: object properties: result: type: boolean description: Result of a successfully finished request, otherwise Null. error: type: string description: 'Example: `Unauthorized`. You need to authorize first using the API key from Sketch Engine.'
58_fileset: type: object properties: progress: type: integer description: 'Fileset creation status: `0` if not started, `100` if finished successfully, `-1` if failed, otherwise in progress. Example: downloading content for corpus creation from the Internet.' time_elapsed: type: integer description: Duration of action with filesets (in seconds). error: type: string description: Description of the problem, or Null if there is none. id: type: integer description: Fileset ID. name: type: string description: Fileset name. `Given by user (except the main one with ID = 0).` word_count: type: integer description: Total number of `words` (tokens minus punctuation etc.) in document. web_crawl: type: object properties: input_type: type: string description: '`Source URL` from where the words were downloaded/extracted: website, documents...' seed_words: type: array items: type: string description: A `List of words` according to which the web-crawler will search and gather data from URLs containing them. urls: type: array items: type: string description: A `List of URLs` to be searched by web-crawler. site: type: array items: type: string description: Specific website to be searched by web-crawler. data_downloaded: type: integer description: The amount of data `downloaded` by a web-crawler to create corpus. remaining_files_count: type: integer description: Counter of files found by web-crawler during crawling, `waiting` to be processed. processed_files_count: type: integer description: Counter of `already processed` files. unprocessed_files_count: type: integer description: Counter of files which `cannot` be processed because of `invalid content type`, `size`, `duplication` etc. invalid_content_types_count: type: integer description: Counter of files containing content like `navigation links`, `advertisement`, `headers`, `footers` etc. unable_to_convert_count: type: integer description: Counter for files whose format cannot be converted to one of the supported formats.
duplicate_count: type: integer description: Counter for files with repeating content. time_elapsed: type: integer description: Duration of word gathering with the web-crawler (in seconds). average_file_processing_time: type: integer description: Average time to process a single file (in seconds). 59_filesets_get_progress: type: object properties: result: type: object properties: progress: type: integer description: 'Fileset creation status: `0` if not started, `100` if finished successfully, `-1` if failed, otherwise in progress. Example: downloading content for corpus creation from the Internet.' time_elapsed: type: number format: float description: Duration of the action with filesets (in seconds). error: type: string description: Description of the problem explaining why the action cannot be done. word_count: type: integer description: Number of words (tokens minus punctuation etc.) downloaded by the web-crawler. error: type: string description: Unexpected server error. 61_fileset_creation: type: object properties: progress: type: integer description: 'Fileset creation status: `0` if not started, `100` if finished successfully, `-1` if failed, otherwise in progress. Example: downloading content for corpus creation from the Internet.' time_elapsed: type: integer description: Duration of the action with filesets (in seconds). error: type: string description: Description of the problem. Null if none. id: type: integer description: Fileset ID. name: type: string description: Fileset name. `Given by user (except the main one with ID = 0).` word_count: type: integer description: Total number of `words` (tokens minus punctuation etc.) in the document. web_crawl: type: object properties: input_type: type: string description: '`Source URL` from where the words were downloaded/extracted: website, documents...' seed_words: type: array items: type: string description: A `List of words` according to which the web-crawler will search and gather data from URLs containing them.
urls: type: array items: type: string description: A `List of URLs` to be searched by the web-crawler. site: type: array items: type: string description: Specific website to be searched by the web-crawler. data_downloaded: type: integer description: The amount of data `downloaded` by the web-crawler to create the corpus. remaining_files_count: type: integer description: Counter of files found by the web-crawler during crawling, `waiting` to be processed. processed_files_count: type: integer description: Counter of `already processed` files. unprocessed_files_count: type: integer description: Counter of files which `cannot` be processed because of `invalid content type`, `size`, `duplication` etc. invalid_content_types_count: type: integer description: Counter of files containing content like `navigation links`, `advertisement`, `headers`, `footers` etc. unable_to_retrieve_count: type: integer description: Counter for files that could not be retrieved. invalid_size_count: type: integer description: Counter for files whose size is larger or smaller than the defined limits (max_file_size, min_file_size). invalid_cleaned_size_count: type: integer description: Counter for files whose size is larger or smaller than the defined limits (max_file_size, min_file_size). keywords_filter_applied_count: type: integer description: Number of times the keywords filter was applied. unable_to_convert_count: type: integer description: Counter for files whose format cannot be converted to one of the supported formats. duplicate_count: type: integer description: Counter for files with repeating content. time_elapsed: type: integer description: Duration of word gathering with the web-crawler (in seconds). average_file_processing_time: type: integer description: Average time to process a single file (in seconds). 63_language: type: object properties: id: type: string description: Language ISO code (`ISO 639-1`). name: type: string description: Language name in `English`. autonym: type: string description: Language name in that language.
default_tagset_id: type: string description: '`Tagset ID.` A tagset is a list of part-of-speech tags (POS tags) for the specified language. By default, the most relevant one is preselected. It can be changed for user corpora. The terms `tagset` and `templates` are interchangeable.' reference_corpus: type: string description: Default `reference` corpus. has_term_grammar: type: boolean description: True if `term extraction` is supported. script: type: string description: 'Script used. Example: `Latin`, `Cyrillic`, etc.' 69_rpc_style: type: object properties: result: type: boolean description: Indicates whether the request finished successfully or not. error: type: string description: 'Unexpected server error. Example: `QUOTA_EXCEEDED`. Null if none.' 78_somefiles_post: type: object properties: data: type: object properties: id: type: string description: An alphanumeric `somefile ID`. name: type: string description: Name of the `multilingual file`. file_type: type: string description: 'File type of the multilingual file: `.tmx`, `.xlsx`, etc.' owner_id: type: integer description: Unique numeric `owner ID` (usually you). temporary: type: boolean description: Whether the document is temporary or not. encoding: type: string description: Encoding standard of the document. Usually `UTF-8`. guessed_languages: type: object description: 'An object of automatically guessed languages of the files inserted during multilingual corpus creation. Maximum: `2`, because Sketch Engine currently supports multilingual corpora with only 2 languages.' properties: language_1: type: string language_2: type: string 79_template: type: object properties: id: type: string description: Alphanumeric `template/tagset ID`. The terms `tagset` and `templates` are interchangeable. name: type: string description: Name of the `template/tagset file`. owner_id: type: integer description: Unique numeric `owner ID` (usually you). `Null` if the tagset/template is preloaded. owner_name: type: string description: Tagset/template `owner name` (usually you).
`Null` if the tagset/template is preloaded. has_pipeline: type: boolean description: Vertical creation is supported. False for legacy templates. has_tags: type: boolean description: Morphological tagging is supported. has_lemmas: type: boolean description: Lemmatization is supported. static_attributes: type: array description: A list of attributes which can appear in the corpus. items: type: string structures: type: array description: A list of used structures, for example `<s>`, `<g>`. items: type: string tagsetdoc: type: string description: '`URL` leading to the template/tagset documentation.' content: type: string description: Content of the tagset. default_sketchgrammar_id: type: string description: Despite the name, not an ID but the filename of the preselected sketch grammar (`.m4` format). default_termgrammar_id: type: string description: Despite the name, not an ID but the filename of the preselected term grammar (`.m4` format). sharing: type: object properties: users: type: array items: type: object properties: id: type: integer description: The ID of the user you share the template with. name: type: string description: The name of the user you share the template with. email: type: string description: The email of the user you share the template with. user_group: type: array items: type: object properties: id: type: integer description: The ID of the group you share the template with. name: type: string description: The name of the group you share the template with. 80_template_put: type: object properties: id: type: string description: Alphanumeric `template/tagset ID`. The terms `tagset` and `templates` are interchangeable. name: type: string description: Name of the `template/tagset file`. owner_id: type: integer description: Unique numeric `owner ID` (usually you). `Null` if the tagset/template is preloaded. owner_name: type: string description: Tagset/template `owner name` (usually you). `Null` if the tagset/template is preloaded. has_pipeline: type: boolean description: Vertical creation is supported.
False for legacy templates. has_tags: type: boolean description: Morphological tagging is supported. has_lemmas: type: boolean description: Lemmatization is supported. static_attributes: type: array description: A list of attributes which can appear in the corpus. items: type: string structures: type: array description: A list of used structures, for example `<s>`, `<g>`. items: type: string tagsetdoc: type: string description: '`URL` leading to the template/tagset documentation.' content: type: string description: Content of the tagset. default_sketchgrammar_id: type: string description: Despite the name, not an ID but the filename of the preselected sketch grammar (`.m4` format). default_termgrammar_id: type: string description: Despite the name, not an ID but the filename of the preselected term grammar (`.m4` format). sharing: type: object properties: users: type: array items: type: object properties: id: type: integer description: The ID of the user you share the template with. user_group: type: array items: type: object properties: id: type: integer description: The ID of the group you share the template with. 81_get_used_space: type: object properties: result: type: object properties: space_used: type: integer description: The number of words currently used by the user's corpora. space_total: type: integer example: 1000000 description: Maximum number of words allowed. The default is set to `1 000 000` words. It can be changed. error: type: string description: Unexpected server error. Null if none. securitySchemes: basicAuth: type: http scheme: basic security: - basicAuth: []