openapi: 3.0.3
info:
title: Sketch Engine - API documentation
description: "An **application programming interface** (API) is a set of rules and\
\ protocols that allow different software applications to communicate with each\
\ other.\n \n\nIn the context of Sketch Engine, the API provides\
\ a standardized way for developers to access and use Sketch Engine's language\
\ data and text analysis \n tools in their own software applications.\
\ It is useful for anyone who needs to analyze text data, for example by\
\ searching for collocations, \n generating word lists\
\ and keywords, or building text corpora (databases of written language).\
\ With the API, developers can integrate these features \n \
\ into their own applications and create custom text analysis tools.\n\n \
\ \n\nThis **API documentation** outlines the Sketch Engine endpoints\
\ used mainly for working with corpora, including their creation, compilation\
\ and various \n functions such as word sketches, concordances,\
\ etc. The documentation describes the **requests** and responses of API calls,\
\ with most responses provided in \n either **JSON** or **plain\
\ text** format.\n\n \n\nYou can try every endpoint by **authenticating**\
\ with your **API key**, clicking **Try it out** on the endpoint you want to use,\
\ filling in the requested parameters, \n and executing the\
\ query.\n\n \n\nIt is **recommended** to use your **Sketch\
\ Engine API key** for **authentication** when calling the endpoints; otherwise,\
\ the calls may fail. \n The **key** can be retrieved\
\ from the Sketch Engine dashboard by following these steps: select **More options**\
\ (upper right corner), then click on **My Account**.\n\n \n\
\n**Last update:** `2nd April 2024`"
version: 2.0.0
termsOfService: https://www.sketchengine.eu/terms-of-use/
contact:
name: Support
url: https://www.sketchengine.eu/contact-us/
externalDocs:
description: former API documentation
url: https://www.sketchengine.eu/documentation/api-documentation/
servers:
- url: ://
- url: http://
tags:
- name: Corpus Search
description: A variety of tools to search and analyse words or texts in the corpus
and generate their statistics.
- name: Corpora
description: Retrieves information about a corpus. Also creates, compiles and deletes
a corpus.
- name: Documents
description: Uploads new documents and deletes documents from a corpus. Adds and
edits document metadata.
- name: Filesets
description: Creates or deletes folders with documents (filesets) or shows their
content.
- name: Languages
description: Retrieves the list of languages.
- name: Somefiles
description: Uploads or updates aligned multilingual files to build a parallel corpus.
- name: Templates
description: Corpus template management.
- name: Users
description: Retrieves information about user accounts.
paths:
/search/corp_info:
get:
operationId: getCorpInfo
parameters:
- $ref: '#/components/parameters/001_corpname'
- $ref: '#/components/parameters/002_usesubcorp'
- $ref: '#/components/parameters/006_subcorpora'
- $ref: '#/components/parameters/003_gramrels'
- $ref: '#/components/parameters/004_corpcheck'
- $ref: '#/components/parameters/005_registry'
- $ref: '#/components/parameters/007_struct_attr_stats'
- $ref: '#/components/parameters/008_format'
tags:
- Corpus Search
summary: Statistics and information about the whole corpus.
description: "-"
responses:
'200':
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/01_corp_info'
/search/wordlist:
get:
operationId: getWordList
parameters:
- $ref: '#/components/parameters/001_corpname'
- $ref: '#/components/parameters/010_wlattr'
- $ref: '#/components/parameters/002_usesubcorp'
- $ref: '#/components/parameters/011_wlnums'
- $ref: '#/components/parameters/072_wlmaxfreq'
- $ref: '#/components/parameters/012_wlminfreq'
- $ref: '#/components/parameters/014_wlpat'
- $ref: '#/components/parameters/015_wlsort'
- $ref: '#/components/parameters/019_wlblacklist'
- $ref: '#/components/parameters/073_include_nonwords'
- $ref: '#/components/parameters/091_relfreq'
- $ref: '#/components/parameters/092_reldocf'
- $ref: '#/components/parameters/018_wlfile'
- $ref: '#/components/parameters/071_wlicase'
- $ref: '#/components/parameters/013_wlmaxitems'
- $ref: '#/components/parameters/093_wlpage'
- $ref: '#/components/parameters/008_format'
- $ref: '#/components/parameters/074_random'
- $ref: '#/components/parameters/089_wltype'
- $ref: '#/components/parameters/063_ngrams_n'
- $ref: '#/components/parameters/087_ngrams_max_n'
- $ref: '#/components/parameters/112_nest_ngrams'
- $ref: '#/components/parameters/057_simple_n'
- $ref: '#/components/parameters/088_usengrams'
tags:
- Corpus Search
summary: A list of word frequencies from the specified corpus.
description: This method can be used for generating frequency lists of all tokens,
lemmas, word forms etc. or for retrieving frequencies of specific items. Regular
expressions can be used for more detailed criteria.
responses:
'200':
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/02_wordlist'
/search/struct_wordlist:
get:
operationId: getStructWordList
parameters:
- $ref: '#/components/parameters/001_corpname'
- $ref: '#/components/parameters/010_wlattr'
- $ref: '#/components/parameters/090_wlstruct_attr1'
- $ref: '#/components/parameters/102_wlstruct_attr2'
- $ref: '#/components/parameters/103_wlstruct_attr3'
- $ref: '#/components/parameters/011_wlnums'
- $ref: '#/components/parameters/072_wlmaxfreq'
- $ref: '#/components/parameters/012_wlminfreq'
- $ref: '#/components/parameters/013_wlmaxitems'
- $ref: '#/components/parameters/014_wlpat'
- $ref: '#/components/parameters/015_wlsort'
- $ref: '#/components/parameters/019_wlblacklist'
- $ref: '#/components/parameters/073_include_nonwords'
- $ref: '#/components/parameters/091_relfreq'
- $ref: '#/components/parameters/092_reldocf'
- $ref: '#/components/parameters/071_wlicase'
- $ref: '#/components/parameters/093_wlpage'
- $ref: '#/components/parameters/008_format'
- $ref: '#/components/parameters/074_random'
- $ref: '#/components/parameters/089_wltype'
tags:
- Corpus Search
summary: Provides a list of frequencies in the specified corpus. Offers more
flexibility.
description: Unlike the wordlist endpoint, this endpoint lets you
customize how the results are displayed.
responses:
'200':
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/12_struct_wordlist'
/search/concordance:
get:
operationId: getConcordance
parameters:
- $ref: '#/components/parameters/001_corpname'
- $ref: '#/components/parameters/041_q'
- $ref: '#/components/parameters/127_concordance_query_queryselector'
- $ref: '#/components/parameters/128_concordance_query_iquery'
- $ref: '#/components/parameters/129_concordance_query_cql'
- $ref: '#/components/parameters/130_concordance_query_lemma'
- $ref: '#/components/parameters/131_concordance_query_char'
- $ref: '#/components/parameters/132_concordance_query_word'
- $ref: '#/components/parameters/133_concordance_query_phrase'
- $ref: '#/components/parameters/002_usesubcorp'
- $ref: '#/components/parameters/025_lpos'
- $ref: '#/components/parameters/077_default_attr'
- $ref: '#/components/parameters/058_attrs'
- $ref: '#/components/parameters/078_refs'
- $ref: '#/components/parameters/079_attr_allpos'
- $ref: '#/components/parameters/080_viewmode'
- $ref: '#/components/parameters/081_cup_hl'
- $ref: '#/components/parameters/082_structs'
- $ref: '#/components/parameters/083_fromp'
- $ref: '#/components/parameters/084_pagesize'
- $ref: '#/components/parameters/085_kwicleftctx'
- $ref: '#/components/parameters/086_kwicrightctx'
- $ref: '#/components/parameters/134_errcorr_switch'
- $ref: '#/components/parameters/135_cup_err_code'
- $ref: '#/components/parameters/136_cup_err'
- $ref: '#/components/parameters/137_cup_corr'
- $ref: '#/components/parameters/040_json'
- $ref: '#/components/parameters/039_asyn'
- $ref: '#/components/parameters/008_format'
tags:
- Corpus Search
summary: Concordance - shows the search word or phrase in context.
description: "The concordance allows complex criteria for searching the corpus.\
\ The queries can combine any data, metadata and annotations found in the\
\ corpus.\n\n `For a basic concordance, it is enough to use just the corpname and\
\ q parameters.`"
responses:
'200':
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/06_concordance'
/search/fullref:
get:
operationId: getFullRef
parameters:
- $ref: '#/components/parameters/001_corpname'
- $ref: '#/components/parameters/111_pos'
summary: Returns all metadata of one concordance line.
description: Displays all available text types (metadata) related to the specific
KWIC (hit) defined by its position in the corpus.
tags:
- Corpus Search
responses:
'200':
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/14_fullref'
/search/widectx:
get:
operationId: getWideCtx
parameters:
- $ref: '#/components/parameters/001_corpname'
- $ref: '#/components/parameters/111_pos'
- $ref: '#/components/parameters/138_hitlen'
- $ref: '#/components/parameters/082_structs'
- $ref: '#/components/parameters/139_detail_left_ctx'
- $ref: '#/components/parameters/140_detail_right_ctx'
summary: Returns the extended context of the KWIC in a specific concordance line.
description: This is the equivalent of clicking the KWIC in a concordance line,
which displays a popup with an extended context.
tags:
- Corpus Search
responses:
'200':
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/16_widectx'
/search/freqml:
get:
operationId: getFreqMl
tags:
- Corpus Search
summary: "Calculates frequencies of words, lemmas\u2026 in the concordance."
description: The frequency of any [positional attribute](https://www.sketchengine.eu/my_keywords/positional-attribute/)
such as word forms, lemmas or tags can be counted with this method. Structure
attributes (metadata/text types) can also be counted.
parameters:
- $ref: '#/components/parameters/001_corpname'
- $ref: '#/components/parameters/100_ml1attr'
- $ref: '#/components/parameters/101_ml1ctx'
- $ref: '#/components/parameters/141_ml2attr'
- $ref: '#/components/parameters/142_ml2ctx'
- $ref: '#/components/parameters/143_ml3attr'
- $ref: '#/components/parameters/144_ml3ctx'
- $ref: '#/components/parameters/145_ml4attr'
- $ref: '#/components/parameters/146_ml4ctx'
- $ref: '#/components/parameters/147_ml5attr'
- $ref: '#/components/parameters/148_ml5ctx'
- $ref: '#/components/parameters/149_ml6attr'
- $ref: '#/components/parameters/150_ml6ctx'
- $ref: '#/components/parameters/041_q'
- $ref: '#/components/parameters/002_usesubcorp'
- $ref: '#/components/parameters/044_fmaxitems'
- $ref: '#/components/parameters/094_fpage'
- $ref: '#/components/parameters/095_group'
- $ref: '#/components/parameters/096_showpoc'
- $ref: '#/components/parameters/097_showreltt'
- $ref: '#/components/parameters/098_showrel'
- $ref: '#/components/parameters/099_freqlevel'
- $ref: '#/components/parameters/040_json'
- $ref: '#/components/parameters/045_freq_sort'
- $ref: '#/components/parameters/008_format'
responses:
'200':
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/11_freqml'
/search/freq_distrib:
get:
operationId: getFregDistrib
parameters:
- $ref: '#/components/parameters/001_corpname'
- $ref: '#/components/parameters/118_res'
- $ref: '#/components/parameters/025_lpos'
- $ref: '#/components/parameters/077_default_attr'
- $ref: '#/components/parameters/058_attrs'
- $ref: '#/components/parameters/082_structs'
- $ref: '#/components/parameters/078_refs'
- $ref: '#/components/parameters/079_attr_allpos'
- $ref: '#/components/parameters/080_viewmode'
- $ref: '#/components/parameters/120_fc_lemword_window_type'
- $ref: '#/components/parameters/121_fc_lemword_wsize'
- $ref: '#/components/parameters/122_fc_lemword_type'
- $ref: '#/components/parameters/125_fc_pos_window_type'
- $ref: '#/components/parameters/123_fc_pos_wsize'
- $ref: '#/components/parameters/124_fc_pos_type'
- $ref: '#/components/parameters/040_json'
- $ref: '#/components/parameters/119_normalize'
- $ref: '#/components/parameters/008_format'
summary: Provides the distribution of hits in the corpus.
description: "-"
tags:
- Corpus Search
responses:
'200':
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/13_freq_distrib'
/search/freqdist:
get:
operationId: getFreqDist
description: "-"
parameters:
- $ref: '#/components/parameters/166_corpname_freqdist'
- $ref: '#/components/parameters/010_wlattr'
- $ref: '#/components/parameters/162_diaattr'
- $ref: '#/components/parameters/163_sse'
- $ref: '#/components/parameters/164_threshold'
- $ref: '#/components/parameters/161_ctx'
- $ref: '#/components/parameters/167_wordlist'
- $ref: '#/components/parameters/165_json_freqdist'
tags:
- Corpus Search
summary: Utility for web interface only. Provides relative frequency data for
wordlist graphs within a specific time period, only for trend corpora.
responses:
'200':
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/19_freqdist'
/search/collx:
get:
operationId: getCollx
description: "-"
summary: Computes collocation candidates from a concordance.
tags:
- Corpus Search
parameters:
- $ref: '#/components/parameters/001_corpname'
- $ref: '#/components/parameters/041_q'
- $ref: '#/components/parameters/002_usesubcorp'
- $ref: '#/components/parameters/046_cattr'
- $ref: '#/components/parameters/053_csortfn'
- $ref: '#/components/parameters/052_cbgrfns'
- $ref: '#/components/parameters/047_cfromw'
- $ref: '#/components/parameters/048_ctow'
- $ref: '#/components/parameters/049_cminfreq'
- $ref: '#/components/parameters/050_cminbgr'
- $ref: '#/components/parameters/051_cmaxitems'
- $ref: '#/components/parameters/154_json_collx'
responses:
'200':
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/10_collx'
/search/subcorp:
get:
operationId: getSubCorp
parameters:
- $ref: '#/components/parameters/001_corpname'
- $ref: '#/components/parameters/054_subcname'
- $ref: '#/components/parameters/155_create'
- $ref: '#/components/parameters/055_delete'
- $ref: '#/components/parameters/156_q_subcorp'
- $ref: '#/components/parameters/157_struct'
- $ref: '#/components/parameters/160_json_subcorp'
- $ref: '#/components/parameters/008_format'
tags:
- Corpus Search
summary: Returns a list of subcorpora in the corpus, or creates/deletes a subcorpus.
description: There are two ways to create a subcorpus in Sketch Engine, either from
`text types` => json parameter (the corpus must be annotated for text types) or
from `concordances` => q + struct parameters.
responses:
'200':
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/07_subcorp'
/search/subcorpus_rename:
get:
operationId: subcorpusRename
description: "-"
parameters:
- $ref: '#/components/parameters/001_corpname'
- $ref: '#/components/parameters/158_subcorp_id'
- $ref: '#/components/parameters/159_new_subcorp_name'
tags:
- Corpus Search
summary: Renames a subcorpus.
responses:
'200':
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/17_subcorpus_rename'
/search/subcorp_info:
get:
operationId: subcorpusInfo
description: "-"
parameters:
- $ref: '#/components/parameters/001_corpname'
- $ref: '#/components/parameters/054_subcname'
tags:
- Corpus Search
summary: Statistics about the subcorpus.
responses:
'200':
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/18_subcorp_info'
/search/extract_keywords:
get:
operationId: getExtractKeywords
parameters:
- $ref: '#/components/parameters/001_corpname'
- $ref: '#/components/parameters/056_ref_corpname'
- $ref: '#/components/parameters/002_usesubcorp'
- $ref: '#/components/parameters/057_simple_n'
- $ref: '#/components/parameters/018_wlfile'
- $ref: '#/components/parameters/019_wlblacklist'
- $ref: '#/components/parameters/115_attr'
- $ref: '#/components/parameters/060_alnum'
- $ref: '#/components/parameters/061_onealpha'
- $ref: '#/components/parameters/151_minfreq_extract_keywords'
- $ref: '#/components/parameters/152_maxfreq_extract_keywords'
- $ref: '#/components/parameters/062_max_keywords'
- $ref: '#/components/parameters/073_include_nonwords'
- $ref: '#/components/parameters/104_icase'
- $ref: '#/components/parameters/014_wlpat'
- $ref: '#/components/parameters/153_addfreqs'
- $ref: '#/components/parameters/092_reldocf'
- $ref: '#/components/parameters/088_usengrams'
- $ref: '#/components/parameters/063_ngrams_n'
- $ref: '#/components/parameters/087_ngrams_max_n'
- $ref: '#/components/parameters/008_format'
tags:
- Corpus Search
summary: Identifies keywords, key n-grams, key collocations and terms.
description: Keywords, key n-grams, key collocations and terms are identified
by comparing the focus corpus (or a subcorpus) to a reference corpus (or a
subcorpus). It is the equivalent of using the Keywords and terms tool or using
the key option in n-grams or the word sketch.
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/08_extract_keywords'
/search/textypes_with_norms:
get:
operationId: getTextTypesWithNorms
description: "-"
parameters:
- $ref: '#/components/parameters/001_corpname'
summary: Returns a list of text types with values.
tags:
- Corpus Search
responses:
'200':
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/15_textypes_with_norms'
/search/attr_vals:
get:
operationId: getAttrVals
parameters:
- $ref: '#/components/parameters/001_corpname'
- $ref: '#/components/parameters/020_avattr'
- $ref: '#/components/parameters/021_avpat'
- $ref: '#/components/parameters/023_avfrom'
- $ref: '#/components/parameters/022_avmaxitems'
- $ref: '#/components/parameters/104_icase'
- $ref: '#/components/parameters/008_format'
tags:
- Corpus Search
summary: Utility for web interface only. A list of values for a given structure
attribute (avattr).
description: Not to be used outside the web interface. Replaced by the more powerful
wordlist endpoint.
responses:
'200':
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/09_attr_vals'
/ca/api/corpora:
get:
operationId: getCorpora
description: "-"
tags:
- Corpora
summary: Returns a list of all corpora accessible to you.
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
type: object
properties:
data:
type: array
items:
$ref: '#/components/schemas/03_corpora_list'
post:
operationId: createCorpus
tags:
- Corpora
summary: Creates a new user corpus.
description: Creates a new empty corpus. Use **Documents** endpoints (and optionally
**Filesets**) to add data to the corpus.
requestBody:
description: "Set the language, corpus name and corpus description. \n\n -\
\ `info` => The additional information for a newly created corpus. (string)\
\ \n\n - `language_id` => Language iso-code. `ISO 639-1`. (string) \n\n\
\ - `name` => Unique `corpus name` for a newly created corpus. (string)"
content:
application/json:
schema:
$ref: '#/components/schemas/01_corpora_request'
required: true
responses:
"201":
description: '`Created`'
content:
application/json:
schema:
type: object
properties:
data:
$ref: '#/components/schemas/04_corpora_single'
/ca/api/corpora/{corpusId}:
parameters:
- $ref: '#/components/parameters/01_corpus_id'
get:
operationId: getCorpus
description: "-"
tags:
- Corpora
summary: Retrieves a user corpus.
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/04_corpora_single'
"403":
description: '`Forbidden`'
content:
application/json:
schema:
$ref: '#/components/schemas/27_forbidden_normal'
"404":
description: '`Not Found`'
content:
application/json:
schema:
$ref: '#/components/schemas/57_not_found_404'
put:
operationId: updateCorpus
description: "-"
tags:
- Corpora
summary: Updates a user corpus.
requestBody:
description: " - `expert_mode` => Set to **True** if you are hard-core. (boolean)\
\ \n\n - `name` => Corpus name. **Given by user**. (string) \n\n - `info`\
\ => Additional info about corpus. (string) \n\n - `document_order` =>\
\ Can be set to enforce document order within the corpus. (list of integers)\
\ \n\n - `structures` => Available structures or tags in the corpus. Structures\
\ like **s** (sentence), **g** (glue), **doc** (document). (list) \n\n \
\ - `name` => Structure name. Example: **s**. (string) \n\n - `attributes`\
\ => A list of used attributes in corpus. (list) \n\n - `name` => The\
\ name of used attribute. (string) \n\n - `file_structure` => The structure\
\ in which individual documents should be wrapped. Usually **doc**. (string)\
\ \n\n - `onion_structure` => The structure for deduplication. Usually\
\ **p** (paragraph) or **Null** (no deduplication). (string) \n\n - `docstructure`\
\ => Structure in which individual documents should be wrapped. Usually\
\ **doc**. (string) \n\n - `sketch_grammar_id` => Name of sketch grammar\
\ file. For sketch grammars querying. Sketch grammar is a series of rules\
\ written in the CQL query language that search for collocations in a text\
\ corpus and categorize them according to\_their grammatical relations.\
\ Example: **preloaded/english-penn_tt-3.3.wsdef.m4**. (string) \n\n -\
\ `term_grammar_ir` => Name of term grammar file. Term grammar tells Sketch\
\ Engine which words and phrases it should identify as terms. Example: **/corpora/wsdef/english-penn_tt-terms-3.1.termdef.m4**.\
\ (string)"
content:
application/json:
schema:
$ref: '#/components/schemas/07_corpus_update'
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
type: object
properties:
data:
$ref: '#/components/schemas/04_corpora_single'
"403":
description: '`Forbidden`'
content:
application/json:
schema:
$ref: '#/components/schemas/27_forbidden_normal'
"404":
description: '`Not Found`'
content:
application/json:
schema:
$ref: '#/components/schemas/57_not_found_404'
delete:
operationId: deleteCorpus
description: "-"
tags:
- Corpora
summary: Deletes a user corpus.
responses:
"204":
description: '`No Content`'
content: {}
"403":
description: '`Forbidden`'
content:
application/json:
schema:
$ref: '#/components/schemas/27_forbidden_normal'
"404":
description: '`Not Found`'
content:
application/json:
schema:
$ref: '#/components/schemas/57_not_found_404'
/ca/api/corpora/{corpusId}/can_be_compiled:
parameters:
- $ref: '#/components/parameters/01_corpus_id'
post:
operationId: checkCompilable
description: "-"
tags:
- Corpora
summary: Checks if the corpus fulfills all conditions to be compiled. (RPC)
requestBody:
description: ' In this documentation, an empty request is mostly used
with **RPC style** methods, where the request body is not needed
(in most cases). RPC style endpoints focus on performing **one action**
(a procedure or command) and are simpler to call than **REST API**-based endpoints, but they
are not as scalable as the REST API style. RPC is mostly used with HTTP GET
(to fetch information) and POST (for everything else); in the CA API it is used
with the POST HTTP method. '
content:
application/json:
schema:
$ref: '#/components/schemas/05_empty_request'
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/05_can_be_compiled'
"401":
description: '`Unauthorized`'
content:
application/json:
schema:
$ref: '#/components/schemas/20_unauthorized'
"403":
description: '`Forbidden`'
content:
application/json:
schema:
$ref: '#/components/schemas/21_forbidden'
"404":
description: '`Not Found`'
content:
application/json:
schema:
$ref: '#/components/schemas/45_not_found_RPC'
/ca/api/corpora/{corpusId}/get_progress:
parameters:
- $ref: '#/components/parameters/01_corpus_id'
post:
operationId: getCompilationProgress
description: "-"
tags:
- Corpora
summary: Retrieves the current progress of the corpus compilation. (RPC)
requestBody:
description: ' In this documentation, an empty request is mostly used
with **RPC style** methods, where the request body is not needed
(in most cases). RPC style endpoints focus on performing **one action**
(a procedure or command) and are simpler to call than **REST API**-based endpoints, but they
are not as scalable as the REST API style. RPC is mostly used with HTTP GET
(to fetch information) and POST (for everything else); in the CA API it is used
with the POST HTTP method. '
content:
application/json:
schema:
$ref: '#/components/schemas/05_empty_request'
required: true
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/06_get_progress'
"403":
description: '`Forbidden`'
content:
application/json:
schema:
$ref: '#/components/schemas/21_forbidden'
"404":
description: '`Not Found`'
content:
application/json:
schema:
$ref: '#/components/schemas/45_not_found_RPC'
/ca/api/corpora/{corpusId}/compile:
parameters:
- $ref: '#/components/parameters/01_corpus_id'
post:
operationId: compileCorpus
description: "-"
tags:
- Corpora
summary: Performs the corpus compilation. (RPC)
requestBody:
description: ' `Structures` or `structure attributes` in the corpus that should
be compiled. Usually `all`. (string) '
content:
application/json:
schema:
$ref: '#/components/schemas/02_compile_request'
required: true
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/69_rpc_style'
"400":
description: '`Bad Request`'
content:
application/json:
schema:
$ref: '#/components/schemas/29_bad_request_RPC_9'
"401":
description: '`Unauthorized`'
content:
application/json:
schema:
$ref: '#/components/schemas/23_unauthorized_rpc'
"403":
description: '`Forbidden`'
content:
application/json:
schema:
$ref: '#/components/schemas/21_forbidden'
"404":
description: '`Not Found`'
content:
application/json:
schema:
$ref: '#/components/schemas/45_not_found_RPC'
/ca/api/corpora/{corpusId}/logs/{logName}:
parameters:
- $ref: '#/components/parameters/01_corpus_id'
- $ref: '#/components/parameters/05_logname'
get:
operationId: getCompilationLog
tags:
- Corpora
summary: Shows the compilation log file '.log' of the corpus.
description: If **logName** is **last.log**, the latest version of the
log file is shown.
responses:
"200":
description: '`OK`'
content:
text/plain; charset=utf-8:
schema:
description: The log file of the corpus.
type: string
"401":
description: '`Unauthorized`'
"403":
description: '`Forbidden` (you need `read` permission)'
"404":
description: '`Not Found`'
"405":
description: '`Method Not Allowed`'
/ca/api/corpora/{corpusId}/download:
parameters:
- $ref: '#/components/parameters/01_corpus_id'
- $ref: '#/components/parameters/09_format'
- $ref: '#/components/parameters/10_file_structure'
- $ref: '#/components/parameters/11_aligned'
get:
operationId: getCorpusSource
tags:
- Corpora
summary: Downloads the documents from which the corpus was created (the source
files).
description: 'Example call: **https://app.sketchengine.eu/ca/api/corpora/{corpusId}/download?format=vert&file_structure=doc**.'
responses:
"200":
description: '`OK`'
content:
text/plain:
schema:
type: string
"400":
description: '`Bad Request` Examples: `ALIGNED_NOT_FOUND`, `ALIGNED_FORBIDDEN`,
`INVALID_FORMAT`.'
"401":
description: '`Unauthorized`'
"403":
description: '`Forbidden`'
"404":
description: '`Not Found`'
/ca/api/corpora/{corpusId}/cancel_job:
parameters:
- $ref: '#/components/parameters/01_corpus_id'
post:
operationId: cancelJob
description: "-"
tags:
- Corpora
summary: Cancels running tasks (e.g. compilation) related to the corpus. (RPC)
requestBody:
description: ' In this documentation, an empty request is mostly used
with **RPC style** methods, where the request body is not needed
(in most cases). RPC style endpoints focus on performing **one action**
(a procedure or command) and are simpler to call than **REST API**-based endpoints, but they
are not as scalable as the REST API style. RPC is mostly used with HTTP GET
(to fetch information) and POST (for everything else); in the CA API it is used
with the POST HTTP method. '
content:
application/json:
schema:
$ref: '#/components/schemas/05_empty_request'
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/69_rpc_style'
"401":
description: '`Unauthorized`'
content:
application/json:
schema:
$ref: '#/components/schemas/23_unauthorized_rpc'
"403":
description: '`Forbidden`'
content:
application/json:
schema:
$ref: '#/components/schemas/21_forbidden'
/ca/api/corpora/compile_aligned:
post:
operationId: compileAlignedCorpus
description: "-"
tags:
- Corpora
summary: Compiles a parallel corpus (consisting of two or more aligned corpora).
(RPC)
requestBody:
description: "List of corpus IDs used in aligned compilation. \n\n - `corpus_ids`\
\ => A list of **corpus IDs** of the multilingual corpora. (integer) \n\n -\
\ `structures` => Specifies whether **all** structures should be used during\
\ compilation (in that case it should contain just **all**) or only some\
\ of them. (string) "
content:
application/json:
schema:
$ref: '#/components/schemas/03_corpus_ids'
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/69_rpc_style'
"400":
description: '`Bad Request`'
content:
application/json:
schema:
$ref: '#/components/schemas/28_bad_request_RPC_8'
"403":
description: '`Forbidden`'
content:
application/json:
schema:
$ref: '#/components/schemas/21_forbidden'
/ca/api/corpora/align:
post:
operationId: segmentAlign
tags:
- Corpora
summary: Creates segments representing the same line in two languages in a parallel
corpus. (RPC)
description: Run this if the documents inserted into the corpus are not aligned.
requestBody:
description: "- `alignstruct` => According to which structure the document\
\ should be aligned. Usually, **\\**. (string) \n\n - `auto` => **True**,\
\ when documents are not compiled. Sketch Engine will align them automatically.\
\ (boolean) \n\n - `corpus_ids` => A list of **corpus IDs** of the multilingual\
\ corpora. (integer) "
content:
application/json:
schema:
$ref: '#/components/schemas/04_align_req'
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/69_rpc_style'
"400":
description: '`Bad Request`'
content:
application/json:
schema:
$ref: '#/components/schemas/17_bad_request_RPC_1'
"401":
description: '`Unauthorized`'
content:
application/json:
schema:
$ref: '#/components/schemas/23_unauthorized_rpc'
"403":
description: '`Forbidden`'
content:
application/json:
schema:
$ref: '#/components/schemas/21_forbidden'
/ca/api/corpora/{corpusId}/documents:
parameters:
- $ref: '#/components/parameters/01_corpus_id'
- $ref: '#/components/parameters/07_fileset_id_query'
get:
operationId: getAllDocuments
description: "-"
tags:
- Documents
summary: Retrieves a list of all documents in the corpus.
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/13_documents_get'
"401":
description: '`Unauthorized`'
content:
application/json:
schema:
$ref: '#/components/schemas/20_unauthorized'
"403":
description: '`Forbidden` (you need `view` permission).'
content:
application/json:
schema:
$ref: '#/components/schemas/27_forbidden_normal'
"404":
description: '`Not Found` (HTML response).'
post:
operationId: createNewDocument
description: "-"
tags:
- Documents
summary: Uploads a new document.
parameters:
- $ref: '#/components/parameters/01_corpus_id'
- $ref: '#/components/parameters/07_fileset_id_query'
- $ref: '#/components/parameters/18_wait_with_tagging'
requestBody:
description: File to upload.
content:
multipart/form-data:
schema:
properties:
file:
type: string
description: File to upload.
format: binary
responses:
"201":
description: '`Created`'
content:
application/json:
schema:
$ref: '#/components/schemas/14_documents_post'
"400":
description: '`Bad Request`'
content:
application/json:
schema:
$ref: '#/components/schemas/30_bad_request_10'
"401":
description: '`Unauthorized`'
content:
application/json:
schema:
$ref: '#/components/schemas/20_unauthorized'
"403":
description: '`Forbidden` (you need `upload` permission).'
content:
application/json:
schema:
$ref: '#/components/schemas/27_forbidden_normal'
"404":
description: '`Not Found` (HTML response).'
put:
operationId: updateDocumentMetadata
description: "-"
tags:
- Documents
summary: Edits the metadata of a document.
requestBody:
description: " - `id` => Unique numeric `document ID`. (integer) \n\n -\
\ `metadata` => Pairs of `attribute_name`:`value`."
content:
application/json:
schema:
$ref: '#/components/schemas/11_doc_metadata'
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/13_documents_get'
"400":
description: '`Bad Request`'
content:
application/json:
schema:
$ref: '#/components/schemas/31_bad_request_11'
"401":
description: '`Unauthorized`'
content:
application/json:
schema:
$ref: '#/components/schemas/20_unauthorized'
"403":
description: '`Forbidden` (you need `edit` permission).'
content:
application/json:
schema:
$ref: '#/components/schemas/27_forbidden_normal'
"404":
description: '`Not Found` (HTML response).'
/ca/api/corpora/{corpusId}/documents/{documentId}:
parameters:
- $ref: '#/components/parameters/01_corpus_id'
- $ref: '#/components/parameters/02_document_id'
get:
operationId: getDocument
description: "-"
tags:
- Documents
summary: Retrieves a specific document.
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/13_documents_get'
"401":
description: '`Unauthorized`'
content:
application/json:
schema:
$ref: '#/components/schemas/20_unauthorized'
"403":
description: '`Forbidden` (you need `view` permission).'
content:
application/json:
schema:
$ref: '#/components/schemas/27_forbidden_normal'
"404":
description: '`Not Found`'
content:
application/json:
schema:
$ref: '#/components/schemas/57_not_found_404'
put:
operationId: updateDocument
description: "-"
tags:
- Documents
summary: Updates a document in the corpus.
requestBody:
description: " - `filename_display` => Name of the document. (string) \n\n -\
\ `id` => Unique numeric **document ID** to identify individual documents.\
\ (integer) \n\n - `inProgress` => Represents whether the currently edited\
\ document is in use. (boolean) \n\n - `isArchive` => Represents if the\
\ updated document is in a format like .zip (created via some archive manager).\
\ (boolean) \n\n - `metadata` => Metadata of document. For example, additional\
\ attributes and values. \n\n - `parameters` => Parameters for plaintext\
\ extraction. \n\n - `encoding` => Encoding standard of the document.\
\ Usually, **UTF-8**. (string) \n\n - `justext_stoplist` => Represents\
\ the list of unimportant words, in a specified language, from an NLP point\
\ of view. (string) \n\n - `permutation` => Changing the order of columns\
\ (applies only to **type=vert**). (integer) \n\n - `tmx_lang` => TMX\
\ (translation memory exchange). Language of document used for parallel\
\ corpus creation. (string) \n\n - `tmx_struct` => Alignment structure\
\ to be used for multilingual documents, **align** is the most used structure.\
\ Used within segment distinction, which sentence is in which language and\
\ to put sentences with the same meaning into one segment. (string) \n\n\
\ - `tmx_untranslated` => Placeholder for empty segments in multilingual\
\ documents. The segments which have no counterpart in a second language\
\ of parallel corpus. (string) \n\n - `type` => File format (.csv, .doc,\
\ .docx, .htm, .html). (string) \n\n - `unlegalese` => Convert **all-caps**\
\ text to **normal case**. (boolean) \n\n - `temporary` => Whether the document\
\ is temporary. (boolean) \n\n - `word_count` => Total number of **words**\
\ (tokens minus punctuation etc.) in document. (integer) \n\n - `vertical_progress`\
\ => Progress of **vertical file** creation. (integer) \n\n - `vertical_error`\
\ => The error that occurred while creating the vertical file. If the creation\
\ was successful, the value is **Null**. (string)"
content:
application/json:
schema:
$ref: '#/components/schemas/09_doc_put_req'
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/13_documents_get'
"400":
description: '`Bad Request`'
content:
application/json:
schema:
$ref: '#/components/schemas/31_bad_request_11'
"401":
description: '`Unauthorized`'
content:
application/json:
schema:
$ref: '#/components/schemas/20_unauthorized'
"403":
description: '`Forbidden` (you need `edit` permission).'
content:
application/json:
schema:
$ref: '#/components/schemas/27_forbidden_normal'
"404":
description: '`Not Found`'
content:
application/json:
schema:
$ref: '#/components/schemas/57_not_found_404'
delete:
operationId: deleteDocuments
tags:
- Documents
summary: Deletes one or more documents from the corpus.
description: 'To delete multiple documents, separate the document IDs with commas. Example
call: `https://app.sketchengine.eu/ca/api/corpora/{corpusId}/documents/{documentId_1},{documentId_2}`'
responses:
"204":
description: '`No Content`'
content: {}
"400":
description: '`Bad Request`'
content:
application/json:
schema:
$ref: '#/components/schemas/31_bad_request_11'
"401":
description: '`Unauthorized`'
content:
application/json:
schema:
$ref: '#/components/schemas/20_unauthorized'
"403":
description: '`Forbidden` (you need `delete` permission).'
content:
application/json:
schema:
$ref: '#/components/schemas/27_forbidden_normal'
"404":
description: '`Not Found`'
content:
application/json:
schema:
$ref: '#/components/schemas/57_not_found_404'
/ca/api/corpora/{corpusId}/documents/{documentId}/preview:
parameters:
- $ref: '#/components/parameters/01_corpus_id'
- $ref: '#/components/parameters/02_document_id'
post:
operationId: updateDocumentParameters
tags:
- Documents
summary: Updates document parameters. (RPC)
description: 'Updates parameters like: `File Type`, `Encoding`, etc.'
requestBody:
description: " - `auto_paragraphs` => Automatically insert paragraph breaks\
\ (`<p>`) in place of blank lines. (string) \n\n - `encoding` => Encoding\
\ standard of the document. Usually **UTF-8**. (string) \n\n - `justext_stoplist`\
\ => Represents the list of unimportant words, in a specified language, from\
\ an NLP point of view. (string) \n\n - `permutation` => Changing the order\
\ of columns (applies only to **type=vert**). \n\n - `tmx_lang` => TMX\
\ (translation memory exchange). Language of document used for parallel\
\ corpus creation. (string) \n\n - `tmx_struct` => Alignment structure\
\ to be used for multilingual documents, **align** is the most used structure.\
\ Used within segment distinction, which sentence is in which language and\
\ to put sentences with the same meaning into one segment. (string) \n\n\
\ - `tmx_untranslated` => Placeholder for empty segments in multilingual\
\ documents. The segments which have no counterpart in a second language\
\ of parallel corpus. (string) \n\n - `type` => File format (.csv, .doc,\
\ .docx, .htm, .html). (string) \n\n - `unlegalese` => Convert **all-caps**\
\ text to **normal case**. (boolean) "
content:
application/json:
schema:
$ref: '#/components/schemas/10_doc_preview'
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/15_doc_preview'
"403":
description: '`Forbidden`'
content:
application/json:
schema:
$ref: '#/components/schemas/21_forbidden'
/ca/api/corpora/{corpusId}/documents/{documentId}/original:
parameters:
- $ref: '#/components/parameters/01_corpus_id'
- $ref: '#/components/parameters/02_document_id'
get:
operationId: getDocumentOriginal
description: "-"
tags:
- Documents
summary: Downloads a corpus file in its original format (the format that was
uploaded). This method cannot be simulated in this online documentation.
responses:
"200":
description: '`OK`'
content:
application/octet-stream:
schema:
type: string
format: binary
description: The document was downloaded successfully in its original
format.
"401":
description: '`Unauthorized`'
"403":
description: '`Forbidden` (you need `view` permission).'
"404":
description: '`Not Found` (HTML response).'
"405":
description: '`Method Not Allowed`'
/ca/api/corpora/{corpusId}/documents/{documentId}/plaintext:
parameters:
- $ref: '#/components/parameters/01_corpus_id'
- $ref: '#/components/parameters/02_document_id'
get:
operationId: getDocumentPlaintext
description: "-"
tags:
- Documents
summary: Retrieves 1 KB of the document in plaintext format. More than 1 KB can be loaded.
responses:
"206":
description: '`Partial Content`'
content:
text/plain; charset=utf-8:
schema:
description: Document in plaintext format.
type: string
"401":
description: '`Unauthorized`'
"403":
description: '`Forbidden` (you need `view` permission).'
"404":
description: '`Not Found` (HTML response).'
"405":
description: '`Method Not Allowed`'
"416":
description: '`Range Not Satisfiable`'
/ca/api/corpora/{corpusId}/documents/{documentId}/vertical:
parameters:
- $ref: '#/components/parameters/01_corpus_id'
- $ref: '#/components/parameters/02_document_id'
get:
operationId: getDocumentVertical
description: "-"
tags:
- Documents
summary: Retrieves 1 KB of the document in vertical format. More than 1 KB can be loaded.
responses:
"206":
description: '`Partial Content`'
content:
text/plain; charset=utf-8:
schema:
description: Document in vertical format.
type: string
"401":
description: '`Unauthorized`'
"403":
description: '`Forbidden`'
"404":
description: '`Not Found` (HTML response).'
"405":
description: '`Method Not Allowed`'
"416":
description: '`Range Not Satisfiable`'
/ca/api/corpora/{corpusId}/documents/{documentId}/expand_archive:
parameters:
- $ref: '#/components/parameters/01_corpus_id'
- $ref: '#/components/parameters/02_document_id'
post:
operationId: expandArchive
description: "-"
tags:
- Documents
summary: Expands a ZIP file (if the corpus files were uploaded as a ZIP archive).
Expanding is not necessary for the corpus to work. (RPC)
requestBody:
description: 'In this documentation, an empty request is mostly used with
**RPC style** methods, where the request usually needs no content. RPC style
endpoints focus on performing **one action** (a procedure or command) and
are simpler for that purpose than **REST API**-based endpoints, but they
are not as scalable as the REST API style. RPC is mostly used with the HTTP
methods GET (to fetch information) and POST (for everything else); in the
CA API it is used with the POST method.'
content:
application/json:
schema:
$ref: '#/components/schemas/05_empty_request'
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/16_rpc_expand_archive'
"400":
description: '`Bad Request`'
content:
application/json:
schema:
$ref: '#/components/schemas/33_bad_request_13'
"403":
description: '`Forbidden` (you need `edit` permission).'
content:
application/json:
schema:
$ref: '#/components/schemas/21_forbidden'
"404":
description: '`Not Found` (HTML response).'
/ca/api/corpora/{corpusId}/documents/{documentId}/cancel_job:
parameters:
- $ref: '#/components/parameters/01_corpus_id'
- $ref: '#/components/parameters/02_document_id'
post:
operationId: cancelDocumentJob
tags:
- Documents
summary: Cancels a running task directly related to the document. (RPC)
description: An example task that can be canceled is `uploading file`.
requestBody:
description: 'In this documentation, an empty request is mostly used with
**RPC style** methods, where the request usually needs no content. RPC style
endpoints focus on performing **one action** (a procedure or command) and
are simpler for that purpose than **REST API**-based endpoints, but they
are not as scalable as the REST API style. RPC is mostly used with the HTTP
methods GET (to fetch information) and POST (for everything else); in the
CA API it is used with the POST method.'
content:
application/json:
schema:
$ref: '#/components/schemas/05_empty_request'
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/69_rpc_style'
"403":
description: '`Forbidden` (you need `edit` permission).'
content:
application/json:
schema:
$ref: '#/components/schemas/21_forbidden'
"404":
description: '`Not Found`'
content:
application/json:
schema:
$ref: '#/components/schemas/45_not_found_RPC'
/ca/api/corpora/{corpusId}/documents/{documentId}/get_progress:
parameters:
- $ref: '#/components/parameters/01_corpus_id'
- $ref: '#/components/parameters/02_document_id'
post:
operationId: getProgress
tags:
- Documents
summary: Shows the current progress of the running task related to the
document. (RPC)
description: Used in tasks like `uploading` new files which will be used to
create or expand the corpora.
requestBody:
description: 'In this documentation, an empty request is mostly used with
**RPC style** methods, where the request usually needs no content. RPC style
endpoints focus on performing **one action** (a procedure or command) and
are simpler for that purpose than **REST API**-based endpoints, but they
are not as scalable as the REST API style. RPC is mostly used with the HTTP
methods GET (to fetch information) and POST (for everything else); in the
CA API it is used with the POST method.'
content:
application/json:
schema:
$ref: '#/components/schemas/05_empty_request'
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/06_get_progress'
"403":
description: '`Forbidden` (you need `view` permission for the specified corpus).'
content:
application/json:
schema:
$ref: '#/components/schemas/21_forbidden'
"404":
description: '`Not Found`'
content:
application/json:
schema:
$ref: '#/components/schemas/45_not_found_RPC'
/ca/api/corpora/{corpusId}/filesets:
parameters:
- $ref: '#/components/parameters/01_corpus_id'
- $ref: '#/components/parameters/19_compile_when_finished'
get:
operationId: getFileSets
description: "-"
tags:
- Filesets
summary: List of "subdirectories", i.e. unzipped archives or WebBootCaT runs
for a given corpus.
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
type: object
properties:
data:
type: array
items:
$ref: '#/components/schemas/58_fileset'
"401":
description: '`Unauthorized`'
content:
application/json:
schema:
$ref: '#/components/schemas/20_unauthorized'
"403":
description: '`Forbidden` (you need permission to `upload` to the specified
corpus).'
content:
application/json:
schema:
$ref: '#/components/schemas/27_forbidden_normal'
"404":
description: '`Not Found`'
content:
application/json:
schema:
$ref: '#/components/schemas/57_not_found_404'
post:
operationId: createFileSet
tags:
- Filesets
summary: Creates a new fileset using the web crawler.
description: Used when creating or expanding a corpus.
requestBody:
description: "Setting parameters to improve web-crawler accuracy. \n\n -\
\ `bl_max_total_kw` => **Blacklist max total keyword**. Means that web page\
\ or document will be discarded if it contains more words from the denylist\
\ (blacklist) than this limit. (integer) \n\n - `bl_max_unique_kw` => **Blacklist\
\ max unique keyword**. Means that web page or document will be discarded\
\ if it contains more unique words from the denylist (blacklist) than this\
\ limit. (integer) \n\n - `black_list` => A list (separated by whitespaces)\
\ of **blocked words**, words you don't want to see in your future corpus.\
\ (string) \n\n - `input_type` => Input type the web crawler will work\
\ with. Example: **urls**. (string) \n\n - `max_cleaned_file_size` => Web\
\ pages and documents with a size **over** this limit (**in kB**) will be\
\ ignored. (integer) \n\n - `max_file_size` => Web pages and documents\
\ with a size **over** this limit (**in kB**) will be ignored. (integer)\
\ \n\n - `min_cleaned_file_size` => Web pages and documents **smaller**\
\ than this limit (**in kB**) after cleaning will be ignored. Cleaning involves\
\ conversion to plain text, removing boilerplate text (e.g. navigation menus,\
\ legal text, disclaimers and other repetitive content). (integer) \n\n\
\ - `min_file_size` => Web pages and documents with a **size below** this\
\ limit (**in kB**) will be ignored. (integer) \n\n - `name` => Texts will\
\ be organized into a corpus folder with this name. (string) \n\n - `seed_words`\
\ => A list of words according to which the URLs were chosen to be searched.\
\ (list of strings) \n\n - `white_list` => A list (separated by whitespaces) of\
\ allowed words, words you want to see in your future corpus. (string)\
\ \n\n - `wl_min_kw_ratio` => **Whitelist minimal keywords ratio**. Means\
\ that web page or document will be included only if the percentage of allowlist\
\ words compared to total words is higher than this limit. (integer) \n\n\
\ - `wl_min_total_kw` => **Whitelist minimal total keywords**. Means that\
\ web page or document will be included only if it contains more words from\
\ the allowlist (whitelist) than this limit. (integer) \n\n - `wl_min_unique_kw`\
\ => **Whitelist minimal unique keywords**. Means that a web page or document\
\ will be included only if it contains more unique words from the allowlist (whitelist)\
\ than this limit. (integer)"
content:
application/json:
schema:
$ref: '#/components/schemas/13_filesets_creation'
required: true
responses:
"201":
description: '`Created`'
content:
application/json:
schema:
type: object
properties:
data:
$ref: '#/components/schemas/61_fileset_creation'
"400":
description: '`Bad Request`'
content:
application/json:
schema:
$ref: '#/components/schemas/34_bad_request_14'
"401":
description: '`Unauthorized`'
content:
application/json:
schema:
$ref: '#/components/schemas/20_unauthorized'
"403":
description: '`Forbidden` (you need permission to `upload` to the specified
corpus).'
content:
application/json:
schema:
$ref: '#/components/schemas/27_forbidden_normal'
"404":
description: '`Not Found`'
content:
application/json:
schema:
$ref: '#/components/schemas/57_not_found_404'
/ca/api/corpora/{corpusId}/filesets/{filesetId}:
parameters:
- $ref: '#/components/parameters/01_corpus_id'
- $ref: '#/components/parameters/06_fileset_id'
get:
operationId: getFileSet
description: "-"
tags:
- Filesets
summary: Returns information about a specific "subdirectory".
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
type: object
properties:
data:
$ref: '#/components/schemas/58_fileset'
"401":
description: '`Unauthorized`'
content:
application/json:
schema:
$ref: '#/components/schemas/20_unauthorized'
"403":
description: '`Forbidden` (you need permission to `view`).'
content:
application/json:
schema:
$ref: '#/components/schemas/27_forbidden_normal'
"404":
description: '`Not Found`'
content:
application/json:
schema:
$ref: '#/components/schemas/57_not_found_404'
delete:
operationId: deleteFileSet
description: "-"
tags:
- Filesets
summary: Deletes a subdirectory containing documents used for creating the corpus.
responses:
"204":
description: '`No Content`'
content: {}
"400":
description: '`Bad Request`'
content:
application/json:
schema:
$ref: '#/components/schemas/32_bad_request_12'
"401":
description: '`Unauthorized`'
content:
application/json:
schema:
$ref: '#/components/schemas/20_unauthorized'
"403":
description: '`Forbidden` (you need permission to `delete`).'
content:
application/json:
schema:
$ref: '#/components/schemas/27_forbidden_normal'
"404":
description: '`Not Found`'
content:
application/json:
schema:
$ref: '#/components/schemas/57_not_found_404'
/ca/api/corpora/{corpusId}/filesets/{filesetId}/cancel_job:
parameters:
- $ref: '#/components/parameters/01_corpus_id'
- $ref: '#/components/parameters/06_fileset_id'
post:
operationId: cancelFileSetJob
tags:
- Filesets
summary: Cancels a running task. (RPC)
description: 'Example: cancel the web crawler downloading data from websites.'
requestBody:
description: 'In this documentation, an empty request is mostly used with
**RPC style** methods, where the request usually needs no content. RPC style
endpoints focus on performing **one action** (a procedure or command) and
are simpler for that purpose than **REST API**-based endpoints, but they
are not as scalable as the REST API style. RPC is mostly used with the HTTP
methods GET (to fetch information) and POST (for everything else); in the
CA API it is used with the POST method.'
content:
application/json:
schema:
$ref: '#/components/schemas/05_empty_request'
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/69_rpc_style'
"401":
description: '`Unauthorized`'
content:
application/json:
schema:
$ref: '#/components/schemas/23_unauthorized_rpc'
"403":
description: '`Forbidden` (you need `upload` permission).'
content:
application/json:
schema:
$ref: '#/components/schemas/21_forbidden'
"404":
description: '`Not Found`'
content:
application/json:
schema:
$ref: '#/components/schemas/45_not_found_RPC'
/ca/api/corpora/{corpusId}/filesets/{filesetId}/get_progress:
parameters:
- $ref: '#/components/parameters/01_corpus_id'
- $ref: '#/components/parameters/06_fileset_id'
post:
operationId: getFileSetProgress
tags:
- Filesets
summary: Shows the current progress of a running task related to filesets. (RPC)
description: For example, a task like `downloading content from web` when creating
a corpus with the web crawler.
requestBody:
description: 'In this documentation, an empty request is mostly used with
**RPC style** methods, where the request usually needs no content. RPC style
endpoints focus on performing **one action** (a procedure or command) and
are simpler for that purpose than **REST API**-based endpoints, but they
are not as scalable as the REST API style. RPC is mostly used with the HTTP
methods GET (to fetch information) and POST (for everything else); in the
CA API it is used with the POST method.'
content:
application/json:
schema:
$ref: '#/components/schemas/05_empty_request'
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/59_filesets_get_progress'
"403":
description: '`Forbidden` (you need `view` permission).'
content:
application/json:
schema:
$ref: '#/components/schemas/21_forbidden'
"404":
description: '`Not Found`'
content:
application/json:
schema:
$ref: '#/components/schemas/45_not_found_RPC'
/ca/api/languages:
get:
operationId: getLanguages
description: "-"
tags:
- Languages
summary: Retrieves a list of all languages.
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
type: object
properties:
data:
type: array
items:
$ref: '#/components/schemas/63_language'
/ca/api/somefiles:
post:
operationId: uploadAligendDocuments
description: "-"
tags:
- Somefiles
summary: Uploads aligned documents for creating a parallel corpus.
requestBody:
description: Aligned multilingual file (mostly in '.tmx' file type).
content:
multipart/form-data; boundary={boundary}:
schema:
description: Aligned multilingual file (mostly in '.tmx' file type).
type: string
responses:
"201":
description: '`Created`'
content:
application/json:
schema:
$ref: '#/components/schemas/78_somefiles_post'
"400":
description: '`Bad Request`'
content:
application/json:
schema:
$ref: '#/components/schemas/40_bad_request_20'
/ca/api/somefiles/{somefileId}:
parameters:
- $ref: '#/components/parameters/08_somefile_id'
get:
operationId: getAlignedDocuments
description: "-"
tags:
- Somefiles
summary: Retrieves specific multilingual file metadata.
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/78_somefiles_post'
"404":
description: '`Not Found`'
content:
application/json:
schema:
$ref: '#/components/schemas/57_not_found_404'
put:
operationId: updateAlignedDocs
description: "-"
tags:
- Somefiles
summary: Updates multilingual file metadata.
requestBody:
description: " - `corpora` \n\n - `guessed_language_code` \n\n - `language_id`\
\ => Language iso-code. **ISO 639-1**. (string) \n\n - `name` => Language\
\ name in **English**. (string) "
content:
application/json:
schema:
$ref: '#/components/schemas/19_somefiles_put'
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
type: object
properties:
data:
$ref: '#/components/schemas/12_corpora_single_full'
"400":
description: '`Bad Request`'
content:
application/json:
schema:
$ref: '#/components/schemas/33_bad_request_13'
"404":
description: '`Not Found`'
content:
application/json:
schema:
$ref: '#/components/schemas/57_not_found_404'
/ca/api/tagsets/{templateId}:
parameters:
- $ref: '#/components/parameters/03_template_id'
get:
operationId: getUserTemplate
description: "-"
tags:
- Templates
summary: Retrieves details of the specified user template/tagset.
responses:
"200":
description: '`OK`'
content:
application/json:
schema:
$ref: '#/components/schemas/79_template'
"401":
description: '`Unauthorized`'
content:
application/json:
schema:
$ref: '#/components/schemas/20_unauthorized'
"403":
description: '`Forbidden` (you need permission to `read`).'
content:
application/json:
schema:
$ref: '#/components/schemas/27_forbidden_normal'
"404":
description: '`Not Found`'
content:
application/json:
schema:
$ref: '#/components/schemas/57_not_found_404'
put:
operationId: updateUserTemplate
description: "-"
tags:
- Templates
summary: Updates the specified user template/tagset.
requestBody:
description: " - `id` => Alphanumeric **template/tagset ID**. The terms **tagset**\
\ and **templates** are interchangeable. (string) \n\n - `name` => Name\
\ of **template/tagset** file. (string) \n\n - `owner_id` => Unique numeric\
\ owner ID (usually you). **Null** if the tagset/template is preloaded. (integer)\
\ \n\n - `owner_name` => Tagset/template owner name (usually you). **Null** if the\
\ tagset/template is preloaded. (string) \n\n - `has_pipeline` => Vertical creation\
\ is supported. False for legacy templates. (boolean) \n\n - `has_tags`\
\ => Morphological tagging is supported. (boolean) \n\n - `has_lemmas`\
\ => Lemmatization is supported. (boolean) \n\n - `static_attributes` =>\
\ A list of attributes which can appear in corpus. \n\n - `structures`\
\ => A list of used structures. Examples \\
deletewrap:
type: boolean
example: true
content:
type: array
items:
type: object
properties:
str:
type: string
example: ', in Hungarian also means "majority of", "its belongings",
"its goods", "its best portion", a type of pork, and may also be
incorrectly identified as an agglutination of a frequent abbreviation
in mailing lists.'
class:
type: string
example: ''
leftlink:
type: string
example: pos=831745;detail_left_ctx=110;detail_right_ctx=50
rightlink:
type: string
example: pos=831745;detail_left_ctx=50;detail_right_ctx=110
pos:
type: integer
example: 831745
maxcontent:
type: integer
example: 200
api_version:
type: string
example: 5.63.12
manatee_version:
type: string
example: 2.36.7-SkE-2.223.6
request:
type: object
properties:
detail_left_ctx:
type: string
example: '50'
corpname:
type: string
example: preloaded/ententen21_tt31
hitlen:
type: string
example: '1'
pos:
type: string
example: '831745'
detail_right_ctx:
type: string
example: '50'
structs:
type: string
example: s,g
17_subcorpus_rename:
type: object
properties:
status:
type: string
example: OK
corpus:
type: string
example: preloaded/bnc2_tt31
subcorp_id2name:
type: object
properties:
test:
type: string
example: test_2
api_version:
type: string
example: 5.66.5
manatee_version:
type: string
example: 2.36.7-SkE-2.225.6
request:
type: object
properties:
subcorp_id:
type: string
example: test
new_subcorp_name:
type: string
example: test_2
corpname:
type: string
example: preloaded/bnc2_tt31
18_subcorp_info:
type: object
properties:
subcorp:
type: string
example: test_4
corpsize:
type: integer
example: 112338376
subcsize:
type: integer
example: 112338376
api_version:
type: string
example: 5.66.5
manatee_version:
type: string
example: 2.36.7-SkE-2.225.6
request:
type: object
properties:
subcname:
type: string
example: test_2
corpname:
type: string
example: preloaded/bnc2_tt31
19_freqdist:
type: object
properties:
lc:
type: string
example: the
freqdist:
type: object
properties:
2021-11:
description: The key name varies according to the selected period (wlattr).
type: object
properties:
frq:
type: integer
example: 679638
rel_frq:
type: number
format: float
example: 3022.91023457566
period_size:
type: number
format: float
example: 224829038
removed_freqdist:
type: object
properties:
2023-08:
description: The key name varies according to the selected period (wlattr).
type: object
properties:
frq:
type: integer
example: 0
rel_frq:
type: number
format: float
example: 0
period_size:
type: number
format: float
example: 429451774.0
average_norm:
type: number
format: float
example: 136780187.25
norm_limit:
type: number
format: float
example: 6839009.362500001
01_corpora_request:
description: Request body for the POST method that sets the `name`, `language`, `tagset`, and
additional information of the corpus.
type: object
properties:
info:
type: string
description: The additional information for a newly created corpus.
example: Example description of user corpus.
language_id:
type: string
description: Language iso-code. `ISO 639-1`.
example: en
name:
type: string
description: Unique `corpus name` for a newly created corpus.
example: Example corpus
tagset_id:
type: string
description: Name of used tagset.
example: TT_ENG_V3
02_compile_request:
type: object
properties:
structures:
type: string
description: '`Structures` and `structure attributes` in corpus which should
be compiled. Usually: `all`.'
example: all
03_corpus_ids:
type: object
properties:
corpus_ids:
type: array
items:
type: integer
description: A list of `corpus IDs` of multilingual corpora.
example:
- 842464
- 842463
04_align_req:
type: object
properties:
alignstruct:
type: string
description: According to which structure the document should be aligned.
Usually, `/ `) in place of
blank lines.
encoding:
type: string
description: Encoding standard of the document. Usually `UTF-8`.
justext_stoplist:
type: string
description: Represents the list of unimportant words, in a specified language,
from an NLP point of view.
permutation:
type: array
items:
type: integer
description: Changing the order of columns (applies only to `type=vert`).
tmx_lang:
type: string
description: TMX (translation memory exchange). Language of document used
for parallel corpus creation.
tmx_struct:
type: string
description: Alignment structure to be used for multilingual documents,
`align` is the most used structure. Used within segment distinction, which
sentence is in which language and to put sentences with the same meaning
into one segment.
tmx_untranslated:
type: string
description: Placeholder for empty segments in multilingual documents. The
segments which have no counterpart in a second language of parallel corpus.
type:
type: string
description: File format (`.csv`, `.doc`, `.docx`, `.htm`, `.html` etc.).
unlegalese:
type: boolean
description: Convert `all-caps` text to `normal case`.
11_doc_metadata:
type: array
items:
type: object
properties:
id:
type: integer
description: Unique numeric `document ID`.
metadata:
type: object
description: Pairs of `attribute_name`:`value`.
13_filesets_creation:
type: object
properties:
bl_max_total_kw:
type: integer
description: 'Stands for: `blacklist max total keyword`. Means that web
page or document will be discarded if it contains more words from the
denylist (blacklist) than this limit.'
bl_max_unique_kw:
type: integer
description: 'Stands for: `blacklist max unique keyword`. Means that web
page or document will be discarded if it contains more unique words from
the denylist (blacklist) than this limit.'
black_list:
type: string
description: A list (separated by whitespaces) of `blocked words`, words
you don't want to see in your future corpus.
input_type:
type: string
description: 'Input type the web crawler will work with. Example: `urls`'
max_cleaned_file_size:
type: integer
description: Web pages and documents with a size `over` this limit (`in
kB`) will be ignored.
max_file_size:
type: integer
description: Web pages and documents with a size `over` this limit (`in
kB`) will be ignored.
min_cleaned_file_size:
type: integer
description: Web pages and documents `smaller` than this limit (`in kB`)
after cleaning will be ignored. Cleaning involves conversion to plain
text, removing boilerplate text (e.g. navigation menus, legal text, disclaimers
and other repetitive content).
min_file_size:
type: integer
description: Web pages and documents with a `size below` this limit (`in
kB`) will be ignored.
name:
type: integer
description: Texts will be organized into a corpus folder `with this name`.
seed_words:
description: A list of words according to which the `URLs` were chosen to
be searched.
type: array
items:
type: string
white_list:
type: string
description: A list (separated by whitespaces) of `allowed words`, words
you want to see in your future corpus.
wl_min_kw_ratio:
type: integer
description: 'Stands for: `whitelist minimal keywords ratio`. A web page
or document will be included only if the `percentage` of allowlist
words compared to total words is `higher` than this limit.'
wl_min_total_kw:
type: integer
description: 'Stands for: `whitelist minimal total keywords`. A web page
or document will be included only if it contains `more words`
from the `allowlist` (whitelist) than this limit.'
wl_min_unique_kw:
type: integer
description: 'Stands for: `whitelist minimal unique keywords`. A web page
or document will be included only if it contains `more unique words`
from the `allowlist` (whitelist) than this limit.'
19_somefiles_put:
type: object
properties:
corpora:
type: object
properties:
guessed_language_code:
type: object
properties:
language_id:
type: string
description: Language iso-code. `ISO 639-1`.
name:
type: string
description: Language name in `English`.
03_corpora_list:
type: object
properties:
id:
description: Unique numeric `corpus ID` for corpus building.
type: integer
owner_id:
description: Unique numeric `owner ID` (usually you).
type: integer
owner_name:
description: Corpus `owner name` (usually you).
type: string
corpname:
description: Unique `corpus name` for corpus querying.
type: string
language_id:
description: Language iso-code. `ISO 639-1`.
type: string
language_name:
type: string
description: Language name in `English`.
tagset_id:
type: integer
description: '`Tagset ID`. A tagset is a list of part-of-speech tags (POS tags)
for the specified language. It is `preselected` to the most relevant one
and can be changed only in user corpora. `Tagsets` are also referred to
as `templates`.'
sketch_grammar_id:
type: string
description: "`Sketch grammar ID`. Sketch grammar is a series of rules written\
\ in the CQL query language that search for collocations in a text corpus\
\ and categorize them according to their grammatical relations."
term_grammar_id:
type: string
description: '`Term grammar ID`. A term grammar tells Sketch Engine which
words and phrases should be identified as terms.'
sizes:
type: object
description: Corpus sizes. `Null` if corpus is not compiled.
properties:
doccount:
type: integer
description: Total number of `documents` in corpus.
parcount:
type: integer
description: Total number of `paragraphs` in corpus.
sentcount:
type: integer
description: Total number of `sentences` in corpus.
wordcount:
type: integer
description: Total number of `words` (tokens minus punctuation etc.)
in corpus.
tokencount:
type: integer
description: Total number of `tokens` in corpus.
created:
type: string
description: Date and time of corpus creation in format `YYYY-MM-DD HH:MM:SS`.
needs_recompiling:
type: boolean
description: '`True` if corpus documents have been altered since last compilation.'
user_can_read:
type: boolean
description: Corpus can be queried by a `specific user`. Ignore all corpora
where this is false.
user_can_refer:
type: boolean
description: Corpus can be used as a `reference corpus` even by anonymous
users.
user_can_upload:
type: boolean
description: Corpus is owned by you or shared with you. You can upload documents
to it.
user_can_manage:
description: Corpus is owned by you or shared with you with `full privileges`.
type: boolean
is_shared:
type: boolean
description: True if corpus is shared with other users.
new_version:
type: string
description: If set, the old corpus is deprecated in favor of a new one.
name:
type: string
description: Corpus name. `Given by user.`
info:
type: string
description: Additional info about corpus.
aligned:
description: List of other corpora (corpus ID) within the `same` multi-lingual
set (parallel corpus).
type: array
items:
type: string
docstructure:
type: string
description: Structure in which individual documents should be wrapped.
Usually `doc`.
04_corpora_single:
type: object
properties:
id:
description: Unique numeric `corpus ID` for corpus building.
type: integer
owner_id:
description: Unique numeric `owner ID` (usually you).
type: integer
owner_name:
description: Corpus `owner name` (usually you).
type: string
corpname:
description: Unique `corpus name` for corpus querying.
type: string
language_id:
description: Language iso-code. `ISO 639-1`.
type: string
language_name:
description: Language name in `English`.
type: string
sketch_grammar_id:
description: "`Sketch grammar ID` (name of sketch grammar file). Sketch\
\ grammar is a series of rules written in the CQL query language that\
\ search for collocations in a text corpus and categorize them according\
\ to their grammatical relations. Example: `preloaded/english-penn_tt-3.3.wsdef.m4`."
type: string
term_grammar_id:
description: '`Term grammar ID` (name of term grammar file). A term grammar
tells Sketch Engine which words and phrases should be identified as terms.
Example: `/corpora/wsdef/english-penn_tt-terms-3.1.termdef.m4`.'
type: string
sizes:
description: Corpus sizes. `Null` if corpus is not compiled.
type: object
properties:
doccount:
type: integer
description: Total number of `documents` in corpus.
parcount:
type: integer
description: Total number of `paragraphs` in corpus.
sentcount:
type: integer
description: Total number of `sentences` in corpus.
wordcount:
type: integer
description: Total number of `words` (tokens minus punctuation etc.)
in corpus.
tokencount:
type: integer
description: Total number of `tokens` in corpus.
created:
type: string
description: Date and time of corpus creation in format `YYYY-MM-DD HH:MM:SS`.
needs_recompiling:
description: True if corpus documents have been altered since last compilation.
type: boolean
user_can_read:
type: boolean
description: Corpus can be queried by a `specific user`. Ignore all corpora
where this is false.
user_can_refer:
type: boolean
description: Corpus can be used as a `reference corpus` even by anonymous
users.
user_can_upload:
type: boolean
description: Corpus is owned by you or shared with you, and you can upload documents
to it.
user_can_manage:
description: Corpus is owned by you or shared with you with `full privileges`.
type: boolean
is_shared:
type: boolean
description: '`True` if corpus is shared with other users.'
new_version:
type: string
description: If set, the old corpus is deprecated in favor of a new one.
name:
description: Corpus name. `Given by user`.
type: string
info:
description: Additional info about corpus.
type: string
aligned:
description: Other corpora within the `same` multi-lingual set (parallel
corpus).
type: array
items:
type: string
docstructure:
description: Structure in which individual documents should be wrapped.
Usually `doc`.
type: string
is_error_corpus:
description: Current state of corpus.
type: boolean
attrlist:
description: 'Attributes appearing in corpus documents. Attributes like:
`word`, `tag`, `lempos`, `pos`, `lemma`, etc.'
type: array
items:
type: string
tagset_id:
description: Tagset ID. A tagset is a list of part-of-speech tags (POS tags)
for the specified language. It is `preselected` to the most relevant one
and can be changed only in user corpora. The terms `tagset` and `templates`
are interchangeable.
type: integer
reference_corpus:
description: Default reference corpus for `keyword extraction`.
type: string
progress:
description: 'Compilation status: `0` if not compiled, `100` if compiled
successfully, `-1` if failed, otherwise in progress.'
type: integer
error:
description: Last compilation error; `None` if there was no error.
type: string
document_count:
description: The number of documents the corpus was built from.
type: integer
can_be_upgraded:
description: '`True` if corpus template is outdated and can be upgraded.
The terms `tagset` and `templates` are interchangeable.'
type: boolean
available_structures:
description: All `structures`/`attributes` that appear in corpus documents.
type: array
items:
type: object
properties:
name:
type: string
description: Structure name.
freq:
type: integer
description: Frequency of structure.
attributes:
type: array
items:
type: string
description: List of used attributes.
file_structure:
description: The structure in which individual documents should be wrapped.
Usually `doc`.
type: string
onion_structure:
description: The structure for deduplication. Usually `p` (paragraph) or
`Null` (no deduplication).
type: string
expert_mode:
description: Set to `True` if you are hard-core.
type: boolean
document_order:
description: Not mandatory. Can be set to enforce document order within
the corpus.
type: array
items:
type: integer
use_all_structure:
description: Use `all` structures available in corpus.
type: boolean
structures:
description: Available `structures` or `tags` in the corpus. Structures
like `s` (sentence), `g` (glue), `doc` (document).
type: array
items:
type: object
properties:
name:
type: string
description: Structure name.
attributes:
description: A list of used attributes in corpus.
type: array
items:
type: object
properties:
name:
type: string
description: The name of used attribute.
05_can_be_compiled:
type: object
properties:
result:
type: object
properties:
can_be_compiled:
type: boolean
description: '`True` if the corpus does not contain any potential error
that could break compilation.'
reason:
type: string
description: 'Reason why the corpus cannot be compiled; `Null` if there is
none. Examples: `QUOTA_EXCEEDED`, `EMPTY`.'
error:
type: string
description: Unexpected server error; `Null` if none.
06_get_progress:
type: object
properties:
result:
type: object
properties:
progress:
type: integer
description: 'Compilation status: `0` if not compiled, `100` if compiled
successfully, `-1` if failed, otherwise in progress.'
error:
type: string
example: ""
description: Problem description; `Null` if none.
error:
type: string
example: ""
description: Unexpected server error; `Null` if none.
12_corpora_single_full:
type: object
properties:
id:
description: Unique numeric `corpus ID` for corpus building.
type: integer
owner_id:
description: Unique numeric `owner ID` (usually you).
type: integer
owner_name:
description: Corpus `owner name` (usually you).
type: string
corpname:
description: Unique `corpus name` for corpus querying.
type: string
language_id:
description: Language iso-code. `ISO 639-1`.
type: string
language_name:
description: Language name in `English`.
type: string
sketch_grammar_id:
description: "`Sketch grammar ID` (name of sketch grammar file). Sketch\
\ grammar is a series of rules written in the CQL query language that\
\ search for collocations in a text corpus and categorize them according\
\ to their grammatical relations. Example: `preloaded/english-penn_tt-3.3.wsdef.m4`."
type: string
term_grammar_id:
description: '`Term grammar ID` (name of term grammar file). A term grammar
tells Sketch Engine which words and phrases should be identified as terms.
Example: `/corpora/wsdef/english-penn_tt-terms-3.1.termdef.m4`.'
type: string
sizes:
description: Corpus sizes. `Null` if corpus is not compiled.
type: object
properties:
doccount:
type: integer
description: Total number of `documents` in corpus.
parcount:
type: integer
description: Total number of `paragraphs` in corpus.
sentcount:
type: integer
description: Total number of `sentences` in corpus.
wordcount:
type: integer
description: Total number of `words` (tokens minus punctuation etc.)
in corpus.
tokencount:
type: integer
description: Total number of `tokens` in corpus.
is_sgdev:
type: boolean
description: TODO
is_featured:
type: boolean
description: TODO
access_level:
type: boolean
description: TODO
access_on_demand:
type: boolean
description: TODO
terms_of_use:
type: string
description: TODO
sort_to_end:
type: boolean
description: TODO
tags:
type: array
items:
type: string
description: TODO
created:
type: string
description: Date and time of corpus creation in format `YYYY-MM-DD HH:MM:SS`.
needs_recompiling:
description: True if corpus documents have been altered since last compilation.
type: boolean
user_can_read:
type: boolean
description: Corpus can be queried by a `specific user`. Ignore all corpora
where this is false.
user_can_refer:
type: boolean
description: Corpus can be used as a `reference corpus` even by anonymous
users.
user_can_upload:
type: boolean
description: Corpus is owned by you or shared with you, and you can upload documents
to it.
user_can_manage:
description: Corpus is owned by you or shared with you with `full privileges`.
type: boolean
is_shared:
type: boolean
description: '`True` if corpus is shared with other users.'
new_version:
type: string
description: If set, the old corpus is deprecated in favor of a new one.
name:
description: Corpus name. `Given by user`.
type: string
info:
description: Additional info about corpus.
type: string
wsdef:
description: 'Default word sketch definition. Example: `/corpora/wsdef/serbian-multext-rft1-1.0.wsdef.txt`.'
type: string
termdef:
description: Default term definition.
type: string
diachronic:
description: Whether the corpus develops over time to keep track of changes
in vocabulary, grammar and language usage, and if so, which time period
the corpus covers.
type: string
aligned:
description: Other corpora within the `same` multi-lingual set (parallel
corpus).
type: array
items:
type: string
docstructure:
description: Structure in which individual documents should be wrapped.
Usually `doc`.
type: string
is_error_corpus:
description: Current state of corpus.
type: boolean
attrlist:
description: 'Attributes appearing in corpus documents. Attributes like:
`word`, `tag`, `lempos`, `pos`, `lemma`, etc.'
type: array
items:
type: string
tagset_id:
description: Tagset ID. A tagset is a list of part-of-speech tags (POS tags)
for the specified language. It is `preselected` to the most relevant one
and can be changed only in user corpora. The terms `tagset` and `templates`
are interchangeable.
type: integer
reference_corpus:
description: Default reference corpus for `keyword extraction`.
type: string
progress:
description: 'Compilation status: `0` if not compiled, `100` if compiled
successfully, `-1` if failed, otherwise in progress.'
type: integer
error:
description: Last compilation error; `None` if there was no error.
type: string
document_count:
description: The number of documents the corpus was built from.
type: integer
can_be_upgraded:
description: '`True` if corpus template is outdated and can be upgraded.
The terms `tagset` and `templates` are interchangeable.'
type: boolean
available_structures:
description: All `structures`/`attributes` that appear in corpus documents.
type: array
items:
type: object
properties:
name:
type: string
description: Structure name.
freq:
type: integer
description: Frequency of structure.
attributes:
type: array
items:
type: string
description: List of used attributes.
file_structure:
description: The structure in which individual documents should be wrapped.
Usually `doc`.
type: string
onion_structure:
description: The structure for deduplication. Usually `p` (paragraph) or
`Null` (no deduplication).
type: string
expert_mode:
description: Set to `True` if you are hard-core.
type: boolean
document_order:
description: Not mandatory. Can be set to enforce document order within
the corpus.
type: array
items:
type: integer
use_all_structure:
description: Use `all` structures available in corpus.
type: boolean
structures:
description: Available `structures` or `tags` in the corpus. Structures
like `s` (sentence), `g` (glue), `doc` (document).
type: array
items:
type: object
properties:
name:
type: string
description: Structure name.
attributes:
description: A list of used attributes in corpus.
type: array
items:
type: object
properties:
name:
type: string
description: The name of used attribute.
13_documents_get:
type: array
items:
type: object
properties:
id:
type: integer
description: Unique numeric `document ID` to identify individual documents
from which the corpus was created.
filename_display:
type: string
description: The name of the document.
parameters:
description: Parameters for plaintext extraction.
type: object
properties:
type:
type: string
description: 'File format. Possible formats: `.csv`, `.doc`, `.docx`,
`.htm`, `.html`, etc.'
encoding:
type: string
description: Encoding standard of the document. Usually `UTF-8`.
tmx_lang:
type: string
description: TMX (translation memory exchange). Language of document
used for parallel corpus creation.
tmx_struct:
type: string
description: Alignment structure to be used for multilingual documents;
`align` is the most used structure. It marks which sentence is in which
language and groups sentences with the same meaning into one segment.
unlegalese:
type: boolean
description: Convert `all-caps` text to `normal case`.
permutation:
type: array
items:
type: integer
description: Changing the order of columns (applies only to `type=vert`).
auto_paragraphs:
type: string
description: Automatically insert paragraph breaks (`<p>`) in place
of blank lines.
justext_stoplist:
type: string
description: The list of words in the specified language that are
considered unimportant from an NLP point of view.
tmx_untranslated:
type: string
description: Placeholder for empty segments in multilingual documents,
i.e. segments that have no counterpart in the second language of the
parallel corpus.
temporary:
type: boolean
description: Is document temporary or not.
word_count:
type: integer
description: Total number of `words` (tokens minus punctuation etc.) in
document.
vertical_progress:
type: integer
description: Progress of `vertical file` creation.
vertical_error:
description: Error that occurred while creating the vertical file. If the
creation was successful, the value is `Null`.
type: string
metadata:
type: object
description: Metadata of document. For example, additional `attributes
and values`.
14_documents_post:
type: array
items:
type: object
properties:
id:
type: integer
description: Unique numeric `document ID` to identify individual documents
from which the corpus was created.
filename_display:
type: string
description: The name of the document.
parameters:
description: Parameters for plaintext extraction.
type: object
properties:
type:
type: string
description: 'File format. Possible formats: `.csv`, `.doc`, `.docx`,
`.htm`, `.html`, etc.'
tmx_struct:
type: string
description: Alignment structure to be used for multilingual documents;
`align` is the most used structure. It marks which sentence is in which
language and groups sentences with the same meaning into one segment.
tmx_untranslated:
type: string
description: Placeholder for empty segments in multilingual documents,
i.e. segments that have no counterpart in the second language of the
parallel corpus.
unlegalese:
type: boolean
description: Convert `all-caps` text to `normal case`.
justext_stoplist:
type: string
description: The list of words in the specified language that are
considered unimportant from an NLP point of view.
tmx_lang:
type: string
description: TMX (translation memory exchange). Language of document
used for parallel corpus creation.
permutation:
type: array
items:
type: integer
description: Changing the order of columns (applies only to `type=vert`).
temporary:
type: boolean
description: Is document temporary or not.
word_count:
type: integer
description: Total number of `words` (tokens minus punctuation etc.) in
document.
vertical_progress:
type: integer
description: Progress of `vertical file` creation.
vertical_error:
description: Error that occurred while creating the vertical file. If the
creation was successful, the value is `Null`.
type: string
metadata:
description: Metadata of document. For example, additional `attributes
and values`.
type: object
15_doc_preview:
type: object
properties:
result:
type: string
description: Preview of a few lines (1 kB) from the file the corpus was
created from.
error:
type: string
description: Unexpected server error; `Null` if none.
16_rpc_expand_archive:
type: object
properties:
result:
type: integer
description: Returns fileset ID.
error:
type: string
description: Unexpected server error; `Null` if none.
57_not_found_404:
type: object
properties:
error:
type: string
example: PreloadedCorpus/UserCorpus/Document/Tagset/Grammar/GdexConf matching
query does not exist.
45_not_found_RPC:
type: object
properties:
result:
type: boolean
description: Result of a successfully finished request; otherwise `Null`.
error:
type: string
example: PreloadedCorpus/UserCorpus/Document/Tagset/Grammar/GdexConf/SiteLicence
matching query does not exist.
17_bad_request_RPC_1:
type: object
properties:
result:
type: boolean
description: Result of a successfully finished request; otherwise `Null`.
example: true
error:
type: string
description: 'Examples: `READ_ONLY`, `INVALID_CORPUS_IDS`. You do not have
the required permissions, or the IDs you entered are not correct.'
28_bad_request_RPC_8:
type: object
properties:
result:
type: boolean
description: Result of a successfully finished request; otherwise `Null`.
error:
type: string
description: 'Examples: `QUOTA_EXCEEDED`, `READ_ONLY`, `INVALID_CORPUS_IDS`,
`CORPUS_BUSY`.'
29_bad_request_RPC_9:
type: object
properties:
result:
type: boolean
description: Result of a successfully finished request; otherwise `Null`.
error:
type: string
description: 'Examples: `QUOTA_EXCEEDED`, `READ_ONLY`.'
30_bad_request_10:
type: object
properties:
error:
type: string
description: 'Examples: `QUOTA_EXCEEDED`, `READ_ONLY`, `CORPUS_BUSY`, `DAILY_TAGGING_EXCEEDED`,
`INVALID_URL`, `NO_DATA`.'
31_bad_request_11:
type: object
properties:
error:
type: string
description: 'Examples: `READ_ONLY`, `CORPUS_BUSY`, `INVALID_METADATA`.'
32_bad_request_12:
type: object
properties:
error:
type: string
description: 'Examples: `READ_ONLY`, `CORPUS_BUSY`.'
33_bad_request_13:
type: object
properties:
error:
type: string
description: 'Examples: `READ_ONLY`.'
34_bad_request_14:
type: object
properties:
error:
type: string
description: 'Examples: `READ_ONLY`, `QUOTA_EXCEEDED`, `DAILY_TAGGING_EXCEEDED`,
`CORPUS_BUSY`.'
40_bad_request_20:
type: object
properties:
error:
type: string
description: 'Examples: `READ_ONLY`, `QUOTA_EXCEEDED`, `DAILY_TAGGING_EXCEEDED`,
`NO_DATA`.'
27_forbidden_normal:
type: object
properties:
error:
type: string
description: 'Example: `Permission denied`. You do not have the required permissions
for the specified corpus, document, template or other resource. Permissions
include read, manage, edit, delete, superuser, etc.'
21_forbidden:
type: object
properties:
result:
type: boolean
description: Result of a successfully finished request; otherwise `Null`.
error:
type: string
description: 'Example: `Permission denied`. You do not have the required permissions
for the specified corpus, document, template or other resource. Permissions
include read, manage, edit, delete, superuser, etc.'
20_unauthorized:
type: object
properties:
error:
type: string
description: 'Example: `Unauthorized`. You need to authorize first using
your Sketch Engine API key.'
23_unauthorized_rpc:
type: object
properties:
result:
type: boolean
description: Result of a successfully finished request; otherwise `Null`.
error:
type: string
description: 'Example: `Unauthorized`. You need to authorize first using
your Sketch Engine API key.'
58_fileset:
type: object
properties:
progress:
type: integer
description: 'Fileset creation status: `0` if not started, `100` if finished
successfully, `-1` if failed, otherwise in progress. Example: downloading
content for corpus creation from the Internet.'
time_elapsed:
type: integer
description: Duration of action with filesets (in seconds).
error:
type: string
description: Description of the problem; `Null` if none.
id:
type: integer
description: Fileset ID.
name:
type: string
description: Fileset name. `Given by user (except the main one with ID =
0).`
word_count:
type: integer
description: Total number of `words` (tokens minus punctuation etc.) in
document.
web_crawl:
type: object
properties:
input_type:
type: string
description: '`Source URL` from where the words were downloaded/extracted:
website, documents...'
seed_words:
type: array
items:
type: string
description: A `List of words` according to which the web-crawler will
search and gather data from URLs containing them.
urls:
type: array
items:
type: string
description: A `List of URLs` to be searched by web-crawler.
site:
type: array
items:
type: string
description: Specific website to be searched by web-crawler.
data_downloaded:
type: integer
description: The amount of data `downloaded` by a web-crawler to create
corpus.
remaining_files_count:
type: integer
description: Counter of files found by web-crawler during crawling,
`waiting` to be processed.
processed_files_count:
type: integer
description: Counter of `already processed` files.
unprocessed_files_count:
type: integer
description: Counter of files which `cannot` be processed because of `invalid
content type`, `size`, `duplication`, etc.
invalid_content_types_count:
type: integer
description: Counter of files containing content like `navigation links`,
`advertisement`, `headers`, `footers`, etc.
unable_to_convert_count:
type: integer
description: Counter for files whose format cannot be converted to one
of the supported formats.
duplicate_count:
type: integer
description: Counter for files with repeating content.
time_elapsed:
type: integer
description: Duration of words gathering with web-crawler (in seconds).
average_file_processing_time:
type: integer
description: Average time to process single file (in seconds).
59_filesets_get_progress:
type: object
properties:
result:
type: object
properties:
progress:
type: integer
description: 'Fileset creation status: `0` if not started, `100` if
finished successfully, `-1` if failed, otherwise in progress. Example:
downloading content for corpus creation from the Internet.'
time_elapsed:
type: number
format: float
description: Duration of action with filesets (in seconds).
error:
type: string
description: Description of the problem if the action cannot be completed.
word_count:
type: integer
description: Number of words (tokens minus punctuation etc.) downloaded
by web-crawler.
error:
type: string
description: Unexpected server error.
61_fileset_creation:
type: object
properties:
progress:
type: integer
description: 'Fileset creation status: `0` if not started, `100` if finished
successfully, `-1` if failed, otherwise in progress. Example: downloading
content for corpus creation from the Internet.'
time_elapsed:
type: integer
description: Duration of action with filesets (in seconds).
error:
type: string
description: Description of the problem; `Null` if none.
id:
type: integer
description: Fileset ID.
name:
type: string
description: Fileset name. `Given by user (except the main one with ID =
0).`
word_count:
type: integer
description: Total number of `words` (tokens minus punctuation etc.) in
document.
web_crawl:
type: object
properties:
input_type:
type: string
description: '`Source URL` from where the words were downloaded/extracted:
website, documents...'
seed_words:
type: array
items:
type: string
description: A `List of words` according to which the web-crawler will
search and gather data from URLs containing them.
urls:
type: array
items:
type: string
description: A `List of URLs` to be searched by web-crawler.
site:
type: array
items:
type: string
description: Specific website to be searched by web-crawler.
data_downloaded:
type: integer
description: The amount of data `downloaded` by a web-crawler to create
corpus.
remaining_files_count:
type: integer
description: Counter of files found by web-crawler during crawling,
`waiting` to be processed.
processed_files_count:
type: integer
description: Counter of `already processed` files.
unprocessed_files_count:
type: integer
description: Counter of files which `cannot` be processed because of `invalid
content type`, `size`, `duplication`, etc.
invalid_content_types_count:
type: integer
description: Counter of files containing content like `navigation links`,
`advertisement`, `headers`, `footers`, etc.
unable_to_retrieve_count:
type: integer
description: Counter for files that could not be retrieved.
invalid_size_count:
type: integer
description: Counter for files whose size is above or below the defined
limits (`max_file_size`, `min_file_size`).
invalid_cleaned_size_count:
type: integer
description: Counter for files whose size after cleaning is above or below the
defined limits (`max_cleaned_file_size`, `min_cleaned_file_size`).
keywords_filter_applied_count:
type: integer
description: Number of times the keyword filter was applied.
unable_to_convert_count:
type: integer
description: Counter for files whose format cannot be converted to one
of the supported formats.
duplicate_count:
type: integer
description: Counter for files with repeating content.
time_elapsed:
type: integer
description: Duration of words gathering with web-crawler (in seconds).
average_file_processing_time:
type: integer
description: Average time to process single file (in seconds).
63_language:
type: object
properties:
id:
type: string
description: Language iso-code. `ISO 639-1`.
name:
type: string
description: Language name in `English`.
autonym:
type: string
description: Language name in that language.
default_tagset_id:
type: string
description: '`Tagset ID.` A tagset is a list of part-of-speech tags (POS tags)
for the specified language. By default, it is preselected to the most relevant
one. It can be changed for user corpora. The terms `tagset` and `templates`
are interchangeable.'
reference_corpus:
type: string
description: Default `reference` corpus.
has_term_grammar:
type: boolean
description: True if `term extraction` is supported.
script:
type: string
description: 'Used script. Example: `Latin`, `Cyrillic`, etc.'
69_rpc_style:
type: object
properties:
result:
type: boolean
description: Indicates whether the request finished successfully.
error:
type: string
description: 'Unexpected server error; `Null` if none. Example:
`QUOTA_EXCEEDED`.'
78_somefiles_post:
type: object
properties:
data:
type: object
properties:
id:
type: string
description: An alphanumeric `somefile ID`.
name:
type: string
description: Name of `multilingual file`.
file_type:
type: string
description: 'File type of multilingual file: `.tmx`, `.xlsx`, etc.'
owner_id:
type: integer
description: Unique numeric `owner ID` (usually you).
temporary:
type: boolean
description: Is document temporary or not.
encoding:
type: string
description: Encoding standard of the document. Usually `UTF-8`.
guessed_languages:
type: object
description: 'An object of automatically guessed languages of inserted
files during multilingual corpus creation. Maximum: `2`, because Sketch
Engine currently supports multilingual corpora of only 2 languages.'
properties:
language_1:
type: string
language_2:
type: string
79_template:
type: object
properties:
id:
type: string
description: Alphanumeric `template/tagset ID`. The terms `tagset` and `templates`
are interchangeable.
name:
type: string
description: Name of `template/tagset file`.
owner_id:
type: integer
description: Unique numeric `owner ID` (usually you). `Null` if the tagset/template
is preloaded.
owner_name:
type: string
description: Tagset/template `owner name` (usually you). `Null` if the tagset/template
is preloaded.
has_pipeline:
type: boolean
description: Vertical creation is supported. False for legacy templates.
has_tags:
type: boolean
description: Morphological tagging is supported.
has_lemmas:
type: boolean
description: Lemmatization is supported.
static_attributes:
type: array
description: A list of attributes which can appear in corpus.
items:
type: string
structures:
type: array
description: A list of used structures. Examples `, \\
Kwic:
type: array
items:
type: object
properties:
str:
type: string
example: dogs
coll:
type: integer
example: 1
Right:
type: array
items:
type: object
properties:
str:
type: string
example:
Links:
type: array
items:
type: string
linegroup:
type: string
example: _
linegroup_id:
type: integer
example: 0
fromp:
type: integer
example: 1
concsize:
type: integer
example: 12087
concordance_size_limit:
type: integer
example: 10000
Sort_idx:
type: array
items:
type: string
righttoleft:
type: boolean
example: false
Aligned_rtl:
type: array
items:
type: string
numofcolls:
type: integer
example: 0
finished:
type: integer
example: 1
fullsize:
type: integer
example: 12087
relsize:
type: number
example: 107.59
q:
type: array
items:
type: string
example: q[lc="dog" | lemma_lc="dog"]
Desc:
type: object
properties:
op:
type: string
example: Query
arg:
type: string
example: '[lc="dog" | lemma_lc="dog"]'
nicearg:
type: string
example: dog
rel:
type: number
example: 107.59
size:
type: integer
example: 12087
tourl:
type: string
example: q=q%5Blc%3D%22dog%22+%7C+lemma_lc%3D%22dog%22%5D
port:
type: integer
example: 0
gdex_scores:
type: array
items:
type: string
sc_strcts:
type: array
items:
type: array
items:
type: string
example: bncdoc
api_version:
type: string
example: 5.63.1
manatee_version:
type: string
example: 2.36.7-SkE-2.221
request:
type: object
properties:
concordance_query:
type: array
items:
type: object
properties:
queryselector:
type: string
example: iqueryrow
iquery:
type: string
example: dog
corpname:
type: string
example: preloaded/bnc2_tt21
kwicleftctx:
type: string
example: 100#
structs:
type: string
example: s,g
viewmode:
type: string
example: sen
attr_allpos:
type: string
example: all
fromp:
type: string
example: '1'
json:
type: string
example: '{"concordance_query":[{"queryselector":"iqueryrow","iquery":"dog"}]}'
kwicrightctx:
type: string
example: 100#
refs:
type: string
example: =bncdoc.alltyp
cup_hl:
type: string
example: q
attrs:
type: string
example: word
pagesize:
type: string
example: '20'
07_subcorp:
type: object
properties:
subcname:
type: string
example: Australian domain .au
SubcorpList:
type: array
items:
type: object
properties:
n:
type: string
example: Australian domain .au
name:
type: string
example: Australian domain .au
user:
type: integer
example: 0
api_version:
type: string
example: 5.63.1
manatee_version:
type: string
example: 2.36.7-SkE-2.221
request:
type: object
properties:
corpname:
type: string
example: preloaded/ententen13_tt2_1
08_extract_keywords:
type: object
properties:
keywords:
type: array
items:
type: object
properties:
item:
type: string
example: "galsk\xFD"
score:
type: number
example: 2411.16
frq1:
type: integer
example: 2
frq2:
type: integer
example: 512
rel_frq1:
type: number
example: 4291.8457
rel_frq2:
type: number
example: 0.78041
query:
type: string
example: "[lemma=\"galsk\xFD\"]"
reference_corpus_name:
type: string
example: Slovak Web 2011 (skTenTen11)
reference_corpus_size:
type: integer
example: 656067998
reference_subcorpus_size:
type: integer
example: 656067998
subcorpus_size:
type: integer
example: 466
corpus_size:
type: integer
example: 466
total:
type: integer
example: 175
totalfrq1:
type: integer
example: 466
totalfrq2:
type: integer
example: 250525622
wllimit:
type: integer
example: 1000
note:
type: string
example: ''
api_version:
type: string
example: 5.63.1
manatee_version:
type: string
example: 2.36.7-SkE-2.221
request:
type: object
properties:
alnum:
type: string
example: '1'
maxfreq:
type: string
example: '0'
minfreq:
type: string
example: '1'
wlpat:
type: string
example: .*
attr:
type: string
example: lemma
keywords:
type: string
example: '1'
ref_corpname:
type: string
example: preloaded/sktenten11_rft1
simple_n:
type: string
example: '1'
k_attr:
type: string
example: lemma
include_nonwords:
type: string
example: '0'
reldocf:
type: string
example: '0'
icase:
type: string
example: '1'
onealpha:
type: string
example: '1'
max_keywords:
type: string
example: '1000'
corpname:
type: string
example: user/matuskostka1/aaaaa_slovak
09_attr_vals:
type: object
properties:
query:
type: string
example: .*
description: The regular expression from query parameter `avpat`.
suggestions:
type: array
items:
type: string
example: "[ \"Cookson, Neil Andrew\", \u2026 ]"
description: Suggestions for avattr `bncdoc.author`.
no_more_values:
type: boolean
example: false
description: Indicates whether the `suggestions` list is complete.
api_version:
type: string
example: 5.63.1
manatee_version:
type: string
example: 2.36.7-SkE-2.221
request:
type: object
properties:
avpat:
type: string
example: .*
avmaxitems:
type: string
example: '15'
ajax:
type: string
example: '1'
corpname:
type: string
example: preloaded/bnc2_tt21
avfrom:
type: string
example: '0'
icase:
type: string
example: '1'
avattr:
type: string
example: bncdoc.author
10_collx:
type: object
properties:
Head:
type: array
items:
type: object
properties:
n:
type: string
example: Cooccurrence count
s:
type: string
example: f
style:
type: string
example: ' style="word-wrap: break-word; width: 5em;"'
Items:
type: array
items:
type: object
properties:
str:
type: string
example: Belvin
freq:
type: integer
example: 7
coll_freq:
type: integer
example: 5
Stats:
type: array
items:
type: object
properties:
s:
type: string
example: '2.64537'
n:
type: string
example: t
pfilter:
type: string
example: q=P-5+5+1+%5Bword%3D%22Belvin%22%5D
nfilter:
type: string
example: q=N-5+5+1+%5Bword%3D%22Belvin%22%5D
lastpage:
type: integer
example: 0
wllimit:
type: integer
example: 1000
concsize:
type: integer
example: 22685
Desc:
type: array
items:
type: object
properties:
op:
type: string
example: Query
arg:
type: string
example: '[lemma="test"]'
nicearg:
type: string
example: test
rel:
type: number
example: 202.22
size:
type: integer
example: 22685
tourl:
type: string
example: q=q%5Blemma%3D%22test%22%5D
api_version:
type: string
example: 5.63.1
manatee_version:
type: string
example: 2.36.7-SkE-2.223.6
request:
description: A summary of the parsed query parameters used in this
endpoint call. These parameters are all documented at the beginning of
every endpoint box (after you expand the endpoint).
type: object
properties:
csortfn:
type: string
example: m
corpname:
type: string
example: preloaded/bnc2
q:
type: string
example: q[lemma="test"]
11_freqml:
type: object
properties:
fcrit:
type: string
example: fcrit=word%2Fe+-1%3C0+lemma%2Fe+-1%3C0
FCrit:
type: array
items:
type: object
properties:
fcrit:
type: string
example: word/e -1<0 lemma/e -1<0
Blocks:
type: array
items:
type: object
properties:
Head:
type: array
items:
type: object
properties:
n:
type: string
example: word
s:
type: integer
example: 0
id:
type: string
example: word/e
total:
type: integer
example: 1700
totalfrq:
type: integer
example: 12087
Items:
type: array
items:
type: object
properties:
Word:
type: array
items:
type: object
properties:
n:
type: string
example: the
frq:
type: integer
example: 2621
rel:
type: integer
example: 0
reltt:
type: integer
example: 0
norm:
type: integer
example: 0
fbar:
type: integer
example: 301
relbar:
type: integer
example: 0
freqbar:
type: integer
example: 0
pfilter:
type: string
example: ;q=p-1%3C0+-1%3C0+0+%5Bword%3D%22the%22%5D;q=p-1%3C0+-1%3C0+0+%5Blemma%3D%22the%22%5D
nfilter:
type: string
pfilter_list:
type: array
items:
type: array
items:
type: string
example: p-1<0 -1<0 0 [word="the"]
poc:
type: number
example: 21.684454372466284
fpm:
type: number
example: 23.329771292938062
paging:
type: integer
example: 1
concsize:
type: integer
example: 12087
fullsize:
type: integer
example: 14297
Desc:
type: array
items:
type: object
properties:
op:
type: string
example: Query
arg:
type: string
example: '[lc="dog" | lemma_lc="dog"]'
nicearg:
type: string
example: dog
rel:
type: number
example: 107.59
size:
type: integer
example: 12087
tourl:
type: string
example: q=q%5Blc%3D%22dog%22+%7C+lemma_lc%3D%22dog%22%5D
numofcolls:
type: integer
example: 0
hitlen:
type: integer
example: 1
wllimit:
type: integer
example: 1000
lastpage:
type: integer
example: 0
ml:
type: boolean
example: true
api_version:
type: string
example: 5.63.12
manatee_version:
type: string
example: 2.36.7-SkE-2.223.6
request:
description: A summary of the parsed query parameters used in this
endpoint call. These parameters are all documented at the beginning of
every endpoint box.
type: object
properties:
concordance_query:
type: array
items:
type: object
properties:
queryselector:
type: string
example: iqueryrow
iquery:
type: string
example: dog
format:
type: string
example: json
fpage:
type: string
example: '1'
showpoc:
type: string
example: '1'
freqlevel:
type: string
example: '2'
group:
type: string
example: '1'
freq_sort:
type: string
example: freq
ml1ctx:
type: string
example: -1<0
showreltt:
type: string
example: '1'
ml2attr:
type: string
example: lemma
ml1attr:
type: string
example: word
ml2ctx:
type: string
example: -1<0
fmaxitems:
type: string
example: '5000'
corpname:
type: string
example: preloaded/bnc2_tt21
showrel:
type: string
example: '1'
12_struct_wordlist:
type: object
properties:
fcrit:
type: string
example: fcrit=lemma%2Fe+0+word%2Fe+0+lempos%2Fe+0
FCrit:
type: array
items:
type: object
properties:
fcrit:
type: string
example: lemma/e 0 word/e 0 lempos/e 0
Blocks:
type: array
items:
type: object
properties:
Head:
type: array
items:
type: object
properties:
n:
type: string
example: lemma
s:
type: integer
example: 0
id:
type: string
example: lemma/e
total:
type: integer
example: 77
totalfrq:
type: integer
example: 13931
Items:
type: array
items:
type: object
properties:
Word:
type: array
items:
type: object
properties:
n:
type: string
example: dog
frq:
type: integer
example: 6829
rel:
type: integer
example: 0
reltt:
type: integer
example: 0
norm:
type: integer
example: 0
fbar:
type: integer
example: 301
relbar:
type: integer
example: 0
freqbar:
type: integer
example: 0
pfilter:
type: string
example: ;q=p0+0+0+%5Blemma%3D%22dog%22%5D;q=p0+0+0+%5Bword%3D%22dog%22%5D;q=p0+0+0+%5Blempos%3D%22dog-n%22%5D
nfilter:
type: string
pfilter_list:
type: array
items:
type: array
items:
type: string
example: p0 0 0 [lemma="dog"]
poc:
type: number
example: 47.765265440302166
fpm:
type: number
example: 60.78558113676994
paging:
type: integer
example: 1
concsize:
type: integer
example: 14297
fullsize:
type: integer
example: 14297
Desc:
type: array
items:
type: object
properties:
op:
type: string
example: Query
arg:
type: string
example: '[lemma_lc="(dog.*)"]'
nicearg:
type: string
example: (dog.*)
rel:
type: number
example: 127.26
size:
type: integer
example: 14297
tourl:
type: string
example: q=q%5Blemma_lc%3D%22%28dog.%2A%29%22%5D
numofcolls:
type: integer
example: 0
hitlen:
type: integer
example: 1
wllimit:
type: integer
example: 1000
lastpage:
type: integer
example: 1
ml:
type: boolean
example: true
api_version:
type: string
example: 5.63.12
manatee_version:
type: string
example: 2.36.7-SkE-2.223.6
request:
description: A summary of the parsed query parameters used in this
endpoint call. These parameters are all documented at the beginning of
every endpoint box.
type: object
properties:
wlmaxfreq:
type: string
example: '0'
wlpage:
type: string
example: '1'
random:
type: string
example: '0'
wlstruct_attr1:
type: string
example: lemma
wltype:
type: string
example: struct_wordlist
fmaxitems:
type: string
example: '20000'
wlpat:
type: string
example: (dog.*)
wlnums:
type: string
example: frq
wlattr:
type: string
example: lemma_lc
wlicase:
type: string
example: '1'
wlmaxitems:
type: string
example: '20000'
wlstruct_attr3:
type: string
example: lempos
relfreq:
type: string
example: '1'
include_nonwords:
type: string
example: '1'
wlsort:
type: string
example: frq
wlstruct_attr2:
type: string
example: word
corpname:
type: string
example: preloaded/bnc2_tt21
reldocf:
type: string
example: '1'
wlminfreq:
type: string
example: '5'
13_freq_distrib:
type: object
properties:
dots:
type: array
items:
type: object
properties:
frq:
type: integer
example: 64
description: ''
pos:
type: integer
example: 0
description: ''
beg:
type: integer
example: 72053
description: ''
end:
type: integer
example: 2475660
description: ''
granularity:
type: integer
example: 50
description: ''
api_version:
type: string
example: 5.63.12
manatee_version:
type: string
example: 2.36.7-SkE-2.223.6
request:
type: object
properties:
concordance_query:
type: array
items:
type: object
properties:
queryselector:
type: string
example: lemmarow
lemma:
type: string
example: cat
lpos:
type: string
example: -n
qmcase:
type: integer
example: 0
structs:
type: string
example: s,g
fc_lemword_type:
type: string
example: all
attrs:
type: string
example: word
json:
type: string
example: '{"concordance_query":[{"queryselector":"lemmarow","lemma":"cat","lpos":"-n","qmcase":false}]}'
res:
type: string
example: '50'
fc_lemword_window_type:
type: string
example: both
normalize:
type: string
example: '0'
format:
type: string
example: json
attr_allpos:
type: string
example: all
fc_pos_type:
type: string
example: all
fc_pos_wsize:
type: string
example: '5'
refs:
type: string
example: =bncdoc.alltyp
viewmode:
type: string
example: sen
lpos:
type: string
example: -n
corpname:
type: string
example: preloaded/bnc2_tt21
default_attr:
type: string
example: lemma
fc_lemword_wsize:
type: string
example: '5'
fc_pos_window_type:
type: string
example: both
14_fullref:
type: object
properties:
Refs:
type: array
items:
type: object
properties:
name:
type: string
example: Token number
id:
type: string
example: '#'
val:
type: string
example: '6270887'
bncdoc_id:
type: string
example: J1C
bncdoc_author:
type: string
example: ===NONE===
bncdoc_year:
type: string
example: ===NONE===
bncdoc_title:
type: string
example: '[Leeds United e-mail list]'
bncdoc_info:
type: string
example: '[Leeds United e-mail list]. Sample containing about 41810 words
of unpublished miscellanea (domain: leisure)'
bncdoc_allava:
type: string
example: Ownership has not been claimed
bncdoc_alltim:
type: string
example: 1985-1993
bncdoc_alltyp:
type: string
example: Written miscellaneous
bncdoc_genre:
type: string
example: W_email
u_who:
type: string
example: ''
s_audio:
type: string
example: ===NONE===
api_version:
type: string
example: 5.63.12
manatee_version:
type: string
example: 2.36.7-SkE-2.223.6
request:
type: object
properties:
corpname:
type: string
example: preloaded/bnc2_tt21
pos:
type: string
example: '6270887'
15_textypes_with_norms:
type: object
properties:
Blocks:
type: array
items:
type: object
properties:
Line:
type: array
items:
type: object
properties:
name:
type: string
example: bncdoc.alltyp
label:
type: string
example: Text type
attr_doc:
type: string
example: ''
attr_doc_label:
type: string
example: ''
Values:
type: array
items:
type: object
properties:
v:
type: string
example: Spoken context-governed
xcnt:
type: integer
example: 757
Normlist:
type: array
items:
type: object
properties:
n:
type: string
example: freq
label:
type: string
example: Document counts
api_version:
type: string
example: 5.63.12
manatee_version:
type: string
example: 2.36.7-SkE-2.223.6
request:
type: object
properties:
corpname:
type: string
example: preloaded/bnc2_tt21
16_widectx:
type: object
properties:
wrapdetail:
type: string
example: `.
example: s
auto:
type: boolean
description: True, when documents are not compiled. Sketch Engine will align
them automatically.
example: true
corpus_ids:
type: array
items:
type: integer
description: A list of `Corpus ID` values of the multilingual corpus. The IDs
in the example do not exist.
example:
- 842464
- 842463
05_empty_request:
type: object
description: 'In this documentation, an empty request is mostly used with
the `RPC style` method, where no request content is needed (in most
cases). RPC-style endpoints focus on `performing` a single action (a procedure
or command), which is simpler than with REST-style endpoints, but RPC is not
as scalable as the REST style. RPC is mostly used with the HTTP methods GET
(to fetch information) and POST (for everything else); in the CA API it is
used with the POST method.'
07_corpus_update:
type: object
description: All parameters that can be changed in a user corpus. In a
corpus update `you don't have to use all parameters`, just the parameters
you change.
properties:
expert_mode:
type: boolean
description: Set to `True` if you are hard-core.
example: false
name:
type: string
description: Corpus name. `Given by user`.
example: Example corpus 2
info:
type: string
description: Additional info about corpus.
example: Example description of user corpus 2
document_order:
description: Can be set to enforce document order within the corpus.
type: array
items:
type: integer
lang_filter:
type: boolean
example: true
structures:
description: Available structures or tags in the corpus. Structures like
`s` (sentence), `g` (glue), `doc` (document).
type: array
items:
type: object
properties:
name:
type: string
description: 'Structure name. Example: `s`'
attributes:
description: A list of used attributes in corpus.
type: array
items:
type: object
properties:
name:
type: string
description: The name of used attribute.
file_structure:
type: string
description: The structure in which individual documents should be wrapped.
Usually `doc`.
example: doc
onion_structure:
type: string
description: The structure for deduplication. Usually `p` (paragraph), `doc`
or `Null` (no deduplication).
example: doc
docstructure:
type: string
description: Structure in which individual documents should be wrapped.
Usually `doc`.
example: doc
sketch_grammar_id:
type: string
description: "`Sketch grammar ID` (name of the sketch grammar file), used for\
\ sketch grammar querying. A sketch grammar is a series of rules written in\
\ the CQL query language that search for collocations in a text corpus and\
\ categorize them according to their grammatical relations. Example:\
\ `preloaded/english-penn_tt-3.3.wsdef.m4`."
example: preloaded/english-penn_tt-3.3.wsdef.m4
term_grammar_id:
type: string
description: '`Term grammar ID` (name of the term grammar file). A term grammar
tells Sketch Engine which words and phrases it should identify as terms.
Example: `/corpora/wsdef/english-penn_tt-terms-3.1.termdef.m4`.'
example: preloaded/english-penn_tt-terms-3.1.termdef.m4
09_doc_put_req:
type: object
properties:
filename_display:
type: string
description: Display name of the document.
id:
type: integer
description: Unique numeric `document ID` to identify individual documents.
inProgress:
type: boolean
description: Represents whether the currently edited document is in use.
isArchive:
type: boolean
description: Indicates whether the updated document is an archive such as .zip
(created with an archive manager).
metadata:
type: object
description: Metadata of document. For example, additional `attributes and
values`.
parameters:
type: object
description: Parameters for plaintext extraction.
properties:
encoding:
type: string
description: Encoding standard of the document. Usually, `UTF-8`.
justext_stoplist:
type: string
description: The list of words in a specified language that are unimportant
from an NLP point of view.
permutation:
type: array
items:
type: integer
description: Changing the order of columns (applies only to `type=vert`).
tmx_lang:
type: string
description: TMX (translation memory exchange). Language of document
used for parallel corpus creation.
tmx_struct:
type: string
description: Alignment structure to be used for multilingual documents;
`align` is the most commonly used structure. It delimits segments,
marking which sentence is in which language and putting sentences with
the same meaning into one segment.
tmx_untranslated:
type: string
description: Placeholder for empty segments in multilingual documents,
i.e. segments that have no counterpart in the second language of the
parallel corpus.
type:
type: string
description: File format (.csv, .doc, .docx, .htm, .html etc.).
unlegalese:
type: boolean
description: Convert `all-caps` text to `normal case`.
temporary:
type: boolean
description: Indicates whether the document is temporary.
word_count:
type: integer
description: Total number of `words` (tokens minus punctuation etc.) in
document.
vertical_progress:
type: integer
description: Progress of `vertical file` creation.
vertical_error:
type: string
description: An error that occurred while creating the vertical file. If the
creation was successful, the value is `Null`.
10_doc_preview:
type: object
properties:
auto_paragraphs:
type: string
description: Automatically insert paragraph breaks (`\`, ``, `