OBO-Edit 2.3.1 22:11:2021 06:31 sequence 1.2 David Sant definition term replaced by amino acid modification Alliance of Genome Resources Alliance of Genome Resources Gene Biotype Slim biosapiens database of genomic structural variation RNA modification SO feature annotation variant annotation term amino acid 1 letter code amino acid 3 letter code biosapiens protein feature ontology dbsnp variant terms DBVAR ensembl variant terms subset_property synonym_type_property consider has_alternative_id has_broad_synonym database_cross_reference has_exact_synonym has_narrow_synonym has_obo_format_version has_obo_namespace has_related_synonym has_scope has_synonym_type in_subset A geometric operator, specified in Egenhofer 1989. Two features meet if they share a junction on the sequence. X adjacent_to Y iff X and Y share a boundary but do not overlap. sequence adjacent_to adjacent_to A geometric operator, specified in Egenhofer 1989. Two features meet if they share a junction on the sequence. X adjacent_to Y iff X and Y share a boundary but do not overlap. PMID:20226267 SO:ke sequence associated_with This relationship is vague and up for discussion. associated_with B is complete_evidence_for_feature A if the extent (5' and 3' boundaries) and internal boundaries of B fully support the extent and internal boundaries of A. sequence complete_evidence_for_feature If A is a feature with multiple regions such as a multi-exon transcript, the supporting EST evidence is complete if each of the regions is supported by an equivalent region in B. Also there must be no extra regions in B that are not represented in A. This relationship was requested by jeltje on the SO term tracker. The thread for the discussion can be accessed via tracker ID:1917222. complete_evidence_for_feature B is complete_evidence_for_feature A if the extent (5' and 3' boundaries) and internal boundaries of B fully support the extent and internal boundaries of A. 
SO:ke X connects_on Y, Z, R iff whenever Z is on an R, X is adjacent to a Y and adjacent to a Z. kareneilbeck 2010-10-14T01:38:51Z sequence connects_on Example: A splice_junction connects_on exon, exon, mature_transcript. connects_on X connects_on Y, Z, R iff whenever Z is on an R, X is adjacent to a Y and adjacent to a Z. PMID:20226267 X contained_by Y iff X starts after start of Y and X ends before end of Y. kareneilbeck 2010-10-14T01:26:16Z sequence contained_by The inverse is contains. Example: intein contained_by immature_peptide_region. contained_by X contained_by Y iff X starts after start of Y and X ends before end of Y. PMID:20226267 The inverse of contained_by. kareneilbeck 2010-10-14T01:32:15Z sequence contains Example: pre_miRNA contains miRNA_loop. contains The inverse of contained_by. PMID:20226267 sequence derives_from derives_from X is disconnected_from Y iff it is not the case that X overlaps Y. kareneilbeck 2010-10-14T01:42:10Z sequence disconnected_from disconnected_from X is disconnected_from Y iff it is not the case that X overlaps Y. PMID:20226267 kareneilbeck 2009-08-19T02:19:45Z sequence edited_from edited_from kareneilbeck 2009-08-19T02:19:11Z sequence edited_to edited_to B is evidence_for_feature A if an instance of B supports the existence of A. sequence evidence_for_feature This relationship was requested by nlw on the SO term tracker. The thread for the discussion can be accessed via tracker ID:1917222. evidence_for_feature B is evidence_for_feature A if an instance of B supports the existence of A. SO:ke X is exemplar of Y if X is the best evidence for Y. sequence exemplar_of Tracker id: 2594157. exemplar_of X is exemplar of Y if X is the best evidence for Y. SO:ke X is finished_by Y if Y is part_of X and X and Y share a 3' boundary. kareneilbeck 2010-10-14T01:45:45Z sequence finished_by Example: CDS finished_by stop_codon. finished_by X is finished_by Y if Y is part_of X and X and Y share a 3' boundary. 
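The geometric relations defined above (adjacent_to, contained_by, contains, disconnected_from) can be illustrated on a single sequence. The sketch below is an illustration only, not part of SO: it assumes features are stored as half-open integer intervals [start, end), reads contained_by literally (strictly after the start, strictly before the end), and approximates overlaps ("some Z is contained_by both X and Y") as a non-empty intersection. The Feature class and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A feature on one sequence, stored as a half-open interval [start, end)."""
    name: str
    start: int
    end: int

    def __post_init__(self) -> None:
        if self.start >= self.end:
            raise ValueError(f"{self.name}: extent must be greater than zero")


def overlaps(x: Feature, y: Feature) -> bool:
    # Approximation: some Z lies inside both X and Y iff the intersection is non-empty.
    return max(x.start, y.start) < min(x.end, y.end)


def adjacent_to(x: Feature, y: Feature) -> bool:
    # X and Y share a boundary (a junction) but have no base in common.
    shares_boundary = x.end == y.start or y.end == x.start
    return shares_boundary and not overlaps(x, y)


def contained_by(x: Feature, y: Feature) -> bool:
    # Literal reading: X starts after Y starts and ends before Y ends.
    return x.start > y.start and x.end < y.end


def contains(y: Feature, x: Feature) -> bool:
    # The inverse of contained_by.
    return contained_by(x, y)


def disconnected_from(x: Feature, y: Feature) -> bool:
    return not overlaps(x, y)


exon1 = Feature("exon1", 0, 100)
intron = Feature("intron", 100, 250)
exon2 = Feature("exon2", 250, 400)
pre_mirna = Feature("pre_miRNA", 0, 80)
loop = Feature("miRNA_loop", 30, 45)

print(adjacent_to(exon1, intron))       # True
print(disconnected_from(exon1, exon2))  # True
print(contains(pre_mirna, loop))        # True
```

Under this strict reading, a feature that shares a boundary with its parent (for example a start codon at the very 5' end of a CDS) is part of the parent but not contained_by it; a looser reading would use <= and >= instead.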
PMID:20226267 X finishes Y if X is part_of Y and X and Y share a 3' or C terminal boundary. kareneilbeck 2010-10-14T02:17:53Z sequence finishes Example: stop_codon finishes CDS. finishes X finishes Y if X is part_of Y and X and Y share a 3' or C terminal boundary. PMID:20226267 X gained Y if X is a variant_of X' and Y part of X but not X'. kareneilbeck 2011-06-28T12:51:10Z sequence gained A relation with which to annotate the changes in a variant sequence with respect to a reference. For example a variant transcript may gain a stop codon not present in the reference sequence. gained X gained Y if X is a variant_of X' and Y part of X but not X'. SO:ke sequence genome_of genome_of kareneilbeck 2009-08-19T02:27:04Z sequence guided_by guided_by kareneilbeck 2009-08-19T02:27:24Z sequence guides guides X has_integral_part Y if and only if: X has_part Y and Y part_of X. kareneilbeck 2009-08-19T12:01:46Z sequence has_integral_part Example: mRNA has_integral_part CDS. has_integral_part X has_integral_part Y if and only if: X has_part Y and Y part_of X. http://precedings.nature.com/documents/3495/version/1 sequence has_origin has_origin Inverse of part_of. sequence has_part Example: operon has_part gene. has_part Inverse of part_of. http://precedings.nature.com/documents/3495/version/1 sequence has_quality The relationship between a feature and an attribute. has_quality sequence homologous_to homologous_to X integral_part_of Y if and only if: X part_of Y and Y has_part X. kareneilbeck 2009-08-19T12:03:28Z sequence integral_part_of Example: exon integral_part_of transcript. integral_part_of X integral_part_of Y if and only if: X part_of Y and Y has_part X. 
http://precedings.nature.com/documents/3495/version/1 R is_consecutive_sequence_of U iff every instance of R is equivalent to a collection of instances of U: u1, u2, ..., un, such that no pair ux, uy overlaps and every ux is adjacent to ux-1 and ux+1, with the exception of the initial and terminal u1 and un (which may be identical). kareneilbeck 2010-10-14T02:19:48Z sequence is_consecutive_sequence_of Example: region is_consecutive_sequence_of base. is_consecutive_sequence_of R is_consecutive_sequence_of U iff every instance of R is equivalent to a collection of instances of U: u1, u2, ..., un, such that no pair ux, uy overlaps and every ux is adjacent to ux-1 and ux+1, with the exception of the initial and terminal u1 and un (which may be identical). PMID:20226267 X lost Y if X is a variant_of X' and Y part of X' but not X. kareneilbeck 2011-06-28T12:53:16Z sequence lost A relation with which to annotate the changes in a variant sequence with respect to a reference. For example, a variant transcript may have lost a stop codon present in the reference sequence. lost X lost Y if X is a variant_of X' and Y part of X' but not X. SO:ke A maximally_overlaps X iff all parts of A (including A itself) overlap both A and Y. kareneilbeck 2010-10-14T01:34:48Z sequence maximally_overlaps Example: non_coding_region_of_exon maximally_overlaps the intersections of exon and UTR. maximally_overlaps A maximally_overlaps X iff all parts of A (including A itself) overlap both A and Y. PMID:20226267 sequence member_of A subtype of part_of. Inverse is collection_of. Winston, M, Chaffin, R, Herrmann: A taxonomy of part-whole relations. Cognitive Science 1987, 11:417-444. member_of A relationship between a pseudogenic feature and its functional ancestor. sequence non_functional_homolog_of non_functional_homolog_of A relationship between a pseudogenic feature and its functional ancestor. 
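Two relations in this stretch translate directly into small checks. is_consecutive_sequence_of asks whether units tile a region end to end with no gaps or overlaps, and gained/lost are set differences between the parts of a variant X and its reference X'. The sketch below is illustrative only: it assumes half-open integer intervals for regions and units, and represents parts as hashable (type, position) tuples; none of these names or encodings come from SO.

```python
from typing import Iterable


def is_consecutive_sequence_of(region: tuple[int, int],
                               units: Iterable[tuple[int, int]]) -> bool:
    """True if the half-open units tile the region end to end, with no gaps or overlaps."""
    ordered = sorted(units)
    if not ordered or any(start >= end for start, end in ordered):
        return False
    if ordered[0][0] != region[0] or ordered[-1][1] != region[1]:
        return False
    # Each unit must be adjacent to the next: the end of one is the start of the next.
    return all(a[1] == b[0] for a, b in zip(ordered, ordered[1:]))


def gained(variant_parts: set, reference_parts: set) -> set:
    # Parts of the variant X that are not parts of the reference X'.
    return variant_parts - reference_parts


def lost(variant_parts: set, reference_parts: set) -> set:
    # Parts of the reference X' that are not parts of the variant X.
    return reference_parts - variant_parts


bases = [(0, 1), (1, 2), (2, 3)]
print(is_consecutive_sequence_of((0, 3), bases))             # True
print(is_consecutive_sequence_of((0, 3), [(0, 1), (2, 3)]))  # False: gap at 1..2

reference = {("start_codon", 0), ("stop_codon", 297)}
variant = {("start_codon", 0), ("stop_codon", 150)}
print(gained(variant, reference))  # {('stop_codon', 150)}
print(lost(variant, reference))    # {('stop_codon', 297)}
```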
SO:ke sequence orthologous_to orthologous_to X overlaps Y iff there exists some Z such that Z contained_by X and Z contained_by Y. kareneilbeck 2010-10-14T01:33:15Z sequence overlaps Example: coding_exon overlaps CDS. overlaps X overlaps Y iff there exists some Z such that Z contained_by X and Z contained_by Y. PMID:20226267 sequence paralogous_to paralogous_to X part_of Y if X is a subregion of Y. sequence part_of Example: amino_acid part_of polypeptide. part_of X part_of Y if X is a subregion of Y. http://precedings.nature.com/documents/3495/version/1 B is partial_evidence_for_feature A if the extent of B supports part of, but not all of, A. sequence partial_evidence_for_feature partial_evidence_for_feature B is partial_evidence_for_feature A if the extent of B supports part of, but not all of, A. SO:ke sequence position_of position_of Inverse of processed_into. kareneilbeck 2009-08-19T12:14:00Z sequence processed_from Example: miRNA processed_from miRNA_primary_transcript. processed_from Inverse of processed_into. http://precedings.nature.com/documents/3495/version/1 X is processed_into Y if a region X is modified to create Y. kareneilbeck 2009-08-19T12:15:02Z sequence processed_into Example: miRNA_primary_transcript processed_into miRNA. processed_into X is processed_into Y if a region X is modified to create Y. http://precedings.nature.com/documents/3495/version/1 kareneilbeck 2009-08-19T02:21:03Z sequence recombined_from recombined_from kareneilbeck 2009-08-19T02:20:07Z sequence recombined_to recombined_to sequence sequence_of sequence_of sequence similar_to similar_to X is started_by Y if Y is part_of X and X and Y share a 5' boundary. kareneilbeck 2010-10-14T01:43:55Z sequence started_by Example: CDS started_by start_codon. started_by X is started_by Y if Y is part_of X and X and Y share a 5' boundary. PMID:20226267 X starts Y if X is part_of Y and X and Y share a 5' or N-terminal boundary. 
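complete_evidence_for_feature and partial_evidence_for_feature can be contrasted by comparing the regions of a multi-region feature A (for example a transcript's exons) with the aligned regions of evidence B (for example an EST). The sketch below assumes half-open (start, end) intervals, treats "equivalent region" as an exact coordinate match, and uses "B overlaps A somewhere but is not complete" as a loose stand-in for partial support; the function names and the "none" label are hypothetical.

```python
Interval = tuple[int, int]  # half-open [start, end) on the genome (assumed representation)


def _overlap(a: Interval, b: Interval) -> bool:
    return max(a[0], b[0]) < min(a[1], b[1])


def evidence_type(feature: list[Interval], evidence: list[Interval]) -> str:
    """Classify evidence B against a multi-region feature A."""
    if sorted(evidence) == sorted(feature):
        # Every region of A has an equivalent region in B, and B has no extra regions.
        return "complete_evidence_for_feature"
    if any(_overlap(a, b) for a in feature for b in evidence):
        # B supports some, but not all, of A (loose proxy for partial support).
        return "partial_evidence_for_feature"
    return "none"


transcript = [(100, 200), (300, 400), (500, 650)]
print(evidence_type(transcript, [(100, 200), (300, 400), (500, 650)]))  # complete_evidence_for_feature
print(evidence_type(transcript, [(300, 400), (500, 650)]))              # partial_evidence_for_feature
print(evidence_type(transcript, [(700, 800)]))                          # none
```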
kareneilbeck 2010-10-14T01:47:53Z sequence starts Example: start_codon starts CDS. starts X starts Y if X is part_of Y and X and Y share a 5' or N-terminal boundary. PMID:20226267 kareneilbeck 2009-08-19T02:22:14Z sequence trans_spliced_from trans_spliced_from kareneilbeck 2009-08-19T02:22:00Z sequence trans_spliced_to trans_spliced_to X is transcribed_from Y if X is synthesized from template Y. kareneilbeck 2009-08-19T12:05:39Z sequence transcribed_from Example: primary_transcript transcribed_from gene. transcribed_from X is transcribed_from Y if X is synthesized from template Y. http://precedings.nature.com/documents/3495/version/1 Inverse of transcribed_from. kareneilbeck 2009-08-19T12:08:24Z sequence transcribed_to Example: gene transcribed_to primary_transcript. transcribed_to Inverse of transcribed_from. http://precedings.nature.com/documents/3495/version/1 Inverse of translation_of. kareneilbeck 2009-08-19T12:11:53Z sequence translates_to Example: codon translates_to amino_acid. translates_to Inverse of translation_of. http://precedings.nature.com/documents/3495/version/1 X is translation_of Y if Y is translated by a ribosome to create X. kareneilbeck 2009-08-19T12:09:59Z sequence translation_of Example: polypeptide translation_of CDS. translation_of X is translation_of Y if Y is translated by a ribosome to create X. http://precedings.nature.com/documents/3495/version/1 A' is a variant (mutation) of A iff every instance of A' is either an immediate mutation of some instance of A, or there is a chain of immediate mutation processes linking A' to some instance of A. sequence variant_of Added to SO during the immunology workshop, June 2007. This relationship was approved by Barry Smith. variant_of A' is a variant (mutation) of A iff every instance of A' is either an immediate mutation of some instance of A, or there is a chain of immediate mutation processes linking A' to some instance of A. 
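starts, finishes and their inverses started_by and finished_by combine part_of with a shared 5' or 3' boundary, so strand matters. The sketch below is illustrative: it assumes 0-based half-open coordinates on one sequence, a "+"/"-" strand field, and reads part_of ("X is a subregion of Y") as non-strict containment on the same strand. The Region class and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"

    def five_prime(self) -> int:
        # On the minus strand the 5' boundary is the higher coordinate.
        return self.start if self.strand == "+" else self.end

    def three_prime(self) -> int:
        return self.end if self.strand == "+" else self.start


def part_of(x: Region, y: Region) -> bool:
    # X is a subregion of Y; boundaries may coincide.
    return x.strand == y.strand and y.start <= x.start and x.end <= y.end


def starts(x: Region, y: Region) -> bool:
    return part_of(x, y) and x.five_prime() == y.five_prime()


def finishes(x: Region, y: Region) -> bool:
    return part_of(x, y) and x.three_prime() == y.three_prime()


def started_by(y: Region, x: Region) -> bool:
    return starts(x, y)


def finished_by(y: Region, x: Region) -> bool:
    return finishes(x, y)


cds = Region(100, 400, "+")
start_codon = Region(100, 103, "+")
stop_codon = Region(397, 400, "+")
print(starts(start_codon, cds), finishes(stop_codon, cds))  # True True
print(started_by(cds, start_codon))                          # True
```

On the minus strand the same codon coordinates swap roles: Region(397, 400, "-") starts Region(100, 400, "-") because its 5' boundary is the higher coordinate.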
SO:immuno_workshop sequence SO:0000000 Sequence_Ontology true A sequence_feature with an extent greater than zero. A nucleotide region is composed of bases and a polypeptide region is composed of amino acids. sequence sequence SO:0000001 region A sequence_feature with an extent greater than zero. A nucleotide region is composed of bases and a polypeptide region is composed of amino acids. SO:ke A folded sequence. INSDC_feature:misc_structure sequence secondary structure sequence SO:0000002 sequence_secondary_structure A folded sequence. SO:ke G-quartets are unusual nucleic acid structures consisting of a planar arrangement where each guanine is hydrogen bonded by Hoogsteen pairing to another guanine in the quartet. http://en.wikipedia.org/wiki/G-quadruplex G quartet G tetrad G-quadruplex G-quartet G-tetrad G_quadruplex guanine tetrad sequence SO:0000003 G_quartet G-quartets are unusual nucleic acid structures consisting of a planar arrangement where each guanine is hydrogen bonded by Hoogsteen pairing to another guanine in the quartet. http://www.ncbi.nlm.nih.gov/pubmed/7919797?dopt=Abstract http://en.wikipedia.org/wiki/G-quadruplex wiki A coding exon that is not the most 3-prime or the most 5-prime in a given transcript. 
interior coding exon sequence SO:0000004 interior_coding_exon The many tandem repeats (identical or related) of a short basic repeating unit; many have a base composition or other property different from the genome average that allows them to be separated from the bulk (main band) genomic DNA. INSDC_feature:repeat_region http://en.wikipedia.org/wiki/Satellite_DNA INSDC_qualifier:satellite satellite DNA sequence SO:0000005 satellite_DNA The many tandem repeats (identical or related) of a short basic repeating unit; many have a base composition or other property different from the genome average that allows them to be separated from the bulk (main band) genomic DNA. http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/Satellite_DNA wiki A region amplified by a PCR reaction. http://en.wikipedia.org/wiki/RAPD PCR product sequence amplicon SO:0000006 This term is mapped to MGED. This term is now located in OBI, with the following ID OBI_0000406. PCR_product A region amplified by a PCR reaction. SO:ke http://en.wikipedia.org/wiki/RAPD wiki One of a pair of sequencing reads in which the two members of the pair are related by originating at either end of a clone insert. mate pair read-pair sequence SO:0000007 read_pair One of a pair of sequencing reads in which the two members of the pair are related by originating at either end of a clone insert. SO:ls sequence SO:0000008 gene_sensu_your_favorite_organism true sequence SO:0000009 gene_class true A gene which, when transcribed, can be translated into a protein. protein-coding sequence SO:0000010 protein_coding A gene which can be transcribed, but will not be translated into a protein. non protein-coding sequence SO:0000011 non_protein_coding The primary transcript of any one of several small cytoplasmic RNA molecules present in the cytoplasm and sometimes nucleus of a Eukaryote. 
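The satellite_DNA definition describes many tandem copies of a short repeating unit. As a toy illustration of what "tandem repeat" means, the sketch below scans for the longest run of exact, back-to-back copies of a unit up to a chosen length. It is an assumption-laden simplification: real satellite arrays are long and imperfect, and dedicated repeat finders tolerate mismatches. The function name and the max_unit parameter are hypothetical.

```python
def longest_tandem_run(seq: str, max_unit: int = 6) -> tuple[str, int, int]:
    """Return (unit, start, copies) for the longest run of exact tandem copies.

    Only exact repeats of units up to max_unit bases are considered; ties keep
    the shortest unit found first. Returns ("", 0, 0) if nothing repeats.
    """
    seq = seq.upper()
    best = ("", 0, 0)
    best_len = 0
    for k in range(1, max_unit + 1):
        for start in range(len(seq) - k + 1):
            unit = seq[start:start + k]
            copies = 1
            # Extend while the next k bases repeat the unit exactly.
            while seq[start + copies * k:start + (copies + 1) * k] == unit:
                copies += 1
            if copies >= 2 and copies * k > best_len:
                best, best_len = (unit, start, copies), copies * k
    return best


print(longest_tandem_run("GGCACACACACATT"))  # ('CA', 2, 5)
```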
scRNA primary transcript scRNA transcript small cytoplasmic RNA transcript sequence small cytoplasmic RNA small_cytoplasmic_RNA SO:0000012 scRNA_primary_transcript The primary transcript of any one of several small cytoplasmic RNA molecules present in the cytoplasm and sometimes nucleus of a Eukaryote. http://www.ebi.ac.uk/embl/WebFeat/align/scRNA_s.html A small non coding RNA sequence, present in the cytoplasm. INSDC_feature:ncRNA INSDC_qualifier:scRNA small cytoplasmic RNA sequence SO:0000013 scRNA A small non coding RNA sequence, present in the cytoplasm. SO:ke A sequence element characteristic of some RNA polymerase II promoters required for the correct positioning of the polymerase for the start of transcription. Overlaps the TSS. The mammalian consensus sequence is YYAN(T|A)YY; the Drosophila consensus sequence is TCA(G|T)t(T|C). In each the A is at position +1 with respect to the TSS. Functionally similar to the TATA box element. INR motif initiator initiator motif sequence DMp2 SO:0000014 Binds TAF1, TAF2. INR_motif A sequence element characteristic of some RNA polymerase II promoters required for the correct positioning of the polymerase for the start of transcription. Overlaps the TSS. The mammalian consensus sequence is YYAN(T|A)YY; the Drosophila consensus sequence is TCA(G|T)t(T|C). In each the A is at position +1 with respect to the TSS. Functionally similar to the TATA box element. PMID:12651739 PMID:16858867 A sequence element characteristic of some RNA polymerase II promoters; Positioned from +28 to +32 with respect to the TSS (+1). Experimental results suggest that the DPE acts in conjunction with the INR_motif to provide a binding site for TFIID in the absence of a TATA box to mediate transcription of TATA-less promoters. Consensus sequence (A|G)G(A|T)(C|T)(G|A|C). DPE motif downstream core promoter element CRWMGCGWKCGCTTS sequence SO:0000015 Binds TAF6, TAF9. 
DPE_motif A sequence element characteristic of some RNA polymerase II promoters; Positioned from +28 to +32 with respect to the TSS (+1). Experimental results suggest that the DPE acts in conjunction with the INR_motif to provide a binding site for TFIID in the absence of a TATA box to mediate transcription of TATA-less promoters. Consensus sequence (A|G)G(A|T)(C|T)(G|A|C). PMID:12515390 PMID:12537576 PMID:12651739 PMID:16858867 A sequence element characteristic of some RNA polymerase II promoters, located immediately upstream of some TATA box elements at -37 to -32 with respect to the TSS (+1). Consensus sequence is (G|C)(G|C)(G|A)CGCC. Binds TFIIB. B-recognition element BRE motif BREu motif transcription factor B-recognition element sequence BREu TFIIB recognition element SO:0000016 Binds TFIIB. BREu_motif A sequence element characteristic of some RNA polymerase II promoters, located immediately upstream of some TATA box elements at -37 to -32 with respect to the TSS (+1). Consensus sequence is (G|C)(G|C)(G|A)CGCC. Binds TFIIB. PMID:12651739 PMID:16858867 A sequence element characteristic of the promoters of snRNA genes transcribed by RNA polymerase II or by RNA polymerase III. Located between -45 and -60 relative to the TSS. The human PSE_motif consensus sequence is TCACCNTNA(C|G)TNAAAAG(T|G). The basal transcription factor, snRNA-activating protein complex (SNAPc), binds the PSE_motif and is required for the transcription of both RNA polymerase II and III transcribed small-nuclear RNA genes. PSE motif proximal sequence element sequence SO:0000017 PSE_motif A sequence element characteristic of the promoters of snRNA genes transcribed by RNA polymerase II or by RNA polymerase III. Located between -45 and -60 relative to the TSS. The human PSE_motif consensus sequence is TCACCNTNA(C|G)TNAAAAG(T|G). 
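The promoter motif consensus strings above (INR, DPE, BREu, PSE) mix IUPAC ambiguity codes such as Y and N with explicit alternatives such as (T|A), which maps naturally onto regular expressions. The sketch below converts a consensus string into a Python regex and reports every, possibly overlapping, match on one strand. It is illustrative only: it searches the given strand, ignores positional constraints relative to the TSS, and the helper names are hypothetical.

```python
import re

# IUPAC nucleotide codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus: str) -> str:
    # Ambiguity letters become character classes; "(", "|" and ")" pass through unchanged.
    return "".join(IUPAC.get(ch, ch) for ch in consensus.upper())


def find_motif(seq: str, consensus: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for every, possibly overlapping, match."""
    pattern = re.compile(f"(?=({consensus_to_regex(consensus)}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


print(find_motif("GGGTCAGTCTGGG", "YYAN(T|A)YY"))          # [(3, 'TCAGTCT')]
print(find_motif("TTGGACGCCTT", "(G|C)(G|C)(G|A)CGCC"))   # [(2, 'GGACGCC')]
```

The lowercase letter in the Drosophila INR consensus TCA(G|T)t(T|C) is treated like its uppercase form here, which is an assumption about its meaning.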
The basal transcription factor, snRNA-activating protein complex (SNAPc), binds the PSE_motif and is required for the transcription of both RNA polymerase II and III transcribed small-nuclear RNA genes. PMID:11390411 PMID:12621023 PMID:12651739 PMID:23166507 PMID:8339931 A group of loci that can be grouped in a linear order representing the different degrees of linkage among the genes concerned. http://en.wikipedia.org/wiki/Linkage_group linkage group sequence SO:0000018 linkage_group A group of loci that can be grouped in a linear order representing the different degrees of linkage among the genes concerned. ISBN:038752046 http://en.wikipedia.org/wiki/Linkage_group wiki true A region of double stranded RNA where the bases do not conform to WC base pairing. The loop is closed on both sides by canonical base pairing. If the interruption to base pairing occurs on one strand only, it is known as a bulge. RNA internal loop sequence SO:0000020 RNA_internal_loop A region of double stranded RNA where the bases do not conform to WC base pairing. The loop is closed on both sides by canonical base pairing. If the interruption to base pairing occurs on one strand only, it is known as a bulge. SO:ke An internal RNA loop where one of the strands includes more bases than the corresponding region on the other strand. asymmetric RNA internal loop sequence SO:0000021 asymmetric_RNA_internal_loop An internal RNA loop where one of the strands includes more bases than the corresponding region on the other strand. SO:ke A region forming a motif, composed of adenines, where the minor groove edges are inserted into the minor groove of another helix. A minor RNA motif sequence SO:0000022 A_minor_RNA_motif A region forming a motif, composed of adenines, where the minor groove edges are inserted into the minor groove of another helix. SO:ke The kink turn (K-turn) is an RNA structural motif that creates a sharp (~120 degree) bend between two continuous helices. 
http://en.wikipedia.org/wiki/K-turn K turn RNA motif K-turn kink turn kink-turn motif sequence SO:0000023 K_turn_RNA_motif The kink turn (K-turn) is an RNA structural motif that creates a sharp (~120 degree) bend between two continuous helices. SO:ke http://en.wikipedia.org/wiki/K-turn wiki A loop in ribosomal RNA containing the sites of attack for ricin and sarcin. sarcin like RNA motif sarcin/ricin RNA domain sarcin/ricin domain sarcin/ricin loop sequence SO:0000024 sarcin_like_RNA_motif A loop in ribosomal RNA containing the sites of attack for ricin and sarcin. http://www.ncbi.nlm.nih.gov/pubmed/7897662 An internal RNA loop where the extent of the loop on both strands is the same size. A-minor RNA motif sequence SO:0000025 symmetric_RNA_internal_loop An internal RNA loop where the extent of the loop on both strands is the same size. SO:ke RNA junction loop sequence SO:0000026 RNA_junction_loop RNA hook turn hook-turn motif sequence hook turn SO:0000027 RNA_hook_turn Two bases paired opposite each other by hydrogen bonds creating a secondary structure. http://en.wikipedia.org/wiki/Base_pair base pair sequence SO:0000028 base_pair http://en.wikipedia.org/wiki/Base_pair wiki The canonical base pair, where two bases interact via WC edges, with glycosidic bonds oriented cis relative to the axis of orientation. WC base pair Watson Crick base pair Watson-Crick pair canonical base pair sequence Watson-Crick base pair SO:0000029 WC_base_pair The canonical base pair, where two bases interact via WC edges, with glycosidic bonds oriented cis relative to the axis of orientation. PMID:12177293 A type of non-canonical base-pairing. sugar edge base pair sequence SO:0000030 sugar_edge_base_pair A type of non-canonical base-pairing. PMID:12177293 DNA or RNA molecules that have been selected from random pools based on their ability to bind other molecules. 
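The internal-loop terms above are distinguished by how many unpaired bases sit on each strand between two canonical base pairs: equal counts give a symmetric loop, unequal counts an asymmetric loop, and unpaired bases on one strand only give a bulge. The sketch below encodes that rule and a canonical Watson-Crick pair check for RNA. It is illustrative only; the function names are hypothetical, "bulge" is used as a plain label, and a full secondary-structure parser is out of scope.

```python
def classify_internal_loop(unpaired_5p_strand: int, unpaired_3p_strand: int) -> str:
    """Name a helix interruption from the number of unpaired bases on each strand.

    Both sides of the interruption are assumed to be closed by canonical base pairs.
    """
    if unpaired_5p_strand < 0 or unpaired_3p_strand < 0:
        raise ValueError("counts must be non-negative")
    if unpaired_5p_strand == 0 and unpaired_3p_strand == 0:
        raise ValueError("no unpaired bases: the helix is continuous")
    if unpaired_5p_strand == 0 or unpaired_3p_strand == 0:
        return "bulge"  # interruption on one strand only
    if unpaired_5p_strand == unpaired_3p_strand:
        return "symmetric_RNA_internal_loop"
    return "asymmetric_RNA_internal_loop"


WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def is_wc_base_pair(a: str, b: str) -> bool:
    # Canonical Watson-Crick pairs in RNA.
    return (a.upper(), b.upper()) in WC_PAIRS


print(classify_internal_loop(2, 2))  # symmetric_RNA_internal_loop
print(classify_internal_loop(1, 3))  # asymmetric_RNA_internal_loop
print(classify_internal_loop(0, 2))  # bulge
print(is_wc_base_pair("g", "C"))     # True
```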
http://en.wikipedia.org/wiki/Aptamer sequence SO:0000031 aptamer DNA or RNA molecules that have been selected from random pools based on their ability to bind other molecules. http://aptamer.icmb.utexas.edu http://en.wikipedia.org/wiki/Aptamer wiki DNA molecules that have been selected from random pools based on their ability to bind other molecules. DNA aptamer sequence SO:0000032 DNA_aptamer DNA molecules that have been selected from random pools based on their ability to bind other molecules. http://aptamer.icmb.utexas.edu RNA molecules that have been selected from random pools based on their ability to bind other molecules. RNA aptamer sequence SO:0000033 RNA_aptamer RNA molecules that have been selected from random pools based on their ability to bind other molecules. http://aptamer.icmb.utexas.edu Morpholino oligos are synthesized from four different Morpholino subunits, each of which contains one of the four genetic bases (A, C, G, T) linked to a 6-membered morpholine ring. Eighteen to 25 subunits of these four subunit types are joined in a specific order by non-ionic phosphorodiamidate intersubunit linkages to give a Morpholino. morphant morpholino morpholino oligo sequence SO:0000034 morpholino_oligo Morpholino oligos are synthesized from four different Morpholino subunits, each of which contains one of the four genetic bases (A, C, G, T) linked to a 6-membered morpholine ring. Eighteen to 25 subunits of these four subunit types are joined in a specific order by non-ionic phosphorodiamidate intersubunit linkages to give a Morpholino. http://www.gene-tools.com/ A riboswitch is a part of an mRNA that can act as a direct sensor of small molecules to control their own expression. A riboswitch is a cis element in the 5' end of an mRNA, that acts as a direct sensor of metabolites. 
INSDC_feature:regulatory http://en.wikipedia.org/wiki/Riboswitch INSDC_qualifier:riboswitch riboswitch RNA sequence SO:0000035 riboswitch A riboswitch is a part of an mRNA that can act as a direct sensor of small molecules to control their own expression. A riboswitch is a cis element in the 5' end of an mRNA, that acts as a direct sensor of metabolites. PMID:2820954 http://en.wikipedia.org/wiki/Riboswitch wiki A DNA region that is required for the binding of chromatin to the nuclear matrix. INSDC_feature:regulatory http://en.wikipedia.org/wiki/Matrix_attachment_site INSDC_qualifier:matrix_attachment_region MAR S/MAR SMAR matrix association region matrix attachment region matrix attachment site nuclear matrix association region nuclear matrix attachment site scaffold attachment site scaffold matrix attachment region sequence S/MAR element SO:0000036 matrix_attachment_site A DNA region that is required for the binding of chromatin to the nuclear matrix. SO:ma http://en.wikipedia.org/wiki/Matrix_attachment_site wiki A DNA region that includes DNAse hypersensitive sites located near a gene that confers the high-level, position-independent, and copy number-dependent expression to that gene. INSDC_feature:regulatory http://en.wikipedia.org/wiki/Locus_control_region INSDC_qualifier:locus_control_region LCR locus control region sequence locus control element SO:0000037 Definition updated Nov 10 2020, Colin Logie from GREEKC helped us realize that LCRs can also be located 3' to a gene. locus_control_region A DNA region that includes DNAse hypersensitive sites located near a gene that confers the high-level, position-independent, and copy number-dependent expression to that gene. SO:ma http://en.wikipedia.org/wiki/Locus_control_region wiki A collection of match parts. sequence SO:0000038 match_set true A collection of match parts. SO:ke A part of a match, for example an hsp from blast is a match_part. 
match part sequence SO:0000039 match_part A part of a match, for example an hsp from blast is a match_part. SO:ke A clone of a DNA region of a genome. genomic clone sequence SO:0000040 genomic_clone A clone of a DNA region of a genome. SO:ma An operation that can be applied to a sequence, that results in a change. sequence operation sequence SO:0000041 sequence_operation true An operation that can be applied to a sequence, that results in a change. SO:ke An attribute of a pseudogene (SO:0000336). pseudogene attribute sequence SO:0000042 pseudogene_attribute true An attribute of a pseudogene (SO:0000336). SO:ma A pseudogene created via retrotransposition of the mRNA of a functional protein-coding parent gene, followed by accumulation of deleterious mutations; it lacks introns and promoters and often includes a polyA tail. INSDC_feature:gene INSDC_qualifier:processed processed pseudogene retropseudogene sequence R psi G pseudogene by reverse transcription SO:0000043 Please note the synonym R psi M uses the spelled-out form of the Greek letter. processed_pseudogene A pseudogene created via retrotransposition of the mRNA of a functional protein-coding parent gene, followed by accumulation of deleterious mutations; it lacks introns and promoters and often includes a polyA tail. GENCODE:http://www.gencodegenes.org/gencode_biotypes.html A pseudogene caused by unequal crossing over at recombination. pseudogene by unequal crossing over sequence SO:0000044 pseudogene_by_unequal_crossing_over A pseudogene caused by unequal crossing over at recombination. SO:ke To remove a subsection of sequence. sequence SO:0000045 delete true To remove a subsection of sequence. SO:ke To insert a subsection of sequence. sequence SO:0000046 insert true To insert a subsection of sequence. SO:ke To invert a subsection of sequence. sequence SO:0000047 invert true To invert a subsection of sequence. SO:ke To substitute a subsection of sequence for another. 
sequence SO:0000048 substitute true To substitute a subsection of sequence for another. SO:ke To translocate a subsection of sequence. sequence SO:0000049 translocate true To translocate a subsection of sequence. SO:ke A part of a gene, that has no other route in the ontology back to region. This concept is necessary for logical inference as these parts must have the properties of region. It also allows us to associate all the parts of genes with a gene. sequence SO:0000050 gene_part true A part of a gene, that has no other route in the ontology back to region. This concept is necessary for logical inference as these parts must have the properties of region. It also allows us to associate all the parts of genes with a gene. SO:ke A DNA sequence used experimentally to detect the presence or absence of a complementary nucleic acid. http://en.wikipedia.org/wiki/Hybridization_probe sequence SO:0000051 probe A DNA sequence used experimentally to detect the presence or absence of a complementary nucleic acid. SO:ma http://en.wikipedia.org/wiki/Hybridization_probe wiki sequence assortment-derived_deficiency SO:0000052 assortment_derived_deficiency true A sequence_variant_effect which changes the regulatory region of a gene. SO:0001556 sequence variant affecting regulatory region sequence mutation affecting regulatory region SO:0000053 OBSOLETE: This term was deleted as it conflated more than one term. The alteration is separate from the effect. sequence_variant_affecting_regulatory_region true A sequence_variant_effect which changes the regulatory region of a gene. SO:ke A kind of chromosome variation where the chromosome complement is not an exact multiple of the haploid number. http://en.wikipedia.org/wiki/Aneuploid sequence SO:0000054 aneuploid A kind of chromosome variation where the chromosome complement is not an exact multiple of the haploid number. 
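The sequence_operation terms (delete, insert, invert, substitute, translocate) each describe an edit to a subsection of a sequence. The sketch below applies them to a Python string, assuming half-open [start, end) coordinates, and assuming that inverting double-stranded DNA means replacing the segment with its reverse complement. translocate is modelled here as a move within one sequence; between-chromosome translocation is not covered. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def delete(seq: str, start: int, end: int) -> str:
    # Remove the half-open subsection [start, end).
    return seq[:start] + seq[end:]


def insert(seq: str, pos: int, segment: str) -> str:
    return seq[:pos] + segment + seq[pos:]


def invert(seq: str, start: int, end: int) -> str:
    # Assumed DNA inversion: the segment is replaced by its reverse complement.
    segment = seq[start:end].translate(COMPLEMENT)[::-1]
    return seq[:start] + segment + seq[end:]


def substitute(seq: str, start: int, end: int, segment: str) -> str:
    return seq[:start] + segment + seq[end:]


def translocate(seq: str, start: int, end: int, pos: int) -> str:
    # Move [start, end) so it begins at pos in the sequence left after removing it.
    return insert(delete(seq, start, end), pos, seq[start:end])


s = "AAACCCGGG"
print(delete(s, 3, 6))           # AAAGGG
print(insert(s, 3, "TT"))        # AAATTCCCGGG
print(invert(s, 0, 4))           # GTTTCCGGG
print(substitute(s, 3, 6, "T"))  # AAATGGG
print(translocate(s, 0, 3, 3))   # CCCAAAGGG
```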
SO:ke http://en.wikipedia.org/wiki/Aneuploid wiki A kind of chromosome variation where the chromosome complement is not an exact multiple of the haploid number as extra chromosomes are present. http://en.wikipedia.org/wiki/Hyperploid sequence SO:0000055 hyperploid A kind of chromosome variation where the chromosome complement is not an exact multiple of the haploid number as extra chromosomes are present. SO:ke http://en.wikipedia.org/wiki/Hyperploid wiki A kind of chromosome variation where the chromosome complement is not an exact multiple of the haploid number as some chromosomes are missing. http://en.wikipedia.org/wiki/Hypoploid sequence SO:0000056 hypoploid A kind of chromosome variation where the chromosome complement is not an exact multiple of the haploid number as some chromosomes are missing. SO:ke http://en.wikipedia.org/wiki/Hypoploid wiki A regulatory element of an operon to which activators or repressors bind, thereby affecting transcription of genes in that operon. http://en.wikipedia.org/wiki/Operator_(biology)#Operator operator segment sequence SO:0000057 Moved to transcriptional_cis_regulatory_region (SO:0001055) from gene_group_regulatory_region (SO:0000752) on 11 Feb 2021 when SO:0000752 was merged into SO:0001055. See GitHub Issue #529. operator A regulatory element of an operon to which activators or repressors bind, thereby affecting transcription of genes in that operon. SO:ma http://en.wikipedia.org/wiki/Operator_(biology)#Operator wiki sequence assortment-derived_aneuploid SO:0000058 assortment_derived_aneuploid true A binding site that, in the nucleotide molecule, interacts selectively and non-covalently with polypeptide residues of a nuclease. nuclease binding site sequence SO:0000059 nuclease_binding_site A binding site that, in the nucleotide molecule, interacts selectively and non-covalently with polypeptide residues of a nuclease. SO:cb One arm of a compound chromosome. 
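The aneuploid, hyperploid and hypoploid definitions reduce to arithmetic on the chromosome count and the haploid number n. The sketch below assumes the comparison is made against an expected euploid count (diploid, 2n, by default) and uses "euploid" as a plain label for exact multiples of n, which are not aneuploid. The function name and the expected_ploidy parameter are hypothetical.

```python
def classify_complement(chromosome_count: int, haploid_number: int,
                        expected_ploidy: int = 2) -> str:
    """Classify a chromosome complement relative to the haploid number n."""
    if chromosome_count <= 0 or haploid_number <= 0:
        raise ValueError("counts must be positive")
    if chromosome_count % haploid_number == 0:
        return "euploid"  # an exact multiple of n, so not aneuploid
    expected = expected_ploidy * haploid_number
    # Aneuploid: extra chromosomes make it hyperploid, missing ones hypoploid.
    return "hyperploid" if chromosome_count > expected else "hypoploid"


# Human haploid number n = 23.
print(classify_complement(47, 23))  # hyperploid (for example, a trisomy)
print(classify_complement(45, 23))  # hypoploid (for example, a monosomy)
print(classify_complement(69, 23))  # euploid (triploid is an exact multiple)
```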
compound chromosome arm sequence SO:0000060 FLAG - this term should probably be a part_of rather than an is_a. compound_chromosome_arm A binding site that, in the nucleotide molecule, interacts selectively and non-covalently with polypeptide residues of a restriction enzyme. restriction endonuclease binding site restriction enzyme binding site sequence SO:0000061 A region of a molecule that binds to a restriction enzyme. restriction_enzyme_binding_site A binding site that, in the nucleotide molecule, interacts selectively and non-covalently with polypeptide residues of a restriction enzyme. SO:cb An intrachromosomal transposition whereby one of the four broken ends loses a segment before re-joining. deficient intrachromosomal transposition sequence SO:0000062 deficient_intrachromosomal_transposition An intrachromosomal transposition whereby one of the four broken ends loses a segment before re-joining. FB:reference_manual An interchromosomal transposition whereby one of the four broken ends loses a segment before re-joining. deficient interchromosomal transposition sequence SO:0000063 deficient_interchromosomal_transposition An interchromosomal transposition whereby one of the four broken ends loses a segment before re-joining. SO:ke sequence SO:0000064 This class of attributes was added by MA to allow the broad description of genes based on qualities of the transcript(s). A product of SO meeting 2004. gene_by_transcript_attribute true A chromosome structure variation whereby an arm exists as an individual chromosome element. free chromosome arm sequence SO:0000065 free_chromosome_arm A chromosome structure variation whereby an arm exists as an individual chromosome element.
SO:ke sequence SO:0000066 gene_by_polyadenylation_attribute true gene to gene feature sequence SO:0000067 gene_to_gene_feature An attribute describing a gene that has a sequence that overlaps the sequence of another gene. sequence SO:0000068 overlapping An attribute describing a gene that has a sequence that overlaps the sequence of another gene. SO:ke An attribute to describe a gene when it is located within the intron of another gene. inside intron sequence SO:0000069 inside_intron An attribute to describe a gene when it is located within the intron of another gene. SO:ke An attribute to describe a gene when it is located within the intron of another gene and on the opposite strand. inside intron antiparallel sequence SO:0000070 inside_intron_antiparallel An attribute to describe a gene when it is located within the intron of another gene and on the opposite strand. SO:ke An attribute to describe a gene when it is located within the intron of another gene and on the same strand. inside intron parallel sequence SO:0000071 inside_intron_parallel An attribute to describe a gene when it is located within the intron of another gene and on the same strand. SO:ke sequence SO:0000072 end_overlapping_gene true An attribute to describe a gene when the five prime region overlaps with another gene's 3' region. five prime-three prime overlap sequence SO:0000073 five_prime_three_prime_overlap An attribute to describe a gene when the five prime region overlaps with another gene's 3' region. SO:ke An attribute to describe a gene when the five prime region overlaps with another gene's five prime region. five prime-five prime overlap sequence SO:0000074 five_prime_five_prime_overlap An attribute to describe a gene when the five prime region overlaps with another gene's five prime region. SO:ke An attribute to describe a gene when the 3' region overlaps with another gene's 3' region. 
three prime-three prime overlap sequence SO:0000075 three_prime_three_prime_overlap An attribute to describe a gene when the 3' region overlaps with another gene's 3' region. SO:ke An attribute to describe a gene when the 3' region overlaps with another gene's 5' region. 5' 3' overlap three prime five prime overlap sequence SO:0000076 three_prime_five_prime_overlap An attribute to describe a gene when the 3' region overlaps with another gene's 5' region. SO:ke A region of sequence that is complementary to a sequence of messenger RNA. http://en.wikipedia.org/wiki/Antisense sequence SO:0000077 antisense A region of sequence that is complementary to a sequence of messenger RNA. SO:ke http://en.wikipedia.org/wiki/Antisense wiki A transcript that is polycistronic. polycistronic transcript sequence SO:0000078 polycistronic_transcript A transcript that is polycistronic. SO:xp A transcript that is dicistronic. dicistronic transcript sequence SO:0000079 dicistronic_transcript A transcript that is dicistronic. SO:ke A gene that is a member of an operon, which is a set of genes transcribed together as a unit. operon member sequence SO:0000080 operon_member gene array member sequence SO:0000081 gene_array_member sequence SO:0000082 processed_transcript_attribute true DNA belonging to the macronuclei of ciliates. macronuclear sequence sequence SO:0000083 macronuclear_sequence DNA belonging to the micronuclei of a cell. micronuclear sequence sequence SO:0000084 micronuclear_sequence sequence SO:0000085 gene_by_genome_location true sequence SO:0000086 gene_by_organelle_of_genome true A gene from nuclear sequence. http://en.wikipedia.org/wiki/Nuclear_gene nuclear gene sequence SO:0000087 nuclear_gene A gene from nuclear sequence. SO:xp http://en.wikipedia.org/wiki/Nuclear_gene wiki A gene located in mitochondrial sequence. http://en.wikipedia.org/wiki/Mitochondrial_gene mitochondrial gene mt gene sequence SO:0000088 mt_gene A gene located in mitochondrial sequence.
SO:xp http://en.wikipedia.org/wiki/Mitochondrial_gene wiki A gene located in kinetoplast sequence. kinetoplast gene sequence SO:0000089 kinetoplast_gene A gene located in kinetoplast sequence. SO:xp A gene from plastid sequence. plastid gene sequence SO:0000090 plastid_gene A gene from plastid sequence. SO:xp A gene from apicoplast sequence. apicoplast gene sequence SO:0000091 apicoplast_gene A gene from apicoplast sequence. SO:xp A gene from chloroplast sequence. chloroplast gene ct gene sequence SO:0000092 ct_gene A gene from chloroplast sequence. SO:xp A gene from chromoplast_sequence. chromoplast gene sequence SO:0000093 chromoplast_gene A gene from chromoplast_sequence. SO:xp A gene from cyanelle sequence. cyanelle gene sequence SO:0000094 cyanelle_gene A gene from cyanelle sequence. SO:xp A plastid gene from leucoplast sequence. leucoplast gene sequence SO:0000095 leucoplast_gene A plastid gene from leucoplast sequence. SO:xp A gene from proplastid sequence. proplastid gene sequence SO:0000096 proplastid_gene A gene from proplastid sequence. SO:ke A gene from nucleomorph sequence. nucleomorph gene sequence SO:0000097 nucleomorph_gene A gene from nucleomorph sequence. SO:xp A gene from plasmid sequence. plasmid gene sequence SO:0000098 plasmid_gene A gene from plasmid sequence. SO:xp A gene from proviral sequence. proviral gene sequence SO:0000099 proviral_gene A gene from proviral sequence. SO:xp A proviral gene with origin endogenous retrovirus. endogenous retroviral gene sequence SO:0000100 endogenous_retroviral_gene A proviral gene with origin endogenous retrovirus. SO:xp A transposon or insertion sequence. An element that can insert in a variety of DNA sequences. http://en.wikipedia.org/wiki/Transposable_element transposable element transposon sequence SO:0000101 transposable_element A transposon or insertion sequence. An element that can insert in a variety of DNA sequences. 
http://www.sci.sdsu.edu/~smaloy/Glossary/T.html http://en.wikipedia.org/wiki/Transposable_element wiki A match to an EST or cDNA sequence. expressed sequence match sequence SO:0000102 expressed_sequence_match A match to an EST or cDNA sequence. SO:ke The end of the clone insert. clone insert end sequence SO:0000103 clone_insert_end The end of the clone insert. SO:ke A sequence of amino acids linked by peptide bonds which may lack appreciable tertiary structure and may not be liable to irreversible denaturation. SO:0000358 http://en.wikipedia.org/wiki/Polypeptide protein sequence SO:0000104 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. The term 'protein' was merged with 'polypeptide'. Although 'protein' was a sequence_attribute and therefore meant to describe the quality rather than an actual feature, it was being used erroneously. It is replaced by 'peptidyl' as the polymer attribute. polypeptide A sequence of amino acids linked by peptide bonds which may lack appreciable tertiary structure and may not be liable to irreversible denaturation. SO:ma http://en.wikipedia.org/wiki/Polypeptide wiki A region of the chromosome between the centromere and the telomere. Human chromosomes have two arms, the p arm (short) and the q arm (long) which are separated from each other by the centromere. chromosome arm sequence SO:0000105 chromosome_arm A region of the chromosome between the centromere and the telomere. Human chromosomes have two arms, the p arm (short) and the q arm (long) which are separated from each other by the centromere. http://www.medterms.com/script/main/art.asp?articlekey=5152 sequence SO:0000106 non_capped_primary_transcript true A single stranded oligo used for polymerase chain reaction. sequencing primer sequence SO:0000107 sequencing_primer An mRNA with a frameshift. frameshifted mRNA mRNA with frameshift sequence SO:0000108 mRNA_with_frameshift An mRNA with a frameshift. 
SO:xp A sequence_variant is a non exact copy of a sequence_feature or genome exhibiting one or more sequence_alteration. sequence mutation SO:0000109 sequence_variant_obs true A sequence_variant is a non exact copy of a sequence_feature or genome exhibiting one or more sequence_alteration. SO:ke Any extent of continuous biological sequence. INSDC_feature:misc_feature INSDC_note:other INSDC_note:sequence_feature located_sequence_feature sequence feature sequence located sequence feature SO:0000110 sequence_feature Any extent of continuous biological sequence. LAMHDI:mb SO:ke A gene encoded within a transposable element. For example gag, int, env and pol are the transposable element genes of the TY element in yeast. transposable element gene sequence SO:0000111 transposable_element_gene A gene encoded within a transposable element. For example gag, int, env and pol are the transposable element genes of the TY element in yeast. SO:ke An oligo to which new deoxyribonucleotides can be added by DNA polymerase. http://en.wikipedia.org/wiki/Primer_(molecular_biology) DNA primer primer oligonucleotide primer polynucleotide primer sequence sequence SO:0000112 primer An oligo to which new deoxyribonucleotides can be added by DNA polymerase. SO:ke http://en.wikipedia.org/wiki/Primer_(molecular_biology) wiki A viral sequence which has integrated into a host genome. proviral region sequence proviral sequence SO:0000113 proviral_region A viral sequence which has integrated into a host genome. SO:ke A methylated deoxy-cytosine. methylated C methylated cytosine methylated cytosine base methylated cytosine residue methylated_C sequence SO:0000114 methylated_cytosine A methylated deoxy-cytosine. SO:ke sequence SO:0000115 transcript_feature true An attribute describing a sequence that is modified by editing. sequence SO:0000116 edited An attribute describing a sequence that is modified by editing. 
SO:ke sequence SO:0000117 transcript_with_readthrough_stop_codon true A transcript with a translational frameshift. transcript with translational frameshift sequence SO:0000118 transcript_with_translational_frameshift A transcript with a translational frameshift. SO:xp An attribute to describe a sequence that is regulated. sequence SO:0000119 regulated An attribute to describe a sequence that is regulated. SO:ke A primary transcript that, at least in part, encodes one or more proteins. protein coding primary transcript sequence pre mRNA SO:0000120 May contain introns. protein_coding_primary_transcript A primary transcript that, at least in part, encodes one or more proteins. SO:ke A single stranded oligo used for polymerase chain reaction. DNA forward primer forward DNA primer forward primer forward primer oligo forward primer oligonucleotide forward primer polynucleotide forward primer sequence sequence SO:0000121 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. forward_primer A single stranded oligo used for polymerase chain reaction. http://mged.sourceforge.net/ontologies/MGEDontology.php A folded RNA sequence. RNA sequence secondary structure sequence SO:0000122 RNA_sequence_secondary_structure A folded RNA sequence. SO:ke An attribute describing a gene that is regulated at transcription. transcriptionally regulated sequence SO:0000123 By:<protein_id>. transcriptionally_regulated An attribute describing a gene that is regulated at transcription. SO:ma Expressed in relatively constant amounts without regard to cellular environmental conditions such as the concentration of a particular substrate. transcriptionally constitutive sequence SO:0000124 transcriptionally_constitutive Expressed in relatively constant amounts without regard to cellular environmental conditions such as the concentration of a particular substrate. SO:ke An inducer molecule is required for transcription to occur. 
transcriptionally induced sequence SO:0000125 transcriptionally_induced An inducer molecule is required for transcription to occur. SO:ke A repressor molecule is required for transcription to stop. transcriptionally repressed sequence SO:0000126 transcriptionally_repressed A repressor molecule is required for transcription to stop. SO:ke A gene that is silenced. silenced gene sequence SO:0000127 silenced_gene A gene that is silenced. SO:xp A gene that is silenced by DNA modification. gene silenced by DNA modification sequence SO:0000128 gene_silenced_by_DNA_modification A gene that is silenced by DNA modification. SO:xp A gene that is silenced by DNA methylation. gene silenced by DNA methylation methylation-silenced gene sequence SO:0000129 gene_silenced_by_DNA_methylation A gene that is silenced by DNA methylation. SO:xp An attribute describing a gene that is regulated after it has been translated. post translationally regulated post-translationally regulated sequence SO:0000130 post_translationally_regulated An attribute describing a gene that is regulated after it has been translated. SO:ke An attribute describing a gene that is regulated as it is translated. translationally regulated sequence SO:0000131 translationally_regulated An attribute describing a gene that is regulated as it is translated. SO:ke A single stranded oligo used for polymerase chain reaction. DNA reverse primer reverse DNA primer reverse primer reverse primer oligo reverse primer oligonucleotide reverse primer sequence sequence SO:0000132 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. reverse_primer A single stranded oligo used for polymerase chain reaction. http://mged.sourceforge.net/ontologies/MGEDontology.php This attribute describes a gene where heritable changes other than those in the DNA sequence occur. 
These changes include: modification to the DNA (such as DNA methylation, the covalent modification of cytosine), and post-translational modification of histones. epigenetically modified sequence SO:0000133 epigenetically_modified This attribute describes a gene where heritable changes other than those in the DNA sequence occur. These changes include: modification to the DNA (such as DNA methylation, the covalent modification of cytosine), and post-translational modification of histones. SO:ke Imprinted genes are epigenetically modified genes that are expressed monoallelically according to their parent of origin. imprinted http://en.wikipedia.org/wiki/Genomic_imprinting genomically imprinted sequence SO:0000134 genomically_imprinted Imprinted genes are epigenetically modified genes that are expressed monoallelically according to their parent of origin. SO:ke http://en.wikipedia.org/wiki/Genomic_imprinting wiki The maternal copy of the gene is modified, rendering it transcriptionally silent. maternally imprinted sequence SO:0000135 maternally_imprinted The maternal copy of the gene is modified, rendering it transcriptionally silent. SO:ke The paternal copy of the gene is modified, rendering it transcriptionally silent. paternally imprinted sequence SO:0000136 paternally_imprinted The paternal copy of the gene is modified, rendering it transcriptionally silent. SO:ke Allelic exclusion is a process occurring in diploid organisms, where a gene is inactivated and not expressed in that cell. allelically excluded sequence SO:0000137 Examples are X-inactivation and immunoglobulin formation. allelically_excluded Allelic exclusion is a process occurring in diploid organisms, where a gene is inactivated and not expressed in that cell. SO:ke An epigenetically modified gene, rearranged at the DNA level. gene rearranged at DNA level sequence SO:0000138 gene_rearranged_at_DNA_level An epigenetically modified gene, rearranged at the DNA level.
SO:xp Region in mRNA where ribosome assembles. INSDC_feature:regulatory INSDC_qualifier:ribosome_binding_site ribosome entry site sequence SO:0000139 ribosome_entry_site Region in mRNA where ribosome assembles. SO:ke A sequence segment located within the five prime end of an mRNA that causes premature termination of translation. INSDC_feature:regulatory http://en.wikipedia.org/wiki/Attenuator INSDC_qualifier:attenuator attenuator sequence sequence SO:0000140 attenuator A sequence segment located within the five prime end of an mRNA that causes premature termination of translation. SO:as http://en.wikipedia.org/wiki/Attenuator wiki The sequence of DNA located at the end of the transcript that causes RNA polymerase to terminate transcription. INSDC_feature:regulatory http://en.wikipedia.org/wiki/Terminator_(genetics) INSDC_qualifier:terminator terminator sequence sequence SO:0000141 Moved from transcription_regulatory_region (SO:0001679) to transcriptional_cis_regulatory_region (SO:0001055) by Dave Sant on Feb 11, 2021 when transcription_regulatory_region was merged into transcriptional_cis_regulatory_region to be consistent with GO and reduce redundancy as part of the GREEKC consortium. See GitHub Issue #527. terminator The sequence of DNA located at the end of the transcript that causes RNA polymerase to terminate transcription. http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/Terminator_(genetics) wiki A folded DNA sequence. DNA sequence secondary structure sequence SO:0000142 DNA_sequence_secondary_structure A folded DNA sequence. SO:ke A region of known length which may be used to manufacture a longer region. assembly component sequence SO:0000143 assembly_component A region of known length which may be used to manufacture a longer region. SO:ke sequence SO:0000144 primary_transcript_attribute true A codon that has been redefined at translation.
The redefinition may be the result of translational bypass, translational frameshifting or stop codon readthrough. recoded codon sequence SO:0000145 recoded_codon A codon that has been redefined at translation. The redefinition may be the result of translational bypass, translational frameshifting or stop codon readthrough. SO:xp An attribute describing when a sequence, usually an mRNA, is capped by the addition of a modified guanine nucleotide at the 5' end. sequence SO:0000146 capped An attribute describing when a sequence, usually an mRNA, is capped by the addition of a modified guanine nucleotide at the 5' end. SO:ke A region of the transcript sequence within a gene which is not removed from the primary RNA transcript by RNA splicing. http://en.wikipedia.org/wiki/Exon INSDC_feature:exon sequence SO:0000147 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. exon A region of the transcript sequence within a gene which is not removed from the primary RNA transcript by RNA splicing. SO:ke http://en.wikipedia.org/wiki/Exon wiki One or more contigs that have been ordered and oriented using end-read information. Contains gaps that are filled with N's. sequence scaffold SO:0000148 supercontig One or more contigs that have been ordered and oriented using end-read information. Contains gaps that are filled with N's. SO:ls A contiguous sequence derived from sequence assembly. Has no gaps, but may contain N's from unavailable bases. http://en.wikipedia.org/wiki/Contig sequence SO:0000149 contig A contiguous sequence derived from sequence assembly. Has no gaps, but may contain N's from unavailable bases. SO:ls http://en.wikipedia.org/wiki/Contig wiki A sequence obtained from a single sequencing experiment. Typically a read is produced when a base calling program interprets information from a chromatogram trace file produced from a sequencing machine. sequence SO:0000150 read A sequence obtained from a single sequencing experiment.
Typically a read is produced when a base calling program interprets information from a chromatogram trace file produced from a sequencing machine. SO:rd A piece of DNA that has been inserted in a vector so that it can be propagated in a host bacterium or some other organism. http://en.wikipedia.org/wiki/Clone_(genetics) sequence SO:0000151 clone A piece of DNA that has been inserted in a vector so that it can be propagated in a host bacterium or some other organism. SO:ke http://en.wikipedia.org/wiki/Clone_(genetics) wiki Yeast Artificial Chromosome, a vector constructed from the telomeric, centromeric, and replication origin sequences needed for replication in yeast cells. yeast artificial chromosome sequence SO:0000152 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. YAC Yeast Artificial Chromosome, a vector constructed from the telomeric, centromeric, and replication origin sequences needed for replication in yeast cells. SO:ma Bacterial Artificial Chromosome, a cloning vector that can be propagated as mini-chromosomes in a bacterial host. bacterial artificial chromosome sequence SO:0000153 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. BAC Bacterial Artificial Chromosome, a cloning vector that can be propagated as mini-chromosomes in a bacterial host. SO:ma P1-derived artificial chromosomes are DNA constructs that are derived from the DNA of P1 bacteriophage. They can carry large amounts (about 100-300 kilobases) of other sequences for a variety of bioengineering purposes. It is one type of vector used to clone DNA fragments (100- to 300-kb insert size; average, 150 kb) in Escherichia coli cells. http://en.wikipedia.org/wiki/P1-derived_artificial_chromosome P1 P1 artificial chromosome sequence SO:0000154 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. Drosophila melanogaster PACs carry an average insert size of 80 kb.
The library represents a 6-fold coverage of the genome. PAC P1-derived artificial chromosomes are DNA constructs that are derived from the DNA of P1 bacteriophage. They can carry large amounts (about 100-300 kilobases) of other sequences for a variety of bioengineering purposes. It is one type of vector used to clone DNA fragments (100- to 300-kb insert size; average, 150 kb) in Escherichia coli cells. http://en.wikipedia.org/wiki/P1-derived_artificial_chromosome http://en.wikipedia.org/wiki/P1-derived_artificial_chromosome wiki A self-replicating (using the host's cellular machinery), often circular nucleic acid molecule that is distinct from a chromosome in the organism. plasmid sequence sequence SO:0000155 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. plasmid A self-replicating (using the host's cellular machinery), often circular nucleic acid molecule that is distinct from a chromosome in the organism. SO:ma A cloning vector that is a hybrid of lambda phage and a plasmid, which can be propagated as a plasmid or packaged as a phage, since it retains the lambda cos sites. http://en.wikipedia.org/wiki/Cosmid cosmid vector sequence SO:0000156 Paper: Evans GA et al. High efficiency vectors for cosmid microcloning and genomic analysis. Gene 1989; 79(1):9-20. This term is mapped to MGED. Do not obsolete without consulting MGED ontology. cosmid A cloning vector that is a hybrid of lambda phage and a plasmid, which can be propagated as a plasmid or packaged as a phage, since it retains the lambda cos sites. SO:ma http://en.wikipedia.org/wiki/Cosmid wiki A plasmid which carries within its sequence a bacteriophage replication origin. When the host bacterium is infected with "helper" phage, a phagemid is replicated along with the phage DNA and packaged into phage capsids. http://en.wikipedia.org/wiki/Phagemid sequence phagemid vector SO:0000157 phagemid A plasmid which carries within its sequence a bacteriophage replication origin.
When the host bacterium is infected with "helper" phage, a phagemid is replicated along with the phage DNA and packaged into phage capsids. SO:ma http://en.wikipedia.org/wiki/Phagemid wiki A cloning vector that utilizes the E. coli F factor. http://en.wikipedia.org/wiki/Fosmid sequence fosmid vector SO:0000158 Birren BW et al. A human chromosome 22 fosmid resource: mapping and analysis of 96 clones. Genomics 1996. fosmid A cloning vector that utilizes the E. coli F factor. SO:ma http://en.wikipedia.org/wiki/Fosmid wiki The point at which one or more contiguous nucleotides were excised. SO:1000033 http://en.wikipedia.org/wiki/Nucleotide_deletion loinc:LA6692-3 deleted_sequence nucleotide deletion nucleotide_deletion sequence SO:0000159 deletion The point at which one or more contiguous nucleotides were excised. SO:ke http://en.wikipedia.org/wiki/Nucleotide_deletion wiki loinc:LA6692-3 Deletion A linear clone derived from lambda bacteriophage. The genes involved in the lysogenic pathway are removed from the viral DNA. Up to 25 kb of foreign DNA can then be inserted into the lambda genome. sequence SO:0000160 lambda_clone true A linear clone derived from lambda bacteriophage. The genes involved in the lysogenic pathway are removed from the viral DNA. Up to 25 kb of foreign DNA can then be inserted into the lambda genome. ISBN:0-1767-2380-8 A modified base in which adenine has been methylated. methylated A methylated adenine methylated adenine base methylated adenine residue methylated_A sequence SO:0000161 methylated_adenine A modified base in which adenine has been methylated. SO:ke Consensus region of primary transcript bordering junction of splicing. A region that overlaps exactly 2 bases and is adjacent_to splice_junction. http://en.wikipedia.org/wiki/Splice_site splice site sequence SO:0000162 With spliceosomal introns, the splice sites bind the spliceosomal machinery.
splice_site Consensus region of primary transcript bordering junction of splicing. A region that overlaps exactly 2 bases and is adjacent_to splice_junction. SO:cjm SO:ke http://en.wikipedia.org/wiki/Splice_site wiki Intronic 2 bp region bordering the exon, at the 5' edge of the intron. A splice_site that is downstream_adjacent_to exon and starts intron. 5' splice site donor splice site five prime splice site splice donor site sequence donor SO:0000163 five_prime_cis_splice_site Intronic 2 bp region bordering the exon, at the 5' edge of the intron. A splice_site that is downstream_adjacent_to exon and starts intron. SO:cjm SO:ke http://www.ucl.ac.uk/~ucbhjow/b241/glossary.html Intronic 2 bp region bordering the exon, at the 3' edge of the intron. A splice_site that is upstream_adjacent_to exon and finishes intron. acceptor splice site splice acceptor site three prime splice site sequence 3' splice site acceptor SO:0000164 three_prime_cis_splice_site Intronic 2 bp region bordering the exon, at the 3' edge of the intron. A splice_site that is upstream_adjacent_to exon and finishes intron. SO:cjm SO:ke http://www.ucl.ac.uk/~ucbhjow/b241/glossary.html A cis-acting sequence that increases the utilization of (some) eukaryotic promoters, and can function in either orientation and in any location (upstream or downstream) relative to the promoter. INSDC_feature:regulatory http://en.wikipedia.org/wiki/Enhancer_(genetics) INSDC_qualifier:enhancer sequence SO:0000165 An enhancer may participate in an enhanceosome GO:0034206. A protein-DNA complex formed by the association of a distinct set of general and specific transcription factors with a region of enhancer DNA. The cooperative assembly of an enhanceosome confers specificity of transcriptional regulation. This comment is a placeholder in case we start to make cross products with GO.
enhancer A cis-acting sequence that increases the utilization of (some) eukaryotic promoters, and can function in either orientation and in any location (upstream or downstream) relative to the promoter. http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/Enhancer_(genetics) wiki An enhancer bound by a factor. enhancer bound by factor sequence SO:0000166 enhancer_bound_by_factor An enhancer bound by a factor. SO:xp A regulatory_region composed of the TSS(s) and binding sites for TF_complexes of the core transcription machinery. A region (DNA) to which RNA polymerase binds, to begin transcription. INSDC_feature:regulatory http://en.wikipedia.org/wiki/Promoter INSDC_qualifier:promoter promoter sequence sequence SO:0000167 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. The region on a DNA molecule involved in RNA polymerase binding to initiate transcription. Moved from is_a: SO:0001055 transcriptional_cis_regulatory_region as per request from GREEKC initiative in August 2020. Merged with RNA_polymerase_promoter (SO:0001203) Aug 2020. Moved up one level from is_a CRM (SO:0000727) to is_a transcriptional_cis_regulatory_region (SO:0001055) as part of the GREEKC work January 2021. Pascale Gaudet from Gene Ontology pointed out that CRM can be located upstream of the promoter and therefore cannot include the promoter. promoter A regulatory_region composed of the TSS(s) and binding sites for TF_complexes of the core transcription machinery. A region (DNA) to which RNA polymerase binds, to begin transcription. SO:regcreative http://en.wikipedia.org/wiki/Promoter wiki A specific nucleotide sequence of DNA at or near which a particular restriction enzyme cuts the DNA. sequence SO:0000168 restriction_enzyme_cut_site true A specific nucleotide sequence of DNA at or near which a particular restriction enzyme cuts the DNA. SO:ma A DNA sequence in eukaryotic DNA to which RNA polymerase I binds, to begin transcription. 
RNA polymerase A promoter RNApol I promoter pol I promoter polymerase I promoter sequence SO:0000169 parent term RNA_polymerase_promoter SO:0001203 was obsoleted in Aug 2020, so term has been moved to eukaryotic_promoter SO:0002221. RNApol_I_promoter A DNA sequence in eukaryotic DNA to which RNA polymerase I binds, to begin transcription. SO:ke A DNA sequence in eukaryotic DNA to which RNA polymerase II binds, to begin transcription. RNA polymerase B promoter RNApol II promoter polymerase II promoter sequence pol II promoter SO:0000170 parent term RNA_polymerase_promoter SO:0001203 was obsoleted in Aug 2020, so term has been moved to eukaryotic_promoter SO:0002221. RNApol_II_promoter A DNA sequence in eukaryotic DNA to which RNA polymerase II binds, to begin transcription. SO:ke A DNA sequence in eukaryotic DNA to which RNA polymerase III binds, to begin transcription. RNA polymerase C promoter RNApol III promoter pol III promoter polymerase III promoter sequence SO:0000171 parent term RNA_polymerase_promoter SO:0001203 was obsoleted in Aug 2020, so term has been moved to eukaryotic_promoter SO:0002221. RNApol_III_promoter A DNA sequence in eukaryotic DNA to which RNA polymerase III binds, to begin transcription. SO:ke Part of a conserved sequence located about 75-bp upstream of the start point of eukaryotic transcription units which may be involved in RNA polymerase binding; consensus=GG(C|T)CAATCT. INSDC_feature:regulatory http://en.wikipedia.org/wiki/CAAT_box CAAT box CAAT signal CAAT-box INSDC_qualifier:CAAT_signal sequence SO:0000172 CAAT_signal Part of a conserved sequence located about 75-bp upstream of the start point of eukaryotic transcription units which may be involved in RNA polymerase binding; consensus=GG(C|T)CAATCT. 
http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/CAAT_box wiki A conserved GC-rich region located upstream of the start point of eukaryotic transcription units which may occur in multiple copies or in either orientation; consensus=GGGCGG. INSDC_feature:regulatory GC rich promoter region GC-rich region INSDC_qualifier:GC_rich_promoter_region sequence SO:0000173 GC_rich_promoter_region A conserved GC-rich region located upstream of the start point of eukaryotic transcription units which may occur in multiple copies or in either orientation; consensus=GGGCGG. http://www.insdc.org/files/feature_table.html A conserved AT-rich septamer found about 25-bp before the start point of many eukaryotic RNA polymerase II transcript units; may be involved in positioning the enzyme for correct initiation; consensus=TATA(A|T)A(A|T). INSDC_feature:regulatory http://en.wikipedia.org/wiki/TATA_box Goldstein-Hogness box INSDC_qualifier:TATA_box TATA box sequence SO:0000174 Binds TBP. TATA_box A conserved AT-rich septamer found about 25-bp before the start point of many eukaryotic RNA polymerase II transcript units; may be involved in positioning the enzyme for correct initiation; consensus=TATA(A|T)A(A|T). PMID:16858867 http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/TATA_box wiki A conserved region about 10-bp upstream of the start point of bacterial transcription units which may be involved in binding RNA polymerase; consensus=TAtAaT. This region is associated with sigma factor 70. INSDC_feature:regulatory http://en.wikipedia.org/wiki/Pribnow_box -10 signal INSDC_qualifier:minus_10_signal Pribnow Schaller box Pribnow box Pribnow-Schaller box minus 10 signal sequence SO:0000175 Changed from is_a SO:0000713 DNA_motif to is_a SO:0002312 core_prokaryotic_promoter_element in response to GREEKC Initiative Dave Sant Aug 2020. 
Changed from is_a SO:0002312 core_prokaryotic_promoter_element back to is_a SO:0000713 DNA_motif to be consistent with minus_12_signal and minus_24_signal on 12 July 2021. minus_10_signal A conserved region about 10-bp upstream of the start point of bacterial transcription units which may be involved in binding RNA polymerase; consensus=TAtAaT. This region is associated with sigma factor 70. http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/Pribnow_box wiki A conserved hexamer about 35-bp upstream of the start point of bacterial transcription units; consensus=TTGACa or TGTTGACA. This region is associated with sigma factor 70. INSDC_feature:regulatory -35 signal INSDC_qualifier:minus_35_signal minus 35 signal sequence SO:0000176 Changed from is_a SO:0000713 DNA_motif to is_a SO:0002312 core_prokaryotic_promoter_element in response to GREEKC Initiative Dave Sant Aug 2020. Changed from is_a SO:0002312 core_prokaryotic_promoter_element back to is_a SO:0000713 DNA_motif to be consistent with minus_12_signal and minus_24_signal on 12 July 2021. minus_35_signal A conserved hexamer about 35-bp upstream of the start point of bacterial transcription units; consensus=TTGACa or TGTTGACA. This region is associated with sigma factor 70. http://www.insdc.org/files/feature_table.html A nucleotide match against a sequence from another organism. cross genome match sequence SO:0000177 cross_genome_match A nucleotide match against a sequence from another organism. SO:ma The DNA region of a group of adjacent genes whose transcription is coordinated on one or several mutually overlapping transcription units transcribed in the same direction and sharing at least one gene. http://en.wikipedia.org/wiki/Operon INSDC_feature:operon sequence SO:0000178 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. Definition updated per Mejia-Almonte et al., Redefining fundamental concepts of transcription initiation in prokaryotes, Aug 5 2020.
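The consensus strings in the promoter-element definitions above (CAAT_signal, GC_rich_promoter_region, TATA_box, minus_10_signal, minus_35_signal) can be read as simple search patterns, since the "(C|T)" alternation is already valid regular-expression syntax. The following is a minimal Python sketch under stated assumptions: only exact consensus matches are reported, the lowercase letters that mark less-conserved positions (e.g. TAtAaT) are simply upper-cased, and the helper names find_motifs and reverse_complement are hypothetical. Real promoter annotation usually relies on position weight matrices rather than exact consensus hits.

```python
import re

# Consensus strings copied from the term definitions; "(C|T)" is used as-is
# as regex alternation. Lowercase (less-conserved) positions are upper-cased
# here, which is a simplifying assumption of this sketch.
CONSENSUS = {
    "CAAT_signal": "GG(C|T)CAATCT",
    "GC_rich_promoter_region": "GGGCGG",
    "TATA_box": "TATA(A|T)A(A|T)",
    "minus_10_signal": "TAtAaT",
    "minus_35_signal": "TTGACa",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motifs(seq: str, both_strands: bool = True):
    """Return (term, strand, start, end, matched) for every exact consensus hit.

    Coordinates are 0-based and half-open on the strand reported; for "-"
    they refer to positions in the reverse-complemented sequence.
    """
    seq = seq.upper()
    strands = [("+", seq)]
    if both_strands:
        strands.append(("-", reverse_complement(seq)))
    hits = []
    for term, consensus in CONSENSUS.items():
        pattern = re.compile(consensus.upper())
        for strand, s in strands:
            for m in pattern.finditer(s):
                hits.append((term, strand, m.start(), m.end(), m.group()))
    return hits


if __name__ == "__main__":
    toy_promoter = "CCGGCCAATCTGGGCGGAATTTATAAATAGCC"  # made-up test sequence
    for hit in find_motifs(toy_promoter, both_strands=False):
        print(hit)
```

Scanning the reverse complement as well reflects the note that the GC-rich element may occur in either orientation; set both_strands=False to restrict the search to the given strand.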
operon The DNA region of a group of adjacent genes whose transcription is coordinated on one or several mutually overlapping transcription units transcribed in the same direction and sharing at least one gene. SO:ma http://en.wikipedia.org/wiki/Operon wiki The start of the clone insert. clone insert start sequence SO:0000179 clone_insert_start The start of the clone insert. SO:ke A transposable element that is incorporated into a chromosome by a mechanism that requires reverse transcriptase. http://en.wikipedia.org/wiki/Retrotransposon class I transposon retrotransposon element sequence class I SO:0000180 retrotransposon A transposable element that is incorporated into a chromosome by a mechanism that requires reverse transcriptase. http://www.dddmag.com/Glossary.aspx#r http://en.wikipedia.org/wiki/Retrotransposon wiki A match against a translated sequence. translated nucleotide match sequence SO:0000181 translated_nucleotide_match A match against a translated sequence. SO:ke A transposon where the mechanism of transposition is via a DNA intermediate. DNA transposon class II transposon sequence class II SO:0000182 DNA_transposon A transposon where the mechanism of transposition is via a DNA intermediate. SO:ke A region of the gene which is not transcribed. non transcribed region non-transcribed sequence nontranscribed region nontranscribed sequence sequence SO:0000183 non_transcribed_region A region of the gene which is not transcribed. SO:ke A major type of spliceosomal intron spliced by the U2 spliceosome, that includes U1, U2, U4/U6 and U5 snRNAs. U2 intron sequence SO:0000184 May have either GT-AG or AT-AG 5' and 3' boundaries. U2_intron A major type of spliceosomal intron spliced by the U2 spliceosome, that includes U1, U2, U4/U6 and U5 snRNAs. PMID:9428511 A transcript that in its initial state requires modification to be functional. 
http://en.wikipedia.org/wiki/Primary_transcript INSDC_feature:precursor_RNA INSDC_feature:prim_transcript precursor RNA primary transcript sequence SO:0000185 primary_transcript A transcript that in its initial state requires modification to be functional. SO:ma http://en.wikipedia.org/wiki/Primary_transcript wiki A retrotransposon flanked by long terminal repeat sequences. LTR retrotransposon long terminal repeat retrotransposon sequence SO:0000186 LTR_retrotransposon A retrotransposon flanked by long terminal repeat sequences. SO:ke A group of characterized repeat sequences. sequence SO:0000187 repeat_family true A group of characterized repeat sequences. SO:ke A region of a primary transcript that is transcribed, but removed from within the transcript by splicing together the sequences (exons) on either side of it. http://en.wikipedia.org/wiki/Intron INSDC_feature:intron sequence SO:0000188 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. intron A region of a primary transcript that is transcribed, but removed from within the transcript by splicing together the sequences (exons) on either side of it. http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/Intron wiki A retrotransposon without long terminal repeat sequences. non LTR retrotransposon sequence SO:0000189 non_LTR_retrotransposon A retrotransposon without long terminal repeat sequences. SO:ke An intron that is the most 5-prime in a given transcript. 5' intron 5' intron sequence five prime intron sequence SO:0000190 five_prime_intron An intron that is not the most 3-prime or the most 5-prime in a given transcript. interior intron sequence SO:0000191 interior_intron An intron that is the most 3-prime in a given transcript. 
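The intron definition above (a transcribed region removed by splicing together the exons on either side of it), together with the U2_intron note on GT-AG boundaries, can be illustrated with a small sketch. It assumes plus-strand, 0-based half-open exon coordinates on a genomic string; the function name splice and the toy sequence are hypothetical, and real splicing is of course not a string operation.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (start, end) on the plus strand


def splice(genomic: str, exons: List[Interval]) -> Tuple[str, List[str]]:
    """Join exon sequences and report each intron's boundary dinucleotides.

    Returns the spliced sequence and, for every intron between consecutive
    exons, a string such as "GT-AG" built from the intron's first two and
    last two bases.
    """
    ordered = sorted(exons)
    mature = "".join(genomic[start:end] for start, end in ordered)
    boundaries = []
    for (_, left_end), (right_start, _) in zip(ordered, ordered[1:]):
        intron = genomic[left_end:right_start]  # the region removed by splicing
        boundaries.append(f"{intron[:2]}-{intron[-2:]}")
    return mature, boundaries


toy_primary = "AAA" + "GTCCCAG" + "TTT"  # exon + intron + exon (made up)
print(splice(toy_primary, [(0, 3), (10, 13)]))
```

Reporting the boundary dinucleotides separately keeps the splicing step independent of any assumption about which intron class (GT-AG or otherwise) is expected.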
3' intron three prime intron sequence 3' intron sequence SO:0000192 three_prime_intron A DNA fragment used as a reagent to detect polymorphic genomic loci by hybridizing against genomic DNA digested with a given restriction enzyme. http://en.wikipedia.org/wiki/Restriction_fragment_length_polymorphism RFLP RFLP fragment restriction fragment length polymorphism sequence SO:0000193 RFLP_fragment A DNA fragment used as a reagent to detect polymorphic genomic loci by hybridizing against genomic DNA digested with a given restriction enzyme. GOC:pj http://en.wikipedia.org/wiki/Restriction_fragment_length_polymorphism wiki A dispersed repeat family with many copies, each from 1 to 6 kb long. New elements are generated by retroposition of a transcribed copy. Typically the LINE contains 2 ORFs, one of which encodes a reverse transcriptase, and 3' and 5' direct repeats. LINE LINE element Long interspersed element Long interspersed nuclear element sequence SO:0000194 LINE_element A dispersed repeat family with many copies, each from 1 to 6 kb long. New elements are generated by retroposition of a transcribed copy. Typically the LINE contains 2 ORFs, one of which encodes a reverse transcriptase, and 3' and 5' direct repeats. http://www.ucl.ac.uk/~ucbhjow/b241/glossary.html An exon in which at least one base is part of a codon (here, 'codon' is inclusive of the stop_codon). coding exon sequence SO:0000195 coding_exon An exon in which at least one base is part of a codon (here, 'codon' is inclusive of the stop_codon). SO:ke The sequence of the five_prime_coding_exon that codes for protein. five prime exon coding region sequence SO:0000196 five_prime_coding_exon_coding_region The sequence of the five_prime_coding_exon that codes for protein. SO:cjm The sequence of the three_prime_coding_exon that codes for protein. three prime exon coding region sequence SO:0000197 three_prime_coding_exon_coding_region The sequence of the three_prime_coding_exon that codes for protein.
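The positional intron terms (five_prime_intron, interior_intron, three_prime_intron) depend only on intron order within a transcript. A minimal sketch, assuming exon coordinates are 0-based half-open and that the transcript's 5' end is the leftmost coordinate on "+" and the rightmost on "-"; the function name label_introns is hypothetical. A transcript with a single intron gets both the five_prime and three_prime labels, since that intron is simultaneously the most 5' and the most 3'.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open genomic coordinates


def label_introns(exons: List[Interval], strand: str = "+"):
    """Return [(intron, labels)] in transcript (5'->3') order."""
    ordered = sorted(exons)
    introns = [(left[1], right[0]) for left, right in zip(ordered, ordered[1:])]
    if strand == "-":
        introns.reverse()  # on the minus strand the 5' end is the rightmost
    last = len(introns) - 1
    labeled = []
    for k, intron in enumerate(introns):
        labels = []
        if k == 0:
            labels.append("five_prime_intron")
        if k == last:
            labels.append("three_prime_intron")
        if not labels:
            labels.append("interior_intron")  # neither most 5' nor most 3'
        labeled.append((intron, tuple(labels)))
    return labeled


toy_exons = [(0, 100), (200, 300), (400, 500), (600, 700)]
for intron, labels in label_introns(toy_exons):
    print(intron, labels)
```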
SO:cjm An exon that does not contain any codons. noncoding exon sequence SO:0000198 noncoding_exon An exon that does not contain any codons. SO:ke A region of nucleotide sequence that has translocated to a new position. The observed adjacency of two previously separated regions. translocated sequence sequence transchr SO:0000199 translocation A region of nucleotide sequence that has translocated to a new position. The observed adjacency of two previously separated regions. NCBI:th SO:ke transchr http://www.ncbi.nlm.nih.gov/dbvar/ The 5' most coding exon. 5' coding exon five prime coding exon sequence SO:0000200 five_prime_coding_exon The 5' most coding exon. SO:ke An exon that is bounded by 5' and 3' splice sites. interior exon sequence SO:0000201 interior_exon An exon that is bounded by 5' and 3' splice sites. PMID:10373547 The coding exon that is most 3-prime on a given transcript. three prime coding exon sequence 3' coding exon SO:0000202 three_prime_coding_exon The coding exon that is most 3-prime on a given transcript. SO:ma Messenger RNA sequences that are untranslated and lie five prime or three prime to sequences which are translated. untranslated region sequence SO:0000203 UTR Messenger RNA sequences that are untranslated and lie five prime or three prime to sequences which are translated. SO:ke A region at the 5' end of a mature transcript (preceding the initiation codon) that is not translated into a protein. http://en.wikipedia.org/wiki/5'_UTR 5' UTR INSDC_feature:5'UTR five prime UTR five_prime_untranslated_region sequence SO:0000204 five_prime_UTR A region at the 5' end of a mature transcript (preceding the initiation codon) that is not translated into a protein. http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/5'_UTR wiki A region at the 3' end of a mature transcript (following the stop codon) that is not translated into a protein. 
http://en.wikipedia.org/wiki/Three_prime_untranslated_region INSDC_feature:3'UTR three prime UTR three prime untranslated region sequence SO:0000205 three_prime_UTR A region at the 3' end of a mature transcript (following the stop codon) that is not translated into a protein. http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/Three_prime_untranslated_region wiki A repetitive element, a few hundred base pairs long, that is dispersed throughout the genome. A common human SINE is the Alu element. http://en.wikipedia.org/wiki/Short_interspersed_nuclear_element SINE element Short interspersed element Short interspersed nuclear element sequence SO:0000206 SINE_element A repetitive element, a few hundred base pairs long, that is dispersed throughout the genome. A common human SINE is the Alu element. SO:ke http://en.wikipedia.org/wiki/Short_interspersed_nuclear_element wiki SSLPs are a kind of sequence alteration where the number of repeated sequences in intergenic regions may differ. http://en.wikipedia.org/wiki/Simple_sequence_length_polymorphism simple sequence length variation sequence SSLP simple sequence length polymorphism SO:0000207 simple_sequence_length_variation SSLPs are a kind of sequence alteration where the number of repeated sequences in intergenic regions may differ. SO:ke http://en.wikipedia.org/wiki/Simple_sequence_length_polymorphism wiki A DNA transposable element defined as having termini with perfect or nearly perfect short inverted repeats, generally 10-40 nucleotides long. TIR element terminal inverted repeat element sequence SO:0000208 terminal_inverted_repeat_element A DNA transposable element defined as having termini with perfect or nearly perfect short inverted repeats, generally 10-40 nucleotides long. http://www.genetics.org/cgi/reprint/156/4/1983.pdf A primary transcript encoding a ribosomal RNA.
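The exon and UTR terms defined above (coding_exon, noncoding_exon, five_prime_coding_exon, interior_exon, three_prime_coding_exon, five_prime_UTR, three_prime_UTR) can all be derived from exon coordinates plus a CDS span. A minimal sketch, assuming a single plus-strand transcript, 0-based half-open coordinates, and a CDS span that includes the stop codon (so an exon containing only the stop codon still counts as coding, as the coding_exon definition requires); the function name classify_exons is hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open, plus strand only


def classify_exons(exons: List[Interval], cds: Interval):
    """Label exons and derive UTR segments for one plus-strand transcript."""
    cds_start, cds_end = cds
    ordered = sorted(exons)
    # An exon is coding if at least one of its bases lies inside the CDS.
    coding = [e for e in ordered if e[0] < cds_end and cds_start < e[1]]
    labels: Dict[Interval, List[str]] = {}
    for i, exon in enumerate(ordered):
        tags = ["coding_exon" if exon in coding else "noncoding_exon"]
        if coding and exon == coding[0]:
            tags.append("five_prime_coding_exon")
        if coding and exon == coding[-1]:
            tags.append("three_prime_coding_exon")
        if 0 < i < len(ordered) - 1:
            tags.append("interior_exon")  # bounded by 5' and 3' splice sites
        labels[exon] = tags
    # UTR pieces: exonic bases before the CDS start and after the CDS end.
    five_utr = [(s, min(e, cds_start)) for s, e in ordered if s < cds_start]
    three_utr = [(max(s, cds_end), e) for s, e in ordered if e > cds_end]
    return labels, five_utr, three_utr


labels, five_utr, three_utr = classify_exons([(0, 40), (100, 200)], (120, 180))
print(labels, five_utr, three_utr)
```

Minus-strand transcripts would need the order reversed and the UTR roles swapped; that is left out to keep the sketch short.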
rRNA primary transcript ribosomal RNA primary transcript sequence SO:0000209 rRNA_primary_transcript A primary transcript encoding a ribosomal RNA. SO:ke A primary transcript encoding a transfer RNA (SO:0000253). tRNA primary transcript sequence SO:0000210 tRNA_primary_transcript A primary transcript encoding a transfer RNA (SO:0000253). SO:ke A primary transcript encoding alanyl tRNA (SO:0000254). alanine tRNA primary transcript sequence SO:0000211 alanine_tRNA_primary_transcript A primary transcript encoding alanyl tRNA (SO:0000254). SO:ke A primary transcript encoding arginyl tRNA. arginine tRNA primary transcript sequence SO:0000212 arginine_tRNA_primary_transcript A primary transcript encoding arginyl tRNA. SO:ke A primary transcript encoding asparaginyl tRNA (SO:0000256). asparagine tRNA primary transcript sequence SO:0000213 asparagine_tRNA_primary_transcript A primary transcript encoding asparaginyl tRNA (SO:0000256). SO:ke A primary transcript encoding aspartyl tRNA (SO:0000257). aspartic acid tRNA primary transcript sequence SO:0000214 aspartic_acid_tRNA_primary_transcript A primary transcript encoding aspartyl tRNA (SO:0000257). SO:ke A primary transcript encoding cysteinyl tRNA (SO:0000258). cysteine tRNA primary transcript sequence SO:0000215 cysteine_tRNA_primary_transcript A primary transcript encoding cysteinyl tRNA (SO:0000258). SO:ke A primary transcript encoding glutamyl tRNA (SO:0000260). glutamic acid tRNA primary transcript sequence SO:0000216 glutamic_acid_tRNA_primary_transcript A primary transcript encoding glutamyl tRNA (SO:0000260). SO:ke A primary transcript encoding glutaminyl tRNA (SO:0000259). glutamine tRNA primary transcript sequence SO:0000217 glutamine_tRNA_primary_transcript A primary transcript encoding glutaminyl tRNA (SO:0000259). SO:ke A primary transcript encoding glycyl tRNA (SO:0000261).
glycine tRNA primary transcript sequence SO:0000218 glycine_tRNA_primary_transcript A primary transcript encoding glycyl tRNA (SO:0000261). SO:ke A primary transcript encoding histidyl tRNA (SO:0000262). histidine tRNA primary transcript sequence SO:0000219 histidine_tRNA_primary_transcript A primary transcript encoding histidyl tRNA (SO:0000262). SO:ke A primary transcript encoding isoleucyl tRNA (SO:0000263). isoleucine tRNA primary transcript sequence SO:0000220 isoleucine_tRNA_primary_transcript A primary transcript encoding isoleucyl tRNA (SO:0000263). SO:ke A primary transcript encoding leucyl tRNA (SO:0000264). leucine tRNA primary transcript sequence SO:0000221 leucine_tRNA_primary_transcript A primary transcript encoding leucyl tRNA (SO:0000264). SO:ke A primary transcript encoding lysyl tRNA (SO:0000265). lysine tRNA primary transcript sequence SO:0000222 lysine_tRNA_primary_transcript A primary transcript encoding lysyl tRNA (SO:0000265). SO:ke A primary transcript encoding methionyl tRNA (SO:0000266). methionine tRNA primary transcript sequence SO:0000223 methionine_tRNA_primary_transcript A primary transcript encoding methionyl tRNA (SO:0000266). SO:ke A primary transcript encoding phenylalanyl tRNA (SO:0000267). phenylalanine tRNA primary transcript sequence SO:0000224 phenylalanine_tRNA_primary_transcript A primary transcript encoding phenylalanyl tRNA (SO:0000267). SO:ke A primary transcript encoding prolyl tRNA (SO:0000268). proline tRNA primary transcript sequence SO:0000225 proline_tRNA_primary_transcript A primary transcript encoding prolyl tRNA (SO:0000268). SO:ke A primary transcript encoding seryl tRNA (SO:0000269). serine tRNA primary transcript sequence SO:0000226 serine_tRNA_primary_transcript A primary transcript encoding seryl tRNA (SO:0000269). SO:ke A primary transcript encoding threonyl tRNA (SO:0000270).
threonine tRNA primary transcript sequence SO:0000227 threonine_tRNA_primary_transcript A primary transcript encoding threonyl tRNA (SO:0000270). SO:ke A primary transcript encoding tryptophanyl tRNA (SO:0000271). tryptophan tRNA primary transcript sequence SO:0000228 tryptophan_tRNA_primary_transcript A primary transcript encoding tryptophanyl tRNA (SO:0000271). SO:ke A primary transcript encoding tyrosyl tRNA (SO:0000272). tyrosine tRNA primary transcript sequence SO:0000229 tyrosine_tRNA_primary_transcript A primary transcript encoding tyrosyl tRNA (SO:0000272). SO:ke A primary transcript encoding valyl tRNA (SO:0000273). valine tRNA primary transcript sequence SO:0000230 valine_tRNA_primary_transcript A primary transcript encoding valyl tRNA (SO:0000273). SO:ke A primary transcript encoding a small nuclear RNA (SO:0000274). snRNA primary transcript sequence SO:0000231 snRNA_primary_transcript A primary transcript encoding a small nuclear RNA (SO:0000274). SO:ke A primary transcript encoding one or more small nucleolar RNAs (SO:0000275). snoRNA primary transcript sequence SO:0000232 This definition was broadened 26 Jan 2021 to reflect that a single transcript can encode one or more snoRNAs. Brought to our attention by FlyBase. GitHub Issue #520 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/520). snoRNA_primary_transcript A primary transcript encoding one or more small nucleolar RNAs (SO:0000275). SO:ke A transcript which has undergone the necessary modifications, if any, for its function. In eukaryotes this includes, for example, processing of introns, cleavage, base modification, and modifications to the 5' and/or the 3' ends, other than addition of bases. In bacteria functional mRNAs are usually not modified. http://en.wikipedia.org/wiki/Mature_transcript mature transcript sequence SO:0000233 A processed transcript cannot contain introns. mature_transcript A transcript which has undergone the necessary modifications, if any, for its function.
In eukaryotes this includes, for example, processing of introns, cleavage, base modification, and modifications to the 5' and/or the 3' ends, other than addition of bases. In bacteria functional mRNAs are usually not modified. SO:ke http://en.wikipedia.org/wiki/Mature_transcript wiki Messenger RNA is the intermediate molecule between DNA and protein. It includes UTR and coding sequences. It does not contain introns. http://en.wikipedia.org/wiki/MRNA http://www.gencodegenes.org/gencode_biotypes.html INSDC_feature:mRNA messenger RNA protein_coding_transcript sequence SO:0000234 An mRNA does not contain introns as it is a processed_transcript. The equivalent kind of primary_transcript is protein_coding_primary_transcript (SO:0000120), which may contain introns. This term is mapped to MGED. Do not obsolete without consulting MGED ontology. mRNA Messenger RNA is the intermediate molecule between DNA and protein. It includes UTR and coding sequences. It does not contain introns. SO:ma http://en.wikipedia.org/wiki/MRNA wiki http://www.gencodegenes.org/gencode_biotypes.html GENCODE A DNA site where a transcription factor binds. TF binding site transcription factor binding site sequence SO:0000235 Definition updated along with definitions in Mejia-Almonte et al. PMID:32665585. Added relationship part_of SO:0000727 CRM in place of previous CRM relationship has_part TF_binding_site August 2020 in response to requests from GREEKC initiative. Moved from transcription_regulatory_region (SO:0001679) to transcriptional_cis_regulatory_region (SO:0001055) by Dave Sant on Feb 11, 2021 when transcription_regulatory_region was merged into transcriptional_cis_regulatory_region to be consistent with GO and reduce redundancy as part of the GREEKC consortium. See GitHub Issue #527. TF_binding_site A DNA site where a transcription factor binds.
SO:ke The in-frame interval between the stop codons of a reading frame which when read as sequential triplets, has the potential of encoding a sequential string of amino acids. TER(NNN)nTER. open reading frame sequence SO:0000236 The definition was modified by Rama. ORF is defined by the sequence, whereas the CDS is defined according to whether a polypeptide is made. This term is mapped to MGED. Do not obsolete without consulting MGED ontology. ORF The in-frame interval between the stop codons of a reading frame which when read as sequential triplets, has the potential of encoding a sequential string of amino acids. TER(NNN)nTER. SGD:rb SO:ma An attribute describing a transcript. transcript attribute sequence SO:0000237 transcript_attribute A transposable element with extensive secondary structure, characterized by large modular imperfect long inverted repeats. foldback element sequence LVR element long inverted repeat element SO:0000238 foldback_element A transposable element with extensive secondary structure, characterized by large modular imperfect long inverted repeats. http://www.genetics.org/cgi/reprint/156/4/1983.pdf The sequences extending on either side of a specific region. flanking region sequence SO:0000239 flanking_region The sequences extending on either side of a specific region. SO:ke A deviation in chromosome structure or number. chromosome variation sequence SO:0000240 chromosome_variation A UTR bordered by the terminal and initial codons of two CDSs in a polycistronic transcript. Every UTR is either 5', 3' or internal. internal UTR sequence SO:0000241 internal_UTR A UTR bordered by the terminal and initial codons of two CDSs in a polycistronic transcript. Every UTR is either 5', 3' or internal. SO:cjm The untranslated sequence separating the 'cistrons' of multicistronic mRNA. 
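The ORF definition above, TER(NNN)nTER, describes a stop-to-stop interval read in triplets, which distinguishes it from a CDS (defined by whether a polypeptide is actually made). A minimal sketch of that stop-to-stop scan, assuming the standard genetic code's stop codons (other translation tables differ) and scanning only the three forward frames; the function name stop_to_stop_orfs and the min_codons parameter are hypothetical.

```python
from typing import List, Tuple

# Stop codons of the standard genetic code (an assumption of this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_to_stop_orfs(seq: str, min_codons: int = 1) -> List[Tuple[int, int, int, int]]:
    """Find TER(NNN)nTER intervals in the three forward reading frames.

    Returns (frame, start, end, n): start is the first base of the opening
    stop codon, end is one past the closing stop codon (0-based, half-open),
    and n is the number of sense codons between the two stops.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        previous_stop = None
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if previous_stop is not None:
                    n = (i - previous_stop - 3) // 3
                    if n >= min_codons:
                        orfs.append((frame, previous_stop, i + 3, n))
                previous_stop = i
    return orfs


print(stop_to_stop_orfs("TAAATGGCCTGA"))
```

The reverse strand can be covered by running the same scan on the reverse complement of the sequence.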
untranslated region polycistronic mRNA sequence SO:0000242 untranslated_region_polycistronic_mRNA The untranslated sequence separating the 'cistrons' of multicistronic mRNA. SO:ke A sequence element that recruits a ribosomal subunit to an internal position on an mRNA for translation initiation. http://en.wikipedia.org/wiki/Internal_ribosome_entry_site IRES internal ribosomal entry sequence internal ribosomal entry site internal ribosome entry site sequence internal ribosome entry sequence SO:0000243 internal_ribosome_entry_site A sequence element that recruits a ribosomal subunit to an internal position on an mRNA for translation initiation. SO:ke http://en.wikipedia.org/wiki/Internal_ribosome_entry_site wiki sequence 4-cutter_restriction_site four-cutter_restriction_site SO:0000244 four_cutter_restriction_site true sequence SO:0000245 mRNA_by_polyadenylation_status true An attribute describing the addition of a poly A tail to the 3' end of an mRNA molecule. sequence SO:0000246 polyadenylated An attribute describing the addition of a poly A tail to the 3' end of an mRNA molecule. SO:ke sequence SO:0000247 mRNA_not_polyadenylated true A kind of sequence alteration where the number of copies of a region varies across a population. sequence length alteration sequence SO:0000248 sequence_length_alteration A kind of sequence alteration where the number of copies of a region varies across a population. SO:ke sequence 6-cutter_restriction_site six-cutter_restriction_site SO:0000249 six_cutter_restriction_site true A post-transcriptionally modified base. modified RNA base feature sequence SO:0000250 modified_RNA_base_feature A post-transcriptionally modified base. SO:ke sequence 8-cutter_restriction_site eight-cutter_restriction_site SO:0000251 eight_cutter_restriction_site true rRNA is an RNA component of a ribosome that can provide both structural scaffolding and catalytic activity.
INSDC_qualifier:unknown http://en.wikipedia.org/wiki/RRNA INSDC_feature:rRNA ribosomal RNA ribosomal ribonucleic acid sequence SO:0000252 Definition updated 10 June 2021 as part of restructuring rRNA terms and reforming definitions to have similar structures. Request from EBI. See GitHub Issue #493. rRNA rRNA is an RNA component of a ribosome that can provide both structural scaffolding and catalytic activity. ISBN:0198506732 http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/RRNA wiki Transfer RNA (tRNA) molecules are approximately 80 nucleotides in length. Their secondary structure includes four short double-helical elements and three loops (D, anti-codon, and T loops). Further hydrogen bonds mediate the characteristic L-shaped molecular structure. Transfer RNAs have two regions of fundamental functional importance: the anti-codon, which is responsible for specific mRNA codon recognition, and the 3' end, to which the tRNA's corresponding amino acid is attached (by aminoacyl-tRNA synthetases). Transfer RNAs cope with the degeneracy of the genetic code in two ways: having more than one tRNA (with a specific anti-codon) for a particular amino acid; and 'wobble' base-pairing, i.e. permitting non-standard base-pairing at the third codon position (the 5' base of the anti-codon). INSDC_qualifier:unknown http://en.wikipedia.org/wiki/TRNA INSDC_feature:tRNA sequence transfer RNA transfer ribonucleic acid SO:0000253 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. tRNA Transfer RNA (tRNA) molecules are approximately 80 nucleotides in length. Their secondary structure includes four short double-helical elements and three loops (D, anti-codon, and T loops). Further hydrogen bonds mediate the characteristic L-shaped molecular structure.
Transfer RNAs have two regions of fundamental functional importance: the anti-codon, which is responsible for specific mRNA codon recognition, and the 3' end, to which the tRNA's corresponding amino acid is attached (by aminoacyl-tRNA synthetases). Transfer RNAs cope with the degeneracy of the genetic code in two ways: having more than one tRNA (with a specific anti-codon) for a particular amino acid; and 'wobble' base-pairing, i.e. permitting non-standard base-pairing at the third codon position (the 5' base of the anti-codon). ISBN:0198506732 http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00005 http://en.wikipedia.org/wiki/TRNA wiki A tRNA sequence that has an alanine anticodon, and a 3' alanine binding region. alanyl tRNA alanyl-transfer RNA alanyl-transfer ribonucleic acid sequence SO:0000254 alanyl_tRNA A tRNA sequence that has an alanine anticodon, and a 3' alanine binding region. SO:ke A primary transcript encoding a small ribosomal subunit RNA. rRNA small subunit primary transcript sequence SO:0000255 rRNA_small_subunit_primary_transcript A primary transcript encoding a small ribosomal subunit RNA. SO:ke A tRNA sequence that has an asparagine anticodon, and a 3' asparagine binding region. asparaginyl tRNA asparaginyl-transfer RNA asparaginyl-transfer ribonucleic acid sequence SO:0000256 asparaginyl_tRNA A tRNA sequence that has an asparagine anticodon, and a 3' asparagine binding region. SO:ke A tRNA sequence that has an aspartic acid anticodon, and a 3' aspartic acid binding region. aspartyl tRNA aspartyl-transfer RNA aspartyl-transfer ribonucleic acid sequence SO:0000257 aspartyl_tRNA A tRNA sequence that has an aspartic acid anticodon, and a 3' aspartic acid binding region. SO:ke A tRNA sequence that has a cysteine anticodon, and a 3' cysteine binding region. cysteinyl tRNA cysteinyl-transfer RNA cysteinyl-transfer ribonucleic acid sequence SO:0000258 cysteinyl_tRNA A tRNA sequence that has a cysteine anticodon, and a 3' cysteine binding region.
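The tRNA definition above describes codon recognition by the anti-codon and 'wobble' pairing as one way tRNAs cope with code degeneracy. A minimal sketch of that pairing check, assuming simplified wobble rules in the style of Crick's original proposal (only inosine is modelled among modified bases; real tRNAs carry many other modifications that change decoding). The function name can_decode is hypothetical. Pairing is antiparallel: the anticodon's 3' base pairs with codon position 1, and its 5' base (the wobble position) with codon position 3.

```python
# Watson-Crick partners in the RNA alphabet.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Simplified wobble rules: codon third bases allowed opposite the anticodon's
# 5' base. "I" is inosine; other modified bases are ignored in this sketch.
WOBBLE = {
    "G": {"C", "U"},
    "U": {"A", "G"},
    "I": {"U", "C", "A"},
    "C": {"G"},
    "A": {"U"},
}


def can_decode(anticodon: str, codon: str) -> bool:
    """Return True if an anticodon (5'->3') can read a codon (5'->3')."""
    a5, a_mid, a3 = anticodon.upper()
    c1, c2, c3 = codon.upper().replace("T", "U")
    return (
        WATSON_CRICK.get(a3) == c1          # anticodon 3' base : codon base 1
        and WATSON_CRICK.get(a_mid) == c2   # middle bases pair normally
        and c3 in WOBBLE.get(a5, set())     # wobble base : codon base 3
    )


# A GCC anticodon reads the glycine codons GGC and GGU but not GGA.
print([c for c in ("GGC", "GGU", "GGA") if can_decode("GCC", c)])
```

This shows why a single tRNA, such as one with an inosine-containing IGC anticodon, can read several synonymous alanine codons (GCU, GCC, GCA) but not GCG.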
SO:ke A tRNA sequence that has a glutamine anticodon, and a 3' glutamine binding region. glutaminyl tRNA glutaminyl-transfer RNA glutaminyl-transfer ribonucleic acid sequence SO:0000259 glutaminyl_tRNA A tRNA sequence that has a glutamine anticodon, and a 3' glutamine binding region. SO:ke A tRNA sequence that has a glutamic acid anticodon, and a 3' glutamic acid binding region. glutamyl tRNA glutamyl-transfer ribonucleic acid sequence glutamyl-transfer RNA SO:0000260 glutamyl_tRNA A tRNA sequence that has a glutamic acid anticodon, and a 3' glutamic acid binding region. SO:ke A tRNA sequence that has a glycine anticodon, and a 3' glycine binding region. glycyl tRNA sequence glycyl-transfer RNA glycyl-transfer ribonucleic acid SO:0000261 glycyl_tRNA A tRNA sequence that has a glycine anticodon, and a 3' glycine binding region. SO:ke A tRNA sequence that has a histidine anticodon, and a 3' histidine binding region. histidyl tRNA histidyl-transfer RNA histidyl-transfer ribonucleic acid sequence SO:0000262 histidyl_tRNA A tRNA sequence that has a histidine anticodon, and a 3' histidine binding region. SO:ke A tRNA sequence that has an isoleucine anticodon, and a 3' isoleucine binding region. isoleucyl tRNA isoleucyl-transfer RNA isoleucyl-transfer ribonucleic acid sequence SO:0000263 isoleucyl_tRNA A tRNA sequence that has an isoleucine anticodon, and a 3' isoleucine binding region. SO:ke A tRNA sequence that has a leucine anticodon, and a 3' leucine binding region. leucyl tRNA leucyl-transfer RNA leucyl-transfer ribonucleic acid sequence SO:0000264 leucyl_tRNA A tRNA sequence that has a leucine anticodon, and a 3' leucine binding region. SO:ke A tRNA sequence that has a lysine anticodon, and a 3' lysine binding region. lysyl tRNA lysyl-transfer RNA lysyl-transfer ribonucleic acid sequence SO:0000265 lysyl_tRNA A tRNA sequence that has a lysine anticodon, and a 3' lysine binding region. 
SO:ke A tRNA sequence that has a methionine anticodon, and a 3' methionine binding region. methionyl tRNA methionyl-transfer RNA methionyl-transfer ribonucleic acid sequence SO:0000266 methionyl_tRNA A tRNA sequence that has a methionine anticodon, and a 3' methionine binding region. SO:ke A tRNA sequence that has a phenylalanine anticodon, and a 3' phenylalanine binding region. phenylalanyl tRNA phenylalanyl-transfer RNA phenylalanyl-transfer ribonucleic acid sequence SO:0000267 phenylalanyl_tRNA A tRNA sequence that has a phenylalanine anticodon, and a 3' phenylalanine binding region. SO:ke A tRNA sequence that has a proline anticodon, and a 3' proline binding region. prolyl tRNA prolyl-transfer RNA prolyl-transfer ribonucleic acid sequence SO:0000268 prolyl_tRNA A tRNA sequence that has a proline anticodon, and a 3' proline binding region. SO:ke A tRNA sequence that has a serine anticodon, and a 3' serine binding region. seryl tRNA seryl-transfer RNA sequence seryl-transfer ribonucleic acid SO:0000269 seryl_tRNA A tRNA sequence that has a serine anticodon, and a 3' serine binding region. SO:ke A tRNA sequence that has a threonine anticodon, and a 3' threonine binding region. threonyl tRNA threonyl-transfer ribonucleic acid sequence threonyl-transfer RNA SO:0000270 threonyl_tRNA A tRNA sequence that has a threonine anticodon, and a 3' threonine binding region. SO:ke A tRNA sequence that has a tryptophan anticodon, and a 3' tryptophan binding region. tryptophanyl tRNA tryptophanyl-transfer RNA tryptophanyl-transfer ribonucleic acid sequence SO:0000271 tryptophanyl_tRNA A tRNA sequence that has a tryptophan anticodon, and a 3' tryptophan binding region. SO:ke A tRNA sequence that has a tyrosine anticodon, and a 3' tyrosine binding region. tyrosyl tRNA tyrosyl-transfer ribonucleic acid sequence tyrosyl-transfer RNA SO:0000272 tyrosyl_tRNA A tRNA sequence that has a tyrosine anticodon, and a 3' tyrosine binding region. 
SO:ke A tRNA sequence that has a valine anticodon, and a 3' valine binding region. valyl tRNA valyl-transfer ribonucleic acid sequence valyl-transfer RNA SO:0000273 valyl_tRNA A tRNA sequence that has a valine anticodon, and a 3' valine binding region. SO:ke A small nuclear RNA molecule involved in pre-mRNA splicing and processing. INSDC_feature:ncRNA http://en.wikipedia.org/wiki/SnRNA INSDC_qualifier:snRNA small nuclear RNA sequence SO:0000274 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. snRNA A small nuclear RNA molecule involved in pre-mRNA splicing and processing. PMID:11733745 WB:ems http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/SnRNA wiki Small nucleolar RNAs (snoRNAs) are short non-coding RNAs enriched in the nucleolus as components of small nucleolar ribonucleoproteins. They guide ribose methylation and pseudouridylation of rRNAs and snRNAs, and a subgroup regulate excision of rRNAs from rRNA precursor transcripts. snoRNAs may also guide rRNA acetylation and tRNA methylation, and regulate mRNA abundance and alternative splicing. INSDC_feature:ncRNA INSDC_qualifier:snoRNA small nucleolar RNA sequence SO:0000275 Updated the definition of snoRNA (SO:0000275) from "A snoRNA (small nucleolar RNA) is any one of a class of small RNAs that are associated with the eukaryotic nucleus as components of small nucleolar ribonucleoproteins. They participate in the processing or modifications of many RNAs, mostly ribosomal RNAs (rRNAs) though snoRNAs are also known to target other classes of RNA, including spliceosomal RNAs, tRNAs, and mRNAs via a stretch of sequence that is complementary to a sequence in the targeted RNA." to "Small nucleolar RNAs (snoRNAs) are short non-coding RNAs enriched in the nucleolus as components of small nucleolar ribonucleoproteins. They guide ribose methylation and pseudouridylation of rRNAs and snRNAs, and a subgroup regulate excision of rRNAs from rRNA precursor transcripts. 
snoRNAs may also guide rRNA acetylation and tRNA methylation, and regulate mRNA abundance and alternative splicing." to acknowledge that some snoRNAs functionally localize to other compartments (cytoplasm or even secreted). See GitHub Issue #578. snoRNA Small nucleolar RNAs (snoRNAs) are short non-coding RNAs enriched in the nucleolus as components of small nucleolar ribonucleoproteins. They guide ribose methylation and pseudouridylation of rRNAs and snRNAs, and a subgroup regulate excision of rRNAs from rRNA precursor transcripts. snoRNAs may also guide rRNA acetylation and tRNA methylation, and regulate mRNA abundance and alternative splicing. GOC:kgc PMID:31828325 Small, ~22-nt, RNA molecule that is the endogenous transcript of a miRNA gene (or the product of other non-coding RNA genes). Micro RNAs are produced from precursor molecules (SO:0001244) that can form local hairpin structures, which ordinarily are processed (usually via the Dicer pathway) such that a single miRNA molecule accumulates from one arm of a hairpin precursor molecule. Micro RNAs may trigger the cleavage of their target molecules or act as translational repressors. SO:0000649 INSDC_feature:ncRNA http://en.wikipedia.org/wiki/MiRNA http://en.wikipedia.org/wiki/StRNA INSDC_qualifier:miRNA micro RNA microRNA small temporal RNA stRNA sequence SO:0000276 miRNA Small, ~22-nt, RNA molecule that is the endogenous transcript of a miRNA gene (or the product of other non-coding RNA genes). Micro RNAs are produced from precursor molecules (SO:0001244) that can form local hairpin structures, which ordinarily are processed (usually via the Dicer pathway) such that a single miRNA molecule accumulates from one arm of a hairpin precursor molecule. Micro RNAs may trigger the cleavage of their target molecules or act as translational repressors.
PMID:11081512 PMID:12592000 http://en.wikipedia.org/wiki/MiRNA wiki http://en.wikipedia.org/wiki/StRNA wiki An attribute describing a sequence that is bound by another molecule. bound by factor sequence SO:0000277 Formerly called transcript_by_bound_factor. bound_by_factor An attribute describing a sequence that is bound by another molecule. SO:ke A transcript that is bound by a nucleic acid. transcript bound by nucleic acid sequence SO:0000278 Formerly called transcript_by_bound_nucleic_acid. transcript_bound_by_nucleic_acid A transcript that is bound by a nucleic acid. SO:xp A transcript that is bound by a protein. transcript bound by protein sequence SO:0000279 Formerly called transcript_by_bound_protein. transcript_bound_by_protein A transcript that is bound by a protein. SO:xp A gene that is engineered. engineered gene sequence SO:0000280 engineered_gene A gene that is engineered. SO:xp A gene that is engineered and foreign. engineered foreign gene sequence SO:0000281 engineered_foreign_gene A gene that is engineered and foreign. SO:xp An mRNA with a minus 1 frameshift. mRNA with minus 1 frameshift sequence SO:0000282 mRNA_with_minus_1_frameshift An mRNA with a minus 1 frameshift. SO:xp A transposable_element that is engineered and foreign. engineered foreign transposable element gene sequence SO:0000283 engineered_foreign_transposable_element_gene A transposable_element that is engineered and foreign. SO:xp The recognition site is bipartite and interrupted. sequence SO:0000284 type_I_enzyme_restriction_site true The recognition site is bipartite and interrupted. http://www.promega.com A gene that is foreign. foreign gene sequence SO:0000285 foreign_gene A gene that is foreign. SO:xp A sequence directly repeated at both ends of a defined sequence, of the sort typically found in retroviruses. 
INSDC_feature:repeat_region http://en.wikipedia.org/wiki/Long_terminal_repeat INSDC_qualifier:long_terminal_repeat LTR long terminal repeat sequence direct terminal repeat SO:0000286 long_terminal_repeat A sequence directly repeated at both ends of a defined sequence, of the sort typically found in retroviruses. http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/Long_terminal_repeat wiki A gene that is a fusion. http://en.wikipedia.org/wiki/Fusion_gene fusion gene sequence SO:0000287 fusion_gene A gene that is a fusion. SO:xp http://en.wikipedia.org/wiki/Fusion_gene wiki A fusion gene that is engineered. engineered fusion gene sequence SO:0000288 engineered_fusion_gene A fusion gene that is engineered. SO:xp A repeat_region containing repeat_units of 2 to 10 bp repeated in tandem. INSDC_feature:repeat_region http://en.wikipedia.org/wiki/Microsatellite INSDC_qualifier:microsatellite STR microsatellite locus microsatellite marker short tandem repeat sequence SO:0000289 microsatellite A repeat_region containing repeat_units of 2 to 10 bp repeated in tandem. NCBI:th http://www.informatics.jax.org/silver/glossary.shtml http://en.wikipedia.org/wiki/Microsatellite wiki STR http://www.ncbi.nlm.nih.gov/books/NBK21126/def-item/A9651/ A region of a repeating dinucleotide sequence (two bases). dinucleotide repeat microsatellite dinucleotide repeat microsatellite feature dinucleotide repeat microsatellite locus dinucleotide repeat microsatellite marker sequence SO:0000290 dinucleotide_repeat_microsatellite_feature A region of a repeating trinucleotide sequence (three bases). trinucleotide repeat microsatellite trinucleotide repeat microsatellite feature trinucleotide repeat microsatellite locus sequence trinucleotide repeat microsatellite marker SO:0000291 trinucleotide_repeat_microsatellite_feature sequence SO:0000292 repetitive_element true A repetitive element that is engineered and foreign.
engineered foreign repetitive element sequence SO:0000293 engineered_foreign_repetitive_element A repetitive element that is engineered and foreign. SO:xp The sequence is complementarily repeated on the opposite strand. It is a palindrome, and it may or may not be hyphenated. Examples: GCTGATCAGC or GCTGA-----TCAGC. INSDC_feature:repeat_region http://en.wikipedia.org/wiki/Inverted_repeat INSDC_qualifier:inverted inverted repeat inverted repeat sequence sequence SO:0000294 inverted_repeat The sequence is complementarily repeated on the opposite strand. It is a palindrome, and it may or may not be hyphenated. Examples: GCTGATCAGC or GCTGA-----TCAGC. SO:ke http://en.wikipedia.org/wiki/Inverted_repeat wiki A type of spliceosomal intron spliced by the U12 spliceosome, which includes U11, U12, U4atac/U6atac and U5 snRNAs. U12 intron U12-dependent intron sequence SO:0000295 May have either GT-AC or AT-AC 5' and 3' boundaries. U12_intron A type of spliceosomal intron spliced by the U12 spliceosome, which includes U11, U12, U4atac/U6atac and U5 snRNAs. PMID:9428511 A region of nucleic acid from which replication initiates; includes sequences that are recognized by replication proteins, the site from which the first separation of complementary strands occurs, and specific replication start sites. http://en.wikipedia.org/wiki/Origin_of_replication INSDC_feature:rep_origin ori origin of replication sequence SO:0000296 origin_of_replication A region of nucleic acid from which replication initiates; includes sequences that are recognized by replication proteins, the site from which the first separation of complementary strands occurs, and specific replication start sites.
NCBI:cf http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/Origin_of_replication wiki Displacement loop; a region within mitochondrial DNA in which a short stretch of RNA is paired with one strand of DNA, displacing the original partner DNA strand in this region; also used to describe the displacement of a region of one strand of duplex DNA by a single-stranded invader in the reaction catalyzed by RecA protein. http://en.wikipedia.org/wiki/D_loop D-loop INSDC_feature:D-loop sequence displacement loop SO:0000297 Moved from is_a: SO:0000296 origin_of_replication to is_a: SO:0001411 biological_region after Terrence Murphy (INSDC) pointed out that the D loop can also refer to a loop in DNA repair, which is not an origin of replication. See GitHub Issue #417 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/417) D_loop Displacement loop; a region within mitochondrial DNA in which a short stretch of RNA is paired with one strand of DNA, displacing the original partner DNA strand in this region; also used to describe the displacement of a region of one strand of duplex DNA by a single-stranded invader in the reaction catalyzed by RecA protein. http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/D_loop wiki A feature where there has been exchange of genetic material during mitosis or meiosis. INSDC_feature:misc_recomb INSDC_qualifier:other recombination feature sequence SO:0000298 recombination_feature A location where recombination occurs during mitosis or meiosis. specific recombination site sequence SO:0000299 specific_recombination_site A location where a gene is rearranged due to recombination during mitosis or meiosis. recombination feature of rearranged gene sequence SO:0000300 recombination_feature_of_rearranged_gene A feature where recombination has occurred for the purpose of generating diversity in the immune system.
vertebrate immune system gene recombination feature sequence SO:0000301 vertebrate_immune_system_gene_recombination_feature Recombination signal including J-heptamer, J-spacer and J-nonamer 5' of the J-region of a J-gene or J-sequence. J gene recombination feature J-RS sequence SO:0000302 J_gene_recombination_feature Recombination signal including J-heptamer, J-spacer and J-nonamer 5' of the J-region of a J-gene or J-sequence. http://www.imgt.org/cgi-bin/IMGTlect.jv?query=7# Part of the primary transcript that is clipped off during processing. sequence SO:0000303 clip Part of the primary transcript that is clipped off during processing. SO:ke The recognition site is either palindromic, partially palindromic or an interrupted palindrome. Cleavage occurs within the recognition site. sequence SO:0000304 type_II_enzyme_restriction_site true The recognition site is either palindromic, partially palindromic or an interrupted palindrome. Cleavage occurs within the recognition site. http://www.promega.com A modified nucleotide, i.e. a nucleotide other than A, T, C, or G. INSDC_feature:modified_base modified base site sequence SO:0000305 Modified base:<modified_base>. modified_DNA_base A modified nucleotide, i.e. a nucleotide other than A, T, C, or G. http://www.insdc.org/files/feature_table.html A nucleotide modified by methylation. methylated base feature sequence SO:0000306 methylated_DNA_base_feature A nucleotide modified by methylation. SO:ke Regions of a few hundred to a few thousand bases in vertebrate genomes that are relatively GC and CpG rich; they are typically unmethylated and often found near the 5' ends of genes. http://en.wikipedia.org/wiki/CpG_island CG island CpG island sequence SO:0000307 CpG_island Regions of a few hundred to a few thousand bases in vertebrate genomes that are relatively GC and CpG rich; they are typically unmethylated and often found near the 5' ends of genes.
SO:rd http://en.wikipedia.org/wiki/CpG_island wiki sequence SO:0000308 sequence_feature_locating_method true sequence SO:0000309 computed_feature true sequence SO:0000310 predicted_ab_initio_computation true . sequence SO:0000311 similar to:<sequence_id> computed_feature_by_similarity true . SO:ma Attribute to describe a feature that has been experimentally verified. experimentally determined sequence SO:0000312 experimentally_determined Attribute to describe a feature that has been experimentally verified. SO:ke A double-helical region of nucleic acid formed by base-pairing between adjacent (inverted) complementary sequences. SO:0000019 http://en.wikipedia.org/wiki/Stem_loop INSDC_feature:stem_loop RNA_hairpin_loop stem loop stem-loop sequence SO:0000313 stem_loop A double-helical region of nucleic acid formed by base-pairing between adjacent (inverted) complementary sequences. http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/Stem_loop wiki A repeat where the same sequence is repeated in the same direction. Example: GCTGA-followed by-GCTGA. INSDC_feature:repeat_region http://en.wikipedia.org/wiki/Direct_repeat INSDC_qualifier:direct direct repeat sequence SO:0000314 direct_repeat A repeat where the same sequence is repeated in the same direction. Example: GCTGA-followed by-GCTGA. SO:ke http://en.wikipedia.org/wiki/Direct_repeat wiki The first base where RNA polymerase begins to synthesize the RNA transcript. INSDC_feature:misc_feature INSDC_note:transcription_start_site transcription start site transcription_start_site sequence SO:0000315 Added relationship is_a SO:0002309 core_promoter_element with the creation of core_promoter_element as part of GREEKC initiative August 2020 - Dave Sant. TSS The first base where RNA polymerase begins to synthesize the RNA transcript. SO:ke A contiguous sequence which begins with, and includes, a start codon and ends with, and includes, a stop codon. 
INSDC_feature:CDS coding sequence coding_sequence sequence SO:0000316 CDS A contiguous sequence which begins with, and includes, a start codon and ends with, and includes, a stop codon. SO:ma Complementary DNA; a piece of DNA copied from an mRNA and spliced into a vector for propagation in a suitable host. cDNA clone sequence SO:0000317 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. cDNA_clone Complementary DNA; a piece of DNA copied from an mRNA and spliced into a vector for propagation in a suitable host. http://seqcore.brcf.med.umich.edu/doc/educ/dnapr/mbglossary/mbgloss.html First codon to be translated by a ribosome. http://en.wikipedia.org/wiki/Start_codon initiation codon start codon sequence SO:0000318 start_codon First codon to be translated by a ribosome. SO:ke http://en.wikipedia.org/wiki/Start_codon wiki In mRNA, a set of three nucleotides that indicates the end of information for protein synthesis. http://en.wikipedia.org/wiki/Stop_codon stop codon sequence SO:0000319 stop_codon In mRNA, a set of three nucleotides that indicates the end of information for protein synthesis. SO:ke http://en.wikipedia.org/wiki/Stop_codon wiki Sequences within the intron that modulate splice site selection for some introns. intronic splice enhancer sequence SO:0000320 intronic_splice_enhancer Sequences within the intron that modulate splice site selection for some introns. SO:ke An mRNA with a plus 1 frameshift. mRNA with plus 1 frameshift sequence SO:0000321 mRNA_with_plus_1_frameshift An mRNA with a plus 1 frameshift. SO:ke A region of nucleotide sequence targeted by a nuclease enzyme that is found cleaved more than would be expected by chance. nuclease hypersensitive site sequence SO:0000322 Relationship to accessible_DNA_region added 11 Feb 2021. GREEKC pointed out that this is an assay-based term, but we need a biological term for the accessible DNA. See GitHub Issue #531.
nuclease_hypersensitive_site The first base to be translated into protein. coding start translation initiation site sequence translation start SO:0000323 coding_start The first base to be translated into protein. SO:ke A nucleotide sequence that may be used to identify a larger sequence. sequence SO:0000324 tag A nucleotide sequence that may be used to identify a larger sequence. SO:ke A primary transcript encoding a large ribosomal subunit RNA. 35S rRNA primary transcript rRNA large subunit primary transcript sequence SO:0000325 rRNA_large_subunit_primary_transcript A primary transcript encoding a large ribosomal subunit RNA. SO:ke A short diagnostic sequence tag, used in serial analysis of gene expression (SAGE), that allows the quantitative and simultaneous analysis of a large number of transcripts. SAGE tag sequence SO:0000326 SAGE_tag A short diagnostic sequence tag, used in serial analysis of gene expression (SAGE), that allows the quantitative and simultaneous analysis of a large number of transcripts. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=7570003&dopt=Abstract The last base to be translated into protein. It does not include the stop codon. coding end translation termination site translation_end sequence SO:0000327 coding_end The last base to be translated into protein. It does not include the stop codon. SO:ke A DNA sequence used experimentally to detect the presence or absence of a complementary nucleic acid. microarray oligo microarray oligonucleotide sequence SO:0000328 microarray_oligo An mRNA with a plus 2 frameshift. mRNA with plus 2 frameshift sequence SO:0000329 mRNA_with_plus_2_frameshift An mRNA with a plus 2 frameshift. SO:xp Region of sequence similarity by descent from a common ancestor. INSDC_feature:misc_feature http://en.wikipedia.org/wiki/Conserved_region INSDC_note:conserved_region conserved region sequence SO:0000330 conserved_region Region of sequence similarity by descent from a common ancestor.
SO:ke http://en.wikipedia.org/wiki/Conserved_region wiki Short (typically a few hundred base pairs) DNA sequence that has a single occurrence in a genome and whose location and base sequence are known. INSDC_feature:STS sequence tag site sequence SO:0000331 STS Short (typically a few hundred base pairs) DNA sequence that has a single occurrence in a genome and whose location and base sequence are known. http://www.biospace.com Coding region of sequence similarity by descent from a common ancestor. coding conserved region sequence SO:0000332 coding_conserved_region Coding region of sequence similarity by descent from a common ancestor. SO:ke The boundary between two exons in a processed transcript. exon junction sequence SO:0000333 exon_junction The boundary between two exons in a processed transcript. SO:ke Non-coding region of sequence similarity by descent from a common ancestor. conserved non-coding element conserved non-coding sequence nc conserved region noncoding conserved region sequence SO:0000334 nc_conserved_region Non-coding region of sequence similarity by descent from a common ancestor. SO:ke An mRNA with a minus 2 frameshift. mRNA with minus 2 frameshift sequence SO:0000335 mRNA_with_minus_2_frameshift An mRNA with a minus 2 frameshift. SO:ke A sequence that closely resembles a known functional gene, at another locus within a genome, that is non-functional as a consequence of (usually several) mutations that prevent either its transcription or translation (or both). In general, pseudogenes result from either reverse transcription of a transcript of their "normal" paralog (SO:0000043) (in which case the pseudogene typically lacks introns and includes a poly(A) tail) or from recombination (SO:0000044) (in which case the pseudogene is typically a tandem duplication of its "normal" paralog).
INSDC_feature:gene http://en.wikipedia.org/wiki/Pseudogene INSDC_qualifier:pseudo INSDC_qualifier:unknown sequence SO:0000336 pseudogene A sequence that closely resembles a known functional gene, at another locus within a genome, that is non-functional as a consequence of (usually several) mutations that prevent either its transcription or translation (or both). In general, pseudogenes result from either reverse transcription of a transcript of their "normal" paralog (SO:0000043) (in which case the pseudogene typically lacks introns and includes a poly(A) tail) or from recombination (SO:0000044) (in which case the pseudogene is typically a tandem duplication of its "normal" paralog). http://www.ucl.ac.uk/~ucbhjow/b241/glossary.html http://en.wikipedia.org/wiki/Pseudogene wiki A double stranded RNA duplex, at least 20bp long, used experimentally to inhibit gene function by RNA interference. RNAi reagent sequence SO:0000337 RNAi_reagent A double stranded RNA duplex, at least 20bp long, used experimentally to inhibit gene function by RNA interference. SO:rd A highly repetitive and short (100-500 base pair) transposable element with terminal inverted repeats (TIR) and target site duplication (TSD). MITEs do not encode proteins. miniature inverted repeat transposable element sequence SO:0000338 MITE A highly repetitive and short (100-500 base pair) transposable element with terminal inverted repeats (TIR) and target site duplication (TSD). MITEs do not encode proteins. http://www.pnas.org/cgi/content/full/97/18/10083 A region in a genome which promotes recombination. http://en.wikipedia.org/wiki/Recombination_hotspot recombination hotspot sequence SO:0000339 recombination_hotspot A region in a genome which promotes recombination. SO:rd http://en.wikipedia.org/wiki/Recombination_hotspot wiki Structural unit composed of a nucleic acid molecule which controls its own replication through the interaction of specific proteins at one or more origins of replication. 
http://en.wikipedia.org/wiki/Chromosome sequence SO:0000340 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. chromosome Structural unit composed of a nucleic acid molecule which controls its own replication through the interaction of specific proteins at one or more origins of replication. SO:ma http://en.wikipedia.org/wiki/Chromosome wiki A cytologically distinguishable feature of a chromosome, often made visible by staining, and usually alternating light and dark. http://en.wikipedia.org/wiki/Cytological_band chromosome band cytoband cytological band sequence SO:0000341 chromosome_band A cytologically distinguishable feature of a chromosome, often made visible by staining, and usually alternating light and dark. SO:ma http://en.wikipedia.org/wiki/Cytological_band wiki A region specifically recognised by a recombinase where recombination can occur during mitosis or meiosis. site specific recombination target region sequence SO:0000342 site_specific_recombination_target_region A region of sequence, aligned to another sequence with some statistical significance, using an algorithm such as BLAST or SIM4. sequence SO:0000343 match A region of sequence, aligned to another sequence with some statistical significance, using an algorithm such as BLAST or SIM4. SO:ke Region of a transcript that regulates splicing. splice enhancer sequence SO:0000344 splice_enhancer Region of a transcript that regulates splicing. SO:ke A tag produced from a single sequencing read from a cDNA clone or PCR product; typically a few hundred base pairs long. expressed sequence tag sequence SO:0000345 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. EST A tag produced from a single sequencing read from a cDNA clone or PCR product; typically a few hundred base pairs long. SO:ke Cre-Recombination target sequence. loxP site sequence Cre-recombination target region SO:0000346 loxP_site A match against a nucleotide sequence. 
nucleotide match sequence SO:0000347 nucleotide_match A match against a nucleotide sequence. SO:ke An attribute describing a sequence consisting of nucleobases bound to repeating units. The forms found in nature are deoxyribonucleic acid (DNA), where the repeating units are 2-deoxy-D-ribose rings connected to a phosphate backbone, and ribonucleic acid (RNA), where the repeating units are D-ribose rings connected to a phosphate backbone. http://en.wikipedia.org/wiki/Nucleic_acid nucleic acid sequence SO:0000348 nucleic_acid An attribute describing a sequence consisting of nucleobases bound to repeating units. The forms found in nature are deoxyribonucleic acid (DNA), where the repeating units are 2-deoxy-D-ribose rings connected to a phosphate backbone, and ribonucleic acid (RNA), where the repeating units are D-ribose rings connected to a phosphate backbone. CHEBI:33696 RSC:cb http://en.wikipedia.org/wiki/Nucleic_acid wiki A match against a protein sequence. protein match sequence SO:0000349 protein_match A match against a protein sequence. SO:ke An inversion site found on the Saccharomyces cerevisiae 2-micron plasmid. FLP recombination target region FRT site sequence SO:0000350 FRT_site An inversion site found on the Saccharomyces cerevisiae 2-micron plasmid. SO:ma An attribute to describe a sequence of nucleotides, nucleotide analogs, or amino acids that has been designed by an experimenter and which may or may not correspond to any natural sequence. synthetic sequence sequence SO:0000351 synthetic_sequence An attribute to describe a sequence of nucleotides, nucleotide analogs, or amino acids that has been designed by an experimenter and which may or may not correspond to any natural sequence. SO:ma An attribute describing a sequence consisting of nucleobases bound to a repeating unit made of a 2-deoxy-D-ribose ring connected to a phosphate backbone.
sequence SO:0000352 DNA An attribute describing a sequence consisting of nucleobases bound to a repeating unit made of a 2-deoxy-D-ribose ring connected to a phosphate backbone. RSC:cb A sequence of nucleotides that has been algorithmically derived from an alignment of two or more different sequences. http://en.wikipedia.org/wiki/Sequence_assembly sequence assembly sequence SO:0000353 sequence_assembly A sequence of nucleotides that has been algorithmically derived from an alignment of two or more different sequences. SO:ma http://en.wikipedia.org/wiki/Sequence_assembly wiki A region of intronic nucleotide sequence targeted by a nuclease enzyme. group 1 intron homing endonuclease target region sequence SO:0000354 group_1_intron_homing_endonuclease_target_region A region of intronic nucleotide sequence targeted by a nuclease enzyme. SO:ke A region of the genome which is co-inherited as the result of the lack of historic recombination within it. haplotype block sequence SO:0000355 haplotype_block A region of the genome which is co-inherited as the result of the lack of historic recombination within it. SO:ma An attribute describing a sequence consisting of nucleobases bound to a repeating unit made of a D-ribose ring connected to a phosphate backbone. sequence SO:0000356 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. RNA An attribute describing a sequence consisting of nucleobases bound to a repeating unit made of a D-ribose ring connected to a phosphate backbone. RSC:cb An attribute describing a region that is bounded on either side by a particular kind of region. sequence SO:0000357 flanked An attribute describing a region that is bounded on either side by a particular kind of region. SO:ke true An attribute describing sequence that is flanked by loxP sites. http://en.wikipedia.org/wiki/Floxed sequence SO:0000359 floxed An attribute describing sequence that is flanked by loxP sites.
SO:ke http://en.wikipedia.org/wiki/Floxed wiki A set of (usually) three nucleotide bases in a DNA or RNA sequence, which together code for a unique amino acid or the termination of translation and are contained within the CDS. http://en.wikipedia.org/wiki/Codon sequence SO:0000360 codon A set of (usually) three nucleotide bases in a DNA or RNA sequence, which together code for a unique amino acid or the termination of translation and are contained within the CDS. SO:ke http://en.wikipedia.org/wiki/Codon wiki An attribute to describe sequence that is flanked by the FLP recombinase recognition site, FRT. FRT flanked sequence SO:0000361 FRT_flanked An attribute to describe sequence that is flanked by the FLP recombinase recognition site, FRT. SO:ke A cDNA clone constructed from more than one mRNA. Usually an experimental artifact. invalidated by chimeric cDNA sequence SO:0000362 invalidated_by_chimeric_cDNA A cDNA clone constructed from more than one mRNA. Usually an experimental artifact. SO:ma A transgene that is floxed. floxed gene sequence SO:0000363 floxed_gene A transgene that is floxed. SO:xp The region of sequence surrounding a transposable element. transposable element flanking region sequence SO:0000364 transposable_element_flanking_region The region of sequence surrounding a transposable element. SO:ke A region encoding an integrase which acts at a site adjacent to it (attI_site) to insert DNA which must include but is not limited to an attC_site. http://en.wikipedia.org/wiki/Integron sequence SO:0000365 integron A region encoding an integrase which acts at a site adjacent to it (attI_site) to insert DNA which must include but is not limited to an attC_site. SO:as http://en.wikipedia.org/wiki/Integron wiki The junction where an insertion occurred. insertion site sequence SO:0000366 insertion_site The junction where an insertion occurred. 
SO:ke A region within an integron, adjacent to an integrase, at which site-specific recombination involving an attC_site takes place. attI site sequence SO:0000367 attI_site A region within an integron, adjacent to an integrase, at which site-specific recombination involving an attC_site takes place. SO:as The junction in a genome where a transposable_element has inserted. transposable element insertion site sequence SO:0000368 transposable_element_insertion_site The junction in a genome where a transposable_element has inserted. SO:ke sequence SO:0000369 integrase_coding_region true A non-coding RNA less than 200 nucleotides long, usually with a specific secondary structure, that acts to regulate gene expression. These include short ncRNAs such as piRNAs, miRNAs and siRNAs (among others). small regulatory ncRNA sequence SO:0000370 small_regulatory_ncRNA A non-coding RNA less than 200 nucleotides long, usually with a specific secondary structure, that acts to regulate gene expression. These include short ncRNAs such as piRNAs, miRNAs and siRNAs (among others). PMID:28541282 PomBase:al SO:ma A transposon that encodes functions required for conjugation. conjugative transposon sequence SO:0000371 conjugative_transposon A transposon that encodes functions required for conjugation. http://www.sci.sdsu.edu/~smaloy/Glossary/C.html An RNA sequence that has catalytic activity with or without an associated ribonucleoprotein. enzymatic RNA sequence SO:0000372 This was moved to be a child of transcript (SO:0000673) because some enzymatic RNA regions are part of primary transcripts and some are part of processed transcripts. Moved under ncRNA on 18 Nov 2021. See GitHub Issue #533. enzymatic_RNA An RNA sequence that has catalytic activity with or without an associated ribonucleoprotein. RSC:cb A gene that has been recombinationally rearranged by inversion. recombinationally inverted gene sequence SO:0000373 recombinationally_inverted_gene A gene that has been recombinationally rearranged by inversion.
SO:xp An RNA with catalytic activity. INSDC_feature:ncRNA http://en.wikipedia.org/wiki/Ribozyme INSDC_qualifier:ribozyme sequence SO:0000374 ribozyme An RNA with catalytic activity. SO:ma http://en.wikipedia.org/wiki/Ribozyme wiki Cytosolic 5.8S rRNA is an RNA component of the large subunit of cytosolic ribosomes in eukaryotes. http://en.wikipedia.org/wiki/5.8S_ribosomal_RNA cytosolic 5.8S LSU rRNA cytosolic 5.8S rRNA cytosolic 5.8S ribosomal RNA cytosolic rRNA 5 8S sequence SO:0000375 Dave Sant removed '5_8S rRNA is also found in archaea.' from definition due to lack of references mentioning this on 1 Feb 2021. See GitHub Issue #505. Renamed from rRNA_5_8S to cytosolic_5_8S_rRNA on 10 June 2021 with the restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Requested by EBI. See GitHub Issue #493. cytosolic_5_8S_rRNA Cytosolic 5.8S rRNA is an RNA component of the large subunit of cytosolic ribosomes in eukaryotes. https://rfam.xfam.org/family/RF00002 http://en.wikipedia.org/wiki/5.8S_ribosomal_RNA wiki A small (184-nt in E. coli) RNA that forms a hairpin type structure. 6S RNA associates with RNA polymerase in a highly specific manner. 6S RNA represses expression from a sigma70-dependent promoter during stationary phase. http://en.wikipedia.org/wiki/6S_RNA 6S RNA RNA 6S sequence SO:0000376 RNA_6S A small (184-nt in E. coli) RNA that forms a hairpin type structure. 6S RNA associates with RNA polymerase in a highly specific manner. 6S RNA represses expression from a sigma70-dependent promoter during stationary phase. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00013 http://en.wikipedia.org/wiki/6S_RNA wiki An enterobacterial RNA that binds the CsrA protein. The CsrB RNAs contain a conserved motif CAGGXXG that is found in up to 18 copies and has been suggested to bind CsrA. 
The Csr regulatory system has a strong negative regulatory effect on glycogen biosynthesis, gluconeogenesis and glycogen catabolism and a positive regulatory effect on glycolysis. In other bacteria such as Erwinia carotovora, the RsmA protein has been shown to regulate the production of virulence determinants, such as extracellular enzymes. RsmA binds to RsmB regulatory RNA, which is also a member of this family. CsrB RsmB RNA CsrB-RsmB RNA sequence SO:0000377 CsrB_RsmB_RNA An enterobacterial RNA that binds the CsrA protein. The CsrB RNAs contain a conserved motif CAGGXXG that is found in up to 18 copies and has been suggested to bind CsrA. The Csr regulatory system has a strong negative regulatory effect on glycogen biosynthesis, gluconeogenesis and glycogen catabolism and a positive regulatory effect on glycolysis. In other bacteria such as Erwinia carotovora, the RsmA protein has been shown to regulate the production of virulence determinants, such as extracellular enzymes. RsmA binds to RsmB regulatory RNA, which is also a member of this family. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00018 DsrA RNA regulates both transcription, by overcoming transcriptional silencing by the nucleoid-associated H-NS protein, and translation, by promoting efficient translation of the stress sigma factor, RpoS. These two activities of DsrA can be separated by mutation: the first of three stem-loops of the 85 nucleotide RNA is necessary for RpoS translation but not for anti-H-NS action, while the second stem-loop is essential for antisilencing and less critical for RpoS translation. The third stem-loop, which behaves as a transcription terminator, can be substituted by the trp transcription terminator without loss of either DsrA function. The sequence of the first stem-loop of DsrA is complementary with the upstream leader portion of RpoS messenger RNA, suggesting that pairing of DsrA with the RpoS message might be important for translational regulation. 
http://en.wikipedia.org/wiki/DsrA_RNA DsrA RNA sequence SO:0000378 DsrA_RNA DsrA RNA regulates both transcription, by overcoming transcriptional silencing by the nucleoid-associated H-NS protein, and translation, by promoting efficient translation of the stress sigma factor, RpoS. These two activities of DsrA can be separated by mutation: the first of three stem-loops of the 85 nucleotide RNA is necessary for RpoS translation but not for anti-H-NS action, while the second stem-loop is essential for antisilencing and less critical for RpoS translation. The third stem-loop, which behaves as a transcription terminator, can be substituted by the trp transcription terminator without loss of either DsrA function. The sequence of the first stem-loop of DsrA is complementary with the upstream leader portion of RpoS messenger RNA, suggesting that pairing of DsrA with the RpoS message might be important for translational regulation. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00014 http://en.wikipedia.org/wiki/DsrA_RNA wiki A small untranslated RNA involved in expression of the dipeptide and oligopeptide transport systems in Escherichia coli. http://en.wikipedia.org/wiki/GcvB_RNA GcvB RNA sequence SO:0000379 GcvB_RNA A small untranslated RNA involved in expression of the dipeptide and oligopeptide transport systems in Escherichia coli. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00022 http://en.wikipedia.org/wiki/GcvB_RNA wiki A small catalytic RNA motif that catalyzes a self-cleavage reaction. Its name comes from its secondary structure, which resembles a carpenter's hammer. The hammerhead ribozyme is involved in the replication of some viroid and some satellite RNAs. INSDC_feature:ncRNA http://en.wikipedia.org/wiki/Hammerhead_ribozyme INSDC_qualifier:hammerhead_ribozyme hammerhead ribozyme sequence SO:0000380 hammerhead_ribozyme A small catalytic RNA motif that catalyzes a self-cleavage reaction. 
Its name comes from its secondary structure, which resembles a carpenter's hammer. The hammerhead ribozyme is involved in the replication of some viroid and some satellite RNAs. PMID:2436805 http://en.wikipedia.org/wiki/Hammerhead_ribozyme wiki A group II intron that recognizes IBS1/EBS1 and IBS2/EBS2 for the 5-prime exon and gamma/gamma-prime for the 3-prime exon. group IIA intron sequence SO:0000381 group_IIA_intron A group II intron that recognizes IBS1/EBS1 and IBS2/EBS2 for the 5-prime exon and gamma/gamma-prime for the 3-prime exon. PMID:20463000 A group II intron that recognizes IBS1/EBS1 and IBS2/EBS2 for the 5-prime exon and IBS3/EBS3 for the 3-prime exon. group IIB intron sequence SO:0000382 group_IIB_intron A group II intron that recognizes IBS1/EBS1 and IBS2/EBS2 for the 5-prime exon and IBS3/EBS3 for the 3-prime exon. PMID:20463000 A non-translated 93 nt antisense RNA that binds its target ompF mRNA and regulates ompF expression by inhibiting translation and inducing degradation of the message. http://en.wikipedia.org/wiki/MicF_RNA MicF RNA sequence SO:0000383 MicF_RNA A non-translated 93 nt antisense RNA that binds its target ompF mRNA and regulates ompF expression by inhibiting translation and inducing degradation of the message. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00033 http://en.wikipedia.org/wiki/MicF_RNA wiki A small untranslated RNA which is induced in response to oxidative stress in Escherichia coli. It acts as a global regulator to activate or repress the expression of as many as 40 genes, including the fhlA-encoded transcriptional activator and the rpoS-encoded sigma(s) subunit of RNA polymerase. OxyS is bound by the Hfq protein, which increases the OxyS RNA interaction with its target messages. http://en.wikipedia.org/wiki/OxyS_RNA OxyS RNA sequence SO:0000384 OxyS_RNA A small untranslated RNA which is induced in response to oxidative stress in Escherichia coli. 
It acts as a global regulator to activate or repress the expression of as many as 40 genes, including the fhlA-encoded transcriptional activator and the rpoS-encoded sigma(s) subunit of RNA polymerase. OxyS is bound by the Hfq protein, which increases the OxyS RNA interaction with its target messages. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00035 http://en.wikipedia.org/wiki/OxyS_RNA wiki The RNA molecule essential for the catalytic activity of RNase MRP, an enzymatically active ribonucleoprotein with two distinct roles in eukaryotes. In mitochondria it plays a direct role in the initiation of mitochondrial DNA replication. In the nucleus it is involved in precursor rRNA processing, where it cleaves the internal transcribed spacer 1 between 18S and 5.8S rRNAs. INSDC_feature:ncRNA INSDC_qualifier:RNase_MRP_RNA RNase MRP RNA sequence SO:0000385 Moved under enzymatic_RNA on 18 Nov 2021. See GitHub Issue #533. RNase_MRP_RNA The RNA molecule essential for the catalytic activity of RNase MRP, an enzymatically active ribonucleoprotein with two distinct roles in eukaryotes. In mitochondria it plays a direct role in the initiation of mitochondrial DNA replication. In the nucleus it is involved in precursor rRNA processing, where it cleaves the internal transcribed spacer 1 between 18S and 5.8S rRNAs. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00030 The RNA component of Ribonuclease P (RNase P), a ubiquitous endoribonuclease, found in archaea, bacteria and eukarya as well as chloroplasts and mitochondria. Its best characterized activity is the generation of mature 5 prime ends of tRNAs by cleaving the 5 prime leader elements of precursor-tRNAs. Cellular RNase Ps are ribonucleoproteins. RNA from bacterial RNase Ps retains its catalytic activity in the absence of the protein subunit, i.e. it is a ribozyme. Isolated eukaryotic and archaeal RNase P RNA has not been shown to retain its catalytic function, but is still essential for the catalytic activity of the holoenzyme. 
Although the archaeal and eukaryotic holoenzymes have a much greater protein content than the bacterial ones, the RNA cores from all the three lineages are homologous. Helices corresponding to P1, P2, P3, P4, and P10/11 are common to all cellular RNase P RNAs. Yet, there is considerable sequence variation, particularly among the eukaryotic RNAs. INSDC_feature:ncRNA INSDC_qualifier:RNase_P_RNA RNase P RNA sequence SO:0000386 Moved under enzymatic_RNA on 18 Nov 2021. See GitHub Issue #533. RNase_P_RNA The RNA component of Ribonuclease P (RNase P), a ubiquitous endoribonuclease, found in archaea, bacteria and eukarya as well as chloroplasts and mitochondria. Its best characterized activity is the generation of mature 5 prime ends of tRNAs by cleaving the 5 prime leader elements of precursor-tRNAs. Cellular RNase Ps are ribonucleoproteins. RNA from bacterial RNase Ps retains its catalytic activity in the absence of the protein subunit, i.e. it is a ribozyme. Isolated eukaryotic and archaeal RNase P RNA has not been shown to retain its catalytic function, but is still essential for the catalytic activity of the holoenzyme. Although the archaeal and eukaryotic holoenzymes have a much greater protein content than the bacterial ones, the RNA cores from all the three lineages are homologous. Helices corresponding to P1, P2, P3, P4, and P10/11 are common to all cellular RNase P RNAs. Yet, there is considerable sequence variation, particularly among the eukaryotic RNAs. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00010 Translational regulation of the stationary phase sigma factor RpoS is mediated by the formation of a double-stranded RNA stem-loop structure in the upstream region of the rpoS messenger RNA, occluding the translation initiation site. Clones carrying rprA (RpoS regulator RNA) increased the translation of RpoS. The rprA gene encodes a 106 nucleotide regulatory RNA. As with DsrA Rfam:RF00014, RprA is predicted to form three stem-loops. 
Thus, at least two small RNAs, DsrA and RprA, participate in the positive regulation of RpoS translation. Unlike DsrA, RprA does not have an extensive region of complementarity to the RpoS leader, leaving its mechanism of action unclear. RprA is non-essential. http://en.wikipedia.org/wiki/RprA_RNA RprA RNA sequence SO:0000387 RprA_RNA Translational regulation of the stationary phase sigma factor RpoS is mediated by the formation of a double-stranded RNA stem-loop structure in the upstream region of the rpoS messenger RNA, occluding the translation initiation site. Clones carrying rprA (RpoS regulator RNA) increased the translation of RpoS. The rprA gene encodes a 106 nucleotide regulatory RNA. As with DsrA Rfam:RF00014, RprA is predicted to form three stem-loops. Thus, at least two small RNAs, DsrA and RprA, participate in the positive regulation of RpoS translation. Unlike DsrA, RprA does not have an extensive region of complementarity to the RpoS leader, leaving its mechanism of action unclear. RprA is non-essential. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00034 http://en.wikipedia.org/wiki/RprA_RNA wiki The Rev response element (RRE) is encoded within the HIV-env gene. Rev is an essential regulatory protein of HIV that binds an internal loop of the RRE, encouraging further Rev-RRE binding. This RNP complex is critical for mRNA export and hence for expression of the HIV structural proteins. RRE RNA sequence SO:0000388 RRE_RNA The Rev response element (RRE) is encoded within the HIV-env gene. Rev is an essential regulatory protein of HIV that binds an internal loop of the RRE, encouraging further Rev-RRE binding. This RNP complex is critical for mRNA export and hence for expression of the HIV structural proteins. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00036 A 109-nucleotide RNA of E. coli that seems to have a regulatory role in the galactose operon. Changes in Spot 42 levels are implicated in affecting DNA polymerase I levels. 
http://en.wikipedia.org/wiki/Spot_42_RNA spot-42 RNA sequence SO:0000389 spot_42_RNA A 109-nucleotide RNA of E. coli that seems to have a regulatory role in the galactose operon. Changes in Spot 42 levels are implicated in affecting DNA polymerase I levels. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00021 http://en.wikipedia.org/wiki/Spot_42_RNA wiki The RNA component of telomerase, a reverse transcriptase that synthesizes telomeric DNA. INSDC_feature:ncRNA http://en.wikipedia.org/wiki/Telomerase_RNA INSDC_qualifier:telomerase_RNA telomerase RNA sequence SO:0000390 telomerase_RNA The RNA component of telomerase, a reverse transcriptase that synthesizes telomeric DNA. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00025 http://en.wikipedia.org/wiki/Telomerase_RNA wiki U1 is a small nuclear RNA (snRNA) component of the spliceosome (involved in pre-mRNA splicing). Its 5' end forms complementary base pairs with the 5' splice junction, thus defining the 5' donor site of an intron. There are significant differences in sequence and secondary structure between metazoan and yeast U1 snRNAs, the latter being much longer (568 nucleotides as compared to 164 nucleotides in human). Nevertheless, secondary structure predictions suggest that all U1 snRNAs share a 'common core' consisting of helices I, II, the proximal region of III, and IV. http://en.wikipedia.org/wiki/U1_snRNA U1 small nuclear RNA U1 snRNA small nuclear RNA U1 snRNA U1 sequence SO:0000391 U1_snRNA U1 is a small nuclear RNA (snRNA) component of the spliceosome (involved in pre-mRNA splicing). Its 5' end forms complementary base pairs with the 5' splice junction, thus defining the 5' donor site of an intron. There are significant differences in sequence and secondary structure between metazoan and yeast U1 snRNAs, the latter being much longer (568 nucleotides as compared to 164 nucleotides in human). 
Nevertheless, secondary structure predictions suggest that all U1 snRNAs share a 'common core' consisting of helices I, II, the proximal region of III, and IV. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00003 http://en.wikipedia.org/wiki/U1_snRNA wiki U1 small nuclear RNA RSC:cb small nuclear RNA U1 RSC:cb snRNA U1 RSC:cb U2 is a small nuclear RNA (snRNA) component of the spliceosome (involved in pre-mRNA splicing). Complementary binding between U2 snRNA (in an area lying towards the 5' end but 3' to hairpin I) and the branchpoint sequence (BPS) of the intron results in the bulging out of an unpaired adenine, on the BPS, which initiates a nucleophilic attack at the intronic 5' splice site, thus starting the first of two transesterification reactions that mediate splicing. http://en.wikipedia.org/wiki/U2_snRNA U2 small nuclear RNA U2 snRNA small nuclear RNA U2 snRNA U2 sequence SO:0000392 U2_snRNA U2 is a small nuclear RNA (snRNA) component of the spliceosome (involved in pre-mRNA splicing). Complementary binding between U2 snRNA (in an area lying towards the 5' end but 3' to hairpin I) and the branchpoint sequence (BPS) of the intron results in the bulging out of an unpaired adenine, on the BPS, which initiates a nucleophilic attack at the intronic 5' splice site, thus starting the first of two transesterification reactions that mediate splicing. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00004 http://en.wikipedia.org/wiki/U2_snRNA wiki U2 small nuclear RNA RSC:CB small nuclear RNA U2 RSC:CB snRNA U2 RSC:CB U4 small nuclear RNA (U4 snRNA) is a component of the major U2-dependent spliceosome. It forms a duplex with U6, and with each splicing round, it is displaced from U6 (and the spliceosome) in an ATP-dependent manner, allowing U6 to refold and create the active site for splicing catalysis. A recycling process involving protein Prp24 re-anneals U4 and U6. 
http://en.wikipedia.org/wiki/U4_snRNA U4 small nuclear RNA U4 snRNA small nuclear RNA U4 snRNA U4 sequence SO:0000393 U4_snRNA U4 small nuclear RNA (U4 snRNA) is a component of the major U2-dependent spliceosome. It forms a duplex with U6, and with each splicing round, it is displaced from U6 (and the spliceosome) in an ATP-dependent manner, allowing U6 to refold and create the active site for splicing catalysis. A recycling process involving protein Prp24 re-anneals U4 and U6. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00015 http://en.wikipedia.org/wiki/U4_snRNA wiki U4 small nuclear RNA RSC:cb small nuclear RNA U4 RSC:cb snRNA U4 RSC:cb An snRNA required for the splicing of the minor U12-dependent class of eukaryotic nuclear introns. It forms a base-paired complex with U6atac_snRNA (SO:0000397). U4atac small nuclear RNA U4atac snRNA small nuclear RNA U4atac snRNA U4atac sequence SO:0000394 U4atac_snRNA An snRNA required for the splicing of the minor U12-dependent class of eukaryotic nuclear introns. It forms a base-paired complex with U6atac_snRNA (SO:0000397). PMID:12409455 U4atac small nuclear RNA RSC:cb small nuclear RNA U4atac RSC:cb snRNA U4atac RSC:cb U5 RNA is a component of both known types of spliceosome. The precise function of this molecule is unknown, though it is known that the 5' loop is required for splice site selection and p220 binding, and that both the 3' stem-loop and the Sm site are important for Sm protein binding and cap methylation. http://en.wikipedia.org/wiki/U5_snRNA U5 small nuclear RNA U5 snRNA small nuclear RNA U5 snRNA U5 sequence SO:0000395 U5_snRNA U5 RNA is a component of both known types of spliceosome. The precise function of this molecule is unknown, though it is known that the 5' loop is required for splice site selection and p220 binding, and that both the 3' stem-loop and the Sm site are important for Sm protein binding and cap methylation. 
http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00020 http://en.wikipedia.org/wiki/U5_snRNA wiki U5 small nuclear RNA RSC:cb small nuclear RNA U5 RSC:cb snRNA U5 RSC:cb U6 snRNA is a component of the spliceosome which is involved in splicing pre-mRNA. The putative secondary structure consensus base pairing is confined to a short 5' stem loop, but U6 snRNA is thought to form extensive base-pair interactions with U4 snRNA. http://en.wikipedia.org/wiki/U6_snRNA U6 small nuclear RNA U6 snRNA small nuclear RNA U6 snRNA U6 sequence SO:0000396 U6_snRNA U6 snRNA is a component of the spliceosome which is involved in splicing pre-mRNA. The putative secondary structure consensus base pairing is confined to a short 5' stem loop, but U6 snRNA is thought to form extensive base-pair interactions with U4 snRNA. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00015 http://en.wikipedia.org/wiki/U6_snRNA wiki U6 small nuclear RNA RSC:cb small nuclear RNA U6 RSC:cb snRNA U6 RSC:cb U6atac_snRNA is an snRNA required for the splicing of the minor U12-dependent class of eukaryotic nuclear introns. It forms a base-paired complex with U4atac_snRNA (SO:0000394). U6atac small nuclear RNA U6atac snRNA snRNA U6atac sequence SO:0000397 U6atac_snRNA U6atac_snRNA is an snRNA required for the splicing of the minor U12-dependent class of eukaryotic nuclear introns. It forms a base-paired complex with U4atac_snRNA (SO:0000394). http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=retrieve&db=pubmed&list_uids=12409455&dopt=Abstract U6atac small nuclear RNA RSC:cb U6atac snRNA RSC:cb snRNA U6atac RSC:cb U11 snRNA plays a role in splicing of the minor U12-dependent class of eukaryotic nuclear introns; similar to U1 snRNA in the major-class spliceosome, it base pairs to the conserved 5' splice site sequence. 
http://en.wikipedia.org/wiki/U11_snRNA U11 small nuclear RNA U11 snRNA small nuclear RNA U11 snRNA U11 sequence SO:0000398 U11_snRNA U11 snRNA plays a role in splicing of the minor U12-dependent class of eukaryotic nuclear introns; similar to U1 snRNA in the major-class spliceosome, it base pairs to the conserved 5' splice site sequence. PMID:9622129 http://en.wikipedia.org/wiki/U11_snRNA wiki U11 small nuclear RNA RSC:cb small nuclear RNA U11 RSC:cb snRNA U11 RSC:cb The U12 small nuclear RNA (snRNA), together with U4atac/U6atac, U5, and U11 snRNAs and associated proteins, forms a spliceosome that cleaves a divergent class of low-abundance pre-mRNA introns. http://en.wikipedia.org/wiki/U12_snRNA U12 small nuclear RNA U12 snRNA small nuclear RNA U12 snRNA U12 sequence SO:0000399 U12_snRNA The U12 small nuclear RNA (snRNA), together with U4atac/U6atac, U5, and U11 snRNAs and associated proteins, forms a spliceosome that cleaves a divergent class of low-abundance pre-mRNA introns. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00007 http://en.wikipedia.org/wiki/U12_snRNA wiki U12 small nuclear RNA RSC:cb small nuclear RNA U12 RSC:cb snRNA U12 RSC:cb An attribute describing a quality of sequence. sequence attribute sequence SO:0000400 sequence_attribute An attribute describing a quality of sequence. SO:ke An attribute describing a gene. gene attribute sequence SO:0000401 gene_attribute sequence SO:0000402 enhancer_attribute true U14 small nucleolar RNA (U14 snoRNA) is required for early cleavages of eukaryotic precursor rRNAs. In yeasts, this molecule possesses a stem-loop region (known as the Y-domain) which is essential for function. A similar structure, but with a different consensus sequence, is found in plants, but is absent in vertebrates. 
SO:0005839 U14 small nucleolar RNA U14 snoRNA small nucleolar RNA U14 snoRNA U14 sequence SO:0000403 An evolutionarily conserved eukaryotic low molecular weight RNA capable of intermolecular hybridization with both homologous and heterologous 18S rRNA. U14_snoRNA U14 small nucleolar RNA (U14 snoRNA) is required for early cleavages of eukaryotic precursor rRNAs. In yeasts, this molecule possesses a stem-loop region (known as the Y-domain) which is essential for function. A similar structure, but with a different consensus sequence, is found in plants, but is absent in vertebrates. PMID:2551119 http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00016 A family of RNAs found as part of the enigmatic vault ribonucleoprotein complex. The complex consists of a major vault protein (MVP), two minor vault proteins (VPARP and TEP1), and several small untranslated RNA molecules. It has been suggested that the vault complex is involved in drug resistance. INSDC_feature:ncRNA http://en.wikipedia.org/wiki/Vault_RNA INSDC_qualifier:vault_RNA vault RNA sequence SO:0000404 vault_RNA A family of RNAs found as part of the enigmatic vault ribonucleoprotein complex. The complex consists of a major vault protein (MVP), two minor vault proteins (VPARP and TEP1), and several small untranslated RNA molecules. It has been suggested that the vault complex is involved in drug resistance. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00006 http://en.wikipedia.org/wiki/Vault_RNA wiki Y RNAs are components of the Ro ribonucleoprotein particle (Ro RNP), in association with Ro60 and La proteins. The Y RNAs and Ro60 and La proteins are well conserved, but the function of the Ro RNP is not known. In humans the RNA component can be one of four small RNAs: hY1, hY3, hY4 and hY5. These small RNAs are predicted to fold into a conserved secondary structure containing three stem structures. The largest of the four, hY1, contains an additional hairpin. 
INSDC_feature:ncRNA http://en.wikipedia.org/wiki/Y_RNA INSDC_qualifier:Y_RNA Y RNA sequence SO:0000405 Y_RNA Y RNAs are components of the Ro ribonucleoprotein particle (Ro RNP), in association with Ro60 and La proteins. The Y RNAs and Ro60 and La proteins are well conserved, but the function of the Ro RNP is not known. In humans the RNA component can be one of four small RNAs: hY1, hY3, hY4 and hY5. These small RNAs are predicted to fold into a conserved secondary structure containing three stem structures. The largest of the four, hY1, contains an additional hairpin. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00019 http://en.wikipedia.org/wiki/Y_RNA wiki An intron within an intron. Twintrons are group II or III introns into which another group II or III intron has been transposed. http://en.wikipedia.org/wiki/Twintron sequence SO:0000406 twintron An intron within an intron. Twintrons are group II or III introns into which another group II or III intron has been transposed. PMID:1899376 PMID:7823908 http://en.wikipedia.org/wiki/Twintron wiki Cytosolic 18S rRNA is an RNA component of the small subunit of cytosolic ribosomes in eukaryotes. http://en.wikipedia.org/wiki/18S_ribosomal_RNA cytosolic 18S rRNA cytosolic 18S ribosomal RNA cytosolic rRNA 18S sequence SO:0000407 Renamed to cytosolic_18S_rRNA from rRNA_18S on 10 June 2021 as per restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Request from EBI. See GitHub Issue #493. cytosolic_18S_rRNA Cytosolic 18S rRNA is an RNA component of the small subunit of cytosolic ribosomes in eukaryotes. SO:ke http://en.wikipedia.org/wiki/18S_ribosomal_RNA wiki The interbase position where something (e.g. an aberration) occurred. sequence SO:0000408 site true The interbase position where something (e.g. an aberration) occurred. SO:ke A biological_region of sequence that, in the molecule, interacts selectively and non-covalently with other molecules. 
A region on the surface of a molecule that may interact with another molecule. When applied to polypeptides: Amino acids involved in binding or interactions. It can also apply to an amino acid bond which is represented by the positions of the two flanking amino acids. BS:00033 http://en.wikipedia.org/wiki/Binding_site INSDC_feature:misc_binding binding site binding_or_interaction_site sequence site SO:0000409 See GO:0005488 : binding. binding_site A biological_region of sequence that, in the molecule, interacts selectively and non-covalently with other molecules. A region on the surface of a molecule that may interact with another molecule. When applied to polypeptides: Amino acids involved in binding or interactions. It can also apply to an amino acid bond which is represented by the positions of the two flanking amino acids. EBIBS:GAR SO:ke http://en.wikipedia.org/wiki/Binding_site wiki A binding site that, in the molecule, interacts selectively and non-covalently with polypeptide molecules. INSDC_feature:protein_bind protein binding site sequence SO:0000410 See GO:0042277 : peptide binding. protein_binding_site A binding site that, in the molecule, interacts selectively and non-covalently with polypeptide molecules. SO:ke A region that rescues. rescue fragment rescue region sequence rescue segment SO:0000411 rescue_region A region that rescues. SO:xp A region of polynucleotide sequence produced by digestion with a restriction endonuclease. http://en.wikipedia.org/wiki/Restriction_fragment restriction fragment sequence SO:0000412 restriction_fragment A region of polynucleotide sequence produced by digestion with a restriction endonuclease. SO:ke http://en.wikipedia.org/wiki/Restriction_fragment wiki A region where the sequence differs from that of a specified sequence. INSDC_feature:misc_difference sequence difference sequence SO:0000413 sequence_difference A region where the sequence differs from that of a specified sequence. 
SO:ke An attribute to describe a feature that is invalidated due to genomic contamination. invalidated by genomic contamination sequence SO:0000414 invalidated_by_genomic_contamination An attribute to describe a feature that is invalidated due to genomic contamination. SO:ke An attribute to describe a feature that is invalidated due to polyA priming. invalidated by genomic polyA primed cDNA sequence SO:0000415 invalidated_by_genomic_polyA_primed_cDNA An attribute to describe a feature that is invalidated due to polyA priming. SO:ke An attribute to describe a feature that is invalidated due to partial processing. invalidated by partial processing sequence SO:0000416 invalidated_by_partial_processing An attribute to describe a feature that is invalidated due to partial processing. SO:ke A structurally or functionally defined protein region. In proteins with multiple domains, the combination of the domains determines the function of the protein. A region which has been shown to recur throughout evolution. BS:00012 BS:00134 SO:0001069 domain structural domain polypeptide domain polypeptide_structural_domain sequence SO:0000417 Range. Old definition from before biosapiens: A region of a single polypeptide chain that folds into an independent unit and exhibits biological activity. A polypeptide chain may have multiple domains. polypeptide_domain A structurally or functionally defined protein region. In proteins with multiple domains, the combination of the domains determines the function of the protein. A region which has been shown to recur throughout evolution. EBIBS:GAR domain uniprot:feature_type structural domain polypeptide_structural_domain The signal_peptide is a short region of the peptide located at the N-terminus that directs the protein to be secreted or to become part of membrane components. 
BS:00159 http://en.wikipedia.org/wiki/Signal_peptide INSDC_feature:sig_peptide signal peptide signal peptide coding sequence sequence signal SO:0000418 Old def before biosapiens: The sequence for an N-terminal domain of a secreted protein; this domain is involved in attaching nascent polypeptide to the membrane leader sequence. signal_peptide The signal_peptide is a short region of the peptide located at the N-terminus that directs the protein to be secreted or to become part of membrane components. http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/Signal_peptide wiki signal uniprot:feature_type The polypeptide sequence that remains after the cleaved peptide regions have been removed from the immature peptide. BS:00149 INSDC_feature:mat_peptide mature protein region sequence chain mature peptide SO:0000419 The term mature peptide was merged with the biosapiens term mature protein region, which was adopted as the new name. Old def: The coding sequence for the mature or final peptide or protein product following post-translational modification. mature_protein_region The polypeptide sequence that remains after the cleaved peptide regions have been removed from the immature peptide. EBIBS:GAR SO:cb http://www.insdc.org/files/feature_table.html chain uniprot:feature_type An inverted repeat (SO:0000294) occurring at the 5-prime terminus of a DNA transposon. 5' TIR five prime terminal inverted repeat sequence SO:0000420 five_prime_terminal_inverted_repeat An inverted repeat (SO:0000294) occurring at the 3-prime terminus of a DNA transposon. 3' TIR three prime terminal inverted repeat sequence SO:0000421 three_prime_terminal_inverted_repeat The U5 segment of the long terminal repeats. U5 LTR region U5 long terminal repeat region sequence SO:0000422 U5_LTR_region The R segment of the long terminal repeats. R LTR region R long terminal repeat region sequence SO:0000423 R_LTR_region The U3 segment of the long terminal repeats. 
U3 LTR region U3 long terminal repeat region sequence SO:0000424 U3_LTR_region The long terminal repeat found at the five-prime end of the sequence to be inserted into the host genome. 5' LTR 5' long terminal repeat five prime LTR sequence SO:0000425 five_prime_LTR The long terminal repeat found at the three-prime end of the sequence to be inserted into the host genome. 3' LTR 3' long terminal repeat three prime LTR sequence SO:0000426 three_prime_LTR The R segment of the five-prime long terminal repeat. R 5' long terminal repeat region R five prime LTR region sequence SO:0000427 R_five_prime_LTR_region The U5 segment of the five-prime long terminal repeat. U5 5' long terminal repeat region U5 five prime LTR region sequence SO:0000428 U5_five_prime_LTR_region The U3 segment of the five-prime long terminal repeat. U3 5' long terminal repeat region U3 five prime LTR region sequence SO:0000429 U3_five_prime_LTR_region The R segment of the three-prime long terminal repeat. R 3' long terminal repeat region R three prime LTR region sequence SO:0000430 R_three_prime_LTR_region The U3 segment of the three-prime long terminal repeat. U3 3' long terminal repeat region U3 three prime LTR region sequence SO:0000431 U3_three_prime_LTR_region The U5 segment of the three-prime long terminal repeat. U5 3' long terminal repeat region U5 three prime LTR region sequence SO:0000432 U5_three_prime_LTR_region A polymeric tract, such as poly(dA), within a non_LTR_retrotransposon. INSDC_feature:repeat_region INSDC_qualifier:non_ltr_retrotransposon_polymeric_tract non LTR retrotransposon polymeric tract sequence SO:0000433 non_LTR_retrotransposon_polymeric_tract A polymeric tract, such as poly(dA), within a non_LTR_retrotransposon. SO:ke A sequence of the target DNA that is duplicated when a transposable element or phage inserts; usually found at each end of the insertion.
target site duplication sequence SO:0000434 target_site_duplication A sequence of the target DNA that is duplicated when a transposable element or phage inserts; usually found at each end of the insertion. http://www.koko.gov.my/CocoaBioTech/Glossaryt.html A polypurine tract within an LTR_retrotransposon. RR tract sequence LTR retrotransposon poly purine tract SO:0000435 RR_tract A polypurine tract within an LTR_retrotransposon. SO:ke A sequence that can autonomously replicate, as a plasmid, when transformed into a bacterial host. autonomously replicating sequence sequence SO:0000436 ARS A sequence that can autonomously replicate, as a plasmid, when transformed into a bacterial host. SO:ma sequence SO:0000437 assortment_derived_duplication true sequence SO:0000438 gene_not_polyadenylated true A ring chromosome is a chromosome whose arms have fused together to form a ring in an inverted fashion, often with the loss of the ends of the chromosome. inverted ring chromosome sequence SO:0000439 inverted_ring_chromosome A replicon that has been modified to act as a vector for foreign sequence. http://en.wikipedia.org/wiki/Vector_(molecular_biology) vector vector replicon sequence SO:0000440 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. vector_replicon A replicon that has been modified to act as a vector for foreign sequence. SO:ma http://en.wikipedia.org/wiki/Vector_(molecular_biology) wiki A single stranded oligonucleotide. single strand oligo single strand oligonucleotide single stranded oligonucleotide ss oligo ss oligonucleotide sequence SO:0000441 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. ss_oligo A single stranded oligonucleotide. SO:ke A double stranded oligonucleotide. double stranded oligonucleotide ds oligo ds-oligonucleotide sequence SO:0000442 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. ds_oligo A double stranded oligonucleotide.
SO:ke An attribute to describe the kind of biological sequence. polymer attribute sequence SO:0000443 polymer_attribute An attribute to describe the kind of biological sequence. SO:ke Non-coding exon in the 3' UTR. three prime noncoding exon sequence SO:0000444 three_prime_noncoding_exon Non-coding exon in the 3' UTR. SO:ke Non-coding exon in the 5' UTR. 5' nc exon 5' non coding exon five prime noncoding exon sequence SO:0000445 five_prime_noncoding_exon Non-coding exon in the 5' UTR. SO:ke Intron located in the untranslated region. UTR intron sequence SO:0000446 UTR_intron Intron located in the untranslated region. SO:ke An intron located in the 5' UTR. five prime UTR intron sequence SO:0000447 five_prime_UTR_intron An intron located in the 5' UTR. SO:ke An intron located in the 3' UTR. three prime UTR intron sequence SO:0000448 three_prime_UTR_intron An intron located in the 3' UTR. SO:ke A sequence of nucleotides or amino acids which, by design, has a "random" order of components, given a predetermined input frequency of these components. random sequence sequence SO:0000449 random_sequence A sequence of nucleotides or amino acids which, by design, has a "random" order of components, given a predetermined input frequency of these components. SO:ma A light region between two darkly staining bands in a polytene chromosome. sequence chromosome interband SO:0000450 interband A light region between two darkly staining bands in a polytene chromosome. SO:ma A gene that encodes a polyadenylated mRNA. gene with polyadenylated mRNA sequence SO:0000451 gene_with_polyadenylated_mRNA A gene that encodes a polyadenylated mRNA. SO:xp sequence SO:0000452 transgene_attribute true A chromosome structure variant whereby a region of a chromosome has been transferred to another position. 
Among interchromosomal rearrangements, the term transposition is reserved for that class in which the telomeres of the chromosomes involved are coupled (that is to say, form the two ends of a single DNA molecule) as in wild-type. chromosomal transposition transposition sequence SO:0000453 chromosomal_transposition A chromosome structure variant whereby a region of a chromosome has been transferred to another position. Among interchromosomal rearrangements, the term transposition is reserved for that class in which the telomeres of the chromosomes involved are coupled (that is to say, form the two ends of a single DNA molecule) as in wild-type. FB:reference_manual SO:ke A 17-28-nt small interfering RNA derived from transcripts of repetitive elements. INSDC_feature:ncRNA INSDC_qualifier:rasiRNA repeat associated small interfering RNA sequence SO:0000454 Changed parent term from ncRNA (SO:0000655) to piRNA (SO:0001035). See GitHub Issue #573. rasiRNA A 17-28-nt small interfering RNA derived from transcripts of repetitive elements. PMID:18032451 http://www.developmentalcell.com/content/article/abstract?uid=PIIS1534580703002284 A gene that encodes an mRNA with a frameshift. gene with mRNA with frameshift sequence SO:0000455 gene_with_mRNA_with_frameshift A gene that encodes an mRNA with a frameshift. SO:xp A gene that is recombinationally rearranged. recombinationally rearranged gene sequence SO:0000456 recombinationally_rearranged_gene A gene that is recombinationally rearranged. SO:ke A chromosome duplication involving an insertion from another chromosome. interchromosomal duplication sequence SO:0000457 interchromosomal_duplication A chromosome duplication involving an insertion from another chromosome. SO:ke Germline genomic DNA including D-region with 5' UTR and 3' UTR, also designated as D-segment. D gene D-GENE INSDC_feature:D_segment sequence SO:0000458 D_gene_segment Germline genomic DNA including D-region with 5' UTR and 3' UTR, also designated as D-segment.
http://www.imgt.org/cgi-bin/IMGTlect.jv?query=7# A gene with a transcript that is trans-spliced. gene with trans spliced transcript sequence SO:0000459 gene_with_trans_spliced_transcript A gene with a transcript that is trans-spliced. SO:xp Germline genomic DNA with the sequence for a V, D, C, or J portion of an immunoglobulin/T-cell receptor. vertebrate immunoglobulin T cell receptor segment vertebrate_immunoglobulin/T-cell receptor gene sequence SO:0000460 I am using the term segment instead of gene here to avoid confusion with the region 'gene'. vertebrate_immunoglobulin_T_cell_receptor_segment A chromosomal deletion whereby a chromosome is generated by recombination between two inversions; it has a deficiency at each end of the inversion. inversion derived bipartite deficiency sequence SO:0000461 inversion_derived_bipartite_deficiency A chromosomal deletion whereby a chromosome is generated by recombination between two inversions; it has a deficiency at each end of the inversion. FB:km A non-functional descendant of a functional entity. pseudogenic region sequence SO:0000462 pseudogenic_region A non-functional descendant of a functional entity. SO:cjm A gene that encodes more than one transcript. encodes alternately spliced transcripts sequence SO:0000463 encodes_alternately_spliced_transcripts A gene that encodes more than one transcript. SO:ke A non-functional descendant of an exon. decayed exon sequence SO:0000464 Does not have to be part of a pseudogene. decayed_exon A non-functional descendant of an exon. SO:ke A chromosome deletion whereby a chromosome is generated by recombination between two inversions; there is a deficiency at one end of the inversion and a duplication at the other end of the inversion.
inversion derived deficiency plus duplication sequence SO:0000465 inversion_derived_deficiency_plus_duplication A chromosome deletion whereby a chromosome is generated by recombination between two inversions; there is a deficiency at one end of the inversion and a duplication at the other end of the inversion. FB:km Germline genomic DNA including L-part1, V-intron and V-exon, with the 5' UTR and 3' UTR. INSDC_feature:V_segment V gene V gene segment V-GENE variable_gene sequence SO:0000466 V_gene_segment Germline genomic DNA including L-part1, V-intron and V-exon, with the 5' UTR and 3' UTR. http://www.imgt.org/cgi-bin/IMGTlect.jv?query=7# An attribute describing a gene sequence where the resulting protein is regulated by its own stability. post translationally regulated by protein stability post-translationally regulated by protein stability sequence SO:0000467 post_translationally_regulated_by_protein_stability An attribute describing a gene sequence where the resulting protein is regulated by its own stability. SO:ke One of the pieces of sequence that make up a golden path. golden path fragment sequence SO:0000468 golden_path_fragment One of the pieces of sequence that make up a golden path. SO:rd An attribute describing a gene sequence where the resulting protein is regulated by being modified. post translationally regulated by protein modification post-translationally regulated by protein modification sequence SO:0000469 post_translationally_regulated_by_protein_modification An attribute describing a gene sequence where the resulting protein is regulated by being modified. SO:ke Germline genomic DNA of an immunoglobulin/T-cell receptor gene including J-region with 5' UTR (SO:0000204) and 3' UTR (SO:0000205), also designated as J-segment.
INSDC_feature:J_segment J gene J-GENE sequence SO:0000470 J_gene_segment Germline genomic DNA of an immunoglobulin/T-cell receptor gene including J-region with 5' UTR (SO:0000204) and 3' UTR (SO:0000205), also designated as J-segment. http://www.imgt.org/cgi-bin/IMGTlect.jv?query=7# The gene product is involved in its own transcriptional regulation. sequence SO:0000471 autoregulated The gene product is involved in its own transcriptional regulation. SO:ke A set of regions which overlap with minimal polymorphism to form a linear sequence. tiling path sequence SO:0000472 tiling_path A set of regions which overlap with minimal polymorphism to form a linear sequence. SO:cjm The gene product is involved in its own transcriptional regulation where it decreases transcription. negatively autoregulated sequence SO:0000473 negatively_autoregulated The gene product is involved in its own transcriptional regulation where it decreases transcription. SO:ke A piece of sequence that makes up a tiling_path (SO:0000472). tiling path fragment sequence SO:0000474 tiling_path_fragment A piece of sequence that makes up a tiling_path (SO:0000472). SO:ke The gene product is involved in its own transcriptional regulation, where it increases transcription. positively autoregulated sequence SO:0000475 positively_autoregulated The gene product is involved in its own transcriptional regulation, where it increases transcription. SO:ke A DNA sequencer read which is part of a contig. contig read sequence SO:0000476 contig_read A DNA sequencer read which is part of a contig. SO:ke A gene that is polycistronic. sequence SO:0000477 polycistronic_gene true A gene that is polycistronic. SO:ke Genomic DNA of immunoglobulin/T-cell receptor gene including C-region (and introns if present) with 5' UTR (SO:0000204) and 3' UTR (SO:0000205). 
C gene C_GENE INSDC_feature:C_region constant gene sequence SO:0000478 C_gene_segment Genomic DNA of immunoglobulin/T-cell receptor gene including C-region (and introns if present) with 5' UTR (SO:0000204) and 3' UTR (SO:0000205). http://www.imgt.org/cgi-bin/IMGTlect.jv?query=7# A transcript that is trans-spliced. INSDC_feature:tRNA INSDC_qualifier:trans_splicing trans spliced transcript trans-spliced transcript sequence SO:0000479 trans_spliced_transcript A transcript that is trans-spliced. SO:xp A clone which is part of a tiling path. A tiling path is a set of sequencing substrates, typically clones, which have been selected in order to efficiently cover a region of the genome in preparation for sequencing and assembly. tiling path clone sequence SO:0000480 tiling_path_clone A clone which is part of a tiling path. A tiling path is a set of sequencing substrates, typically clones, which have been selected in order to efficiently cover a region of the genome in preparation for sequencing and assembly. SO:ke An inverted repeat (SO:0000294) occurring at the termini of a DNA transposon. TIR terminal inverted repeat sequence SO:0000481 terminal_inverted_repeat An inverted repeat (SO:0000294) occurring at the termini of a DNA transposon. SO:ke Genomic DNA of immunoglobulin/T-cell receptor gene in germline configuration. vertebrate immunoglobulin T cell receptor gene cluster vertebrate_immunoglobulin/T-cell receptor gene cluster sequence SO:0000482 vertebrate_immunoglobulin_T_cell_receptor_gene_cluster A primary transcript that is never translated into a protein. nc primary transcript noncoding primary transcript sequence SO:0000483 nc_primary_transcript A primary transcript that is never translated into a protein. SO:ke The sequence of the 3' exon that is not coding. three prime coding exon noncoding region three_prime_exon_noncoding_region sequence SO:0000484 three_prime_coding_exon_noncoding_region The sequence of the 3' exon that is not coding. 
SO:ke Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one DJ-gene, and one J-gene. (DJ)-J-CLUSTER DJ J cluster sequence SO:0000485 DJ_J_cluster Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one DJ-gene, and one J-gene. http://www.imgt.org/cgi-bin/IMGTlect.jv?query=7# The sequence of the 5' exon preceding the start codon. five prime coding exon noncoding region five_prime_exon_noncoding_region sequence SO:0000486 five_prime_coding_exon_noncoding_region The sequence of the 5' exon preceding the start codon. SO:ke Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one VDJ-gene, one J-gene and one C-gene. (VDJ)-J-C-CLUSTER VDJ J C cluster sequence SO:0000487 VDJ_J_C_cluster Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one VDJ-gene, one J-gene and one C-gene. http://www.imgt.org/cgi-bin/IMGTlect.jv?query=7# Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one VDJ-gene and one J-gene. (VDJ)-J-CLUSTER VDJ J cluster sequence SO:0000488 VDJ_J_cluster Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one VDJ-gene and one J-gene. http://www.imgt.org/cgi-bin/IMGTlect.jv?query=7# Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one VJ-gene and one C-gene. VJ C cluster sequence (VJ)-C-CLUSTER SO:0000489 VJ_C_cluster Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one VJ-gene and one C-gene. http://www.imgt.org/cgi-bin/IMGTlect.jv?query=7# Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one VJ-gene, one J-gene and one C-gene. 
(VJ)-J-C-CLUSTER VJ J C cluster sequence SO:0000490 VJ_J_C_cluster Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one VJ-gene, one J-gene and one C-gene. http://www.imgt.org/cgi-bin/IMGTlect.jv?query=7# Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one VJ-gene and one J-gene. (VJ)-J-CLUSTER VJ J cluster sequence SO:0000491 VJ_J_cluster Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one VJ-gene and one J-gene. http://www.imgt.org/cgi-bin/IMGTlect.jv?query=7# Recombination signal including D-heptamer, D-spacer and D-nonamer located 5' of the D-region of a D-gene or D-sequence. D gene recombination feature sequence SO:0000492 D_gene_recombination_feature A 7 nucleotide recombination site (e.g. CACAGTG), part of a 3' D-recombination signal sequence of an immunoglobulin/T-cell receptor gene. 3'D-HEPTAMER three prime D heptamer sequence SO:0000493 three_prime_D_heptamer A 7 nucleotide recombination site (e.g. CACAGTG), part of a 3' D-recombination signal sequence of an immunoglobulin/T-cell receptor gene. http://www.imgt.org/cgi-bin/IMGTlect.jv?query=7# A 9 nucleotide recombination site (e.g. ACAAAAACC), part of a 3' D-recombination signal sequence of an immunoglobulin/T-cell receptor gene. 3'D-NONAMER three prime D nonamer sequence SO:0000494 three_prime_D_nonamer A 9 nucleotide recombination site (e.g. ACAAAAACC), part of a 3' D-recombination signal sequence of an immunoglobulin/T-cell receptor gene. http://www.imgt.org/cgi-bin/IMGTlect.jv?query=7# A 12 or 23 nucleotide spacer between the 3'D-HEPTAMER and 3'D-NONAMER of a 3'D-RS. 3'D-SPACER three prime D spacer sequence SO:0000495 three_prime_D_spacer A 12 or 23 nucleotide spacer between the 3'D-HEPTAMER and 3'D-NONAMER of a 3'D-RS. http://www.imgt.org/cgi-bin/IMGTlect.jv?query=7# A 7 nucleotide recombination site (e.g.
CACTGTG), part of a 5' D-recombination signal sequence (SO:0000556) of an immunoglobulin/T-cell receptor gene. 5'D-HEPTAMER five prime D heptamer sequence SO:0000496 five_prime_D_heptamer A 7 nucleotide recombination site (e.g. CACTGTG), part of a 5' D-recombination signal sequence (SO:0000556) of an immunoglobulin/T-cell receptor gene. http://www.imgt.org/cgi-bin/IMGTlect.jv?query=7# A 9 nucleotide recombination site (e.g. GGTTTTTGT), part of a 5' D-recombination signal sequence (SO:0000556) of an immunoglobulin/T-cell receptor gene. 5'D-NONAMER five prime D nonamer sequ