id: https://w3id.org/nmdc/nmdc name: NMDC title: NMDC Schema notes: - not importing any MIxS terms where the relationship between the name (SCN) and the id isn't 1:1 description: >- The NMDC Schema is a foundational framework designed to standardize metadata for the National Microbiome Data Collaborative (NMDC) and collaborating data providers. By establishing a structured approach to metadata, the NMDC Schema enables researchers to organize, share, and interpret complex datasets with consistency and clarity. The NMDC Schema is a critical substrate used to facilitate interoperability and collaboration, as it provides a common language for data exchange across systems and disciplines. In the context of the NMDC, this schema supports the integration of microbiome data from medicine, agriculture, bioenergy, and environmental science into a cohesive platform. license: https://creativecommons.org/publicdomain/zero/1.0/ version: 0.0.0 imports: - annotation # also brings core and portal_* - workflow_execution_activity prefixes: BFO: "http://purl.obolibrary.org/obo/BFO_" CATH: "https://bioregistry.io/cath:" CHEBI: "http://purl.obolibrary.org/obo/CHEBI_" CHEMBL.COMPOUND: "https://bioregistry.io/chembl.compound:" # https://bioregistry.io/chembl.compound:CHEMBL465070 CHMO: "http://purl.obolibrary.org/obo/CHMO_" COG: "https://bioregistry.io/cog:" Contaminant: "http://example.org/contaminant/" # only because it is present in MongoDB DRUGBANK: "https://bioregistry.io/drugbank:" # https://bioregistry.io/drugbank:DB14938 EC: "https://bioregistry.io/eccode:" # https://bioregistry.io/eccode:1.1.1.1 EFO: "http://www.ebi.ac.uk/efo/" EGGNOG: "https://bioregistry.io/eggnog:" # https://bioregistry.io/eggnog:veNOG12876 ENVO: "http://purl.obolibrary.org/obo/ENVO_" FBcv: "http://purl.obolibrary.org/obo/FBcv_" # for Biosample only FMA: "http://purl.obolibrary.org/obo/FMA_" GENEPIO: "http://purl.obolibrary.org/obo/GENEPIO_" # for library_preparation_kit, contig identifier and count only GO: 
"http://purl.obolibrary.org/obo/GO_" HMDB: "https://bioregistry.io/hmdb:" # https://bioregistry.io/hmdb:HMDB00001 ISA: "http://example.org/isa/" KEGG.COMPOUND: "https://bioregistry.io/kegg.compound:" KEGG.MODULE: "https://bioregistry.io/kegg.module:" #https://bioregistry.io/kegg.module:M00002 KEGG.ORTHOLOGY: "https://bioregistry.io/kegg.orthology:" # https://github.com/prefixcommons/biocontext/blob/master/registry/idot_context.jsonld KEGG.REACTION: "https://bioregistry.io/kegg.reaction:" KEGG_PATHWAY: "https://bioregistry.io/kegg.pathway:" MASSIVE: "https://bioregistry.io/reference/massive:" MCO: "http://purl.obolibrary.org/obo/MICRO_" MESH: "https://bioregistry.io/mesh:" # https://bioregistry.io/mesh:C063233 MISO: "http://purl.obolibrary.org/obo/MISO_" MIXS: "https://w3id.org/mixs/" MS: "http://purl.obolibrary.org/obo/MS_" MetaCyc: "https://bioregistry.io/metacyc.compound:" MetaNetX: "http://example.org/metanetx/" NCBI: "http://example.org/ncbitaxon/" # temporary. see https://github.com/microbiomedata/issues/issues/893 NCBITaxon: "http://purl.obolibrary.org/obo/NCBITaxon_" NCIT: "http://purl.obolibrary.org/obo/NCIT_" # for Biosample, Study, StudyCategoryEnum PVs, doi_provider, funding_sources, extraction_targets, 'BAI File' only OBI: "http://purl.obolibrary.org/obo/OBI_" OMIT: "http://purl.obolibrary.org/obo/OMIT_" # for RNA subtypes only ORCID: "https://orcid.org/" PANTHER.FAMILY: "https://bioregistry.io/panther.family:" # https://bioregistry.io/panther.family:PTHR12345 PATO: "http://purl.obolibrary.org/obo/PATO_" PFAM.CLAN: "https://bioregistry.io/pfam.clan:" # https://bioregistry.io/pfam.clan:CL0192 PFAM: "https://bioregistry.io/pfam:" # https://bioregistry.io/pfam:PF11779 PO: "http://purl.obolibrary.org/obo/PO_" PR: "http://purl.obolibrary.org/obo/PR_" PUBCHEM.COMPOUND: "https://bioregistry.io/pubchem.compound:" RHEA: "https://bioregistry.io/rhea:" RO: "http://purl.obolibrary.org/obo/RO_" RetroRules: "http://example.org/retrorules/" SEED: 
"https://bioregistry.io/seed:" SIO: "http://semanticscience.org/resource/SIO_" # for Study, StudyCategoryEnum PVs, objective SO: "http://purl.obolibrary.org/obo/SO_" SUPFAM: "https://bioregistry.io/supfam:" # https://bioregistry.io/supfam:SSF57615 TIGRFAM: "https://bioregistry.io/tigrfam:" # https://bioregistry.io/tigrfam:TIGR00010 UBERON: "http://purl.obolibrary.org/obo/UBERON_" UO: "http://purl.obolibrary.org/obo/UO_" UniProtKB: "https://bioregistry.io/uniprot:" biolink: "https://w3id.org/biolink/vocab/" bioproject: "https://bioregistry.io/bioproject:" biosample: "https://bioregistry.io/biosample:" cas: "https://bioregistry.io/cas:" dcterms: "http://purl.org/dc/terms/" doi: "https://bioregistry.io/doi:" edam.data: "http://edamontology.org/data_" # for doi_value and DataCategoryEnum only edam.format: "http://edamontology.org/format_" # for 'Configuration toml' PV only only emsl.project: "https://bioregistry.io/emsl.project:" emsl: "http://example.org/emsl_in_mongodb/" emsl_uuid_like: "http://example.org/emsl_uuid_like/" generic: "http://example.org/generic/" gnps.task: "https://bioregistry.io/gnps.task:" gold: "https://bioregistry.io/gold:" gtpo: "http://example.org/gtpo/" igsn: "https://app.geosamples.org/sample/igsn/" img.taxon: "https://bioregistry.io/img.taxon:" insdc.sra: "https://bioregistry.io/insdc.sra:" jgi.analysis: "https://data.jgi.doe.gov/search?q=" jgi.proposal: 'https://bioregistry.io/jgi.proposal:' jgi: "http://example.org/jgi/" kegg: "https://bioregistry.io/kegg:" # https://bioregistry.io/kegg:hsa00190 linkml: "https://w3id.org/linkml/" mgnify.analysis: "https://bioregistry.io/mgnify.analysis:" # https://www.ebi.ac.uk/metagenomics/analyses/MGYA00002270 mgnify.proj: "https://bioregistry.io/mgnify.proj:" my_emsl: "https://release.my.emsl.pnnl.gov/released_data/" neon.identifier: "http://example.org/neon/identifier/" neon.schema: "http://example.org/neon/schema/" nmdc: "https://w3id.org/nmdc/" owl: "http://www.w3.org/2002/07/owl#" prov: 
"http://www.w3.org/ns/prov#" # for startedAtTime, endedAtTime, Association, Activity, wasInformedBy, wasGeneratedBy, qualifiedAssociation qud: "http://qudt.org/1.1/schema/qudt#" rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#" rdfs: "http://www.w3.org/2000/01/rdf-schema#" ror: 'https://bioregistry.io/ror:' # https://ror.org/ schema: "http://schema.org/" skos: "http://www.w3.org/2004/02/skos/core#" wgs84: "http://www.w3.org/2003/01/geo/wgs84_pos#" wikidata: "http://www.wikidata.org/entity/" xsd: "http://www.w3.org/2001/XMLSchema#" default_prefix: nmdc default_range: string emit_prefixes: - KEGG.ORTHOLOGY - MASSIVE - biosample - cas - doi - gnps.task - gold - img.taxon - jgi.proposal - kegg - rdf - rdfs - skos - xsd settings: # NMDC ID pattern components id_nmdc_prefix: "^(nmdc)" id_shoulder: "([0-9][a-z]{0,6}[0-9])" id_blade: "([A-Za-z0-9]{1,})" id_version: "(\\.[0-9]{1,})" id_locus: "(_[A-Za-z0-9_\\.-]+)?$" # MIxS structured_pattern settings for pattern interpolation. # These settings define regex patterns for placeholders like {scientific_float}, {text}, {URL} # used in MIxS slot structured_pattern definitions. # # IMPORTANT: These settings are defined here (not in mixs.yaml) because: # 1. The sheets-and-friends do_shuttle tool does not import settings from source schemas # 2. The MIxS import pipeline explicitly deletes settings (see project.Makefile) # When updating MIxS import, these settings must be manually synced if the source changes. 
# # Source: GSC MIxS v6.2.3 (commit 77cd78836ee01ee8ba2ad10724871a962e3ad694) add_recov_methods: 'Water Injection|Dump Flood|Gas Injection|Wag Immiscible Injection|Polymer Addition|Surfactant Addition|Not Applicable|other' agrochemical_name: ".*" amount: '[-+]?[0-9]*\.?[0-9]+' boolean: '(?:yes|no)' # a non-capturing group matching either 'yes' or 'no' country: ([^\s-]{1,2}|[^\s-]+.+[^\s-]+) date_time_stamp: '(\d{4})(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01])(T([01]\d|2[0-3]):([0-5]\d)(:([0-5]\d))?(\.\d+)?(Z|([+-][01]\d:[0-5]\d))?)?)?)?' # ISO 8601 timestamp; seconds optional; alias of 'timestamp' dna_bases: '[ACGT]' DOI: 'doi:10.\d{2,9}/.*' duration: P(?:(?:\d+D|\d+M(?:\d+D)?|\d+Y(?:\d+M(?:\d+D)?)?)(?:T(?:\d+H(?:\d+M(?:\d+S)?)?|\d+M(?:\d+S)?|\d+S))?|T(?:\d+H(?:\d+M(?:\d+S)?)?|\d+M(?:\d+S)?|\d+S)|\d+W) float: '[-+]?[0-9]*\.?[0-9]+' integer: '[1-9][0-9]*' lat: (-?((?:[0-8]?[0-9](?:\.\d{0,8})?)|90)) lon: -?[0-9]+(?:\.[0-9]{0,8})?$|^-?(1[0-7]{1,2}) name: '.*' NCBItaxon_id: NCBITaxon:\d+ parameters: ([^\s-]{1,2}|[^\s-]+.+[^\s-]+) particulate_matter_name: '.*' PMID: 'PMID:\d+' primer_adapter_codes: '[ACGTRYSWKMBDHVNI]' region: ([^\s-]{1,2}|[^\s-]+.+[^\s-]+) room_name: ([^\s-]{1,2}|[^\s-]+.+[^\s-]+) room_number: '[1-9][0-9]*' scientific_float: '[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?' software: ([^\s-]{1,2}|[^\s-]+.+[^\s-]+) specific_location: ([^\s-]{1,2}|[^\s-]+.+[^\s-]+) storage_condition_type: ([^\s-]{1,2}|[^\s-]+.+[^\s-]+) termID: '[a-zA-Z]{2,}:[a-zA-Z0-9]\d+' termLabel: ([^\s-]{1,2}|[^\s-]+.+[^\s-]+) text: '.*' timestamp: '(\d{4})(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01])(T([01]\d|2[0-3]):([0-5]\d)(:([0-5]\d))?(\.\d+)?(Z|([+-][01]\d:[0-5]\d))?)?)?)?' 
# ISO 8601 timestamp; seconds optional; alias of 'date_time_stamp' unit: ([^\s-]{1,2}|[^\s-]+.+[^\s-]+) URL: 'https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=]*)' version: ([^\s-]{1,2}|[^\s-]+.+[^\s-]+) classes: EukEval: description: This class contains information pertaining to evaluating if a Metagenome-Assembled Genome (MAG) is eukaryotic. comments: - A tool like eukCC (https://doi.org/10.1186/s13059-020-02155-4) would generate information for this class. slots: - type - completeness - contamination - ncbi_lineage_tax_ids - ncbi_lineage class_uri: nmdc:EukEval NucleotideSequencing: class_uri: nmdc:NucleotideSequencing is_a: DataGeneration description: A DataGeneration in which the sequence of DNA or RNA molecules is generated. comments: - For example data generated from an Illumina or Pacific Biosciences instrument. slots: - gold_sequencing_project_identifiers - insdc_bioproject_identifiers - insdc_experiment_identifiers - ncbi_project_name slot_usage: id: structured_pattern: syntax: "{id_nmdc_prefix}:(dgns|omprc)-{id_shoulder}-{id_blade}$" interpolated: true analyte_category: range: NucleotideSequencingEnum MassSpectrometry: class_uri: nmdc:MassSpectrometry is_a: DataGeneration description: Spectrometry where the sample is converted into gaseous ions which are characterised by their mass-to-charge ratio and relative abundance. 
exact_mappings: - CHMO:0000470 slots: - eluent_introduction_category - generates_calibration - has_chromatography_configuration - has_mass_spectrometry_configuration slot_usage: id: structured_pattern: syntax: "{id_nmdc_prefix}:(dgms|omprc)-{id_shoulder}-{id_blade}$" interpolated: true has_chromatography_configuration: structured_pattern: syntax: "{id_nmdc_prefix}:chrcon-{id_shoulder}-{id_blade}$" interpolated: true has_mass_spectrometry_configuration: structured_pattern: syntax: "{id_nmdc_prefix}:mscon-{id_shoulder}-{id_blade}$" interpolated: true required: true analyte_category: range: MassSpectrometryEnum eluent_introduction_category: required: true rules: - title: generates_calibration_required_if_gc description: >- If eluent_introduction_category is gas_chromatography, then generates_calibration is required. preconditions: slot_conditions: eluent_introduction_category: equals_string: gas_chromatography postconditions: slot_conditions: generates_calibration: required: true - title: has_chromatography_configuration_required_if_lc description: >- If eluent_introduction_category is liquid_chromatography, then has_chromatography_configuration is required. preconditions: slot_conditions: eluent_introduction_category: equals_string: liquid_chromatography postconditions: slot_conditions: has_chromatography_configuration: required: true - title: has_chromatography_configuration_required_if_gc description: >- If eluent_introduction_category is gas_chromatography, then has_chromatography_configuration is required. preconditions: slot_conditions: eluent_introduction_category: equals_string: gas_chromatography postconditions: slot_conditions: has_chromatography_configuration: required: true Configuration: abstract: true is_a: InformationObject class_uri: nmdc:Configuration description: A set of parameters that define the actions of a process and is shared among multiple instances of the process. 
notes: - This class is intended to represent the parameters within a method file (or similar) that control a process. slots: - protocol_link MassSpectrometryConfiguration: is_a: Configuration class_uri: nmdc:MassSpectrometryConfiguration description: A set of parameters that define and control the actions of a mass spectrometry process. notes: - This class is intended to represent a mass spectrometry method file that controls a mass spectrometry process. slots: - mass_spectrometry_acquisition_strategy - resolution_categories - mass_analyzers - ionization_source - mass_spectrum_collection_modes - polarity_mode slot_usage: name: required: true description: required: true id: structured_pattern: syntax: "{id_nmdc_prefix}:mscon-{id_shoulder}-{id_blade}$" interpolated: true mass_spectrometry_acquisition_strategy: required: true resolution_categories: required: true mass_analyzers: required: true ionization_source: required: true mass_spectrum_collection_modes: required: true polarity_mode: required: true ChromatographyConfiguration: is_a: Configuration class_uri: nmdc:ChromatographyConfiguration description: A set of parameters that define and control the actions of a chromatography process. notes: - This class is intended to represent a chromatography method file associated with a mass spectrometry process. slots: - chromatographic_category - ordered_mobile_phases - stationary_phase - temperature slot_usage: name: required: true description: required: true id: structured_pattern: syntax: "{id_nmdc_prefix}:chrcon-{id_shoulder}-{id_blade}$" interpolated: true chromatographic_category: required: true stationary_phase: required: true Manifest: is_a: InformationObject class_uri: nmdc:Manifest description: A qualified collection of DataObjects that can be analyzed together in the same experimental context. comments: - Manifests are currently uncoupled from other modelling. 
For example, there is no schema requirement that DataObjects in a fractions Manifest were all obtained by analyzing the same ProcessedSample. slots: - manifest_category slot_usage: id: structured_pattern: syntax: "{id_nmdc_prefix}:manif-{id_shoulder}-{id_blade}$" CalibrationInformation: class_uri: nmdc:CalibrationInformation is_a: InformationObject description: A calibration object that is associated with a process. slots: - calibration_object - internal_calibration - calibration_target - calibration_standard rules: - title: calibration_standard_if_rt description: >- If the calibration_target is retention_index, a calibration_standard is required. preconditions: slot_conditions: calibration_target: equals_string: retention_index postconditions: slot_conditions: calibration_standard: required: true - title: calibration_object_if_not_internal_calibration description: >- If internal_calibration is false, a calibration_object is required. preconditions: slot_conditions: internal_calibration: equals_expression: "False" postconditions: slot_conditions: calibration_object: required: true slot_usage: internal_calibration: required: true calibration_target: required: true id: structured_pattern: syntax: "{id_nmdc_prefix}:calib-{id_shoulder}-{id_blade}$" interpolated: true FunctionalAnnotationAggMember: description: >- This class is used to store aggregated results from workflows which produce functional annotations such as metaproteomics and metagenomics. class_uri: nmdc:FunctionalAnnotationAggMember slots: - was_generated_by - gene_function_id - count - type slot_usage: was_generated_by: structured_pattern: syntax: "{id_nmdc_prefix}:(wfmgan|wfmp|wfmtan)-{id_shoulder}-{id_blade}{id_version}$" interpolated: true required: true range: AnnotatingWorkflow count: description: The number of sequences (for a metagenome or metatranscriptome) or spectra (for metaproteomics) associated with the specified function. 
gene_function_id: pattern: "(COG:COG\\d+|PFAM:PF\\d{5}|KEGG.ORTHOLOGY:K\\d+)" Database: class_uri: nmdc:Database tree_root: true aliases: - NMDC metadata object description: An abstract holder for any set of metadata and data. It does not need to correspond to an actual managed database top level holder class. When translated to JSON-Schema this is the 'root' object. It should contain pointers to other objects of interest. For MongoDB, the lists of objects that Database slots point to correspond to **collections**. slots: - biosample_set - calibration_set - collecting_biosamples_from_site_set - configuration_set - data_generation_set - data_object_set - field_research_site_set - functional_annotation_agg - functional_annotation_set - genome_feature_set - instrument_set - manifest_set - material_processing_set - processed_sample_set - storage_process_set - study_set - workflow_execution_set Pooling: class_uri: nmdc:Pooling is_a: MaterialProcessing description: physical combination of several instances of like material. slots: exact_mappings: - OBI:0600016 slot_usage: has_input: minimum_cardinality: 2 required: true structured_pattern: syntax: "{id_nmdc_prefix}:(bsm|procsm)-{id_shoulder}-{id_blade}$" interpolated: true has_output: required: true minimum_cardinality: 1 maximum_cardinality: 1 structured_pattern: syntax: "{id_nmdc_prefix}:procsm-{id_shoulder}-{id_blade}$" interpolated: true id: required: true structured_pattern: syntax: "{id_nmdc_prefix}:poolp-{id_shoulder}-{id_blade}$" interpolated: true Extraction: class_uri: nmdc:Extraction is_a: MaterialProcessing description: A material separation in which a desired component of an input material is separated from the remainder. 
exact_mappings: - OBI:0302884 slots: - substances_used - extraction_targets - input_mass - volume - temperature slot_usage: has_input: required: true range: Sample structured_pattern: syntax: "{id_nmdc_prefix}:(bsm|procsm)-{id_shoulder}-{id_blade}$" interpolated: true has_output: required: true structured_pattern: syntax: "{id_nmdc_prefix}:(procsm)-{id_shoulder}-{id_blade}$" interpolated: true id: required: true structured_pattern: syntax: "{id_nmdc_prefix}:extrp-{id_shoulder}-{id_blade}$" interpolated: true volume: description: The volume of the solvent/solute being used, not the input. LibraryPreparation: class_uri: nmdc:LibraryPreparation aliases: - LibraryConstruction is_a: MaterialProcessing slots: - is_stranded - library_preparation_kit - library_type - nucl_acid_amp - pcr_cond - pcr_cycles - pcr_primers - stranded_orientation - target_gene - target_subfragment close_mappings: - OBI:0000711 comments: - OBI:0000711 specifies a DNA input (but not ONLY a DNA input) slot_usage: has_input: required: true structured_pattern: syntax: "{id_nmdc_prefix}:(bsm|procsm)-{id_shoulder}-{id_blade}$" interpolated: true has_output: required: true structured_pattern: syntax: "{id_nmdc_prefix}:(procsm)-{id_shoulder}-{id_blade}$" interpolated: true id: required: true structured_pattern: syntax: "{id_nmdc_prefix}:libprp-{id_shoulder}-{id_blade}$" interpolated: true pcr_cond: description: Description of reaction conditions and components of polymerase chain reaction performed during library preparation CollectingBiosamplesFromSite: class_uri: nmdc:CollectingBiosamplesFromSite is_a: PlannedProcess title: Collecting Biosamples From Site comments: - "this illustrates implementing a Biosample relation with a process class" close_mappings: - OBI:0000744 slot_usage: has_input: range: Site required: true structured_pattern: syntax: "{id_nmdc_prefix}:(frsite|site)-{id_shoulder}-{id_blade}$" interpolated: true has_output: range: Biosample required: true structured_pattern: syntax: 
"{id_nmdc_prefix}:bsm-{id_shoulder}-{id_blade}$" interpolated: true id: required: true structured_pattern: syntax: "{id_nmdc_prefix}:clsite-{id_shoulder}-{id_blade}$" interpolated: true SubSamplingProcess: class_uri: nmdc:SubSamplingProcess description: > Separating a sample aliquot from the starting material for downstream activity. related_mappings: - OBI:0000744 notes: - A subsample may be (a) a portion of the sample obtained by selection or division; (b) an individual unit of the lot taken as part of the sample; (c) the final unit of multistage sampling. The term 'subsample' is used either in the sense of a 'sample of a sample' or as a synonym for 'unit'. In practice, the meaning is usually apparent from the context or is defined. - TODO - Montana to visit slot descriptions contributors: - ORCID:0009-0001-1555-1601 #Anastasiya Prymolenna - ORCID:0000-0002-8683-0050 #Montana Smith - ORCID:0000-0001-9076-6066 #Mark Miller - ORCID:0009-0008-4013-7737 #James Tessmer is_a: MaterialProcessing slots: - container_size - contained_in - temperature - volume - mass - sampled_portion slot_usage: id: required: true structured_pattern: syntax: "{id_nmdc_prefix}:subspr-{id_shoulder}-{id_blade}$" interpolated: true volume: description: The output volume of the SubSampling Process. mass: description: The output mass of the SubSampling Process. has_input: range: Sample structured_pattern: # MAM 2024-05-22 isn't that inherited from a parent class? syntax: "{id_nmdc_prefix}:(bsm|procsm)-{id_shoulder}-{id_blade}$" interpolated: true has_output: range: ProcessedSample description: The subsample. structured_pattern: # MAM 2024-05-22 isn't that inherited from a parent class? syntax: "{id_nmdc_prefix}:(procsm)-{id_shoulder}-{id_blade}$" interpolated: true MixingProcess: class_uri: nmdc:MixingProcess description: > The combining of components, particles or layers into a more homogeneous state. 
contributors: - ORCID:0009-0001-1555-1601 #Anastasiya Prymolenna - ORCID:0000-0002-8683-0050 #Montana Smith is_a: MaterialProcessing comments: - The mixing may be achieved manually or mechanically by shifting the material with stirrers or pumps or by revolving or shaking the container. - The process must not permit segregation of particles of different size or properties. - Homogeneity may be considered to have been achieved in a practical sense when the sampling error of the processed portion is negligible compared to the total error of the measurement system. slots: - duration slot_usage: id: required: true structured_pattern: syntax: "{id_nmdc_prefix}:mixpro-{id_shoulder}-{id_blade}$" has_input: range: Sample structured_pattern: # MAM 2024-05-22 isn't that inherited from a parent class? syntax: "{id_nmdc_prefix}:(bsm|procsm)-{id_shoulder}-{id_blade}$" interpolated: true has_output: range: ProcessedSample description: The mixed sample. structured_pattern: syntax: "{id_nmdc_prefix}:procsm-{id_shoulder}-{id_blade}$" interpolated: true FiltrationProcess: class_uri: nmdc:FiltrationProcess description: >- The process of segregation of phases; e.g. the separation of suspended solids from a liquid or gas, usually by forcing a carrier gas or liquid through a porous medium. related_mappings: - CHMO:0001640 contributors: - ORCID:0009-0001-1555-1601 #Anastasiya Prymolenna - ORCID:0000-0002-8683-0050 #Montana Smith - ORCID:0000-0001-9076-6066 #Mark Miller - ORCID:0009-0008-4013-7737 #James Tessmer is_a: MaterialProcessing slots: - conditionings - container_size - filter_material - filter_pore_size - filtration_category - is_pressurized - separation_method - volume slot_usage: id: required: true structured_pattern: syntax: "{id_nmdc_prefix}:filtpr-{id_shoulder}-{id_blade}$" interpolated: true volume: description: The volume of sample filtered. 
has_input: range: Sample structured_pattern: syntax: "{id_nmdc_prefix}:(bsm|procsm)-{id_shoulder}-{id_blade}$" interpolated: true has_output: range: ProcessedSample structured_pattern: syntax: "{id_nmdc_prefix}:procsm-{id_shoulder}-{id_blade}$" interpolated: true StorageProcess: class_uri: nmdc:StorageProcess description: >- A planned process with the objective to preserve and protect material entities by placing them in an identified location which may have a controlled environment. is_a: PlannedProcess related_mappings: - OBI:0302893 slots: - substances_used - contained_in - temperature slot_usage: substances_used: description: The substance(s) that a processed sample is stored in. id: required: true structured_pattern: syntax: "{id_nmdc_prefix}:storpr-{id_shoulder}-{id_blade}$" interpolated: true has_input: structured_pattern: syntax: "{id_nmdc_prefix}:(bsm|procsm)-{id_shoulder}-{id_blade}$" interpolated: true range: Sample has_output: structured_pattern: syntax: "{id_nmdc_prefix}:procsm-{id_shoulder}-{id_blade}$" interpolated: true range: ProcessedSample ChromatographicSeparationProcess: class_uri: nmdc:ChromatographicSeparationProcess description: The process of using a selective partitioning of the analyte or interferent between two immiscible phases. 
contributors: - ORCID:0009-0001-1555-1601 #Anastasiya Prymolenna - ORCID:0000-0002-1368-8217 #Yuri Corilo is_a: MaterialProcessing slots: - chromatographic_category - ordered_mobile_phases - stationary_phase - temperature slot_usage: id: required: true structured_pattern: syntax: "{id_nmdc_prefix}:cspro-{id_shoulder}-{id_blade}$" has_input: range: Sample structured_pattern: syntax: "{id_nmdc_prefix}:(bsm|procsm)-{id_shoulder}-{id_blade}$" interpolated: true has_output: range: ProcessedSample structured_pattern: syntax: "{id_nmdc_prefix}:procsm-{id_shoulder}-{id_blade}$" interpolated: true DissolvingProcess: class_uri: nmdc:DissolvingProcess aliases: - Solubilization description: > A mixing step where a soluble component is mixed with a liquid component. exact_mappings: - CHMO:0002773 contributors: - ORCID:0009-0001-1555-1601 #Anastasiya Prymolenna - ORCID:0000-0002-1368-8217 #Yuri Corilo is_a: MaterialProcessing slots: - duration - temperature - substances_used slot_usage: id: required: true structured_pattern: syntax: "{id_nmdc_prefix}:dispro-{id_shoulder}-{id_blade}$" interpolated: true enums: CalibrationTargetEnum: permissible_values: mass_charge_ratio: title: m/z aliases: - Mass - m/z retention_time: aliases: - RT retention_index: aliases: - RI CalibrationStandardEnum: permissible_values: fames: aliases: - FAMES alkanes: aliases: - Alkanes srfa: aliases: - Suwannee River fulvic acid see_also: - https://humic-substances.org/source-materials-for-ihss-samples/ comments: - Reference standard Suwannee River Fulvic Acid Standard II (2S101F) from International Humic Substances Society StrandedOrientationEnum: description: This enumeration specifies information about stranded RNA library preparations. permissible_values: antisense orientation: description: Orientation that is complementary (non-coding) to a sequence of messenger RNA. 
comments: - See https://www.genome.gov/genetics-glossary/antisense exact_mappings: - SO:0000077 sense orientation: description: Orientation that corresponds to the coding sequence of messenger RNA. MassSpectrometryAcquisitionStrategyEnum: permissible_values: data_independent_acquisition: description: Data independent mass spectrometer acquisition method wherein the full mass range is fragmented. Examples of such an approach include MS^E, AIF, and bbCID. aliases: - DIA - data independent acquisition from dissociation of full mass range exact_mappings: - MS:1003227 data_dependent_acquisition: description: Mass spectrometer data acquisition method wherein MSn spectra are triggered based on the m/z of precursor ions detected in the same run. aliases: - DDA exact_mappings: - MS:1003221 full_scan_only: aliases: - MS description: Mass spectrometer data acquisition method wherein only MS1 data are acquired. ResolutionCategoryEnum: permissible_values: high: description: higher than unit resolution low: description: at unit resolution MassAnalyzerEnum: permissible_values: time_of_flight: aliases: - TOF description: Instrument that separates ions by m/z in a field-free region after acceleration to a fixed acceleration energy. exact_mappings: - MS:1000084 quadrupole: aliases: - Quad - Q description: A mass spectrometer that consists of four parallel rods whose centers form the corners of a square and whose opposing poles are connected. The voltage applied to the rods is a superposition of a static potential and a sinusoidal radio frequency potential. The motion of an ion in the x and y dimensions is described by the Mathieu equation whose solutions show that ions in a particular m/z range can be transmitted along the z axis. 
exact_mappings: - MS:1000081 Orbitrap: aliases: - Orbi description: An ion trapping device that consists of an outer barrel-like electrode and a coaxial inner spindle-like electrode that form an electrostatic field with quadro-logarithmic potential distribution. The frequency of harmonic oscillations of the orbitally trapped ions along the axis of the electrostatic field is independent of the ion velocity and is inversely proportional to the square root of m/z so that the trap can be used as a mass analyzer. exact_mappings: - MS:1000484 ion_cyclotron_resonance: aliases: - ICR description: A mass spectrometer based on the principle of ion cyclotron resonance in which an ion in a magnetic field moves in a circular orbit at a frequency characteristic of its m/z value. Ions are coherently excited to a larger radius orbit using a pulse of radio frequency energy and their image charge is detected on receiver plates as a time domain signal. Fourier transformation of the time domain signal results in a frequency domain signal which is converted to a mass spectrum based on the inverse relationship between frequency and m/z. exact_mappings: - MS:1000079 ion_trap: aliases: - LTQ - Ion Trap - Paul Trap description: A device for spatially confining ions using electric and magnetic fields alone or in combination. 
exact_mappings: - MS:1000264 IonizationSourceEnum: permissible_values: electrospray_ionization: aliases: - ESI matrix_assisted_laser_desorption_ionization: aliases: - MALDI atmospheric_pressure_photo_ionization: aliases: - APPI atmospheric_pressure_chemical_ionization: aliases: - APCI electron_ionization: aliases: - EI MassSpectrumCollectionModeEnum: permissible_values: full_profile: { } reduced_profile: { } centroid: { } PolarityModeEnum: permissible_values: positive: { } negative: { } EluentIntroductionCategoryEnum: permissible_values: liquid_chromatography: aliases: - LC title: liquid chromatography description: The processed sample is introduced into the mass spectrometer through a liquid chromatography process. gas_chromatography: aliases: - GC title: gas chromatography description: The processed sample is introduced into the mass spectrometer through a gas chromatography process. direct_infusion_syringe: title: direct infusion syringe description: The processed sample is introduced into the mass spectrometer through a direct infusion process using a syringe. direct_infusion_autosampler: title: direct infusion autosampler description: The processed sample is introduced into the mass spectrometer through a direct infusion process using an autosampler. LibraryTypeEnum: permissible_values: DNA: { } RNA: { } ContainerCategoryEnum: description: The permitted types of containers used in processing metabolomic samples. contributors: - ORCID:0009-0001-1555-1601 #Anastasiya Prymolenna - ORCID:0000-0002-8683-0050 #Montana Smith permissible_values: v-bottom_conical_tube: falcon_tube: SeparationMethodEnum: description: The tool/substance used to separate or filter a solution or mixture. contributors: - ORCID:0009-0001-1555-1601 #Anastasiya Prymolenna - ORCID:0000-0002-8683-0050 #Montana Smith permissible_values: ptfe_96_well_filter_plate: syringe: StationaryPhaseEnum: description: The type of stationary phase used in a chromatography process. 
contributors: - ORCID:0009-0001-1555-1601 #Anastasiya Prymolenna - ORCID:0000-0002-4504-1039 #Katherine Heal permissible_values: BEH-HILIC: description: Hydrophilic Interaction Chromatography (HILIC) employing BEH (Bridged Ethylene Hybrid) particles as the stationary phase. is_a: HILIC C18: description: A stationary phase consisting of octadecyl chains (C18) bonded to silica particles. C8: description: A stationary phase consisting of octyl chains (C8) bonded to silica particles. C4: description: A stationary phase consisting of butyl chains (C4) bonded to silica particles. C2: description: A stationary phase consisting of ethyl chains (C2) bonded to silica particles. C1: description: A stationary phase consisting of methyl chains (C1) bonded to silica particles. C30: description: A stationary phase consisting of triacontyl chains (C30) bonded to silica particles. C60: description: A stationary phase consisting of hexatriacontyl chains (C60) bonded to silica particles. CNT: description: Carbon Nanotube stationary phase. CN: description: Cyano (CN) bonded stationary phase. Diol: description: A stationary phase with diol (1,2-diol) functional groups. HILIC: description: Hydrophilic Interaction Chromatography (HILIC) stationary phase. HLB: description: Hydrophilic-Lipophilic-Balance (HLB) stationary phase. NH2: description: Amino (NH2) bonded stationary phase. Phenyl: description: Phenyl bonded stationary phase. Polysiloxane: description: A stationary phase made of polysiloxane, usually used in gas chromatography. PS-DVB: description: Polystyrene-divinylbenzene stationary phase, often used in solid-phase extraction, including proprietary Priority PolLutant (PPL). SAX: description: Strong Anion Exchange (SAX) stationary phase. SCX: description: Strong Cation Exchange (SCX) stationary phase. Silica: description: A stationary phase made of silica, commonly used in chromatography. WCX: description: Weak Cation Exchange (WCX) stationary phase. 
WAX: description: Weak Anion Exchange (WAX) stationary phase. ZIC-HILIC: description: Zwitterionic Hydrophilic Interaction Chromatography (ZIC-HILIC) stationary phase. is_a: HILIC ZIC-pHILIC: description: Zwitterionic pH-Responsive Hydrophilic Interaction Chromatography (ZIC-pHILIC) stationary phase. is_a: ZIC-HILIC ZIC-cHILIC: description: Zwitterionic Charged Hydrophilic Interaction Chromatography (ZIC-cHILIC) stationary phase. is_a: ZIC-HILIC ProtocolCategoryEnum: description: The possible protocols that may be followed for an assay. permissible_values: mplex: derivatization: filter_clean_up: organic_matter_extraction: solid_phase_extraction: phosphorus_extraction: ph_measurement: respiration_measurement: texture_measurement: dna_extraction: phenol_chloroform_extraction: { } ChromatographicCategoryEnum: permissible_values: liquid_chromatography: aliases: - LC gas_chromatography: aliases: - GC solid_phase_extraction: aliases: - SPE SamplePortionEnum: permissible_values: supernatant: aliases: - top_layer pellet: aliases: - bottom_layer organic_layer: title: Organic layer description: The portion of a mixture containing dissolved organic material aqueous_layer: title: Aqueous layer description: The portion of a mixture containing molecules dissolved in water aliases: - water layer interlayer: title: Interlayer description: The layer of material between liquid layers of a separated mixture chloroform_layer: title: Chloroform layer description: The portion of a mixture containing molecules dissolved in chloroform is_a: organic_layer methanol_layer: title: Methanol layer description: The portion of a mixture containing molecules dissolved in methanol is_a: organic_layer slots: # ============================================================================ # Legacy MIxS slots removed from GSC MIxS commit 0368da846b197bef1c0dd27a9cf337a8aeea17f2 # These slots were present in microbiomedata/mixs tag pre-2024-05-15 but removed # from the current GSC MIxS. 
NMDC retains them for backward compatibility with # submission-schema which references them. No production data uses these slots. # ============================================================================ host_family_relation: title: host family relationship description: >- Familial relationships to other hosts in the same study; can include multiple relationships. comments: - "Legacy MIxS slot removed from GSC MIxS v6.2.2. Retained for submission-schema compatibility." range: string multivalued: true examples: - value: "offspring;Mussel25" annotations: Expected_value: "relationship type;arbitrary identifier" slot_uri: MIXS:0000872 salinity_meth: title: salinity method description: >- Reference or method used in determining salinity. comments: - "Legacy MIxS slot removed from GSC MIxS v6.2.2. Retained for submission-schema compatibility." range: string multivalued: false structured_pattern: syntax: "^{PMID}|{DOI}|{URL}$" interpolated: true annotations: Expected_value: "PMID,DOI or url" slot_uri: MIXS:0000341 soil_text_measure: title: soil texture measurement description: >- The relative proportion of different grain sizes of mineral particles in a soil, as described using a standard system; express as % sand (50 um to 2 mm), silt (2 um to 50 um), and clay (<2 um) with textural name (e.g., silty clay loam) optional. comments: - "Legacy MIxS slot removed from GSC MIxS v6.2.2. Retained for submission-schema compatibility." - "Old MIxS used 'quantity value' range but this is clearly structured text, not a single quantity." 
range: string multivalued: false examples: - value: "ite loam; 20% sand; 40% silt; 40% clay" annotations: Expected_value: "measurement value" slot_uri: MIXS:0000335 biomaterial_purity: abstract: true range: QuantityValue description: A measure of the purity of a biomaterial sample generates_calibration: range: CalibrationInformation description: calibration information is generated by a process comments: - A gas chromatography mass spectrometry run generates data to calibrate the retention index structured_pattern: syntax: "{id_nmdc_prefix}:calib-{id_shoulder}-{id_blade}$" interpolated: true uses_calibration: range: CalibrationInformation multivalued: true description: calibration information is used by a process comments: - Retention index calibration data generated by a gas chromatography mass spectrometry run is used when analyzing metabolomics data structured_pattern: syntax: "{id_nmdc_prefix}:calib-{id_shoulder}-{id_blade}$" interpolated: true calibration_object: range: DataObject description: the data object (file) containing the calibration data structured_pattern: syntax: "{id_nmdc_prefix}:dobj-{id_shoulder}-{id_blade}$" interpolated: true internal_calibration: range: boolean description: whether internal calibration was used; if false, external calibration was used calibration_target: range: CalibrationTargetEnum description: the target measurement of the calibration calibration_standard: range: CalibrationStandardEnum description: the reference standard(s) used for calibration polarity_mode: range: PolarityModeEnum description: the polarity of the ions that are generated and detected mass_spectrum_collection_modes: range: MassSpectrumCollectionModeEnum description: Indicates whether mass spectra were collected in full profile, reduced profile, or centroid mode during acquisition. multivalued: true eukaryotic_evaluation: range: EukEval description: Contains results from evaluating if a Metagenome-Assembled Genome is of eukaryotic lineage.
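# Illustrative sketch only, not part of the schema: one way the calibration and
# mass spectrometry slots defined above might be populated on a record. Which
# class carries each slot is not shown in this section. The identifier is a
# hypothetical placeholder that assumes id_nmdc_prefix resolves to "nmdc"; the
# real shoulder and blade syntax comes from the id_shoulder and id_blade
# settings that the structured_pattern interpolates.
#
#   uses_calibration:
#     - nmdc:calib-99-example1        # hypothetical id
#   polarity_mode: positive           # a PolarityModeEnum value
#   mass_spectrum_collection_modes:   # multivalued
#     - centroid                      # a MassSpectrumCollectionModeEnum value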
ncbi_lineage_tax_ids: range: string pattern: '^\d+(-\d+)*$' description: Dash-delimited ordered list of NCBI taxonomy identifiers (TaxId) examples: - value: 1-131567-2759-2611352-33682-191814-2603949 ncbi_lineage: range: string description: Comma delimited ordered list of NCBI taxonomy names. examples: - value: root,cellular organisms,Eukaryota,Discoba,Euglenozoa,Diplonemea,Diplonemidae has_failure_categorization: range: FailureCategorization multivalued: true inlined_as_list: true ionization_source: range: IonizationSourceEnum description: The ionization source used to introduce processed samples into a mass spectrometer exact_mappings: - MS:1000008 mass_analyzers: range: MassAnalyzerEnum description: The kind of mass analyzer(s) used during the spectra collection. multivalued: true exact_mappings: - MS:1000443 resolution_categories: range: ResolutionCategoryEnum description: The relative resolution at which spectra were collected. examples: - value: high - value: low multivalued: true mass_spectrometry_acquisition_strategy: range: MassSpectrometryAcquisitionStrategyEnum description: Mode of running a mass spectrometer method by which m/z ranges are selected and ions possibly fragment. exact_mappings: - MS:1003213 eluent_introduction_category: range: EluentIntroductionCategoryEnum description: A high-level categorization for how the processed sample is introduced into a mass spectrometer. examples: - value: liquid_chromatography - value: direct_infusion_syringe has_mass_spectrometry_configuration: range: MassSpectrometryConfiguration description: The identifier of the associated MassSpectrometryConfiguration. has_chromatography_configuration: range: ChromatographyConfiguration description: The identifier of the associated ChromatographyConfiguration, providing information about how a sample was introduced into the mass spectrometer. gene_function_id: range: uriorcurie description: The identifier for the gene function. 
examples: - value: KEGG.ORTHOLOGY:K00627 required: true count: range: integer required: true functional_annotation_agg: description: This property links a database object to a set of functional annotation aggregation (agg) results. range: FunctionalAnnotationAggMember multivalued: true inlined: true inlined_as_list: true sample_collection_year: range: integer sample_collection_month: library_preparation_kit: range: string exact_mappings: - GENEPIO:0001450 pcr_cycles: range: integer exact_mappings: - OBI:0002475 is_stranded: description: Is the (RNA) library stranded or non-stranded (unstranded). range: boolean comments: - A value of true means the library is stranded; false means non-stranded. stranded_orientation: description: Lists the strand orientation for a stranded RNA library preparation. range: StrandedOrientationEnum input_mass: title: sample mass used description: Total mass of sample used in activity. aliases: - sample mass - sample weight exact_mappings: - MS:1000004 narrow_mappings: - MIXS:0000111 range: QuantityValue annotations: storage_units: g library_type: title: library type examples: - value: DNA range: LibraryTypeEnum date_created: description: from database class etl_software_version: description: from database class object_set: inlined_as_list: true mixin: true multivalued: true description: Applies to a property that links a database object to a set of objects. This is necessary in a JSON document to provide context for a list, and to allow for a single JSON object that combines multiple object types. chemical_entity_set: deprecated: >- Deprecation of the ChemicalEntity class means deprecation of the chemical_entity_set as well. mixins: - object_set description: This property links a database object to the set of chemical entities within it. ontology_class_set: mixins: - object_set range: OntologyClass description: This property links a database object to the set of ontology classes within it.
biosample_set: mixins: - object_set range: Biosample description: This property links a database object to the set of samples within it. study_set: mixins: - object_set range: Study description: This property links a database object to the set of studies within it. field_research_site_set: mixins: - object_set range: FieldResearchSite collecting_biosamples_from_site_set: mixins: - object_set range: CollectingBiosamplesFromSite data_object_set: mixins: - object_set range: DataObject description: This property links a database object to the set of data objects within it. genome_feature_set: mixins: - object_set range: GenomeFeature description: This property links a database object to the set of all features functional_annotation_set: mixins: - object_set range: FunctionalAnnotation description: This property links a database object to the set of all functional annotations workflow_execution_set: mixins: - object_set range: WorkflowExecution description: This property links a database object to the set of workflow executions. data_generation_set: mixins: - object_set range: DataGeneration description: This property links a database object to the set of data generations within it. processed_sample_set: mixins: - object_set range: ProcessedSample description: This property links a database object to the set of processed samples within it. instrument_set: mixins: - object_set range: Instrument description: This property links a database object to the set of instruments within it. calibration_set: mixins: - object_set range: CalibrationInformation description: This property links a database object to the set of calibrations within it. configuration_set: mixins: - object_set range: Configuration description: This property links a database object to the set of configurations within it. manifest_set: mixins: - object_set range: Manifest description: This property links a database object to the set of manifests within it. 
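# Illustrative sketch only, not part of the schema: the *_set slots above mix in
# object_set, so a single document can carry several typed lists side by side.
# The identifiers are hypothetical placeholders; the "bsm" and "sty" typecodes
# and the "id" key are assumptions (only "dobj" appears in a pattern in this
# file), and the other slots each class may require are omitted.
#
#   biosample_set:
#     - id: nmdc:bsm-99-example1      # hypothetical id
#   study_set:
#     - id: nmdc:sty-99-example1      # hypothetical id
#   data_object_set:
#     - id: nmdc:dobj-99-example1     # hypothetical id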
storage_process_set: mixins: - object_set range: StorageProcess description: This property links a database object to the set of storage processes within it. material_processing_set: mixins: - object_set range: MaterialProcessing description: This property links a database object to the set of material processing within it. sample_collection_day: range: integer sample_collection_hour: range: integer sample_collection_minute: range: integer biogas_temperature: range: string soil_annual_season_temp: range: string biogas_retention_time: range: string completion_date: range: string container_size: range: QuantityValue description: The volume of the container an analyte is stored in or an activity takes place in contributors: - ORCID:0009-0001-1555-1601 #Anastasiya Prymolenna - ORCID:0000-0002-8683-0050 #Montana Smith annotations: storage_units: mL filter_material: description: "A porous material on which solid particles present in air or other fluid which flows through it are largely caught and retained." comments: - "Filters are made with a variety of materials: cellulose and derivatives, glass fibre, ceramic, synthetic plastics and fibres. Filters may be naturally porous or be made so by mechanical or other means. Membrane/ceramic filters are prepared with highly controlled pore size in a sheet of suitable material such as polyfluoroethylene, polycarbonate or cellulose esters. Nylon mesh is sometimes used for reinforcement. The pores constitute 80–85% of the filter volume commonly and several pore sizes are available for air sampling (0.45−0.8 μm are commonly employed)." range: string filter_pore_size: range: QuantityValue description: "A quantitative or qualitative measurement of the physical dimensions of the pores in a material." 
annotations: storage_units: um conditionings: range: string description: "Preliminary treatment of either phase with a suitable solution of the other phase (in the absence of main extractable solute(s)) so that when the subsequent equilibration is carried out changes in the (volume) phase ratio or in the concentrations of other components are minimized." multivalued: true list_elements_ordered: true separation_method: range: SeparationMethodEnum description: The method that was used to separate a substance from a solution or mixture. filtration_category: range: string description: The type of conditioning applied to a filter, device, etc. material_component_separation: range: string description: "A material processing in which components of an input material become segregated in space" value: range: QuantityValue annotations: units_alignment_excuse: pending_analysis modifier_substance: range: string description: The type of modification being done is_pressurized: range: boolean description: Whether or not pressure was applied to a thing or process. contained_in: range: ContainerCategoryEnum description: A type of container. examples: - value: test tube - value: falcon tube - value: whirlpak input_volume: # see also `volume` range: QuantityValue description: The volume of the input sample. annotations: storage_units: mL ordered_mobile_phases: range: MobilePhaseSegment description: The solution(s) that move through a chromatography column. multivalued: true list_elements_ordered: true inlined_as_list: true stationary_phase: range: StationaryPhaseEnum description: The material that makes up the stationary phase used in chromatography. chromatographic_category: range: ChromatographicCategoryEnum description: The type of chromatography used in a process. sampled_portion: range: SamplePortionEnum multivalued: true description: The portion of the sample that is taken for downstream activity.
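# Illustrative sketch only, not part of the schema: values for the chromatography
# slots above, drawn from the enums defined earlier in this file. The class that
# owns these slots is not shown in this section, so the grouping is an assumption.
#
#   chromatographic_category: liquid_chromatography   # ChromatographicCategoryEnum
#   stationary_phase: C18                              # StationaryPhaseEnum
#   sampled_portion:                                   # multivalued
#     - supernatant                                    # SamplePortionEnum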