janno_column_name	description	data_type	multi	choice	range	choice_options	range_lower	range_upper	mandatory	unique
Poseidon_ID	sample identifier as defined by the genetics laboratory (e.g. I1234, BOT001), must fit to the values in the Poseidon package .fam/.ind file, must be unique within one package, if multiple datasets exist for the same individual different Poseidon_IDs are required	String	FALSE	FALSE	FALSE				TRUE	TRUE
Genetic_Sex	genetic sex of the individual derived from this sample, only F, M or U because the EIGENSTRAT and PLINK formats only support these three, edge cases (e.g. XXY, XYY, X0) are undefined and should be grouped as F, M or U, with a Note added	Char	FALSE	TRUE	FALSE	F;M;U			TRUE	FALSE
Group_Name	meaningful population/group identifiers for the sample, should follow the geographic-temporal nomenclature proposed by Eisenmann et al. 2018 (https://doi.org/10.1038/s41598-018-31123-z), multiple entries separated by ;, the first value must be equal the group name in the .fam/.ind file	String	TRUE	FALSE	FALSE				TRUE	FALSE
Alternative_IDs	alternative identifiers for the same sampled individual, e.g. IDs in other databases or popular names like Ötzi/Iceman	String	TRUE	FALSE	FALSE				FALSE	FALSE
Relation_To	other samples (by Poseidon_ID) that are related/identical to this sample, multiple entries separated by ;	String	TRUE	FALSE	FALSE				FALSE	FALSE
Relation_Degree	relationship degree for relatives mentioned in Related_To, multiple values separated by ; in the same order as Related_To in case of multiple relations	String	TRUE	TRUE	FALSE	identical;first;second;thirdToFifth;sixthToTenth;unrelated;other			FALSE	FALSE
Relation_Type	relationship type for relatives mentioned in Related_To (e.g. sister_of, child_of, nephew_of), multiple values separated by ; in the same order as Related_To in case of multiple relations	String	TRUE	FALSE	FALSE				FALSE	FALSE
Relation_Note	arbitrary comments about the genetic relationships of the sampled individual	String	FALSE	FALSE	FALSE				FALSE	FALSE
Collection_ID	alternative sample identifier shared by the provider/owner of the sample (e.g. grave 40 skeleton 2)	String	FALSE	FALSE	FALSE				FALSE	FALSE
Country	present-day political country of origin for the sample	String	FALSE	FALSE	FALSE				FALSE	FALSE
Country_ISO	present-day political country expressed in ISO 3166-1 alpha-2 country codes	String	FALSE	FALSE	FALSE				FALSE	FALSE
Location	unspecified location information for the sample, e.g. administrative or topographic region or mountains/rivers/lakes/cities nearby	String	FALSE	FALSE	FALSE				FALSE	FALSE
Site	name of the archaeological site where the sample was found	String	FALSE	FALSE	FALSE				FALSE	FALSE
Latitude	latitude where the sample was found with up to 5 places after the decimal point	Float	FALSE	FALSE	TRUE		-90	90	FALSE	FALSE
Longitude	longitude with up to 5 places after the decimal point	Float	FALSE	FALSE	TRUE		-180	180	FALSE	FALSE
Date_Type	type of dating information available for the sample, C14 if there is a set of radiocarbon dates in the columns Date_C14_Labnr, Date_C14_Uncal_BP and Date_C14_Uncal_BP_Err whose post-calibration probability distribution is a meaningful prior for the individual’s year of death, contextual for any other age information only given in Date_BC_AD_Start, Date_BC_AD_Median and Date_BC_AD_Stop, “modern” for present-day individuals	String	FALSE	TRUE	FALSE	C14;contextual;modern			FALSE	FALSE
Date_C14_Labnr	lab numbers of C14 ages, multiple values separated by ; in case of multiple dates	String	TRUE	FALSE	FALSE				FALSE	FALSE
Date_C14_Uncal_BP	uncalibrated years BP (as in before 1950AD) for the C14 ages as reported by C14 labs, multiple values separated by ; in the same order as Date_C14_Labnr in case of multiple dates, only relevant if Date_Type is C14	Integer	TRUE	FALSE	TRUE		0	Inf	FALSE	FALSE
Date_C14_Uncal_BP_Err	standard deviation (1-sigma ±) for the uncalibrated C14 ages as reported by the C14 labs, multiple values separated by ; in the same order as Date_C14_Labnr in case of multiple dates, only relevant if Date_Type is C14	Integer	TRUE	FALSE	TRUE		0	Inf	FALSE	FALSE
Date_BC_AD_Start	lower (older) bound for the age of the sample in years BC/AD, negative numbers for BC, positive numbers for AD, in case of C14 dates 2-sigma post calibration interval, 2000 for modern samples	Integer	FALSE	FALSE	TRUE		-Inf	2050	FALSE	FALSE
Date_BC_AD_Median	median age of the sample in years BC/AD, for C14-dated samples median, for contextually dated samples simple mid-point of the archaeological intervals, 2000 for modern samples	Integer	FALSE	FALSE	TRUE		-Inf	2050	FALSE	FALSE
Date_BC_AD_Stop	upper (more recent) bound for the age of the sample in years BC/AD, counter point to Date_BC_AD_Start	Integer	FALSE	FALSE	TRUE		-Inf	2050	FALSE	FALSE
Date_Note	arbitrary comments about the dating information for the sample	String	FALSE	FALSE	FALSE				FALSE	FALSE
MT_Haplogroup	mitochondrial haplogroup derived for the sample as specified on phylotree.org and as reported by the Haplofind or Haplogrep software tools	String	FALSE	FALSE	FALSE				FALSE	FALSE
Y_Haplogroup	Y-chromosome haplogroup derived for the sample following a syntax with the main branch + the most terminal derived Y-SNP (e.g. R1b-P312)	String	FALSE	FALSE	FALSE				FALSE	FALSE
Source_Tissue	skeletal element, tissue or other material sampled, the specific bone should be reported after an underscore (e.g. bone_phalanx), multiple values separated by ;	String	TRUE	FALSE	FALSE				FALSE	FALSE
Nr_Libraries	number of libraries produced for the sample	Integer	FALSE	FALSE	FALSE				FALSE	FALSE
Library_Names	identifiers of the libraries used to generate the genotype data for the sample, multiple values separated by ;	String	TRUE	FALSE	FALSE				FALSE	FALSE
Capture_Type	specifics of the data generation method (e.g. capture method) for the individual libraries generated for the sample, multiple values separated by ;	String	TRUE	TRUE	FALSE	Shotgun;1240K;ArborComplete;ArborPrimePlus;ArborAncestralPlus;TwistAncientDNA;OtherCapture;ReferenceGenome			FALSE	FALSE
UDG 	udg treatment for the libraries, mixed in case multiple libraries with different UDG treatment were merged	String	FALSE	TRUE	FALSE	minus;half;plus;mixed			FALSE	FALSE
Library_Built	strandedness of the libraries, “mixed” in case multiple libraries with different protocols were merged	String	FALSE	TRUE	FALSE	ds;ss;mixed			FALSE	FALSE
Genotype_Ploidy	ploidy of the genotypes for the sample	String	FALSE	TRUE	FALSE	diploid;haploid			FALSE	FALSE
Data_Preparation_Pipeline_URL	url pointing to a description of the computational pipeline used to generate the genotype data from the source data	String	FALSE	FALSE	FALSE				FALSE	FALSE
Endogenous	% endogenous DNA as estimated from SG libraries (before capture) as for example estimated by EAGER, not on target and no quality filter, in case of multiple libraries only the highest values should be reported	Float	FALSE	FALSE	TRUE		0	100	FALSE	FALSE
Nr_SNPs	number of non-missing SNPs for the sample, counted on the SNP-set stored in the Poseidon package	Integer	FALSE	FALSE	FALSE				FALSE	FALSE
Coverage_on_Target_SNPs	average X-fold coverage across targeted SNP sites after quality filtering	Float	FALSE	FALSE	FALSE				FALSE	FALSE
Damage	% damage on the 5' end for the main shotgun library used for sequencing and/or capture, in case of multiple libraries a value from the merged read alignment should be reported	Float	FALSE	FALSE	TRUE		0	100	FALSE	FALSE
Contamination	(modern) contamination of the sample as measured by the method in Contamination_Meas, multiple values separated by ; (for different methods, in case of multiple libraries report a value from the merged read alignment), the variables Contamination, Contamination_Err and Contamination_Meas must have the same number and order of (non-n/a) entries	String	TRUE	FALSE	FALSE				FALSE	FALSE
Contamination_Err	(modern) contamination estimate error of the sample	String	TRUE	FALSE	FALSE				FALSE	FALSE
Contamination_Meas	method to measure contamination, should be a software tool (ANGSD, Schmutzi, …) and the respective software versions, details should go to Contamination_Note	String	TRUE	FALSE	FALSE				FALSE	FALSE
Contamination_Note	arbitrary comments about the contamination estimation	String	FALSE	FALSE	FALSE				FALSE	FALSE
Genetic_Source_Accession_IDs	ENA or SRA accession identifiers pointing to the source data used to generate the genotyping data for the sample, multiple values separated by ;, if multiple are given they should be arranged by descending specificity (e.g. project id > sample id > sequencing run id)	String	TRUE	FALSE	FALSE				FALSE	FALSE
Primary_Contact	project lead or first author who generated and published the data for the sample	String	FALSE	FALSE	FALSE				FALSE	FALSE
Publication	bibtex keys for the publications where a sample was published (e.g. “AuthorJournalYear“) or “unpublished“, multiple values separated by ;, all must be present with complete BibTeX entries in the Poseidon package’s .bib file	String	TRUE	FALSE	FALSE				FALSE	FALSE
Note	arbitrary comments about the sample	String	FALSE	FALSE	FALSE				FALSE	FALSE
Keywords	arbitrary tags, multiple values separated by ;	String	TRUE	FALSE	FALSE				FALSE	FALSE