- schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: ÖNB, Cod. 3891. Ground Truth url: 10.5281/zenodo.7467249 authors: - name: Ainonen surname: Tuija roles: - transcriber - name: Andresen surname: Suse roles: - transcriber - name: Bakker surname: Loïs roles: - transcriber - name: Boylan surname: Amy roles: - transcriber - name: Della Manna surname: Silvia roles: - transcriber - name: Dziemski surname: Wiktor orcid: 0000-0001-8166-2249 - name: Henderson surname: C. E. M. orcid: 0000-0002-5040-9926 roles: - transcriber - name: Impagnatiello surname: Michele roles: - transcriber - name: Jenko Kovačič surname: Ana orcid: 0000-0001-7243-7082 roles: - transcriber - name: Komatović surname: Stevan roles: - transcriber - name: Ku surname: Ruby Wai-Ying orcid: 0000-0003-2688-6287 roles: - transcriber - name: Loss surname: Edward orcid: 0000-0002-9837-8321 roles: - transcriber - name: Mairhofer surname: Daniela orcid: 0000-0002-3531-9658 roles: - transcriber - project-manager - name: Morcos surname: Erene roles: - transcriber - name: Odstrčilík surname: Jan orcid: 0000-0001-9104-9827 roles: - transcriber - name: Paternicò surname: Giuseppe orcid: 0000-0002-7124-8869 roles: - transcriber - name: Riparante surname: Marta roles: - transcriber - name: Schimdt surname: Nathalie roles: - transcriber - name: Sołomieniuk surname: Michal roles: - transcriber - name: Walczak surname: Tomasz roles: - transcriber - name: Zharov surname: Dmitry roles: - transcriber institutions: [] description: >- The Ground Truth was produced by the participants of the HTR Winter School 2022 in the Late Latin Group (more information: https://www.oeaw.ac.at/imafo/veranstaltungen/detail/introduction-into-handwritten-text-recognition). The Ground Truth includes the following folios: 1-3r, 6-8, 11r, 27 and is still a work in progress. More pages will be added soon. If you find any errors, please contact Jan Odstrčilík (jan.odstrcilik@oeaw.ac.at). 
The Supervisors of the Late Latin Group: Jan Odstrčilík PhD, Austrian Academy of Sciences, Daniela Mairhofer PhD, Princeton University, Tobias Hodel PhD, University of Bern. project-name: HTR Winter School 2022, Vienna language: - lat production-software: Transkribus script: - iso: Latn script-type: only-manuscript time: notBefore: '1200' notAfter: '1299' hands: count: '1' precision: exact license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Page-XML volume: - metric: lines count: 952 transcription-guidelines: |- Regular transcription with expansion of abbreviations. - Normalization of J to I - V to U in the vowel function, U to V in the consonant function - long S to S. - No correction of misspellings (tagged in the ground truth) - No standardization of lower-case and upper-case letters - No added punctuation automatically-aligned: false - authors: - name: Gabay roles: - project-manager surname: Simon - name: Pinche roles: - project-manager surname: Ariane - name: Leroy roles: - transcriber surname: Noé - name: Christensen roles: - support surname: Kelly characters: members: - e - i - s - t - u - n - a - r - o - l - d - c - m - p - q - f - g - . - ̃ - h - b - z - y - I - x - ⁊ - ',' - R - E - C - ̾ - Q - L - S - A - D - M - ͣ - ꝑ - ͥ - P - ꝯ - T - N - ¶ - O - B - ͤ - U - '-' - '1' - ꝰ - ᷑ - ̽ - '2' - '3' - ẜ - F - ⟦ - ⟧ - '6' - ħ - ꝓ - '7' - '4' - ͨ - '9' - '8' - ; - G - '0' - ͦ - '5' - H - "'" - ̀ - ł - đ - ́ - ͫ - ‸ - '&' - k - ° - ẞ - ͬ - ᷤ - K - '[' - ']' - ͯ - ̧ - ( - ) - Y - Z - ':' - ͧ - ᷠ - X mode: NFD citation-file-link: https://github.com/Gallicorpora/HTR-MSS-15e-Siecle/CITATION. description: Corpus d'entrainement pour l'HTR composé de manuscrits français du 15e s. 
format: Alto-XML hands: count: 1-per-folder precision: estimated language: - frm - fra license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ production-software: eScriptorium + Kraken project-name: Gallicorpora project-website: https://github.com/Gallicorpora schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: only-manuscript time: notAfter: '1500' notBefore: '1400' title: Données HTR manuscrits du 15e siècle transcription-guidelines: 'Les normes de transcription suivent les préconisations du projet CREMMALAB : https://cremmalab.hypotheses.org' url: https://github.com/Gallicorpora/HTR-MSS-15e-Siecle volume: - count: 169207 metric: characters - count: 85 metric: files - count: 5937 metric: lines - count: 458 metric: regions automatically-aligned: false - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Données imprimés du 16e siècle description: Corpus d'entrainement pour l'HTR constitué d'imprimés du 16e siècle url: https://github.com/Gallicorpora/HTR-imprime-16e-siecle authors: - name: Gabay surname: Simon roles: - project-manager - name: Pinche roles: - project-manager surname: Ariane - name: Vlachou-Efstathiou surname: malamatenia roles: - transcriber - name: Christensen surname: Kelly roles: - support format: Alto-XML hands: count: 1-per-folder precision: estimated language: - frm - fra license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ project-name: Gallicorpora project-website: https://github.com/Gallicorpora script: - iso: Latn script-type: only-typed time: notAfter: '1599' notBefore: '1500' transcription-guidelines: Les normes de transcription suivent les préconisations du projet Gallicorpora volume: - metric: characters count: 186202 - metric: files count: 180 - metric: lines count: 4918 - metric: regions count: 591 citation-file-link: https://github.com/Gallicorpora/HTR-imprime-16e-siecle/CITATION.cff production-software: eScriptorium 
+ Kraken characters: mode: NFD members: - e - u - r - a - n - i - t - o - l - s - ſ - d - c - m - p - ',' - q - y - v - f - g - b - h - . - ’ - '&' - E - x - "'" - z - ́ - ̀ - A - ¬ - ̃ - D - C - R - ':' - L - I - S - P - N - M - O - Q - T - V - G - H - B - F - '-' - ̧ - j - '?' - ( - ̈ - ) - » - '1' - œ - ¶ - '!' - U - '2' - X - ; - '9' - Y - '4' - '3' - ß - '5' - '"' - '7' - J - '8' - æ - ꝰ - '6' - '0' - ̂ - ʳ - ⁊ - Z - « - '*' - ꝗ - ꝓ -   - ⁋ - Ι - ꝑ - ']' - ͥ - ᵉ - Ε - '[' - Τ - / automatically-aligned: false - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Imprimés 17e siècle description: Corpus d'entrainement pour l'HTR composé d'imprimés français du 17e s. url: https://github.com/Gallicorpora/HTR-imprime-17e-siecle authors: - name: Gabay surname: Simon roles: - project-manager - name: Pinche surname: Ariane roles: - project-manager - name: Fabert surname: Eliott roles: - transcriber - name: Vlachou-Efstathiou surname: malamatenia roles: - transcriber - name: Christensen surname: Kelly roles: - support project-name: Gallicorpora project-website: https://github.com/Gallicorpora language: - frm - fra script: - iso: Latn script-type: only-typed time: notBefore: '1600' notAfter: '1699' hands: count: 1-per-folder precision: estimated license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML volume: - metric: characters count: 255981 - metric: files count: 327 - metric: lines count: 8950 - metric: regions count: 1185 transcription-guidelines: Les normes de transcription suivent les préconisations du projet gallicorpora citation-file-link: https://github.com/Gallicorpora/HTR-imprime-17e-siecle/CITATION.cff production-software: eScriptorium + Kraken characters: mode: NFD members: - e - u - r - a - n - i - t - o - l - s - ſ - d - c - m - p - ',' - v - q - . 
- f - g - b - E - ’ - h - y - ́ - A - '&' - "'" - S - I - x - ¬ - L - C - R - P - D - ̀ - M - V - T - O - N - z - ':' - Q - j - '-' - F - G - ̃ - B - ; - H - ̈ - '1' - ̂ - ̧ - '2' - '?' - '3' - œ - '4' - '5' - Y - U - Z - '6' - '7' - '8' - '0' - X - J - '9' - ( - æ - ) - Æ - ι - α - '!' - ß - ο - ν - ε - ρ - ̓ - υ - κ - '*' - σ - τ - ω - '[' - ']' - ꝰ - K - Α - χ - ς - π - γ - ̨ - μ - k - ͂ - Ν - Β - λ - Σ - Κ - η - θ - W - Œ - δ - Τ - ͅ - » - ᵉ - ˡ - ͧ - Ζ - β - ̔ - ̇ - ° - w - ẞ - Φ - Λ - Χ - φ - Ι - ʳ - ᵐ automatically-aligned: false - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Données imprimés du 18e siècle description: Corpus d'entrainement pour l'HTR constitué d'imprimés du 18e siècle url: https://github.com/Gallicorpora/HTR-imprime-18e-siecle authors: - name: Gabay roles: - project-manager surname: Simon - name: Pinche roles: - project-manager surname: Ariane - name: Fabert roles: - transcriber surname: Eliott - name: Christensen roles: - support surname: Kelly project-name: Gallicorpora project-website: https://github.com/Gallicorpora language: - fra script: - iso: Latn script-type: only-typed time: notBefore: '1700' notAfter: '1799' hands: count: 1-per-folder precision: estimated license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML volume: - metric: characters count: 255981 - metric: files count: 327 - metric: lines count: 8950 - metric: regions count: 1185 transcription-guidelines: Les normes de transcription suivent les préconisations du projet gallicorpora citation-file-link: https://github.com/Gallicorpora/HTR-imprime-18e-siecle/CITATION.cff production-software: eScriptorium + Kraken characters: mode: NFD members: - e - u - r - a - n - i - t - o - l - s - ſ - d - c - m - p - ',' - v - q - . - f - g - b - E - ’ - h - y - ́ - A - '&' - "'" - S - I - x - ¬ - L - C - R - P - D - ̀ - M - V - T - O - N - z - ':' - Q - j - '-' - F - G - ̃ - B - ; - H - ̈ - '1' - ̂ - ̧ - '2' - '?' 
- '3' - œ - '4' - '5' - Y - U - Z - '6' - '7' - '8' - '0' - X - J - '9' - ( - æ - ) - Æ - ι - α - '!' - ß - ο - ν - ε - ρ - ̓ - υ - κ - '*' - σ - τ - ω - '[' - ']' - ꝰ - K - Α - χ - ς - π - γ - ̨ - μ - k - ͂ - Ν - Β - λ - Σ - Κ - η - θ - W - Œ - δ - Τ - ͅ - » - ᵉ - ˡ - ͧ - Ζ - β - ̔ - ̇ - ° - w - ẞ - Φ - Λ - Χ - φ - Ι - ʳ - ᵐ automatically-aligned: false - authors: - name: Pinche roles: - project-manager surname: Ariane - name: Gabay roles: - project-manager surname: Simon - name: Vlachou-Efstathiou roles: - transcriber surname: malamatenia - name: Christensen roles: - support surname: Kelly characters: members: - e - u - a - i - t - r - n - o - s - l - d - c - m - p - ſ - q - y - ̃ - f - g - b - . - h - ',' - z - ⁊ - x - E - ¬ - ¶ - C - S - L - D - P - A - I - ͥ - M - v - Q - ꝰ - O - T - ':' - V - B - '?' - ꝑ - H - N - ͬ - R - ; - G - F - ̌ - ꝓ - J - '-' - ꝯ - ( - ) - '1' - U - '9' - ̾ - æ - X - '4' - ꝙ - ̧ - ͤ - '2' - '*' - '6' - "'" - Ι - '7' - ⟦ - ⟧ - '8' - Y - '5' - '0' mode: NFD description: Corpus d'entrainement pour l'HTR constitué d'imprimés du 16e siècle format: Alto-XML hands: count: 1-per-folder precision: estimated language: - fra license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ production-software: eScriptorium + Kraken project-name: Gallicorpora project-website: https://github.com/Gallicorpora schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: evenly-mixed time: notAfter: '1599' notBefore: '1500' title: Données imprimés gothiques du 16e siècle transcription-guidelines: Les transcriptions suivent les normes de transcription du projet Gallicorpora url: https://github.com/Gallicorpora/HTR-imprime-16e-siecle volume: - count: 90731 metric: characters - count: 80 metric: files - count: 2971 metric: lines - count: 233 metric: regions automatically-aligned: false - authors: - name: Gabay roles: - project-manager surname: Simon - name: Pinche roles: - project-manager surname: Ariane 
- name: Leroy roles: - transcriber surname: Noé - name: Christensen roles: - support surname: Kelly characters: members: - e - u - s - t - a - i - r - o - n - l - d - c - m - p - ̃ - f - q - g - y - h - b - . - z - ⁊ - x - E - '-' - ',' - ſ - ¶ - L - ͥ - D - C - ; - ᷤ - I - ꝰ - Q - A - S - ꝑ - P - M - O - T - U - N - F - R - ꝓ - B - G - ꝯ - ̾ - H - ᷑ - ͬ - ̌ - ':' - ( - '[' - ']' - v - J - Ꝙ - ) - k - ꝙ - ͣ - V - '4' - ͦ - w - ͨ - ͤ - Ι - ̧ - '1' - '9' - '7' - ̶ - "'" - ́ - '|' mode: NFD citation-file-link: https://github.com/Gallicorpora/HTR-incunable-15e-siecle/CITATION.cff description: Corpus d'entrainement pour l'HTR composé d'incunable français du 15e s. format: Alto-XML hands: count: 1-per-folder precision: estimated language: - frm - fra license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ production-software: eScriptorium + Kraken project-name: Gallicorpora project-website: https://github.com/Gallicorpora schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: only-typed time: notAfter: '1500' notBefore: '1400' title: Données HTR incunables du 15e siècle transcription-guidelines: 'Les normes de transcription suivent les préconisations du projet CREMMALAB : https://cremmalab.hypotheses.org' url: https://github.com/Gallicorpora/HTR-incunable-15e-siecle volume: - count: 245094 metric: characters - count: 149 metric: files - count: 7608 metric: lines - count: 535 metric: regions automatically-aligned: false - authors: - name: Emmanuelle roles: - project-manager surname: de Champs - name: Florence roles: - project-manager - quality-control - transcriber surname: Clavaud - name: Pauline roles: - project-manager - quality-control - transcriber surname: Charbonnier - name: Christine roles: - project-manager - quality-control - support surname: Nougaret - name: Alix roles: - aligner - project-manager - quality-control - support surname: Chagué - name: Thibault roles: - aligner - project-manager - 
quality-control - support surname: Clérice - name: Falcoz roles: - aligner surname: Elsa - name: Marie-Françoise roles: - project-manager surname: Limon-Bonnet - name: Elise roles: - project-manager surname: Wojszvzyk - name: Sylvie roles: - project-manager - quality-control - support surname: Dechavanne - roles: - transcriber surname: ALemoine - roles: - transcriber surname: ASJPeronneau - roles: - transcriber surname: Alcofrybas - roles: - transcriber surname: BeaLct - roles: - transcriber surname: CLbt - roles: - transcriber surname: Chloelsa - roles: - transcriber surname: DMichel - roles: - transcriber surname: Desauthieux - roles: - transcriber surname: EPerrin - roles: - transcriber surname: GBMireille - roles: - transcriber surname: GPINET - roles: - transcriber surname: Genea78 - roles: - transcriber surname: JMGoux - roles: - transcriber surname: Jideuxhemme - roles: - transcriber surname: LBIsabelle - roles: - transcriber surname: Lamotte - roles: - transcriber surname: MFGarreau - roles: - transcriber surname: MIna - roles: - transcriber surname: Maniet - roles: - transcriber surname: MarionJo - roles: - transcriber surname: PGambette - roles: - transcriber surname: PPocard - roles: - transcriber surname: PROMBAUT - roles: - transcriber surname: PaulineTest - roles: - transcriber surname: SCayeux - roles: - transcriber surname: SL. 
- roles: - transcriber surname: SLespinasse - roles: - transcriber surname: Silver08 - roles: - transcriber surname: TPellé - roles: - transcriber surname: Valérie - roles: - transcriber surname: alp - roles: - transcriber surname: jmorvan - roles: - transcriber surname: lelia - roles: - transcriber surname: majubama - roles: - transcriber surname: mickael.lefevr - roles: - transcriber surname: sgauthier - roles: - quality-control surname: EdChamps - name: Danièle roles: - support surname: Allezard - name: Françoise roles: - support surname: Auriau - name: Sophie roles: - support surname: Blanchard - name: Laure roles: - support surname: Cadars - name: Paul roles: - support surname: Cazin-Bernier - name: Rosine roles: - support surname: Cleyet-Michaud - name: Sophie roles: - support surname: Delinge - name: Christiane roles: - support surname: Demeulenaere-Douyère - name: Mathilde roles: - support surname: Deuve - name: Tristan roles: - support surname: Girard - name: Wilfried roles: - support surname: Gourdon - name: Emilie roles: - support surname: Laffitte-Louisou - name: Valérie roles: - support surname: Lemée - name: Jean-Claude roles: - support surname: Lescure - name: Mélisa roles: - support surname: Locatelli - name: Aurélie roles: - support surname: Massie - name: Thomas roles: - support surname: Olivier - name: Françoise roles: - support surname: Pinchard - name: Tiffanie roles: - support surname: Pitot - name: Anais roles: - support surname: Pontoparia - name: Michel roles: - support surname: Renard - name: Thierry roles: - support surname: Rihouey - name: Christian roles: - support surname: Rodriguez - name: Konstantinos roles: - support surname: Sifakis - name: Marie-Thérèse roles: - support surname: Solignat - name: Lucie roles: - support surname: Vieillon - roles: - support surname: SL characters: members: - e - a - i - n - t - s - r - u - o - l - m - d - c - p - ́ - ̀ - f - v - g - ',' - q - b - . 
- ’ - '1' - h - M - J - j - P - C - A - '-' - x - L - S - F - '9' - y - D - B - ̂ - R - '2' - ^ - '4' - z - '0' - E - V - G - '3' - '5' - T - ) - ( - H - '6' - N - '7' - '8' - I - ':' - O - ; - Q - ̧ - ° - U -   - / - W - '"' - ̈ - '>' - < - '=' - œ - w - '?' - _ - X - '%' - k - '*' - ſ - '!' - Z - '&' - "'" - – - K - + mode: NFD citation-file-link: https://github.com/Dummy/depot-test/CITATION.cff description: Wills of WWI Poilus, edited by the Archives nationales during the Testaments de Poilus project. format: Alto-XML hands: count: 1-per-file precision: estimated language: - fra license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ production-software: eScriptorium + Kraken project-name: Testaments de Poilus project-website: https://edition-testaments-de-poilus.huma-num.fr/ schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: only-manuscript time: notAfter: '1918' notBefore: '1914' title: CREMMA-AN Testament De Poilus transcription-guidelines: 'The original transcriptions were performed on a crowdsourcing application (https://testaments-de-poilus.huma-num.fr/#!/) under the supervision of the Archives nationales de France. Only the allographic portions of the documents were transcribed. Any marginal elements added later by clerks or archivists are neither segmented nor transcribed. The segmentation follows the SegmOnto ontology. Abbreviations and misspellings were not corrected. Superscripted portions of text are preceded by ^. 
' url: https://github.com/HTR-United/CREMMA-AN-TestamentDePoilus volume: - count: 87726 metric: characters - count: 226 metric: files - count: 3330 metric: lines - count: 553 metric: regions automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Chagué, Alix and Clérice, Thibault\ \ and Mazoue, Anaïs and Van Kote, Elsa},\ntitle = {CREMMA-AN-TestamentDePoilus\ \ },\nurl = {https://github.com/HTR-United/CREMMA-AN-TestamentDePoilus}\n}\n" _apa: "Chagué A., Clérice T., Mazoue A., Van Kote E. CREMMA-AN-TestamentDePoilus\ \ URL: https://github.com/HTR-United/CREMMA-AN-TestamentDePoilus\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: CREMMA Manuscrits du 15e url: https://github.com/HTR-United/CREMMA-MSS-15 project-name: CREMMA authors: - name: Clérice surname: Thibault roles: - project-manager - quality-control - name: Chagué surname: Alix roles: - project-manager - quality-control description: "Manuscripts of the 15th century\n" language: - fra script: - iso: Latn script-type: only-manuscript time: notBefore: '1400' notAfter: '1499' hands: count: 1-per-folder precision: exact license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML volume: - count: 0 metric: lines - count: 0 metric: files - count: 0 metric: regions - count: 0 metric: characters transcription-guidelines: Abréviations conservées. production-software: eScriptorium + Kraken automatically-aligned: false - authors: - name: Thibault orcid: 0000-0003-1852-9204 roles: - project-manager - quality-control - support surname: Clérice - name: Alix orcid: 0000-0002-0136-4434 roles: - project-manager - quality-control - support surname: Chagué - name: Anaïs roles: - transcriber surname: Mazoue automatically-aligned: false characters: members: - e - r - n - a - u - o - t - i - l - ſ - d - s - c - m - p - v - y - q - g - f - b - z - h - J - / - x - R - ^ - L - I - . 
- E - ẜ - ⁊ - M - '1' - ꝑ - A - ́ - ̾ - < - '>' - j - C - D - '3' - ꝙ - '9' - V - '7' - '6' - ’ - P - '8' - Ꝑ - ̃ - T - ( - S - N - ; - Q - ̀ - '5' - '0' - U mode: NFD citation-file-link: https://github.com/HTR-United/CREMMA-MSS-16/CITATION.cff description: Manuscripts of the 16th century format: Alto-XML hands: count: 1-per-folder precision: exact institutions: [] language: - fra license: name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ production-software: eScriptorium + Kraken project-name: CREMMA schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: only-manuscript time: notAfter: '1599' notBefore: '1500' title: CREMMA MSS 16 transcription-guidelines: Abréviations conservées. url: https://github.com/HTR-United/CREMMA-MSS-16 volume: - count: 10911 metric: characters - count: 9 metric: files - count: 244 metric: lines - count: 18 metric: regions _bibtex: "@misc{YourReferenceHere,\nauthor = {Mazoue, Anaïs and Clérice, Thibault\ \ and Chagué, Alix},\nmonth = {3},\ntitle = {CREMMA-MSS-16},\nurl = {https://github.com/HTR-United/CREMMA-MSS-16},\n\ year = {2024}\n}\n" _apa: "Mazoue A., Clérice T., Chagué A. (2024). 
CREMMA-MSS-16 URL: https://github.com/HTR-United/CREMMA-MSS-16\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: CREMMA Manuscrits du 17e url: https://github.com/HTR-United/CREMMA-MSS-17 project-name: CREMMA authors: - name: Clérice surname: Thibault roles: - project-manager - quality-control - name: Chagué surname: Alix roles: - project-manager - quality-control - name: Faure surname: Margaux roles: - transcriber - name: Norindr surname: Jade roles: - transcriber - name: Mazoue surname: Anais roles: - transcriber - name: Davoury surname: Baudoin roles: - transcriber description: Various Manuscripts of the 17th century language: - fra script: - iso: Latn script-type: only-manuscript time: notBefore: '1600' notAfter: '1699' hands: count: 1-per-folder precision: exact license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML volume: - metric: characters count: 81909 - metric: files count: 111 - metric: lines count: 2245 - metric: regions count: 264 transcription-guidelines: Abréviations conservées. production-software: eScriptorium + Kraken characters: mode: NFD members: - e - s - r - a - n - u - i - o - t - l - d - c - m - p - v - q - . - ',' - y - "'" - f - b - g - ́ - h - j - ̃ - M - x - R - z - C - '1' - J - ^ - ̀ - P - L - S - V - '&' - A - E - '>' - I - < - '2' - X - '3' - T - '7' - D - '6' - ']' - B - '4' - '[' - '0' - '?' 
- '-' - ̂ - ̈ - '9' - '5' - ; - G - N - '8' - ':' - F - ̧ - ) - ( - Q - O - H - W - œ - ‸ - ⁊ - U - ̄ - / - ꝗ - + - k - ° -   - w - ם - Z - ς - '#' - æ - ꝙ - ͣ - ε - ϕ automatically-aligned: false - authors: - name: Chagué roles: - project-manager - quality-control surname: Alix - name: Clérice roles: - project-manager - quality-control surname: Thibault - name: Norindr roles: - transcriber surname: Jade - name: Norindr roles: - transcriber surname: Jade - name: Van Kote roles: - transcriber - aligner surname: Elsa - name: Faure roles: - transcriber - aligner surname: Margaux characters: members: - e - s - a - r - t - n - u - i - o - l - d - p - c - m - v - . - q - f - ́ - "'" - ',' - g - b - h - y - x - j - L - C - ̀ - ^ - '1' - M - S - ̂ - z - E - R - ; - '2' - I - '6' - '0' - '>' - < - D - V - J - '4' - '3' - ( - ) - P - ̈ - '5' - ̃ - '-' - '7' - B - '8' - A - '[' - ']' - '9' - N - F - G - T - '?' - X - ̧ - / - ':' - O - H - ’ - ¬ - + -   - œ - U - '&' - « - Q - '=' - K - '!' - k - W - Z - w - ° - ⁊ - ꝑ - ſ - ‸ - '#' - ̶ - _ - Y - ̄ - » - ͦ mode: NFD description: Manuscripts of the 18th century format: Alto-XML hands: count: 1-per-folder precision: exact language: - fra license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ production-software: eScriptorium + Kraken project-name: CREMMA schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: only-manuscript time: notAfter: '1799' notBefore: '1700' title: CREMMA Manuscrits du 18e transcription-guidelines: Abréviations conservées. 
url: https://github.com/HTR-United/CREMMA-MSS-18 volume: - count: 141690 metric: characters - count: 125 metric: files - count: 4019 metric: lines - count: 329 metric: regions automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Van Kote, Elsa and Faure, Margaux\ \ and Norindr, Jade and Clérice, Thibault and Chagué, Alix},\nmonth = {3},\ntitle\ \ = {CREMMA-MSS-18},\nurl = {https://github.com/HTR-United/CREMMA-MSS-18},\nyear\ \ = {2024}\n}\n" _apa: "Van Kote E., Faure M., Norindr J., Clérice T., Chagué A. (2024). CREMMA-MSS-18\ \ URL: https://github.com/HTR-United/CREMMA-MSS-18\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: CREMMA Manuscrits du 19e url: https://github.com/HTR-United/CREMMA-MSS-19 project-name: CREMMA authors: - name: Clérice surname: Thibault roles: - project-manager - quality-control - name: Chagué surname: Alix roles: - project-manager - quality-control - name: Davoury surname: Baudouin roles: - transcriber - aligner - name: Doat surname: Soline roles: - transcriber - aligner - name: Faure surname: Margaux roles: - transcriber - aligner - name: Humeau surname: Maxime roles: - transcriber - aligner description: Manuscripts of the 19th century language: - fra script: - iso: Latn script-type: only-manuscript time: notBefore: '1800' notAfter: '1899' hands: count: 1-per-folder precision: exact license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML volume: - metric: characters count: 55581 - metric: files count: 69 - metric: lines count: 1807 - metric: regions count: 167 transcription-guidelines: Abréviations conservées. production-software: eScriptorium + Kraken characters: mode: NFD members: - e - s - a - i - u - n - r - t - o - l - d - m - c - p - v - ',' - ́ - "'" - q - f - . - g - b - h - ̀ - j - x - '-' - ̂ - L - C - M - y - J - z - A - D - P - '"' - '>' - < - E - '!' - N - S - Q - '1' - ; - '?' 
- ':' - R - I - T - B - V - œ - '6' - O - ( - _ - ) - '2' - '3' - H - '4' - ^ - '9' - '8' - '7' - F - '0' - G - '5' - ̧ - U - '&' - '[' - ']' - ° - ̈ - k - $ - w - X - W - Y - + - Z automatically-aligned: false - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: CREMMA Manuscrits du 20e url: https://github.com/HTR-United/CREMMA-MSS-20 project-name: CREMMA authors: - name: Clérice surname: Thibault roles: - project-manager - quality-control - name: Chagué surname: Alix roles: - project-manager - quality-control description: "Manuscripts of the 20th century\n" language: - fra script: - iso: Latn script-type: only-manuscript time: notBefore: '1900' notAfter: '1999' hands: count: 1-per-folder precision: exact license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML volume: - metric: characters count: 5764 - metric: files count: 13 - metric: lines count: 224 - metric: regions count: 25 transcription-guidelines: Abréviations conservées. production-software: eScriptorium + Kraken characters: mode: NFKD members: - e - a - s - n - t - r - i - u - l - o - d - c - m - p - ́ - < - '>' - "'" - v - q - ',' - . - ̀ - b - g - h - j - f - F - J - '1' - '-' - ̂ - M - A - E - x - T - y - C - D - ^ - O - '8' - N - '7' - B - S - '0' - ̧ - P - G - R - H - L - '9' - z - I - '2' - ':' - U - '&' - k - + - ; - $ - V - œ - '[' - '?' - ']' - '4' - '3' - ( - ) - '6' automatically-aligned: false - authors: - name: Clérice orcid: 0000-0003-1852-9204 roles: - transcriber - aligner - project-manager - quality-control surname: Thibault - name: Chagué orcid: 0000-0002-0136-4434 roles: - project-manager surname: Alix - name: Vlachou Efstathiou orcid: 0000-0002-9397-356X roles: - transcriber - aligner surname: Malamatenia characters: members: - i - e - t - a - u - s - ̃ - o - n - r - c - d - m - l - p - . 
- ̾ - q - b - g - f - ⁊ -  - ͣ - h - ꝰ - ꝑ - ͥ - x - ł - ᷑ - ᷤ - ͦ - ꝙ - ꝯ - I - ':' - ͤ - ͭ - ꝵ - ꝓ - S - ͫ - ¶ - ẜ - E - U - A - ͨ - C - ħ - N - Q - y - ꝗ - ᵈ - D - ̵ - R - P - ͬ - ᷝ - M - T - ꝭ - / - ^ - '2' - ͧ - '&' - z - ',' - H - O - ¬ - L - '1' - '3' - '4' - F - '=' - G - ᷠ - ÷ - ℥ - '5' - B - '9' - Ø - ̇ - Ꝙ - '6' - ̧ - X - '8' - '0' - ᵇ - k - '7' - "'" - '*' -  - w - '-' - Y - ́ - ̈ - + - Z - đ -   - K - ⁋ - ᵖ -  - ι mode: NFD description: Ground truth for medieval latin manuscripts. Formerly `CREMMA-Medieval-LAT`. format: Alto-XML hands: count: 1-per-folder precision: exact institutions: [] language: - lat license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ production-software: eScriptorium + Kraken project-name: CREMMA schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: only-manuscript time: notAfter: '1599' notBefore: '1100' title: CREMMA Medii Aevi transcription-guidelines: 'Not a graphetic/"allographetique" transcription but rather a graphemic one that preserves the sequence of letters and reduces each form to its meaning in an alphabetical system. Abbreviations are preserved (e.g. pro, pre, tironian et, "est" etc.), as well as abbreviative signs, ligatures are reduced to their component letters. Spaces between letters reproduce the original (e.g. in the case of a semicontinuous script). Punctuations are simplified, reducing to ":" all two-component punctuation (e.g. punctus elevatus). Rare characters have been preserved such as "instans" and metric values (e.g. ounces). 
' url: https://github.com/HTR-United/CREMMA-Medieval-LAT volume: - count: 263222 metric: characters - count: 121 metric: files - count: 7274 metric: lines - count: 441 metric: regions automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Clérice, Thibault and Chagué, Alix\ \ and Vlachou-Efstathiou, Malamatenia},\ndoi = {10.5281/zenodo.7013436},\ntitle\ \ = {CREMMA Medii Aevi},\nurl = {https://github.com/HTR-United/CREMMA-Medieval-LAT}\n\ }\n" _apa: "Clérice T., Chagué A., Vlachou-Efstathiou M. CREMMA Medii Aevi DOI: 10.5281/zenodo.7013436\ \ URL: https://github.com/HTR-United/CREMMA-Medieval-LAT\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: CREMMA Early Modern Books url: https://github.com/HTR-United/cremma-16-17-print project-name: CREMMA authors: - name: Clérice surname: Thibault roles: - transcriber - project-manager description: Collection of book samples in early print forms, 16th to 17th century, in Latin and pre-orthographic French. language: - frm - lat script: - iso: Latn script-type: only-typed time: notBefore: '1500' notAfter: '1779' hands: count: 1-per-folder precision: estimated license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML volume: - metric: characters count: 84726 - metric: files count: 98 - metric: lines count: 2603 - metric: regions count: 451 sources: - reference: Omnia Andreae Alciati v.c. emblemata cum commentariis link: http://pid.emory.edu/ark:/25593/b70rv - reference: "15.. \tHistoria de duobus amantibus Eurialo et Lucretia " link: https://gallica.bnf.fr/ark:/12148/bpt6k533863 - reference: "1520 \tEpigrammata clarissimi disertissimique viri Thomae Mori..." 
link: https://doi.org/10.3931/e-rara-74397 - reference: "'1550 \tLa description de l'isle d'Utopie, oú est comprins '\n" link: https://gallica.bnf.fr/ark:/12148/bpt6k6566444g - reference: "1779 \tZoologia Danica, seu, Animalium Daniae et Norvegiae " link: https://archive.org/details/zoologiadanicase01mlle - reference: "'L'Achileyde de Stace... traduction en vers, avec ...'" link: https://gallica.bnf.fr/view3if/ga/ark:/12148/bpt6k3103841 - reference: "1681 \tVigiliæ Rhetorum, Et Somnia Poetarvm, Symbolicè" link: http://diglib.hab.de/drucke/qun-607-5/start.htm - reference: 'Aneau, Barthélemy: Picta Poesis - Lugduni : Pesnot, 1564' link: http://diglib.hab.de/drucke/231-5-poet/start.htm citation-file-link: https://raw.githubusercontent.com/HTR-United/cremma-16-17-print/main/CITATION.CFF transcription-guidelines: Kept abbreviation and transcribed long s as long s production-software: eScriptorium + Kraken characters: mode: NFD members: - e - i - u - a - t - r - n - o - l - ſ - m - s - c - d - p - ',' - . - q - b - g - f - h - v - A - I - E - ¬ - '&' - x - S - ́ - ̃ - y - ’ - C - P - ̀ - T - R - M - ':' - V - æ - L - N - O - D -  - z - Q - j - H - G - B - F - '2' - ̈ - '-' - '1' - "'" - œ - ; - '?' - ( - ̂ - ) - '7' - U - X - '3' - ο - ι - α - '5' - '6' - '4' - ε - ̧ - ν - τ - '8' - ̓ - π - '9' - '!' 
- J - '0' - ꝰ - ς - λ - υ - Y - § - ꝙ - Æ - σ - Α - ω - ']' - Z - / - ρ - k - Ο - Ν - η - ͂ - μ - κ - '*' - K - Υ - δ - θ - ꝗ - ℟ - Ε - Ρ - Ω - Π - Ι - Τ - φ - ł - ̊ - Μ - Θ - Σ - Β - Λ - γ - '|' - ½ - ̰ -   - ̔ - χ - ϛ - ß - ͅ - Γ - Δ - W - Χ - ξ -  - '#' automatically-aligned: false - authors: - name: Pinche orcid: 0000-0002-7843-5050 roles: - transcriber - aligner - project-manager - quality-control - support surname: Ariane - name: Camps roles: - transcriber surname: Jean-Baptiste - name: Mariotti roles: - transcriber surname: Viola - name: Nolibois roles: - transcriber surname: Alice - name: Carnaille roles: - transcriber surname: Camille - name: Deleville roles: - transcriber surname: Prunelle - name: Lecomte roles: - transcriber surname: Sophie - name: Meylan roles: - transcriber surname: Aminoel - name: Ventura roles: - transcriber surname: Simone - name: Dugaz roles: - transcriber surname: Lucien characters: members: - e - i - s - t - n - a - r - u - o - l - c - d - m - p - . - q - f - g - ̃ - z - b - h - ⁊ - y - ':' - E - x - Q - L - S - ꝑ - D - ̾ - ͥ - C - ꝯ - ͣ - A - I - M - "'" - ꝰ - ́ - T - P - O - k - N - '9' - U - ͬ - G - R - ᷑ - F -  - ͤ - '&' - '1' - B - ꝓ - H - ͦ - ᷤ - '7' - '2' - Λ - ÷ - ł - '6' - '0' - '3' - '8' - '4' - ̽ - w - '-' - '5' - ',' - ͭ - ¶ - Y - ẜ -   - ⟦ - ⟧ - ͨ - ̈ - X - ħ - K - δ - / - ŧ - j mode: NFD citation-file-link: https://github.com/HTR-United/cremma-medieval/blob/main/citation.cff description: Transcription corpora for training HTR models for medieval manuscripts from the 12th to the 15th century. 
format: Alto-XML hands: count: 1-per-folder precision: exact language: - fra - fro license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ production-software: eScriptorium + Kraken project-name: CremmaLab project-website: https://cremmalab.hypotheses.org schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: only-manuscript time: notAfter: '1499' notBefore: '1100' title: Cremma Medieval transcription-guidelines: "As the data come from different projects, transcriptions\ \ have been standardized to strengthen HTR models. We chose a graphemic transcription\ \ method, following D. Stutzmann's definitions (see bibliography), to have a sign\ \ in the image corresponding to a sign in our text: all the abbreviations are\ \ kept, and u/v or i/j are not distinguished. The spaces in the dataset are not\ \ homogeneously represented: some transcriptions reproduce the manuscript\ \ spacing while others use lexical spaces. It must be stressed that spaces are\ \ the most important source of error in medieval HTR models. Most of the transcriptions\ \ follow the layout segmentation of the SegmOnto ontology (https://github.com/SegmOnto/examples),\ \ separating the main column, margin, numbering, drop capital, etc. All the recommendations\ \ are described in\n the following document : Ariane Pinche, Guide de transcription\ \ pour les manuscrits du Xe au XVe siècle, 2022, ⟨hal-03697382⟩, en ligne : ." url: https://github.com/HTR-United/cremma-medieval volume: - count: 612134 metric: characters - count: 279 metric: files - count: 22913 metric: lines - count: 1889 metric: regions automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Pinche, Ariane},\ndoi = {10.5281/zenodo.5235185},\n\ month = {6},\ntitle = {Cremma Medieval},\nurl = {https://github.com/HTR-United/cremma-medieval},\n\ year = {2022}\n}\n" _apa: "Pinche A. (2022). Cremma Medieval (version Bicerin 1.1.0). 
DOI: 10.5281/zenodo.5235185\ \ URL: https://github.com/HTR-United/cremma-medieval\n" - authors: - name: Chagué orcid: 0000-0002-0136-4434 roles: - project-manager - quality-control - digitization - support surname: Alix - name: Clérice orcid: 0000-0003-1852-9204 roles: - project-manager - quality-control surname: Thibault - name: Van Kote roles: - aligner - transcriber surname: Elsa - name: Carrow roles: - aligner - transcriber - support surname: Jennifer - name: Wissam roles: - support surname: Antoum - name: Yann roles: - support surname: Audin - name: Anne roles: - support surname: Baillot - name: Marlène roles: - support surname: Baron - name: Alexandre roles: - support surname: Bartz - name: Rachel roles: - support surname: Bawden - name: Alice roles: - support surname: Beaudry-Lagarde - name: Rishika roles: - support surname: Bhagwatkar - name: Federico roles: - support surname: Boschetti - name: Camille roles: - support surname: Bourgeois - name: Alice roles: - support surname: Brenon - name: William roles: - support surname: Brubacher - name: Donovan roles: - support surname: Brunot - name: Roxanne roles: - support surname: Brusseau - name: Talitha roles: - support surname: Bueno Mottes - name: Zoé roles: - support surname: Cappe - name: Roman roles: - support surname: Castagné - name: Galo roles: - support surname: Castillo - name: Brigitte roles: - support surname: Chagué - name: Denis roles: - support surname: Chagué - name: Emeric roles: - support surname: Chagué - name: Léa roles: - support surname: Charette - name: Emmanuel roles: - support surname: Chateau - name: Jean-Baptiste roles: - support surname: Chaudron - name: Anna roles: - support surname: Chepaikina - name: Floriane roles: - support surname: Chiffoleau - name: Kelly roles: - support surname: Christensen - name: Federico roles: - support surname: Cuartas Aristizabal - name: Maria Laura roles: - support surname: Cucciniello - name: Aurore roles: - support surname: Cuéllar - name: Baudoin 
roles: - support surname: Davoury - name: Eric roles: - support surname: de la Clergerie - name: Roch roles: - support surname: Delanney - name: Camille roles: - support surname: Delattre - name: Béatrice roles: - support surname: Denis - name: Philippe roles: - support surname: Deschamps - name: Valentine roles: - support surname: Desmorat - name: Cindy roles: - support surname: Dionisio - name: Amélie roles: - support surname: Disant - name: Elsa roles: - support surname: Dufourg - name: Jean-Luc roles: - support surname: Falcone - name: Margaux roles: - support surname: Faure - name: Glenda roles: - support surname: Ferbeyre Rodriguez - name: Giulia roles: - support surname: Ferretti - name: Fabien roles: - support surname: Fizaine - name: Jeanne roles: - support surname: Flamant - name: Clémence roles: - support surname: Foisy-Marquis - name: Anna roles: - support surname: Fröhlich - name: Anne roles: - support surname: Garcia Fernancez - name: Vincent roles: - support surname: Giovannangeli - name: Gabrielle roles: - support surname: Grondin - name: Morgane roles: - support surname: Guichard - name: Jessica roles: - support surname: Guiraud - name: Anahi roles: - support surname: Haedo - name: Pauline roles: - support surname: Hennequart - name: Yanet roles: - support surname: Hernandez Pedroza - name: Lucence roles: - support surname: Ing - name: Pauline roles: - support surname: Jacsont - name: Juliette roles: - support surname: Janes - name: Corinne roles: - support surname: Jeanne - name: Arilys roles: - support surname: Jia - name: Vincent roles: - support surname: Jolivet - name: Katrina roles: - support surname: Kaustina - name: Ben roles: - support surname: Kiessling - name: Ozcar roles: - support surname: Koc - name: Lena roles: - support surname: Krause - name: Gabriel roles: - support surname: Labrie - name: Amélie roles: - support surname: Lapointe - name: David roles: - support surname: Lassner - name: Emmanuelle roles: - support surname: Lescouet 
- name: Danny roles: - support surname: Létourneau - name: Marie-Françoise roles: - support surname: Limon-Bonnet - name: Gabrielle roles: - support surname: Lodi - name: Victoria roles: - support surname: Lupascu - name: Elsa roles: - support surname: Marguin-Hamon - name: Orestis roles: - support surname: Marinamis - name: Gina roles: - support surname: Mars - name: Eugénie roles: - support surname: Matthey-Jonais - name: Dilson roles: - support surname: Mayunga - name: Margot roles: - support surname: Mellet - name: Matt roles: - support surname: Moskal - name: Shannon roles: - support surname: Moskal - name: Zoé roles: - support surname: Mozin - name: Lydia orcid: 0009-0009-7082-4711 roles: - support surname: Nishimwe - name: Jade roles: - support surname: Norindr - name: Jules roles: - support surname: Nuguet - name: Sarah roles: - support surname: Orsini - name: Pedro roles: - support surname: Ortiz Suarez - name: Kenan roles: - support surname: Oudin - name: Gabrielle roles: - support surname: Pannetier-Leboeuf - name: Thierry roles: - support surname: Paquet - name: Thomas roles: - support surname: Parisot - name: Elodie roles: - support surname: Paupe - name: Gaël roles: - support surname: Poux - name: Montaine roles: - support surname: Prophête - name: Alix roles: - support surname: Raoux - name: Gaëtan roles: - support surname: Raoux - name: Elise roles: - support surname: Razafindrakoto - name: Camille roles: - support surname: Rey - name: Arij roles: - support surname: Riabi - name: Karen roles: - support surname: Ross - name: Manon roles: - support surname: Rouillé - name: Louise roles: - support surname: Ruby - name: Benoît roles: - support surname: Sagot - name: Hugo roles: - support surname: Scheithauer - name: Anne-Valérie roles: - support surname: Schweyer - name: Djamé roles: - support surname: Seddah - name: Paula roles: - support surname: Seidel - name: Peter roles: - support surname: Stokes - name: Yves roles: - support surname: Tadjo - name: 
Lionel roles: - support surname: Tadjou - name: Kristin roles: - support surname: Tanton - name: Marie roles: - support surname: Tariol - name: Rian roles: - support surname: Touchent - name: Anne-Kim roles: - support surname: Tremblay - name: Pierre roles: - support surname: Vauterin - name: Mathilde roles: - support surname: Verstraete - name: Magalie roles: - support surname: Vetter - name: Marcello roles: - support surname: Vitali Rosati - name: Malamatenia roles: - support surname: Vlachou-Estathiou - name: Rosanne roles: - support surname: Wingert - name: Débora roles: - support surname: Yi - name: Antoine" roles: - support surname: '' - name: Camille roles: - support surname: '' - name: Manon roles: - support surname: '' - name: Yohan roles: - support surname: '' characters: members: - e - a - n - i - s - t - r - l - o - u - d - c - m - p - ́ - ',' - g - h - v - f - . - b - ̀ - "'" - q - '1' - L - y - '0' - C - '9' - E - S - '2' - '-' - A - ( - ) - I - x - k - M - P - R - j - B - '8' - T - N - D - ̂ - '6' - '4' - O - G - '3' - '5' - '7' - F - H - U - w - V - '=' - z - ̧ - J - ':' - ̈ - W - K - '>' - < - '"' - « - » - Y - X - '[' - ']' - ^ - / - ſ - ̄ - ; - Q - Z - œ - ̌ - '!' - ’ - ø - ̃ - '%' - '&' - – - ɛ - ̊ - ° - ß - ɹ - — - Æ - ² - ̆ - ᑕ - '#' - ə - € - … - ł -   - ɑ - ɔ - ʁ mode: NFD description: "The CREMMA-WIKIPEDIA project aims at creating a collection of ground\ \ truth to train HTR models on contemporary French handwriting.\n\nEach image\ \ represents an exerpt from a randomly selected Wikipedia page, copied by hand\ \ by volunteers. We then took care of the alignment between the handwritten portion\ \ and the original text, also present on the image." 
format: Alto-XML hands: count: 1-per-file precision: estimated institutions: - name: 6e-1 du Collège Martin-Luther-King de Charvieu-Chavagneux roles: - support language: - fra license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ production-software: eScriptorium + Kraken project-name: CREMMA schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: only-manuscript time: notAfter: '2023' notBefore: '2022' title: CREMMA WIKIPEDIA transcription-guidelines: "The transcription guidelines follow CREMMA's convention\ \ (https://gist.github.com/alix-tz/6f89444521bf1cab0522da520f7e4ff4). In short:\ \ superscript is preceded by a ^. Strikethrough elements are transcribed with\ \ \"><\" when unreadable, \">word<\" when readable. The text to copy may have\ \ included phonetic transcription. Non-French letters and diacritics were rendered\ \ as well. " url: https://github.com/HTR-United/cremma-wikipedia volume: - count: 99680 metric: characters - count: 350 metric: files - count: 1971 metric: lines - count: 351 metric: regions automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Chagué, Alix and Clérice, Thibault\ \ and Van Kote, Elsa and Carrow, Jennifer and Antoum, Wissam and Audin, Yann and\ \ Baillot, Anne and Baron, Marlène and Bartz, Alexandre and Bawden, Rachel and\ \ Beaudry-Lagarde, Alice and Bhagwatkar, Rishika and Boschetti, Federico and Bourgeois,\ \ Camille and Brenon, Alice and Brubacher, William and Brunot, Donovan and Brusseau,\ \ Roxanne and Bueno Mottes, Talitha and Cappe, Zoé and Castagné, Roman and Castillo,\ \ Galo and Chagué, Brigitte and Chagué, Denis and Chagué, Emeric and Charette,\ \ Léa and Chateau, Emmanuel and Chaudron, Jean-Baptiste and Chepaikina, Anna and\ \ Chiffoleau, Floriane and Christensen, Kelly and Cuartas Aristizabal, Federico\ \ and Cucciniello, Maria Laura and Cuéllar, Aurore and Davoury, Baudoin and de\ \ la Clergerie, Eric and Delanney, Roch and 
Delattre, Camille and Denis, Béatrice\ \ and Deschamps, Philippe and Desmorat, Valentine and Dionisio, Cindy and Disant,\ \ Amélie and Dufourg, Elsa and Falcone, Jean-Luc and Faure, Margaux and Ferbeyre\ \ Rodriguez, Glenda and Ferretti, Giulia and Fizaine, Fabien and Flamant, Jeanne\ \ and Foisy-Marquis, Clémence and Fröhlich, Anna and Garcia Fernancez, Anne and\ \ Giovannangeli, Vincent and Grondin, Gabrielle and Guichard, Morgane and Guiraud,\ \ Jessica and Haedo, Anahi and Hennequart, Pauline and Hernandez Pedroza, Yanet\ \ and Ing, Lucence and Jacsont, Pauline and Janes, Juliette and Jeanne, Corinne\ \ and Jia, Arilys and Jolivet, Vincent and Kaustina, Katrina and Kiessling, Ben\ \ and Koc, Ozcar and Krause, Lena and Labrie, Gabriel and Lapointe, Amélie and\ \ Lassner, David and Lescouet, Emmanuelle and Létourneau, Danny and Limon-Bonnet,\ \ Marie-Françoise and Lodi, Gabrielle and Lupascu, Victoria and Marguin-Hamon,\ \ Elsa and Marinamis, Orestis and Mars, Gina and Matthey-Jonais, Eugénie and Mayunga,\ \ Dilson and Mellet, Margot and Moskal, Matt and Moskal, Shannon and Mozin, Zoé\ \ and Nishimwe, Lydia and Norindr, Jade and Nuguet, Jules and Orsini, Sarah and\ \ Ortiz Suarez, Pedro and Oudin, Kenan and Pannetier-Leboeuf, Gabrielle and Paquet,\ \ Thierry and Parisot, Thomas and Paupe, Elodie and Poux, Gaël and Prophête, Montaine\ \ and Raoux, Alix and Raoux, Gaëtan and Razafindrakoto, Elise and Rey, Camille\ \ and Riabi, Arij and Ross, Karen and Rouillé, Manon and Ruby, Louise and Sagot,\ \ Benoît and Scheithauer, Hugo and Schweyer, Anne-Valérie and Seddah, Djamé and\ \ Seidel, Paula and Stokes, Peter and Tadjo, Yves and Tadjou, Lionel and Tanton,\ \ Kristin and Tariol, Marie and Touchent, Rian and Tremblay, Anne-Kim and Vauterin,\ \ Pierre and Verstraete, Mathilde and Vetter, Magalie and Vitali Rosati, Marcello\ \ and Vlachou-Estathiou, Malamatenia and Wingert, Rosanne and Yi, Débora and other\ \ anonymous contributers},\ndoi = 
{10.5281/zenodo.7782065},\nmonth = {3},\ntitle\ \ = {CREMMA WIKIPEDIA},\nurl = {https://github.com/HTR-United/cremma-wikipedia},\n\ year = {2023}\n}\n" _apa: "Chagué A., Clérice T., Van Kote E., Carrow J., Antoum W., Audin Y., Baillot\ \ A., Baron M., Bartz A., Bawden R., Beaudry-Lagarde A., Bhagwatkar R., Boschetti\ \ F., Bourgeois C., Brenon A., Brubacher W., Brunot D., Brusseau R., Bueno Mottes\ \ T., Cappe Z., Castagné R., Castillo G., Chagué B., Chagué D., Chagué E., Charette\ \ L., Chateau E., Chaudron J., Chepaikina A., Chiffoleau F., Christensen K., Cuartas\ \ Aristizabal F., Cucciniello M.L., Cuéllar A., Davoury B., de la Clergerie E.,\ \ Delanney R., Delattre C., Denis B., Deschamps P., Desmorat V., Dionisio C.,\ \ Disant A., Dufourg E., Falcone J., Faure M., Ferbeyre Rodriguez G., Ferretti\ \ G., Fizaine F., Flamant J., Foisy-Marquis C., Fröhlich A., Garcia Fernancez\ \ A., Giovannangeli V., Grondin G., Guichard M., Guiraud J., Haedo A., Hennequart\ \ P., Hernandez Pedroza Y., Ing L., Jacsont P., Janes J., Jeanne C., Jia A., Jolivet\ \ V., Kaustina K., Kiessling B., Koc O., Krause L., Labrie G., Lapointe A., Lassner\ \ D., Lescouet E., Létourneau D., Limon-Bonnet M., Lodi G., Lupascu V., Marguin-Hamon\ \ E., Marinamis O., Mars G., Matthey-Jonais E., Mayunga D., Mellet M., Moskal\ \ M., Moskal S., Mozin Z., Nishimwe L., Norindr J., Nuguet J., Orsini S., Ortiz\ \ Suarez P., Oudin K., Pannetier-Leboeuf G., Paquet T., Parisot T., Paupe E.,\ \ Poux G., Prophête M., Raoux A., Raoux G., Razafindrakoto E., Rey C., Riabi A.,\ \ Ross K., Rouillé M., Ruby L., Sagot B., Scheithauer H., Schweyer A., Seddah\ \ D., Seidel P., Stokes P., Tadjo Y., Tadjou L., Tanton K., Tariol M., Touchent\ \ R., Tremblay A., Vauterin P., Verstraete M., Vetter M., Vitali Rosati M., Vlachou-Estathiou\ \ M., Wingert R., Yi D., other anonymous contributers (2023). CREMMA WIKIPEDIA\ \ (version 1.0.3). 
DOI: 10.5281/zenodo.7782065 URL: https://github.com/HTR-United/cremma-wikipedia\n" - authors: - name: Chiffoleau roles: - project-manager - aligner surname: Floriane characters: members: - e - s - a - n - r - i - t - u - o - l - d - c - m - p - ́ - ',' - v - . - f - q - g - ̀ - '-' - E - b - ’ - "'" - h - A - L - N - x - j - S - R - I - T - M - ̂ - C - P - y - O - ; - '1' - £ - U - D - B - F - J - G - '"' - '0' - z - V - '9' - '2' - ':' - X -   - € - H - '5' - '!' - '3' - '4' - ̧ - ° - W - Y - '6' - '8' - '?' - '7' - K - Q - / - ( - ) - k - œ - w - ̈ - … - Z - – - '&' - '%' - '=' - $ - _ mode: NFD description: OCR ground Truth dataset based on French 20th typewritten letters format: Alto-XML hands: count: less-than-11 precision: exact language: - fra license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ production-software: eScriptorium + Kraken project-name: DAHN project-website: https://digitalintellectuals.hypotheses.org/category/dahn schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: only-typed time: notAfter: '1924' notBefore: '1914' title: DAHN Corpus url: https://github.com/HTR-United/dahncorpus volume: - count: 475849 metric: characters - count: 547 metric: files - count: 12539 metric: lines - count: 527 metric: pages - count: 547 metric: regions automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Chiffoleau, Floriane},\ndoi = {10.5281/zenodo.5911868},\n\ month = {3},\ntitle = {dahncorpus},\nurl = {https://github.com/HTR-United/dahncorpus},\n\ year = {2021}\n}\n" _apa: "Chiffoleau F. (2021). dahncorpus (version 1.0.0). 
DOI: 10.5281/zenodo.5911868\ \ URL: https://github.com/HTR-United/dahncorpus\n" - authors: - name: Limon-Bonnet roles: - transcriber - aligner - quality-control surname: Françoise - name: Chagué roles: - support - project-manager - quality-control surname: Alix - name: Rostaing roles: - project-manager surname: Aurélia characters: members: - e - t - a - / - '0' - c - n - r - m - h - p - s - o - g - '5' - '7' - '1' - E - . - i - '-' - '3' - '9' - '2' - f - d - '8' - < - l - '{' - ':' - P - A - G - '}' - U - x - '>' - b - '4' - '6' mode: NFD citation-file-link: https://raw.githubusercontent.com/HTR-United/lectaurep-bronod/master/CITATION.cff description: "Ground truth for Maître Bronod’s registers, notary in Paris during\ \ the 18th century.\n" format: Page-XML hands: count: '1' precision: exact language: - fra license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ production-software: eScriptorium + Kraken project-name: "LECTAUREP\n" project-website: https://lectaurep.hypotheses.org/ schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: only-manuscript sources: - link: '' reference: Limon-Bonnet, M. (2021). Lectaurep-Bronod, ground truth for Maitre Bronod's documents (French XVIIIth century) (Version 1.0) [Computer software]. https://doi.org/10.5072/zenodo.977735 time: notAfter: '1745' notBefore: '1742' title: Notaires de Paris - Bronod transcription-guidelines: "Transcription fidèle aux manuscrits : la casse et les\ \ abréviations sont respectées. Les portions de texte suscrites sont précédées\ \ d'un symbole `^`. 
Pas de traitement particulier des éventuels s longs.\n" url: https://github.com/HTR-United/lectaurep-bronod volume: - count: 359094 metric: characters - count: 100 metric: files - count: 3702 metric: lines - count: 200 metric: pages - count: 296 metric: regions automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Limon-Bonnet, Marie-Françoise and\ \ Chagué, Alix and Rostaing, Aurélia},\ndoi = {10.5281/zenodo.10631355},\nmonth\ \ = {2},\ntitle = {Lectaurep-Bronod, ground truth for Maitre Bronod's documents\ \ (French XVIIIth century)},\nurl = {https://lectaurep.hypotheses.org/},\nyear\ \ = {2024}\n}\n" _apa: "Limon-Bonnet M., Chagué A., Rostaing A. (2024). Lectaurep-Bronod, ground\ \ truth for Maitre Bronod's documents (French XVIIIth century) DOI: 10.5281/zenodo.10631355\ \ URL: https://lectaurep.hypotheses.org/\n" - authors: - name: Denis roles: - transcriber - aligner surname: Nathalie - name: Rostaing roles: - project-manager - quality-control - support surname: Aurélia - name: Chagué roles: - project-manager - quality-control - support surname: Alix characters: members: - e - t - / - a - c - '0' - n - r - m - h - p - s - o - g - '1' - '7' - '2' - E - . - i - '-' - f - '9' - d - '8' - '5' - < - l - '{' - ':' - P - A - G - '}' - U - x - '>' - b - '4' - '6' - '3' mode: NFD citation-file-link: https://raw.githubusercontent.com/HTR-United/lectaurep-mariages-et-divorces/main/CITATION.cff description: "Ground truth for the Registres des Contrats de Mariages et des Séparations\ \ et Divorces in Paris. The documents are written in French during the 19th century\ \ and contain many names and addresses. The information is organized in tables spread\ \ across two pages. 
The table’s headers and the preamble are printed.\n" format: Page-XML hands: count: more-than-10 precision: estimated language: - fra license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ production-software: eScriptorium + Kraken project-name: "LECTAUREP\n" project-website: https://lectaurep.hypotheses.org/ schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: mainly-manuscript sources: - link: '' reference: 'Rostaing, A., Denis, N., & Chagué, A. (2021). Lectaurep-Mariages-et-Divorces: ground truth for the Enregistrements des Contrats de Mariages et des Séparations et Divorces in Paris (French 19th century) (Version 1.0) [Computer software]. https://doi.org/10.5072/zenodo.977697' time: notAfter: '1928' notBefore: '1829' title: Notaires de Paris - Mariages et Divorces transcription-guidelines: "The transcription respects what is written (abbreviations\ \ are not expanded, capitalization follows 19th-century practices). Superscripted\ \ portions of text are signaled by `^` and many signatures are transcribed with\ \ ¥. The lines containing printed text are associated with the type `printed`\ \ and the signatures are associated with the type `signature`. 
Thus they can both\ \ be removed from the dataset if necessary.'\n" url: https://github.com/HTR-United/lectaurep-mariages-et-divorces volume: - count: 1969488 metric: characters - count: 104 metric: files - count: 20304 metric: lines - count: 105 metric: pages - count: 324 metric: regions automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Rostaing, Aurélia and Denis, Nathalie\ \ and Chagué, Alix},\ndoi = {10.5281/zenodo.10632593},\nmonth = {2},\ntitle =\ \ {Lectaurep-Mariages-et-Divorces: ground truth for the Enregistrements des Contrats\ \ de Mariages et des Séparations et Divorces in Paris (French 19th century) },\n\ url = {https://github.com/HTR-United/lectaurep-mariages-et-divorces},\nyear =\ \ {2024}\n}\n" _apa: "Rostaing A., Denis N., Chagué A. (2024). Lectaurep-Mariages-et-Divorces:\ \ ground truth for the Enregistrements des Contrats de Mariages et des Séparations\ \ et Divorces in Paris (French 19th century) (version 2.0). DOI: 10.5281/zenodo.10632593\ \ URL: https://github.com/HTR-United/lectaurep-mariages-et-divorces\n" - authors: - name: Durand roles: - transcriber - aligner surname: Marc - name: Rostaing roles: - transcriber - project-manager - quality-control surname: Aurélia - name: Chagué roles: - project-manager - quality-control - support surname: Alix characters: members: - e - r - a - i - n - t - o - u - s - d - l - c - p - '1' - m - S - ̀ - ',' - E - ́ - '2' - P - . - M - '0' - A - C - '5' - '3' - h - T - v - g - D - '7' - ) - ( - R - N - f - I - b - L - '8' - '9' - ^ - '4' - '6' - B - O - J - V - y - "'" - G - F - '-' - x - q - ° - H - ̂ - U - '"' - X - '&' - z - ; - ̧ - ':' - j - + - Q - '|' - ̈ - / - k - '=' - '%' - W - K - Y - Z - w - '~' - ¥ - ȼ - _ - € - '`' - '[' - ']' - œ - '?' 
- '*' - ̃ - '>' - ½ mode: NFD citation-file-link: https://github.com/HTR-United/lectaurep-repertoires/raw/main/CITATION.cff description: Ground truth for various Parisian registries of notary deeds written in French during the 19th century. The information is organized following pre-printed tables (with printed headers) and contains many names, addresses, numbers and abbreviations. format: Alto-XML hands: count: more-than-10 precision: estimated language: - fra license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ production-software: eScriptorium + Kraken project-name: LECTAUREP project-website: https://lectaurep.hypotheses.org/ schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: mainly-manuscript time: notAfter: '1939' notBefore: '1830' title: Notaires de Paris - Répertoires url: https://github.com/HTR-United/lectaurep-repertoires volume: - count: 525786 metric: characters - count: 218 metric: files - count: 29410 metric: lines - count: 1181 metric: regions automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {LECTAUREP and Rostaing, Aurélia and\ \ Durand, Marc and Chagué, Alix},\ndoi = {10.5072/zenodo.977691},\nmonth = {12},\n\ title = {Notaires de Paris - Répertoires, ground truth for various Parisian registries\ \ of notary deeds (French 19th and 20th centuries)},\nurl = {https://github.com/HTR-United/lectaurep-repertoires},\n\ year = {2021}\n}\n" _apa: "LECTAUREP, Rostaing A., Durand M., Chagué A. (2021). Notaires de Paris -\ \ Répertoires, ground truth for various Parisian registries of notary deeds (French\ \ 19th and 20th centuries) (version 2.0.0). DOI: 10.5072/zenodo.977691 URL: https://github.com/HTR-United/lectaurep-repertoires\n" - authors: - name: Chagué roles: - transcriber - project-manager surname: Alix characters: members: - e - a - s - n - t - r - i - u - o - l - d - c - m - p - ́ - . 
- '~' - v - ',' - "'" - '-' - f - g - h - q - b - ̀ - _ - E - L - A - I - C - x - S - M - j - T - ̂ - R - N - '1' - O - P - y - '"' - U - J - D - '2' - ':' - ) - ( - B - '0' - '5' - '3' - '4' - z - '6' - F - H - Q - '!' - '9' - G - '7' - V - '8' - '?' - ⟦ - ⟧ - ̧ - Y - ; - ’ - ° - k - X - ̈ - + - '=' - W - / - K - ^ - w - Z - '%' - '*' mode: NFD citation-file-link: https://github.com/HTR-United/tapuscorpus/raw/main/citation.cff description: Ground truth based on a variety of French typewritten documents from the 20th century. Contains excerpts of plays, poems, letters and administrative reports. format: Page-XML hands: count: 1-per-folder precision: exact language: - fra license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ production-software: eScriptorium + Kraken project-name: "HTR-United\n" project-website: https://htr-united.github.io/ schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: only-typed sources: - link: '' reference: Chagué, A. (2021). Tapuscorpus (Version 1.0) [Computer software]. https://doi.org/10.5072/zenodo.977649 time: notAfter: '1999' notBefore: '1900' title: Tapus Corpus transcription-guidelines: See README in repository. url: https://github.com/HTR-United/tapuscorpus volume: - count: 131511 metric: characters - count: 151 metric: files - count: 4376 metric: lines - count: 150 metric: pages - count: 375 metric: regions automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Chagué, Alix},\ndoi = {10.5072/zenodo.977649},\n\ month = {12},\ntitle = {Tappus Corpus},\nurl = {https://github.com/HTR-United/tapuscorpus},\n\ year = {2021}\n}\n" _apa: "Chagué A. (2021). Tappus Corpus (version 1.0). 
DOI: 10.5072/zenodo.977649\ \ URL: https://github.com/HTR-United/tapuscorpus\n" - authors: - name: Chagué roles: - transcriber - aligner - support surname: Alix - name: Riondet roles: - support surname: Charles - name: Le Fourner roles: - transcriber surname: Victoria - name: Bey roles: - transcriber surname: Laura - name: Vanneau roles: - transcriber surname: Laurie - name: Skilbeck-Gaborit roles: - transcriber surname: Eden - name: Meissel roles: - transcriber surname: Nina - name: Genero roles: - aligner surname: Jean-Damien - name: Champougny roles: - transcriber surname: Kevin - name: Albert roles: - project-manager surname: Anaïs - name: Martini roles: - project-manager surname: Manuela characters: members: - e - n - a - i - t - r - u - s - l - o - d - p - c - m - ́ - v - f - q - x - ̀ - ',' - g - h - "'" - ; - j - C - b - P - D - ’ - y - B - . - L - M - ̂ - z - J - A - G - E - '-' -   - S - V - '?' - T - Q - F - '=' - R - '4' - '2' - ̧ - k - — - '7' - W - O - N - '1' - '3' - '8' - '0' - '9' - ':' - – - '5' - ̈ - Y - K - H - œ - I - ) - ( - U - Z - _ - '@' - '!' - ‘ - » - '&' - '6' - ─ - / mode: NFD description: Ground-Truth for French 19th century pre-printed documents created by administrative services. 
format: Page-XML hands: count: less-than-11 precision: estimated language: - fra license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ production-software: eScriptorium + Kraken project-name: ANR TIME US project-website: https://timeus.hypotheses.org/ schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: evenly-mixed time: notAfter: '1858' notBefore: '1858' title: TIMEUS Corpus url: https://github.com/HTR-United/timeuscorpus volume: - count: 401304 metric: characters - count: 250 metric: files - count: 7701 metric: lines - count: 159 metric: pages - count: 586 metric: regions automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Chagué, Alix and Champougny, Kévin\ \ and Meissel, Nina and Genero, Jean-Damien and Skilbeck-Gaborit, Eden and Vanneau,\ \ Laurie and Bey, Laura and Le Fourner, Victoria and Albert, Anaïs and Riondet,\ \ Charles and Martini, Manuela},\ndoi = {10.5281/zenodo.6230755},\ntitle = {Time\ \ Us Corpus},\nurl = {https://github.com/HTR-United/timeuscorpus}\n}\n" _apa: "Chagué A., Champougny K., Meissel N., Genero J., Skilbeck-Gaborit E., Vanneau\ \ L., Bey L., Le Fourner V., Albert A., Riondet C., Martini M. Time Us Corpus\ \ DOI: 10.5281/zenodo.6230755 URL: https://github.com/HTR-United/timeuscorpus\n" - authors: - name: Leroy orcid: 0000-0002-7843-5050 roles: - transcriber surname: Noé - name: Pinche orcid: 0000-0001-7764-9690 roles: - project-manager - quality-control surname: Ariane - name: Jean-Baptiste orcid: 0000-0001-7764-9690 roles: - project-manager surname: Camps - name: Alix orcid: 0000-0002-0136-4434 roles: - project-manager surname: Chagué - name: Thibault orcid: 0000-0003-1852-9204 roles: - project-manager surname: Clérice automatically-aligned: false characters: members: - e - i - s - a - t - n - u - o - r - l - c - m - d - p - . 
- ̃ - f - q - g - ⁊ - h - b - z - ̾ - y - x - ͥ - Q - S - E - C - ꝑ - I - ͣ - L - ꝯ - A - D - ꝰ - R - M - k - ',' - ':' - T - P - N - ᷑ - O - U - ͤ - ᷤ - ⟦ - ⟧ - B - F - K - ¶ - G - ͦ - ^ - w - H -  - ꝓ - ÷ - '-' - ẜ - ̵ - '3' - '9' - '0' - '2' - '1' - ͭ - '5' - ̌ - ł - '4' - '6' - '7' - ͬ - ͫ - '&' - ꝙ - ꝭ - / - ˣ - ͨ - Y - ᷠ - ⁜ - "'" - '8' - ꝵ - ͧ - ᷝ - ħ -  - '*' - ́ - ̂ - X - ̧ - ᵈ - Ʒ mode: NFD description: >- Ground truth of Old French and Middle French manuscripts. Manuscripts vary in themes, period, etc. Most manuscript have at most 10 columns transcribed. format: Alto-XML hands: count: 1-per-folder precision: estimated institutions: [] language: - fro license: name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ production-software: eScriptorium + Kraken project-name: HTRomance project-website: https://htromance-project.github.io schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: only-manuscript time: notAfter: '1499' notBefore: '1200' title: HTRomance, Medieval French corpus of ground-truth for Handwritten Text Recognition and Layout Segmentation transcription-guidelines: >2- The transcription guidelines are described in a paper available on [HAL](https://hal-enc.archives-ouvertes.fr/hal-03828353) and published at the Journal for Open Humanities Data. 
It provides specific details about the selection process, the transcription methods and choices, as well as details about output (mainly the [Generic CREMMA Model for Medieval Manuscripts (Latin and Old French)](https://zenodo.org/record/7234166#.Y7f69afMJhE) for [Kraken](https://kraken.re)) url: https://github.com/HTRomance-Project/medieval-french volume: - count: 247397 metric: characters - count: 124 metric: files - count: 8890 metric: lines - count: 714 metric: regions _bibtex: "@misc{YourReferenceHere,\nauthor = {Leroy, Noé and Pinche, Ariane and\ \ Camps, Jean-Baptiste and Clérice, Thibault and Chagué, Alix},\ntitle = {HTRomance,\ \ Medieval French corpus of ground-truth for Handwritten Text Recognition and\ \ Layout Segmentation},\nurl = {https://github.com/HTRomance-Project/middle-ages-in-spain}\n\ }\n" _apa: "Leroy N., Pinche A., Camps J., Clérice T., Chagué A. HTRomance, Medieval\ \ French corpus of ground-truth for Handwritten Text Recognition and Layout Segmentation\ \ URL: https://github.com/HTRomance-Project/middle-ages-in-spain\n" - authors: - name: Rachele roles: - transcriber surname: Alba - name: Giorgia roles: - transcriber surname: Rubin - name: Federico orcid: 0000-0002-7810-7735 roles: - project-manager - quality-control surname: Boschetti - name: Franz roles: - project-manager surname: Fischer - name: Alix orcid: 0000-0002-0136-4434 roles: - project-manager surname: Chagué - name: Thibault orcid: 0000-0003-1852-9204 roles: - project-manager surname: Clérice automatically-aligned: false characters: members: - e - a - o - i - l - n - r - t - u - s - c - d - m - p - g - h - f - . 
- ̃ - q - b - ⁊ - ',' - ꝑ - E - C - z - x - ̾ - A - I - ̧ - D - L - M - ͤ - O - S - R - ͧ - y - ꝙ - ͬ - ł - F - N - U - T - Q - ͦ - P - B - ́ - ͥ - '=' - ':' - ꝯ - X - ẜ - G - ͣ - H - '2' - '9' - '1' - ¶ - '4' - ꝓ - '3' - '5' - k - ͭ - '7' - '8' - / - "'" - ε - ɨ - đ - '6' - ι - ο - '0' - ̓ - ν - ꝗ - ̈ - μ - λ - ꝰ - α - ω - π - σ - ͫ - Y - '-' - θ - γ - η - Ο - υ - ρ - ̔ - ͂ - β - + - Z mode: NFD description: Transcription of samples of Medieval Italian manuscripts format: Alto-XML hands: count: 1-per-folder precision: estimated language: - ita - vec license: name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ production-software: eScriptorium + Kraken project-name: HTRomance schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: only-manuscript time: notAfter: '1499' notBefore: '1100' title: HTRomance, Medieval Italian corpus of ground-truth for Handwritten Text Recognition and Layout Segmentation url: https://github.com/HTRomance-Project/medieval-italian volume: - count: 84383 metric: characters - count: 60 metric: files - count: 3086 metric: lines - count: 60 metric: pages - count: 353 metric: regions _bibtex: "@misc{YourReferenceHere,\nauthor = {Alba, Rachele and Rubin, Giorgia and\ \ Boschetti, Federico and Fischer, Franz and Clérice, Thibault and Chagué, Alix},\n\ doi = {10.5281/zenodo.8256728},\ntitle = {HTRomance, Medieval Italian corpus of\ \ ground-truth for Handwritten Text Recognition and Layout Segmentation},\nurl\ \ = {https://github.com/HTRomance-Project/medieval-italian}\n}\n" _apa: "Alba R., Rubin G., Boschetti F., Fischer F., Clérice T., Chagué A. 
HTRomance,\ \ Medieval Italian corpus of ground-truth for Handwritten Text Recognition and\ \ Layout Segmentation DOI: 10.5281/zenodo.8256728 URL: https://github.com/HTRomance-Project/medieval-italian\n" - authors: - name: Anthony orcid: 0000-0003-4715-5184 roles: - transcriber surname: Glaise - name: Thibault orcid: 0000-0003-1852-9204 roles: - project-manager - quality-control surname: Clérice - name: Alix orcid: 0000-0002-0136-4434 roles: - project-manager surname: Chagué - name: Federico orcid: 0000-0002-7810-7735 roles: - project-manager surname: Boschetti - name: Franz orcid: 0000-0002-2162-5531 roles: - project-manager surname: Fischer automatically-aligned: false characters: members: - i - e - t - a - u - s - o - n - ̃ - r - c - d - m - l - . - p - b - q - g - ̾ - f - x - h -  - ꝰ - ⁊ - ꝑ - ͥ - ł - ͣ - ꝵ - ꝯ - ꝙ - ͦ - ᷑ - ¶ - D - y - ꝓ - ':' - / - N - I - '&' - ħ - '-' - C - S - Q - E - A - z - R - ᷤ - U - ͫ - ̧ - '2' - L - ^ - '4' - ⟦ - ⟧ - M - '3' - '1' - '0' - T - ẜ - ͨ - G - H - P - ÷ - ꝗ - ͭ - '7' - ͤ - '6' - đ - k - O - ᷝ - '9' - '*' - ͧ - B - F - '8' - Ø - ¬ - K - ᷠ - + - '5' - ͬ - X - ᵈ mode: NFD description: >- Ground truth of Latin medieval manuscripts. Manuscripts vary in themes, period, etc. Most manuscript have at most 10 columns transcribed. 
format: Alto-XML hands: count: 1-per-folder precision: estimated institutions: [] language: - lat license: name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ production-software: eScriptorium + Kraken project-name: HTRomance project-website: https://htromance-project.github.io schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: only-manuscript time: notAfter: '1499' notBefore: '1100' title: HTRomance, Medieval Latin corpus of ground-truth for Handwritten Text Recognition and Layout Segmentation transcription-guidelines: >2- The transcription guidelines are described in a paper available on [HAL](https://hal-enc.archives-ouvertes.fr/hal-03828353) and published at the Journal for Open Humanities Data. It provides specific details about the selection process, the transcription methods and choices, as well as details about output (mainly the [Generic CREMMA Model for Medieval Manuscripts (Latin and Old French)](https://zenodo.org/record/7234166#.Y7f69afMJhE) for [Kraken](https://kraken.re)) url: https://github.com/HTRomance-Project/medieval-latin volume: - count: 102887 metric: characters - count: 33 metric: files - count: 3046 metric: lines - count: 288 metric: regions _bibtex: "@misc{YourReferenceHere,\nauthor = {Glaise, Anthony and Clérice, Thibault\ \ and Boschetti, Federico and Fischer, Franz and Chagué, Alix},\ntitle = {HTRomance,\ \ Medieval Latin corpus of ground-truth for Handwritten Text Recognition and Layout\ \ Segmentation},\nurl = {https://github.com/HTRomance-Project/medieval-latin}\n\ }\n" _apa: "Glaise A., Clérice T., Boschetti F., Fischer F., Chagué A. 
HTRomance, Medieval\ \ Latin corpus of ground-truth for Handwritten Text Recognition and Layout Segmentation\ \ URL: https://github.com/HTRomance-Project/medieval-latin\n" - authors: - name: Julie orcid: 0009-0004-0769-8875 roles: - transcriber surname: Bordier - name: Matthias orcid: 0000-0001-9488-5986 roles: - project-manager - quality-control surname: Gille Levenson - name: Olivier orcid: 0000-0001-7809-3890 roles: - project-manager - quality-control surname: Brisville-Fertin - name: Alix orcid: 0000-0002-0136-4434 roles: - project-manager surname: Chagué - name: Thibault orcid: 0000-0003-1852-9204 roles: - project-manager surname: Clérice automatically-aligned: false characters: members: - e - a - o - s - n - r - l - i - d - u - t - c - ̃ - m - p - q - g - f - b - . - y - / - h - ⁊ - ̧ - z - E - x - R - C - ¶ - ꝑ - ',' - ͣ - ͥ - ̾ - D - M - ͦ - ':' - '-' - ᷤ - S - ẜ - ́ - A - L - P - ꝯ - Q - ͬ - I - B - ⟦ - ⟧ - O - N - ̇ - T - ꝓ - ̈ - F - U - ͤ - '1' - G - X - '2' - '0' - '3' - H - Y - ᷎ - ℥ - '4' - '6' - '8' - Ꞧ - '7' - '5' - ͫ - '9' - ꝰ - ħ - ͭ - ꝫ - ᷑ - ᷝ - ͧ - ꝟ - ⁿ - † - K - ꝵ - ꝙ - ᷠ - Z - ł - Ꝯ -  - ͪ - ͩ - k mode: NFD description: >- Ground truth of medieval manuscripts from Spain. Manuscripts vary in themes, period, etc. Most manuscript have at most 10 columns transcribed. 
format: Alto-XML hands: count: 1-per-folder precision: estimated institutions: [] language: - lat license: name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ production-software: eScriptorium + Kraken project-name: HTRomance project-website: https://htromance-project.github.io schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: only-manuscript time: notAfter: '1499' notBefore: '1100' title: HTRomance, Medieval Spain corpus of ground-truth for Handwritten Text Recognition and Layout Segmentation transcription-guidelines: >2- The transcription guidelines are described in a paper available on [HAL](https://hal-enc.archives-ouvertes.fr/hal-03828353) and published at the Journal for Open Humanities Data. It provides specific details about the selection process, the transcription methods and choices, as well as details about output (mainly the [Generic CREMMA Model for Medieval Manuscripts (Latin and Old French)](https://zenodo.org/record/7234166#.Y7f69afMJhE) for [Kraken](https://kraken.re)) url: https://github.com/HTRomance-Project/middle-ages-in-spain volume: - count: 160876 metric: characters - count: 86 metric: files - count: 4437 metric: lines - count: 395 metric: regions _bibtex: "@misc{YourReferenceHere,\nauthor = {Bordier, Julie and Gille Levenson,\ \ Matthias and Brisville-Fertin, Olivier and Clérice, Thibault and Chagué, Alix},\n\ title = {HTRomance, Medieval Spain corpus of ground-truth for Handwritten Text\ \ Recognition and Layout Segmentation},\nurl = {https://github.com/HTRomance-Project/middle-ages-in-spain}\n\ }\n" _apa: "Bordier J., Gille Levenson M., Brisville-Fertin O., Clérice T., Chagué A.\ \ HTRomance, Medieval Spain corpus of ground-truth for Handwritten Text Recognition\ \ and Layout Segmentation URL: https://github.com/HTRomance-Project/middle-ages-in-spain\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Corpus Modern Roman Languages url: 
https://github.com/HTRomance-Project/modern-roman-languages authors: - name: Jade surname: Norindr roles: - transcriber - name: Alix surname: Chagué orcid: 0000-0002-0136-4434 roles: - project-manager - quality-control - support institutions: [] description: >- Dataset for modern roman languages created within the context of the HTRomance project, using manuscripts from the Gallica digital library. project-name: HTRomance language: - fra production-software: eScriptorium + Kraken automatically-aligned: false script: - iso: Latn script-type: only-manuscript time: notBefore: '1600' notAfter: '1800' hands: count: 1-per-folder precision: estimated license: name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML volume: - metric: lines count: 456 transcription-guidelines: >- The transcription guidelines are described in a paper available on HAL and published at the Journal for Open Humanities Data. It provides specific details about the selection process, the transcription methods and choices, as well as details about output (mainly the Generic CREMMA Model for Medieval Manuscripts (Latin and Old French) for Kraken) _bibtex: "@misc{YourReferenceHere,\nauthor = {Norindr, Jade and Clérice, Thibault\ \ and Chagué, Alix},\ntitle = {HTRomance, Modern language corpus of ground-truth\ \ for Handwritten Text Recognition and Layout Segmentation},\nurl = {https://github.com/HTRomance-Project/medieval-italian}\n\ }\n" _apa: "Norindr J., Clérice T., Chagué A. 
HTRomance, Modern language corpus of ground-truth\ \ for Handwritten Text Recognition and Layout Segmentation URL: https://github.com/HTRomance-Project/medieval-italian\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Données vérité de terrain HTR+ Annuaire des propriétaires et des propriétés de Paris et du département de la Seine (1898-1923) url: http://dx.doi.org/10.34847/nkl.acb724xs project-name: "Groupe annuaires et adresses - Consortium Huma-num Paris Time Machine\n" project-website: https://paris-timemachine.huma-num.fr/groupe-adresses-et-annuaires/ authors: - name: Elgarrista surname: Gabriela roles: - transcriber - quality-control - name: Mélanie-Becquet surname: Frédérique roles: - project-manager - quality-control - name: Brando surname: Carmen roles: - project-manager - quality-control description: "Annuaire des propriétaires et des propriétés de Paris et du département\ \ de la Seine. Lien dans le catalogue de la BNF : https://catalogue.bnf.fr/ark:/12148/cb32697229h.\ \ Crédits : Bibliothèque nationale de France. Données vérité de terrain résultant\ \ de la transcription et la segmentation manuelle d’un échantillon de 169 pages\ \ des annuaires appartenant aux volumes 1898 et 1923. Un modèle de transcription\ \ HTR+ a été entrainé à partir de cet échantillon grâce à Transkribus et est disponible\ \ sur cette plateforme en mode public. Ce modèle est valable pour transcrire automatiquement\ \ les volumes de 1903 et 1913 et tout autre document imprimé à deux colonnes et\ \ en utilisant l'alphabet latin et particulièrement en français. Le choix de l'échantillon\ \ est fait par critère alphabétique car c'est le mode d'organisation de l'information\ \ dans ce document. Les accolades présentes dans le document n'ont pas été segmentées.\ \ 118 pages pour entrainer et 51 pages pour validation.\nContexte et financement\ \ : Subvention DAHN (Dispositif de soutien à l'archivistique et aux humanités\ \ numériques) par le MESRI. 
Equipes : Consortium Paris Time Machine - TGIR Humanum\ \ EHESS / CNRS / LATTICE / INRIA Contact si besoin d'anonymiser les noms de personnes\ \ : carmen.brando@ehess.fr.\n" language: - fra script: - iso: Latn script-type: only-typed time: notBefore: '1898' notAfter: '1923' hands: count: less-than-11 precision: estimated license: - name: CC-BY-SA 4.0 url: https://creativecommons.org/licenses/by-sa/4.0/ format: Alto-XML volume: - count: 169 metric: pages - count: 19022 metric: lines - count: 641401 metric: characters transcription-guidelines: "Transcription diplomatique. Les accolades n'ont pas été\ \ segmentées.\n" production-software: Transkribus automatically-aligned: false _bibtex: "@misc{https://doi.org/10.34847/nkl.acb724xs,\n doi = {10.34847/NKL.ACB724XS},\n\ \ url = {https://nakala.fr/10.34847/nkl.acb724xs},\n author = {Brando, Carmen\ \ and Elgarrista, Gabriela and Mélanie-Becquet, Frédérique},\n keywords = {Paris,\ \ Historical source material, HTR, Transcripción, Apprentissage (intelligence\ \ artificielle)},\n language = {fr},\n title = {Données vérité de terrain HTR+\ \ Annuaire des propriétaires et des propriétés de Paris et du département de la\ \ Seine (1898-1923)},\n publisher = {NAKALA - https://nakala.fr (Huma-Num - CNRS)},\n\ \ year = {2021}\n}\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: >- University of Denver Jewish Consumptives Relief Society Medical Records Training and Validation Set url: http://dx.doi.org/10.5281/zenodo.4243023 authors: - name: Pham surname: Kim orcid: 0000-0002-9115-4739 roles: - project-manager institutions: [] description: >- Training and validation set. Transcribed records available upon request. The transcribed corpus of records from the Jewish Consumptive Relief Society contains data that include individually identifiable health information, among other sensitive information regarding persons and people. 
All individuals for whom records are provided have been deceased for at least 70 years, but were they still living today, these records would be recognized as being protected health information under the US Health Insurance Portability and Accountability Act of 1996 (HIPAA). While HIPAA and other privacy laws no longer apply to these individuals, in providing these data the University of Denver wishes to foster research practices that express the utmost respect for the human beings whose lives are represented, at least in some part, in these collections. In addition, we ask that researchers respect the lives of these individuals’ ancestors and their communities. To foster practices that honor patients, staff, nurses and physicians connected with the JCRS Sanatorium, as well as their families, ancestors and communities, we ask that researchers disclose their intended use of the collection for review by our Advisory Board (see reverse). This Board is comprised of ethicists, historians, librarians, attorneys, physicians, and members of the Jewish community. In addition, we ask that researchers agree to conduct their work under the following set of principles: 1. I affirm the role of JCRS patients and staff as data creators and will avoid exploiting and/or dehumanizing them by treating them simply as data. 2. My research will, when possible and appropriate, account for the contexts surrounding the JCRS subjects as data arise. My work will recognize that all data and datasets are shaped by decisions about how histories are recorded, remembered, and valued. 3. If the nature of my work is such that I am sharing the life stories and/or narratives of individuals in these data, and I can do so with no potential harm to their reputation or that of their ancestors, I will honor them by naming them. 
If the nature of my work is such that I am exploring large-scale patterns in the dataset, and naming individuals serves no specific research purpose, I will anonymize and/or redact names within the data. 4. If I am publishing the results of research conducted with these data, I will, if possible and appropriate, include a note of recognition and/or gratitude in my publication. We suggest a version of: “This work was made possible in part by the patients, staff, nurses, physicians, and community of the Jewish Consumptive Relief Society (JCRS). The people who lived, worked, and died at the JCRS sought to relieve human suffering. I am grateful to them.” project-name: >- Collections as Data - University of Denver Transcribing Handwritten Medical Records project-website: https://du-collections-as-data.netlify.app/ language: - eng production-software: Transkribus script: - iso: Latn script-type: mainly-manuscript time: notBefore: '1900' notAfter: '1950' hands: count: unknown precision: estimated license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Page-XML volume: - metric: lines count: 36027 - metric: characters count: 3494619 - metric: files count: 2660 - metric: regions count: 4254 automatically-aligned: false _bibtex: "@misc{https://doi.org/10.5281/zenodo.4243023,\n doi = {10.5281/ZENODO.4243023},\n\ \ url = {https://zenodo.org/record/4243023},\n author = {Pham, Kim},\n title\ \ = {University of Denver Collections as Data - HTR Train and Validation Set JCRS_2020_5_27},\n\ \ publisher = {Zenodo},\n year = {2020},\n copyright = {Creative Commons Attribution\ \ 4.0 International}\n}\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Ground truth data for printed Devanagari url: https://doi.org/10.11588/data/EGOKEI authors: - name: Nicole surname: Merkel-Hilf orcid: 0000-0002-0344-6169 roles: - transcriber - project-manager - name: Daria surname: Peshcherova roles: - support institutions: - name: Heidelberg 
University Library description: >- Ground truth (GT) data (JPG and ALTO XML files) for an OCR model that recognizes printed text in Devanagari script. A model was trained on the GT data in Transkribus with the HTR+ engine. The training was performed on approx. 220 pages with approx. 27,000 words. The validation set was 10% of the training set. The training material is comprised of letterpress printings from the Naval Kishore Press (Lakhnau, North India) from the late 19th and early 20th century in the Hindi, Sanskrit, Braj Bhasha and Awadhi languages. Transcription was performed by Nicole Merkel-Hilf (CATS Library / Heidelberg University Library) with support by Daria Peshcherova (CATS Library / Heidelberg University Library). project-name: Naval Kishore Press - digital project-website: https://digi.ub.uni-heidelberg.de/en/sammlungen/suedasien/navalkishore.html language: - hin - san - bra production-software: Transkribus script: - iso: Deva script-type: only-typed time: notBefore: '1880' notAfter: '1953' hands: count: less-than-11 precision: exact license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML volume: - metric: lines count: 4333 transcription-guidelines: Diplomatic transcription, no correction of misspellings automatically-aligned: false _bibtex: "@misc{https://doi.org/10.11588/data/egokei,\n doi = {10.11588/DATA/EGOKEI},\n\ \ url = {https://heidata.uni-heidelberg.de/citation?persistentId=doi:10.11588/data/EGOKEI},\n\ \ author = {Merkel-Hilf, Nicole},\n title = {Ground Truth data for printed Devanagari},\n\ \ publisher = {heiDATA},\n year = {2022}\n}\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Ground Truth data for printed Malayalam url: https://doi.org/10.11588/data/L2KRZO authors: [] institutions: - name: Tübingen University Library roles: - project-manager description: >- Ground Truth (GT) data (JPG and ALTO XML files) which can be used to train OCR models that recognize printed text in Malayalam 
script. The training material is gathered from 19th- and 20th-century prints. Models were trained on the GT data in Transkribus with the HTR+ and PyLaia engines, with a resulting CER of 2.29% on the validation set with HTR+ and 3.20% with PyLaia. The training was performed on 43 pages with approx. 9,000 words. The validation set consisted of 5 pages (ca. 1,000 words). Transcription was performed by Tübingen University Library; the Ground Truth data was created by Elena Mucciarelli (University of Groningen) with support and model training by Dorothee Huff (Tübingen University Library). (2022-11-02) project-name: DigitalSouthAsia project-website: http://idb.ub.uni-tuebingen.de/digitue/southasia language: - mal production-software: Transkribus script: - iso: Mlym script-type: only-typed time: notBefore: '1850' notAfter: '1996' hands: count: unknown precision: exact license: name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Page-XML volume: - metric: pages count: 43 _bibtex: "@misc{https://doi.org/10.11588/data/l2krzo,\n doi = {10.11588/DATA/L2KRZO},\n\ \ url = {https://heidata.uni-heidelberg.de/citation?persistentId=doi:10.11588/data/L2KRZO},\n\ \ author = {{Tübingen University Library}},\n title = {Ground Truth data for\ \ printed Malayalam},\n publisher = {heiDATA},\n year = {2023}\n}\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: 'Handwritten Text Recognition Ground Truth Set: StABS Ratsbücher O10, Urfehdenbuch X' url: https://doi.org/10.5281/zenodo.5153263 authors: - name: Susanna surname: Burghartz roles: - project-manager - name: Calvi surname: Sonia roles: - project-manager - quality-control - name: Vogeler surname: Georg roles: - project-manager - name: Baur surname: Laila roles: - transcriber - name: Egli surname: Benedikt roles: - transcriber - name: Gehrig surname: Gabriela roles: - transcriber - name: Heini surname: Alexandra Isabelle roles: - transcriber - name: Rossi surname: Rosanna roles: - transcriber - name: 
Siegrist surname: Benjamin roles: - transcriber - name: Wasmer surname: Remo roles: - transcriber - name: Zimmermann surname: Lynn roles: - transcriber - name: Schoch surname: David roles: - aligner - name: Dängeli surname: Peter roles: - digitization - name: Hodel surname: Tobias roles: - project-manager - aligner description: Ground Truth for "Urfehdenbuch X der Stadt Basel (1563-1569)" at Staatsarchiv Basel-Stadt (StABS). project-website: hdl:11471/1010.2.1 language: - deu script: - iso: Latn script-type: only-manuscript time: notBefore: '1563' notAfter: '1569' hands: count: unknown precision: estimated license: - name: CC-BY-SA 4.0 url: https://creativecommons.org/licenses/by-sa/4.0/ format: Page-XML volume: - metric: lines count: 8000 transcription-guidelines: 'See: http://gams.uni-graz.at/o:ufbas.1563' production-software: Transkribus automatically-aligned: false _bibtex: "@misc{https://doi.org/10.5281/zenodo.5153263,\n doi = {10.5281/ZENODO.5153263},\n\ \ url = {https://zenodo.org/record/5153263},\n author = {Hodel, Tobias and Schoch,\ \ David and Dängeli, Peter},\n keywords = {Handwritten Text Recognition, Ground\ \ Truth, Early Modern German Kurrent},\n language = {de},\n title = {Handwritten\ \ Text Recognition Ground Truth Set: StABS Ratsbücher O10, Urfehdenbuch X},\n\ \ publisher = {Zenodo},\n year = {2021},\n copyright = {Creative Commons Attribution\ \ Non Commercial Share Alike 4.0 International}\n}\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Charters and Records of Königsfelden Abbey and Bailiwick (1308-1662) url: https://doi.org/10.5281/zenodo.5179361 authors: - name: Hodel surname: Tobias roles: - transcriber - project-manager - support - name: Halter-Pernet surname: Colette roles: - transcriber - aligner - project-manager - quality-control - digitization - support - name: Teuscher surname: Simon roles: - project-manager description: The data set is the publication of the data of the scholarly edition "Urkunden 
und Akten des Klosters und der Hofmeisterei Königsfelden". project-website: https://www.koenigsfelden.uzh.ch/ language: - lat - deu script: - iso: Latn script-type: only-manuscript time: notBefore: '1292' notAfter: '1570' hands: count: more-than-10 precision: estimated license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Page-XML volume: - metric: lines count: 60000 transcription-guidelines: 'See: https://www.koenigsfelden.uzh.ch/exist/apps/ssrq/intro.html#richtlinien' production-software: Transkribus automatically-aligned: false _bibtex: "@misc{https://doi.org/10.5281/zenodo.5179361,\n doi = {10.5281/ZENODO.5179361},\n\ \ url = {https://zenodo.org/record/5179361},\n author = {Halter-Pernet, Colette\ \ and Teuscher, Simon and Hodel, Tobias and Barwitzki, Lukas and Egloff, Salome\ \ and Henggeler, Fabian and Nadig, Michael and Steinmann, Anina and Stettler,\ \ Sabine and Prada Ziegler, Ismail},\n keywords = {Scholarly Edition, Monastery,\ \ Königsfelden Abbey, Poor Clares, Franciscan Friars, Hapsburg, Handwritten Text\ \ Recognition},\n title = {Charters and Records of Königsfelden Abbey and Bailiwick\ \ (1308-1662)},\n publisher = {Zenodo},\n year = {2021},\n copyright = {Creative\ \ Commons Attribution 4.0 International}\n}\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: >- GT and HTR of VOC (Dutch East-Asia Company), WIC (Dutch West-Asia Company) and notarial deeds. url: https://doi.org/10.5281/zenodo.6414086 authors: - name: Keijser surname: Liesbeth roles: - transcriber - project-manager - name: Noppe surname: Vincent institutions: - name: National Archive Netherlands / Nationaal Archief roles: - digitization - support description: >- 6000 ground truth of VOC and notarial deeds and 3.000.000 HTR of VOC, WIC and notarial deeds The National Archives of the Netherlands and Noord-Hollands Archief conducted a project using the Transkribus HTR (Handwritten Text Recognition) platform. 
The aim was to semi-automatically transcribe 2 million pages of old Dutch texts. The transcribed archives are 17th- and 18th-century documents from the Dutch East India Company (VOC) and 19th-century notarial deeds from Noord-Hollands Archief and other archives in the provinces. In order to train the HTR software, a team produced transcriptions of approximately 6000 scans. The scans were randomly selected from the dataset and contain hundreds of hands. With these transcriptions, a model was trained that recognizes more than 90% of the characters correctly. Transkribus transcribed the 2 million scans automatically using the trained model. Later on, 1 million extra scans concerning the West India Company (WIC) were transcribed automatically without adding extra ground truth or training. These archives are from the 17th and 18th centuries. The datasets published in Zenodo contain the ground truth (scans in JPG, transcription in PAGE XML) and the HTR results (in PAGE XML and TXT). See the overview on the Zenodo page. A specification of which archives have been transcribed (both GT and HTR) can be found on the Zenodo page. For open data access of scans and inventories of the National Archives click here: https://www.nationaalarchief.nl/onderzoeken/open-data/archiefinventarissen-digitale-objecten-en-scans-van-archieven Disclaimer: due to the variety of languages used and the poor state of the documents, the HTR results of "1.05.21, Dutch series Guyana" can be of poor quality. project-name: De ijsberg zichtbaar maken project-website: >- https://www.nationaalarchief.nl/beleven/nieuws/kijk-symposium-de-ijsberg-zichtbaar-maken-terug#:~:text=In%20het%20project%20De%20IJsberg,de%20website%20zoekintranscripties.nl%20ontwikkeld. 
language: - nld production-software: Transkribus script: - iso: Latn script-type: only-manuscript time: notBefore: '1600' notAfter: '1899' hands: count: more-than-10 precision: estimated license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Page-XML volume: - metric: pages count: 6000 - {count: 251889, metric: lines} - {count: 6350, metric: files} - {count: 10735, metric: regions} - {count: 24432166, metric: characters} automatically-aligned: false _bibtex: "@misc{https://doi.org/10.5281/zenodo.6414086,\n doi = {10.5281/ZENODO.6414086},\n\ \ url = {https://zenodo.org/record/6414086},\n author = {Keijser, Liesbeth},\n\ \ keywords = {Transciptions, Verenigde Oost-Indische Compagnie, West-Indische\ \ Compagnie, Notarial deeds, Nationaal Archief, Noord-Hollands Archief, Transkribus},\n\ \ title = {6000 ground truth of VOC and notarial deeds 3.000.000 HTR of VOC,\ \ WIC and notarial deeds},\n publisher = {Zenodo},\n year = {2020},\n copyright\ \ = {Creative Commons Attribution 4.0 International}\n}\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: 'Dataset for late medieval Castilian text recognition ' url: https://doi.org/10.5281/zenodo.7386489 authors: - name: Gille Levenson surname: Matthias orcid: 0000-0001-9488-5986 roles: - transcriber - quality-control institutions: [] description: >- HTR/OCR open-access gold corpus for Spanish late medieval sources, based on the allographetic transcription of more than 300 pages of several manuscripts of the Regimiento de los Prínçipes, as well as a first set of general transcription models trained with Kraken and out-of-domain test data. See https://doi.org/10.5281/zenodo.7387376 for a full description of the dataset. 
language: - spa production-software: eScriptorium + Kraken script: - iso: Latn script-type: mainly-manuscript time: notBefore: '1300' notAfter: '1500' hands: count: more-than-10 precision: estimated license: - name: CC-BY-SA 4.0 url: https://creativecommons.org/licenses/by-sa/4.0/ format: Alto-XML volume: - metric: lines count: 28000 transcription-guidelines: >- Allographetic transcription. See the article (https://doi.org/10.5281/zenodo.7387376) for full transcription guidelines. 320 pages in-domain; 40 pages out-of-domain automatically-aligned: false _bibtex: "@misc{https://doi.org/10.5281/zenodo.7386489,\n doi = {10.5281/ZENODO.7386489},\n\ \ url = {https://zenodo.org/doi/10.5281/zenodo.7386489},\n author = {Matthias\ \ Gille Levenson, },\n keywords = {OCR, HTR, dataset, allographetic, medieval\ \ castilian},\n language = {en},\n title = {Towards a general open dataset and\ \ model for late medieval Castilian text recognition (HTR/OCR). Datasets and scripts},\n\ \ publisher = {Zenodo},\n year = {2023},\n copyright = {Creative Commons Attribution\ \ Non Commercial Share Alike 4.0 International}\n}\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: 'Klosterneuburg, Stiftsbibl., Cod. 
48 - Ground Truth: Initial Release' url: https://doi.org/10.5281/zenodo.7466927 authors: - name: Berger surname: Michael orcid: 0000-0002-6627-5272 - name: Bolte surname: Henrike - name: Führer surname: Veronika orcid: 0000-0003-3145-4083 - name: Hausleitner surname: Felix orcid: 0000-0002-9788-8127 - name: Hutterer surname: Sarah - name: Lüthi surname: Tim orcid: 0000-0003-1925-7175 - name: Nancu surname: Mihaela - name: Passoni surname: Erica - name: Pataki surname: Katalin orcid: 0000-0003-0331-8295 - name: Schröcksnadel surname: Sophie - name: Verri surname: Giovanni orcid: 0000-0002-1297-2152 - name: Wegener surname: Dennis orcid: 0000-0002-9410-9191 institutions: [] description: >- This is ground truth for the vast collection of sermons of Nikolaus von Dinkelsbühl (ca. 1360 to 17th March 1433), translated and reorganised by a German redactor; the collection, from the 15th century, has never been edited until now. It consists of 361 folios of parchment and paper. The text speaks about various topics such as fasting and other religious practices. Being one of the leading intellectuals of his time, Nikolaus von Dinkelsbühl also contributed to the development of the University of Vienna. The manuscript was probably produced in the vicinity of Klosterneuburg in Austria and is still kept there today (Shelfmark: Cod. 48). Data collection and ground truth creation: The edition at hand was produced by an international team of researchers from various fields in the context of the Vienna HTR Winter School 2022 with the help of the Transkribus Expert Client. We uploaded the images of the manuscript into the Transkribus platform, applied the line recognition tool and manually copied the transcribed text lines into the recognised line boxes. Various models were trained with the ground truth (20% of the entire codex) created by the team. Images of the Klosterneuburg, Augustiner-Chorherrenstift, Cod. 
48 are available at: https://manuscripta.at/diglit/AT5000-48/0001 project-name: HTR Winter School 2022, Vienna language: - gmh production-software: Transkribus script: - iso: Latn script-type: only-manuscript time: notBefore: '1440' notAfter: '1449' hands: count: '1' precision: exact license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML volume: - metric: pages count: 68 - metric: lines count: 4605 automatically-aligned: false _bibtex: "@misc{https://doi.org/10.5281/zenodo.7466927,\n doi = {10.5281/ZENODO.7466927},\n\ \ url = {https://zenodo.org/record/7466927},\n author = {Berger, Michael and\ \ Bolte, Henrike and Führer, Veronika and Hausleitner, Felix and Hutterer, Sarah\ \ and Lüthi, Tim and Nancu, Mihaela and Passoni, Erica and Pataki, Katalin and\ \ Schröcksnadel, Sophie and Verri, Giovanni and Wegener, Dennis and Hofert, Sandra},\n\ \ keywords = {Digital Humanities, Handwritten Text Recognition, German, Nikolaus-von-Dinkelsbühl-Redaktor},\n\ \ title = {Klosterneuburg, Stiftsbibl., Cod. 48 - Ground Truth: Initial Release},\n\ \ publisher = {Zenodo},\n year = {2022},\n copyright = {Creative Commons Attribution\ \ 4.0 International}\n}\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: 'GT4HistCommentLayout: Layout Ground Truth for Historical Commentaries' url: https://github.com/AjaxMultiCommentary/GT-commentaries-OLR authors: - name: Matteo surname: Romanello orcid: 0000-0002-7406-6286 roles: - project-manager - name: Sven surname: Najem-Meyer orcid: 0000-0002-3661-4579 roles: - transcriber - quality-control - name: Carla surname: Amaya roles: - transcriber description: 'This dataset contains layout annotations for ca. 370 pages sampled from 8 public domain classical commentaries, published in the 19th century in English, German and Latin. The commentaries concern Ancient Greek and Latin works from prose and poetry (caveat: AGreek poetry is slightly over-represented). 
Pages were annotated according to a taxonomy mapped to the SegmOnto controlled vocabulary.' project-name: Ajax Multi-Commentary project-website: https://mromanello.github.io/ajax-multi-commentary/ language: - eng - deu - lat - grc production-software: Kraken + VGG Image Annotator (VIA) script: - iso: Latn - iso: Grek script-type: only-typed time: notBefore: '1835' notAfter: '1903' hands: count: '1' precision: exact license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML volume: - metric: characters count: 0 - metric: files count: 371 - metric: lines count: 0 - metric: regions count: 2386 transcription-guidelines: SegmOnto guidelines (v. 0.9) citation-file-link: https://github.com/AjaxMultiCommentary/GT-commentaries-layout/blob/master/CITATION.cff characters: mode: NFD members: [] automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Matteo and Najem-Meyer, Sven and Amaya,\ \ Carla},\ndoi = {10.5281/zenodo.7271729},\ntitle = {GT4HistCommentLayout: Layout\ \ Ground Truth for Historical Commentaries}\n}\n" _apa: "Matteo, Najem-Meyer S., Amaya C. GT4HistCommentLayout: Layout Ground Truth\ \ for Historical Commentaries (version 1.0). DOI: 10.5281/zenodo.7271729\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Fabliaux url: https://github.com/CIHAM-HTR/Fabliaux authors: - name: Corinne surname: Pierreville orcid: 0009-0003-3074-3841 roles: - project-manager - name: Ariane surname: Pinche orcid: 0000-0002-7843-5050 roles: - transcriber - aligner - quality-control institutions: [] description: HTR data sets from medieval manuscripts (13th-14th c.) 
collecting "fabliaux" funded by Biblissima+ project-website: https://projet.biblissima.fr/fr/appels-projets/projets-retenus/fabliaux language: - fro production-software: eScriptorium + Kraken script: - iso: Latn script-type: only-manuscript time: notBefore: '1200' notAfter: '1402' hands: count: 1-per-folder precision: exact license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML citation-file-link: https://github.com/CIHAM-HTR/Fabliaux/blob/master/CITATION.cff transcription-guidelines: The data follow the standards recommended by the CREMMALAB project, see Ariane Pinche. Transcription Guide for 10th to 15th Century Manuscripts. 2022. ⟨hal-03697382⟩ volume: - metric: characters count: 44963 - metric: files count: 25 - metric: lines count: 2070 - metric: regions count: 94 characters: mode: NFD members: - e - i - s - a - t - u - o - n - r - l - m - c - d - ̃ - p - f - h - b - ⁊ - g - . - q - z - ̾ - Q - ꝑ - S - x - I - L - D - C - ͥ - E - A - ꝰ - T - k - ꝯ - M - N - O - P - U - ͣ - y - F - '9' - Ꝙ - B - G - J - '1' - / - ẜ - ł - ⟦ - ⟧ - ᷑ - R - '7' - H - "'" - ͤ - w - ':' - '4' - '0' - '6' - '8' - '5' - K -  - ͦ - v - ͫ - V - ᷤ - ⁜ - '3' - đ - X - ‸ - ᷠ - '2' - ꝓ automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Pinche, Ariane and Pierreville, Corinne},\n\ month = {4},\ntitle = {Fabliaux},\nurl = {https://github.com/CIHAM-HTR/Fabliaux/data},\n\ year = {2023}\n}\n" _apa: "Pinche A., Pierreville C. (2023). 
Fabliaux URL: https://github.com/CIHAM-HTR/Fabliaux/data\n" - authors: - name: Davide roles: - transcriber - aligner surname: Aruta - name: Martina roles: - transcriber - aligner surname: Lenzi - name: Armelle orcid: 0000-0001-7938-2686 roles: - transcriber - aligner surname: Le Huërou - name: Marylène orcid: 0000-0002-9250-370X roles: - project-manager surname: Possamaï - name: Ariane orcid: 0000-0002-7843-5050 roles: - quality-control surname: Pinche characters: members: - e - i - u - s - a - t - n - r - o - l - c - m - d - p - . - q - ̃ - g - b - f - z - h - y - x - '-' - ͥ - ͣ - ⁊ - E - ¶ - ̾ - ꝙ - C - ꝰ - ͦ - ꝑ - S - ꝓ - Q - H - ꝯ - I - M - ͭ - '2' - L - ͫ - D - ꝵ - T - ͨ - A - ł - ͬ - ͤ - ᷑ - N - O - U - P - R - ħ - ':' - F - ꝭ - '7' - ᵈ -  - '3' - ⟦ - ⟧ - Y - ͧ - đ - G - '1' - '9' - B - ',' - Ꝙ mode: NFD citation-file-link: https://github.com/CIHAM-HTR/Liber/blob/main/CITATION.cff description: HTR datasets of medieval manuscripts (14th-15th c.) with Pierre Bersuire’s translation into Old French of the work of Titus Livius and Nicolas Trevet Commentaries format: Alto-XML hands: count: '1' precision: estimated institutions: [] language: - fro - lat license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ production-software: eScriptorium + Kraken project-website: https://anr.fr/Projet-ANR-21-CE27-0008 schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: only-manuscript sources: - link: https://github.com/CIHAM-HTR/Liber reference: Aruta, D., Lenzi, M., Le Huërou, A., Possamaï, M., & Pinche, A. (2023). Liber [Data set]. https://github.com/CIHAM-HTR/Liber/data time: notAfter: '1400' notBefore: '1300' title: Liber transcription-guidelines: 'Data follow the standards recommended by the CREMMA projects, see Ariane Pinche. Transcription Guide for 10th to 15th Century Manuscripts. 2022. hal-03697382 - and Thibault Clérice, Malamatenia Vlachou-Efstathiou, Alix Chagué. 
CREMMA Medii Aevi: Literary manuscript text recognition in Latin. Journal of Open Humanities Data, 2023, 9, pp.4. ⟨10.5334/johd.97⟩. ⟨hal-03828353v5⟩' url: https://github.com/CIHAM-HTR/Liber volume: - count: 134899 metric: characters - count: 37 metric: files - count: 3789 metric: lines - count: 152 metric: regions automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Aruta, Davide and Lenzi, Martina and\ \ Le Huërou, Armelle and Possamaï, Marylène and Pinche, Ariane},\nmonth = {4},\n\ title = {Liber},\nurl = {https://github.com/CIHAM-HTR/Liber/data},\nyear = {2023}\n\ }\n" _apa: "Aruta D., Lenzi M., Le Huërou A., Possamaï M., Pinche A. (2023). Liber URL:\ \ https://github.com/CIHAM-HTR/Liber/data\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: FoNDUE Spanish chapbooks 19th c. Dataset url: https://github.com/DesenrollandoElCordel/FoNDUE-Spanish-chapbooks-Dataset authors: - name: Carta surname: Constance roles: - transcriber - project-manager - name: Leblanc surname: Élina roles: - digitization - name: Jacsont surname: Pauline roles: - digitization - name: Palacios surname: Belinda roles: - transcriber - quality-control - name: Bermudez surname: Luana roles: - transcriber - quality-control description: Digital editions of the second part of the Genevan Spanish chapbooks collection (19th c.). 
project-name: Desenrollando El Cordel project-website: https://github.com/DesenrollandoElCordel language: - cat - spa - lat script: - iso: Latn script-type: only-typed time: notBefore: '1770' notAfter: '1920' hands: count: more-than-10 precision: exact license: - name: CC-BY-SA 4.0 url: https://creativecommons.org/licenses/by-sa/4.0/ format: Alto-XML sources: - reference: '' link: https://unige.swisscovery.slsp.ch/permalink/41SLSP_UGE/btt5ev/alma991008229029705502 - reference: '' link: https://unige.swisscovery.slsp.ch/permalink/41SLSP_UGE/kjkm12/alma991002834309705502 volume: - metric: characters count: 270718 - metric: lines count: 12526 - metric: pages count: 198 citation-file-link: https://github.com/DesenrollandoElCordel/FoNDUE-Spanish-chapbooks-Dataset/blob/main/Grountruth/CITATION.cff transcription-guidelines: "Les règles de transcription suivante ont été adoptées\ \ :\n- Respecter les accents ;\n- Respecter la casse ;\n- Respecter la ponctuation\ \ ;\n- Respecter les espaces ;\n- Respecter les retours à la ligne ;\n- Respecter\ \ la graphie des mots (ne pas corriger les erreurs s’il y en a) ;\n- Supprimer\ \ le bruit (tâches qui ont été prises pour du texte par l’OCR)." production-software: eScriptorium + Kraken automatically-aligned: false - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: FoNDUE_Kunsthistorisches-UZH_Archivdatenbank url: https://github.com/FoNDUE-HTR/FoNDUE_Kunsthistorisches-UZH_Archivdatenbank authors: - name: Pauline surname: Jacsont orcid: 0000-0002-6296-3246 roles: - project-manager - transcriber - aligner - quality-control - name: Simon surname: Gabay orcid: 0000-0001-9094-4475 roles: - project-manager - quality-control - support - name: Tristan surname: Weddigen orcid: 0000-0002-4609-8950 roles: - support institutions: [] description: HTR data made with the Kunsthistorisches UZH corpus. 
project-name: FoNDUE project-website: https://www.unige.ch/lettres/humanites-numeriques/recherche/projets-de-la-chaire/fondue language: - deu - fra - ita production-software: eScriptorium + Kraken script: - iso: Latn script-type: evenly-mixed time: notBefore: '1900' notAfter: '1999' hands: count: more-than-10 precision: estimated license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML volume: - metric: pages count: 1100 citation-file-link: >- https://github.com/FoNDUE-HTR/FoNDUE_Kunsthistorisches-UZH_Archivdatenbank/blob/main/CITATION.cff transcription-guidelines: "The transcription is strictly diplomatic: no abbreviations\ \ are resolved. \LItems that are crossed out or struck through will be transcribed\ \ with a \"€\"." automatically-aligned: false - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: gt_structure_text url: https://github.com/OCR-D/gt_structure_text authors: - name: Matthias surname: Boenig orcid: 0000-0003-4615-4753 roles: - transcriber - aligner - project-manager - quality-control - digitization - support institutions: [] description: >- The OCR-D Ground Truth text and structure corpus was created between 2015 and 2017. Since 2017, this corpus has been further curated and supplemented with metadata where appropriate. The corpus consists of PAGE XML files that include annotations of the text and structure. The data is based on transcription data stored in the German Text Archive (DTA) (https://www.deutschestextarchiv.de/). 
project-name: OCR-D project-website: https://ocr-d.de/ language: - eng - fra - deu - heb - lat production-software: Aletheia automatically-aligned: false script: - iso: Latn - iso: Goth script-type: only-typed time: notAfter: '1900' notBefore: '1500' hands: count: less-than-11 precision: exact license: name: CC-BY-SA 4.0 url: https://creativecommons.org/licenses/by-sa/4.0/ format: Page-XML volume: - count: 640976 metric: characters - count: 217 metric: files - count: 6608 metric: lines - count: 1647 metric: regions citation-file-link: https://raw.githubusercontent.com/OCR-D/gt_structure_text/main/CITATION.cff transcription-guidelines: OCR-D Ground Truth Guidelines https://ocr-d.de/en/gt-guidelines/trans/ _bibtex: "@misc{YourReferenceHere,\nauthor = {Boenig, Matthias},\nmonth = {4},\n\ title = {gt_structure_text},\nurl = {https://github.com/OCR-D/gt_structure_text},\n\ year = {2024}\n}\n" _apa: "Boenig M. (2024). gt_structure_text (version 62_v1.4.3). URL: https://github.com/OCR-D/gt_structure_text\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: De la généalogie des dieux url: https://github.com/PSL-Chartes-HTR-Students/HN2021-Boccace project-name: ENC - Bonnes pratiques du developpement collaboratif authors: - name: Vlachou Efstathiou surname: Malamatenia roles: - transcriber - project-manager - name: Leroy surname: Noé roles: - transcriber - project-manager - name: Maulu surname: Marco roles: - project-manager - quality-control description: "This repository hosts all the documents, including transcriptions,\ \ bibliographical references and introduction that serve the team Boccace for\ \ the validation of the course \"Bonnes pratiques du developpement collaboratif\ \ : initiation à Git\" (prof. Thibault Clérice), of the first semester - Master\ \ Humanités Numériques ENC-PSL 2021-2022. 
At the same time it constitutes\ \ part of the biannual project \"Per un’edizione digitale della Genealogia deorum\ \ gentilium\" di Boccaccio\" (dir. F. Duval, M. Maulu). Financed in 2021, this\ \ project plans to publish online in XML format the unpublished translation in\ \ Middle French entitled \"De la genealogie des dieux\".\n" language: - frm - lat script: - iso: Latn script-type: only-typed time: notBefore: '1472' notAfter: '1498' hands: count: 1-per-folder precision: exact license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML volume: - metric: characters count: 109409 - metric: files count: 47 - metric: lines count: 3656 - metric: pages count: 52 - metric: regions count: 292 sources: - reference: Laurent Premierfait, Boccace (1498), "De la genealogie des dieux", Paris, A. Vérard. link: 'https://gallica.bnf.fr/ark:/12148/bpt6k105063r?rk=21459;2 ' citation-file-link: https://raw.githubusercontent.com/PSL-Chartes-HTR-Students/HN2021-Boccace/main/CITATION.cff transcription-guidelines: "No expansion of abbreviations. Special characters are\ \ used for the graphemic transcription, compatible with the Unicode MUFI and the\ \ special character table of cremma-medieval. No correction of orthography errors,\ \ BUT proper transcription of inverted letters (for Inc59) such as character \"\ n\" printed as \"u\" in several cases. Spaces were added freely for word separation\ \ according to dictionaries of Middle French and Latin (Latin forms verified on\ \ Collatinus). For more documentation regarding the transcription norms and guidelines\ \ head to the repository and the report file.\n" production-software: Unknown [Automatically filled] automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Vlachou Efstathiou, Malamatenia and\ \ Leroy, Noé and Maulu, Marco},\ndoi = {10.5281/zenodo.6126613},\ntitle = {git-project-Boccace}\n\ }\n" _apa: "Vlachou Efstathiou M., Leroy N., Maulu M. 
git-project-Boccace (version 1.0).\ \ DOI: 10.5281/zenodo.6126613\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Chateau de Chavigny url: https://github.com/PSL-Chartes-HTR-Students/HN2021-ChateauChavigny project-name: ENC - Bonnes pratiques du developpement collaboratif authors: - name: Pascual surname: Margot roles: - transcriber - name: Franchet d\u0027Espèrey surname: Louis-Fiacre roles: - transcriber - digitization - name: Gabay surname: Simon roles: - quality-control description: "Le document sur lequel nous travaillons porte sur le Château de Chavigny\ \ à Lerné en Touraine. Au XVIème siècle, c’est la famille des seigneurs Leroy\ \ qui possède ce château. Avant 1568, en pleine guerre de religion, François Leroy,\ \ du parti du roi et des catholiques, participe à la capture et la rançon du prince\ \ de Condé, du parti protestant. En 1568, François Leroy, en tant que capitaine\ \ de 50 lances au service du roi, part en campagne avec lui. L'objectif est de\ \ transcrire cinq feuillets d'un manuscrit à l'aide d'eScriptorium. Le but est\ \ d'apprendre à utiliser git et github pour mener à bien notre premier projet\ \ collaboratif.\n" language: - frm script: - iso: Latn script-type: only-manuscript time: notBefore: '1568' notAfter: '1599' hands: count: '1' precision: exact license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML citation-file-link: https://raw.githubusercontent.com/PSL-Chartes-HTR-Students/HN-2021-ChateauChavigny/main/CITATION.cff transcription-guidelines: "- Gestion des abréviations: \n - Si développement\ \ (pas toujours), les développer entre crochets.\n - L'orthographe originale\ \ et les abréviations doivent être conservées.\n- Gestion des échecs de transcription\ \ de caractère : lorsqu'un caractère nous paraît non sûr, nous préférons\ \ mettre un [?] pour indiquer qu'il y a un caractère non transcrit dans un mot.\ \ Pour plusieurs caractères, faire autant de ? 
que de caractère non reconnu :\ \ tel [???] pour 3 caractères.\n" volume: - metric: characters count: 9126 - metric: files count: 6 - metric: lines count: 253 - metric: regions count: 22 production-software: Unknown [Automatically filled] automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Pascual, Margot and Franchet d'Espèrey,\ \ Louis-Fiacre and Gabay, Simon},\ndoi = {10.5281/zenodo.6126655},\nmonth = {2},\n\ title = {Château de Chavigny},\nurl = {https://github.com/PSL-Chartes-HTR-Students/HN2021-ChateauChavigny},\n\ year = {2022}\n}\n" _apa: "Pascual M., Franchet d'Espèrey L., Gabay S. (2022). Château de Chavigny (version\ \ 1.0). DOI: 10.5281/zenodo.6126655 URL: https://github.com/PSL-Chartes-HTR-Students/HN2021-ChateauChavigny\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: 'Maxime Kovalewsky - Coutume contemporaine et loi ancienne: droit coutumier ossétien' url: https://github.com/PSL-Chartes-HTR-Students/HN2021-Kovalewsky-1893 project-name: "ENC - Bonnes pratiques du developpement collaboratif\n" authors: - name: L’Eveque surname: Zoé roles: - transcriber - name: Ekaterina surname: Kate roles: - transcriber - name: Kasparian surname: Anahide roles: - transcriber description: "Nous avons choisi de transcrire le deuxième chapitre de l’ouvrage\ \ de Maxime Kovalewsky : Coutume contemporaine et loi ancienne : droit coutumier\ \ ossétien, éclairé par l’histoire comparée. Paris, L. Larose, 1893. 
\n" language: - fra script: - iso: Latn script-type: only-typed time: notBefore: '1893' notAfter: '1893' hands: count: '1' precision: exact license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML citation-file-link: https://github.com/PSL-Chartes-HTR-Students/HN2021-Kovalewsky-1893/main/CITATION.CFF volume: - metric: characters count: 45626 - metric: files count: 28 - metric: lines count: 983 - metric: regions count: 72 production-software: Unknown [Automatically filled] automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {L’Eveque, Zoé and Ekaterina, Kate\ \ and Kasparian, Anahide},\ndoi = {10.5281/zenodo.6126633},\nmonth = {2},\ntitle\ \ = {Projet Kovaleswky - 1893},\nurl = {https://github.com/PSL-Chartes-HTR-Students/HN2021-Kovalewsky-1893},\n\ year = {2022}\n}\n" _apa: "L’Eveque Z., Ekaterina K., Kasparian A. (2022). Projet Kovaleswky - 1893\ \ (version 1.0). DOI: 10.5281/zenodo.6126633 URL: https://github.com/PSL-Chartes-HTR-Students/HN2021-Kovalewsky-1893\n" - authors: - name: Ingrid roles: - transcriber - aligner surname: Guimarães - name: Perrine roles: - transcriber - aligner surname: Maurel - name: Yagmur roles: - transcriber - aligner surname: Ozturk - name: Alix orcid: 0000-0002-0136-4434 roles: - quality-control surname: Chagué - name: Thibault orcid: 0000-0003-1852-9204 roles: - support surname: Clérice automatically-aligned: false characters: members: - e - t - r - o - a - n - i - s - h - l - . - d - f - c - u - y - m - ',' - S - M - w - p - g - b - C - I - v - R - A - E - D - F - P - T - k - O - L - N - W - '1' - B - J - H - '2' - '-' - U - '0' - G - Y - '5' - '9' - ':' - "'" - q - x - V - '3' - K - '4' - ᗅ - '8' - '7' - ( - ) - j - ^ - '"' - '&' - z - '6' - '?' 
- ⟦ - ⟧ - ᗞ - Q - ; - ᑕ - $ - + - '*' - Z mode: NFD citation-file-link: >- https://github.com/PSL-Chartes-HTR-Students/HN2021-Memorials_Jane_Lathrop_Stanford/main/CITATION.CFF description: >- Les données sources ont été téléversées sur le site From the page par les Archives de l’Université Stanford qui en sont les propriétaires. Elles ont ensuite été retranscrites par des bénévoles anonymes ; c'est leur travail qui nous a servi de base pour corriger nos propres retranscriptions. Les documents sources choisis sont des lettres de différents auteurs portant sur les obsèques de Jane Lathrop Stanford. Les lettres sélectionnées étaient les lettres : 42, 43, 46, 49, 50, 54, 57 à 60, 69, 75, 76 [section 1, retranscrites par Perrine MAUREL] ; 80 à 93 [section 2, retranscrites par Ingrid GUIMARÃES] ; 241 à 242 [section 3, retranscrites par Yagmur OZTURK]. format: Alto-XML hands: count: 1-per-file precision: estimated institutions: [] language: - eng license: name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ production-software: eScriptorium + Kraken project-name: ENC - Bonnes pratiques du developpement collaboratif schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: evenly-mixed time: notAfter: '1905' notBefore: '1905' title: Memorials for Jane Lathrop Stanford transcription-guidelines: >- Notre retranscription en elle-même a cherché à retranscrire le texte ipsis litteris, sans le corriger, en conservant donc les erreurs éventuelles intrinsèques au document. Il convient toutefois de noter que dans certains cas, les documents présentaient des mentions imprécises qui n'avaient pas été prises en compte par les retranscriptions originelles, ou alors qui avaient été soulignées comme étant une retranscription incertaine. 
Nous avons alors fait le choix d'être plus exhaustif que la retranscription originelle si possible, et nous avons parfois fait des choix de retranscription différents sur la base de notre ressenti visuel lors du travail. En raison de ces choix, la taille d'une page s'est donc parfois avérée rallongée par rapport à l'estimation première. Addition: les règles de transcriptions ont été adaptées pour être compatibles avec les préconisations CREMMA/CATMuS, à savoir : les portions de texte suscrites sont précédées d'un "^", les mots barrés ou illisible sont encadrés des signes "⟦" et "⟧". Les zones ne sont pas tracées dans le document, mais l'ontologie segmOnto a été appliquée pour le typage des lignes, en suivant 5 types possibles: DefaultLine:Handwritten, DefaultLine:Print, DefaultLine:Typewritten, DefaultLine:Signature et InterlinearLine:Handwritten. Cela permet de distinguer aisément les lignes manuscrites ou tapuscrites des en-têtes préimprimées des papiers à lettre. url: >- https://github.com/PSL-Chartes-HTR-Students/HN2021-Memorials_Jane_Lathrop_Stanford volume: - count: 18323 metric: characters - count: 41 metric: files - count: 774 metric: lines - count: 50 metric: regions _bibtex: "@misc{YourReferenceHere,\nauthor = {Guimarães, Ingrid and Maurel, Perrine\ \ and Ozturk, Yagmur and Chagué, Alix},\ndoi = {10.5281/zenodo.6126625},\nmonth\ \ = {2},\ntitle = {Memorials for Jane Lathrop Stanford},\nyear = {2022}\n}\n" _apa: "Guimarães I., Maurel P., Ozturk Y., Chagué A. (2022). Memorials for Jane\ \ Lathrop Stanford (version 1.0). 
DOI: 10.5281/zenodo.6126625\n" - authors: - name: Sarbach-Pulicani roles: - transcriber - project-manager surname: Vincent - name: Saïag surname: Violette - name: Escoda roles: - transcriber surname: Adrien - name: Miaille roles: - transcriber - project-manager surname: Théophile - name: Gabay orcid: 0000-0001-9094-4475 roles: - transcriber - quality-control surname: Simon characters: members: - e - a - i - u - n - r - t - s - o - l - c - d - p - m - ',' - g - . - ̀ - v - ’ - f - h - b - C - ́ - ¬ - P - q - z - '?' - "'" - A - M - I - L - S - '1' - D - ̂ - G - j - F - U - E - Q - '-' - x - '!' - B - ':' - V - '7' - '9' - R - N - ; - – - O - '8' - T - '2' - '0' - « - '3' - '6' - y - » - '5' - ̧ - ( - ) - — - '4' - J - ° - H - '*' - X - œ - '"' - ̈ - K - ^ - “ - '=' mode: NFD citation-file-link: https://raw.githubusercontent.com/PSL-Chartes-HTR-Students/HN2021-OCR-Poesie-Corse/main/CITATION.CFF description: "Le premier ouvrage, intitulé *Pontenôvu*, a été écrit par Petru Rocca\ \ et publié par la \"Stamparia di a Muvra\" en 1927. Il s'agit d'un recueil de\ \ poèmes en corse et en français dont les thèmes varient. *A Muvra* est un journal\ \ autonomiste corse d'influence maurrassienne qui a existé pendant toute la période\ \ de l'entre-deux-guerres. Se revendiquant comme étant une revue culturelle, la\ \ dimension politique de la revue (incarnée par le PCA, ou Partitu corsu d'azione),\ \ en a fait un mouvement controversé. C'est dans ce contexte de lutte politique\ \ et d'éveil culturel corse que s'inscrit ce recueil.\nLe second ouvrage s'intitule\ \ *A nostra Santa Fede - Catechismu Corsu*, écrit par Ageniu Grimaldi en 1926\ \ sous le pseudonyme de Saveriu Malaspina. Proche de Petru Rocca, ce dernier est\ \ l'un des théoriciens de l'autonomisme corse de l'entre-deux-guerres et fidèle\ \ muvriste. Dans l'ouvrage, il est fait mention notamment de la façon dont un\ \ vrai Corse doit se comporter vis-à-vis de sa foi envers Dieu et son île. 
Bien\ \ qu'il ne s'agisse pas réellement d'un recueil de poèmes, le style d'écriture\ \ de cet ouvrage est particulièrement intéressant. Il reprend un style qui se\ \ rapproche des écrits bibliques.\n" format: Alto-XML hands: count: 1-per-folder precision: exact language: - cos - fra license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ project-name: ENC - Bonnes pratiques du developpement collaboratif schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: only-typed time: notAfter: '1927' notBefore: '1926' title: OCR Corse transcription-guidelines: SegmOnto url: https://github.com/PSL-Chartes-HTR-Students/HN2021-OCR-Poesie-Corse volume: - count: 41205 metric: characters - count: 47 metric: files - count: 1681 metric: lines - count: 126 metric: regions production-software: Unknown [Automatically filled] automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Sarbach-Pulicani, Vincent and Miaille,\ \ Théophile and Escoda, Adrien and Saïag, Violette and Gabay, Simon},\ndoi = {10.5281/zenodo.6126641},\n\ month = {2},\ntitle = {OCR d'une poésie corse},\nurl = {https://github.com/PSL-Chartes-HTR-Students/HN2021-OCR-Poesie-Corse},\n\ year = {2022}\n}\n" _apa: "Sarbach-Pulicani V., Miaille T., Escoda A., Saïag V., Gabay S. (2022). OCR\ \ d'une poésie corse (version 1.0). 
DOI: 10.5281/zenodo.6126641 URL: https://github.com/PSL-Chartes-HTR-Students/HN2021-OCR-Poesie-Corse\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Argus des Brevets url: https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-ArgusDesBrevets project-name: "ENC - Bonnes pratiques du developpement collaboratif'\n" authors: - name: De Craene surname: Valentin roles: - transcriber - name: Humeau surname: Maxime roles: - transcriber - name: Reignier surname: Virgile roles: - transcriber description: "L’argus des brevets de 1910 se présente sous la forme d’un imprimé\ \ contemporain, organisé en rubriques regroupant de manière chronologique puis\ \ thématique les brevets déposés en France. Cette énumération et présentation\ \ succincte des brevets est répartie en deux colonnes et présente des abréviations\ \ normalisées. Dès lors, ce présent guide de contribution au projet entend présenter\ \ l’ensemble des normes de transcriptions adoptées au cours de ce projet de transcription,\ \ réalisé sur la plateforme E-scriptorium, dans le cadre du cours Git du master\ \ TNAH à l’ENC.\n" language: - fra script: - iso: Latn script-type: only-typed time: notBefore: '1910' notAfter: '1910' hands: count: '1' precision: exact license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML citation-file-link: https://raw.githubusercontent.com/PSL-Chartes-HTR-Students/TNAH-2021-ArgusDesBrevets/main/CITATION.cff transcription-guidelines: "En premier lieu, nous avons décidé de fonder notre transcription\ \ sur les recommandations publiées dans l’ouvrage *L’édition critique des textes\ \ contemporains, XIXe-XXIe siècle*, par Christine Nougaret, Elisabeth Parinet\ \ et Florence Clavaud. Néanmoins, certaines adaptations ont été nécessaires afin\ \ de fournir un jeu de données issue de la transcription, qui soit à la fois proche\ \ du document source et exploitable par la suite. 
Ainsi, concernant les abréviations,\ \ nous avons décidé de conserver la graphie originale au sein de la transcription.\ \ Ce choix fut guidé par deux éléments : d’une part, la volonté de conserver une\ \ graphie intègre, afin de fournir aux chercheurs s’intéressant à ce sujet un\ \ texte facilement exploitable de manière automatique, comme par exemple une analyse\ \ quantitative des types de sociétés (anonymes, familiales,…) déposant des brevets.\ \ Cette décision fut motivée par la facilité de résolution et compréhension des\ \ abréviations par le lecteur. D'autre part, il nous semble que cette approche\ \ permettrait une réutilisation générale des données, telle qu'un processus d'apprentissage\ \ machine.\nNous avons été amenés à réaliser certains choix relevant de la transcription\ \ et de l’édition du document. Pour ce faire, nous nous sommes référés au *Lexique\ \ typographique en usage à l’Imprimerie nationale* : - les tirets en fin de ligne\ \ faisant la césure au sein des mots ont été rétablis (ex : direc-tion). - les\ \ numéros de page en haut de page ont été transcrits ainsi : « _ N _ » où N correspond\ \ au numéro de page. - en cas de caractères mal imprimés ou usés, ceux-ci ont\ \ été rétablis dans la mesure où ils sont facilement interprétables (mais non\ \ devinables) par le lecteur. \n" volume: - metric: characters count: 55156 - metric: files count: 17 - metric: lines count: 1962 - metric: regions count: 86 production-software: Unknown [Automatically filled] automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {De Craene, Valentin and Humeau, Maxime\ \ and Reignier, Virgile},\ndoi = {10.5281/zenodo.6126366},\nmonth = {1},\ntitle\ \ = {Projet Argus des Brevets},\nurl = {https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-ArgusDesBrevets},\n\ year = {2022}\n}\n" _apa: "De Craene V., Humeau M., Reignier V. (2022). Projet Argus des Brevets (version\ \ 1.0). 
DOI: 10.5281/zenodo.6126366 URL: https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-ArgusDesBrevets\n" - authors: - name: Biay roles: - transcriber surname: Sébastien - name: Cappe roles: - transcriber surname: Zoé - name: Konstantinova roles: - transcriber surname: Kristina - name: Boby roles: - transcriber - aligner surname: Victor citation-file-link: https://raw.githubusercontent.com/PSL-Chartes-HTR-Students/TNAH-2021-DecameronFR/main/CITATION.cff description: "Le projet vise à la constitution de vérités de terrain pour l’entraînement\ \ de modèles HTR à partir d'un manuscrit français des années 1430-1455 : le manuscrit\ \ 5070 de la Bibliothèque de l'Arsenal (reproduit sur Gallica). Ce manuscrit contient\ \ la traduction française du Decameron de Boccace par Laurent de Premierfait.\ \ Nos vérités de terrain recouvrent la description de la peste à Florence située\ \ dans le prologue de l'ouvrage.\n" format: Alto-XML hands: count: '1' precision: exact language: - frm license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ project-name: "ENC - Bonnes pratiques du developpement collaboratif\n" schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: only-manuscript time: notAfter: '1455' notBefore: '1430' title: DecameronFR transcription-guidelines: "Cf. 
https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-DecameronFR/blob/main/normesTranscription.md\n" url: https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-DecameronFR volume: - count: 19821 metric: characters - count: 9 metric: files - count: 751 metric: lines - count: 41 metric: regions production-software: Unknown [Automatically filled] automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Biay, Sébastien and Boby, Victor and\ \ Konstantinova, Kristina and Cappe, Zoé},\ndoi = {10.5281/zenodo.6126376},\n\ title = {TNAH-2021-DecameronFR},\nurl = {https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-DecameronFR}\n\ }\n" _apa: "Biay S., Boby V., Konstantinova K., Cappe Z. TNAH-2021-DecameronFR (version\ \ 1.0). DOI: 10.5281/zenodo.6126376 URL: https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-DecameronFR\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Projet Exposition universelle de 1878 url: https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-Expositions_Universelles project-name: "ENC - Bonnes pratiques du developpement collaboratif\n" authors: - name: Christensen surname: Kelly roles: - transcriber - name: Davoury surname: Baudoin roles: - transcriber - name: Anahi surname: Haedo roles: - transcriber - name: Kervegan surname: Paul roles: - transcriber - name: Sanchez-Oeconomo surname: Esteban roles: - transcriber description: "Le Congrès international des sciences ethnographiques de 1878 a eu\ \ lieu à l’occasion de l'Exposition universelle de 1878, à Paris. 
Édité en 1881\ \ par l'Imprimerie nationale, le compte rendu de ce congrès a été mis à disposition\ \ par le Conservatoire numérique des Arts et Métiers.\n" language: - fra script: - iso: Latn - iso: Grek - iso: Deva - iso: Arab script-type: only-typed time: notBefore: '1881' notAfter: '1881' hands: count: '1' precision: exact license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML citation-file-link: https://raw.githubusercontent.com/PSL-Chartes-HTR-Students/TNAH-2021-Expositions_Universelles/main/CITATION.cff transcription-guidelines: Diplomatique, mais pas allographétique. volume: - metric: characters count: 155022 - metric: files count: 56 - metric: lines count: 2620 - metric: regions count: 158 production-software: Unknown [Automatically filled] automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Christensen, Kelly and Davoury, Baudoin\ \ and Haedo, Anahi and Kervegan, Paul and Sanchez-Oeconomo, Esteban},\ndoi = {10.5281/zenodo.6126447},\n\ month = {1},\ntitle = {Projet Exposition Universelle de 1878},\nurl = {https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-Expositions_Universelles},\n\ year = {2022}\n}\n" _apa: "Christensen K., Davoury B., Haedo A., Kervegan P., Sanchez-Oeconomo E. (2022).\ \ Projet Exposition Universelle de 1878 (version 1.0). 
DOI: 10.5281/zenodo.6126447\ \ URL: https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-Expositions_Universelles\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Projet Correspondance Berlioz url: https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-Projet-Correspondance-Berlioz project-name: "ENC - Bonnes pratiques du developpement collaboratif\n" authors: - name: Céard surname: Lien roles: - transcriber - name: Sajdak surname: Cécile roles: - transcriber - name: Lebreton surname: Fanny roles: - transcriber description: "Nous avons choisi de travailler sur la correspondance active de Hector\ \ Berlioz adressée à sa sœur Anne-Marguerite \"Nanci\" Berlioz. L’ensemble des\ \ lettres adressées à Nanci Berlioz représentait un volume trop important pour\ \ notre projet, aussi nous les avons sélectionnées, par souci de cohérence, selon\ \ un ordre chronologique (voir le tableau de gestion pour la liste exacte des\ \ lettres transcrites).\n" language: - fra script: - iso: Latn script-type: only-manuscript time: notBefore: '1823' notAfter: '1844' hands: count: '1' precision: exact license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML citation-file-link: https://raw.githubusercontent.com/PSL-Chartes-HTR-Students/TNAH-2021-Projet-Correspondance-Berlioz/main/CITATION.cff transcription-guidelines: "**Orthographe :** - Aucune modification opérée sur l'orthographe,\ \ même en présence de fautes. - L'orthographe ancienne est laissée telle quelle.\ \ - Aucune restitution des accents manquants. Aucune correction des accents fautifs.\ \ Restitution de la bonne graphie de l'accent, lorsque nous considérons qu'il\ \ y a une variation de la graphie de celui-ci à cause de la rapidité d'écriture.\ \ - Aucune restitution des traits d'union manquants. 
- Séparation des mots\ \ collés dès lors que la ligature entre ces mots semble due à la rapidité de l'écriture.\n\ **Abréviations :** - Aucune résolution d'abréviation. - Utilisation du symbole\ \ monétaire de la livre tournois → **₶** (Unicode U+20B6).\n**Mots en exposant\ \ :** - Restitution seulement du mot sans le mettre en exposant.\n**Majuscules\ \ et minuscules :** - Aucune restitution des majuscules, même lorsqu'elles sont\ \ absentes en début de phrase ou de nom propre.\n**Ponctuation :** - Aucune restitution\ \ de la ponctuation manquante. Aucune correction de la ponctuation fautive. -\ \ Emploi du tiret cadratin (—, unicode U+2014) de part et d'autre d'une incise.\ \ - Emploi du tiret demi-cadratin (–, unicode U+2013) pour marquer le changement\ \ d’interlocuteur dans les dialogues et devant les éléments des listes/ énumérations.\n" volume: - metric: characters count: 13474 - metric: files count: 16 - metric: lines count: 367 - metric: regions count: 64 production-software: Unknown [Automatically filled] automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Ceard, Lien and Lebreton, Fanny and\ \ Sajdak, Cécile},\ndoi = {10.5281/zenodo.6126475},\nmonth = {1},\ntitle = {Projet\ \ Correspondance Berlioz},\nurl = {https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-Projet-Correspondance-Berlioz},\n\ year = {2022}\n}\n" _apa: "Ceard L., Lebreton F., Sajdak C. (2022). Projet Correspondance Berlioz (version\ \ 1.0). 
DOI: 10.5281/zenodo.6126475 URL: https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-Projet-Correspondance-Berlioz\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Projet Notre-Dame url: https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-Projet-Notre-Dame project-name: "ENC - Bonnes pratiques du developpement collaboratif\n" authors: - name: Doat surname: Soline roles: - transcriber - name: Menu surname: Ariane roles: - transcriber - name: Falcoz surname: Elsa roles: - transcriber - name: Faure surname: Margaux roles: - transcriber - name: Mazoué surname: Anaïs roles: - transcriber description: "Le Projet Notre-Dame consiste en une transcription des journaux quotidiens\ \ de l’année 1860 (https://mediatheque-patrimoine.culture.gouv.fr/sites/mediatheque/files/jnd_1860.pdf)\ \ des travaux de restauration effectués de 1844 à 1865 à la cathédrale Notre-Dame\ \ de Paris sous la direction d'Eugène Viollet-le-Duc et Jean-Baptiste Lassus.\ \ Celle-ci a été effectuée sur eScriptorium à partir de la numérisation des journaux\ \ des travaux (https://mediatheque-patrimoine.culture.gouv.fr/travaux-de-notre-dame-de-paris-1844-1865)\ \ réalisée par la Médiathèque de l'architecture et du patrimoine. \n" language: - fra script: - iso: Latn script-type: only-manuscript time: notBefore: '1860' notAfter: '1860' hands: count: '1' precision: exact license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML citation-file-link: https://raw.githubusercontent.com/PSL-Chartes-HTR-Students/TNAH-2021-Projet-Notre-Dame/main/CITATION.cff transcription-guidelines: "- respect des majuscules et minuscules - respect des\ \ ligatures (par exemple, transcrire \"chœur\") - mot qui est barré : 难 (une seule\ \ fois par mot) mais seulement s'ils sont totalement/à moitié illisibles. Les\ \ retranscrire entre accolades {} s'ils sont lisibles. 
- Pour mettre en exergue\ \ les doutes de transcription : \n - mot incertain: [incertain]\n - mot\ \ que l'on ne parvient pas à transcrire : [??]\n" volume: - metric: characters count: 29286 - metric: files count: 12 - metric: lines count: 735 - metric: regions count: 86 production-software: Unknown [Automatically filled] automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Doat, Soline and Falcoz, Elsa and\ \ Faure, Margaux and Mazoué, Anaïs and Menu, Ariane},\ndoi = {10.5281/zenodo.6126491},\n\ month = {1},\ntitle = {Projet Notre-Dame},\nurl = {https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-Projet-Notre-Dame},\n\ year = {2022}\n}\n" _apa: "Doat S., Falcoz E., Faure M., Mazoué A., Menu A. (2022). Projet Notre-Dame\ \ (version 1.0). DOI: 10.5281/zenodo.6126491 URL: https://github.com/PSL-Chartes-HTR-Students/TNAH-2021-Projet-Notre-Dame\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: FoNDUE-GasparoSardiToponomasia-Dataset url: https://github.com/PaulineJac/GasparoSardiToponomasia/tree/main/HTR authors: - name: Jacsont surname: Pauline roles: - transcriber - quality-control - digitization - name: Mittenhuber surname: Florian institutions: [] description: >- Dataset produced for the project to edit Gasparo Sardi’s Toponomasia from codex 174 of the Burgerbibliothek of Bern. Images are available on request by writing to: pauline.jacsont [ at ] unige.ch. 
project-name: FoNDUE language: - lat production-software: eScriptorium + Kraken script: - iso: Latn - iso: Grek script-type: only-manuscript time: notBefore: '1561' notAfter: '1570' hands: count: '1' precision: exact license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML sources: - reference: '' link: http://katalog.burgerbib.ch/detail.aspx?ID=340662 volume: - metric: pages count: 49 citation-file-link: >- https://github.com/PaulineJac/GasparoSardiToponomasia/blob/main/HTR/CITATION.cff transcription-guidelines: " The transcriptions were made following the rules of\ \ the GitHub cremma-medieval repository - https://github.com/HTR-United/cremma-medieval.\ \ The transcription is strictly diplomatic and graphematic. No abbreviations are\ \ resolved, no standardization of 'i' and 'v' with ramist letters, and accents,\ \ punctuation, spaces, and line breaks are strictly adhered to. Following Leiden\ \ conventions, crossed-out elements are transcribed with double\ \ brackets ⟦⟧, and elements that are illegible in the picture will not be restored\ \ but indicated by this type of bracket ⟨ ⟩. Special characters are encoded according\ \ to the MUFI fonts." automatically-aligned: false - authors: - name: Dubois roles: - project-manager surname: Alain - name: Clérice roles: - project-manager - quality-control surname: Thibault - name: Rudaz roles: - transcriber surname: Clémence - name: Schlaeppi roles: - transcriber surname: Darius - name: Mamie roles: - transcriber surname: Delphine - name: Schmied roles: - support surname: Marie-Caroline characters: members: - e - '1' - a - i - r - l - n - s - t - o - u - '8' - c - / - h - '"' - d - '2' - m - M - b - f - g - V - '3' - '6' - '4' - '5' - F - J - p - '7' - v - A - S - '0' - ̧ - ̀ - ́ - z - y - C - B - '9' - D - L - . - W - P - G - E - T - ̶ - R - H - N - O - ̈ - x - I - K - k - w - ° - q - '-' - j - ̂ - '?' 
- Z - "'" - _ - ^ - ̵ - X - U - ( - ) - '=' - ',' - Q - ':' - < - '>' - œ - '!' - '&' - '[' - ']' - ᗅ - ¨ - '*' - § - '}' - \ - + - '#' mode: NFD citation-file-link: https://raw.githubusercontent.com/PonteIneptique/valais-recensement/main/CITATION.CFF description: Ensemble de formulaires de recensement format: Alto-XML hands: count: 1-per-file precision: exact institutions: - name: Archives du Valais roles: - digitization language: - fra - deu license: - name: CC-BY-NC 4.0 url: https://creativecommons.org/licenses/by-nc/4.0/ production-software: eScriptorium + Kraken project-name: Valais Time Machine project-website: https://www.timemachinevs.ch/ schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: only-manuscript time: notAfter: '1890' notBefore: '1870' title: Recensement Valaisan (Valais Time Machine) transcription-guidelines: "- Superscripts are transcribed with a ^ before the string.\n\ - Transcription is faithful: nothing is corrected.\n- Checkmarks in tables are\ \ transcribed as `/`. Some checkmark-looking characters can be transcribed as\ \ `1` if the 1 in the dates looks the same\n- The printed part of the form is not\ \ transcribed.\n- Only `Col` and `Header` regions are used for table segmentation.\ \ If a Signature is at the bottom, we also use `Signature`" url: https://github.com/PonteIneptique/valais-recensement volume: - count: 282260 metric: characters - count: 915 metric: files - count: 59368 metric: lines - count: 34083 metric: regions automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Alain, Dubois and Clérice, Thibault\ \ and Mamie, Delphine and Darius, Schlaeppi and Rudaz, Clémence and Schmied, Marie-Caroline},\n\ title = {Tables du recensement du Valais},\nurl = {https://github.com/PonteIneptique/valais-recensement}\n\ }\n" _apa: "Alain D., Clérice T., Mamie D., Darius S., Rudaz C., Schmied M. 
Tables du\ \ recensement du Valais URL: https://github.com/PonteIneptique/valais-recensement\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: HTR - Araucania manuscript XIX url: https://github.com/Proyecto-Ocupacion-Araucania-UChile/HTR_Araucania_XIX authors: - name: Humeau surname: Maxime - name: Chiaretti surname: Alessandro institutions: - name: Archivo Central Andres Bello description: "Ground Truth dataset for Spanish 19th typewritten OCR. \nThe archives\ \ come from the events of the Occupation of Araucania (1850-1881) in Chile. They\ \ are archived in the ’Colección manuscritos' of the Archivo Central Andres Bello\ \ - Universidad de Chile." language: - spa production-software: eScriptorium + Kraken script: - iso: Latn script-type: mainly-manuscript time: notBefore: '1859' notAfter: '1877' hands: count: more-than-10 precision: estimated license: - name: CC-BY-SA 4.0 url: https://creativecommons.org/licenses/by-sa/4.0/ format: Alto-XML volume: - metric: characters count: 117155 - metric: files count: 180 - metric: lines count: 3932 - metric: regions count: 981 transcription-guidelines: "- xxx for erased or unreadable characters\n- ^+letters\ \ for superscript letters\n- ⁋ for new paragraph\n" characters: mode: NFD members: - e - a - o - n - s - r - i - d - l - u - t - c - m - p - q - b - ́ - g - . - h - ',' - ⁋ - v - '-' - f - y - S - C - '0' - ^ - A - j - U - '1' - z - x - D - M - ̃ - E - '2' - L - P - N - '8' - V - J - B - T - G - '6' - I - '5' - '3' - ':' - '9' - '4' - H - R - '7' - ; - O - “ - º - ” - F - Q - Y - ̄ - '*' - _ - '=' - $ - ( - '"' - ) - ¿ - / - ̀ - '?' - ̈ - ¡ - '!' 
- '{' - '~' - '}' - '&' - W - Z - ‘ - ’ - K - '[' - ']' automatically-aligned: false - authors: - name: Sonia orcid: 0009-0009-7367-048X roles: - transcriber - project-manager - quality-control surname: Solfrini - name: Simon orcid: 0000-0001-9094-4475 roles: - support surname: Gabay - name: Geneviève orcid: 0009-0006-5367-4262 roles: - transcriber - project-manager - quality-control surname: Gross - name: Pierre-Olivier orcid: 0009-0009-2475-6017 roles: - transcriber - quality-control surname: Beaulnes - name: Aurélia orcid: 0009-0009-9678-9811 roles: - transcriber - quality-control surname: Marques Oliveira - name: Daniela orcid: 0000-0002-2601-668X roles: - project-manager surname: Solfaroli Camillocci characters: members: - e - s - u - a - i - n - t - r - o - l - c - d - p - m - . - ',' - f - q - g - ̃ - y - b - h - / - z - ⁊ - ¬ - ':' - C - D - x - E - I - P - L - S - '1' - A - M - Q - '2' - U - '?' - '3' - N - T - '4' - O - ͥ - B - R - ꝰ - H - '6' - '5' - ͬ - G - '8' - F - ( - ) - '0' - '9' - ¶ - '7' - ◊ - ꝓ -   - ꝑ - ᑕ - V - '-' - Y - ; - ᗞ - J - k - ̀ - ꝯ - Z - v mode: NFD citation-file-link: https://github.com/SETAFDH/HTR-SETAF-Jean-Michel/blob/main/CITATION.cff description: >- OCR data for the SETAF project, 16th-century French prints in Gothic characters. format: Alto-XML hands: count: '1' precision: exact language: - fra license: type: CC-BY version: 4.0 production-software: eScriptorium + Kraken project-name: FoNDUE project-website: >- https://www.unige.ch/lettres/humanites-numeriques/recherche/projets-de-la-chaire/fondue schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: only-typed time: notAfter: '1600' notBefore: '1500' title: HTR-SETAF-Jean-Michel transcription-guidelines: >- Our data follow SegmOnto segmentation standards (https://segmonto.github.io). Our transcription guidelines follow a graphematic approach, without regularisation. We keep the original punctuation and abbreviations. 
A detailed presentation of our rules is available on HAL (https://hal.science/hal-04281804). url: https://github.com/SETAFDH/HTR-SETAF-Jean-Michel volume: - count: 286256 metric: characters - count: 404 metric: files - count: 11778 metric: lines - count: 1365 metric: regions - authors: - name: Sonia orcid: 0009-0009-7367-048X roles: - transcriber - project-manager - quality-control surname: Solfrini - name: Simon orcid: 0000-0001-9094-4475 roles: - support surname: Gabay - name: Geneviève orcid: 0009-0006-5367-4262 roles: - transcriber - project-manager - quality-control surname: Gross - name: Pierre-Olivier orcid: 0009-0009-2475-6017 roles: - transcriber - quality-control surname: Beaulnes - name: Aurélia orcid: 0009-0009-9678-9811 roles: - transcriber - quality-control surname: Marques Oliveira - name: Daniela orcid: 0000-0002-2601-668X roles: - project-manager surname: Solfaroli Camillocci characters: members: - e - s - u - i - a - t - n - r - o - l - c - d - p - m - / - . - q - f - ⁊ - ̃ - y - g - h - z - b - ¬ - x - I - ':' - C - E - P - '1' - ¶ - L - D - S - '2' - A - ꝰ - M - R - ͥ - '3' - N - '?' - '4' - Q - T - '6' - ͬ - '7' - H - '8' - '5' - '0' - '9' - U - B - G - O - F - ) - ( - ꝑ - ꝓ - v - J - '-' - ꝫ - ł - ꝯ - Z - k - K - ᗅ - ð - ꝗ - ̈ - ◊ - ',' - V - ᑕ - j mode: NFD citation-file-link: https://github.com/SETAFDH/HTR-SETAF-LesFaictzJCH/blob/main/CITATION.cff description: >- OCR data for the SETAF project, 16th-century French prints in Gothic characters. 
format: Alto-XML hands: count: '1' precision: exact language: - fra license: type: CC-BY version: 4.0 production-software: eScriptorium + Kraken project-name: FoNDUE project-website: >- https://www.unige.ch/lettres/humanites-numeriques/recherche/projets-de-la-chaire/fondue schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: only-typed time: notAfter: '1600' notBefore: '1500' title: HTR-SETAF-LesFaictzJCH transcription-guidelines: >- Our data follow SegmOnto segmentation standards (https://segmonto.github.io). Our transcription guidelines follow a graphematic approach, without regularisation. We keep the original punctuation and abbreviations. A detailed presentation of our rules is available on HAL (https://hal.science/hal-04281804). url: https://github.com/SETAFDH/HTR-SETAF-LesFaictzJCH volume: - count: 205523 metric: characters - count: 144 metric: files - count: 4765 metric: lines - count: 485 metric: regions - authors: - name: Sonia orcid: 0009-0009-7367-048X roles: - transcriber - project-manager - quality-control surname: Solfrini - name: Simon orcid: 0000-0001-9094-4475 roles: - support surname: Gabay - name: Geneviève orcid: 0009-0006-5367-4262 roles: - transcriber - project-manager - quality-control surname: Gross - name: Pierre-Olivier orcid: 0009-0009-2475-6017 roles: - transcriber - quality-control surname: Beaulnes - name: Aurélia orcid: 0009-0009-9678-9811 roles: - transcriber - quality-control surname: Marques Oliveira - name: Daniela orcid: 0000-0002-2601-668X roles: - project-manager surname: Solfaroli Camillocci characters: members: - e - s - u - i - a - t - r - n - o - l - c - d - p - . - m - ̃ - / - q - f - y - g - h - b - ⁊ - z - x - ¬ - ':' - C - I - E - D - '1' - P - L - A - M - S - '2' - ¶ - ͥ - '3' - T - N - Q - '4' - ͬ - '5' - R - U - '?' 
- ꝰ - '6' - O - '0' - H - ',' - G - B - ( - ) - '8' - '7' - '9' - ꝓ - F - ꝑ - ̈ - ł -   - ꝯ - '-' - ◊ - ð - ꝝ - "'" - ̀ - k - v - Z - K - ́ - Y - V - X - J - ꝫ - w - ; - ꝗ - ̇ - ̌ mode: NFD citation-file-link: https://github.com/SETAFDH/HTR-SETAF-Pierre-de-Vingle/blob/main/CITATION.cff description: >- OCR data for the SETAF project, 16th-century French prints in Gothic characters. format: Alto-XML hands: count: '1' precision: exact language: - fra license: type: CC-BY version: 4.0 production-software: eScriptorium + Kraken project-name: FoNDUE project-website: >- https://www.unige.ch/lettres/humanites-numeriques/recherche/projets-de-la-chaire/fondue schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: only-typed time: notAfter: '1600' notBefore: '1500' title: HTR-SETAF-Pierre-de-Vingle transcription-guidelines: >- Our data follow SegmOnto segmentation standards (https://segmonto.github.io). Our transcription guidelines follow a graphematic approach, without regularisation. We keep the original punctuation and abbreviations. A detailed presentation of our rules is available on HAL (https://hal.science/hal-04281804). url: https://github.com/SETAFDH/HTR-SETAF-Pierre-de-Vingle volume: - count: 648218 metric: characters - count: 895 metric: files - count: 24701 metric: lines - count: 2752 metric: regions - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Moonshines url: https://github.com/alix-tz/moonshines authors: - name: Alix surname: Chagué orcid: 0000-0002-0136-4434 roles: - transcriber - aligner - project-manager - digitization institutions: [] description: This dataset is composed of pages of text written in 2023 by a single person, copying texts taken from Guillaume Apollinaire's poems published in Alcools, and taken from Guillaume Apollinaire's Wikipedia page. 
language: - fra production-software: eScriptorium + Kraken script: - iso: Latn script-type: only-manuscript time: notBefore: '2023' notAfter: '2023' hands: count: '1' precision: exact license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML volume: - metric: characters count: 27734 - metric: files count: 45 - metric: lines count: 1016 - metric: regions count: 45 citation-file-link: https://github.com/alix-tz/moonshines/blob/master/CITATION.cff transcription-guidelines: The transcription strictly follows what is written on the images, including accentuation or capitalization errors. The segmentation follows the SegmOnto ontology and mostly relies on MainZone and DefaultLine. Beware that this dataset barely contains any ponctuation and that most lines begin with a capital letter. characters: mode: NFD members: - e - s - a - n - r - i - t - u - o - l - d - m - c - p - ́ - "'" - v - g - b - h - ̀ - f - L - q - E - '1' - A - C - x - y - ̂ - S - '9' - P - M - j - T - D - '-' - N - J - R - '0' - z - O - I - '2' - '8' - V - F - G - U - '5' - B - Q - ) - H - '3' - ( - '7' - '6' - w - k - '4' - ̧ - K - Z - ̈ - Y - '{' - '}' - W - . - X - ',' automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Chagué, Alix},\ndoi = {0.5281/zenodo.607720783},\n\ month = {2},\ntitle = {moonshines},\nurl = {https://github.com/alix-tz/moonshines},\n\ year = {2023}\n}\n" _apa: "Chagué A. (2023). moonshines (version 2.0.0). DOI: 0.5281/zenodo.607720783\ \ URL: https://github.com/alix-tz/moonshines\n" - authors: - name: Alix orcid: 0000-0002-0136-4434 roles: - transcriber - aligner - quality-control surname: Chagué - name: Pascal roles: - project-manager surname: Dubourg Glatigny - name: Gilles roles: - transcriber surname: Pérez characters: members: - e - s - a - t - n - i - r - u - o - l - d - m - c - p - E - ',' - ́ - . 
- v - A - f - ’ - I - S - N - g - q - R - T - O - ̀ - '-' - b - L - h - U - C - j - '1' - D - M - P - '"' - x - '2' - ̂ - V - y - H - '3' - J - '9' - '4' - B - G - ( - F - '0' - ) - K - '7' - '5' - ']' - '?' - '8' - ':' - '[' - '6' - Q - ̧ - z - k - Y - / - ; - Z - X - ° - '#' - ^ - '=' - ⋎ - → - ̈ - '!' - '{' - w - W - + - ̆ - '*' - '%' - '>' - < - '~' mode: NFD description: Ground Truth for the Digital Peraire project. format: Alto-XML hands: count: '1' precision: exact institutions: - name: Azentis roles: - digitization language: - fra license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ production-software: eScriptorium + Kraken project-name: Digital Peraire schema: https://htr-united.github.io/schema/2023-06-27/schema.json script: - iso: Latn script-type: only-manuscript time: notAfter: '1990' notBefore: '1928' title: Peraire Ground Truth transcription-guidelines: Les mots barrés sont transcrits par "><". Les textes suscrits ne sont pas signalés. Ce qui est écrit est transcrits. S'il y a des incertitutes, la ligne est laissée vide. La segmentation de certains documents ne convient pas pour l'entraînement d'un modèle de segmentation. L'ontologie SegmOnto a été utilisée. Quand les mots ajoutés sont insérés par un '⋎', ce graphème est transcrit par un ⋎. url: https://github.com/alix-tz/peraire-ground-truth volume: - count: 97505 metric: characters - count: 67 metric: files - count: 2307 metric: lines - count: 151 metric: regions automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Chagué, Alix and Pérez, Gilles},\n\ doi = {10.5281/zenodo.7185907},\nmonth = {6},\ntitle = {Peraire Ground Truth},\n\ url = {https://github.com/alix-tz/peraire-ground-truth},\nyear = {2023}\n}\n" _apa: "Chagué A., Pérez G. (2023). Peraire Ground Truth (version 2.0.0). 
DOI: 10.5281/zenodo.7185907\ \ URL: https://github.com/alix-tz/peraire-ground-truth\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: RASAM url: https://github.com/calfa-co/rasam-dataset project-website: https://calfa.fr/blog/26 authors: - name: Vidal-Gorène surname: Chahan roles: - project-manager - name: Lucas surname: Noëmie roles: - project-manager - quality-control - name: Salah surname: Clément roles: - transcriber - quality-control - name: Decours-Perez surname: Aliénor roles: - support - name: Dupin surname: Boris roles: - support description: "The Dataset is made up of 300 images, with their related ground truth\ \ stored in a XML file (pageXML format). Images come from three manuscripts selected\ \ among the collections of the BULAC Library (Paris). It covers a representative\ \ part of the handwritten production in Arabic Maghrebi scripts and includes an\ \ annotation of the layout (TextRegions, baselines and polygons) and the transcription\ \ of the main text. This dataset is the result of a collaborative transcription.\ \ All the participants are credited on the official deposit. With the support\ \ of the French Ministry of Higher Education, Research and Innovation, the Research\ \ Consortium Middle-East and Muslim Worlds (GIS MOMM), Calfa and the BULAC library.\n" language: - ara script: - iso: Arab script-type: only-manuscript time: notBefore: '1700' notAfter: '1899' hands: count: less-than-11 precision: exact license: - name: Apache-2.0 License url: https://www.apache.org/licenses/LICENSE-2.0 format: Page-XML volume: - metric: pages count: 300 - count: 7540 metric: lines - count: 300 metric: files - count: 676 metric: regions - count: 403034 metric: characters sources: - reference: Vidal-Gorène, C., Lucas, N., Salah, C., Decours-Perez, A., & Dupin, B. (2021, September). RASAM–A Dataset for the Recognition and Analysis of Scripts in Arabic Maghrebi. In International Conference on Document Analysis and Recognition (pp. 
265-281). Springer, Cham link: https://link.springer.com/chapter/10.1007/978-3-030-86198-8_19 transcription-guidelines: "Full description of specifications for transcription\ \ available on GitHub and in the paper.\n" production-software: Calfa Vision automatically-aligned: false - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: TariMa url: https://github.com/calfa-co/tarima authors: - name: Antoine surname: Perrier orcid: 0000-0002-5035-4283 roles: - project-manager institutions: - name: BULAC roles: - project-manager description: >- The dataset has been collated within the frame of the TariMa project (Tarih al-Maghrib. Writing History in the Maghreb in the modern and contemporary era), sponsored by the French agency Collex-Persee and supervised by Antoine Perrier (CNRS). It comprises different image resolutions and sizes (width from 982px to 8049px), different layouts (double page, multiple columns), and different states of conservation. It also mixes microfilms, scans and lithographs. It presents a very wide variety representative of the Maghrebi Arabic production. project-website: https://www.collexpersee.eu/projet/tarima/ language: - ara production-software: Calfa Vision script: - iso: Arab qualify: Maghrebi script-type: mainly-manuscript time: notBefore: '1500' notAfter: '1899' hands: count: more-than-10 precision: estimated license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Page-XML sources: - reference: '' link: https://github.com/calfa-co/tarima volume: - metric: files count: 120 - metric: lines count: 2673 - metric: characters count: 146667 transcription-guidelines: >- We follow the RASAM guidelines for the transcription of Arabic Maghrebi manuscripts. 
automatically-aligned: false - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: OCR17plus url: https://github.com/e-ditiones/OCR17plus project-name: E-ditiones project-website: https://e-ditiones.huma-num.fr/ authors: - name: Gabay surname: Simon roles: - transcriber - project-manager - support - name: Jahan surname: Claire roles: - transcriber - aligner description: Imprimés classiques language: - frm script: - iso: Latn script-type: only-typed time: notBefore: '1600' notAfter: '1700' hands: count: 1-per-folder precision: exact license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML volume: - count: 25628 metric: lines - count: 965 metric: files - count: 3923 metric: regions - count: 686335 metric: characters production-software: Transkribus automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Jahan, Claire and Gabay, Simon},\n\ doi = {none},\nmonth = {7},\ntitle = {OCR17+ - Layout analysis and text recognition\ \ for 17th c. French prints},\nurl = {https://github.com/e-ditiones/OCR17plus},\n\ year = {2021}\n}\n" _apa: "Jahan C., Gabay S. (2021). OCR17+ - Layout analysis and text recognition\ \ for 17th c. French prints (version 1.0). 
DOI: none URL: https://github.com/e-ditiones/OCR17plus\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: GenAuto TD Corpus url: https://github.com/jpmjpmjpm/genauto-td-htr.git project-name: GenAuto project-website: '' authors: - name: Boutet surname: Jean-François roles: - transcriber - aligner - name: Merx surname: Jean-Pierre roles: - transcriber - aligner - project-manager description: "150 transcribed images from \"Tables Décennales\" French Civil Registry.\ \ Those come from Sermaises and Romilly-sur-Seine municipalities.\n" language: - fra script: - iso: Latn script-type: only-manuscript time: notBefore: '1792' notAfter: '1902' hands: count: less-than-11 precision: estimated license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML volume: - count: 300 metric: pages - count: 150 metric: images - count: 150 metric: files - count: 186366 metric: characters - count: 21557 metric: lines - count: 608 metric: regions production-software: eScriptorium + Kraken automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Boutet, Jean-François and Merx, Jean-Pierre},\n\ doi = {10.5281/zenodo.5507403},\nmonth = {9},\ntitle = {GenAuto TD Corpus},\n\ url = {https://github.com/jpmjpmjpm/genauto-td-htr.git},\nyear = {2021}\n}\n" _apa: "Boutet J., Merx J. (2021). GenAuto TD Corpus (version 1.0.0). 
DOI: 10.5281/zenodo.5507403\ \ URL: https://github.com/jpmjpmjpm/genauto-td-htr.git\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Joseph Hooker HTR url: https://github.com/jschaefer738b/JosephHookerHTR.git authors: - name: John surname: Schaefer orcid: 0009-0006-5751-9323 roles: - transcriber - project-manager - quality-control - support - name: Kiri surname: Ross-Jones roles: - support - name: Alexis surname: Litvine roles: - support institutions: - name: Royal Botanic Gardens, Kew - name: University of Cambridge description: >- XML transcriptions and JPEG images exported from Transkribus as ground truth for an eScriptorium-Kraken HTR model (CER 11-12%) trained on the correspondence of Joseph Dalton Hooker (1817-1911), primarily letters to William Turner Thiselton-Dyer (1843-1928) during the late-19th/early-20th century. Many transcriptions in this dataset were generated by a small team of anonymous volunteers as part of the Joseph Hooker Correspondence Project based at Kew Gardens. All images in this dataset are reproduced with the kind permission of the Board of Trustees of the Royal Botanic Gardens Kew (© RBG, Kew). Contact archives@kew.org for more information. HTR Model: Schaefer, John, & Litvine, Alexis. (2023). Joseph Hooker HTR Model. Zenodo. https://doi.org/10.5281/zenodo.8038689 project-name: Joseph Hooker Correspondence Project project-website: >- https://www.kew.org/science/our-science/projects/joseph-hooker-correspondence-project language: - eng production-software: Transkribus script: - iso: Latn script-type: only-manuscript time: notBefore: '1850' notAfter: '1911' hands: count: '1' precision: estimated license: - name: CC-BY-SA 4.0 url: https://creativecommons.org/licenses/by-sa/4.0/ format: Page-XML volume: - metric: lines count: 7100 - metric: files count: 337 - metric: pages count: 337 transcription-guidelines: >- All horizontal lines in Hooker's hand were transcribed as originally written. 
Most typescript and vertical lines in the margins were not included. automatically-aligned: false - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Eutyches url: https://github.com/malamatenia/Eutyches authors: - name: Vlachou Efstathiou surname: Malamatenia roles: - transcriber - aligner - project-manager institutions: [] description: >- Ground truth for Caroline minuscule of the late 9th century from the grammatical work "de uerbo" of Eutychès. project-name: Eutyches grammaticus glossed language: - lat - grc production-software: eScriptorium + Kraken script: - iso: Latn qualify: Minuscule Caroline script-type: only-manuscript time: notBefore: '850' notAfter: '900' hands: count: less-than-11 precision: estimated license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML sources: - reference: Codices Vossiani Latini, Brill, VLO41 link: >- https://primarysources.brillonline.com/browse/vossiani-latini/vlo-041-eutyches-grammaticalia-isidorus-alphabeta volume: - metric: pages count: 65 citation-file-link: https://github.com/malamatenia/Eutyches/blob/main/CITATION.cff transcription-guidelines: >- Graphematic transcription, following the guidelines of CREMMA-medieval. Spacing has been reestablished when dealing with semicontinua, s for long s, faithful to the manuscript for capital letters, abbreviations preserved, punctuation reduced to ";" and ".". The few Greek passages have also been preserved, and some of the essais de plume as well (when forming full words). Annotation of the layout made with SegmOnto controlled vocabulary. automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Vlachou-Efstathiou, Malamatenia},\n\ title = {Eutyches \"de uerbo\" glossed}\n}\n" _apa: "Vlachou-Efstathiou M.
Eutyches \"de uerbo\" glossed\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Shakespeare-Scott translations url: https://github.com/millawell/ocr-data project-name: "Publishing an OCR ground truth data set for reuse in an unclear copyright\ \ setting'\n" project-website: https://github.com/millawell/ocr-data authors: - name: Lassner surname: David - name: Coburger surname: Julius - name: Neudecker surname: Clemens - name: Baillot surname: Anne description: "Ground truth data in German and English of Shakespeare and Scott prints\ \ in original and different translations. \n" language: - eng - deu script: - iso: Latn - iso: Latf script-type: only-typed time: notBefore: '1815' notAfter: '1852' hands: count: unknown precision: exact license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML volume: - metric: lines count: 5354 - metric: files count: 131 - metric: regions count: 131 - metric: characters count: 192264 sources: - reference: '' link: https://zfdg.de/sb005_006 citation-file-link: https://github.com/millawell/ocr-data/blob/master/citation.cff production-software: eScriptorium + Kraken automatically-aligned: false - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Paris Bible Project (PBP) url: https://github.com/parisbible/ground_truth authors: - name: Estelle surname: Guéville orcid: 0000-0003-2603-1051 roles: - transcriber - aligner - project-manager - quality-control - name: David surname: Wrisley orcid: 0000-0002-0355-1487 roles: - transcriber - aligner - project-manager - quality-control - name: Niccolò Acram surname: Cappelletto roles: - transcriber - aligner - quality-control institutions: [] description: >- The Paris Bible Project aims to understand the production and diffusion of medieval Latin Bibles in Europe. The dataset includes ground truth from Paris Bibles produced in the 13th and 14th centuries. 
We also provide the most recent version of our list of Paris Bible manuscripts found in the world along with information about them. project-website: https://parisbible.github.io/ language: - lat production-software: Transkribus script: - iso: Latn script-type: only-manuscript time: notBefore: '1200' notAfter: '1399' hands: count: more-than-10 precision: estimated license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML volume: - metric: lines count: 1700 - metric: files count: 19 - metric: regions count: 40 - metric: characters count: 55970 characters: mode: NFKD members: - i - e - t - u - a - s - o - n - ̄ - c - m - r - l - ꝺ - . - p - b - q - ⁊ - g - f - ́ - ꝛ - h - '-' - d - ꝫ - ; - x - ꝯ - ̾ - ꝑ - ͥ - E - ̕ - ꝝ - ̃ - ꝓ - y - ̈ - N - ̇ - Q - · - D - S - I - A - ͦ - C - T - ᔆ - ꝙ - H - F - P - ͣ - '2' - V - M - ':' - R - z - L - O - U - v - ℟ - G - ͨ - ͧ - '&' - ẜ - ᷤ - ͤ - ʀ - B - X - Ꝙ - '?' - k - ᣳ - j - ͬ transcription-guidelines: 'See: https://parisbible.github.io/guidelines/' automatically-aligned: false _bibtex: "@misc{YourReferenceHere,\nauthor = {Guéville, Estelle and Wrisley, David\ \ Joseph},\ndoi = {10.5281/zenodo.7653691},\nmonth = {10},\ntitle = {Ground Truth\ \ Used in HTR for the Paris Bible Project},\nyear = {2021}\n}\n" _apa: "Guéville E., Wrisley D.J. (2021). Ground Truth Used in HTR for the Paris\ \ Bible Project (version 1.0.0). 
DOI: 10.5281/zenodo.7653691\n" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Bullinger HTR Dataset url: https://github.com/pstroe/bullinger-htr authors: - name: Phillip Benjamin surname: Ströbel orcid: 0000-0003-2063-5495 roles: - aligner - support - name: Tobias surname: Hodel orcid: 0000-0002-2071-6407 roles: - aligner - project-manager - name: Christian surname: Sieber orcid: 0000-0002-9364-6921 roles: - digitization - name: Patricia surname: Scheurer roles: - quality-control - support - name: David Selim surname: Schoch orcid: 0000-0002-9936-8459 roles: - aligner - name: Anna surname: Janka roles: - aligner - name: Raphael surname: Schwitter roles: - aligner - name: Beat surname: Wolf roles: - aligner - name: Jonas surname: Widmer roles: - aligner - name: Peter surname: Rechsteiner roles: - quality-control - support - name: Raphael surname: Müller roles: - quality-control - digitization - support institutions: [] description: >- This dataset contains 165,673 image and corresponding text line files (.png for images and .txt for the texts) in a random 80/10/10 training, validation and test set split. The source is the extensive correspondence of Swiss reformer Heinrich Bullinger (1504-1575) and his over 800 different correspondents. It therefore contains great variety in handwriting styles. Furthermore, it is multilingual since there are Latin and Early New High German (and sometimes mixed) letters. The data is split into Latin and Early New High German (determined with langid) and put into separate folders (de for Early New High German and la for Latin). 
project-website: https://www.bullinger-digital.ch/ language: - lat - deu production-software: Transkribus, own script: - iso: Latn script-type: only-manuscript time: notBefore: '1523' notAfter: '1575' hands: count: more-than-10 precision: estimated license: name: CC-BY-SA 4.0 url: https://creativecommons.org/licenses/by-sa/4.0/ format: Image-Text-Pairs volume: - metric: lines count: 165673 automatically-aligned: true transcription-guidelines: Automated transcript alignment with Transkribus - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Caroline Minuscule by Rescribe url: https://github.com/rescribe/carolineminuscule-groundtruth project-name: "Rescribe'\n" project-website: https://rescribe.xyz/ authors: - name: White surname: Nick roles: - transcriber - project-manager - name: Clérice surname: Thibault roles: - aligner - name: Karaisl surname: Antonia roles: - transcriber - project-manager description: "This ground truth repository is a work in process; it currently accounts\ \ for a part of our complete Caroline Minuscule training pool of around 70 manuscripts\ \ used for our OCRopus Caroline Minuscule model (see ocropus-models repository).\n" language: - lat script: - iso: Latn script-type: only-manuscript time: notBefore: '800' notAfter: '1199' hands: count: 1-per-file precision: exact license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML volume: - metric: characters count: 17155 - metric: files count: 17 - metric: lines count: 457 - metric: regions count: 46 transcription-guidelines: "In general this meant deciding between diplomatic transcription\ \ (i.e. sticking to what it says on the page) and gently modernized features (i.e.\ \ reinterpreting medieval signs into modern equivalents) with a view to specific\ \ categories. 
Read on for a summary of the rules and the respective rationale\ \ behind them.\nSUMMARY\nPUNCTUATION\n\n Modern: medieval punctuation is transcribed\ \ with modern equivalents; punctus elevatus transcribed as semicolon\n\nCAPITALIZATION\n\ \n Diplomatic: Original capitalization retained\n\nABBREVIATIONS\n\n Diplomatic\ \ where possible: Retain abbreviations and render glyphs as opposed to expanded\ \ versions where possible\n \"*\" where original character isn't served: OCRopus\ \ (at the point in time of transcription) could not handle some of the medieval\ \ glyphs, even where a Unicode version was present. Abbreviations not in OCRopus\ \ are uniformly transcribed as \"*\", in the case of a combined character (such\ \ as a consonant with a macron) as the base character followed by \"*\" (e.g.\ \ \"t*\"). The list of accepted characters in OCRopus can be found in this repository,\ \ and downloaded and used as codec in the OCRopus training process.\n\nSPACING\n\ \n Diplomatic: Preserve manuscript spacing, i.e. give diplomatic transcription\n\ \nNUMBERS\n\n Diplomatic: retain original version of both Roman and Arabic\ \ numerals'" characters: mode: NFD members: - i - e - t - u - a - s - n - o - r - m - c - d - l - p - . - b - q - g - '*' - h - ; - ̃ - f - x - I - ̄ - E - N - ̨ - ':' - '&' - S - ꝑ - C - A - đ - D - U - T - ꝓ - Q - v - ',' - O - R - P - L - M - æ - H - F - '?' 
- '1' - y - ꝝ - ꝙ - V - '4' - B - z - '5' - X - '6' - ꝛ - / - "'" - '0' - '2' - '9' - K - '-' production-software: Unknown [Automatically filled] automatically-aligned: false - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Éditer la correspondance de Constance de Salm (1767-1845) url: https://github.com/sbiay/CdS-edition/tree/main/htr/verite-terrain authors: - name: Biay surname: Sébastien roles: - transcriber institutions: [] description: >- La correspondance de Constance de Salm (femme de lettres française) comprend différents spécimens d’écriture du début du XIXe siècle. Le jeu de données atteste les mains de quatre copistes différents. project-website: https://dhiha.hypotheses.org/2945 language: - fra production-software: eScriptorium + Kraken script: - iso: Latn script-type: only-manuscript time: notBefore: '1800' notAfter: '1825' hands: count: less-than-11 precision: estimated license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML sources: - reference: >- Salm, C. de (1767-1845). Correspondance. Société des Amis du Vieux Toulon et de sa Région, Fonds Salm. Archiv Schloss Dyck, fonds Constance de Salm. link: '' volume: - metric: lines count: 1754 transcription-guidelines: >- Usages scribaux respectés : abréviations, fautes, accentuation respectés. Allographes normalisés (s long). 
automatically-aligned: false - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: EpiSearch HTR url: https://github.com/vedph/episearch-htr authors: - name: Lorenzo surname: Calvelli orcid: 0000-0002-0920-9156 roles: - project-manager - name: Tatiana surname: Tommasi orcid: 0009-0000-2815-0113 roles: - transcriber - name: Federico surname: Boschetti orcid: 0000-0002-7810-7735 roles: - support institutions: [] description: Ground Truth for Astori’s letters (see the README.md file for details) project-name: EpiSearch project-website: https://github.com/vedph/episearch-htr language: - ita production-software: eScriptorium + Kraken script: - iso: Latn script-type: only-manuscript time: notBefore: '1705' notAfter: '1709' hands: count: '1' precision: exact license: - name: CC-BY-SA 4.0 url: https://creativecommons.org/licenses/by-sa/4.0/ format: Alto-XML volume: - metric: files count: 34 automatically-aligned: false - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: La Correspondances Jacques Doucet - René Jean url: https://gitlab.inha.fr/snr/LaCorrespondanceDoucetReneJean authors: - name: Cugy surname: Pascale roles: - transcriber - project-manager - quality-control - name: Fieschi surname: Caroline roles: - project-manager - quality-control - name: Peyrard surname: Alix roles: - transcriber - quality-control - name: Prohin surname: Lucie roles: - transcriber - quality-control - name: Sarda surname: Marie-Anne roles: - support institutions: - name: Institut National de l'histoire de l'art (INHA) roles: - transcriber - project-manager - quality-control - name: Bibliothèque nationale de France roles: - digitization description: >- Projet entrepris dans le cadre du programme La Bibliothèque d’art et d’archéologie de Jacques Doucet : corpus, savoirs et réseaux de l’Institut national d’histoire de l’art à partir d’un corpus de lettres et documents conservés au Département des manuscrits de la Bibliothèque nationale de France 
sous la cote NAF 13124, une des principales sources sur la relation entre Doucet et René Jean qu’il engagea comme bibliothécaire le 2 juin 1908. project-name: PENSE@INHA project-website: https://skylab.inha.fr/PENSE/LettresDeJacquesDoucetAReneJean1908-1929/ language: - fra production-software: Transkribus script: - iso: Latn script-type: mainly-manuscript time: notBefore: '1908' notAfter: '1929' hands: count: less-than-11 precision: exact license: - name: Etalab OL 2.0 url: https://spdx.org/licenses/etalab-2.0.html format: Alto-XML volume: - metric: characters count: 83312 - metric: lines count: 2987 - metric: pages count: 200 - metric: files count: 200 automatically-aligned: false - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Les Papiers Barye url: https://gitlab.inha.fr/snr/LesPapiersBarye authors: - name: Claass surname: Victor roles: - transcriber - project-manager - quality-control - name: Gain surname: Justine roles: - transcriber - quality-control - name: Martin-Vigier surname: Suzanne roles: - transcriber - quality-control institutions: - name: Institut National de l'histoire de l'art (INHA) roles: - transcriber - aligner - project-manager - quality-control - digitization description: >- Ensemble de documents autour du sculpteur Antoine-Louis Barye. Paris, Bibliothèque de l’Institut national d’histoire de l’art, collections Jacques Doucet, Archives 166. Institut National de l’Histoire de l’art (INHA) / Set of documents about the sculptor Antoine-Louis Barye. Paris, Library of the Institut national d'histoire de l'art, Jacques Doucet, Archives 166. 
National Institute of Art History (INHA) project-name: PENSE@INHA project-website: https://skylab.inha.fr/PENSE/LesPapiersBarye/ language: - fra production-software: Transkribus script: - iso: Latn script-type: mainly-manuscript time: notBefore: '1819' notAfter: '1914' hands: count: more-than-10 precision: exact license: - name: Etalab OL 2.0 url: https://spdx.org/licenses/etalab-2.0.html format: Alto-XML volume: - metric: characters count: 362629 - metric: lines count: 17880 - metric: pages count: 918 - metric: files count: 918 automatically-aligned: false - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Ground truth for Neue Zürcher Zeitung black letter period url: https://zenodo.org/record/3333627#.YhN1G1vMLUQ project-name: "impresso'\n" project-website: https://impresso-project.ch/ authors: - name: Ströbel surname: Phillip Benjamin roles: - transcriber - aligner - project-manager - quality-control - support - name: Clematide surname: Simon roles: - transcriber - quality-control - name: Watter surname: Camille roles: - transcriber - name: Meraner surname: Isabell roles: - transcriber description: "The Neue Zürcher Zeitung (NZZ) has been publishing in black letter\ \ from its very first issue in 1780 until 1947. From this time period, we randomly\ \ sampled one frontpage per year, resulting in a total of 167 pages. We chose\ \ frontpages because they typically contain highly relevant material and because\ \ we want to make sure not to sample pages containing exclusively advertisements\ \ or stock information. During certain periods, the NZZ was published several\ \ times a day, and there were supplements, too. Due to incomplete metadata, the\ \ sampling included frontpages from supplements. 
We then manually corrected the\ \ pages, so they can be used as a ground truth to improve the OCR of black letter\ \ in historical newspapers.\n" language: - deu script: - iso: Latn script-type: only-typed time: notBefore: '1780' notAfter: '1946' hands: count: less-than-11 precision: estimated license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML volume: - count: 43173 metric: lines - count: 167 metric: files - count: 6318 metric: regions - count: 1768146 metric: characters production-software: Transkribus automatically-aligned: false _bibtex: "@dataset{phillip_strobel_2019_3333627,\n author = {Phillip Ströbel\ \ and\n Simon Clematide},\n title = {{Ground truth for\ \ Neue Zürcher Zeitung black letter \n period}},\n month \ \ = jul,\n year = 2019,\n publisher = {Zenodo},\n version \ \ = {v1.0},\n doi = {10.5281/zenodo.3333627},\n url =\ \ {https://doi.org/10.5281/zenodo.3333627}\n}" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Gwalther Handwriting Ground Truth url: https://zenodo.org/record/4780947#.YhN5pVvMLUQ project-name: "Bullinger digital\n" project-website: https://www.bullinger-digital.ch/ authors: - name: Ströbel surname: Phillip Benjamin roles: - aligner - quality-control - support - name: Stotz surname: Peter roles: - transcriber description: "This is ground truth for Rudolph Gwalther’s (1519-1586) handwriting\ \ taken from his book \"Lateinische Gedichte\", where he accumulated writings\ \ between 1540 and 1580. Data collection and ground truth creation: At the time\ \ we collected the data, we found 150 images with corresponding transcriptions\ \ by Peter Stotz on e-manuscripta (reference: Gwalther, Rudolf: Lateinische Gedichte.\ \ Zürich, 1540-1580. Zentralbibliothek Zürich, Ms D 152, https://doi.org/10.7891/e-manuscripta-26750\ \ / Public Domain Mark). We removed 8 images with too many corrections or vertical\ \ texts.
Next, we uploaded the images into the Transkribus platform, applied the\ \ line recognition tool and manually copied the transcribed text lines into the\ \ recognised line boxes. During this process, we made some corrections, which\ \ were mainly due to inconsistencies in punctuation and capitalised letters.\n" language: - lat script: - iso: Latn script-type: only-manuscript time: notBefore: '1540' notAfter: '1580' hands: count: '1' precision: exact license: name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML volume: - count: 4040 metric: lines - count: 142 metric: files - count: 155 metric: regions - count: 144301 metric: characters production-software: Transkribus automatically-aligned: false _bibtex: "@dataset{peter_stotz_2021_4780947,\n author = {Peter Stotz and\n\ \ Phillip Ströbel},\n title = {{bullinger-digital/gwalther-handwriting-ground-\ \ \n truth: Initial release}},\n month = may,\n year\ \ = 2021,\n publisher = {Zenodo},\n version = {v1.0},\n doi\ \ = {10.5281/zenodo.4780947},\n url = {https://doi.org/10.5281/zenodo.4780947}\n\ }" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: BiblIA url: https://zenodo.org/record/5167263 project-name: "Scripta PSL\n" project-website: https://escripta.hypotheses.org/ authors: - name: Stökl Ben Ezra surname: Daniel roles: - transcriber - project-manager - name: Brown-DeVost surname: Bronson - name: Jablonski surname: Pawel - name: Kiessling surname: Benjamin - name: Lolli surname: Elena - name: Lapin surname: Hayim description: "This dataset for Handwritten Text Recognition includes layout segmentation\ \ (regions, toplines and linepolygons) and unicode-transcriptions in alto 4.2\ \ XML for 202 images of Medieval Hebrew manuscripts from the Bibliothèque nationale\ \ de France (BnF, National Library of France) and the Biblioteca Apostolica Vaticana\ \ (BAV, Vatican Library) corresponding to the article \"BiblIA - a General Model\ \ for Medieval Hebrew 
Manuscripts and an Open Annotated Dataset\" by Daniel Stökl\ \ Ben Ezra, Bronson Brown-DeVost, Pawel Jablonski, Benjamin Kiessling, Elena Lolli,\ \ and Hayim Lapin, published in HIP@ICDAR 2021 held in Lausanne, September 2021.\n" language: - heb script: - iso: Hebr script-type: only-manuscript time: notBefore: '1000' notAfter: '1499' hands: count: more-than-10 precision: exact license: - name: CC-BY-SA 4.0 url: https://creativecommons.org/licenses/by-sa/4.0/ format: Alto-XML volume: - metric: files count: 202 - metric: pages count: 202 - metric: lines count: 12461 - metric: regions count: 509 - metric: characters count: 278641 transcription-guidelines: "See the guidelines detailed in Stoekl Ben Ezra Daniel,\ \ Brown-DeVost Bronson, Jablonski Pawel, Lapin Hayim, Kiessling Benjamin, and\ \ Lolli Elena. 2021. BiblIA - a General Model for Medieval Hebrew Manuscripts\ \ and an Open Annotated Dataset. In The 6th International Workshop on Historical\ \ Document Imaging and Processing (HIP '21). Association for Computing Machinery,\ \ New York, NY, USA, 61–66. 
DOI:https://doi.org/10.1145/3476887.3476896'\n" production-software: eScriptorium + Kraken automatically-aligned: false _bibtex: "@dataset{stokl_ben_ezra_2021_5167263,\n author = {Stökl Ben Ezra,\ \ Daniel and\n Brown-DeVost, Bronson and\n Jablonski,\ \ Pawel and\n Kiessling, Benjamin and\n Lolli,\ \ Elena and\n Lapin, Hayim},\n title = {BiblIA - an Open\ \ Annotated Dataset},\n month = aug,\n year = 2021,\n publisher\ \ = {Zenodo},\n version = {1.0},\n doi = {10.5281/zenodo.5167263},\n\ \ url = {https://doi.org/10.5281/zenodo.5167263}\n}" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: The POPP datasets url: https://zenodo.org/record/6581158 authors: - name: Thomas surname: Constum roles: - aligner - quality-control - support - name: Nicolas surname: Kempf - name: Pierrick surname: Tranouez - name: Thierry surname: Paquet roles: - project-manager - name: Sandra surname: Brée orcid: 0000-0002-2802-5563 roles: - transcriber - project-manager - name: François surname: Merveille roles: - transcriber institutions: [] description: >- The POPP datasets is a set of 3 datasets created within the POPP project (Project for the Oceration of the Paris Population Census) for the task of handwriting text recognition. These datasets have been published in "Recognition and information extraction in historical handwritten tables: toward understanding early 20th century Paris census" at DAS 2022. The 3 datasets are called “Generic dataset”, “Belleville”, and “Chaussée d’Antin” and contains lines made from the extracted rows of census tables from 1926. Each table in the Paris census contains 30 rows, thus each page in these datasets corresponds to 30 lines. 
project-name: Project for the Oceration of the Paris Population Census project-website: https://popp.hypotheses.org language: - fra production-software: Pivan script: - iso: Latn script-type: only-manuscript time: notBefore: '1926' notAfter: '1926' hands: count: more-than-10 precision: estimated license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML volume: - metric: lines count: 7050 transcription-guidelines: > The text is transcribed as in the image (no correction of misspellings, no resolution of abbreviations). Since the lines are extracted from table rows, we defined 4 special characters to describe the structure of the text: ¤ : indicates an empty cell / : indicates the separation into columns ? : indicates that the content of the cell following this symbol is written above the regular baseline ! : indicates that the content of the cell following this symbol is written below the regular baseline automatically-aligned: false _bibtex: "@dataset{constum_2022_6581158,\n author = {CONSTUM, Thomas and\n\ \ KEMPF, Nicolas and\n PAQUET, Thierry and\n\ \ TRANOUEZ, Pierrick and\n CHATELAIN, Clément\ \ and\n BREE, Sandra and\n MERVEILLE, François},\n\ \ title = {{POPP Datasets : Datasets for handwriting \n \ \ recognition from French population census}},\n month = may,\n year\ \ = 2022,\n publisher = {Zenodo},\n version = {v1.0},\n doi\ \ = {10.5281/zenodo.6581158},\n url = {https://doi.org/10.5281/zenodo.6581158}\n\ }" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Wien ÖNB Cod. 2160 f.
164-184 Ground Truth from HTR Winter School 2022 url: https://zenodo.org/record/7467027#.Y6LRj3bMK3B authors: - name: Geelhaar surname: Tim orcid: 0000-0002-7653-5859 roles: - transcriber - project-manager - name: D'Amico surname: Sara orcid: 0000-0002-8937-2040 roles: - transcriber - name: Hofmann surname: Lara orcid: 0000-0003-4698-3906 roles: - transcriber - name: Gnasso surname: Alessandro orcid: 0000-0001-5964-2989 roles: - transcriber - name: Audebrand surname: Justine roles: - transcriber - name: Stitts surname: Jeremy orcid: 0000-0001-6988-1836 roles: - transcriber - name: Sweeney surname: Mary orcid: 0000-0001-7028-2072 roles: - transcriber - name: Atwood surname: Grace orcid: 0000-0002-1546-6546 roles: - transcriber institutions: [] description: >- This is Ground Truth data created during the HTR Winter School 2022 for the Cod. 2160 ÖNB that contains one version of the so-called Lex Dei. project-name: HTR Winter School 2022, Vienna language: - lat production-software: Transkribus script: - iso: Latn qualify: Carolingian Minuscule script-type: only-manuscript time: notBefore: '850' notAfter: '900' hands: count: '1' precision: exact license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Alto-XML sources: - reference: '' link: http://data.onb.ac.at/rec/AC13956457 volume: - metric: pages count: 40 transcription-guidelines: >- Abbreviations resolved, but no normalization and no correction of misspellings. No transcription of initials and interlinear script. automatically-aligned: false _bibtex: "@dataset{attwood_2022_7467027,\n author = {Attwood and\n \ \ Sweeney and\n Stitts and\n Audebrand\ \ and\n D'Amico and\n Geelhaar and\n \ \ Hofmann and\n Gnasso},\n title = {{Wien ÖNB\ \ Cod. 2160 f.
164-184 Ground Truth from \n HTR Winter School\ \ 2022}},\n month = dec,\n year = 2022,\n publisher = {Zenodo},\n\ \ doi = {10.5281/zenodo.7467027},\n url = {https://doi.org/10.5281/zenodo.7467027}\n\ }" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Padeřov-Bible-handwriting-ground-truth url: https://zenodo.org/record/7467034#.Y6LQZBWZM2w authors: - name: Anna surname: Michalcová orcid: 0000-0003-4760-6950 roles: - transcriber - aligner - project-manager - quality-control - support - name: Jan surname: Odstrčilík orcid: 0000-0001-9104-9827 roles: - project-manager - support - name: Laura surname: Maniaková roles: - transcriber - name: Eliška surname: Pěnkavová orcid: 0000-0002-5494-8847 - name: Kamil surname: Bazelides orcid: 0000-0002-5199-8726 - name: Jan surname: Hajič orcid: 0000-0002-9207-567X - name: Hana surname: Kreisingerová orcid: 0000-0002-2924-598X - name: Jitka surname: Filipová orcid: 0000-0002-3570-4038 - name: Chi-hung surname: Liu - name: Martina surname: Dvořáková institutions: - name: Institute of the Czech Language - name: Masaryk Institute and Archives description: >- This is ground truth based on the Padeřov Bible (Vienna, Austrian National Library, shelfmark Cod. 1175, 1432–1435), the bible of the third redaction of the Old Czech Bible translation. The transcription rules were based on semi-diplomatic transcription rules set by PERO OCR and Směrnice pro vydávání starších českých textů set by Jiří Daňhelka (https://vokabular.ujc.cas.cz/moduly/edicnipoznamka.aspx?id=DanhelkaSmernice). Abbreviations were tagged and expanded. 
project-name: HTR Winter School 2022, Vienna project-website: >- https://www.oeaw.ac.at/imafo/veranstaltungen/detail/introduction-into-handwritten-text-recognition-1 language: - ces production-software: Transkribus script: - iso: Latn script-type: only-manuscript time: notBefore: '1432' notAfter: '1435' hands: count: '1' precision: exact license: - name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Page-XML sources: - reference: '' link: >- https://search.onb.ac.at/primo-explore/fulldisplay?docid=ONB_alma21302405460003338&context=L&adaptor=Local%20Search%20Engine&vid=ONB&lang=de_DE&search_scope=ONB_gesamtbestand&tab=default_tab&query=addsrcrid,exact,AC13954505 volume: - metric: pages count: 63 transcription-guidelines: >- Transliteration. Differentiates long and short "s". Abbreviations tagged and expanded. No misspelling corrections. automatically-aligned: false _bibtex: "@dataset{michalcova_2022_7467034,\n author = {Michalcová, Anna\ \ and\n Bazelides, Kamil and\n Hajič, Jan and\n\ \ Pěnkavová, Eliška and\n Maniaková, Laura and\n\ \ Kreisingerová, Hana and\n Filipová, Jitka\ \ and\n Chi-hung Lu and\n Dvořáková, Martina},\n\ \ title = {{Padeřov-Bible-handwriting-ground-truth: Initial \n \ \ release}},\n month = dec,\n year = 2022,\n publisher\ \ = {Zenodo},\n doi = {10.5281/zenodo.7467034},\n url =\ \ {https://doi.org/10.5281/zenodo.7467034}\n}" - schema: https://htr-united.github.io/schema/2023-06-27/schema.json title: Belfort url: https://zenodo.org/record/8041668 authors: - name: Solène surname: Tarride orcid: 0000-0001-6174-9865 - name: Tristan surname: Faine - name: Mélodie surname: Boillet orcid: 0000-0002-0618-7852 - name: Harold surname: Mouchère orcid: 0000-0001-6220-7216 - name: Christopher surname: Kermorvant orcid: 0000-0002-7508-4080 institutions: [] description: > This dataset includes minutes of Belfort municipal council drawn up between 1790 and 1946. 
Documents include deliberations, lists of councillors, convocations, and agendas. The dataset includes 24,105 text-line images that were automatically detected from pages. Up to four transcriptions are available for each line image: * two from human annotators (in `Transcriptions/callico_1/` and `Transcriptions/callico_2/`) * two from automatic models (in `Transcriptions/dan/` and `Transcriptions/pylaia/`) project-name: Handwritten Text Recognition from Crowdsourced Annotations project-website: https://arxiv.org/abs/2306.10878 language: - fra production-software: Callico script: - iso: Latn script-type: only-manuscript time: notBefore: '1790' notAfter: '1946' hands: count: more-than-10 precision: estimated license: name: CC-BY 4.0 url: https://creativecommons.org/licenses/by/4.0/ format: Image-Text-Pairs sources: - reference: >- Solène Tarride, Tristan Faine, Mélodie Boillet, Harold Mouchère, & Christopher Kermorvant. (2023). The Belfort dataset: Handwritten Text Recognition from Crowdsourced Annotations [Data set]. 7th International Workshop on Historical Document Imaging and Processing (HIP'23), San José, California, USA. Zenodo. https://doi.org/10.5281/zenodo.8041668 link: https://arxiv.org/abs/2306.10878 volume: - metric: lines count: 24105 _bibtex: "@dataset{solene_tarride_2023_8041668,\n author = {Solène Tarride\ \ and\n Tristan Faine and\n Mélodie Boillet\ \ and\n Harold Mouchère and\n Christopher Kermorvant},\n\ \ title = {{The Belfort dataset: Handwritten Text Recognition \n \ \ from Crowdsourced Annotations}},\n month = jun,\n year\ \ = 2023,\n publisher = {Zenodo},\n doi = {10.5281/zenodo.8041668},\n\ \ url = {https://doi.org/10.5281/zenodo.8041668}\n}"