Persistent Identifier
|
doi:10.18710/VSMP3U |
Publication Date
|
2024-01-17 |
Title
| Replication Data for: Alea iacta est. Insights from corpus semantics into the diachrony of the Latin passive |
Author
| Aerts, Simon (Ghent University) - ORCID: 0000-0003-1852-9255 |
Point of Contact
|
Aerts, Simon (Ghent University) |
Description
| The dataset contains annotated corpus data from Latin texts dating from the 3rd c. BCE to the 10th c. CE. Attestations of both the original construction ('PP-BE.inf.', e.g. 'cantatus est') and the innovation ('PP-BE.perf.', e.g. 'cantatus fuit') for the expression of the passive (grouped together with forms of deponent verbs as 'nonactive') in the perfectum stem tenses were extracted from major online corpora (331,131 data points). A random sample (n = 383) of PP-BE.inf. data points, representing all text types and time periods as evenly as possible, was then subjected to a close-reading analysis in order to ascertain the attestation rate of the meanings for which it competed with the innovation PP-BE.perf. (specialized in ANTERIORITY) and with the present tense (specialized in present events or situations). Only the data points that were annotated in full in this second phase are included in the current dataset. For the data points examined in the first phase, only the formal categories in the list below were annotated, to the extent that these annotations are not subject to interpretation. (2023-08-22) |
Subject
| Arts and Humanities |
Keyword
| Latin passive system
quantitative and qualitative linguistics
form-function pairings
tense-aspect systems
diachronic linguistics |
Related Publication
| Aerts, S. “Alea iacta est. Insights from corpus semantics into the diachrony of the Latin passive.” Submitted for review. |
Language
| English |
Producer
| Ghent University https://www.ugent.be/en |
Contributor
| Data Curator : Cluyse, Brian |
Funding Information
| Research Foundation - Flanders: Grant number: 1282722N |
Distributor
| The Tromsø Repository of Language and Linguistics (TROLLing) https://trolling.uit.no/ |
Depositor
| Aerts, Simon |
Deposit Date
| 2023-08-22 |
Time Period
| Start Date: 200BCE ; End Date: 0950 |
Date of Collection
| Start Date: 2021-11-01 ; End Date: 2023-04-15 |
Data Type
| Annotated corpus data |
Series
| Tracing change and reaction in the Latin tense system: The datasets in this series contain the replication data for research papers published within the FWO-funded project "Tracing change and reaction in the Latin tense system: an empirical analysis of language-internal and language-external influences on the development of morphological innovations and form-function pairings from Early Latin to Early Romance". |
Software
| R |
Data Source
| The data contained in this dataset originate from the following sources:
- CDC: Codex diplomaticus Cavensis Vol. 1. (8th - 10th c. CE). Diplomatic charters from the context of the Lombard rule of Central Italy (Campania) (n = 505; 0.15%). This data is accessible under the CC BY-NC-ND license.
- ECDS: Epigraphik-Datenbank Clauss/Slaby. Epigraphic texts (0.99% of all attestations). ECDS does not provide a user license / Terms of Use, except for the following disclaimer: "All texts, pictures and graphics published on this website are subject to copyright and other laws for the protection of intellectual property".
- LASLA: Laboratoire d’Analyse Statistique des Langues Anciennes - Hyperbase. (2nd c. BCE – 2nd c. CE): classical, literary texts (5.23% of all attestations). LASLA does not provide a user license / Terms of Use, except for the general copyright statement in the about section of the LASLA Opera Latina website: Copyright LASLA - CIPL 2014.
- LLT: Library of Latin texts. (3rd c. BCE - 8th c. CE): all text types from all periods of natural language use (n = 13,119; 92.78% of all attestations). LLT is part of the BREPOLiS databases, for which the BREPOLiS Terms and Conditions apply. The BREPOLiS Terms and Conditions entitle users "to extract and re-utilize, for non-commercial purposes only, any insubstantial parts of the contents of the Database".
- PaLaFra: The transition from Latin to French: constitution and analysis of a Latin-French digital corpus. PaLaFra-Lat-V2 (5th - 10th c. CE): various text types, mainly from the Merovingian period (0.84% of all attestations). The subcorpus PaLaFraLat is accessible under the CC BY-NC-SA 4.0 license.
- Papyri.info: Papyri.info. (2nd - 5th c. CE): texts on papyri (mainly personal letters) which provide direct access to everyday language (n = 22; < 0.01%). Papyri.info does not provide any user license / Terms of Use.
The extracted text fragments contained in the data file of this dataset represent only non-substantial portions of the sources listed above, and they do not represent coherent larger texts. Therefore, the reuse (including redistribution) of these excerpts is permitted under the exception rules in intellectual property rights (IPR) and database protection regulations, such as Fair use (USA; cf. US Copyright Act), Fair dealing (UK; cf. Exceptions to copyright), "lover, forskrifter, rettsavgjørelser og andre vedtak av offentlig myndighet" (Norway; cf. § 14 in Åndsverkloven), "uvesentlige deler av databaser" (Norway; cf. § 24 in Åndsverkloven), and "sitatretten" (Norway; cf. § 29 in Åndsverkloven). As these excerpts do not represent substantial parts of the reused sources, their redistribution is, according to Creative Commons (CC), also permitted if they are extracted from sources that are distributed under Creative Commons licenses (cf. question "Do I always have to comply with the license terms? If not, what are the exceptions?" in the Creative Commons Frequently Asked Questions). |