“(Digital) Research and Metadata”
The presentation addresses what research data are and how they can be made available, and kept reusable, in the short, medium and long term. In particular, the role of metadata and data formats will be discussed.
Almost all (digital) data can potentially become research data. Once a scientific project decides to use data to gain knowledge, these data inevitably become research data. Research also collects a variety of data in surveys and experiments, or creates collections and editions of digital/digitised artifacts (corpora, editions) that are the basis for further research. Often software – from scripts to stand-alone programs – is also included when talking about data, and in some projects data and related tools are hardly conceptually separated.
Metadata is the name for data that describes other data. However, depending on the research question, a ‘metadatum’ can itself become a variable in the research design, and thus a research datum.
In recent years, FAIR has become a much-noted acronym: data should be Findable, Accessible (have regulated access), Interoperable (use interoperable formats), and thus, overall, Reusable.
In our talk we will illustrate different paths from (digital) data to research data to archived files. We also show that information can be stored in many different ways. These examples are meant to invite participants to engage more closely with data and data formats. We explain how and why metadata is central to the reusability of research data.
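As a minimal sketch of the point that the same information can be stored in many different ways, the following Python example writes one small metadata record as JSON, CSV and XML and reads it back. The record, its field names and its values are purely hypothetical and are not taken from any real corpus; only the Python standard library is used.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical metadata record; field names and values are illustrative only.
# All values are strings, because CSV and XML carry no type information
# unless a schema or additional metadata supplies it.
record = {"title": "Example Corpus", "language": "de", "year": "1975"}


def to_json(rec):
    """Serialise the record as a JSON object with sorted keys."""
    return json.dumps(rec, sort_keys=True)


def to_csv(rec):
    """Serialise the record as a header line plus one data line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=sorted(rec), lineterminator="\n")
    writer.writeheader()
    writer.writerow(rec)
    return buf.getvalue()


def to_xml(rec):
    """Serialise the record as one child element per field."""
    root = ET.Element("record")
    for key in sorted(rec):
        ET.SubElement(root, key).text = rec[key]
    return ET.tostring(root, encoding="unicode")


def from_json(text):
    return json.loads(text)


def from_csv(text):
    # DictReader uses the first line as field names; take the first data row.
    return dict(next(csv.DictReader(io.StringIO(text))))


def from_xml(text):
    return {child.tag: child.text for child in ET.fromstring(text)}


if __name__ == "__main__":
    for name, write, read in [
        ("JSON", to_json, from_json),
        ("CSV", to_csv, from_csv),
        ("XML", to_xml, from_xml),
    ]:
        serialised = write(record)
        print(name, "round trip ok:", read(serialised) == record)
```

The three serialisations look very different but carry the same information. What they do not carry by themselves is the meaning of the fields (for instance, that `year` is a year, or which code list `language` follows), which is the kind of gap that metadata and documented formats are meant to close.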
In addition, we show that various formats are suitable and useful for day-to-day work but, in terms of interoperability and sustainability, less suitable for sharing or long-term archiving of data. Finally, we briefly address the question of scripts and tools and the problems of making them available to other researchers in the medium and long term.
Bernhard Fisseni studied computational linguistics, German linguistics and older German literature as well as computer science for his MA, and obtained his PhD from the University of Duisburg-Essen for research in formal pragmatics.
He has worked on corpora and text technology mainly in the context of German linguistics and processing of mathematical language. This includes computational archeology on the Bonn corpus of Early New High German from the 1970s, but also digital editions (e.g., the corpus of Kant’s works, the edition of the archive of the Counts von Platen). He is now working at the Leibniz Institute for the German Language for the CLARIAH-DE project concerned with building a European infrastructure for linguistic resources.
He worked extensively with speech corpora and was involved in the compilation of the Karl-Eberhard-Corpus. Since 2017 he has been working at the Leibniz Institute for the German Language in the long-term preservation group, where he also works in subprojects of CLARIN, CLARIAH-DE and in the nestor network.