OCR with R


Friday, November 4th 2022
9 –
P612 Computational Humanities Lab (Hauptcampus Leipzig University, Augustusplatz 10, 04109 Leipzig)

Language of instruction: English (questions can also be asked in German)


Learn how to extract text and tables from your scanned or born-digital PDFs in this 2½-hour workshop using RStudio, R and the {pdftools} package. Details will be shared by email once you have registered.
There are only 12 spots – so register now and secure yours!

The workshop will take place in English. Please bring your own laptop. No prior technical knowledge is required. If you want to participate in this face-to-face workshop, please register here. Please observe the Leipzig University Covid-19 regulations.

The workshop will be led by Silvia Gutiérrez. She is a PhD candidate and DAAD Scholarship recipient at the Computational Humanities Group in Leipzig. She is a trained digital humanist (Würzburg University / King’s College London) with a background in Linguistics and Spanish philology. In her PhD, she investigates state-of-the-art text mining approaches to detect and extract patterns in thousands of digitized dissertations.