DARIAH-IE

Methods

The FLOW Project: A Modular Workflow for Automatic Text Recognition and Beyond [Feb 18, online @ 15:00 GMT]

12th February 2026 by Joan Murphy

Bodleian Bytes

18 February 15:00 to 16:00

Online event. Registration required.

Registration: https://app.onlinesurveys.jisc.ac.uk/s/oxford/registration-bodleian-bytes-the-flow-project

Historical research often involves working with highly diverse and complex source materials, ranging from handwritten manuscripts to large, heterogeneous document collections. Machine learning methods are increasingly shaping how historians work with digitised sources, particularly through Automatic Text Recognition (ATR). In this talk, Jonas Widmer and Dana Meyer will introduce The FLOW, a modular, microservice-based framework designed to support machine learning–driven data management and processing in the Digital Humanities.

The talk will outline how The FLOW separates the stages of complex ATR workflows, such as pre-processing, model training, inference, and evaluation, into independent, reusable components that can be combined flexibly and accessed without programming experience. Using state-of-the-art transformer-based models, the project aims to make advanced text recognition workflows more transparent, reproducible, and scalable across diverse historical datasets.

Jonas and Dana will outline a typical FLOW workflow, showing how datasets are managed on the Hugging Face platform and then processed step by step. The focus will be on how such workflows can support everyday research practices when working with large and heterogeneous historical corpora.

Speaker Biographies

Jonas Widmer is a Research Software Engineer specialising in Digital Humanities at the University of Bern. In this role, he assists in planning and developing projects focused on Natural Language Processing. His primary interest lies in Handwritten Text Recognition (HTR), where he engages with historical projects and their diverse sources.

Dana Meyer is a Master’s student in Intelligent Interactive Systems at Bielefeld University and works as a research assistant on The FLOW project in the university’s Digital History group.

Bodleian Bytes

Bodleian Bytes is a series of online talks hosted by the Centre for Digital Scholarship at the Bodleian Libraries. The series engages with innovative national and international research in digital scholarship. It is a virtual space for discussions surrounding different tools and methodologies whilst also providing inspiration for future digital research.

Event Details and Registration

Registration is required for this free online event. Registration closes at 17:00 on Monday 16 February 2026.

Date and time: Wednesday 18 February, 15:00-16:00 (UK time)

Location: Online via Zoom.

For further information, please email the Centre for Digital Scholarship: cds@bodleian.ox.ac.uk.

Centre for Digital Scholarship

The Centre for Digital Scholarship (CDS) at the Bodleian Libraries is a space and place for engaging, leading and shaping discussions around digital scholarship practice and research within and beyond the University of Oxford. 

Posted in: AI, Machine Learning, Methods, Webinar Tagged: Automatic Text Recognition, Digital Scholarship at Oxford

SynFlow: Continuous Semantics Change Analysis via Dependency Co-occurrences [Jan 26, online, @17:00 GMT]

21st January 2026 by Joan Murphy

The first talk of the Data in Historical Linguistics Seminar Series 2026 will take place remotely on Monday 26 January 2026 at 5pm GMT. Bách Phan-Tất (KU Leuven, Belgium) will present “SynFlow: Continuous Semantics Change Analysis via Dependency Co-occurrences”.

Registration for this talk closes at midnight on the Friday before the event. Register here: https://forms.gle/HEnpTKreXdrZqjfA8

Participants will receive a Microsoft Teams link via email on the morning of the talk. 

The abstract for this talk can be found at this page.

The programme and registration links for all talks in the series can be found on our website: 

2026 Programme

This seminar series is run by Andrea Farina (King’s College London) and Dr Mathilde Bru and is aimed at PhD students and early career researchers. It brings together researchers who take a quantitative approach to historical linguistics to discuss current avenues of research on this topic. We hope that these seminars will nurture international collaboration and establish academic ties among researchers working on similar topics in this field.

Posted in: Digital Humanities, Methods, Webinar Tagged: Computational Analysis, Computational Humanities, Linguistics, Semantics

Technical Writing in the Humanities: a facilitated writing sprint [Dec 15, online, 13:30-15:00 GMT]

12th December 2025 by Joan Murphy

The Digital Skills in Arts and Humanities Network (DISKAH), in collaboration with the Programming Historian, is organising a webinar on “Technical Writing in the Humanities: a facilitated writing sprint”. The webinar will support interested colleagues in developing a publication for this journal and, more broadly, in communicating technical workflows within Digital Humanities research to relevant audiences.

Webinar date and time: Monday 15 December, 13:30-15:00 (GMT)

Please register here for the webinar: https://www.eventbrite.co.uk/e/diskah-webinar-technical-writing-in-the-humanities-tickets-1976718137160

Further information about the webinar: https://culturedigitalskills.org/webinar-diskah-programming-historian/

Posted in: Digital Humanities, Events, Methods, workshops Tagged: Digital Humanities, Programming Historian, Technical Writing

Reference Extraction at the Intersection of AI Research and the Digital Humanities: Validation, Interoperability and Collaboration [Nov 4, hybrid]

3rd November 2025 by Joan Murphy

This informal meeting is intended mainly to foster collaboration and knowledge exchange between researchers and practitioners working at the intersection of data extraction, artificial intelligence, and the digital humanities. The workshop continues to address the challenge of extracting heterogeneous references from texts, particularly from historical documents and humanities or legal scholarship. This second workshop focuses on three key themes emerging from the 2023 discussions:

  1. Validation: How can we evaluate and benchmark the performance of different reference extraction tools and approaches, particularly with large language models?
  2. Interoperability: How can we ensure that different tools, datasets, and workflows can work together effectively through shared data models and formats?
  3. Collaboration: How can researchers, developers, and institutions work together to advance the field of reference extraction?

The programme is available online at: https://mpilhlt.github.io/reference-extraction/workshop-2025/programme/

The event will take place in-person and online. Register at https://plan.events.mpg.de/e/refextract25 

A link for online attendance will be sent to registered participants before the event. If you cannot attend but would like to be informed about updates and materials as they become available, you can indicate this at the registration link.

Programme

Tuesday 04 November 2025

Onboarding

09:00-09:15 Arrival/Registration

09:15-09:45 Christian Boulanger/Andreas Wagner (mpilhlt): Welcome and Upshot from RefExtract2023, State of the Discussion

09:45-10:00 Coffee Break

Research presentations

10:00-12:30

  1. Hiba Arnaout (TU Darmstadt): In-depth Research Impact Summarization through Fine-Grained Temporal Citation Analysis
  2. Yurui Zhu/Matteo Romanello (Odoma): Benchmarking Large Language Models on Reference Extraction and Parsing in the Social Sciences and Humanities
  3. Sofía Aguilar Valdez (Saarland University): How Scientific Ideas Evolve
  4. Open Discussion and Ad-Hoc Presentation of Research

12:30-13:30 Lunch

Datasets, Infrastructure and Interoperability

13:30-15:30

  1. Angelo Di Iorio/Matteo Guenci/Marta Soricetti*/Silvio Peroni/Lorenzo Paolini*/Ivan Heibi (University of Bologna): Citation Extractor and Classifier: Pipeline and Datasets (*presenting)
  2. Tamara Heck/Christoph Schindler/Verena Weimer/Philipp Mayr/Ahsan Shahid (DIPF/GESIS): Open Citation Data for Educational Research
  3. Christian Boulanger, Andreas Wagner (mpilhlt): Datasets in the Legal Theory Knowledge Graph Project
  4. Interoperability Roundtable: Open Discussion on Data Models and Data Formats

15:30-16:00 Coffee Break

Tools, Workflows and Pipelines

16:00-17:30

  1. Raphael Schlattmann/Malte Vogl (mpigea)/Aleksandra Kaye (TU Berlin/mpigea): LLM-Based Knowledge Graph Extraction Pipeline
  2. Luca Foppiano (ScienciaLAB): Training the Grobid Reference Extraction Models
  3. Christian Boulanger/Andreas Wagner (mpilhlt): Annotation Tools for Machine Learning: PDF-TEI Editor (for LLamore & Grobid), Prodigy, TEI-Publisher

17:30-18:30 Takeaways, Way Forward, Closing

Posted in: AI, Data, Digital Humanities, Events, Methods, Research IT, TEI, Tools, Workflows Tagged: AI, Data Science, Digital Humanities, Events, Methods, TEI, Tools, Workflows

News & Upcoming Events

  • Introduction to the Digital Humanities Climate Coalition Toolkit [Mar 18 @ 13:00 EDT, online]
  • Interdisciplinary User Requirements in Burial Cultural Heritage [Survey closes Mar 15]
  • Videogame Preservation [March 12, online seminar, 10-17 GMT]
  • Considering a Digital Humanities PhD in the UK? Seminar [Mar 9 @ 16:30 online]
  • Demystifying Data Journals [Mar 10 @ 13:00, online]
  • Beyond The Frame: Network, Infrastructure and Vernacular in the Making of Environmental Visuals [Mar 16 @ 17:00 CET online]
  • DRI Reproductive Justice Hackathon [Mar 7 @ 12:30, in person, Dublin]
  • Horizon Europe Research Infrastructures Info Day [18 March, online]

DARIAH-IE is funded by Research Ireland

Unless stated otherwise all contents of this site are licensed under CC-BY-4.0-Licence

Copyright © 2026 DARIAH-IE.
