This section of the K-OAr Center’s site ports some content from the Speech Data & Tech website, formerly at https://speechandtech.eu/ (its latest Internet Archive copy is available here).
Welcome to Speech Data & Tech
The initiators of the website are a group of experts interested in speech data, with very different backgrounds: oral history, computational linguistics, anthropology, sociolinguistics, phonetics and phonology. We all have an interest in exploring how technology can be integrated into research that involves spoken narratives.
These may range from very basic technologies, such as the conversion of recorded speech from analogue to digital, to more elaborate ones, such as ASR applied to automatically generate transcripts of the spoken content.
We first offer basic information about the epistemological focus of research domains that deal with speech data. This is followed by an explanation of relevant technologies and related tools that researchers can use. We then share our knowledge in the Workshops, Showcases and Publications sections.
Last but not least, you can access the first concrete result of our efforts: the Transcription Portal, an open-source service for automatic transcription in English, German, Dutch and Italian.
What drives us?
We could use text to describe what drives us, but as scholars interested in the multilayered nature of speech and of non-verbal communication, we prefer to present three short video talks by key members of our group in which they explain why they are members of this network. Listen to what speech technologist Henk van den Heuvel, sociolinguist Silvia Calamai, and data curator and former member of our group Louise Corti have to say, each in their own mother tongue.
Their talks gave us the opportunity to apply some of the tools whose uptake among scholars we would like to increase. An explanation is given below the video clips.
We applied Automatic Speech Recognition (ASR) to transcribe the spoken content of the videos, and Google Translate to translate it automatically into two other languages. To display the transcripts, click the closed-caption button.
In the video of speech technologist Henk van den Heuvel, Dutch ASR software developed at the University of Twente and Radboud University Nijmegen was applied to high-quality audio (quiet environment, good-quality microphone), followed by automatic translation from Dutch into Italian and English with Google Translate. The subtitles were added with the tool SubtitleEdit.
In the video of sociolinguist Silvia Calamai, Italian ASR software developed by Piero Cosi at the University of Padova was applied to high-quality audio (quiet environment, good-quality microphone), followed by automatic translation from Italian into Dutch and English with Google Translate.
In the video of data specialist Louise Corti, English ASR software developed at the University of Sheffield was applied to high-quality audio (quiet environment, good-quality microphone), followed by automatic translation from English into Dutch and Italian with Google Translate.
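The workflow described above ends with timed, translated transcript segments being packaged as subtitles. As a minimal sketch of that last step, the snippet below assembles hypothetical ASR segments into the SubRip (SRT) format that subtitle tools such as SubtitleEdit can import; the segment data and helper functions are invented for illustration, not the actual tools used for these videos.

```python
# Sketch: turn timed transcript segments into an SRT subtitle file.
# The segments and helpers are illustrative; real ASR output formats differ per tool.

def fmt_time(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments):
    """segments: list of (start_sec, end_sec, text) tuples, in order."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{fmt_time(start)} --> {fmt_time(end)}\n{text}\n")
    return "\n".join(blocks)

# Two hypothetical transcribed (and translated) segments.
segments = [
    (0.0, 2.5, "Welcome to Speech Data & Tech."),
    (2.5, 6.0, "We apply ASR to oral history interviews."),
]
print(to_srt(segments))
```

The resulting text file can be loaded alongside the video in most players and subtitle editors.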
Who we are

The group of people working on this website since September 2016 consists of:
Arjan van Hessen received a master’s degree in Geophysics and a PhD in Phonetics, and has been working in the field of Human Language Technology since 1991. His main interest is in applying the various HLT techniques in both the academic/research world and the real world. Through his work as director of user involvement at CLARIAH and as Head of Imagination at Telecats, he is in the middle of the world of public-private collaboration.
Dr. Stef Scagliola is a historian specialised in digital audiovisual archives, with an emphasis on oral history collections. She has worked as a postdoc at the Centre for Contemporary and Digital History at the University of Luxembourg, and at the Erasmus School of History, Culture and Communication at Erasmus University Rotterdam. She has published on this topic, and was involved in the creation of various digital oral history collections: Interview Collectie Nederlandse Veteranen, Croatian Memories, Bosnian Memories, Warlovechild.
Christoph Draxler, Bavarian Archive for Speech Signals (BAS), Ludwig-Maximilians-Universität Munich, is the head of the corpus and tools group. Christoph studied computer science at TU Munich and Romance literature and linguistics at LMU Munich. In his PhD dissertation he developed database predicates in Prolog to access relational databases. At the BAS, he was responsible for the collection of several large-scale speech databases, e.g. SpeechDat II and SpeechDat-Car (German), Ph@ttSessionz, VOYS, and he has developed a number of speech tools, e.g. SpeechRecorder, WebTranscribe, and the online perception experiment tool percy. His research interests are crowdsourcing for speech processing and regional variations of spoken language.
Henk van den Heuvel is an expert on the production and curation of language and speech databases, transcription of speech (orthographic and phonetic), data science for speech technology, and automatic speech recognition. He is director of the Centre for Language and Speech Technology (CLST) and Head of the Humanities Lab at the Faculty of Arts. He is also Research Data Manager (data steward) for the Faculty of Arts.
Silvia Calamai is full professor in Linguistics and Sociolinguistics at the University of Siena. She is a member of the CLARIN Legal Issues Committee and the CLARIN-IT group. At present, she is coordinating two projects on oral archives (“Archivio Vivo”, Regione Toscana 2019-2021; LISTEN Landscape in Sounds through Eco-Museums network, Ecomuseo del Casentino, 2020-22) and the scientific committee of the Historical Archive of the Arezzo psychiatric hospital (2017-). She is on the board of the Italian Association of Speech Sciences and of Sonorités, Bulletin de l’AFAS (Association française des détenteurs de documents audiovisuels et sonores). Her main research interests are sociophonetics, oral archives and dialectology.
About this website
The main goal of this website is to give an overview of technology that can be used in the processing of spoken content data in general: from an analogue tape and perhaps a handwritten summary to a digital recording including digital transcripts, speaker diarization/recognition (who is speaking when), emotion markers, speaking rate and much more.
One may think of technologies working on the primary data (the spoken content), such as ADC (Analogue-to-Digital Conversion); ASR (Automatic Speech Recognition), which can be used to automatically generate transcripts of the spoken content; OCR (Optical Character Recognition), which can be used to digitize (handwritten or typed) transcripts on paper; audio-video converters, which can be used to convert recordings from a less suitable digital format into a more common one; or Speaker Diarization & Speaker Recognition (determining who is talking when).
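To make concrete what a digitised recording consists of, the sketch below synthesises one second of a test tone and writes it as a WAV file with Python's standard `wave` module, then reads back the sampling parameters that analogue-to-digital conversion has to fix (channel count, bit depth, sample rate). The tone and the 16 kHz rate are illustrative choices, not a recommendation tied to any specific tool mentioned here.

```python
# Sketch: the basic parameters of a digitised recording (channels,
# bit depth, sample rate), using only Python's standard library.
import io
import math
import struct
import wave

RATE = 16_000      # samples per second (a rate commonly used for speech)
DURATION = 1.0     # seconds

# Synthesise one second of a 440 Hz tone as 16-bit mono PCM samples.
frames = b"".join(
    struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * 440 * n / RATE)))
    for n in range(int(RATE * DURATION))
)

buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)    # mono
    w.setsampwidth(2)    # 2 bytes per sample = 16-bit depth
    w.setframerate(RATE)
    w.writeframes(frames)

# Read the parameters back, as a digitisation tool would report them.
buf.seek(0)
with wave.open(buf, "rb") as r:
    channels, bits, rate, nframes = (
        r.getnchannels(), r.getsampwidth() * 8, r.getframerate(), r.getnframes()
    )
print(channels, bits, rate, nframes)  # 1 16 16000 16000
```

An analogue tape would first need to pass through an ADC to yield exactly this kind of sampled representation.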

But software tools can also be used to work on those digital transcripts and related metadata. One may think of LIWC (Linguistic Inquiry & Word Count, a computerized text analysis program that is considered a gold standard) or Emotion Detection, where both the spoken words and the tone of voice are used to “calculate” the emotion of the speaker(s).
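The core idea behind dictionary-based word counting of the LIWC kind can be sketched in a few lines: count how many tokens of a transcript fall into predefined word categories. The categories and word lists below are toy examples invented for illustration; LIWC's actual dictionaries are far larger and carefully validated.

```python
# Toy sketch of dictionary-based word counting, the idea behind
# tools like LIWC. Categories and word lists are invented examples.
import re
from collections import Counter

CATEGORIES = {
    "positive":   {"glad", "happy", "proud", "hope"},
    "negative":   {"afraid", "sad", "angry", "loss"},
    "past_focus": {"was", "were", "remembered", "had"},
}

def category_counts(text):
    """Return per-category token counts and the total token count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for tok in tokens:
        for cat, words in CATEGORIES.items():
            if tok in words:
                counts[cat] += 1
    return counts, len(tokens)

# A hypothetical fragment of an oral history transcript.
transcript = "I was afraid at first, but I was proud of what we had done."
counts, total = category_counts(transcript)
print(dict(counts), total)
```

Real tools report such counts as percentages of the total word count, which makes transcripts of different lengths comparable.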
All these technologies can be used across the different disciplines that work with spoken content. However, on this website we will sometimes focus on oral history recordings: recordings, mostly with one interviewer and one interviewee, about a particular event in the past or a period of someone’s life that has since ended.

