AI can classify texts, recognise names, addresses and other entities, work out what a customer is asking for in a chat box, and even write full summaries of documents. We command our phones and smart speakers with our voice, and more and more organisations see opportunities in further automating customer services or automatically transcribing conversations, meetings, and presentations. Opportunities are also growing to automatically transcribe sign languages using a combination of computer vision and language models, and to generate sign language animations using game technology. All of this is made possible in part by training AI on large data sets.
In practice, however, performance appears to drop as soon as AI is applied to the languages used in the Netherlands, especially when it comes to spoken Dutch dialects, Dutch Sign Language (NGT), accents, slang or domain-specific expressions. Although there are initiatives that have trained state-of-the-art AI on large amounts of Dutch data (think of BERTje from the University of Groningen), there is still a lot to be gained in the field of Dutch AI, both in terms of knowledge and technology and in terms of the bias and transparency of language models.
The aim of the project is to make language and speech technology available to anyone who speaks Dutch, writes Dutch, or uses Dutch Sign Language, in any variety, without depending on large foreign commercial parties. We want to join forces and make major improvements in language and speech technology in the Netherlands, particularly because collecting and transcribing relevant training material is not feasible for every individual Dutch organisation.
Deployment of Artificial Intelligence
NAIN aims to set up an infrastructure for Dutch AI for Dutch languages, in all their modalities: text, speech, and sign language. Domain-specific building blocks for areas such as security, healthcare and culture are built on this infrastructure. The challenge is to build AI reliably and responsibly. Issues surrounding data sharing will also be thoroughly examined. The NAIN project builds on knowledge and experience already gained from the market and research, for example the STEVIN program and BERTje. It tries to bring together ongoing initiatives in order to achieve optimal solutions.
What challenge does it solve?
Developing a Dutch infrastructure of our own for speech, text and sign language provides sovereignty. Currently, the models in use are built mainly for English and developed by foreign multinationals such as Google. Moreover, a language model developed specifically for Dutch languages potentially provides better performance, broader applications and more control over development.
First result
The NAIN consortium has presented a report (in Dutch) on the current state of Dutch language and speech technology in the Netherlands and Flanders. The report is an important starting point for the work that can be done over the coming five years to develop well-functioning Dutch language technologies. The results will be usable throughout Dutch society, enabling an enormous diversity of applications with great public and economic value.
Collaboration partners
The project is led by TNO, with participation from the working group Security, Peace and Justice, the Ministry of Justice and Security and the NFI. They cooperate closely with the NL Speech Coalition, the working groups Culture and Media and Health and Care, the business community and knowledge institutions. Subgroups are being set up on topics including speech, text, sign language, data sharing and responsible AI in order to develop the content of the proposal further.