Yesterday, Catalan surpassed 3,000 hours of recorded voice clips in the Common Voice digital repository, and it is now only 200 hours short of overtaking English as the language with the most recorded hours in this project, which is promoted by the Mozilla Foundation. Catalan had already been the language with the second-most recorded hours for a few months, and this month it also overtook Kinyarwanda to become the language with the second-most validated hours. Plataforma per la Llengua celebrates the milestone and will from now on take part in voice collection, to make it easier for companies to include Catalan as a language for speech recognition and speech synthesis in household appliances and other everyday objects that use artificial intelligence.
Common Voice is a digital project that aims to build a repository of voice clips in all the world's languages through the voluntary collaboration of people who record their own voice or validate other people's recordings. This free database is segmented by gender, age and dialectal variant, and makes the voices available for download to anyone who wants to develop and improve speech recognition software, for example for home robots or voice assistants. The voices can be downloaded free of charge under a CC0 license, which dedicates them to the public domain with no exploitation rights reserved. Common Voice, which already covers 136 languages, was created in 2017 by the Mozilla Foundation and has been promoted in the Catalan-speaking territories by Softcatalà. Later, in 2020, it was incorporated as a line of work of the Aina project, an initiative of the Government of Catalonia and the Barcelona Supercomputing Center, with the collaboration of the Government of the Balearic Islands, which expanded it significantly.
With the aim of further increasing the presence of Catalan in Common Voice, Plataforma per la Llengua will launch a campaign to obtain many more voice donations for Catalan and, within a short time, try to bring it to the top position among the languages with the most recorded hours. To achieve sufficient representation of the dialects, genders and age groups that have so far been under-represented in the database, the campaign will combine strong digital activity with in-person voice-collection events, which will make it easier for people who are less familiar with the digital world to participate as well. This will make it possible, for example, to increase the percentage of older people's voices in the database.
The details of the campaign will be made public soon, but it can already be announced that it will start on Friday 14 April at the Palau Blaugrana, during the EuroLeague basketball match between Barça and València, and that it will continue with a second major event during the Spanish league match between Barça and Real Madrid. For the first event, the organization will have some discounted tickets available for its members.