Fast-Tracking Speech Recognition
Open Wide
Open Speech Initiative seeks to bring advanced speech processing to free software.
Over the years, free software has seen at least a dozen projects to implement speech recognition. However, even the most advanced of these projects, such as CMU Sphinx [1] and Festival [2], often lag behind commercial equivalents, largely because of a lack of resources. To close this gap, Peter Grasch, a KDE developer from Austria, launched the Open Speech Initiative [3] in October 2013 with the goal of assembling "a team of developers looking to bring first class speech processing to the world of free software."
This is an ambitious project, but Grasch argues strongly for its importance.
"Speech recognition," he says, "is a greatly underutilized input method. Over the recent years we can at last see it slowly being adopted in mobile applications, and I think this trend will surely continue. I am not saying it can or will replace the other input methods we have now, but there are many use cases where speech input can simplify things. At least, after the first Iron Man movie came out, I think many would agree," Grasch adds, referring to the extensive speech processing capabilities that the movie's protagonist has embedded throughout his home and office as well as in his combat suit.
Given the growing importance of speech recognition, Grasch describes as "troubling" the fact that "speech technology was and to a large extent still is entirely in the domain of big business." Not only are the three main commercial developers – Nuance, Microsoft, and Google – proprietary, but so are the implementations for Android and Tizen.
According to Grasch, the reason for this situation is that "speech recognition requires significant upfront investment to acquire the necessary data, and tedious speech modeling that does not generalize well across languages – requiring countless more hours if you intend to support multiple languages." The result is a combination of resources, expertise, and effort that is extremely difficult to organize and sustain in a volunteer project.
However, Grasch suggests that such obstacles are becoming less important because of crowd-sourcing and the gradual accumulation of existing data. Now, he suggests, "with community engagement, we can build more accurate speech recognition systems and create better integrated solutions for more devices – and the use cases are truly endless."
Since 2010, Grasch has been explaining this rationale at conferences [4], including KDE's annual Akademy and the Desktop Linux Summit, as well as writing about it in academic papers. Now, with the Open Speech Initiative, he plans to put the rationale into practice.
The Path to Open Speech Initiative
Grasch became interested in free software while still in high school. "I was always a tech enthusiast and had been following the Linux movement with a bit of interest for some time when, in the tenth grade, I managed to thoroughly wipe my Windows installation from my home computer," he recalls. "At that point, I had never even tried a Live CD, but, for some reason, I decided to just install SUSE 9.3 instead of going back to XP. Since then, I have not owned a single Windows system."
The transition was not always smooth, but in getting his system up and running, Grasch discovered that he enjoyed both hacking and the community he discovered in the Linux forums. "While I had already dabbled in writing small programs back on Windows, I never realized that merely writing code is just the start of it; I wanted to become part of the free software community and give back," he says. His opportunity to get involved came about a year later, in a class in which students worked on projects suggested by professionals. When Franz Stieger, a special needs teacher, wanted to study the best speech recognition software for children with speech impediments, Grasch and three other students volunteered.
Grasch remembers, "We quickly realized that there was no commercial offering that was flexible enough to cope with non-standard speech patterns. So we drafted the concept of Simon, an extremely flexible open speech recognition system and set to work. To us, it was always clear that the result would be free software."
As Grasch went on to university, he continued the development of Simon [5] as a KDE project. Building on CMU Sphinx, Julius [6], and HTK [7], Simon is designed for both Linux and Windows.
At the 2013 Akademy, the annual meeting of KDE developers, Grasch delivered a talk [8] in which he described the progress he was able to make on one challenge in speech processing in one week, using only free software data and technologies.
"The point," Grasch said, "was to show off how close – or, indeed, far away – we were from being able to implement current and next gen speech recognition using free software. The experiment was a big success and proved that by even just investing a handful of days, it was possible to further the state of the art in open source speech recognition."
Grasch continues, "A large part of the reason for conducting such an experiment was to show interested third parties that investing time in free speech technology is viable. Quite a few enthusiasts and even companies responded and showed interest. As a response, we set up the Open Speech Initiative both to give our newly formed team a common label and to formalize the cooperation."
Setting Priorities
The Open Speech Initiative is still in its earliest stages of development – so early that the website [4] is still under construction, and the project lists only half a dozen members.
"From a consumer perspective, there is little to see at this point," Grasch admits. "But behind the largely desolate end user software landscape for anything but command and control applications, there are some promising efforts." Grasch singles out CMU Sphinx for "mature speech recognition engines for a variety of use cases" and the KALDI toolkit [9], which he describes as "working on state of the art neural-network-based decoding."
However, the main challenge is to develop an efficient speech model, which is necessary to teach the recognition software what each language sounds like. "Creating such speech models requires careful and painfully time-consuming planning and tons and tons of data," Grasch explains. "Because of this, even projects using open source speech recognition engines mostly rely on proprietary speech models. This is why creating high quality open source speech models is on the forefront of the Open Speech Initiative's agenda."
Given this background, the immediate goals of the Open Speech Initiative are well-defined, particularly for Grasch's continued work on Simon. As described on Grasch's blog [10], much of this work is a necessary prelude to providing speech recognition – such as reviewing existing resources and working to improve acoustic and language models, as well as dictation capabilities [11]. The ideal is "to create systems that aid with this and automate the process as much as possible," according to Grasch.
Eventually, the Open Speech Initiative intends to create the infrastructure to allow contributions from non-experts – perhaps even end users. Then, Grasch himself plans to add the missing features in Simon to produce a preview version that may be ready as early as this summer.
Even when these goals are reached, Grasch sees "a large, continuous effort" ahead, with no immediate end in sight. However, if it can coordinate its efforts and attract casual contributors, the Open Speech Initiative may represent the best chance for free software speech processing to catch up with the proprietary leaders in the field – and perhaps even show them a thing or two.
Infos
[1] CMU Sphinx: http://cmusphinx.sourceforge.net/
[2] Festival: http://www.cstr.ed.ac.uk/projects/festival/
[3] Open Speech Initiative launch: http://grasch.net/node/24
[4] Open Speech Initiative at KDE: http://speech.kde.org/
[5] Simon: http://simon.kde.org/
[6] Julius: http://julius.sourceforge.jp/en_index.php
[7] HTK: http://htk.eng.cam.ac.uk/
[8] Grasch at 2013 Akademy: http://files.kde.org/akademy/2013/videos/Peter_Grasch_-_FLOSS_Speach_Recognition.webm
[9] KALDI toolkit: http://kaldi.sourceforge.net/about.html
[10] Peter Grasch's blog: http://grasch.net/
[11] Dictation system: http://grasch.net/node/22