Putting free digital assistants to the test
Friend and Helper
Researchers from the University of Michigan have built an intelligent personal assistant akin to Siri and Cortana from free components. Although the Sirius Project focuses on the server load created by digital assistant software, we are interested in the usability of Sirius and its successor Lucida.
What does the chief engineer of a spaceship in the 23rd century do to operate a computer from the 20th century? He picks up the mouse and says, "Hello, computer" (Star Trek IV: The Voyage Home, Paramount Pictures, 1986). During his journey through time, Montgomery "Scotty" Scott nonetheless had to hit the keys eventually.
Owners of modern smartphones, on the other hand, can go a long way with "OK, Google," "Hey, Siri," or "Hey, Cortana"; the speech assistants understand many questions or instructions formulated in everyday language. You can only guess how many algorithms are behind the proprietary marvels.
Things are quite different with the open source intelligent personal assistant Sirius [1], which was developed in 2015 by the research group Clarity Lab at the University of Michigan [2]. The software, published under the BSD license, bundles together the free speech recognition systems CMU Sphinx [3] (PocketSphinx and Sphinx4), Kaldi [4], image recognition based on OpenCV [5], the question-answering system OpenEphyra [6], and UC Berkeley's deep learning framework Caffe [7]. A Wikipedia dump forms the basis for OpenEphyra's data corpus. With the aid of all of these components, Sirius can answer typed or spoken questions and recognize objects in images (Figure 1).
The developers at Clarity Lab formulated the aim of the software in an abstract [8] for the Sirius tutorial that took place during the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-20). They proceed from the assumption that the demand for intelligent personal assistants (IPAs) will increase in the future and ask what server architectures will have to look like to handle the workload of these programs. Because of a lack of open source IPAs to calculate the load, they developed Sirius so they could represent the resource requirements realistically.
How does Sirius fare in practical use? Is the program a suitable helper on the Linux desktop? Our test team considered these questions and carefully examined Sirius and its successor, Lucida [9]. The testers installed the software on Ubuntu 14.04 and Ubuntu 16.04, used the Sirius speech recognition, tested its question-answering system, and scrutinized its image recognition abilities. Lucida is not yet as far along. So far, its demo version offers only a simple question-and-answer game, which the testing team briefly exercised.
Ready-to-Assemble Kit
The Clarity Lab website offers a download that includes the Sirius application, Sirius Suite, and the web front-end server; the Sirius Suite alone with a Caffe snapshot; and the Wikipedia dump for the question-answering system [10].
After unpacking the Sirius archive, you switch to the sirius-1.0.1/sirius-application directory. A few scripts here import the software expected by Sirius, load components from the Internet, and compile and install them. The scripts are written for Ubuntu 14.04; if you use this somewhat older LTS version (which is nevertheless supported until 2019), you should enter the following four commands:
sudo ./get-dependencies.sh
sudo ./get-opencv.sh
./get-kaldi.sh
./compile-sirius-servers.sh
If you use the current Ubuntu 16.04, adjust the get-dependencies.sh script in the text editor beforehand and comment out the entry for adding the external FFmpeg repository (ppa:kirillshkrogalev/ffmpeg-next). The external package source is no longer necessary because FFmpeg is in the official Xenial repositories.
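If you would rather not edit the file by hand, the change can be scripted. The following is a minimal sketch, not part of Sirius: the function name comment_out_ffmpeg_ppa is made up for illustration, and it assumes that the PPA name appears literally on the line that adds the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Sirius): comment out every active
# line that mentions the FFmpeg PPA.
# Assumption: the PPA name appears literally on the line adding the repository.
comment_out_ffmpeg_ppa() {
    local script="$1"
    # -i.orig edits in place and keeps a backup copy named <script>.orig.
    # Lines that already start with '#' do not match and are left untouched.
    sed -i.orig '/ppa:kirillshkrogalev\/ffmpeg-next/ s/^\([[:space:]]*\)\([^#[:space:]]\)/\1# \2/' "$script"
}

# Usage, from the sirius-application directory:
#   comment_out_ffmpeg_ppa ./get-dependencies.sh
```

The backup file makes it easy to compare the result with the original script before you run it.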
Next, execute the first three commands, but before you call up ./compile-sirius-servers.sh, place a symbolic link from /usr/bin/libtoolize to /usr/bin/libtool, because the Kaldi makefile searches for this binary.
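One way to read this step is that /usr/bin/libtool should exist as a link pointing at /usr/bin/libtoolize. The sketch below follows that reading, which is an assumption; the helper name ensure_symlink is invented for illustration and simply refuses to overwrite anything that is already there.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create LINK_NAME pointing at TARGET unless
# LINK_NAME already exists (as a file or as a link).
ensure_symlink() {
    local target="$1" link_name="$2"
    if [ -e "$link_name" ] || [ -L "$link_name" ]; then
        echo "$link_name already exists, leaving it alone"
        return 0
    fi
    ln -s "$target" "$link_name"
}

# The equivalent one-off command on Ubuntu 16.04 (needs root), assuming
# the link direction described above:
#   sudo ln -s /usr/bin/libtoolize /usr/bin/libtool
```

Checking first matters here because a system that already has a real libtool binary installed should keep it.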
A fast Internet connection is an advantage, because the scripts download a whole host of software. With the OpenCV download, around 3GB of data are copied onto the disk; Kaldi takes up 2GB. The Sirius archive itself is 470MB in size, and the Wikipedia dump encompasses some 11GB. When completely installed, Sirius and its components occupy around 25GB of disk space.
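Given these sizes, it is worth checking the free disk space before starting the downloads. The following sketch is our own suggestion rather than part of the Sirius scripts; the function name has_free_space is hypothetical, and the 25GB threshold is taken from the figure above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the filesystem holding DIR has at least
# NEEDED_GB gigabytes available.
has_free_space() {
    local dir="$1" needed_gb="$2"
    local avail_kb
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((needed_gb * 1024 * 1024)) ]
}

# Usage:
#   has_free_space "$HOME" 25 || echo "Less than 25GB free, Sirius may not fit"
```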
The scripts that bring the speech recognition, image recognition, and question-answering system into the arena are in the sirius-application/run-scripts directory with start at the beginning of their file names. All three components are implemented as server services. The scripts you use to direct your requests to the servers are also found here with test in their file names.
Good Listener
In their first attempt, the test team fed a few of the WAV files stored in the sirius-application/inputs/questions directory to Sirius automatic speech recognition (ASR), starting the ASR server in a terminal with each of the three available back ends (Kaldi, PocketSphinx, and Sphinx4) in turn:
./start-asr-server.sh kaldi
./start-asr-server.sh pocketsphinx
./start-asr-server.sh sphinx4
The testers then called up the sirius-asr-test.sh script in a second terminal together with a question (Provided) and saw the result from Sirius (Figure 2). Sometimes it worked well, sometimes only after waiting a while, and sometimes not at all; the communication with Sphinx4 on Ubuntu 16.04 failed completely. For comparison, the test team recorded the sentences themselves (Recorded) with a microphone and sent them to all three back ends. Using five example sentences, Table 1 shows what Kaldi, PocketSphinx, and Sphinx4 understood.
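To repeat such a comparison without typing each call, the test script can be run in a loop over a directory of recordings. This is a minimal sketch under two assumptions: that sirius-asr-test.sh accepts the path to a WAV file as its first argument, and that an ASR server is already running in another terminal. The function name transcribe_all is invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pass every WAV file in DIR to TEST_SCRIPT and
# print the file name before each result.
# Assumption: TEST_SCRIPT takes the WAV path as its first argument.
transcribe_all() {
    local test_script="$1" dir="$2" wav
    for wav in "$dir"/*.wav; do
        # If DIR holds no WAV files, the glob stays unexpanded; skip it.
        [ -e "$wav" ] || continue
        echo "== $wav"
        "$test_script" "$wav"
    done
}

# Usage, from sirius-application/run-scripts:
#   transcribe_all ./sirius-asr-test.sh ../inputs/questions
```

Restarting the server with a different back end and running the loop again yields one column of the comparison at a time.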
Table 1
Sirius ASR Back Ends
Recording | Source | Kaldi | PocketSphinx | Sphinx4 |
---|---|---|---|---|
Who invented the telegraph? | Provided | who invented the telegraph | who invented the telegraph | who invented the telegraph |
Who invented the telegraph? | Recorded | we went at the telegraph | we're going to the telegraph | with only scowled |
Where is the Louvre Museum located? | Provided | where is the liberal museum love the change yeah | where is the liver uneasy and located | where's the louvre museum located |
Where is the Louvre Museum located? | Recorded | where was the little free museums okay tent | where is the u. over a museum located | london back while passengers are |
Where did John Lennon die? | Provided | where do you john lennon dot | where did john lennon got | where did john lennon died |
Where did John Lennon die? | Recorded | when it it's john lennon die | where did john lennon die | only after all how often run |
What is the population of France? | Provided | what is the population of france | what is the population of forms | what is the population of france |
What is the population of France? | Recorded | uh what is the population of france | what is the population of trunks | in a half and unload newark crown |
What is the speed of light? | Provided | which is the speed of light | what is the speed of light | what is the speed of light |
What is the speed of light? | Recorded | well just the speed of flights | what does the speed of light | the injury to half moon last |
The quality of the speech recognition is very patchy: With the WAV files provided, only the Sphinx4 back end worked almost flawlessly. With the testers' own recordings, on the other hand, correctly recognized sentences remained a rare exception. The developers may have trained their speech recognition libraries primarily with the files they enclosed, which are spoken with an American accent throughout. Sphinx4 in particular was unable to cope with the test team's own recordings (in British English with a German accent); the other engines at least recognized individual words.
The quality of the audio is unlikely to explain the lack of understanding, because a decent microphone was used. As a spot check, the testers also recorded their sentences with a headset that has a different frequency response, and the recordings still delivered inferior results. By contrast, the Google and Apple speech recognition engines recognized almost all the questions on the test team's smartphones.
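If you want to score your own recordings in the manner of Table 1, a strict word-for-word comparison is a simple starting point. The sketch below is our own illustration and not something Sirius provides; the function names normalize and is_exact_match are hypothetical. Both strings are lowercased and stripped of punctuation before they are compared.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lowercase the text, drop punctuation, and squeeze
# repeated spaces so that reference and transcript are comparable.
normalize() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | tr -s ' '
}

# Succeed only if the transcript matches the reference word for word.
is_exact_match() {
    [ "$(normalize "$1")" = "$(normalize "$2")" ]
}

# Usage:
#   is_exact_match "Who invented the telegraph?" "who invented the telegraph" \
#       && echo "recognized"
```

This measure is deliberately harsh: a near miss such as "where did john lennon died" counts as a failure, even though a question-answering system might still cope with it.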
Answer Me
If the digital assistant understands a question, it would be great if it could answer it as well. The Sirius developers employ the question-answering system OpenEphyra [6] for this step.
A Wikipedia dump without semantic distinctions serves as the data corpus. The developers created this with Indri [11], a search engine specialized for large text corpora. You can download the Wikipedia knowledge database from the Sirius download page and extract it into the sirius-application/question-answer directory.
Now start the QA server with the start-qa-server.sh script from the sirius-application/run-scripts directory. On the Ubuntu 16.04 test machine, this did not work right away; a call to ant (which uses the XML build files for OpenEphyra and documentation files) in the sirius-application/question-answer directory was necessary before the server started working. If you receive an "insufficient threads configured" warning, you can fix it with a simple hack: Comment out this line in the sirius-application/question-answer/src/info/ephyra/OpenEphyraServer.java file:
con1.setThreadPool(new QueuedThreadPool(NTHREADS));
After taking care of this problem, you must call up the compile-sirius-servers.sh script once more and restart the QA server.
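The edit to OpenEphyraServer.java can also be made from the command line. This is a rough sketch, assuming the call appears at the start of a line (after indentation) exactly as quoted above; the function name disable_thread_pool_line is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefix the setThreadPool call with // so that the
# Java compiler ignores it. A backup is kept as <file>.orig.
# Assumption: the call starts the line, preceded only by whitespace.
disable_thread_pool_line() {
    local java_file="$1"
    sed -i.orig 's|^\([[:space:]]*\)\(con1\.setThreadPool(\)|\1// \2|' "$java_file"
}

# Usage, from the sirius-application directory:
#   disable_thread_pool_line question-answer/src/info/ephyra/OpenEphyraServer.java
```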
Now you can ask questions in a second terminal; for example:
./sirius-qa-test.sh "what is the speed of light"
After a confirmation that the question has come through, a message appears stating that the question has gone to the server. After a short wait, the answer pops up in the terminal (Figure 3).
Because spoken and typed questions are both possible, it would be great if you could combine these. That is no problem with Sirius; you simply start the ASR service along with the QA server and use the following script for communication:
./sirius-asr-qa-test.sh ../inputs/real/who.is.the.current.president.of.the.united.states.wav
The analysis then proceeds with whichever ASR back end is running. Even after the ASR service has successfully transcribed the question, however, the QA service still requires some time to find the answer, so patience is needed.