Putting free digital assistants to the test

Friend and Helper

Article from Issue 192/2016

Author(s): Peter Kreußel

Researchers from the University of Michigan have built an intelligent personal assistant akin to Siri and Cortana from free components. Although the Sirius Project focuses on the server load created by digital assistant software, we are interested in the usability of Sirius and its successor Lucida.

What does the chief engineer of a spaceship in the 23rd century do to operate a computer from the 20th century? He picks up the mouse and says, "Hello, computer" (Star Trek IV: The Voyage Home, Paramount Pictures, 1986). During his journey through time, Montgomery "Scotty" Scott nonetheless had to hit the keys eventually.

Owners of modern smartphones, on the other hand, can go a long way with "OK, Google," "Hey, Siri," or "Hey, Cortana"; the speech assistants understand many questions or instructions formulated in everyday language. You can only guess how many algorithms are behind the proprietary marvels.

Things are quite different with the open source intelligent personal assistant Sirius [1], which was developed in 2015 by the research group Clarity Lab at the University of Michigan [2]. The software, published under the BSD license, bundles together the free speech recognition systems CMU Sphinx [3] (PocketSphinx and Sphinx4), Kaldi [4], image recognition based on OpenCV [5], the question-answering system OpenEphyra [6], and UC Berkeley's deep learning framework Caffe [7]. A Wikipedia dump forms the basis for OpenEphyra's data corpus. With aid from all of these components, Sirius is in a position to answer typed or spoken questions and to recognize objects in images (Figure 1).

Figure 1: Components of the Sirius intelligent personal assistant (based on an image at the Clarity Lab website [1]).

The developers at Clarity Lab formulated the aim of the software in an abstract [8] for the Sirius tutorial that took place during the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-20). They proceed from the assumption that the demand for intelligent personal assistants (IPAs) will increase in the future and ask what server architectures will have to look like to handle the workload of these programs. Because of a lack of open source IPAs to calculate the load, they developed Sirius so they could represent the resource requirements realistically.

How does Sirius fare in practical use? Is the program a suitable helper on the Linux desktop? The test team set out to answer these questions and took a close look at Sirius and its successor, Lucida [9]. They installed the software on Ubuntu 14.04 and Ubuntu 16.04, used the Sirius speech recognition, tested its question-answering system, and scrutinized its image recognition abilities. Lucida is not yet as far along: So far, its demo version offers only a simple question-and-answer game, which the test team tried out briefly.

Ready-to-Assemble Kit

The Clarity Lab website offers three downloads: an archive that includes the Sirius application, Sirius Suite, and the web front-end server; the Sirius Suite alone with a Caffe snapshot; and the Wikipedia dump for the question-answering system [10].

After unpacking the Sirius archive, switch to the sirius-1.0.1/sirius-application directory. A few scripts here install the packages Sirius depends on, download further components from the Internet, and compile and install them. The scripts are written for Ubuntu 14.04; if you use this somewhat older LTS version (which is nevertheless supported until 2019), enter the following four commands:

sudo ./get-dependencies.sh
sudo ./get-opencv.sh
./get-kaldi.sh
./compile-sirius-servers.sh

If you use the current Ubuntu 16.04, open the get-dependencies.sh script in a text editor beforehand and comment out the line that adds the external FFmpeg repository (ppa:kirillshkrogalev/ffmpeg-next). The external package source is no longer necessary because FFmpeg is in the official Xenial repositories.
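
The edit to get-dependencies.sh can also be scripted. The following is a minimal sketch, assuming that the repository is added on a single line that contains the string ppa:kirillshkrogalev/ffmpeg-next and that GNU sed is available; the function name comment_out_ppa is made up for illustration and is not part of Sirius.

```shell
# Sketch: comment out every line of a script that mentions a given PPA.
# comment_out_ppa is a hypothetical helper, not part of the Sirius scripts.
# Usage: comment_out_ppa <script> <ppa>
comment_out_ppa() {
    local script="$1" ppa="$2"
    # -i.orig edits the file in place and keeps a backup copy (GNU sed).
    # '|' serves as the delimiter because the PPA name contains '/'.
    # Lines that already start with '#' are left untouched.
    sed -i.orig "\|${ppa}|s|^[[:space:]]*[^#[:space:]]|# &|" "$script"
}
```

On Ubuntu 16.04, the call would be comment_out_ppa get-dependencies.sh ppa:kirillshkrogalev/ffmpeg-next; the unmodified script remains available as get-dependencies.sh.orig.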

Next, execute the first three commands. Before you call ./compile-sirius-servers.sh, however, create a symbolic link named /usr/bin/libtool that points to /usr/bin/libtoolize, because the Kaldi makefile searches for a libtool binary.
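
As a rough sketch of the symbolic link step: The helper below creates a link only if nothing exists at the link path yet, so an already installed binary is never overwritten. It assumes that the link is meant to be /usr/bin/libtool pointing at the existing /usr/bin/libtoolize; ensure_symlink is a hypothetical name chosen for this example.

```shell
# Sketch: create <link> as a symbolic link to <target> unless <link> exists.
# ensure_symlink is a hypothetical helper name.
# Usage: ensure_symlink <target> <link>
ensure_symlink() {
    local target="$1" link="$2"
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "$link already exists, leaving it alone" >&2
        return 0
    fi
    ln -s "$target" "$link"
}
```

For the case described above, the equivalent one-off command, run with root privileges, would be ln -s /usr/bin/libtoolize /usr/bin/libtool.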

A fast Internet connection is an advantage, because the scripts download a whole host of software. With the OpenCV download, around 3GB of data are copied onto the disk; Kaldi takes up 2GB. The Sirius archive itself is 470MB in size, and the Wikipedia dump encompasses some 11GB. When completely installed, Sirius and its components occupy around 25GB of disk space.
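
The downloads listed above add up to roughly 16.5GB (3GB + 2GB + 0.47GB + 11GB), and the finished installation needs around 25GB. Before starting, it may be worth checking whether the target filesystem has that much room. The following sketch relies on the POSIX output format of df; the function name has_free_gb is an assumption made for this example.

```shell
# Sketch: succeed if the filesystem holding <dir> has at least <gb> GB free.
# has_free_gb is a hypothetical helper name.
# Usage: has_free_gb <dir> <gb>
has_free_gb() {
    local dir="$1" need_gb="$2" avail_kb
    # df -Pk prints one header line and one data line; column 4 is the
    # available space in 1024-byte blocks.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $(( need_gb * 1024 * 1024 )) ]
}
```

A call such as has_free_gb "$HOME" 25 || echo "less than 25GB free" would warn before the lengthy downloads begin.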

The scripts that launch the speech recognition, image recognition, and question-answering system are in the sirius-application/run-scripts directory; their file names begin with start. All three components are implemented as server services. The scripts you use to send requests to the servers are in the same directory and have test in their file names.

Good Listener

For the first test, the team fed a few of the WAV files stored in the sirius-application/inputs/questions directory to the Sirius automatic speech recognition (ASR). To do so, they started the ASR server in a terminal with each of the three available back ends (Kaldi, PocketSphinx, and Sphinx4) in turn:

./start-asr-server.sh kaldi
./start-asr-server.sh PocketSphinx
./start-asr-server.sh sphinx4

The test team then called the sirius-asr-test.sh script in a second terminal together with a question (Provided) and saw the result from Sirius (Figure 2). Sometimes it worked well, sometimes only after a wait, and sometimes not at all; communication with Sphinx4 failed completely on Ubuntu 16.04. For comparison, the test team recorded the sentences themselves (Recorded) with a microphone and sent them to all three back ends. Using five example sentences, Table 1 shows what Kaldi, PocketSphinx, and Sphinx4 understood.

Table 1: Sirius ASR Back Ends

Recording	Source	Kaldi	PocketSphinx	Sphinx4
Who invented the telegraph?	Provided	who invented the telegraph	who invented the telegraph	who invented the telegraph
	Recorded	we went at the telegraph	we're going to the telegraph	with only scowled
Where is the Louvre Museum located?	Provided	where is the liberal museum love the change yeah	where is the liver uneasy and located	where's the louvre museum located
	Recorded	where was the little free museums okay tent	where is the u. over a museum located	london back while passengers are
Where did John Lennon die?	Provided	where do you john lennon dot	where did john lennon got	where did john lennon died
	Recorded	when it it's john lennon die	where did john lennon die	only after all how often run
What is the population of France?	Provided	what is the population of france	what is the population of forms	what is the population of france
	Recorded	uh what is the population of france	what is the population of trunks	in a half and unload newark crown
What is the speed of light?	Provided	which is the speed of light	what is the speed of light	what is the speed of light
	Recorded	well just the speed of flights	what does the speed of light	the injury to half moon last

Figure 2: After WAV files are sent to Sirius ASR, you see what was understood by the back end.
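
Sending one file at a time becomes tedious when comparing back ends. The loop below is a sketch of how the procedure could be automated; it assumes that the ASR server is already running and that the test script accepts the path to a WAV file as its only argument. The helper name transcribe_all is hypothetical.

```shell
# Sketch: run a command once per WAV file in a directory and print the
# file name as a heading before the output of each run.
# transcribe_all is a hypothetical helper name.
# Usage: transcribe_all <directory> <command>
transcribe_all() {
    local dir="$1" cmd="$2" wav
    for wav in "$dir"/*.wav; do
        [ -e "$wav" ] || continue   # skip if the directory has no WAV files
        printf '== %s\n' "$(basename "$wav")"
        "$cmd" "$wav"
    done
}
```

From the run-scripts directory, a call could look like transcribe_all ../inputs/questions ./sirius-asr-test.sh, repeated after restarting the server with each back end.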

The quality of speech recognition is very patchy: With the WAV files provided, only the Sphinx4 back end worked almost flawlessly. With the testers' own recordings, on the other hand, correctly recognized sentences were the rare exception. The developers may have trained their speech recognition libraries primarily with the files they enclosed, which are spoken with an American accent throughout. Sphinx4 in particular was unable to cope with the test team's own recordings (in British English with a German accent); the other engines at least recognized individual words.
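
To compare results like those in Table 1 mechanically, the spoken sentence first has to be brought into the same form as the ASR output, which is lowercase and largely unpunctuated. The following sketch shows one simple way to do that and to test for an exact match. It is not the method used by the test team, and the function names normalize and exact_match are assumptions made for this example.

```shell
# Sketch: lowercase a sentence and strip punctuation so that it can be
# compared with the output of an ASR back end.
normalize() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]'
}

# Succeeds if the recognized text equals the reference sentence after
# both have been normalized.
# Usage: exact_match <reference> <recognized>
exact_match() {
    [ "$(normalize "$1")" = "$(normalize "$2")" ]
}
```

Exact matching is deliberately strict: A near miss such as "where did john lennon died" counts as a failure, so the measure only separates perfect transcriptions from everything else.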

Audio quality is unlikely to explain the lack of understanding, because a decent microphone was used. As a cross-check, the testers also recorded their sentences with a headset that has a different frequency response, and these recordings still delivered poor results. By contrast, the Google and Apple speech recognition engines on the test team's smartphones recognized almost all the questions.

Answer Me

If the digital assistant understands a question, it would be great if it could answer it as well. The Sirius developers employ the question-answering system OpenEphyra [6] for this step.

A Wikipedia dump without semantic distinctions serves as the data corpus. The developers created this with Indri [11], a search engine specialized for large text corpora. You can download the Wikipedia knowledge database from the Sirius download page and extract it into the sirius-application/question-answer directory.

Now start the QA server with the start-qa-server.sh script from the sirius-application/run-scripts directory. On the Ubuntu 16.04 test machine, this did not work right away: The server only started after a call to ant, which processes the XML build files for OpenEphyra and its documentation, in the sirius-application/question-answer directory. If you receive an insufficient threads configured warning, a simple hack fixes it: Comment out this line in the sirius-application/question-answer/src/info/ephyra/OpenEphyraServer.java file:

con1.setThreadPool(new QueuedThreadPool(NTHREADS));

After taking care of this problem, you must call up the compile-sirius-servers.sh script once more and restart the QA server.
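
The hack can be applied without opening an editor. The sketch below comments out every line of a file that contains a given literal string; it uses a fixed-string search in awk because the Java statement contains parentheses that would otherwise need escaping. The helper name comment_out_java_line is hypothetical.

```shell
# Sketch: prefix every line containing <text> with '// ' and keep a
# backup of the original file as <file>.orig.
# comment_out_java_line is a hypothetical helper name.
# Usage: comment_out_java_line <file> <text>
comment_out_java_line() {
    local file="$1" needle="$2" tmp
    tmp=$(mktemp) || return 1
    cp "$file" "$file.orig" || return 1
    # index() searches for a literal string; lines that are already
    # commented out with // are passed through unchanged.
    awk -v needle="$needle" '
        index($0, needle) && $0 !~ /^[ \t]*\/\// { print "// " $0; next }
        { print }
    ' "$file.orig" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

Called with the path to OpenEphyraServer.java and the string 'con1.setThreadPool(new QueuedThreadPool(NTHREADS));', this disables the statement; the servers still have to be recompiled afterward as described above.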

Now you can ask questions in a second terminal; for example:

./sirius-qa-test.sh "what is the speed of light"

After a confirmation that the question has come through, a message appears stating that the question has gone to the server. After a short wait, the answer pops up in the terminal (Figure 3).

Figure 3: Once the OpenEphyra server is running in one terminal window, you can enter your question in another and receive your answer there, as well.

Because spoken and typed questions are both possible, it would be great if you could combine these. That is no problem with Sirius; you simply start the ASR service along with the QA server and use the following script for communication:

./sirius-asr-qa-test.sh ../inputs/real/who.is.the.current.president.of.the.united.states.wav

How long the transcription takes depends on the ASR back end. Once the question has been transcribed successfully, the QA service still requires some time to find the answer, so patience is needed.
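
To put a number on the waiting time, any of the test scripts can be wrapped in a small timer. The following is a sketch with one-second resolution; timed_run is a hypothetical helper name, and the elapsed time goes to standard error so that it does not mix with the answer.

```shell
# Sketch: run a command, then report the elapsed wall-clock time in
# seconds on standard error. The exit status of the command is preserved.
# timed_run is a hypothetical helper name.
# Usage: timed_run <command> [arguments...]
timed_run() {
    local start end status
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "elapsed: $(( end - start )) s" >&2
    return "$status"
}
```

For example, timed_run ./sirius-qa-test.sh "what is the speed of light" shows how long a typed question takes, which can then be compared with the combined ASR and QA script.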


