Wiki.archlinux.org

Speech Recognition

2013-09-04

Merging Text to speech

Revision as of 21:18, 4 September 2013

Line 1:

Line 1:

[[Category:Accessibility]]

[[Category:Audio/Video]]

−

Speech recognition is any means by which you can interface with your computer via spoken word.

This page is designed to identify applications that can facilitate speech recognition and to serve as a guide in installing and using this software in Arch.

+

Speech recognition is any means by which you can interface with your computer via spoken word. This page is designed to identify applications that can facilitate speech recognition and to serve as a guide in installing and using this software in Arch.

−

'''A note to newcomers:''' Speech recognition is something that traditionally has not been well supported in Linux.

If you become interested and choose to dig below the immediate surface, you can expect difficulty in finding documentation or help from the community.

+

'''A note to newcomers:''' speech recognition is something that traditionally has not been well supported in Linux. If you become interested and choose to dig below the immediate surface, you can expect difficulty in finding documentation or help from the community.

+

+

== Types of speech recognition ==

−

==Types of Speech Recognition==

Speech recognition can mean several things:

* Text-To-Speech:

−

*:As it sounds, Text-To-Speech (or TTS) will manipulate a string of text into an audio clip. There are several programs available that perform TTS, some of which are command-line based (ideal for scripting) and others which provide a handy GUI.

+

*: As it sounds, Text-To-Speech (or TTS) will manipulate a string of text into an audio clip. It is useful for blind people to be able to use computers but can also be used to simply improve computer experience. There are several programs available that perform TTS, some of which are command-line based (ideal for scripting) and others which provide a handy GUI.

−

*Simple Voice Control/Commands:

+

* Simple Voice Control/Commands:

−

*:This is the most basic form of Speech-To-Text application. These are designed to recognize a small number of specific, typically one-word commands and then perform an action. This is often used as an alternative to an application launcher, allowing the user for instance to say the word “firefox” and have his OS open a new browser window.

+

*: This is the most basic form of Speech-To-Text application. These are designed to recognize a small number of specific, typically one-word commands and then perform an action. This is often used as an alternative to an application launcher, allowing the user for instance to say the word “firefox” and have his OS open a new browser window.

−

*Full dictation/recognition:

+

* Full dictation/recognition:

−

*:Full dictation/recognition software allows the user to read full sentences or paragraphs and translates that data into text on the fly. This could be used, for instance, to dictate an entire letter into the window of an email client. In some cases, these types of applications need to be trained to your voice and can improve in accuracy the more they are used.

+

*: Full dictation/recognition software allows the user to read full sentences or paragraphs and translates that data into text on the fly. This could be used, for instance, to dictate an entire letter into the window of an email client. In some cases, these types of applications need to be trained to your voice and can improve in accuracy the more they are used.
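The simple voice control case described above, where one recognized word triggers one action, can be sketched as a lookup table in shell. This is only an illustration: the function names <code>lookup_command</code> and <code>handle_word</code> and the word list are hypothetical and do not belong to any recognizer mentioned on this page. The sketch prints what it would launch instead of launching it.

```shell
#!/bin/sh
# Hypothetical sketch: map one recognized word to a program name.
# The word list is illustrative and not taken from any real recognizer.
lookup_command() {
    case "$1" in
        firefox)  echo "firefox" ;;
        terminal) echo "xterm" ;;
        *)        return 1 ;;   # unknown word
    esac
}

# Dry run: report the program that would be started for a recognized word.
handle_word() {
    if cmd=$(lookup_command "$1"); then
        echo "would run: $cmd"
    else
        echo "unrecognized: $1"
    fi
}

handle_word firefox
handle_word banana
```

A real setup would feed <code>handle_word</code> the text produced by a recognizer and replace the <code>echo</code> with the actual program launch.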

−

==Development Status==

+

== Development status ==

−

Several years ago there was a push to implement speech recognition in Linux. Since then, many of those projects have stagnated.

+

−

==Text-To-Speech==

+

Several years ago there was a push to implement speech recognition in Linux. Since then, many of those projects have stagnated.

−

The two major players in text-to-speech applications are Festival and eSpeak. Comparison available [http://braille.uwo.ca/pipermail/speakup/2008-July/046755.html here]

+

−

===Festival===

+

== List of text to speech applications ==

−

[[Festival]] offers a general framework for building speech synthesis systems as well as including examples of various modules. As a whole it offers full text to speech through a number of APIs: from shell level, through a Scheme command interpreter, as a C++ library, from Java, and an Emacs interface. Festival is multi-lingual (currently English (British and American), and Spanish) though English is the most advanced.

+

−

* Free

+

The two major players in text-to-speech applications are Festival and eSpeak. Comparison available [http://braille.uwo.ca/pipermail/speakup/2008-July/046755.html here]

−

* Can install several different voices/accents.

+

−

* Available in Extra

+

−

[http://www.cstr.ed.ac.uk/projects/festival/ Site Link]

+

* {{App|[[Wikipedia:eSpeak|eSpeak]]|Compact open source software speech synthesizer for English and other languages which currently supports more than 50 languages.|http://espeak.sourceforge.net/|{{Pkg|espeak}}}}

+

* {{App|[[Festival]]|General framework for building speech synthesis systems as well as including examples of various modules. As a whole it offers full text to speech.|http://www.cstr.ed.ac.uk/projects/festival/|{{Pkg|festival}}}}

+

* {{App|[[mbrola|MBROLA]]|Non-free phonemes-to-audio program which supports more than 70 languages.|http://tcts.fpms.ac.be/synthesis/mbrola.html|{{AUR|mbrola}}}}

+

* {{App|Speech Dispatcher|Common interface to speech synthesis. It has backends for eSpeak, Festival, and a few other speech synthesizers.|http://www.freebsoft.org/speechd|{{Pkg|speech-dispatcher}}}}

−

===eSpeak===

+

== List of voiced commands applications ==

−

[http://espeak.sourceforge.net/ eSpeak] is "a compact open source software speech synthesizer for English and other languages, for Linux and Windows".

+

−

*Open source

+

=== Gnome-Voice-Control ===

−

*Lightweight

+

−

*Available in the community repository

+

−

*Excellent language support

+

−

=====Installing eSpeak=====

+

Gnome-Voice-Control is a dialogue system to control the GNOME Desktop. It was developed during Google Summer of Code 2007. Available in the AUR.

−

To install eSpeak:

+

−

:{{bc| pacman -S espeak}}

+

−

=====Testing eSpeak=====

+

−

:{{bc| echo "Hello. This is a test." <nowiki>|</nowiki> espeak}}

+

−

=====eSpeak Usage/Configuration=====

+

=== VEDICS ===

−

The Documents page on the eSpeak website [http://espeak.sourceforge.net/docindex.html here] provides an excellent guide for using different voices, adjusting pronunciation, etc. There are many different accents included in this install that are worth trying out.

+

−

−

==Voiced Commands==

−

===Gnome-Voice-Control===

−

Gnome-Voice-Control is a dialogue system to control the GNOME Desktop. It was developed during Google Summer of Code 2007.

−

−

Available in AUR

−

−

===VEDICS===

VEDICS (Voice Enabled Desktop Interaction and Control System) is assistive software which lets the user interact with the OS using voice commands.

Line 81:

Line 63:

#Desktop plugins to control your Linux desktop using only your voice. You can switch virtual screens, cycle through desktops, invoke the run dialog, quick lock the screen.

#Custom commands are fully supported, and you can add commands on the fly.

−

#'Pseudo Commands' allow you to enter phrases that the computer voice should say in reply. For example, if you say "Good morning", the computer voice could say "And good morning to you".

+

#'Pseudo Commands' allow you to enter phrases that the computer voice should say in reply. For example, if you say "Good morning", the computer voice could say "And good morning to you".
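The "Good morning" exchange above amounts to a phrase-to-reply lookup. The following is a minimal sketch of that idea only, not how VEDICS or any other listed program implements it; the function name <code>reply_for</code> is hypothetical. It prints the reply as text, which could then be piped to a text-to-speech program.

```shell
#!/bin/sh
# Hypothetical sketch of a pseudo command: look up the spoken reply
# for a recognized phrase. Not taken from any listed application.
reply_for() {
    # Lower-case the phrase so "Good Morning" and "good morning" both match.
    phrase=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$phrase" in
        "good morning") echo "And good morning to you" ;;
        *)              return 1 ;;   # no reply configured
    esac
}

# Prints the reply text; piping this output to a TTS program would speak it.
reply_for "Good morning"
```

Unknown phrases make <code>reply_for</code> return a non-zero status and print nothing, so a caller can fall through to ordinary command handling.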

−

+

== List of speech recognition applications ==

−

==Speech Recognition==

+

===Free Speech Recognition Engines===

====CMU Sphinx====

Line 99:

Line 80:

Dragon Naturally Speaking software by Nuance is a well-functioning and popular implementation of speech dictation. It is developed for Windows, but has been run successfully in a Linux environment using Wine. It can be used independently for dictation into other Wine programs such as Notepad, or it can be paired with Platypus to interface with any native Linux program. Platypus also provides a feature to control your OS using voice commands, similar to the programs described in the [[Speech_Recognition#Voiced_Commands | Voiced Commands]] section.

−

Nuance's software is non-free, so you will have to purchase a copy. Note that Dragon provides you with the ability to install it on a set number of machines. Installing/Reinstalling in wine may use up some of these licenses.

+

Nuance's software is non-free, so you will have to purchase a copy. Note that Dragon provides you with the ability to install it on a set number of machines. Installing/Reinstalling in wine may use up some of these licenses.

[http://thenerdshow.com/platypus.html Platypus Project]

Line 107:

Line 88:

====DynaSpeak from SRI International====

====LumenVox Speech Engine====

+

+

== See also ==

+

+

[http://kubuntu.free.fr/blog/index.php/2006/09/24/121-synthese-vocale-en-francais-sous-linux Synthèse vocale en français sous Linux - KubuntuBlog (french)]