This thread is for discussion of Speakeasy.
Speakeasy Discussion
-
Howdy, Rodney!
Sorry for leaving you in a one-man conversation for so long! I got distracted by other stuff, but I am interested in what you've done.
You're right that the speech recognition seems to be good. In my tests today, it definitely seems to give better recognition results than the original code from Jim back in 2014.
There are several things I don't understand about your code.
Before you start your app, don't you have to start the Win10 voice recognition manually? How do you do that? I've been using the WinKey+H shortcut to start the listening bar. How do you start the bar?
In your Help, you say that speaking various words will be the same as pushing the corresponding button. I don't see in your code where you capture those words.
When speech recognition is started, there is a bar that appears across the top of the Desktop. After a brief period of inactivity, that bar turns off - stops "listening". I have to click the bar to get listening to start up again.
Why does it do that? I don't want it to turn off until I tell it to shut down. Have you seen a way to keep it from turning off?
I'd like the bar to not display at all, and the listening to stay active until I close the bar.
I suppose I could try to capture a handle to the bar and reposition it out of sight, or just make it invisible. But the more important issue is that it turns off. I haven't found documentation about why it does that.
-
Howdy Rodney!
Here's a baby app that I'm using to test. The "Speech" button toggles the speech listening bar on and off. Both buttons give focus to the textbox to give the speech a location to be placed.
As I mentioned, I don't know yet how to keep the listening bar active. I'm searching for that now.
Code:
#Compile Exe
#Dim All
%Unicode = 1
#Include "Win32API.inc"

Enum Equates Singular
   IDC_Button = 500
   IDC_Clear
   IDC_TextBox
End Enum

Global hDlg As Dword

Function PBMain() As Long
   Dialog Default Font "Arial Black", 14, 0
   Dialog New Pixels, 0, "gbSpeech", 300, 300, 250, 210, %WS_OverlappedWindow To hDlg
   Control Add Button, hDlg, %IDC_Button, "Speech", 10, 10, 100, 25
   Control Add Button, hDlg, %IDC_Clear, "Clear", 120, 10, 65, 25
   Control Add TextBox, hDlg, %IDC_TextBox, "This is a test", 10, 40, 230, 160, %ES_MultiLine Or %ES_WantReturn
   Dialog Show Modal hDlg Call DlgProc
End Function

CallBack Function DlgProc() As Long
   Select Case Cb.Msg
      Case %WM_Command
         Select Case Cb.Ctl
            Case %IDC_Button
               ' Put the caret in the textbox first so dictated text lands there
               Control Set Focus hDlg, %IDC_TextBox
               ToggleSpeech
            Case %IDC_Clear
               Control Set Text hDlg, %IDC_TextBox, ""
               Control Set Focus hDlg, %IDC_TextBox
            Case %IdCancel
               Dialog End hDlg
         End Select
   End Select
End Function

Sub ToggleSpeech
   ' Simulate the Win+H shortcut, which toggles the listening bar
   keybd_event(%VK_LWIN, &H45, 0, 0)
   keybd_event(%VK_H, &H45, 0, 0)
   keybd_event(%VK_H, &H45, %KEYEVENTF_KEYUP, 0)
   keybd_event(%VK_LWIN, &H45, %KEYEVENTF_KEYUP, 0)
End Sub
-
Right-click on the speech bar; the first three options are:
On - Listen to everything
Sleep - Listen for "Start Listening"
Off - Do not listen to anything
There is a corresponding "Stop listening" for the second of the three.
Once the speech to text engine is on, the user has control of it with the start / stop listening feature.
I find that if it is in sleep mode, it uses a fair amount of system resources so I turn it off if I won't be using it for an extended period.
Sorry, I didn't see that you had two posts here.
If the engine is operating smoothly (which can take a while, depending on the amount of mead consumed), then in your baby app, if you say "Clear", the speech engine will "click" that button with no code necessary; likewise the "Speech" button. You can even use menus in this manner.
Rod
In some future era, dark matter and dark energy will only be found in Astronomy's Dark Ages.
-
Howdy, Rodney!
The bar across the top, the one that appears when WinKey+H is pressed, has no context menu. It has the picture of a little microphone and an "X" button. If I click the microphone icon on the bar, listening is toggled.
We must not be talking about the same thing.
... added ... my CPU stays at about 2% regardless of whether the speech bar is on or not.
-
Howdy, Rodney!
More information ...
So in Settings, under Speech, there is a setting "Turn On Speech Recognition". Mine is off. It stays off when I use WinKey+H to start speech recognition in the baby app above.
If I set "Turn on Speech Recognition" to ON, then I get another window which does have the context menu you mention. It also says "Listening", just like the bar I mention. Here's a picture of both:
Nothing I've read mentioned the possibility of 2 different windows. I'll have to go read more.
... added ... with either window visible, speaking will add text to the textbox in either of our apps.
-
From what I am reading, Windows offers two separate features: "Speech Recognition" and "Voice Dictation".
Speech Recognition appears to be the broader capability: it parses incoming speech for various commands as well as entering text into an edit control. "Voice Dictation" is limited to placing text in an edit control, but does recognize a few formatting commands.
In my limited testing, "Voice Dictation" surprisingly appears to have better accuracy than "Speech Recognition".
With "Speech Recognition", I do have access to more features. For example, saying "Open Excel" opens the Excel app for me.
I continue to be pleasantly surprised at how accurate the "Voice Dictation" appears to be.
-
Speech recognition is what I'm using; I didn't know there was another concept (voice dictation).
I get what you get when I press WinKey+H, apparently a different animal in a different but overlapping habitat.
Rod
-
Originally posted by Gary Beene: Howdy, Rodney!
I've found several comments on the web where other users find, as I have, that Voice Dictation gives better accuracy than Speech Recognition.
-
Howdy, Stuart!
Yes, that's a good point. Just as my smart phone uses server-based voice recognition, so does the voice dictation feature.
Since speech recognition works offline, I'm surprised that voice dictation is online-only. You would think they could offer a (degraded) version of voice dictation offline as well.
-
Perhaps they see speech recognition as an offline version of voice dictation, just giving it a different name to avoid confusion. That would explain why my speech recognition app takes more of my computer's resources than voice dictation does.
Rod
-
Rodney,
Sorry for going off-topic a bit here ... the "baby app" in #3 demonstrates another problem - that the voice dictation puts text into whichever textbox has focus. And when no textbox has focus, dictation output is lost.
I have this vision of a "Voice Monitor (VM)" application that runs in the background, continuously monitoring all that is said and then broadcasting a message to PowerBASIC apps when a command of interest is detected. Both the VM and the PowerBASIC app would have to agree on what commands can be sent/recognized.
I checked in with the Dragon folks and they do provide the ability to issue commands to some major applications, such as Word and Excel. But their product does not allow sending commands to other applications in general. Bummer that.
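One way the Voice Monitor could broadcast a message to PowerBASIC apps is a registered window message posted to every top-level window. Below is a minimal sketch, assuming both programs register the same message name and agree that wParam carries a command number; the name "gbVoiceMonitor.Command", the command numbers and the helper names are all hypothetical. The "Send Hello" button stands in for the monitor's speech parser, so running two copies and clicking it in one should update both.

```powerbasic
#Compile Exe
#Dim All
%Unicode = 1
#Include "Win32API.inc"

' Hypothetical command numbers the Voice Monitor and the apps agree on
%VM_CMD_CLEAR = 1
%VM_CMD_HELLO = 2
%IDC_Send     = 500
%IDC_TextBox  = 501

' Every process that registers the same (hypothetical) name gets the same ID
Function VMMessage() As Dword
   Static dwMsg As Dword
   If dwMsg = 0 Then dwMsg = RegisterWindowMessage("gbVoiceMonitor.Command")
   Function = dwMsg
End Function

' Voice Monitor side: post a command number to every top-level window
Sub BroadcastCommand(ByVal lCmd As Long)
   PostMessage %HWND_BROADCAST, VMMessage(), lCmd, 0
End Sub

Function PBMain() As Long
   Local hDlg As Dword
   Dialog New Pixels, 0, "VM receiver", 300, 300, 250, 210, %WS_OverlappedWindow To hDlg
   Control Add Button, hDlg, %IDC_Send, "Send Hello", 10, 10, 100, 25
   Control Add TextBox, hDlg, %IDC_TextBox, "", 10, 40, 230, 160, %ES_MultiLine Or %ES_WantReturn
   Dialog Show Modal hDlg Call DlgProc
End Function

CallBack Function DlgProc() As Long
   ' App side: react only to the registered Voice Monitor message
   If Cb.Msg = VMMessage() Then
      Select Case Cb.WParam
         Case %VM_CMD_CLEAR: Control Set Text Cb.Hndl, %IDC_TextBox, ""
         Case %VM_CMD_HELLO: Control Set Text Cb.Hndl, %IDC_TextBox, "Hello from the Voice Monitor"
      End Select
      Function = 1 : Exit Function
   End If
   ' Stand-in for the monitor: the button broadcasts a command
   If Cb.Msg = %WM_Command And Cb.Ctl = %IDC_Send And Cb.CtlMsg = %BN_Clicked Then
      BroadcastCommand %VM_CMD_HELLO
   End If
End Function
```

RegisterWindowMessage is the usual choice for HWND_BROADCAST (rather than a %WM_User value) because every app that registers the same string gets the same ID. Broadcasts only reach top-level windows, and Windows' UIPI may block them from reaching an app running elevated.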
-
A stab in the dark here. Regarding the textbox not having focus, perhaps a thread with a higher priority that has a textbox that maintains focus or that gets focus on sound input?
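Rod's idea is a separate high-priority thread. A simpler single-thread alternative, sketched here as an untested assumption rather than a known fix, is for the dialog to keep handing focus back to its textbox. It posts itself a private message (%WM_RestoreFocus, a hypothetical name) whenever it is activated or a button is clicked, so the focus change runs after Windows has finished its own focus handling. This can only help while the dialog is the foreground window, since dictation types into whatever window is in front.

```powerbasic
#Compile Exe
#Dim All
%Unicode = 1
#Include "Win32API.inc"

%IDC_TextBox = 501
%WM_RestoreFocus = %WM_User + 100   ' hypothetical private message ID

Function PBMain() As Long
   Local hDlg As Dword
   Dialog New Pixels, 0, "Dictation target", 300, 300, 250, 210, %WS_OverlappedWindow To hDlg
   Control Add Button, hDlg, %IdOk, "OK", 10, 10, 65, 25
   Control Add TextBox, hDlg, %IDC_TextBox, "", 10, 40, 230, 160, %ES_MultiLine Or %ES_WantReturn
   Dialog Show Modal hDlg Call DlgProc
End Function

CallBack Function DlgProc() As Long
   Select Case Cb.Msg
      Case %WM_Activate
         ' Dialog activated (click or Alt+Tab): queue a focus reset that runs
         ' after the dialog manager has restored its own saved focus
         If Lo(Word, Cb.WParam) <> %WA_Inactive Then Dialog Post Cb.Hndl, %WM_RestoreFocus, 0, 0
      Case %WM_Command
         ' After any button click, hand focus straight back to the textbox
         If Cb.CtlMsg = %BN_Clicked Then Dialog Post Cb.Hndl, %WM_RestoreFocus, 0, 0
      Case %WM_RestoreFocus
         Control Set Focus Cb.Hndl, %IDC_TextBox
   End Select
End Function
```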
I think your 'vision' may be doable, but it may lack quick response, depending on the number of different responses possible. I can't test any voice-driven code at the present time; no microphone is currently attached (too many projects happening).
I think you could buy Dragon's API for many thousands of dollars and it would do the trick.
Rod
-
Howdy, Rodney!
Dragon told me yesterday that they no longer offer the API for use with third-party apps. Have you seen differently?
The thing about the dictation that slows it down is that the audio is sent to MS Servers, converted, and sent back as text. A local dictation library might be faster, although not necessarily as accurate.
I'd expect an INSTR search to offer no slowdown in parsing the incoming text, even if testing against hundreds of command strings.
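A minimal sketch of that INSTR approach: lowercase the incoming text and scan a table of command strings, returning the index of the first one found. The command list and the names LoadCommands and FindCommand are hypothetical. Plain INSTR is a substring match, so "clear" would also fire on "unclear"; padding both strings with spaces is one way to tighten that if it matters.

```powerbasic
#Compile Exe
#Dim All

Global gCmd() As String

' Hypothetical command table; entries are stored in lowercase
Sub LoadCommands
   ReDim gCmd(1 To 3)
   gCmd(1) = "speech"
   gCmd(2) = "clear"
   gCmd(3) = "open excel"
End Sub

' Returns the index of the first command found in sText, or 0 if none
Function FindCommand(ByVal sText As String) As Long
   Local i As Long
   sText = LCase$(sText)
   For i = LBound(gCmd) To UBound(gCmd)
      If InStr(sText, gCmd(i)) Then Function = i : Exit Function
   Next
End Function

Function PBMain() As Long
   LoadCommands
   MsgBox "Command index:" + Str$(FindCommand("Please CLEAR the box."))
End Function
```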
-
I'd expect an INSTR search to offer no slowdown in parsing the incoming text, even if testing against hundreds of command strings.
Binary Search of an array February 14 2000, July 15 2003
(Post 2 that thread is the PB/Windows update) (from PB-DOS, post 1)
Michael Mattias
Tal Systems (retired)
Port Washington WI USA
http://www.talsystems.com
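The referenced binary search thread is not reproduced here. As a rough sketch of the same idea (illustrative code, not the code from that thread), a binary search fits when the whole recognized phrase, trimmed and lowercased, is looked up in a sorted command table rather than scanned with INSTR. Each step halves the remaining range, so a few hundred commands need at most about nine comparisons. BinSearch is a hypothetical name, and the array is assumed to be 1-based so that 0 can mean "not found".

```powerbasic
#Compile Exe
#Dim All

' Returns the index of sKey in the sorted, 1-based array s(), or 0 if absent
Function BinSearch(s() As String, ByVal sKey As String) As Long
   Local iLo, iHi, iMid As Long
   iLo = LBound(s) : iHi = UBound(s)
   Do While iLo <= iHi
      iMid = (iLo + iHi) \ 2
      If s(iMid) = sKey Then
         Function = iMid : Exit Function
      ElseIf s(iMid) < sKey Then
         iLo = iMid + 1          ' key is in the upper half
      Else
         iHi = iMid - 1          ' key is in the lower half
      End If
   Loop
End Function

Function PBMain() As Long
   Dim s(1 To 4) As String
   s(1) = "speech" : s(2) = "clear" : s(3) = "stop listening" : s(4) = "open excel"
   Array Sort s()               ' binary search requires sorted data
   ' Normalize the dictated phrase: strip punctuation/spaces, lowercase it
   MsgBox "Found at:" + Str$(BinSearch(s(), LCase$(Trim$("Open Excel.", Any " .,!?"))))
End Function
```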
-
I know nothing new about Dragon, haven't used them since they wanted $25,000 for their API.
Gary, I think the accuracy is going to depend on the user and their equipment and how well they use their equipment. That will apply to both server side and local arenas. I know I had one microphone that had me talking to myself and I never knew that I knew such language.
Your concept needs to be consistent: get the input first, then do the parsing, perhaps by creating a queue of uttered commands.
Rod
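A small FIFO (a circular buffer) is one way to keep capture and parsing separate: the capture side pushes each utterance and the parser pops them in the order they were spoken. The capacity and the names QInit, QPush and QPop are hypothetical.

```powerbasic
#Compile Exe
#Dim All

%QSIZE = 64                        ' hypothetical capacity
Global gQ() As String
Global gHead, gTail, gCount As Long

Sub QInit
   ReDim gQ(0 To %QSIZE - 1)
   gHead = 0 : gTail = 0 : gCount = 0
End Sub

' Adds a command at the back of the queue; returns 0 if the queue is full
Function QPush(ByVal sCmd As String) As Long
   If gCount = %QSIZE Then Exit Function
   gQ(gTail) = sCmd
   gTail = (gTail + 1) Mod %QSIZE   ' wrap around at the end of the buffer
   Incr gCount
   Function = 1
End Function

' Removes the oldest command into sCmd; returns 0 if the queue is empty
Function QPop(sCmd As String) As Long
   If gCount = 0 Then Exit Function
   sCmd = gQ(gHead)
   gHead = (gHead + 1) Mod %QSIZE
   Decr gCount
   Function = 1
End Function

Function PBMain() As Long
   Local s As String
   QInit
   QPush "clear"
   QPush "speech"
   Do While QPop(s)
      MsgBox s                      ' shows "clear", then "speech"
   Loop
End Function
```

If capture and parsing run in different threads (THREAD CREATE), QPush and QPop would need to be wrapped in a critical section (EnterCriticalSection/LeaveCriticalSection) so the two threads never update gHead, gTail and gCount at the same time.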
-
Howdy, Rodney!
I spent some more time today looking at the various speech-to-text apps and API out there. I've come to the conclusion my time will be best spent on working out a best-I-can-do solution to a Voice Monitor using the free Microsoft capabilities. Free and available to all users is hard to beat.