Speakeasy Discussion


  • Rodney Hicks
    replied
    I think you're right. At the time I wrote that 'speakeasy' thing I was heading in a similar direction until I went off on some tangent.


  • Gary Beene
    replied
    Howdy, Rodney!

    I spent some more time today looking at the various speech-to-text apps and APIs out there. I've come to the conclusion that my time will be best spent working out a best-I-can-do solution for a Voice Monitor using the free Microsoft capabilities. Free and available to all users is hard to beat.


  • Rodney Hicks
    replied
    I know nothing new about Dragon, haven't used them since they wanted $25,000 for their API.

    Gary, I think the accuracy is going to depend on the user and their equipment and how well they use their equipment. That will apply to both server side and local arenas. I know I had one microphone that had me talking to myself and I never knew that I knew such language.

    Your concept needs to be consistent in getting the input first, then the parsing, perhaps creating a queue of uttered commands.
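
    A minimal sketch of such a queue, assuming a simple array-backed FIFO is enough. The names gQueue, gCount, EnqueueCommand and DequeueCommand are made up for illustration and are not from any posted code. The listener would call EnqueueCommand as text arrives and the parser would call DequeueCommand when it is ready for the next utterance.

    Code:
    Global gQueue() As String   ' pending utterances, oldest at index 0 (hypothetical)
    Global gCount   As Long     ' number of items currently queued
    
    Sub EnqueueCommand(ByVal cmd As String)
       ' Size the array to hold exactly gCount + 1 items, keeping existing ones
       If gCount = 0 Then
          ReDim gQueue(0)
       Else
          ReDim Preserve gQueue(gCount)
       End If
       gQueue(gCount) = cmd
       Incr gCount
    End Sub
    
    Function DequeueCommand() As String
       If gCount = 0 Then Exit Function   ' nothing queued: returns ""
       Function = gQueue(0)
       Array Delete gQueue(0)             ' shift the remaining items down one slot
       Decr gCount
    End Function

    If the listener and the parser end up in different threads, access to the queue would also need to be synchronized, which this sketch does not attempt.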


  • Michael Mattias
    replied
    Originally posted by Gary Beene
    I'd expect an INSTR search to offer no slowdown in parsing the incoming text, even if testing against hundreds of command strings.
    Depending on the number of "command strings" you need to search you might find a binary search fast and useful.

    Binary Search of an array February 14 2000, July 15 2003

    (Post 2 of that thread is the PB/Windows update of the PB-DOS code in post 1.)
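
    For readers without the linked thread handy, here is a minimal binary search sketch. It is not the code from that thread, and the name FindCommand is hypothetical. It assumes the command array is sorted ascending and that the spoken text has already been reduced to a single candidate phrase, because a binary search only finds exact matches, not substrings.

    Code:
    ' Returns the index of target in cmds(), or -1 if it is not present.
    ' cmds() must be sorted ascending, e.g. with Array Sort cmds().
    Function FindCommand(cmds() As String, ByVal target As String) As Long
       Local iLow, iHigh, iMid As Long
       iLow  = LBound(cmds)
       iHigh = UBound(cmds)
       Do While iLow <= iHigh
          iMid = (iLow + iHigh) \ 2          ' integer midpoint
          If cmds(iMid) = target Then
             Function = iMid
             Exit Function
          ElseIf cmds(iMid) < target Then
             iLow = iMid + 1                 ' target is in the upper half
          Else
             iHigh = iMid - 1                ' target is in the lower half
          End If
       Loop
       Function = -1
    End Function

    The comparison is case-sensitive, so storing the commands in upper case and passing UCase$ of the spoken phrase keeps the two sides consistent.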


  • Gary Beene
    replied
    Howdy, Rodney!

    Dragon told me yesterday they no longer offer the API for use with 3rd party apps. Have you seen differently?

    The thing about the dictation that slows it down is that the audio is sent to MS Servers, converted, and sent back as text. A local dictation library might be faster, although not necessarily as accurate.

    I'd expect an INSTR search to offer no slowdown in parsing the incoming text, even if testing against hundreds of command strings.
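
    A rough sketch of that INSTR approach, assuming the commands live in a string array. The name MatchCommand is made up for illustration. Each command is tested as a substring of the dictated text, ignoring case, and the first hit wins.

    Code:
    ' Returns the index of the first command found inside spoken, or -1 if none.
    Function MatchCommand(ByVal spoken As String, cmds() As String) As Long
       Local i As Long
       spoken = UCase$(spoken)               ' compare without regard to case
       For i = LBound(cmds) To UBound(cmds)
          If InStr(spoken, UCase$(cmds(i))) Then
             Function = i
             Exit Function
          End If
       Next i
       Function = -1
    End Function

    Because the first match wins, a short command that is contained in a longer one (say "OPEN" and "OPEN EXCEL") should come later in the array than the longer one.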


  • Rodney Hicks
    replied
    A stab in the dark here. Regarding the textbox not having focus, perhaps a thread with a higher priority that has a textbox that maintains focus or that gets focus on sound input?

    I think your 'vision' may be doable but it may lack quick response, depending on the number of different responses possible. I can't do any testing of any voice-inspired code at the present time, as I have no microphone currently attached (too many projects happening).

    I think you could buy Dragon's API for many thousands of dollars and it would do the trick.


  • Gary Beene
    replied
    Rodney,
    Sorry for going off-topic a bit here ... the "baby app" in #3 demonstrates another problem - that the voice dictation puts text into whichever textbox has focus. And when no textbox has focus, dictation output is lost.

    I have this vision of a "Voice Monitor (VM)" application that runs in the background, continuously monitoring all that is said and then broadcasting a message to PowerBASIC apps when a command of interest is detected. Both the VM and the PowerBASIC app would have to agree on what commands can be sent/recognized.

    I checked in with the Dragon folks and they do provide the ability to issue commands to some major applications, such as Word and Excel. But their product does not allow sending commands to other applications in general. Bummer that.
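
    One way the broadcast part of that VM idea might be sketched is with a registered window message, which is standard Windows API rather than anything speech-specific. This is only an assumption about how the VM could talk to its clients; the message name "gbVoiceMonitor.Command" and the procedure names are hypothetical, and wParam is assumed to carry an agreed command number.

    Code:
    #Include Once "Win32API.inc"
    
    Global gVoiceMsg As Dword   ' message ID shared by the VM and its clients
    
    ' Both the VM and each client call this once at startup. Registering the
    ' same string in every process yields the same system-wide message ID.
    Sub InitVoiceMsg
       gVoiceMsg = RegisterWindowMessage("gbVoiceMonitor.Command")   ' hypothetical name
    End Sub
    
    ' VM side: post command number cmdIndex to every top-level window.
    Sub BroadcastCommand(ByVal cmdIndex As Long)
       PostMessage %HWND_BROADCAST, gVoiceMsg, cmdIndex, 0
    End Sub
    
    ' Client side, inside the dialog callback's Select Case Cb.Msg:
    '    Case gVoiceMsg
    '       ' Cb.wParam holds the agreed command number

    Apps that never registered the message simply ignore it, so only the PowerBASIC apps that share the command list would react.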


  • Rodney Hicks
    replied
    Perhaps they see the speech recognition as an offline version of voice dictation, just giving it a different name to avoid confusion. This explains why my speech recognition app takes more of my computer's resources than does the voice dictation.


  • Gary Beene
    replied
    Howdy, Stuart!

    Yes, that's a good point. Just as my smart phone uses server-based voice recognition, so does the voice dictation feature.

    Since the speech recognition works offline, I'm surprised that voice dictation is online-only. You would think they would be able to have a (degraded) version of voice dictation offline as well.


  • Stuart McLachlan
    replied
    Originally posted by Gary Beene
    Howdy, Rodney!
    I've found several comments on the web where other users find, as I have, that Voice Dictation gives better accuracy than Speech Recognition.
    I'd guess that is because Dictation uses a lot more resources. Apparently, Dictation uses online Speech Recognition: everything is sent to MS servers for the speech-to-text conversion. Ordinary Speech Recognition just uses what is available on your computer.


    [Image: Dictation.jpg]


  • Gary Beene
    replied
    Howdy, Rodney!
    I've found several comments on the web where other users find, as I have, that Voice Dictation gives better accuracy than Speech Recognition.


  • Rodney Hicks
    replied
    Speech recognition is what I'm using; I didn't know there was some other concept (voice dictation).
    I get what you get when I press WinKey+H, apparently a different animal, in a different but overlapping habitat.


  • Gary Beene
    replied
    From what I am reading, Windows offers two separate features: "Speech Recognition" and "Voice Dictation".

    "Speech Recognition" appears to be the broader capability: it parses incoming speech for various commands as well as entering text into an edit control. "Voice Dictation" is limited to placing text in an edit control, but does have a few formatting command-recognition capabilities.

    In my limited testing, "Voice Dictation" surprisingly appears to have a better accuracy than "Speech Recognition".

    With "Speech Recognition", I do have access to more features. For example, "Open Excel" opens the Excel app for me.

    I continue to be pleasantly surprised at how accurate the "Voice Dictation" appears to be.


  • Gary Beene
    replied
    Howdy, Rodney!

    Win10 Pro. 10.0.19043 Build 19043.

    I can get the same two windows on both machines I've tried. Both updated within the last hour.

    What happens when you press WinKey+H?


  • Rodney Hicks
    replied
    We must have different versions; mine is Version 21H1 (OS Build 19043.1415).
    Yours seems to be an older version, methinks, mine is on Windows 10.
    [Image: speech.gif]


  • Gary Beene
    replied
    Howdy, Rodney!

    More information ...

    So in Settings, under Speech, is a setting "Turn On Speech Recognition". Mine is off. It stays off when I use WinKey+H to start speech recognition in the baby app above.

    If I set "Turn on Speech Recognition" to ON, then I get another window which does have the context menu you mention. It also says "Listening", just like the bar I mention. Here's a picture of both:


    [Image: pb_2283.jpg]

    Nothing I've read mentioned the possibility of 2 different windows. I'll have to go read more.

    ... added ... with either window visible, speaking will add text to the textbox in either of our apps.


  • Gary Beene
    replied
    Howdy, Rodney!

    The bar across the top, the one that appears when WinKey+H is pressed, has no context menu. It has the picture of a little microphone and an "X" button. If I click the microphone icon on the bar, "listening" is toggled.

    We must not be talking about the same thing.

    ... added ... my CPU stays at about 2% regardless of whether the speech bar is on or not.


  • Rodney Hicks
    replied
    Right click on the speech bar; the first three options are
    On - Listen to everything
    Sleep - Listen for "Start Listening"
    Off - Do not listen to anything
    There is a corresponding "Stop listening" for the second of the three.
    Once the speech to text engine is on, the user has control of it with the start / stop listening feature.
    I find that if it is in sleep mode, it uses a fair amount of system resources so I turn it off if I won't be using it for an extended period.

    Sorry, I didn't see that you had two posts here.
    If the engine is operating smoothly, which can take a while, depending on the amount of mead consumed, in your baby app, if you say "Clear" the speech engine will "CLICK" that button, no code necessary, likewise the "Speech" button. You can even use menus in this manner.


  • Gary Beene
    replied
    Howdy Rodney!

    Here's a baby app that I'm using to test. The "Speech" button toggles the speech listening bar on and off. Both buttons give focus to the textbox to give the speech a location to be placed.

    As I mentioned, I don't know yet how to keep the listening bar active. I'm searching for that now.

    [Image: pb_2282.jpg]

    Code:
    #Compile Exe
    #Dim All
    %Unicode = 1
    #Include "Win32API.inc"
    
    Enum Equates Singular
       IDC_Button = 500
       IDC_Clear
       IDC_TextBox
    End Enum
    
    Global hDlg As Dword
    
    Function PBMain() As Long
       Dialog Default Font "Arial Black", 14, 0
       Dialog New Pixels, 0, "gbSpeech",300,300,250,210, %WS_OverlappedWindow To hDlg
       Control Add Button, hDlg, %IDC_Button,"Speech", 10,10,100,25
       Control Add Button, hDlg, %IDC_Clear,"Clear", 120,10,65,25
       Control Add TextBox, hDlg, %IDC_TextBox, "This is a test", 10, 40, 230, 160, %ES_MultiLine Or %ES_WantReturn
       Dialog Show Modal hDlg Call DlgProc
    End Function
    
    CallBack Function DlgProc() As Long
       Select Case Cb.Msg
          Case %WM_Command
             Select Case Cb.Ctl
                Case %IDC_Button
                   Control Set Focus hDlg, %IDC_TextBox
                   ToggleSpeech
                Case %IDC_Clear
                   Control Set Text hDlg, %IDC_TextBox, ""
                   Control Set Focus hDlg, %IDC_TextBox
                Case %IdCancel
                   Dialog End hDlg
             End Select
       End Select
    End Function
    
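    Sub ToggleSpeech
       ' Simulate WinKey+H, the Windows 10 shortcut that toggles the dictation bar
       keybd_event(%VK_LWIN, &H45, 0, 0)                  ' Win key down
       keybd_event(%VK_H, &H45, 0, 0)                     ' H key down
       keybd_event(%VK_H, &H45, %KEYEVENTF_KEYUP, 0)      ' H key up
       keybd_event(%VK_LWIN, &H45, %KEYEVENTF_KEYUP, 0)   ' Win key up
    End Sub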


  • Gary Beene
    replied
    Howdy, Rodney!

    Sorry for letting you have a one-man conversation for so long! I got distracted by other stuff but am interested in what you've done.

    You're right that the speech recognition seems to be good. In my tests today, it definitely seems to give better translation results than the original code from Jim back in 2014.

    There are several things I don't understand about your code.

    Before you start your app, don't you have to start the Win10 Voice Recognition manually? How do you do that? I've been using the WinKey+H keyboard shortcut to start the listening bar. How do you start the bar?

    In your Help, you say that speaking various words will be the same as pushing the corresponding button. I don't see in your code where you capture those words.

    When speech recognition is started, there is a bar that appears across the top of the Desktop. After a brief period of inactivity, that bar turns off - stops "listening". I have to click the bar to get listening to start up again.

    Why does it do that? I don't want it to turn off until I tell it to shut down. Have you seen a way to keep it from turning off?

    I'd like the bar to not display at all, and the listening to stay active until I close the bar.

    I suppose I could try to capture a handle to the bar and reposition it out of sight or just make it not visible. But the more important issue is that it turns off. I haven't found documentation about why it does that.




