Speakeasy Discussion

    Speakeasy Discussion

    This thread is for discussion of Speakeasy.
    Rod
    In some future era, dark matter and dark energy will only be found in Astronomy's Dark Ages.

    #2
    Howdy, Rodney!

    Sorry for letting you have a one-man conversation for so long! I got distracted by other stuff but am interested in what you've done.

    You're right that the speech recognition seems to be good. In my tests today, it definitely seems to give better translation results than the original code from Jim back in 2014.

    There are several things I don't understand about your code.

    Before you start your app, don't you have to start the Win10 Voice Recognition manually? How do you do that? I've been using the WinKey+H keyboard shortcut to start the listening bar. How do you start the bar?

    In your Help, you say that speaking various words will be the same as pushing the corresponding button. I don't see in your code where you capture those words.

    When speech recognition is started, there is a bar that appears across the top of the Desktop. After a brief period of inactivity, that bar turns off - stops "listening". I have to click the bar to get listening to start up again.

    Why does it do that? I don't want it to turn off until I tell it to shut down. Have you seen a way to keep it from turning off?

    I'd like the bar to not display at all, and the listening to stay active until I close the bar.

    I suppose I could try to capture a handle to the bar and reposition it out of sight or just make it not visible. But the more important issue is that it turns off. I haven't found documentation about why it does that.





    Comment


      #3
      Howdy Rodney!

      Here's a baby app that I'm using to test. The "Speech" button toggles the speech listening bar on and off. Both buttons give focus to the textbox to give the speech a location to be placed.

      As I mentioned, I don't know yet how to keep the listening bar active. I'm searching for that now.

      [Image: pb_2282.jpg]

      Code:
      #Compile Exe
      #Dim All
      %Unicode = 1
      #Include "Win32API.inc"
      
      Enum Equates Singular
         IDC_Button = 500
         IDC_Clear
         IDC_TextBox
      End Enum
      
      Global hDlg As Dword
      
      Function PBMain() As Long
         Dialog Default Font "Arial Black", 14, 0
         Dialog New Pixels, 0, "gbSpeech",300,300,250,210, %WS_OverlappedWindow To hDlg
         Control Add Button, hDlg, %IDC_Button,"Speech", 10,10,100,25
         Control Add Button, hDlg, %IDC_Clear,"Clear", 120,10,65,25
         Control Add TextBox, hDlg, %IDC_TextBox, "This is a test", 10, 40, 230, 160, %ES_MultiLine Or %ES_WantReturn
         Dialog Show Modal hDlg Call DlgProc
      End Function
      
      CallBack Function DlgProc() As Long
         Select Case Cb.Msg
            Case %WM_Command
               Select Case Cb.Ctl
                  Case %IDC_Button
                     Control Set Focus hDlg, %IDC_TextBox
                     ToggleSpeech
                  Case %IDC_Clear
                     Control Set Text hDlg, %IDC_TextBox, ""
                     Control Set Focus hDlg, %IDC_TextBox
                  Case %IdCancel
                     Dialog End hDlg
               End Select
         End Select
      End Function
      
      Sub ToggleSpeech
         ' Simulate WinKey+H: press Win, press H, then release both in reverse order
         keybd_event(%VK_LWIN, &H45, 0, 0)
         keybd_event(%VK_H, &H45, 0, 0)
         keybd_event(%VK_H, &H45, %KEYEVENTF_KEYUP, 0)
         keybd_event(%VK_LWIN, &H45, %KEYEVENTF_KEYUP, 0)
      End Sub

      Comment


        #4
        Right-click on the speech bar; the first three options are:
        On - Listen to everything
        Sleep - Listen for "Start Listening"
        Off - Do not listen to anything
        There is a corresponding "Stop listening" for the second of the three.
        Once the speech to text engine is on, the user has control of it with the start / stop listening feature.
        I find that if it is in sleep mode, it uses a fair amount of system resources so I turn it off if I won't be using it for an extended period.

        Sorry, I didn't see that you had two posts here.
        If the engine is operating smoothly (which can take a while, depending on the amount of mead consumed), then in your baby app, saying "Clear" makes the speech engine "CLICK" that button, no code necessary. Likewise for the "Speech" button. You can even use menus in this manner.
        Rod
        In some future era, dark matter and dark energy will only be found in Astronomy's Dark Ages.

        Comment


          #5
          Howdy, Rodney!

          The bar across the top, the one that appears when WinKey+H is pressed, has no context menu. It has a picture of a little microphone and an "X" button. If I click the microphone icon on the bar, "listening" is toggled.

          We must not be talking about the same thing.

          ... added ... my CPU stays at about 2% regardless of whether the speech bar is on or not.

          Comment


            #6
            Howdy, Rodney!

            More information ...

            So in Settings, under Speech, is a setting "Turn On Speech Recognition". Mine is off. It stays off when I use WinKey+H to start speech recognition in the baby app above.

            If I set "Turn on Speech Recognition" to ON, then I get another window which does have the context menu you mention. It also says "Listening", just like the bar I mention. Here's a picture of both:


            [Image: pb_2283.jpg]

            Nothing I've read mentioned the possibility of 2 different windows. I'll have to go read more.

            ... added ... with either window visible, speaking will add text to the textbox in either of our apps.

            Comment


              #7
              We must have different versions; mine is Version 21H1 (OS Build 19043.1415).
              Yours seems to be an older version, methinks. Mine is on Windows 10.
              [Image: speech.gif]
              Rod
              In some future era, dark matter and dark energy will only be found in Astronomy's Dark Ages.

              Comment


                #8
                Howdy, Rodney!

                Win10 Pro. 10.0.19043 Build 19043.

                I can get the same two windows on both machines I've tried. Both updated within the last hour.

                What happens when you press WinKey+H?

                Comment


                  #9
                  From what I am reading, Windows offers two separate features: "Speech Recognition" and "Voice Dictation".

                  "Speech Recognition" appears to be the broader capability: it parses incoming speech for various commands as well as entering text into an edit control. "Voice Dictation" is limited to placing text in an edit control, but does have a few formatting command-recognition capabilities.

                  In my limited testing, "Voice Dictation" surprisingly appears to have a better accuracy than "Speech Recognition".

                  With "Speech Recognition", I do have access to more features. For example, "Open Excel" opens the Excel app for me.

                  I continue to be pleasantly surprised at how accurate the "Voice Dictation" appears to be.

                  Comment


                    #10
                    Speech recognition is what I'm using; I didn't know there was some other concept (voice dictation).
                    I get what you get when I press WinKey+H, apparently a different animal, in a different but overlapping habitat.
                    Rod
                    In some future era, dark matter and dark energy will only be found in Astronomy's Dark Ages.

                    Comment


                      #11
                      Howdy, Rodney!
                      I've found several comments on the web where other users find, as I have, that Voice Dictation gives better accuracy than Speech Recognition.

                      Comment


                        #12
                        Originally posted by Gary Beene View Post
                        Howdy, Rodney!
                        I've found several comments on the web where other users find, as I have, that Voice Dictation gives better accuracy than Speech Recognition.
                        I'd guess that is because Dictation uses a lot more resources. Apparently, Dictation uses online Speech Recognition - everything is sent to MS servers for the speech-to-text conversion. Ordinary Speech Recognition just uses what is available on your computer.


                        [Image: Dictation.jpg]
                        =========================
                        https://camcopng.com
                        =========================

                        Comment


                          #13
                          Howdy, Stuart!

                          Yes, that's a good point. Just as my smart phone uses server-based voice recognition, so does the voice dictation feature.

                          Since the speech recognition works offline, I'm surprised that voice dictation is online-only. You would think they would be able to have a (degraded) version of voice dictation offline as well.

                          Comment


                            #14
                            Perhaps they see the speech recognition as an offline version of voice dictation, just giving it a different name to avoid confusion. This explains why my speech recognition app takes more of my computer's resources than does the voice dictation.
                            Rod
                            In some future era, dark matter and dark energy will only be found in Astronomy's Dark Ages.

                            Comment


                              #15
                              Rodney,
                              Sorry for going off-topic a bit here ... the "baby app" in #3 demonstrates another problem - that the voice dictation puts text into whichever textbox has focus. And when no textbox has focus, dictation output is lost.

                              I have this vision of a "Voice Monitor (VM)" application that runs in the background, continuously monitoring all that is said and then broadcasting a message to PowerBASIC apps when a command of interest is detected. Both the VM and the PowerBASIC app would have to agree on what commands can be sent/recognized.
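
                              One way the "broadcast" half of that vision might be sketched is with a registered window message. This is only an illustration under assumptions: the message name "VoiceMonitor.Command" and the %CMD_CLEAR id are made up here, and both the VM and the listening apps would have to agree on them. RegisterWindowMessage and PostMessage are standard Win32 calls.

```powerbasic
#Compile Exe
#Dim All
%Unicode = 1
#Include "Win32API.inc"

%CMD_CLEAR = 1   ' hypothetical command id agreed on by VM and apps

Function PBMain() As Long
   Local uMsg As Dword
   ' Registering the same string in any process yields the same
   ' system-wide message number ("VoiceMonitor.Command" is an assumed name)
   uMsg = RegisterWindowMessage("VoiceMonitor.Command")
   ' Post the command id to all top-level windows; apps that did not
   ' register the message simply ignore it
   If uMsg Then PostMessage %HWND_BROADCAST, uMsg, %CMD_CLEAR, 0
End Function
```

                              A listening app would call RegisterWindowMessage with the same string at startup, keep the result in a global, and compare Cb.Msg against it in its dialog callback, reading the command id from Cb.WParam.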

                              I checked in with the Dragon folks and they do provide the ability to issue commands to some major applications, such as Word and Excel. But their product does not allow sending commands to other applications in general. Bummer that.

                              Comment


                                #16
                                A stab in the dark here. Regarding the textbox not having focus, perhaps a thread with a higher priority that has a textbox that maintains focus or that gets focus on sound input?

                                I think your 'vision' may be doable but it may lack quick response, depending on the number of different responses possible. I can't do any testing of any voice-inspired code at the present time, no microphone currently attached (too many projects happening).

                                I think you could buy Dragon's API for many thousands of dollars and it would do the trick.
                                Rod
                                In some future era, dark matter and dark energy will only be found in Astronomy's Dark Ages.

                                Comment


                                  #17
                                  Howdy, Rodney!

                                  Dragon told me yesterday they no longer offer the API for use with 3rd party apps. Have you seen differently?

                                  The thing about the dictation that slows it down is that the audio is sent to MS Servers, converted, and sent back as text. A local dictation library might be faster, although not necessarily as accurate.

                                  I'd expect an INSTR search to offer no slowdown in parsing the incoming text, even if testing against hundreds of command strings.
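
                                  As a rough sketch of that INSTR approach (FindCommand and the command list are hypothetical names made up for illustration, not anything from the apps above), the incoming text is upper-cased once and tested against each command string in turn:

```powerbasic
#Compile Exe
#Dim All

' Hypothetical helper: returns the index of the first command found
' anywhere inside the dictated text, or 0 if none matches.
Function FindCommand(ByVal sText As String, sCmd() As String) As Long
   Local i As Long
   sText = UCase$(sText)                  ' commands are stored upper case
   For i = LBound(sCmd) To UBound(sCmd)
      If InStr(sText, sCmd(i)) Then
         Function = i
         Exit Function
      End If
   Next i
End Function

Function PBMain() As Long
   Dim sCmd(1 To 3) As String             ' 1-based, so 0 can mean "no match"
   Array Assign sCmd() = "CLEAR", "OPEN EXCEL", "STOP LISTENING"
   ' Displays the index of the command found in the sample sentence
   ? Str$(FindCommand("please open Excel now", sCmd()))
End Function
```

                                  One caveat with plain substring matching: "CLEAR" would also match inside "unclear", so a real version would probably need to check word boundaries.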

                                  Comment


                                    #18
                                    "I'd expect an INSTR search to offer no slowdown in parsing the incoming text, even if testing against hundreds of command strings."
                                    Depending on the number of "command strings" you need to search you might find a binary search fast and useful.

                                    Binary Search of an array February 14 2000, July 15 2003

                                    (Post 2 that thread is the PB/Windows update) (from PB-DOS, post 1)
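
                                    For reference, a minimal binary search might look like the sketch below. It is not the code from the linked thread; BinFind and the command list are hypothetical. It assumes the whole utterance must equal a command exactly and that the array has been sorted first (here with Array Sort).

```powerbasic
#Compile Exe
#Dim All

' Hypothetical helper: returns the index of sKey in the sorted array sCmd(),
' or 0 if it is not present (the array is assumed to be 1-based).
Function BinFind(ByVal sKey As String, sCmd() As String) As Long
   Local iLo, iHi, iMd As Long
   iLo = LBound(sCmd)
   iHi = UBound(sCmd)
   Do While iLo <= iHi
      iMd = (iLo + iHi) \ 2               ' integer midpoint
      If sCmd(iMd) = sKey Then
         Function = iMd
         Exit Function
      ElseIf sCmd(iMd) < sKey Then
         iLo = iMd + 1                    ' key sorts after the midpoint
      Else
         iHi = iMd - 1                    ' key sorts before the midpoint
      End If
   Loop
End Function

Function PBMain() As Long
   Dim sCmd(1 To 4) As String
   Array Assign sCmd() = "STOP LISTENING", "CLEAR", "OPEN EXCEL", "SPEECH"
   Array Sort sCmd()                      ' binary search needs sorted data
   ' Displays the position of the command in the sorted array
   ? Str$(BinFind("OPEN EXCEL", sCmd()))
End Function
```

                                    Unlike INSTR, this only finds whole-string matches, so the dictated text would have to be trimmed and upper-cased down to just the command before the lookup.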

                                    Michael Mattias
                                    Tal Systems (retired)
                                    Port Washington WI USA
                                    [email protected]
                                    http://www.talsystems.com

                                    Comment


                                      #19
                                      I know nothing new about Dragon, haven't used them since they wanted $25,000 for their API.

                                      Gary, I think the accuracy is going to depend on the user and their equipment and how well they use their equipment. That will apply to both server side and local arenas. I know I had one microphone that had me talking to myself and I never knew that I knew such language.

                                      Your concept needs to be consistent in getting the input first, then the parsing, perhaps creating a queue of uttered commands.
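
                                      A queue of uttered commands could be as simple as a fixed-size circular buffer. This is just a sketch: QPush, QPop, and the capacity %QMAX are hypothetical names, and it is not thread-safe as written.

```powerbasic
#Compile Exe
#Dim All

%QMAX = 32                         ' hypothetical queue capacity
Global gQ() As String              ' circular buffer of pending commands
Global gHead As Long               ' index of the oldest entry
Global gCount As Long              ' number of entries waiting

' Adds a command at the tail; returns 1 on success, 0 if the queue is full
Function QPush(ByVal sCmd As String) As Long
   If gCount = %QMAX Then Exit Function
   gQ((gHead + gCount) Mod %QMAX) = sCmd
   Incr gCount
   Function = 1
End Function

' Removes the oldest command into sCmd; returns 0 if the queue is empty
Function QPop(sCmd As String) As Long
   If gCount = 0 Then Exit Function
   sCmd = gQ(gHead)
   gHead = (gHead + 1) Mod %QMAX
   Decr gCount
   Function = 1
End Function

Function PBMain() As Long
   Local s As String
   Local ok As Long
   ReDim gQ(0 To %QMAX - 1)
   ok = QPush("CLEAR")
   ok = QPush("OPEN EXCEL")
   Do While QPop(s)                ' commands come back in spoken order
      ? s
   Loop
End Function
```

                                      The input side would push each recognized command as it arrives, and the parsing side would pop them at its own pace, so a slow response to one command does not lose the next.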
                                      Rod
                                      In some future era, dark matter and dark energy will only be found in Astronomy's Dark Ages.

                                      Comment


                                        #20
                                        Howdy, Rodney!

                                        I spent some more time today looking at the various speech-to-text apps and APIs out there. I've come to the conclusion my time will be best spent on working out a best-I-can-do solution to a Voice Monitor using the free Microsoft capabilities. Free and available to all users is hard to beat.

                                        Comment
