Speech to Text Recognition Issues


  • Speech to Text Recognition Issues

    I've been playing with the Microsoft SAPI of late (sample code below), but am not as happy with it as I'd like.

    For one, the accuracy is nothing to write home about. I looked around but did not find any adjustments/tips discussed for getting better results. Has anyone else found any suggestions that might help with the accuracy?

    I'd consider using an alternative to SAPI. I don't need a system-wide speech engine - just something that I can use in a specific application that I'm writing.

    Another thing is that the code example does not close cleanly/quickly - the example sometimes goes into the "not responding" mode. I looked at the SAPI pages but did not find any kind of shutdown commands. Is anyone aware of a cleaner way to stop SAPI?


    Code:
    'Compilable Example:
    #Compiler PBWin 10
    #Compile Exe  "gbvoicemail.exe"
    #Dim All
    %Unicode = 1
    #Include "Win32API.inc"
    #Include "sapi.inc"
    
    %IDC_Body = 500
    Global hDlg As Dword
    
    Global SpVoice        As ISpVoice
    Global oRecoContext   As ISpeechRecoContext
    Global oRecognizer    As ISpeechRecognizer
    Global oMyGrammar     As ISpeechRecoGrammar
    Global oCategory      As ISpeechObjectTokenCategory
    Global oToken         As ISpeechObjectToken
    Global InProcEvents   As ISpeechRecoContextEventsImplemented
    
    Function PBMain() As Long
       Dialog Font "Tahoma",12,0
       Dialog New Pixels, 0, "Speech-To-Text",300,300,600,200, %WS_OverlappedWindow To hDlg
       Control Add Label, hDlg, %IDC_Body,"Voice Test", 0,0,600,200
       Dialog Show Modal hDlg Call DlgProc
    End Function
    
    CallBack Function DlgProc() As Long
       Select Case Cb.Msg
          Case %WM_InitDialog
             InitializeSpeechRecognition
       End Select
    End Function
    
    
    Sub InitializeSpeechRecognition
       oRecoContext = NewCom "SAPI.SpInProcRecoContext"              'Create an instance of the ISpeechRecoContext Interface
       InProcEvents = Class "CISpeechRecoContextEventsImplemented"   'Link the events of oRecoContext to InProcEvents to process a recognition event.
       Events From oRecoContext Call InProcEvents
       oRecognizer = oRecoContext.Recognizer            'Create the InProc Speech Recognizer.
       oMyGrammar = oRecoContext.CreateGrammar(1)       'Create the InProc Speech Grammar.
       oMyGrammar.State = %SGSDisabled                  'Disable Grammar while loading it.
       oMyGrammar.DictationLoad("", %SLOstatic)         'Load the default Dictation Grammar.
       oMyGrammar.State = %SGSEnabled                   'Enable Grammar after loading it.
       oMyGrammar.DictationSetState(%SGDSInactive)      'Turn Dictation off.
       oCategory = NewCom "SAPI.SpObjectTokenCategory"  'Create the Audio Token Category.
       oCategory.SetId("HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\AudioInput")   'Set the Audio Token category ID.
       oToken = NewCom "SAPI.SpObjectToken"             'Create the Audio Token.
       oToken.SetId(oCategory.Default)                  'Set the Token Category ID.
       oRecognizer.PutRef_AudioInput = oToken           'Give the Recognizer the Token.
       oMyGrammar.DictationSetState(%SGDSActive)        'Start the recognition by turning Dictation back on.
       'oRecoContext.Recognizer.EmulateRecognition("Recognition has started")   'Emulate recognition to test the interface.
       spVoice = NewCom $PROGID_SpVoice1                'Create an instance of the ISpeechVoice Interface.
    End Sub
    
    
    Class CISpeechRecoContextEventsImplemented Guid$("{5B344ADB-C0C7-4B5F-8046-7D2DB91A1D75}") As Event
       ' ########################################################################################
       ' Class CISpeechRecoContextEvents
       ' Interface name = _ISpeechRecoContextEvents
       ' IID = {7B8FCB42-0E9D-4F00-A048-7B04D6179D3D}
       ' Attributes = 4096 [&H1000] [Dispatchable]
       ' ########################################################################################
       Interface ISpeechRecoContextEventsImplemented Guid$("{7B8FCB42-0E9D-4F00-A048-7B04D6179D3D}") As Event
         Inherit IDispatch
          Method Recognition <7> ( _
            ByVal StreamNumber As Long _                       ' __in long StreamNumber
          , ByVal StreamPosition As Variant _                  ' __in VARIANT StreamPosition
          , ByVal RecognitionType As Long _                    ' __in SpeechRecognitionType RecognitionType
          , ByVal Result As ISpeechRecoResult _                ' __in ISpeechRecoResult *Result
          )                                                    ' void
             Local pDisp As IDispatch, bstrText As WString
             If IsNothing(Result) Then Exit Method
             pDisp = Result
             Object Call pDisp.PhraseInfo.GetText To bstrText
             If ObjResult Then
                ? "GetText error: " & ObjResult$
             Else
                If Len(bstrText) Then
                   Control Set Text hDlg, %IDC_Body, bstrText 'display all text
                   oRecoContext.Pause()                       'Pause Recognition
                   oRecoContext.Resume()                      'Resume Recognition
                End If
             End If
          End Method
       End Interface
    End Class
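
    A possible cleanup routine, offered only as an untested sketch: it undoes the steps in InitializeSpeechRecognition in reverse order before the dialog goes away. ShutdownSpeechRecognition is a name made up for this illustration, and whether it actually cures the "not responding" state is an assumption, not something confirmed here. It would be called from DlgProc, for example under Case %WM_Destroy.

    ```powerbasic
    'Hypothetical cleanup Sub (not part of the example above).
    'Assumption: releasing objects in reverse order of creation lets SAPI stop promptly.
    Sub ShutdownSpeechRecognition
       If IsObject(oMyGrammar) Then
          oMyGrammar.DictationSetState(%SGDSInactive)            'Stop listening for dictation.
          oMyGrammar.DictationUnload()                           'Unload the dictation grammar.
       End If
       If IsObject(InProcEvents) Then Events End InProcEvents    'Disconnect the event handler.
       InProcEvents = Nothing                                    'Release every global interface reference.
       oMyGrammar   = Nothing
       oToken       = Nothing
       oCategory    = Nothing
       oRecognizer  = Nothing
       oRecoContext = Nothing
       SpVoice      = Nothing
    End Sub
    ```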

  • #2
    In the column of "Bad Service", I tried to reach the folks at Nuance to talk about their Dragon Speaking SDK product.

    Three dropped/mis-directed phone calls later I got stuck in a voice mail.

    One of the support folks was willing to transfer me, but only if I wrote down the number in case we got disconnected - as though she knew a disconnect was likely.

    Another said she thought the person I needed to talk to might be leaving shortly, but would I please answer some questions first. I declined, which is perhaps why I didn't get connected correctly?

    So I went to their online site to request a call, only to be faced with a 12-field form to fill out - all of which were 'required' entries. I gave my basic contact information and stuck in a bunch of gibberish in the fields I didn't want to answer.

    Not a good start.



    • #3
      Three dropped/mis-directed phone calls later I got stuck in a voice mail.
      Ah, but based upon whom you called, there's probably a text record of what you said!

      Hmmm, I was just thinking about what I might say if I got stuck in "voice mail jail." Some of what I might say is spelled "<expletive deleted>"

      You'll have to make sure whatever product you choose can handle that!



      • #4
        FWIW, I got a call yesterday from DNS trying to get me to buy their most recent version (13, I think she said) at half price. She told me that it no longer requires training to adapt to your voice inflections, amongst other improvements.
        Years ago I inquired about their SDK, but I found their five-figure price out of proportion to my purposes, wallet, and need. They may have made changes to that since then.

        Aside:
        The spell checker for this forum tells me I spelt amongst rong!
        Rod
        "To every unsung hero in the universe
        To those who roam the skies and those who roam the earth
        To all good men of reason may they never thirst " - from "Heaven Help the Devil" by G. Lightfoot



        • #5
          Hi Rodney,
          Yes, I got a call back from Nuance today. The $5K fee, plus per "speaker" costs, to use their Dragon SDK is likewise way more than I can afford. Bummer.

          They will have one of their experts call me to talk about my needs in more detail. I'll quiz them about what options small businesses have.



          • #6
            It was $25,000 when I talked to them way back when. I suspect they now offer different levels of their SDK at lower prices.
            Rod



            • #7
              Resurrecting this thread ...

              Now, 2 years later, I thought I'd take another look, so I've called Nuance and asked for their folks to call me back. I'll post new information as I get it.

              Hoping that the latest update to Win10 might have an improved SAPI, I tried the code from #1 above. I can't say that the results are any better than before. Has anyone else used it lately and gotten results worth using?



              • #8
                I also did some looking around for an alternate speech-to-text API to use in a PowerBASIC app. It seems to me that the number of companies/options has dwindled. And it seems that there is a focus on cloud-based, pay-by-the-minute, speech-to-text conversion.

                If all I'm interested in is allowing a user to speak and have that text put in a textbox/richedit control, it may be enough to require each user to purchase a copy of Dragon Home and run it in the background while the PowerBASIC app is being used. Hmm... recognizing commands, separate from inserting speech, is also useful.

                I guess I'll contact tech support at Dragon Speaking and discuss my needs with them to see if Dragon Home can be used as I want.



                • #9
                  Yes, I'm still interested in speech-to-text and have decided to focus on Dragon for now.

                  I contacted Nuance and talked them into giving me an evaluation copy of their SDK. I've not installed it yet, but will soon. I'll post updates as I learn more about it. From what they described, it is no walk in the park to integrate it into a PowerBASIC app, but hopefully as I learn more, that mountain won't seem so tall!

                  I also went ahead and bought a copy earlier today of Dragon Home v15 for $150.

                  I want to evaluate both approaches - using SDK in my app and having a user's copy of Dragon working in the background.

                  The SDK should provide more capability and control, but the Dragon app may be much easier to implement.





                  • #10
                    I installed Dragon Home v15, with no installation issues. It took about 30 minutes, including the "training" where I read some text out loud.

                    I recall reading somewhere that the training can be skipped and Dragon will still perform well out of the box. If true, that's quite useful because it demands less from a user of one of our apps. For computer-literate users, it's not such a big deal. But for computer-challenged users, the Dragon installation and training can be daunting. I'll try to confirm that the no-training option is possible.

                    I have a big Yeti microphone, but for this evaluation I'm using the microphone that is part of my Logitech 920 webcam. So far, it seems to work just fine with Dragon, although I do seem to need to speak a bit louder than my normal voice to get the best results.

                    To try out Dragon, I created a simple Dialog+TextBox and set focus to the TextBox. So far, Dragon has worked quite well, putting my spoken words into the TextBox. I've found only one or two words that I had to correct, and those were homonyms.
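
                    For reference, a test dialog of that kind can be as small as the sketch below. It is an illustration rather than the exact code used; the control ID, caption, and sizes are placeholders. The app contains no speech code at all in this approach. It relies on Dragon putting dictated text into the focused control, as described above.

                    ```powerbasic
                    'Sketch of a Dialog+TextBox test harness; names and sizes are illustrative.
                    #Compiler PBWin 10
                    #Compile Exe
                    #Dim All
                    #Include "Win32API.inc"

                    %IDC_Text = 500

                    Function PBMain() As Long
                       Local hDlg As Dword
                       Dialog New Pixels, 0, "Dragon Test", 300, 300, 600, 200, %WS_OverlappedWindow To hDlg
                       'Multi-line TextBox that fills the client area.
                       Control Add TextBox, hDlg, %IDC_Text, "", 0, 0, 600, 200, _
                          %WS_TabStop Or %WS_VScroll Or %ES_MultiLine Or %ES_WantReturn, %WS_Ex_ClientEdge
                       Control Set Focus hDlg, %IDC_Text   'Give the TextBox keyboard focus for dictation.
                       Dialog Show Modal hDlg
                    End Function
                    ```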

                    Likewise, when I had Dragon read text content to me, it accurately read the words.

                    So on a first pass, Dragon certainly promises to be useful - as long as a customer can afford the $$.



                    • #11
                      Ok, after all the recent Win10 updates, I could not resist trying the Win10 speech recognition again this evening. In my tests, it is still horribly inaccurate. Has anyone seen a success that is eluding me?



                      • #12
                        After purchasing several versions, I have wondered for quite some time about the quality of Dragon, be it Home or any other edition. Subsequent versions never seem to address the most glaring issues with the program. I have come to the conclusion that the company employs a built-in obstruction to some forms of usage. The fact that they have pricey APIs for sale backs up the idea.

                        It actually makes sense, not much different than options on automobiles, from a business perspective.
                        Rod
