Speech to Text Recognition Issues

  • Speech to Text Recognition Issues

    I've been playing with the Microsoft SAPI of late (sample code below), but am not as happy with it as I'd like.

    For one, the accuracy is nothing to write home about. I looked around but did not find any adjustments/tips discussed for getting better results. Has anyone else found any suggestions that might help with the accuracy?

    I'd consider using an alternative to SAPI. I don't need a system-wide speech engine - just something that I can use in a specific application that I'm writing.

    Another thing is that the code example does not close cleanly/quickly - the example sometimes goes into the "not responding" mode. I looked at the SAPI pages but did not find any kind of shutdown commands. Is anyone aware of a cleaner way to stop SAPI?


    Code:
    'Compilable Example:
    #Compiler PBWin 10
    #Compile Exe  "gbvoicemail.exe"
    #Dim All
    %Unicode = 1
    #Include "Win32API.inc"
    #Include "sapi.inc"
    
    %IDC_Body = 500
    Global hDlg As Dword
    
    Global SpVoice        As ISpVoice
    Global oRecoContext   As ISpeechRecoContext
    Global oRecognizer    As ISpeechRecognizer
    Global oMyGrammar     As ISpeechRecoGrammar
    Global oCategory      As ISpeechObjectTokenCategory
    Global oToken         As ISpeechObjectToken
    Global InProcEvents   As ISpeechRecoContextEventsImplemented
    
    Function PBMain() As Long
       Dialog Font "Tahoma",12,0
       Dialog New Pixels, 0, "Speech-To-Text",300,300,600,200, %WS_OverlappedWindow To hDlg
       Control Add Label, hDlg, %IDC_Body,"Voice Test", 0,0,600,200
       Dialog Show Modal hDlg Call DlgProc
    End Function
    
    CallBack Function DlgProc() As Long
       Select Case Cb.Msg
          Case %WM_InitDialog
             InitializeSpeechRecognition
       End Select
    End Function
    
    
    Sub InitializeSpeechRecognition
       oRecoContext = NewCom "SAPI.SpInProcRecoContext"              'Create an instance of the ISpeechRecoContext Interface
       InProcEvents = Class "CISpeechRecoContextEventsImplemented"   'Link the events of oRecoContext to InProcEvents to process a recognition event.
       Events From oRecoContext Call InProcEvents
       oRecognizer = oRecoContext.Recognizer            'Create the InProc Speech Recognizer.
       oMyGrammar = oRecoContext.CreateGrammar(1)       'Create the InProc Speech Grammar.
       oMyGrammar.State = %SGSDisabled                  'Disable Grammar while loading it.
       oMyGrammar.DictationLoad("", %SLOstatic)         'Load the default Dictation Grammar.
       oMyGrammar.State = %SGSEnabled                   'Enable Grammar after loading it.
       oMyGrammar.DictationSetState(%SGDSInactive)      'Turn Dictation off.
       oCategory = NewCom "SAPI.SpObjectTokenCategory"  'Create the Audio Token Category.
       oCategory.SetId("HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\AudioInput")   'Set the Audio Token category ID.
       oToken = NewCom "SAPI.SpObjectToken"             'Create the Audio Token.
       oToken.SetId(oCategory.Default)                  'Set the Token Category ID.
       oRecognizer.PutRef_AudioInput = oToken           'Give the Recognizer the Token.
       oMyGrammar.DictationSetState(%SGDSActive)        'Start the recognition by turning Dictation back on.
       'oRecoContext.Recognizer.EmulateRecognition("Recognition has started")   'Emulate recognition to test the interface.
       spVoice = NewCom $PROGID_SpVoice1                'Create an instance of the ISpeechVoice Interface.
    End Sub
    
    
    Class CISpeechRecoContextEventsImplemented Guid$("{5B344ADB-C0C7-4B5F-8046-7D2DB91A1D75}") As Event
       ' ########################################################################################
       ' Class CISpeechRecoContextEvents
       ' Interface name = _ISpeechRecoContextEvents
       ' IID = {7B8FCB42-0E9D-4F00-A048-7B04D6179D3D}
       ' Attributes = 4096 [&H1000] [Dispatchable]
       ' ########################################################################################
       Interface ISpeechRecoContextEventsImplemented Guid$("{7B8FCB42-0E9D-4F00-A048-7B04D6179D3D}") As Event
         Inherit IDispatch
          Method Recognition <7> ( _
            ByVal StreamNumber As Long _                       ' __in long StreamNumber
          , ByVal StreamPosition As Variant _                  ' __in VARIANT StreamPosition
          , ByVal RecognitionType As Long _                    ' __in SpeechRecognitionType RecognitionType
          , ByVal Result As ISpeechRecoResult _                ' __in ISpeechRecoResult *Result
          )                                                    ' void
             Local pDisp As IDispatch, bstrText As WString
             If IsNothing(Result) Then Exit Method
             pDisp = Result
             Object Call pDisp.PhraseInfo.GetText To bstrText
             If ObjResult Then
                ? "GetText error: " & ObjResult$
             Else
                If Len(bstrText) Then
                   Control Set Text hDlg, %IDC_Body, bstrText 'display all text
                   oRecoContext.Pause()                       'Pause Recognition
                   oRecoContext.Resume()                      'Resume Recognition
                End If
             End If
          End Method
       End Interface
    End Class
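
    On the shutdown question, here is a minimal, untested sketch of one possible teardown. It assumes that the hang comes from the recognizer still listening when the dialog closes, which the thread does not confirm. The Sub name ShutdownSpeechRecognition is hypothetical; it reuses the globals from the example above, turns dictation off, detaches the event sink, and releases the objects in roughly the reverse order of InitializeSpeechRecognition.

    ```
    'Hypothetical helper, not part of the original example.
    Sub ShutdownSpeechRecognition
       If IsObject(oMyGrammar) Then
          oMyGrammar.DictationSetState(%SGDSInactive)   'Stop dictation first.
          oMyGrammar.DictationUnload()                  'Unload the dictation grammar.
       End If
       If IsObject(InProcEvents) Then Events End InProcEvents   'Detach the event sink.
       oMyGrammar   = Nothing                           'Release objects, newest first.
       oRecognizer  = Nothing
       oToken       = Nothing
       oCategory    = Nothing
       oRecoContext = Nothing
       InProcEvents = Nothing
       SpVoice      = Nothing
    End Sub

    'Assumed call site: an extra case in DlgProc.
    CallBack Function DlgProc() As Long
       Select Case Cb.Msg
          Case %WM_InitDialog
             InitializeSpeechRecognition
          Case %WM_Destroy
             ShutdownSpeechRecognition
       End Select
    End Function
    ```

    Whether this removes the "not responding" state would need testing; the only point of the sketch is to stop recognition explicitly before the process exits rather than leaving cleanup to program termination.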

  • #2
    In the column of "Bad Service", I tried to reach the folks at Nuance to talk about their Dragon Speaking SDK product.

    Three dropped/mis-directed phone calls later I got stuck in a voice mail.

    One of the support folks was willing to transfer me, but only if I wrote down the number in case we got disconnected - as though she knew a disconnect was likely.

    Another said she thought the person I needed to talk to might be leaving shortly, but would I please answer some questions first. I declined, which is perhaps why I didn't get connected correctly?

    So I went to their online site to request a call, only to be faced with a 12-field form to fill out - all of which were 'required' entries. I gave my basic contact information and stuck in a bunch of gibberish in the fields I didn't want to answer.

    Not a good start.



    • #3
      Three dropped/mis-directed phone calls later I got stuck in a voice mail.
      Ah, but based upon whom you called, there's probably a text record of what you said!

      Hmmm, I was just thinking about what I might say if I got stuck in "voice mail jail." Some of what I might say is spelled "<expletive deleted>"

      You'll have to make sure whatever product you choose can handle that!



      • #4
        FWIW, I got a call yesterday from DNS trying to get me to buy their most recent version (13, I think she said) at half price. She told me that it no longer requires training to adapt to your voice inflections, amongst other improvements.
        Years ago I inquired about their SDK but I found their 5 figure price out of proportion for my purposes, wallet, and need. They may have made changes to that since then.

        Aside:
        The spell checker for this forum tells me I spelt amongst rong!
        Rod
        "To every unsung hero in the universe
        To those who roam the skies and those who roam the earth
        To all good men of reason may they never thirst " - from "Heaven Help the Devil" by G. Lightfoot



        • #5
          Hi Rodney,
          Yes, I got a call back from Nuance today. The $5K fee, plus per "speaker" costs, to use their Dragon SDK is likewise way more than I can afford. Bummer.

          They will have one of their experts call me to talk about my needs in more detail. I'll quiz them about what options small businesses have.



          • #6
            It was $25,000 when I talked to them way back when. I suspect they now offer different levels of their SDK at lower prices.
            Rod



            • #7
              Resurrecting this thread ...

              Now, 2 years later, I thought I'd take another look, so I've called Nuance and asked for their folks to call me back. I'll post new information as I get it.

              Hoping that the latest update to Win10 might have an improved SAPI, I tried the code from #1 above. I can't say that the results are any better than before. Has anyone else used it lately and gotten results worth using?



              • #8
                I also did some looking around for an alternate speech-to-text API to use in a PowerBASIC app. It seems to me that the number of companies/options has dwindled. And it seems that there is a focus on cloud-based, pay-by-the-minute, speech-to-text conversion.

                If all I'm interested in is allowing a user to speak and have that text put in a textbox/richedit control, it may be enough to require each user to purchase a copy of Dragon Home and run it in the background while the PowerBASIC app is in use. Hmm... recognizing commands, separate from inserting speech, is also useful.

                I guess I'll contact tech support at Dragon Speaking and discuss my needs with them to see if Dragon Home can be used as I want.
