Voice to text


  • Voice to text

    I'm posting this in the Hardware/Software forum because I'm hoping to find an off-the-shelf solution.

    I have a collection of 16 "letter to home" audio cassettes that were recorded by my eldest brother when he was stationed in Vietnam; his mom saved them. I've had 3 digitized (via legacybox.com), and I have tuned them up with Adobe Audition. Level balancing, some eq, pop/click removal, etc. My long term goal is to do all 16 and submit them to the U.S. Veterans Legacy Project, which already has a videotape of our dad being interviewed about his experiences as a pilot in WWII.

    My question: How good has voice recognition software become? I assume that most of it still requires "training" with a pre-written script, but then again, I can have a voice conversation with an untrained automated system over the phone. The tapes sound better than most phone connections... Is there anything out there that would give me a basic transcript, even if it has to be hand-corrected?
    "Not my circus, not my monkeys."

  • #2
    Eric,
    Dragon NaturallySpeaking (or its parent company) boasts something like 99% accuracy. I recently wanted a voice-to-text solution for a product I sell, but I thought the cost per seat was unreasonable for something to incorporate into my apps. They told me that they were focusing almost entirely on mobile products and did not expect to offer anything new for PC-based apps.

    But they do offer a PC version, and if it gets anywhere close to their claims, it should deliver a very usable solution, certainly for a manuscript that you're willing to edit. I think it runs about $100.

    I did find a few other solutions, but since I was looking for something to use in my apps, what I looked at does not fit your intended use.

    Perhaps someone here has Dragon NaturallySpeaking and could try a tape for you, to help you make the decision?

    Comment


    • #3
      I would also record your cassettes digitally in FLAC (Free Lossless Audio Codec) format,
      which gives larger files than mp3 but is lossless, to avoid excessive wear on the precious tapes or the danger that a tape might get chewed up whilst playing.
      For more normal listening (if that is wanted), Audacity or the like can easily make an mp3 or ogg from the FLAC files.
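If there are many files, the FLAC-to-mp3 step could also be scripted. Here is a minimal sketch that uses the ffmpeg command-line tool instead of Audacity; it assumes ffmpeg is installed and on the PATH, and the file name is made up for illustration.

```python
import subprocess
from pathlib import Path


def flac_to_mp3_command(flac_path: str, quality: int = 2) -> list[str]:
    """Build an ffmpeg command that encodes a FLAC master to a VBR MP3 copy."""
    source = Path(flac_path)
    target = source.with_suffix(".mp3")
    return [
        "ffmpeg",
        "-i", str(source),          # input: the lossless master
        "-codec:a", "libmp3lame",   # encode audio with the LAME MP3 encoder
        "-qscale:a", str(quality),  # VBR quality, 0 (best) to 9 (smallest)
        str(target),
    ]


def convert(flac_path: str) -> None:
    """Run the conversion; needs ffmpeg installed and on PATH."""
    subprocess.run(flac_to_mp3_command(flac_path), check=True)


if __name__ == "__main__":
    # Hypothetical file name; this only prints the command, it does not run it.
    print(" ".join(flac_to_mp3_command("tape01_side_a.flac")))
```

The FLAC file stays untouched as the archive copy; the mp3 is just a listening copy and can be regenerated at any time.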

      I hope I am not stating the obvious, but these sound as if they are irreplaceable.

      Cheers, Mike.
      There are only two speeds for computers: fast enough, and too bloody slow.
      And there are 10 types of programmer -- those that know binary, and those that don't.

      Comment


      • #4
        Given how little training Siri, Alexa et al. require these days in order to function properly, I'd say speech recognition, along with the ability to read and answer related questions, has come a long way.

        And as a developer, you might even be able to throw something together yourself, e.g. an Alexa Skill.

        I also think that you'll more likely find a working & affordable app for a mobile device than a PC program.

        Comment


        • #5
          Dragon Professional is supposed to work with some dictation recording devices. That info may get you headed in the right direction.
          p purvis

          Comment


          • #6
            Eric, have you already found a solution?
            I've tried a couple of versions of Dragon NaturallySpeaking (although not the latest) for speech to text, both with a microphone and by recording to a digital audio recorder and then processing the .wav file. I did not find it that useful. There were too many misrecognitions: it fills in the closest word it recognizes rather than marking a poorly recognized sound as unknown and letting me fill it in during editing. So I got some unreadable sentences, strings of correctly spelled English words whose meaning I could not always figure out, even though I was the original speaker.
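To illustrate the "mark it as unknown" behaviour described above, here is a rough sketch. It assumes the recognizer hands back a confidence score between 0 and 1 for each word, which not every product does; the sample words, scores, threshold, and placeholder are all made up.

```python
def mark_uncertain(words, threshold=0.6, placeholder="[???]"):
    """Replace words whose confidence is below threshold with a placeholder.

    `words` is a list of (text, confidence) pairs with confidence in 0..1.
    Consecutive low-confidence words collapse into a single placeholder.
    """
    output = []
    for text, confidence in words:
        if confidence >= threshold:
            output.append(text)
        elif not output or output[-1] != placeholder:
            output.append(placeholder)
    return " ".join(output)


if __name__ == "__main__":
    # Made-up recognizer output, for illustration only.
    sample = [("the", 0.97), ("mail", 0.92), ("came", 0.90),
              ("on", 0.35), ("wens", 0.22), ("day", 0.41), ("again", 0.94)]
    print(mark_uncertain(sample))
```

A transcript with visible gaps is usually quicker to hand-correct against the audio than one where wrong guesses look like real words.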

            I think what you want is an audio transcription service. You upload a digital audio file, and their human contract workers listen and type a transcript. Costs vary depending on how many different people are speaking in the recording (you have just one speaker, your brother) and how technical the material is (legal, medical, etc. cost more). Prices start around $1/minute. I don't know for sure, but given that the major cost is the hourly labor of the workers, my guess is there is some tradeoff of cost vs. quality of the transcript (that is, how carefully it is proofread and edited). WeScribeIt is one service ($1.50/minute), but there are many others. You are not in a rush, so choose a service that charges less in exchange for slower turnaround time.

            Google "audio transcription service" and/or look at the blog post below, which lists several services, or the Quora answer, in which some transcription employees describe their services.
            https://blogging.org/blog/11-of-the-...rvices-online/
            https://www.quora.com/What-is-the-be...iption-service

            Comment


            • #7
              Also, on the cassette to digital file conversion: I've converted a couple of cassettes by connecting a cassette player to the Line In audio input of a PC and running audio recording software while playing the cassette. You said you have Adobe Audition, so you have audio software. (For others interested in other audio recording and processing software, a couple of other options are: Audacity, which is free; Total Recorder (High Criteria), which is $18 or $36, plus more for some add-ons; and Sound Forge Audio Studio (formerly made by Sony, now owned by Magix), which is $60.)

              When I did the cassette to PC input recording, with some PCs there was too much noise from the PC itself. Rather than try to fix this, I just used a different PC -- usually at least one PC I owned would record audio without adding in additional noise. Also sometimes running a portable cassette player on batteries, rather than on AC power, would reduce the noise generated by the cassette player itself.

              Comment


              • #8
                For another test, for what it's worth, see if Google applications with Chrome can do something for you.
                http://smallbusiness.chron.com/set-u...ext-50812.html
                If you have to, you might want to try a Blue Yeti microphone. I have bought many for speech recognition work that I need to get back to.
                The Yeti is about 110 dollars, and I even saw some of the newer ones at Target for just over that price.
                Good Luck with it all and I hope you write back.
                p purvis

                Comment


                • #9
                  Thanks guys, I will be returning to this project shortly and will let you know the results.
                  "Not my circus, not my monkeys."

                  Comment


                  • #10
                    Eric,
                    Have you tried rolling your own with SAPI?

                    Comment


                    • #11
                      https://cloud.google.com/speech/
                      I did not know about a paid Google voice service.
                      Right now it costs about $1.44 per 60 minutes of audio, with the first 60 minutes free.
                      Just an FYI.
                      p purvis

                      Comment


                      • #12
                        Just more FYI:
                        Paul posted about the Google Cloud Speech transcription service (I also did not know Google offered this service). I saw that Amazon introduced a similar service, in beta as of December 2017, called Amazon Transcribe. https://aws.amazon.com/transcribe/

                        Pricing is similar: 60 minutes free per month for the first 12 months of use, and $0.0004 per second thereafter, which works out to $0.006 per 15 seconds of audio. Same or similar pricing as Google. Google's service can process files stored on Google Cloud Storage. Amazon's service processes files stored on Amazon AWS, such as S3.
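To put numbers on that, here is a small sketch of the arithmetic using only the rate and free allowance quoted above. The tape length (60 minutes each) is a guess, and real billing may round or apply minimums differently.

```python
FREE_SECONDS_PER_MONTH = 60 * 60   # 60 free minutes per month, as quoted above
RATE_PER_SECOND = 0.0004           # USD per second, as quoted above


def transcription_cost(audio_seconds: int) -> float:
    """Cost in USD for one month's audio after the free allowance."""
    billable = max(0, audio_seconds - FREE_SECONDS_PER_MONTH)
    return round(billable * RATE_PER_SECOND, 2)


if __name__ == "__main__":
    # Assumed workload: 16 tapes of 60 minutes each (tape length is a guess).
    total_seconds = 16 * 60 * 60
    print(f"One hour beyond the free tier: ${transcription_cost(2 * 3600):.2f}")
    print(f"All 16 tapes in one month:     ${transcription_cost(total_seconds):.2f}")
```

At this rate a billable hour comes to $1.44, and 16 one-hour tapes submitted in a single month would be about $21.60. Spreading the tapes over several months would use more of the free allowance.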

                        Amazon audio requirements:
                        "The input file must be:
                          • In FLAC, MP3, MP4, or WAV file format
                          • Less than 2 hours in length
                        You must specify the language and format of the input file.
                        For best results:
                          • Use a lossless format, such as FLAC or WAV, with PCM 16-bit encoding.
                          • Use a sample rate of 8000 Hz for low-fidelity audio and 16000 Hz for high-fidelity audio."
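Before uploading, a WAV file could be checked against those quoted requirements. This is a sketch using only Python's standard `wave` module; it handles uncompressed PCM WAV files only, and the limits are simply the ones from the quote above.

```python
import wave

MAX_SECONDS = 2 * 60 * 60          # "less than 2 hours in length"
RECOMMENDED_RATES = (8000, 16000)  # sample rates named in the quote


def check_wav(path: str) -> list[str]:
    """Return a list of warnings for a PCM WAV file; empty means it looks fine."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        sample_bits = wav.getsampwidth() * 8
        seconds = wav.getnframes() / rate

    warnings = []
    if seconds >= MAX_SECONDS:
        warnings.append(f"too long: {seconds / 60:.1f} minutes")
    if sample_bits != 16:
        warnings.append(f"{sample_bits}-bit samples, 16-bit recommended")
    if rate not in RECOMMENDED_RATES:
        warnings.append(f"{rate} Hz sample rate, 8000 or 16000 Hz recommended")
    return warnings
```

Calling `check_wav("tape01_side_a.wav")` (a hypothetical file name) returns an empty list when nothing needs attention. A sample-rate warning is only about the "best results" advice in the quote, not a hard limit.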

                        So:

                        1. Eric: An alternative to sending audio files to a human transcription service (likely higher quality but higher cost) is to sign up for the Google or Amazon transcription service, see how good the quality is, and hand-edit the transcripts yourself. Also, both Google and Amazon are currently offering the first 60 minutes per month free; these may be introductory offers to win market share, and once the market is established, the free allowance for new users may well be reduced.

                        2. Gary: You wanted to include voice-to-text in one of your apps and wrote that you had found a few other solutions. Were the solutions you found similar to the Google and Amazon services? Since voice-to-text transcription requires ongoing payments, I assume you would either need to charge your customers for a service you provide in the app, which feeds the audio to the transcription service, or have each customer set up the software to use their own Google / Amazon / other-vendor transcription account, so that the vendor bills them directly.

                        Comment


                        • #13
                          My uncle told me a few months ago, around December 2017, that he rarely types on his Samsung phone but uses voice to text almost exclusively.
                          p purvis

                          Comment


                          • #14
                            If you want to roll your own from SAPI and you don't build a special grammar then all you have to do is filter the returned speech string.

                            Example command to be detected:
                            "watch this"

                            SAPI can interpret this several ways.

                            "watching this"
                            "watched this"
                            "watches this"
                            "watching is"
                            etc.
                            Yes, it is a pain but doable.
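One possible way to do that filtering is fuzzy string matching on the text the recognizer returns. The sketch below uses Python's standard `difflib` and does not call SAPI itself; the second command, the cutoff value, and the function name are made up for illustration.

```python
import difflib

COMMANDS = ["watch this", "stop recording"]  # second command is hypothetical


def match_command(heard: str, commands=COMMANDS, cutoff: float = 0.7):
    """Map a recognized phrase to the closest known command, or None."""
    # Lowercase and collapse whitespace so trivial differences don't matter.
    normalized = " ".join(heard.lower().split())
    matches = difflib.get_close_matches(normalized, commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for phrase in ["watching this", "watched this", "watches this",
                   "watching is", "open the door"]:
        print(f"{phrase!r} -> {match_command(phrase)!r}")
```

The cutoff is a trade-off: set it too low and unrelated phrases trigger commands, too high and variants like "watching is" get rejected. It would need tuning against real recognizer output.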

                            Comment


                            • #15
                              Eric, I came across a video on YouTube that you might be interested in watching.
                              I have never forgotten about your needs.
                              This video got me thinking about a few ways something like this could be used.

                              I found the creator's webpage, which is here; the YouTube video is on the webpage too.
                              https://www.sobolsoft.com/convertmp3text/

                              And just FYI:
                              https://download.cnet.com/MP3-Speech...-77756486.html

                              Also, I came across a webpage that allowed 50 hours of free usage, but I lost it.

                              p purvis

                              Comment


                              • #16
                                Thanks Paul, I will check those out.
                                "Not my circus, not my monkeys."

                                Comment
