File Formats - Extract Text and Images

  • File Formats - Extract Text and Images

    I've been scanning the web for a DLL that I could use with my PowerBASIC apps - one that would extract text and images from various file formats, such as DOC, DOCX, PDF, ePUB, etc.

    Mostly I've found standalone apps, along with a lot of online conversion web sites.

    Does anyone have a favorite tool they could share? A commercial product is just fine, free is better.

  • #2
    Hi Gary,

    To extract text from PDF, I have used the Xpdf command-line tool pdftotext for years.

    Xpdf is open source.

    More info here:
    Jean-Pierre LEROY
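
    For readers who want to drive pdftotext from another program, here is a minimal sketch in Python of the shell-out approach. It assumes a pdftotext executable is on the PATH; the function names build_pdftotext_command and extract_pdf_text are hypothetical, chosen only for this illustration. The -enc, -layout and "-" (write to stdout) arguments are standard pdftotext options.

```python
import shutil
import subprocess
from typing import List, Optional


def build_pdftotext_command(pdf_path: str, keep_layout: bool = False,
                            exe: str = "pdftotext") -> List[str]:
    """Build the argument list for a pdftotext call that writes to stdout."""
    cmd = [exe, "-enc", "UTF-8"]   # ask for UTF-8 encoded output
    if keep_layout:
        cmd.append("-layout")      # try to preserve the physical page layout
    cmd += [pdf_path, "-"]         # "-" as the text file sends output to stdout
    return cmd


def extract_pdf_text(pdf_path: str, keep_layout: bool = False) -> Optional[str]:
    """Return the text of a PDF, or None if pdftotext is not installed."""
    if shutil.which("pdftotext") is None:
        return None
    result = subprocess.run(build_pdftotext_command(pdf_path, keep_layout),
                            capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")
```

    The same command line can be launched from any language that can start a process and capture its output, so the approach is not tied to Python.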


    • #3
      I see XpdfReader comes from a company that also offers commercial licenses, with DLL support and more features for Windows.
      Last edited by Mike Doty; 16 Dec 2018, 07:08 AM. instead of google


      • #4
        Gary, narrow the file types down.
        Alchemy might be OK for graphic files.
        For documents, you will likely have to go to the OpenOffice, Abi, WordPerfect, MS Word, Lotus, etc. tools and convert to some standard form like Rich Text or PDF.
        I personally stay away from any new proprietary formats by any company.
        So many companies now have to be able to create PDF files. That can be one of the standards you set. You might narrow down your support to 10 file formats, in addition to the ones you're able to comfortably support, and then just work on getting other documents converted to the ones on your list.
        I personally don't like the idea of converting sensitive documents on an outside web server. I just don't trust them.
        p purvis


        • #5
          Thanks for that lead. Their page also says it can extract embedded images, too. That's a real plus!

          I looked at their pricing and it includes a run-time license for each computer where the final product will be installed. Bummer. That means I cannot use it in any of my freeware products.

          MS Office documents (.docx, .pptx, .xlsx), HTML, ePub and XML come to mind. I don't have a locked-down list; rather, I'm interested generally in file formats that most folks might come in contact with in both personal and business environments. Not every format known to man, but a Top 10 list would seem to be the guideline.
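
          For the Office formats in that list, no third-party DLL is strictly required: a .docx file is a ZIP package whose body text lives in word/document.xml and whose embedded images sit under word/media/. The following is a rough sketch in Python, not a complete converter. The function name extract_docx is hypothetical, and the code assumes a simple document; it ignores headers, footers, footnotes and any formatting.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx(data: bytes) -> Tuple[str, Dict[str, bytes]]:
    """Return (plain text, {archive name: image bytes}) for a .docx file."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
        paragraphs = []
        # Each w:p is a paragraph; its visible text is split across w:t runs.
        # Assumption: paragraphs are not nested (e.g. inside text boxes).
        for para in root.iter(W_NS + "p"):
            runs = [t.text or "" for t in para.iter(W_NS + "t")]
            paragraphs.append("".join(runs))
        # Embedded pictures are stored as ordinary files under word/media/
        images = {name: zf.read(name) for name in zf.namelist()
                  if name.startswith("word/media/")}
    return "\n".join(paragraphs), images
```

          Typical use would be to read the file into memory with open(path, "rb").read(), pass the bytes to extract_docx, and write each returned image back to disk. The .pptx, .xlsx and ePub formats are also ZIP containers, but their internal paths and XML vocabularies differ, so each would need its own handling.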


          • #6
            Gary, I just had time to search for Alchemy.
            It was made by Handmade Software, and they are no longer in business.
            But the search did turn up a few other nice tools:
            XnConvert and some other related stuff.
            p purvis