gbCapture - Discussion

  • gbCapture - Discussion

    This thread is for discussion of gbCapture, which I just posted in the Source Code forum. gbCapture can capture multiple pages of text from applications that offer no built-in way to capture the entire document text.

    The main gbCapture window should be placed in a corner of the desktop. The application containing the text should be positioned so that it overlaps the center of the desktop, with nothing covering the text to be captured. gbCapture captures an image of the application, extracts the text from the image (using Tesseract OCR) and then sends a Next Page keystroke to the application at the center. This is repeated as many times as the user requests, or until the end of the book is reached. The text extracted from each image is combined into a single file.
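    For anyone curious how the capture/extract/next-page loop hangs together, here is a minimal Python sketch of the idea. It is not gbCapture's actual code (gbCapture is PowerBASIC). The three callables capture_image, extract_text and send_next_page are hypothetical stand-ins for the screen grab, the OCR step and the keystroke, and stopping when a page's text matches the previous page is only an assumed way of detecting the end of the book.

```python
import time
from typing import Callable, List


def capture_pages(
    capture_image: Callable[[], bytes],    # hypothetical: screenshot of the app window
    extract_text: Callable[[bytes], str],  # hypothetical: OCR step, e.g. a Tesseract wrapper
    send_next_page: Callable[[], None],    # hypothetical: sends the Next Page keystroke
    max_pages: int,
    page_interval: float = 1.0,            # seconds to let the viewer redraw
) -> str:
    """Capture up to max_pages pages and return their text combined."""
    pages: List[str] = []
    for _ in range(max_pages):
        text = extract_text(capture_image())
        # Assumption: if the viewer did not advance, the text is unchanged,
        # so treat that as the end of the book.
        if pages and text == pages[-1]:
            break
        pages.append(text)
        send_next_page()
        time.sleep(page_interval)  # wait before grabbing the next page
    return "\n".join(pages)
```

    Passing the steps in as functions keeps the loop testable without a real viewer: a fake "book" held in a list can stand in for the application.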

    [Screenshot: gbcapture75.jpg]

    Good results depend on having a clear margin around the text to be captured. Some apps that allow smooth scrolling, such as Notepad, may display a partial line of text, which gbCapture cannot capture accurately. I had success using Word, which can display each page separately with very good margins.

    Word does something else helpful. When scrolled to the last page, it shows the remaining text followed by blank lines. Some other document viewers try to keep the last page filled by showing text from previous pages, which causes gbCapture to duplicate some of the document content.
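    One possible workaround for viewers that refill the last page is a post-processing step that drops lines at the top of a new page when they repeat the bottom of the previous page. The Python sketch below is a hypothetical helper, not something gbCapture currently does. It assumes the OCR output for the repeated lines is identical on both pages, and it can wrongly trim a line that legitimately repeats (a blank line, for example).

```python
from typing import List


def trim_overlap(previous: List[str], new_page: List[str]) -> List[str]:
    """Remove lines at the start of new_page that repeat the end of previous."""
    max_k = min(len(previous), len(new_page))
    # Try the longest possible overlap first, then shorter ones.
    for k in range(max_k, 0, -1):
        if previous[-k:] == new_page[:k]:
            return new_page[k:]
    return new_page  # no overlap found; keep the whole page


# A viewer that re-shows "c" and "d" on the last page:
# trim_overlap(["a", "b", "c", "d"], ["c", "d", "e"]) keeps only ["e"].
```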

    Comments and suggestions are welcome!

  • #2

    1. I include a "PageInterval" setting to adjust the speed of the capture/extract/nextpage loop. The default is 1s. Your PC's speed may affect which interval gives the fastest accurate results.

    2. Tesseract has many settings. The only one I exposed is a toggle between detecting single-column and double-column text.

    3. My code for finding the handle of the window containing the text doesn't seem to work with all document viewers, so I wrote two methods for capturing the text: one works with the desktop DC and the other with the DC of the application's text window. You may have to switch methods depending on the document viewer you use. I'll look at that some more.

    4. The SaveAs code calls up the standard Windows Save As dialog. I want to make one that uses the same larger/bold font that I use elsewhere in gbCapture.

    5. I don't really like the popup window I used for entering the number of pages to capture. I'll probably try placing a textbox over the toolbar to make it easier to reach and to see. The toolbar is already wider than I want, so I'll have to think about it.

    6. I want to add dropdown menus to some of the toolbar buttons and add some context menus - both using a larger/bold font. The link Pierre mentioned seems to have code for that. I'll look at it as time permits.
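    On item 2, one way such a single/double-column toggle could map onto Tesseract is its page segmentation mode (--psm) option. This is an assumption on my part about a reasonable mapping, not a description of what gbCapture does internally. In Tesseract's own list of modes, --psm 3 is fully automatic page segmentation (the default), which lets its layout analysis find separate columns, and --psm 4 assumes a single column of text of variable sizes. The Python helper below only builds the command line; the file names are placeholders.

```python
from typing import List


def tesseract_command(image_path: str, output_base: str, two_columns: bool) -> List[str]:
    """Build a Tesseract CLI call for single- or double-column text (assumed mapping)."""
    # --psm 3: fully automatic page segmentation (Tesseract's default),
    #          which can detect separate text columns.
    # --psm 4: assume a single column of text of variable sizes.
    psm = "3" if two_columns else "4"
    return ["tesseract", image_path, output_base, "--psm", psm]
```

    With Tesseract installed, the list can be passed to subprocess.run(..., check=True); by default Tesseract writes the recognized text to output_base plus ".txt".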


    • #3
      Not that gbCapture got any response here to speak of, but I do think that there would be a big interest in it outside the forums. It offers an alternative to the free Calibre ebook converter software and, unlike Calibre, does not require breaking the DRM on the ebook. Its downside is that it can take much longer to convert an ebook than Calibre does. According to Google, over 7M sites reference Calibre.

      My question: which site should I consider for publishing an article on gbCapture?

      CodeProject was my first thought, but Stack Overflow came to mind too. There may be others.


      • #4
        I posted it on CodeProject and it has about 5K views.

        But no posted comments and no emails asking about it.

        "...but I do think that there would be a big interest in it outside the forums."
        So either I was wrong or CodeProject readers are not the right audience. Bummer that!
        I haven't given up yet ...