Web Browser Reference Materials

  • Web Browser Reference Materials

    Because I haven't programmed with a web browser before, your code always makes me feel like there's something I should have read first, so that your code isn't the first place I see a topic.

    Can you recommend any reading materials, especially online, that would be useful in developing a background in programming with IE, such that we would get more benefit from the code you post?
    Last edited by Gary Beene; 22 Aug 2012, 01:04 PM.

  • #2
    Internet Explorer uses Microsoft ActiveX Controls and Active Document interfaces to connect components.

    IExplore.exe is at the top level; it is a small application that is instantiated when Internet Explorer is loaded. This executable application uses Internet Explorer components to perform the navigation, history maintenance, favorites maintenance, HTML parsing and rendering, and so on, while it supplies the toolbar and frame for the stand-alone browser. IExplore.exe directly hosts the IEFrame.dll (Shdocvw.dll in older versions) component.
    IEFrame.dll in turn hosts the Mshtml.dll component, as well as any other Active Document component (such as a Microsoft Office application) that can be loaded in place in the browser when the user navigates to a specific document type. IEFrame.dll supplies the functionality associated with navigation, in-place linking, favorites and history management, and PICS support. This dynamic-link library (DLL) also exposes interfaces to its host to allow it to be hosted separately as an ActiveX control. The IEFrame.dll component is more frequently referred to as the WebBrowser Control. In-place linking refers to the ability to click a link in the HTML of the loaded document and to load a new HTML document in the same instance of the WebBrowser Control. If only Mshtml.dll is being hosted, a click on the link results in a new instance of the browser.
    Mshtml.dll is the component that performs the HTML parsing and rendering in Internet Explorer 4.0 and later, and it also exposes the HTML document through the Dynamic HTML Object Model. This component hosts the scripting engines, Microsoft virtual machine, ActiveX Controls, plug-ins, and other objects that might be referenced in the loaded HTML document. Mshtml.dll implements the Active Document server interfaces, which allows it to be hosted using standard Component Object Model (COM) interfaces.
    The WebBrowser Control provides a rich set of functionality that a host typically requires, such as that for in-place linking. Therefore, it is much more applicable for most applications to host this control instead of MSHTML for browsing or viewing HTML documents. Hosting MSHTML is recommended only for specialized applications, such as parsing HTML.
    It should also be noted that although hosting MSHTML is slightly more lightweight than hosting the WebBrowser Control, the savings rarely justify the extra work involved in implementing functionality that is already available in the WebBrowser Control. It is very likely that the WebBrowser Control will already be loaded in memory, and navigating to a frameset page will also result in the WebBrowser Control being loaded as part of the standard working set.

    WebBrowser Control - Reference for C/C++ Developers

    The attached file contains a reference guide to all constants, functions, interfaces, etc., adapted to the PowerBASIC syntax.

    • #3
      With the WebBrowser control and related APIs, you can do everything that Internet Explorer can do, but embedded in your PB application. It is a very powerful component. The limit is your imagination and your knowledge.

      What I have done is to write an OLE container, needed to host the WebBrowser Control, and several examples. But my examples only scratch the surface, as I have little knowledge of MSHTML and JavaScript. With the new HTML5 technology and JavaScript you can do wonders.


      • #4
        Hey Jose,
        Thanks so much for your response. I'll follow the link you gave.

        This comment of yours caught my eye ...
        Mshtml.dll is the component that performs the HTML parsing and rendering ...
        Can I use the DLL to get the innertext of an HTML document without loading it into a browser - perhaps saving time and avoiding the timing issue of having to use the DocumentComplete event in DWebBrowserEvents2Impl?

        I've mentioned before that I'm working on an indexing scheme for thousands of HTML files, which is why I'm interested in a batch approach to performing speedy innerText extraction.


        • #5
          Yes, you can.

          ' Create a new document
          DIM pDoc AS IHTMLDocument2
          pDoc = NEWCOM CLSID $CLSID_HTMLDocument
          ' Load the web page into a string
          ' using your preferred method
          DIM bstrDoc AS WSTRING
          ' Write it to the document
          IHTMLDocument2_WriteString(pDoc, bstrDoc)
          ' Close the document to "apply" the code
          ' Use other methods of the IHTMLDocument2 interface


          • #6
            What the heck are you doing up? It's only 4am your time. Lucky me, but not so much for you. Hope you're not experiencing insomnia!


            • #7
              In fact, it is 5am.


              • #8
                Thanks for the comment. I made the following code and it seems to work very well - much simpler than using a web browser and having to work with the attendant code! Very cool!

                Here's an image of the quick test I ran, using your suggestion. I had 5 HTML test files, each containing just a single line of text. In the code below I read each file and displayed the text from all files in a MsgBox.

                'Compilable Example:
                #Compile Exe
                #Dim All
                %Unicode = 1
                #Include "MSHTML.INC"  '<--- Jose Roca include
                %IDC_GetText    = 1002
                Function PBMain
                   Local hDlg As Dword
                   Dialog New Pixels, 0, "Batch HTML Extract", , , 300, 200, %WS_OverlappedWindow To hDlg
                   Control Add Button, hDlg, %IDC_GetText, "Get Text", 10,10,75,20
                   Dialog Show Modal hDlg, Call DlgProc
                End Function
                CallBack Function DlgProc() As Long
                   Select Case CbMsg
                      Case %WM_Command
                         Select Case Cb.Ctl
                            Case %IDC_GetText
                               Local fName$, HTMLText$, temp$, pDoc As IHTMLDocument2
                               pDoc = NewCom ClsId $CLSID_HTMLDocument
                               fName$ = Dir$("*.htm")
                               While Len(fName$)
                                  fName$ = Exe.Path$ + fName$
                                  Open fName$ For Binary As #1 : Get$ #1, Lof(1), temp$ : Close #1
                                  IHTMLDocument2_WriteString(pDoc, temp$)
                                  HTMLText$ = HTMLText$ + $CrLf + pDoc.body.innerText
                                  fName$ = Dir$(Next)
                               Wend   ' close the While loop before showing the collected text
                               ? HTMLText$
                         End Select
                   End Select
                End Function


                • #9
                  Sorry for reviving an old thread, but is it possible for me to get the MSHTML.INC from somewhere? I have tried going to Jose Roca's website, but in order to download files from his forum you have to be a registered member and registration is down. Any help is much appreciated.


                  • #10
                    The headers are available in this forum.


                    • #11
                      Thanks a lot!