Retrieve a web page

  • #21
    Hey Dan,

    This seems to work for me ...

    Code:
    DeleteURLCacheEntry(URLPath)


    • #22
      @Gary, any chance you could create one of your super minimal working examples, for the COM challenged peeps?


      • #23
        There's no COM involved at the base level. All the parameters you don't know can just be set to 0/NULL so

        Code:
        DeleteURLCacheEntry(URLPath)
        URLDownloadToFile(0, URLPath, FileName, 0, 0)
        Might need some ByVals in there. Flitting between languages, I can never remember which params need them and which don't with which versions of the headers.

        There are other URL* functions (URLOpenBlockingStream, etc.) if you don't actually need a file (they do still cache the data), but those involve IStreams, which is technically COM.


        • #24
          James,
          As you requested ...
          Code:
          'Compilable Example:
          #Compile Exe
          %Unicode = 1                   'select the wide (W) versions of the API declarations
          #Include "Win32API.inc"
          #Include "WinINET.inc"
          Function PBMain() As Long
             Local URLPath, LocalPath As WStringZ * %Max_Path
             URLPath = "http://www.garybeene.com/index.htm"
             LocalPath = Exe.Path$ + "gb.htm"    'save the page next to the EXE
             DeleteURLCacheEntry(URLPath)        'drop any cached copy so a fresh one is fetched
             URLDownloadToFile(Nothing, URLPath, LocalPath, 0, Nothing)   'download the page to LocalPath
          End Function
          I tend to use %Unicode=1 in all my apps. If you don't want that, use StringZ instead of WStringZ.


          • #25
            Thanks Gary, that's really small. I would have "liked" you, but I get an error when I do that.


            • #26
              Thanks folks! I have learned a lot from the question I asked.

              Steve H. Thanks for the suggestion about the "slobs way". I too, like Kerry F, would like to see a little more explanation.

              Gary B. I am looking for just text. As you say, the amount of HTML code is daunting! Not generally looking for images, but I do plan to search out your code to capture an image. Thanks.

              Putting a URL into NOTEPAD was a new idea to me. It works well, but again, there is a bunch of HTML code I don't need. I was hoping for a way to capture just text, say pages from a book, and then assemble them offline for my own use in the way I would like to see them. My old, non-programming way has been CTRL-A, CTRL-C, and then CTRL-V into a text processor. The cleanup isn't as bad as downloading all the HTML code. I will think about whether there is a way to "easily" strip HTML code.
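As a rough illustration of what "stripping HTML" involves, here is a minimal sketch in Python (not PowerBASIC) that uses only the standard library. The names `TextExtractor` and `html_to_text` are made up for this example, and the list of block-level tags is an assumption. Real pages will need more care, for example with tables, hidden elements, and text generated by scripts.

```python
from html.parser import HTMLParser

# Tags whose contents are never visible text.
SKIP_TAGS = {"script", "style"}
# Tags treated as line breaks (an assumed, incomplete list).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, one line per block element."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            if self.skip_depth > 0:
                self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.parts)
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)
```

Calling `html_to_text` on the contents of a downloaded file such as gb.htm would return the page text without the markup, which is roughly what the CTRL-A, CTRL-C, CTRL-V route produces.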

              I guess the PBCC TCP command will work OK on websites that are HTTP, but not so well on HTTPS sites.

              I plan to keep thinking about this, as a programming solution would be nice. In the meantime I am fooling around with the public data from the State of Michigan Voter Registration database. A typical database contains about 7.3 million entries with 38 fields per individual, with some of those fields serving as keys to other databases. Why do this? Just for the exercise of working with a large database, random access files, and maybe indexing sometime in the future. What is interesting is that the data is available to everyone, but the data isn't "friendly" for the average person who works only with spreadsheets, etc. So, you can have the data (FOI), but in most cases you can't do anything with it. The State of Ohio has their database online and updates it weekly.

              Tim


              • #27
                Second time in about a week somebody saying PB TCP doesn't do HTTPS. Well it does not do HTTP either!

                TCP is OSI Level 4. HTTP and HTTPS are Level 7. When "HTTP" is used in a TCP OPEN line, it is simply setting the port to "80". If port "443" is in the TCP OPEN line, the service type will be HTTPS.

                The "problem" is that HTTP requires very little else by the programmer. HTTPS requires certificate exchange code and encryption/decryption code. What PB's TCP statements do is the same either way, there is just a lot more for you to do above that to have a secure session.
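To make the layering concrete, here is a minimal sketch in Python (not PowerBASIC) using only the standard `socket` and `ssl` modules. The function names `build_get_request` and `fetch` are invented for this example. The plain HTTP case just writes a few lines of text to a TCP connection on port 80. The HTTPS case sends the same text, but only after the socket has been wrapped in a TLS session, and that wrapping is the part PB's TCP statements leave to you.

```python
import socket
import ssl


def build_get_request(host, path="/"):
    """Return the raw bytes of a minimal HTTP/1.1 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",   # ask the server to close so we can read to EOF
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host, path="/", use_tls=False, timeout=10.0):
    """Send one GET request and return the raw response (headers plus body)."""
    port = 443 if use_tls else 80
    raw = socket.create_connection((host, port), timeout=timeout)
    if use_tls:
        # Certificate checking and encryption happen inside this wrapper.
        context = ssl.create_default_context()
        conn = context.wrap_socket(raw, server_hostname=host)
    else:
        conn = raw
    with conn:
        conn.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

For example, `fetch("www.garybeene.com", "/index.htm")` would issue a plain HTTP request, while passing `use_tls=True` would run the same exchange over TLS on port 443. The response still contains the HTTP headers, which a real program would need to split off.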

                The easier way would be API procedures (I don't know which), or 3rd party procedures (I also don't know which) in a DLL to do HTTPS. There is a possibility the API or 3rd party library would contain replacements for the PB TCP statements. So in that sense, yes, PB TCP will see less use. Of course there are still FTP, SMTP, etc., and custom protocols you may come up with.

                PB is not a certificate authority, so they (past or current owners) would not be providing certificates to you and your customers anyway!
                Dale


                • #28
                  but the data isn't "friendly" for the average person who works only with spreadsheets, etc. So, you can have the data (FOI) , but in most cases you can't do anything with it.
                  ????

                  I'll bet it could easily be made "friendly" for any "user" by an experienced "programmer." For that matter, it may already be in a format which IS friendly, as long as you use the correct software to access it.

                  Format not shown.

                  MCM

                  Michael Mattias
                  Tal Systems Inc.
                  Racine WI USA
                  mmattias@talsystems.com
                  http://www.talsystems.com


                  • #29
                    Originally posted by Michael Mattias

                    ????

                    I'll bet it could easily be made "friendly" for any "user" by an experienced "programmer." For that matter, it may already be in a format which IS friendly, as long as you use the correct software to access it.

                    Format not shown.

                    MCM
                    Yep, I'm sure it is. It took me less than a minute to find this page: http://michiganvoters.info/download.html with a web search.

                    It has links to download the complete list and a small PDF which gives the data structure of the file:
                    Qualified Voter File
                    Freedom of Information Format

                    Field  Description              Start  Length  Type  Content/Format
                    -----  -----------------------  -----  ------  ----  --------------
                        1  last name                    1      35  a     alpha - hyphen allowed
                        2  first name                  36      20  a     alpha only (no spaces)
                        3  middle name                 56      20  a     alpha only (no spaces)
                        4  name suffix                 76       3  a     JR, SR, I, II, III, IV or V
                        5  birthyear                   79       4  a     YYYY
                        6  gender                      83       1  a     M or F
                        7  date of registration        84       8  a     MMDDYYYY
                        8  house number character      92       1  a     alpha prefix to house num
                        9  residence street number     93       7  a     actual street number
                       10  house suffix               100       4  n     typically contains ½
                       11  pre-direction              104       2  a     N, S, NE, etc.
                       12  street name                106      30  a
                       13  street type                136       6  a     RD, AVE, ST, etc.
                       14  suffix direction           142       2  a     N, S, NE, etc.
                       15  residence extension        144      13  a     LOT #, APT #, etc.
                       16  city                       157      35  a
                       17  state                      192       2  a
                       18  zip                        194       5  a
                       19  mail address 1             199      50  a     if different than residence
                       20  mail address 2             249      50  a
                       21  mail address 3             299      50  a
                       22  mail address 4             349      50  a
                       23  mail address 5             399      50  a
                       24  voter id                   449      13  n     unique sequence number
                       25  county code                462       2  a     1-83, see countycd.lst
                       26  jurisdiction               464       5  a     see jurisdcd.lst
                       27  ward precinct              469       6  a
                       28  school code                475       5  a     see schoolcd.lst
                       29  state house                480       5  a
                       30  state senate               485       5  a
                       31  US Congress                490       5  a
                       32  county commissioner        495       5  a
                       33  village code               500       5  a     see villagcd.lst
                       34  village precinct           505       6  a
                       35  school precinct            511       6  a
                       36  permanent absentee ind     517       1  a     Y = yes N = no
                       37  status type                518       2  a     A=active, V=verify, C=cancelled, R=rejected, CH=challenged
                       38  UOCAVA Status              520       1  a     M=Military, C=Civilian Overseas, N=Non UOCAVA, O=Other/Legacy Overseas
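A fixed-width layout like this can be read with plain substring slicing. The sketch below is in Python (not PowerBASIC) and covers only a handful of the 38 fields. The start positions and lengths come from the layout above. The dictionary keys, the `parse_record` name, and the assumption that the file holds one 520-character record per line are mine, so check them against the actual download.

```python
# (start, length) pairs copied from the published layout; start is 1-based.
# Only a few of the 38 fields are listed; the key names are made up here.
FIELDS = {
    "last_name": (1, 35),
    "first_name": (36, 20),
    "birth_year": (79, 4),
    "city": (157, 35),
    "zip": (194, 5),
    "voter_id": (449, 13),
    "status": (518, 2),
    "uocava": (520, 1),
}

RECORD_LENGTH = 520  # last field starts at 520 and is 1 character long


def parse_record(line):
    """Slice one fixed-width record into a dict of trimmed field values."""
    record = line.rstrip("\r\n").ljust(RECORD_LENGTH)  # pad short lines
    return {
        name: record[start - 1:start - 1 + length].strip()
        for name, (start, length) in FIELDS.items()
    }
```

Looping over the file line by line and calling `parse_record` on each line would give one dictionary per voter, which could then be written out as CSV for spreadsheet users.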



                    --
                    CAMCo - Applications Development & ICT Consultancy (http://www.camcopng.com)
                    PNG Domain Hosting (http://www.hostingpng.com)


                    • #30
                      Yes, the information is there for someone who understands it all. I was thinking about the average, oh let's say, news reporter who would be lost without some help. You guys just forget how special you are!!

                      Tim


                      • #31
                        Hey Tim,

                        You should also check out Jose's work on using an embedded browser. The "innerText" property might be of interest to you. Take a look at this example for pulling the text out of a web page. Requires Jose's includes ...

                        Code:
                        'Compilable Example:
                        #Compile Exe
                        #Dim All
                        %Unicode = 1
                        %UseWebBrowser = 1            'enable WebBrowser control support in CWindow.inc
                        #Include Once "CWindow.inc"   'Jose Roca includes
                        %IDC_WebBrowser  = 1001
                        %IDC_GetText     = 1002
                        
                        Global hDlg, hBrowser As Dword
                        Function PBMain
                           Local bstrURL As WString, pWindow As IWindow
                           Dialog New Pixels, 0, "WebBrowser Get Selected Text", , , 600, 400, %WS_OverlappedWindow To hDlg
                           Control Add Button, hDlg, %IDC_GetText, "Get Selected Text", 10,10,140,20
                           pWindow = Class "CWindow"
                           bstrURL = "http://www.powerbasic.com"
                           pWindow.AddWebBrowserControl(hDlg, %IDC_WEBBROWSER, bstrURL, Nothing, 0, 40, 600,350)
                           Dialog Show Modal hDlg, Call DlgProc
                        End Function
                        
                        CallBack Function DlgProc() As Long
                           Local pIWebBrowser2 As IWebBrowser2
                           Local pIHTMLDocument2 As IHTMLDocument2
                           Local pElement As IHTMLElement
                           Local hBrowser As Dword
                           Local temp$$
                        
                           Select Case CbMsg
                              Case %WM_Command
                                 Select Case Cb.Ctl
                                    Case %IDC_GetText
                                       hBrowser = GetDlgItem(hDlg,%IDC_WebBrowser)
                                       pIWebBrowser2 = OC_GetDispatch(hBrowser)
                                       pIHTMLDocument2 = pIWebBrowser2.Document
                                       pElement = pIHTMLDocument2.body
                                       temp$$ = pElement.innerText
                                       ? temp$$
                                 End Select
                        
                              Case %WM_Size
                                 If Cb.WParam <> %Size_Minimized Then
                                    Local w,h As Long
                                    Dialog Get Client Cb.Hndl To w,h
                                    Control Set Size Cb.Hndl, %IDC_WebBrowser, w, h-40
                                 End If
                           End Select
                        End Function


                        • #32
                          Am I the only person having their posts deleted? Are only posts with over 1000 lines of code permitted?


                          • #33
                            Tim,
                            Try COM using Jose Roca's methods.
                            As you have learned, https can't be done using webget.
                            https://forum.powerbasic.com/forum/j...p-com-examples


                            • #34
                              Originally posted by James McNab
                              @Gary, any chance you could create one of your super minimal working examples, for the COM challenged peeps?
                              Here's a way to retrieve web data (that includes the content of a website) using the Microsoft MSXML library: https://forum.powerbasic.com/forum/u...501#post582501

                              As I stated over there, don't let the "XML" in the library's name fool you into thinking it only deals with XML files. The nice thing (from a programmer's perspective) is that it easily handles HTTPS traffic (you just pass the https URL and that's it), proxy servers, and user authentication. It can be used either synchronously or asynchronously.
