Pulling A Webpage of Information (via https)

  • Pulling A Webpage of Information (via https)

    I have been a limited user of the PBCC (v6.03.0102) product for many years and have used it for a few simple projects with great success.

    I am looking to grab a web page (https) several times per hour and automatically write the results to an output log. Prior to writing out the results, I would parse out certain fields that I need.

    I have a device that posts the power (voltage and current) it generates/consumes to a webpage, and I would like to pull that information from the page.

    Are there any good examples or information available on the basics of grabbing a page?

    I did see mention of calling "webget" or "wget", but did not understand that process.

    Thoughts or ideas would be appreciated.

    Thanks. John

  • #2
    This should work with PBCC 6,
    using Jose Roca's includes:
    https://forum.powerbasic.com/forum/u...445#post811445



    • #3
      Posted two different methods in the last couple of days:

      https://forum.powerbasic.com/forum/u...446#post811446

      https://forum.powerbasic.com/forum/u...455#post811455



      • #4
        John, I'm a little late to the game here. The comments above might help you out, but I avoid re-inventing the wheel. Some people essentially write a primitive web client just to download a webpage; I use cURL instead. It's flawless (flawless "enough").

        cURL is a free, open-source, widely tested command-line program for interacting with web servers. You can easily find it with a quick search; developers in Sweden maintain the code. I'm using cURL version 7.62.0.

        Code:
        LOCAL OutputFile AS STRING
        LOCAL temp AS STRING
        OutputFile = "WebOutputFile.txt"
        ' Remove any copy left over from the previous run
        IF ISFILE(OutputFile) THEN KILL OutputFile
        ' Build a random-ish string from the timer (used below to defeat caching)
        temp = DEC$(TIMER * 20)
        ' Shell to cURL silently (-s) and write the page to OutputFile (-o)
        SHELL BUILD$("curl.exe -s http://black.blue/LongDates.txt?r=", temp, " -o ", OutputFile), 0
        Some explanation: temp is a random-ish string. Why? Although the webserver will ignore the variable 'r' in the URL, it is forced to re-load (refresh) the page whenever 'r' changes, so you're sure not to get a cached copy.

        To make it fault-tolerant, check that OutputFile actually exists after the SHELL call. Perhaps your internet connection went down, or the webserver is down.
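
        A minimal sketch of that check (the log file name PowerLog.txt is just an assumed example, not something from the original post):

        Code:
        ' Hypothetical follow-up check; OutputFile is the file cURL was asked to write
        IF ISFILE(OutputFile) THEN
            ' The download produced a file; append a note (or your parsed fields) to the log
            OPEN "PowerLog.txt" FOR APPEND AS #1
            PRINT #1, "Page retrieved at " + DATE$ + " " + TIME$
            CLOSE #1
        ELSE
            ' No file: the connection or the webserver may be down; try again next cycle
            OPEN "PowerLog.txt" FOR APPEND AS #1
            PRINT #1, "Download FAILED at " + DATE$ + " " + TIME$
            CLOSE #1
        END IF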

        EDIT: my example used HTTP. With HTTPS, cURL normally verifies the server's certificate, but it has an option, -k (also written --insecure), that basically says, "skip the certificate check; I trust this webserver and accept the risk of using HTTPS without the proper verification".
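
        For example, sticking with the hypothetical URL from the sketch above, the SHELL line might become:

        Code:
        ' -k / --insecure skips certificate verification; use only if you accept that risk
        SHELL BUILD$("curl.exe -s -k https://black.blue/LongDates.txt?r=", temp, " -o ", OutputFile), 0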
        Last edited by Christopher Becker; 13 Oct 2021, 07:14 PM. Reason: added the -k option in cURL
        Christopher P. Becker
        signal engineer in the defense industry
        Abu Dhabi, United Arab Emirates

