Server Exchange


  • Originally posted by Mike Doty
    The whole thing won't work or not using WSTRING?
    The whole thing will have issues at some stage. Almost immediately if it is UTF16 encoded.

    If you knew that the byte string was actually UTF8 encoded text, you would need to convert it to WSTRING using
    myWideString = UTF8ToCHR$(sBuffer).
    If you knew that it was UTF16 encoded text, you would have to use:
    IF nBytes THEN myWideString = PEEK$$(VARPTR(vByteArray(0)), nBytes/2)
    But since there is no indication what ResponseBody actually contains, how would you know which to do?
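
    One partial answer is a byte order mark (BOM), when the sender includes one; many web pages do not, so this is only a hedged sketch. BytesToWide is a hypothetical helper name (not from the code in this thread), and the no-BOM branch is merely a guess that the bytes are ANSI.

    ```powerbasic
    ' Hypothetical helper: pick a conversion based on a leading BOM, if any.
    FUNCTION BytesToWide (BYVAL sBytes AS STRING) AS WSTRING
     IF LEFT$(sBytes, 3) = CHR$(&HEF, &HBB, &HBF) THEN
      ' UTF-8 BOM: strip it, then decode the rest as UTF-8
      FUNCTION = UTF8TOCHR$(MID$(sBytes, 4))
     ELSEIF LEFT$(sBytes, 2) = CHR$(&HFF, &HFE) THEN
      ' UTF-16LE BOM: strip it, then copy the remaining bytes as wide characters
      sBytes = MID$(sBytes, 3)
      IF LEN(sBytes) >= 2 THEN FUNCTION = PEEK$$(STRPTR(sBytes), LEN(sBytes) \ 2)
     ELSE
      ' No BOM: assumption only, treat the bytes as ANSI text
      FUNCTION = sBytes
     END IF
    END FUNCTION
    ```

    Without a BOM the question above still stands: the bytes alone do not say which encoding they are in.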

    You are either going to get spurious characters such as CHR$(194), i.e. &HC2 or "Â", in UTF8, or truncated text at the first null in UTF16.

    Bottom line:
    Parsing ResponseBody for an HTML page is an ugly kludge for a poorly structured page.
    Try plugging http://www.garybeene.com/power/code/gbsnippets0052.htm into https://validator.w3.org/

    Initially it won't validate at all because it thinks it is UTF8 in accordance with the Server Response.
    "Sorry, I am unable to validate this document because on line 1 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication."

    Change the "encoding" manually to Windows-1252 or ISO 8859-1
    and note the 362 warnings and errors identified on the page.

    (If you tell it to validate as HTML 2.0, you only get 136 errors.)

    • I will only use it for text files, SQL requests, and downloading .exe files from my site, so hopefully I won't have any issues.

      • Use the GetResponseHeader method to get the character encoding

        sContent = pWHttp.GetResponseHeader("Content-Type")
        IF INSTR (UCASE$(sContent), "CHARSET=UTF-8") > 0 THEN buffer = UTF8TOCHR$(buffer)

        • Originally posted by Rod Macia
          Use the GetResponseHeader method to get the character encoding

          sContent = pWHttp.GetResponseHeader("Content-Type")
          IF INSTR (UCASE$(sContent), "CHARSET=UTF-8") > 0 THEN buffer = UTF8TOCHR$(buffer)
          Nope, that's the whole problem and it's why the ResponseText is throwing an error.
          The Response header says the page is UTF8 encoded, but it ISN'T.
          The server just assumes that as the default encoding for web pages in the absence of a meta Content-Type on the page saying something different.
          The page is probably Windows-1252 or ISO 8859-1 encoded (based on the &HA0 encoding of the nbsp).
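
          Since the header cannot be trusted on its own, one hedged option is to check the bytes before decoding. The sketch below assumes a hypothetical helper, LooksLikeUtf8, and a WSTRING variable wbuffer (neither is from the original code). It is a rough structural check, not a full validator. A lone &HA0 byte fails it, because in UTF-8 &HA0 can only appear as a continuation byte.

          ```powerbasic
          ' Hypothetical helper: returns -1 if every lead byte in sBytes is followed
          ' by the right number of continuation bytes (&H80..&HBF), 0 otherwise.
          ' Heuristic only: it does not reject every ill-formed sequence.
          FUNCTION LooksLikeUtf8 (BYVAL sBytes AS STRING) AS LONG
           LOCAL i, n, b, nMore AS LONG
           n = LEN(sBytes)
           i = 1
           DO WHILE i <= n
            b = ASC(sBytes, i)
            IF b < &H80 THEN
             nMore = 0                           ' plain ASCII
            ELSEIF b >= &HC2 AND b <= &HDF THEN
             nMore = 1                           ' two-byte sequence
            ELSEIF b >= &HE0 AND b <= &HEF THEN
             nMore = 2                           ' three-byte sequence
            ELSEIF b >= &HF0 AND b <= &HF4 THEN
             nMore = 3                           ' four-byte sequence
            ELSE
             EXIT FUNCTION                       ' stray continuation or invalid lead byte
            END IF
            IF i + nMore > n THEN EXIT FUNCTION  ' sequence runs past the end
            INCR i
            DO WHILE nMore > 0
             b = ASC(sBytes, i)
             IF b < &H80 OR b > &HBF THEN EXIT FUNCTION
             INCR i
             DECR nMore
            LOOP
           LOOP
           FUNCTION = -1
          END FUNCTION
          ```

          A possible use, falling back to ANSI when the check fails (the fallback is itself a guess):

          ```powerbasic
          sContent = pWHttp.GetResponseHeader("Content-Type")
          IF INSTR(UCASE$(sContent), "CHARSET=UTF-8") > 0 AND LooksLikeUtf8(buffer) THEN
           wbuffer = UTF8TOCHR$(buffer)
          ELSE
           wbuffer = buffer   ' ANSI to wide, using the system ANSI code page
          END IF
          ```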

          • Originally posted by Stuart McLachlan

            Nope, that's the whole problem and it's why the ResponseText is throwing an error.
            The Response header says the page is UTF8 encoded, but it ISN'T.
            The server just assumes that as the default encoding for web pages in the absence of a meta Content-Type on the page saying something different.
            The page is probably Windows-1252 or ISO 8859-1 encoded (based on the &HA0 encoding of the nbsp).
            I understood your analysis of Gary's page.
            I was referring to the ResponseBody example, not ResponseText.
            When it comes to malformed HTML pages, we have to do what the browser does: take our best guess.

            • That being said, UTF-8 accounts for about 97.8% of all the websites whose character encoding we know.
              Source: https://w3techs.com/technologies/details/en-utf8

              • Originally posted by Rod Macia

                I was referring to the ResponseBody example, not ResponseText.
                And I was pointing out that the original issue with ResponseText was because it did what you suggested as a solution to not knowing the correct encoding: use the encoding from the response header.
                When it comes to malformed HTML pages, we have to do what the browser does: take our best guess.
                Which fails when you guess a Windows-1252 encoded page is UTF8 encoded.

                • When downloading an exe file what is internally used (content-length?)
                  I know nothing about this, but wondered why encoding doesn't matter when using urldownloadtofile?

                  • Originally posted by Rod Macia
                    That being said, UTF-8 accounts for about 97.8% of all the websites whose character encoding we know.
                    Source: https://w3techs.com/technologies/details/en-utf8
                    Yep, and Gary's pages don't fall into either the 97.8% or the remaining 2.2% of websites whose character encoding we know.
                    They fall into the unreported number of pages whose character encoding we don't know.


                    • Originally posted by Mike Doty
                      When downloading an exe file what is internally used (content-length?)
                      Yes, content-length is in the headers, as is the content-type.

                      Here are the ResponseHeaders when downloading an exe file:
                      [Image: exedownload.jpg]
                      or a zip:
                      [Image: zipdownload.jpg]

                      I know nothing about this, but wondered why encoding doesn't matter when using urldownloadtofile?
                      Because URLDownloadToFile doesn't try to do anything with the raw bytes. It just saves the byte stream to a file.

                      • Howdy, Guys!

                        Thanks for all the advice! I won't ignore it ... but it's time to crash for the night.

                        • Because URLDownloadToFile doesn't try to do anything with the raw bytes. It just saves the byte stream to a file.
                          So if I save GetHTTPsFromWeb to disk and compare there could be a difference from UrlDownloadToFile?


                          [Image: image.png]




                          Here is an example using Gary's file.
                          Code:
                          #COMPILE EXE   'Jose Roca Includes  winhttpget.bas
                          #INCLUDE "win32api.inc"
                          #INCLUDE "wininet.INC"
                          #INCLUDE ONCE "httprequest.inc"
                          'MACRO mFormat(value,length) = RSET$(FORMAT$(value),length USING "0")
                          
                          FUNCTION PBMAIN () AS LONG
                           LOCAL sURL,sBuffer,sError,sUserName,sPassword AS STRING
                           sURL = "http://www.garybeene.com/power/code/gbsnippets0054.htm"
                          
                           sUserName = ""
                           sPassword = ""
                           sBuffer = GetHTTPsfromWEB(sUrl,sError,sUserName,sPassword)
                          
                           LOCAL s AS STRING
                           s = USING$("#, bytes GetHTTPsfromWeb&",LEN(sBuffer),$CR)
                           DeleteURLCacheEntry(sURL + $NUL)
                           URLDownloadToFile NOTHING, sUrl+$NUL, "temp.tmp"+$NUL,0,NOTHING
                           OPEN "temp.tmp" FOR INPUT AS #1
                           s+= USING$("#, bytes Urldownloadtofile",LOF(1)) 'corrected spelling
                           CLOSE #1
                           ? s
                          
                          END FUNCTION
                          
                          FUNCTION GetHTTPsfromWEB _
                                    (sFullURL      AS STRING, _
                                     sError        AS STRING, _
                                     sUserName     AS STRING,_
                                     sPassword     AS STRING) AS STRING
                           RESET sError
                          ' GetHTTPsfromWEB
                          ' Opens an HTTP or HTTPS connection to an HTTP resource
                          ' Usage     GetHTTPsfromWEB (sFullURL)
                          ' Usage     GetHTTPsfromWEB ("https://www.mydomain.com/whatever.html")
                           LOCAL pWHttp     AS IWinHttpRequest
                           LOCAL sbuffer    AS STRING
                           LOCAL vBody      AS VARIANT
                           LOCAL nBytes     AS LONG
                           LOCAL iSucceeded AS INTEGER
                           DIM   vByteArray(0) AS BYTE
                          
                           ' Creates an instance of the HTTP service
                           pWHttp = NEWCOM "WinHttp.WinHttpRequest.5.1"
                          
                           IF ISNOTHING(pWHttp) THEN EXIT FUNCTION
                           TRY
                            ' Opens an HTTP or HTTPS connection to an HTTP resource
                            pWHttp.Open "GET", sFullURL
                            IF LEN(sUserName) AND LEN(sPassword) THEN
                             '? sUserName + $CR + sPassword
                             pWHttp.SetCredentials sUserName,sPassword,%HTTPREQUEST_SETCREDENTIALS_FOR_SERVER
                            END IF
                            ' Sends an HTTP request to the HTTP server
                            pWHttp.SetRequestHeader "Content-Type","application/x-www-form-urlencoded"
                            pWHttp.Send
                            ' Wait for response with a timeout of 5 seconds
                            iSucceeded = pWHttp.WaitForResponse(5) 'always returns -1
                          
                            vBody = pWHttp.ResponseBody 'Stuart
                            sError = pWHttp.StatusText
                            IF sError <> "OK" THEN EXIT TRY
                            ' Convert the variant to an array of bytes
                            vByteArray() = vBody
                            ' Convert the array of bytes to a string
                            nBytes = UBOUND(vByteArray) - LBOUND(vByteArray) + 1
                            IF nBytes THEN sbuffer = PEEK$(VARPTR(vByteArray(0)), nBytes)
                           CATCH
                            sError = OBJRESULT$ 'parameters invalid ...
                           END TRY
                           FUNCTION = sbuffer
                          END FUNCTION
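
                          Comparing lengths only shows that the byte counts match. To compare the contents, a minimal sketch (ReadAllBytes is a hypothetical helper, not part of the code above) could read the downloaded file back in binary mode and compare it with the string returned by GetHTTPsfromWEB:

                          ```powerbasic
                          ' Hypothetical helper: return the whole file as a raw byte string.
                          FUNCTION ReadAllBytes (BYVAL sFile AS STRING) AS STRING
                           LOCAL hFile AS LONG
                           LOCAL sData AS STRING
                           hFile = FREEFILE
                           OPEN sFile FOR BINARY AS #hFile
                           GET$ #hFile, LOF(hFile), sData   ' read every byte, no text translation
                           CLOSE #hFile
                           FUNCTION = sData
                          END FUNCTION

                          ' In PBMAIN, after URLDownloadToFile has written temp.tmp:
                          ' IF ReadAllBytes("temp.tmp") = sBuffer THEN ? "identical" ELSE ? "different"
                          ```

                          Both routes are meant to deliver the response body untouched, so the comparison should only differ if something in between (a cache, a redirect, a changed page) returned different bytes.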

                          • Originally posted by Mike Doty
                            Isn't that all that is needed if just downloading?
                            UrlDownloadToFile doesn't ask for file type or encoding.
                            I was just curious if the same method could be used if you just want the bytes?
                            Yes, but this thread is not about just downloading a file from a server somewhere.

                            Post # 1: "want to send a string to a server app, have the server app look up the string in a text database, then return the string and any associated information"
                            And it subsequently expanded to cover other things that can happen when you send such data to a script such as a PHP file.

                            IOW, it is about sending data to a server somewhere, which the server acts on, and then
                            a. Getting a response (generated text) or
                            b. Getting a file or
                            c. Not receiving anything back after the server has done something locally (other than preferably getting a confirmation that the data was processed)



                            • I got it.
                              I needed a new routine to work with data requiring a user name and password.
                              Getting a string instead of a file was also a big plus for sending requests to, and receiving responses from, CGI program(s).
                              Hopefully Gary may someday need the same. It did uncover the encoding of some of his pages that might cause a problem.

                              • To avoid invalid characters, use wide strings for all processing after the initial input:
                                Code:
                                sContent = pWHttp.GetResponseHeader("Content-Type")
                                IF INSTR (UCASE$(sContent), "CHARSET=UTF-8") > 0 THEN wbuffer = UTF8TOCHR$(buffer)
                                (Note the wbuffer destination variable. It didn't bold; bold inside code used to work?)
                                The IF will be more complicated (maybe SELECT CASE) to allow for other CHARSETs.
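
                                A hedged sketch of what that SELECT CASE might look like. CharsetOf is a hypothetical helper, the list of charsets is only illustrative, and the plain assignment relies on the system ANSI code page, so it only approximates Windows-1252 or ISO 8859-1 on a Western system. As noted earlier in the thread, the header value itself may be wrong.

                                ```powerbasic
                                ' Hypothetical helper: pull the charset token out of a Content-Type value,
                                ' e.g. "text/html; charset=UTF-8" gives "UTF-8". Returns "" if none is present.
                                FUNCTION CharsetOf (BYVAL sContentType AS STRING) AS STRING
                                 LOCAL p AS LONG
                                 LOCAL s AS STRING
                                 p = INSTR(UCASE$(sContentType), "CHARSET=")
                                 IF p = 0 THEN EXIT FUNCTION
                                 s = MID$(sContentType, p + 8)       ' text after "charset="
                                 s = EXTRACT$(s, ";")                ' drop any following parameter
                                 FUNCTION = UCASE$(TRIM$(s, ANY CHR$(32, 34)))   ' strip spaces and quotes
                                END FUNCTION

                                ' Usage, with buffer AS STRING and wbuffer AS WSTRING:
                                ' sContent = pWHttp.GetResponseHeader("Content-Type")
                                ' SELECT CASE CharsetOf(sContent)
                                '  CASE "UTF-8"
                                '   wbuffer = UTF8TOCHR$(buffer)
                                '  CASE ELSE
                                '   wbuffer = buffer   ' assumption: treat anything else as ANSI
                                ' END SELECT
                                ```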

                                Cheers,
                                Dale

                                • Originally posted by Mike Doty
                                  I got it.
                                  I needed a new routine to work with data requiring a user name and password.
                                  I've frequently used POST to send authentication parameters to individual PHP scripts rather than setting authentication on the server.
                                  It makes your security much more finely grained.

                                  Here's an example of using a single auth code:
                                  Code:
                                  <?php
                                  $Auth = $_POST['Auth'];
                                  if ($Auth != 'xxxxxxxxxxxxxxxxxxxxxxxxxx'){
                                  die("Not authorised");
                                  }
                                  $SQL = $_POST['SQL'];
                                  ...
                                  or authenticating individual users.
                                  Code:
                                  ...
                                  if (isset($_POST['ApplID']) and isset($_POST['pword']))
                                  ...
                                      $qsafe1 = $mysqli->real_escape_string($_POST['ApplID']);
                                      $qsafe2 = $mysqli->real_escape_string($_POST['pword']);
                                      if ((strlen($qsafe1) > 0) and (strlen($qsafe2) > 0) )
                                      {
                                          $sql = "select  PW,LName,...
                                  ...

                                  and you can set $_SESSION variables after the initial login PHP script to proceed to other pages.
                                  Code:
                                  if (!isset($_SESSION['LoginOK'])){
                                  Header('Location:admin2.php');
                                  exit;
                                  }

                                  • Originally posted by Dale Yarker
                                    To avoid invalid characters, use wide strings for all processing after the initial input:
                                    Code:
                                    sContent = pWHttp.GetResponseHeader("Content-Type")
                                    IF INSTR (UCASE$(sContent), "CHARSET=UTF-8") > 0 THEN wbuffer = UTF8TOCHR$(buffer)
                                    (Note the wbuffer destination variable. It didn't bold; bold inside code used to work?)
                                    The IF will be more complicated (maybe SELECT CASE) to allow for other CHARSETs.

                                    Cheers,
                                    As has been pointed out a couple of times now, pWHttp.GetResponseHeader("Content-Type... charset=UTF8") won't be correct if the HTML page isn't UTF8 (i.e. it is some form of ANSI text) and it doesn't have a <meta charset="xxxxxxx"> in the page head actually saying what it is (Windows-1251 or whatever). In that case, the UTF-8 is just a default/guess by the server. The first single-byte character above CHR$(127) on the page will break the attempted decoding.


                                    • As has been pointed out a couple of times now
                                      But hasn't "stuck" yet, so again won't hurt.
                                      "Content-Type... charset=UTF8") won't be correct if the HTML page isn't UTF8
                                      Note non wide string as parameter of UTF8TOCHR$. And all HTML statements/tags are ASCII; identical to UTF-8 (and other CHARSETs) for characters below 128.

                                      Cheers,
                                      Dale

                                      • Yes, we know that Gary's page is malformed and the server gives the wrong character set info, and there is nothing we can do about it other than guess, save as is, or spend a lot of time analyzing.

                                        But Dale's point is valid: any UTF8TOCHR$(buffer) conversion from web content should go into a WSTRING to preserve as many of the UTF-8 encoded Unicode characters as possible. That applies to well-formed, properly identified HTML pages.

                                        Unless all you need is to save as is, which should not require any conversion.

                                        • Originally posted by Dale Yarker
                                          But hasn't "stuck" yet, so again won't hurt.
                                          Note non wide string as parameter of UTF8TOCHR$. And all HTML statements/tags are ASCII; identical to UTF-8 (and other CHARSETs) for characters below 128.
                                          But Gary's multiple CHR$(160)s aren't ASCII - hence the issue with trying to decode them correctly in an assumed UTF8 string. They would need to be either the UTF8 two-byte sequence "&HC2A0" or the HTML entity "&nbsp;".


