Multiple UDT's in BINARY files with PB 3.5?

  • Multiple UDT's in BINARY files with PB 3.5?

    For the first time I need to use record sizes for a PB 3.5 Binary
    file which are larger than the 16384 byte UDT maximum size. For
    actual use for what is in the file, I only need a close size just
    under 4096 bytes of information per record. But to populate what
    is needed in this 4096 bytes will require close to 1000 different
    variables in about 25K more of junk in each record. The 4096 bytes
    in the critical area, plus two more UDT's each less than the 16384
    byte max size for a UDT mean three UDT's are needed for this task.

    To minimize the load on most of the programs which use the file, I
    contemplate addressing only the UDT for the 4096 byte block.
    I always try to choose block boundaries like this, set up so that
    network operations are optimized as much as possible in line with
    the way Novell, IBM's OS/2 caching and so on set up their buffers.
    Faster throughput and less thrashing that way. Defining only that
    4096 byte UDT in those programs also keeps close to 950 variables
    that are not needed there out of the code.

    Now .. If I open the file for BINARY use, I will know exactly where
    the boundaries are for the entire record, as well as for the three
    UDT's. Can't I just do a SEEK to place the pointer for each file
    read for that needed 4096 bytes and just go get it?

    Come write time, from the master utility which uses the bigger UDT's
    as a dedicated program to manipulate the whole file, can't I just do
    a similar SEEK to know position and write that 4096 bytes? The same
    as to read and write for the larger UDT's?

    Alternatively, can I SEEK to the starting point of a given record,
    then simply PUT the 4096 byte UDT and then, in sequence, each of
    the other two larger UDT's? Will PB 3.5 know to keep on just moving
    up in the file, even though there was no pointer hard defined for
    them in the gross write for the whole record?
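
The seek arithmetic being asked about can be tried out in a language-neutral way. This is only a rough sketch in Python, not PB 3.5 code: the 4096 figure is from the post, while the 15712 byte split of the other two blocks, the helper names and the in-memory stand-in file are all assumptions made for illustration. It also counts the first byte of the file as offset 0, which is worth checking against each compiler's manual.

```python
import io

# 4096 comes from the post; the split of the remaining bytes is an assumption.
SIZE_A = 4096
SIZE_B = 15712
SIZE_C = 15712
REC_LEN = SIZE_A + SIZE_B + SIZE_C  # 35520 bytes per full record


def record_offset(rec_no):
    """Byte offset of a 1-based record number, with the file's first byte at 0."""
    return (rec_no - 1) * REC_LEN


def write_record(f, rec_no, a, b, c):
    """Seek once to the record start, then write the three blocks back to back."""
    if (len(a), len(b), len(c)) != (SIZE_A, SIZE_B, SIZE_C):
        raise ValueError("block sizes do not match the record layout")
    f.seek(record_offset(rec_no))
    f.write(a)  # each write leaves the file position just past what it wrote,
    f.write(b)  # so the next block lands directly after the previous one
    f.write(c)


def read_critical(f, rec_no):
    """Seek to the record start and fetch only the leading 4096-byte block."""
    f.seek(record_offset(rec_no))
    return f.read(SIZE_A)


# In-memory stand-in for the data file, holding three records.
f = io.BytesIO()
for n in (1, 2, 3):
    write_record(f, n, bytes([n]) * SIZE_A, b"B" * SIZE_B, b"C" * SIZE_C)

block = read_critical(f, 2)  # the critical block of record 2 only
```

The point of the sketch is that one explicit seek per record is enough: the sequential writes keep moving up the file on their own, and a program that needs only the first block never touches the other two.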

    What will the general effects of this be if I move toward PBCC at
    a later date .. or far more likely, to PB for LINUX when it arrives?

    The technique looks appealing as I think I can pretty easily match
    the UDT in STRUCT fashion for C++ if all else fails. Obviously with
    appropriate thoughts about the internal members being stored in some
    form of compatible mode to the common file that would be needed.
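
A small sketch of what "compatible mode" could mean in practice, again in Python rather than PB or C++. It assumes the UDT members sit back to back with no padding and that integers are stored little-endian; the three members, `LAYOUT` and `pack_member_block` are hypothetical and stand in for the real record definition.

```python
import struct

# Hypothetical three-member record: 16-bit integer, 32-bit integer,
# 10-byte fixed-length string. "<" selects little-endian byte order,
# standard sizes, and no padding between members.
LAYOUT = struct.Struct("<hl10s")


def pack_member_block(count, total, name):
    """Build the on-disk bytes, space-padding the name to its fixed width."""
    return LAYOUT.pack(count, total, name.ljust(10).encode("ascii"))


raw = pack_member_block(7, 100000, "LUTHER")
count, total, name = LAYOUT.unpack(raw)
```

Any language that can be told to lay out the members at the same byte offsets, with the same widths and byte order, should be able to share the file. A C++ struct would typically need its padding switched off for that to hold.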

    Inquiring mind wants to know .. Thanks!

    Mike Luther
    [email protected]

  • #2
    The concept sounds quite fine.. I don't see any problems there at all. Ditto for moving to PB/CC... the same techniques can be used, except that PB/CC supports UDT's up to 16Mb, so I guess you'll be able to reduce your structure count and simplify your I/O code too.

    I can't comment on what it will be like with PB for Linux, naturally.

    PowerBASIC Support
    mailto:[email protected]


    • #3
      Approach seems fine to me; I'd probably write it this way to make it easy to maintain and/or migrate...

      ' Abstraction of type
      'TYPE TypeTooBig
      '  A AS CRITICAL_STUFF   ' the 4096 byte UDT which is critical
      '  B AS OTHER_1          ' the other two UDTs which make up 
      '  C AS OTHER_2          ' the 'true' record
      ' END TYPE
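       hFile = FREEFILE
       OPEN "Thefile" FOR BINARY AS hFile
       RecNo = GetRecordNo()  ' << which record number of "TypeTooBig" do you want?
       CALL FillUdts(hFile, RecNo, A, B, C)  ' get the record
       CALL ModifyData(A, B, C)   ' << whatever you do with it
       CALL SaveUdts(hFile, RecNo, A, B, C)  ' write the record back

       ' Read the three UDTs of record RecNo, one after another
       SUB FillUdts (hFile AS INTEGER, RecNo AS INTEGER, A AS CRITICAL_STUFF, B AS OTHER_1, C AS OTHER_2)
          RecLen = SIZEOF(A) + SIZEOF(B) + SIZEOF(C)
          SEEK hFile, (RecNo - 1) * RecLen   ' start of this record
          GET hFile,,A   ' each GET leaves the file pointer just past what it read
          GET hFile,,B
          GET hFile,,C
       END SUB

       ' Write the three UDTs of record RecNo, one after another
       SUB SaveUdts (hFile AS INTEGER, RecNo AS INTEGER, A AS CRITICAL_STUFF, B AS OTHER_1, C AS OTHER_2)
          RecLen = SIZEOF(A) + SIZEOF(B) + SIZEOF(C)
          SEEK hFile, (RecNo - 1) * RecLen   ' start of this record
          PUT hFile,,A
          PUT hFile,,B
          PUT hFile,,C
       END SUB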

      You could also use pointers to read the whole "too big" amount from disk in one gulp and move the actual data piece by piece

      UNION PointerUnion
       pA  AS CRITICAL_STUFF PTR
       pB  AS OTHER_1 PTR
       pC  AS OTHER_2 PTR
      END TYPE
          DIM Buff AS STRING * (Size of TypeTooBig which you will have to calculate yourself)
          DIM pU AS PointerUnion
          RecLen = SIZEOF(A) + SIZEOF(B) + SIZEOF(C)
          SEEK hFile, (RecNo - 1) * RecLen
          GET hFile,,Buff
          pU.pA = VARPTR(Buff)  ' point at buffer, which is where "A" is..
          A = @pU.pA            ' fill A
          INCR pU.pA            ' advance pointer SIZEOF(A) bytes to start of B
          B = @pU.pB            ' fill B
          INCR pU.pB            ' advance pointer SIZEOF(B) bytes to start of C
          C = @pU.pC            ' fill C
      END SUB
      No reason you could not adapt this for RANDOM access if that seems more natural to you..
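
For anyone who wants to see the "one gulp, then carve it up" idea without the pointer syntax, here is a rough Python equivalent. It is only a sketch: the 4096 byte size is from the thread, the 15712 byte sizes for the other two blocks are assumed, and `split_record` is a made-up name.

```python
SIZE_A, SIZE_B, SIZE_C = 4096, 15712, 15712  # 4096 from the thread; B/C split assumed


def split_record(buff):
    """Carve one whole-record buffer into its three blocks by byte offset."""
    if len(buff) != SIZE_A + SIZE_B + SIZE_C:
        raise ValueError("buffer is not exactly one whole record")
    view = memoryview(buff)   # a window on the buffer; nothing is copied yet
    start_b = SIZE_A          # B begins right after A
    start_c = SIZE_A + SIZE_B # C begins right after B
    a = bytes(view[:start_b])
    b = bytes(view[start_b:start_c])
    c = bytes(view[start_c:])
    return a, b, c


record = b"a" * SIZE_A + b"b" * SIZE_B + b"c" * SIZE_C
a, b, c = split_record(record)
```

The offsets play the role of the advancing pointer: each block starts where the previous one ended.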

      Michael Mattias
      Tal Systems Inc. (retired)
      Racine WI USA
      [email protected]


      • #4
        You might want to use VARPTR32() instead of just VARPTR() in the latter code above...

        PowerBASIC Support
        mailto:[email protected]


        • #5
          VARPTR32.. yeah, that's the ticket...

          I guess I don't do enough MS-DOS any more...

          Michael Mattias
          Tal Systems Inc. (retired)
          Racine WI USA
          [email protected]


          • #6
            Oh! Thank you both Lance and Mike for the advice!

            I coded Mike's suggestions up to see what it all looked like at
            compile time and noticed a couple of things!

            They have to do with the second technique which illustrates the use
            of pointers. I'm wondering if the speed gain in the use of pointers
            in this scenario would be on the same order as that which uses them
            for MID$ replacement that was suggested in the by-gone ARC4 encryption
            work that was posted as a thread involving Scott Turchin's work.

            The VARPTR32 deal was interesting. I'm having trouble with the suggested
            improvement with Scott's posting on that in that the pointer method
            there doesn't work here, but that's another thread and issue.

            The PB compiler does not allow, it seems, the use of UNION with END TYPE
            as I tried this. Instead, mine will compile with,

            UNION PointerUnion
             pA  AS Pw ' CRITICAL_STUFF PTR
             pB  AS Pwa ' OTHER_1 PTR
             pC  AS Pwb ' OTHER_2 PTR
            END UNION
            that with the obvious substitution of the real information.

            On beyond that I hit another curious issue! My needed data gambit
            is too big for this pair of britches! If I try the needed,

            DIM Buff AS STRING * 35520 ' (Size of TypeTooBig)
            I get the error:

            "Error 428: Positive integer constant expected"

            Experimentation produces the maximum size of a number which can be
            used as a string length to be the familiar 32767 we all know and love
            as the top positive integer for a two byte short signed integer.

            It would appear that the largest block of string data that could be
            used with this suggestion would thus be the value of 32767, in that
            this is the longest string that PB 3.5 can handle?

            That noted, would you think that push come to shovel, trying to go
            to the issue of pointer shuffle of the struct data would be worth
            while here?

            I say that for another curious reason.

            All this is bound up in a move to begin working with massive amounts
            of fingerprint and facial image or IR thermogram match data work!
            It is related to the manipulation of approximately 25 different
            anticipated certification keys and trusted certificate data for them.
            Actually, the pure key matrix from all the sites is still far
            smaller than the 4096 Byte 'preamble' in that file. As a partial
            quote from the answer to my request for key length string sizes for
            this work in the other thread, I'll post just the comment here.

            * Face Image - Fingerprint
            OSS: Windows and others - Full TCP/IP and API developers interface.

            How does Facial Recognition work

            The technology uses a highly sophisticated algorithm called LFA
            (Local Feature Analysis) to identify and derive a representation in
            terms of the spatial relationships between irreducible local features,
            or nodal points on the face. The algorithm allows for automatic
            detection of certain landmarks on the human face (such as eyes, nose,
            brow etc.) and defines identity based on the spatial relationship
            between each of these landmarks.

            It analyses predominantly the area from a person's temples to
            the top of the lip. This is commonly known as The Golden Triangle.
            Once it has performed the mathematical calculations, the facial
            recognition engine converts your face into a template. This template
            can be as small as 88 Bytes, and can be written to a smart card or
            transponder. Due to the small template size, faces can be searched
            and compared at the speed of up to 1 Million faces per second!

            It seems obvious to me that whatever technique they are using to munch
            all the key searches would almost have to be a pointer to memory technique.
            In coding work to interface with things like this, maybe the extra work
            to go to the pointer deal would be worth it.

            That said, how do we get around the 32767 limit?

            Mike Luther
            [email protected]


            • #7
              Ok, so I did "END TYPE" instead of END UNION...

              That said, I'm not sure how to get around that 32767 limitation. (I forgot integer-size limitations are endemic to MS-DOS programs).

              You may need to use "three reads" approach if you can't get more than 32767 bytes in one shot from a GET.

              This might cut it to two (2) reads..

              REDIM Buffer(1) AS STRING * (half of UDT size)  ' 2 -element array
              GET hFile,,Buffer(0)  ' get first half of "big udt"
              GET hFile,,Buffer(1)  ' get second half and store it adjacent to first half

              u.Pa = VARPTR(Buffer(0))  ' (VARPTR32 in PB 3.5, per #4)
              A = @u.Pa
              INCR u.Pa
              No promises, especially when I do this stuff 'on the fly'

              (I sure do not remember that GET was limited to 32K, but I'll take your word for it.)
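
The "read it in more than one piece" workaround is easy to check on paper. The sketch below is Python, not PB, and assumes only the two numbers reported in this thread: the 32767 byte ceiling on a single string and the 35520 byte record. The names `chunk_sizes` and `read_record` and the in-memory test file are invented for the example.

```python
import io

MAX_CHUNK = 32767  # largest single string the thread reports for PB 3.5
REC_LEN = 35520    # whole-record size quoted in the thread


def chunk_sizes(total, max_chunk=MAX_CHUNK):
    """Split `total` bytes into the fewest pieces of at most `max_chunk` bytes."""
    sizes = []
    while total > 0:
        piece = min(total, max_chunk)
        sizes.append(piece)
        total -= piece
    return sizes


def read_record(f, offset, rec_len=REC_LEN):
    """Read one record as consecutive chunks and join them in file order."""
    f.seek(offset)
    parts = []
    for size in chunk_sizes(rec_len):
        part = f.read(size)
        if len(part) != size:
            raise EOFError("file ended inside a record")
        parts.append(part)
    return b"".join(parts)


# Stand-in file holding two records with recognisable contents.
f = io.BytesIO(b"\x01" * REC_LEN + b"\x02" * REC_LEN)
second = read_record(f, REC_LEN)  # the record that starts at byte 35520
```

Two equal halves of 17760 bytes, as in the two-element array idea, also stay under the ceiling; `chunk_sizes(REC_LEN, 17760)` gives that split.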


              Michael Mattias
              Tal Systems Inc. (retired)
              Racine WI USA
              [email protected]


              • #8
                Thanks again Mike!

                I hope you didn't think I was being critical, I wasn't intending
                that at all.

                I really appreciate your and Lance's time and effort here, as well
                as anything else that comes out of this.

                I'm not sure that GET is limited as such, as in your remark:

                (I sure do not remember that GET was limited to 32K, but I'll take your word for it.)

                Only, it would seem so by default! GET, as a RANDOM file issue,
                is going after a record by number, defined one way or another I'd suppose!
                I've never tried to define a record whose length was even close to
                that 32767 bytes before now. I have no idea what would happen if you
                tried to OPEN for RANDOM with a length beyond 32767 at all! In that
                way, and by using multiple strings in the FIELD statement, you might be
                able to exceed that size for all I know. But, so it seems, you can't do it
                with a pure single string definition.

                Looking at the bigger problem of pattern matching coyly here. Even 32767
                bytes of 'grab me' at a time is only some 370+ patterns of 88 bytes to look
                at from a brute force viewpoint. Thus, one wonders if the finished product
                of the Snoopie's (sic!) art, is really a binary list that delivers a
                million 88 byte pattern match filtering in a minute?
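
The "370+ patterns" figure can be checked with a few lines. This Python sketch assumes only the 88 byte template size and the 32767 byte buffer ceiling mentioned in the thread; `find_template` is a hypothetical brute-force scan for an exact byte match, which is not a claim about how a real recognition engine compares faces.

```python
TEMPLATE_SIZE = 88    # smallest template size quoted in the thread
BUFFER_LIMIT = 32767  # largest single buffer discussed in the thread

# Whole templates that fit in one buffer; the leftover bytes are unused.
templates_per_buffer = BUFFER_LIMIT // TEMPLATE_SIZE

# Buffers needed to walk through a million templates (ceiling division).
buffers_for_a_million = -(-1_000_000 // templates_per_buffer)


def find_template(buff, probe):
    """Return the slot number of the first template equal to `probe`, or -1."""
    if len(probe) != TEMPLATE_SIZE:
        raise ValueError("probe must be exactly one template long")
    usable = len(buff) - len(buff) % TEMPLATE_SIZE  # ignore a partial tail
    for start in range(0, usable, TEMPLATE_SIZE):
        if buff[start:start + TEMPLATE_SIZE] == probe:
            return start // TEMPLATE_SIZE
    return -1


# Fill one buffer with distinct dummy templates and look one up.
buff = b"".join(i.to_bytes(2, "big") * 44 for i in range(templates_per_buffer))
slot = find_template(buff, (300).to_bytes(2, "big") * 44)
```

A straight scan like this touches every slot in turn, so any million-per-second claim would presumably depend on the whole set sitting in memory rather than being pulled from disk a buffer at a time.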

                    \   /
                     @ @

                Mike Luther
                [email protected]