Multiple UDT's in BINARY files with PB 3.5?


  • Mike Luther
    replied
    Thanks again Mike!

    I hope you didn't think I was being critical; I wasn't intending
    that at all.

    I really appreciate your and Lance's time and effort here, as well
    as anything else that comes out of this.

    On this:

        (I sure do not remember that GET was limited to 32K, but I'll take your word for it.)

    I'm not sure that GET is limited as such? Only, it would seem so by
    default! GET, as a RANDOM file issue, is going after a record by
    number, defined one way or another, I'd suppose. I've never tried to
    define a record whose length was ever even close to that 32767 bytes
    before now. I have no idea what would happen if you even tried to
    OPEN for RANDOM with a length beyond 32767 at all! In that way, and
    by using multiple strings in the FIELD statement, you might be able
    to exceed that size for all I know. But, so it seems, you can't do it
    with a pure single fixed-length string definition.

    Looking coyly at the bigger problem of pattern matching here: even
    32767 bytes of 'grab me' at a time is only some 370+ patterns of 88
    bytes to look at from a brute-force viewpoint. Thus, one wonders
    whether the finished product of the Snoopie's (sic!) art is really a
    binary list that can filter a million 88-byte pattern matches in a
    minute?

    Code:
        \   /
         (!)
         @ @


    ------------------
    Mike Luther
    [email protected]



  • Michael Mattias
    replied
    Ok, so I did "END TYPE" instead of END UNION...

    That said, I'm not sure how to get around that 32767 limitation. (I forgot integer-size limitations are endemic to MS-DOS programs).

    You may need to use the "three reads" approach if you can't get more
    than 32767 bytes in one shot from a GET.

    This might cut it to two (2) reads...

    Code:
    REDIM Buffer(1) AS STRING * (half of UDT size)  ' two-element array of adjacent fixed-length strings
    ...
    GET hFile,,Buffer(0)  ' get first half of "big udt"
    GET hFile,,Buffer(1)  ' get second half and store it adjacent to first half
    ...
    ..Then..

    Code:
    u.pA = VARPTR32(Buffer(0))  ' VARPTR32 per Lance's note below
     A = @u.pA                  ' copy the critical UDT out of the buffer
     INCR u.pA                  ' advance by the size of the target to the next piece
     etc...
    No promises, especially when I do this stuff 'on the fly'

    (I sure do not remember that GET was limited to 32K, but I'll take your word for it.)
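
    If it helps, here is roughly how I picture the two halves and the
    pointer union fitting together. The 17760-byte half size is just an
    example I made up for a 35520-byte record, and none of this has been
    near a compiler, so treat it as a sketch only:

    Code:
    DIM Buffer(1) AS STRING * 17760     ' two adjacent halves of the 35520-byte record
    DIM u AS PointerUnion               ' the UNION of typed pointers from before
    DIM A AS CRITICAL_STUFF
    DIM hFile AS INTEGER, RecNo AS INTEGER

    hFile = FREEFILE
    OPEN "Thefile" FOR BINARY AS hFile
    RecNo = GetRecordNo()                 ' << which big record you want
    SEEK hFile, (RecNo - 1) * 35520& + 1  ' BINARY byte positions start at 1
    GET hFile,,Buffer(0)                  ' first half of the big record
    GET hFile,,Buffer(1)                  ' second half lands right behind it
    u.pA = VARPTR32(Buffer(0))            ' VARPTR32 per Lance's note below
    A = @u.pA                             ' copy the critical 4096-byte UDT out
    From there, INCR on u.pA (and the other union members) walks you
    through the rest of the record, the same as in the one-gulp version
    below.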

    MCM



  • Mike Luther
    replied
    Oh! Thank you both Lance and Mike for the advice!

    I coded Mike's suggestions up to see what it all looked like at
    compile time and noticed a couple of things!

    They have to do with the second technique, which illustrates the use
    of pointers. I'm wondering if the speed gain from using pointers in
    this scenario would be on the same order as the gain from using them
    for MID$ replacement, as was suggested in the by-gone ARC4 encryption
    work posted in the thread involving Scott Turchin's work.

    The VARPTR32 deal was interesting. I'm having trouble with the
    improvement suggested in Scott's posting on that, in that the pointer
    method there doesn't work here; but that's another thread and issue.

    The PB compiler, it seems, does not allow a UNION to be closed with
    END TYPE as I first tried it. Instead, mine will compile with,

    Code:
    UNION PointerUnion
     pA  AS Pw ' CRITICAL_STUFF PTR
     pB  AS Pwa ' OTHER_1 PTR
     pC  AS Pwb ' OTHER_2 PTR
    END UNION
    (with the obvious substitution of the real type information, of course).

    On beyond that I hit another curious issue! My needed data gambit
    is too big for this pair of britches! If I try the needed,

    Code:
    DIM Buff AS STRING * 35520 ' (Size of TypeTooBig)
    I get the error:

    "Error 428: Positive integer constant expected"

    Experimentation shows the maximum number that can be used as a
    fixed-length string size to be the familiar 32767 we all know and
    love as the top positive value of a two-byte signed integer.
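
    In other words, this is what the experimenting boiled down to:

    Code:
    DIM Buff  AS STRING * 32767   ' compiles: the largest fixed-length string allowed
    'DIM Buff2 AS STRING * 35520  ' Error 428: Positive integer constant expected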

    It would appear that the largest block of string data that could be
    used with this suggestion would thus be the value of 32767, in that
    this is the longest string that PB 3.5 can handle?

    That noted, would you think that, push come to shove, trying to go
    the route of a pointer shuffle of the struct data would be worthwhile
    here?

    I say that for another curious reason.

    All this is bound up in a move to begin working with massive amounts
    of fingerprint and facial-image or IR-thermogram match data! It is
    related to the manipulation of approximately 25 different anticipated
    certification keys and the trusted certificate data for them.
    Actually, the pure key matrix from all the sites is still far smaller
    than the 4096-byte 'preamble' in that file. In answer to my request
    in the other thread about key-length string sizes for this work, I
    got a partial quote that I'll post here:

    BioCom
    - http://www.biocom.tv/
    * Face Image - Fingerprint
    OSS: Windows and others - Full TCP/IP and API developers interface.

    How does Facial Recognition work

    The technology uses a highly sophisticated algorithm called LFA
    (Local Feature Analysis) to identify and derive a representation in
    terms of the spatial relationships between irreducible local features,
    or nodal points on the face. The algorithm allows for automatic
    detection of certain landmarks on the human face (such as eyes, nose,
    brow etc.) and defines identity based on the spatial relationship
    between each of these landmarks.

    It analyses predominantly in the area from a person's temples to
    the top of the lip. This is commonly known as 'The Golden Triangle'.
    Once it has performed the mathematical calculations, the facial
    recognition engine converts your face into a template. This template
    can be as small as 88 Bytes, and can be written to a smart card or
    transponder. Due to the small template size, faces can be searched
    and compared at the speed of up to 1 Million faces per second!

    It seems obvious to me that whatever technique they are using to
    munch through all those key searches would almost have to be a
    pointer-to-memory technique. In coding work to interface with things
    like this, maybe the extra work to go the pointer route would be
    worth it.
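
    Just so I'm sure I follow what such a pointer scan might look like in
    PB terms, here is a rough sketch. The TEMPLATE type, Probe, NumFaces
    and Buff names are all placeholders I made up (Probe, NumFaces and
    Buff assumed already set up), and none of it has been near the
    compiler:

    Code:
    TYPE TEMPLATE
        Feature AS STRING * 88       ' one 88-byte face template
    END TYPE

    DIM pT AS TEMPLATE PTR
    DIM T AS TEMPLATE
    DIM i AS LONG

    pT = VARPTR32(Buff)              ' Buff holds NumFaces templates packed end to end
    FOR i = 1 TO NumFaces
        T = @pT                      ' copy the current 88-byte template out
        IF T.Feature = Probe.Feature THEN EXIT FOR   ' brute-force compare
        INCR pT                      ' advance one TEMPLATE (88 bytes)
    NEXT i
    That ought to at least show the shape of the loop, even if the real
    comparison is far fancier than a simple equality test.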

    That said, how do we get around the 32767 limit?


    ------------------
    Mike Luther
    [email protected]



  • Michael Mattias
    replied
    VARPTR32.. yeah, that's the ticket...

    I guess I don't do enough MS-DOS any more...

    MCM



  • Lance Edmonds
    replied
    You might want to use VARPTR32() instead of just VARPTR() in the latter code above...

    ------------------
    Lance
    PowerBASIC Support
    mailto:[email protected]



  • Michael Mattias
    replied
    Approach seems fine to me; I'd probably write it this way to make it easy to maintain and/or migrate...

    Code:
    ' Abstraction of type
    'TYPE TypeTooBig
    '  A AS CRITICAL_STUFF   ' the 4096 byte UDT which is critical
    '  B AS OTHER_1          ' the other two UDTs which make up 
    '  C AS OTHER_2          ' the 'true' record
    ' END TYPE
    
     DIM A AS CRITICAL_STUFF, B AS OTHER_1, C AS OTHER_2
     DIM RecNo AS INTEGER, hFile AS INTEGER
    
     hFile = FREEFILE
     OPEN "Thefile" FOR BINARY AS hFile
     Recno = GetRecordNo()  ' << which record number of "typeTooBig" do you want?
    
     CALL FillUdts(hFile, RecNo, A, B, C)  ' get the record
     CALL ModifyData (A,B,C)   ' << whatever you do with it
     CALL SaveUdts(hFile, RecNo, A, B, C)
    
     SUB FillUDTs (hFile AS INTEGER, RecNo AS INTEGER, A AS CRITICAL_STUFF, B AS OTHER_1, C AS OTHER_2)
        DIM RecLen AS LONG
        RecLen = SIZEOF(A) + SIZEOF(B) + SIZEOF(C)
        SEEK hFile, (RecNo - 1) * RecLen + 1   ' BINARY byte positions start at 1
        GET hFile,,A
        GET hFile,,B
        GET hFile,,C
     END SUB
    
     SUB SaveUDTs (hFile AS INTEGER, RecNo AS INTEGER, A AS CRITICAL_STUFF, B AS OTHER_1, C AS OTHER_2)
        DIM RecLen AS LONG
        RecLen = SIZEOF(A) + SIZEOF(B) + SIZEOF(C)
        SEEK hFile, (RecNo - 1) * RecLen + 1   ' BINARY byte positions start at 1
        PUT hFile,,A
        PUT hFile,,B
        PUT hFile,,C
     END SUB

    You could also use pointers to read the whole "too big" amount from disk in one gulp and move the actual data piece by piece

    Code:
    UNION PointerUnion
     pA  AS CRITICAL_STUFF PTR
     pB  AS OTHER_1 PTR
     pC  AS OTHER_2 PTR
    END TYPE
    
    SUB FillUDTS (hFile AS INTEGER, RecNo AS Integer, A AS CRITICAL_STUFF, B AS OTHER_1, C AS OTHER_2) 
    
        DIM Buff AS STRING * (Size of TypeTooBig which you will have to calculate yourself)
        DIM pU AS PointerUnion
        DIM RecLen AS LONG
    
        RecLen = SIZEOF(A) + SIZEOF(B) + SIZEOF(C)
        SEEK hFile, (RecNo - 1) * RecLen + 1   ' BINARY byte positions start at 1
        GET hFile,,Buff
        pU.pA = VARPTR(Buff)  ' point at buffer, which is where "A" is..
        A = @pU.pA            ' fill A
        INCR pU.pA            ' advance pointer SIZEOF(A) bytes to start of B
        B = @pU.pB            ' fill B
        INCR pU.pB            ' advance pointer SIZEOF(B) bytes to start of C
        C = @pU.pC            ' fill C
    END SUB
    No reason you could not adapt this for RANDOM access if that seems more natural to you..
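
    For the write direction, the mirror image would be something like
    this, again assembled on the fly and untested, with the same
    buffer-size placeholder, so no promises:

    Code:
    SUB SaveUDTs (hFile AS INTEGER, RecNo AS INTEGER, A AS CRITICAL_STUFF, B AS OTHER_1, C AS OTHER_2)
    
        DIM Buff AS STRING * (Size of TypeTooBig which you will have to calculate yourself)
        DIM pU AS PointerUnion
        DIM RecLen AS LONG
    
        RecLen = SIZEOF(A) + SIZEOF(B) + SIZEOF(C)
        pU.pA = VARPTR(Buff)   ' point at the start of the buffer
        @pU.pA = A             ' copy A into the buffer
        INCR pU.pA             ' advance SIZEOF(A) bytes to where B goes
        @pU.pB = B             ' copy B in behind it
        INCR pU.pB             ' advance SIZEOF(B) bytes to where C goes
        @pU.pC = C             ' and C last
        SEEK hFile, (RecNo - 1) * RecLen + 1
        PUT hFile,,Buff        ' write the whole record back in one gulp
    END SUB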


    MCM



  • Lance Edmonds
    replied
    The concept sounds quite fine... I don't see any problems there at all. Ditto for moving to PB/CC... the same techniques can be used, except that PB/CC supports UDT's up to 16Mb, so I guess you'll be able to reduce your structure count and simplify your I/O code too.

    I can't comment on what it will be like with PB for Linux, naturally.

    ------------------
    Lance
    PowerBASIC Support
    mailto:[email protected]



  • Mike Luther
    started a topic Multiple UDT's in BINARY files with PB 3.5?

    Multiple UDT's in BINARY files with PB 3.5?

    For the first time I need to use record sizes for a PB 3.5 BINARY
    file which are larger than the 16384-byte UDT maximum size. For
    actual use of what is in the file, I only need just under 4096 bytes
    of information per record. But to populate what is needed in those
    4096 bytes will require close to 1000 different variables in about
    25K more of junk in each record. The 4096 bytes in the critical area,
    plus two more UDT's each less than the 16384-byte maximum size for a
    UDT, mean three UDT's are needed for this task.

    To minimize the load on most of the programs which use the file, I
    contemplate addressing only the UDT for the 4096-byte block. I always
    try to choose block boundaries like this, set up so that network
    operations are optimized as much as possible around the way Novell,
    IBM's OS/2 caching, and so on set up their buffers. Faster throughput
    and less thrashing that way. Defining only that 4096-byte UDT for use
    keeps close to 950 unneeded variables out of the code there.

    Now... If I open the file for BINARY use, I will know exactly where
    the boundaries are for the entire record, as well as for the three
    UDT's. Can't I just do a SEEK to place the file pointer for each read
    of that needed 4096 bytes and just go get it?

    Come write time, from the master utility which uses the bigger UDT's
    as a dedicated program to manipulate the whole file, can't I just do
    a similar SEEK to a known position and write that 4096 bytes? The
    same as to read and write for the larger UDT's?

    Alternatively, can I SEEK to the starting point of a given record,
    then simply PUT the 4096 byte UDT and then, in sequence, each of
    the other two larger UDT's? Will PB 3.5 know to keep on just moving
    up in the file, even though there was no pointer hard defined for
    them in the gross write for the whole record?
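
    In other words, is the following the right general shape? The type,
    file and variable names here are just stand-ins I made up for this
    question, and I have not compiled any of it:

    Code:
    DIM A AS CORE_UDT                    ' the 4096-byte block most programs need
    DIM B AS BULK_UDT1, C AS BULK_UDT2   ' the two bigger UDT's (master utility only)
    DIM hFile AS INTEGER, RecNo AS INTEGER
    DIM RecLen AS LONG
    
    RecLen = SIZEOF(A) + SIZEOF(B) + SIZEOF(C)
    RecNo = 1                            ' or whichever record is wanted
    hFile = FREEFILE
    OPEN "BIGFILE.DAT" FOR BINARY AS hFile
    
    ' Read side: jump to the record and grab only the 4096-byte block.
    SEEK hFile, (RecNo - 1) * RecLen + 1
    GET hFile,,A
    
    ' Write side (master utility): SEEK once, then PUT the three UDT's in
    ' sequence and let the file position walk forward on its own.
    SEEK hFile, (RecNo - 1) * RecLen + 1
    PUT hFile,,A
    PUT hFile,,B
    PUT hFile,,C
    
    CLOSE hFile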

    What will the general effects of this be if I move toward PBCC at
    a later date .. or far more likely, to PB for LINUX when it arrives?

    The technique looks appealing, as I think I can pretty easily match
    the UDT with a struct in C++ if all else fails, obviously with
    appropriate thought given to storing the internal members in a form
    compatible with the common file.

    Inquiring mind wants to know .. Thanks!


    ------------------
    Mike Luther
    [email protected]