file access modes


  • file access modes


    I've been out of practice for a year or two and am getting back into PB for DOS.

    I want to set up and store junk to a file so that I can retrieve it from any file position I choose, i.e., #1, 645. I could use a working example of some code so that I could figure out what I'm doing.

    I'm having trouble using a random access file. This is my syntax, which is causing problems:

    junk$ = "whatever"
    bufferwidth$ = "10"
    field #1, width as bufferwidth$
    lset bufferwidth$ = junk$

    open "random.dat" for random as #1
    put$ #1, junk$
    close #1

    Thanks, Ben

  • #2
    Ben --

    No offense, but that code is a mess. I tried to fix it for you but it's not entirely clear what you're trying to do. You need to change a bunch of things, including:

    1) You can't do FIELD #1 until file #1 is open. Move the OPEN statement to the top, followed by FIELD, then do the rest of the stuff.

    2) After a FIELD, you must use LSET or RSET instead of doing an assignment with "=" or the FIELD will become "disconnected" from the file.

    3) WIDTH is a reserved word so you can't use it as a variable name.

    4) You are assigning a string value ("whatever") to a variable called "bufferwidth$" (?) but then you are using PUT$ to put a variable called Junk$ into the file.

    5) If you use FIELD you should use PUT not PUT$.

    If you can describe exactly what you're trying to accomplish we can be of more help. What is the end result that you want?

    Is it something like this...?

    open "random.dat" for random as #1
    field #1, 10 as buffer$
    lset buffer$ = "whatever"
    put #1,1
    close #1

    I'm not sure what you had in mind with the bufferwidth$ = "10" line; I assume that was supposed to be the width of the records in the file. (?)
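    To read the record back, the same layout works in reverse with GET (a sketch along the same lines; not run):

    ```basic
    open "random.dat" for random as #1
    field #1, 10 as buffer$
    get #1, 1        ' fills buffer$ from record 1
    print buffer$    ' the 10-byte field, space-padded by LSET
    close #1
    ```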


    -- Eric

    P.S. Don't let my kidding put you off from asking questions here! You're in the right place if you want to learn or re-learn PB.

    Perfect Sync: Perfect Sync Development Tools
    Email: [email protected]

    [This message has been edited by Eric Pearson (edited June 02, 2000).]
    "Not my circus, not my monkeys."


    • #3
      Thanks. I've been away from PowerBASIC for about 5 years and have forgotten a bunch of things! I wasn't really being specific about the code because I knew that the response I'd get would be helpful enough.




      • #4
        With PB, you need to specify the LEN of a file opened FOR RANDOM, or accept the default of 128. Accepting the default won't make any "real" difference as long as you always open the file with the same LEN parameter, but it will waste disk space if your records are smaller than 128 bytes.

        e.g., OPEN "foo.dat" FOR RANDOM AS #1 LEN = 10

        Michael Mattias
        Tal Systems (retired)
        Port Washington WI USA
        [email protected]


        • #5
          FYI, FIELD can be used with variables to produce a user defined record. For example:

          DEFINT A-Z

          I = 10
          DIM FieldSize(1 TO I), SomeData$(1 TO I), FieldData$(1 TO I)

          ' .... put some field sizes in FieldSize() and add them up to get
          ' DataLen, the total size of the data record

          ' .... put some data into SomeData$()

          FileHandle = FREEFILE
          L = UBOUND(FieldSize)

          OPEN SomeFile$ FOR RANDOM AS FileHandle LEN = DataLen

          ' FIELD each element at its own offset; without the Dummy$ offset,
          ' every FIELD statement would map its variable at byte 1 of the buffer
          Offset = 0
          FOR I = 1 TO L

            FIELD FileHandle, Offset AS Dummy$, FieldSize(I) AS FieldData$(I)
            Offset = Offset + FieldSize(I)

          NEXT I

          Then you can store your data and easily retrieve it without resorting to defining a UDT before hand.
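          Storing and retrieving a given record might then look like this (a sketch building on the code above, assuming each FieldData$(I) was FIELDed at its own offset within the record; RecNo is a hypothetical record number):

          ```basic
          FOR I = 1 TO L
            LSET FieldData$(I) = SomeData$(I)   ' fill the record buffer
          NEXT I
          PUT FileHandle, RecNo                 ' write the whole record

          GET FileHandle, RecNo                 ' read it back later
          FOR I = 1 TO L
            SomeData$(I) = FieldData$(I)        ' copy fields out of the buffer
          NEXT I
          ```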

          Walt Decker


          • #6
            This is the file I am creating. I now have two questions.

            1. Can I store the records in the file according to the different lengths of the variables I am using (i.e., custno$, custname$), or do I have to define the record size in the OPEN statement? Specifying 30 characters for all records (when some are as small as 1 character) is very wasteful.

            2. Now what is the syntax to open the very same file to retrieve all of the records into variables?

            Thanks, Ben

            OPEN custno$ FOR RANDOM AS #1 LEN = 30
            FIELD #1, 30 as filebuffer$
            LSET filebuffer$ = custno$
            PUT #1, 1
            LSET filebuffer$ = custname$
            PUT #1, 2
            LSET filebuffer$ = custemail$
            PUT #1, 3
            LSET filebuffer$ = custphone$
            PUT #1, 4
            LSET filebuffer$ = custshipto$
            PUT #1, 5
            LSET filebuffer$ = custbillto$
            PUT #1, 6
            LSET filebuffer$ = cctype$
            PUT #1, 7
            LSET filebuffer$ = ccexpdate$
            PUT #1, 8
            LSET filebuffer$ = ccnumber$
            PUT #1, 9
            CLOSE #1



            • #7
              Ben, this is the way I would approach it. (I didn't run this to debug it!)
              First create a user-defined TYPE that describes your data. Then fill an array of that type with your data and PUT the whole customer array into your file one record at a time.

              TYPE CustomerTYPE
                IDNo    AS LONG
                cName   AS STRING * 20
                Email   AS STRING * 30
                phone   AS STRING * 13 'ie. (804)123-4567
                shipTo  AS STRING * 50
                billTo  AS STRING * 50
                cType   AS STRING * 5
                ExpDate AS STRING * 10 'ie. 12/05/2001
                cNum    AS LONG
              END TYPE
              noCustomers = 100
              DIM customer(1 TO noCustomers) AS CustomerTYPE
              ' Get customer information
              fhandle = FREEFILE
              OPEN "customers" FOR RANDOM AS fhandle LEN = LEN(customer(1))
              FOR i = 1 TO noCustomers
                 PUT fhandle, i, customer(i)
              NEXT i
              CLOSE fhandle
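              Reading the customers back is symmetrical (a sketch against the same assumed file; like the above, not debugged):

              ```basic
              DIM customer(1 TO noCustomers) AS CustomerTYPE
              fhandle = FREEFILE
              OPEN "customers" FOR RANDOM AS fhandle LEN = LEN(customer(1))
              FOR i = 1 TO noCustomers
                 GET fhandle, i, customer(i)   ' read record i into the array
              NEXT i
              CLOSE fhandle
              PRINT customer(1).cName          ' fields are then available by name
              ```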

              [email protected]
              :) IRC :)


              • #8
                I have "discovered" another way of storing and retrieving records (fields) of random length. This technique is as follows:

                Get the length of a field and store it as a CHR$() byte. Write that to disk, then the data of the record (field) itself. For example:

                open "sometext.dat" for binary as #1
                a$ = "LastName"
                a = len(a$)
                c$ = chr$(a)
                put$ #1,c$ + a$
                close #1

                To retrieve:
                open "sometext.dat" for binary as #1
                get$ #1,1,c$
                c = asc(c$)
                get$ #1,c,a$
                close #1

                This technique has pluses and minuses. On the plus side, you will save a large amount of hard drive real estate since you don't have to "pad" a field to a fixed length.

                On the minus side, to retrieve data, you have to start with the 1st record and proceed down to the record you want to retrieve. This may slow things down a bit. Also, using a one byte field length, you are limited to 255 bytes per record. You can, of course, go to 64K record length if you use a two-byte field length tag.
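                A two-byte length tag can be built with MKI$ and decoded with CVI (a sketch along the same lines; note this signed pairing tops out at 32,767 bytes per field):

                ```basic
                open "sometext.dat" for binary as #1
                a$ = "a field that may run longer than 255 bytes ..."
                put$ #1, mki$(len(a$)) + a$    ' two-byte length tag, then the data
                close #1

                open "sometext.dat" for binary as #1
                get$ #1, 2, c$                 ' read the two-byte tag
                get$ #1, cvi(c$), a$           ' read that many data bytes
                close #1
                ```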

                As noted previously, speed of data access will be greatly increased if you can copy your data to RAM disk and access it from there.

                There are no atheists in a fox hole or the morning of a math test.
                If my flag offends you, I'll help you pack.


                • #9
                  On the minus side, to retrieve data, you have to start with the 1st record and proceed down to the record you want to retrieve. This may slow things down a bit.
                  By using a separate "index" file to store the byte locations of each record in the main data file, you can greatly speed up the chore of locating a particular record. This is something that PowerTREE can help with, but there are other methods that can be used to implement a simple form of indexing.
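                  A minimal version of that idea, for the length-tagged records shown earlier in the thread, is a second file of 4-byte byte offsets (a hypothetical sketch, not PowerTREE):

                  ```basic
                  ' writing: remember where each record starts before writing it
                  open "data.dat" for binary as #1
                  open "data.idx" for binary as #2
                  a$ = "LastName"
                  put$ #2, mkl$(seek(1))         ' byte offset where this record begins
                  put$ #1, chr$(len(a$)) + a$    ' one-byte length tag + data
                  close

                  ' reading record n directly, without walking the whole file:
                  open "data.dat" for binary as #1
                  open "data.idx" for binary as #2
                  seek #2, (n - 1) * 4 + 1
                  get$ #2, 4, o$
                  seek #1, cvl(o$)
                  get$ #1, 1, c$
                  get$ #1, asc(c$), a$
                  close
                  ```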

                  However, there is a separate problem that can make the "dynamic record size" approach much more complicated... How should it be handled if:
                  (1) a record's size changes (especially if it grows), or
                  (2) a record needs to be "inserted", or
                  (3) a record needs to be deleted?
                  The amount of code necessary to handle these types of tasks can make simple data file manipulation code *much* more complicated. For example, you could maintain a list of empty "slots", along with a form of linked list to minimize the need to store the records in any particular sorted order... it starts getting complicated very quickly, and often this additional code slows the final application down significantly.

                  Using a fixed-length record, on the other hand, solves many of these problems and makes the chore of data manipulation very simple, but it adds to the overall storage size of the entire data file.

                  Therefore, the decision often comes down to a choice of performance vs. data size. The design choice is *yours* and you should choose carefully - changing a database design at a later date is usually much harder than implementing it the right way to start with.

                  PowerBASIC Support
                  Email: [email protected]


                  • #10
                    Ben --

                    Addressing the second code sample that you posted (OPEN custno$ FOR RANDOM...)

                    It looks to me like you are confusing records and fields. The file that that code would create would look like this (with < denoting the end of each 30-byte record):

                    123456                       <
                    A Customer, Inc.             <
                    [email protected]        <
                    1 (800) 555-1212             <
                    (and so on)
                    ...and so on. Each individual piece of data would be in its own record. What I suspect you really want is something like this...

                    12345   A Customer, Inc.             [email protected]   1 (800) 555-1212 <
                    12346   Acme Anvils, Inc.            [email protected]       1 (800) BEEPBEEP <
                    ...where each record is a customer, and each field within a record is a piece of data about that customer. To do that...

                    OPEN custno$ FOR RANDOM AS #1 LEN = 256
                    FIELD #1, 8 as custno$, 32 as custname$, 24 as custemail$, 32 as custshipto$ (and so on)
                    LSET custno$ = "123456"
                    LSET custname$ = "A Customer, Inc."
                    LSET custemail$ = "[email protected]"
                    LSET custphone$ = "1 (800) 555-1212"
                    (and so on)
                    'put all of that data into record 1...
                    PUT #1, 1

                    I made up all of the numbers. As long as the FIELDs add up to no more than the LEN, it should work.

                    Then, on the other side, when you GET a record, all of the variables will be automatically "filled" with the correct data. GET a different record number, and all of the variables will be filled with the data for that customer.
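                    Concretely, the read side might look like this (a sketch with the same made-up numbers; custfile$ is a hypothetical file-name variable, used so custno$ stays free to be a field):

                    ```basic
                    OPEN custfile$ FOR RANDOM AS #1 LEN = 256
                    FIELD #1, 8 AS custno$, 32 AS custname$, 24 AS custemail$
                    GET #1, 2            ' fills every fielded variable from record 2
                    PRINT custname$      ' the name stored in that record
                    CLOSE #1
                    ```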

                    A more "modern" approach would be to use a User Defined Type structure instead of using FIELD, as Ian suggested. Each GET would fill the UDT structure, and you would get the individual pieces of data by using variable names like Customer.IDNo, Customer.cName, and so on (to use Ian's example).

                    -- Eric

                    Perfect Sync: Perfect Sync Development Tools
                    Email: [email protected]

                    "Not my circus, not my monkeys."


                    • #11
                      Lance ..

                      Looking at the new AMC-X12 medical record format for all new med
                      record communication in the USA ref governmental interface leaves
                      me wondering about something here. The creature, in the last run
                      at the public comment from last year in the thirty (30) days we
                      were given to comment on the final NPRM, had some 938 dictionary
                      definitions. It had a complete variable length string format to
                      the transmission "standard" that is expected in, for example, the
                      Type 836 request for payment and Type 837 reply from Deity!

                      Worse, it has loops inside of loops in it. That means that for
                      a complete variable roll of 0 (no instances) to XMAX (max instances)
                      you can have iterative instances of more complete variable length
                      chunks of the records. Some groups of cases will have full sets
                      of these; some have none!

                      The reason for the variable-length operation is for cost effective
                      transmission, so said. However, just because you never
                      took an air ambulance ride for your case does not mean that the
                      possibility for having been carried in a chopper isn't always
                      present in each case! For payment from 'd guvmint, suh, every
                      possible thing that can be done and charged for is in every record,
                      all the time, obviously ... 99.99 percent Ivory Soap pure wasted
                      space, Charlie!


                      Never mind that the Food and Drug Administration, per their Chair
                      of the Software Standards Committee back in 1994, told me that they
                      insist they will eventually pass on all forms of compression used
                      in medical records. That's a separate tale all by itself!

                      It "appears" that the total length of any given record for a case
                      will likely be just under 16K in size .. virtually all of it empty.
                      It appears that, even in one's wildest dreams, it will never get
                      larger than 32K, surely not larger than 64K for any given record.

                      Thus, at present size, I've got it crosswalked to a UDT for all the
                      fields which is under 16K in size, fixed-element lengths and all for
                      the current status of ZIPLOG here. Obviously, as has been suggested
                      here already in the thread, one way of keeping track of such data
                      is simply to create the needed UDT, stuff it as needed, then put
                      it to the disk, empty holes and all. That's currently the way
                      I'm doing it, as I suspect many would. However, that's not what
                      a good programmer would do, I suspect, *IF* smart enough.

                      The thread here started out on exactly the right "I wonder how to"
                      do this in the most practical use of disk space. The suggestion
                      has already been made to use a head and tail pointer marker to
                      indicate the length of a random SEEK read to get a chunk. From
                      there we proceed into the file, always reading the pointer to
                      the place where the next chunk is needed.

                      Now .. let's expand our horizon a bit. What if we have both a
                      data compression tool *AND* a mapping table for the UDT which was
                      used to create a *VARIABLE LENGTH COMPRESSED CHUNK* of data? In
                      use, we create the data record, based on the applicable dictionary,
                      using the UDT.

                      Then, using a compression function, built into the program code,
                      we do a *LOSSLESS* compression on this UDT. We then store the
                      compressed *VARIABLE LENGTH STRING* on the disk, using a LONG
                      INTEGER for a pointer. Can we, at that point, use a simple
                      NULL CHARACTER .. perish the thought here, in "C/C++" style in
                      the 'disk record', for a curious safety reason? If we wish to
                      create an index file for this file, as was suggested, we can
                      build a separate file as an index if we wish. However for some
                      purposes, corruption and loss being what it is, we might be
                      able to read the ruins and at least establish what can possibly
                      be salvaged from a file, if we can re-index the index, or,
                      for example, to build a new index from files we might want to
                      splice together even at a later point!
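                      The store-and-fetch step being described might be sketched like this, assuming hypothetical Compress$/Expand$ routines and a record already packed into the dynamic string raw$:

                      ```basic
                      ' store: LONG length tag, then the compressed chunk
                      z$ = Compress$(raw$)             ' hypothetical lossless compressor
                      open "cases.dat" for binary as #1
                      put$ #1, mkl$(len(z$)) + z$
                      close #1

                      ' fetch the first chunk back
                      open "cases.dat" for binary as #1
                      get$ #1, 4, t$
                      get$ #1, cvl(t$), z$
                      raw$ = Expand$(z$)               ' hypothetical decompressor
                      close #1
                      ```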

                      A dragon may have invaded our dungeon!

                      In essence, what we are doing, is creating a standard sequential
                      file, the hard way. That's what it will look like on the disk.
                      The use of the index makes this become an Indexed Sequential Access
                      Method (ISAM) file, as far as I know. That tool was available
                      to the Microsoft PD7 crew ages ago, although I never used it.

                      The only twist to the technique here I ponder is the use of a
                      compression algorithm to keep from wasting disk space, to get
                      a UDT stored in the most compressed fashion possible on the disk,
                      and ...

                      ----------> automatically create the transmission block
                      ----------> in the shortest possible size to move the
                      ----------> data over the IP circuit as well!

                      Moving large groups of these requests and replies around becomes
                      nothing more than simply sending a flat file, already compressed,
                      then decoding it at the other end. We won't get into what happens
                      to modem traffic on encoding and decoding. The FDA has told me
                      that, in the end, every modem and network, since it used compression
                      inherent in the system, will be licensed as well. I was advised
                      to stay out of the modem and network business .. period! Was
                      told to stick to making sure a human being authorized to make up
                      a record was at each site that was going to have a permanent
                      record, and no push would be allowed without that human letting
                      it happen .. Oh well ..

                      Last thought. Phew! All this done, how do we choose, on the fly,
                      in the real world, a way to dynamically alter the format of the
                      needed UDT at runtime in PowerBasic?

                      As far as I can see, UDT's being fixed at compile time, the only
                      way to dynamically alter a long-term application, is to use an
                      on-the-fly switch to select one or more canonical convention
                      or crosswalk tools, every time the rules change from Deity!

                      Maybe there is a way to get dynamically changeable UDT's in some
                      future incantation of PowerBasic! That would be NEAT!

                      When you used the word "much" to illustrate how things get far
                      more complicated in the world of disk files, and you also said
                      that people should spend a *LOT* of time studying how they want
                      to store things, you were spot on, to use your lingo.

                      The number of file format changes in a well designed management
                      template really does not change more than a few times as a general
                      rule, even in big projects, if they are well thought out. Thus,
                      as a last step in the choice of creating a major storage design
                      for data, I suspect we also need to maintain a key, in our record,
                      as to *WHAT KIND OF UDT* was used to work with the data and what
                      kind of compression format was used to smunch the data and needs
                      to be used to unsmunch it!

                      Yes, there are utilities that do this, however, the code internal
                      to any such utility used has to be available down to the source
                      code level to eventually pass the FDA for use in medical records.
                      That is because any compression and de-compression algorithm, per
                      my information from them, must be licensed to prove that it is
                      lossless. Has anyone here worked out a publicly available utility,
                      the source for which is there so as to submit this with one's
                      PowerBasic source, if needed?

                      Inquiring mind wants to know!

                      Mike Luther
                      [email protected]


                      • #12
                        You present a very sticky point. Where disk space is huge compared to the
                        size of the data being saved, a UDT, even with gaping holes in it, is the
                        easiest route to take. Now when you are talking about files that may occupy
                        gigabytes, or larger, wasted space becomes critical. I believe that that is
                        why Lotus (.WK?) files used a "data packet" format. They were binary, with
                        each record containing a code for data type and length, then the data record.
                        This allows for a "sparse array" to be efficiently packaged. The downside is
                        that the entire file had to be read into memory at one "swell foop". So the data
                        size was limited to memory size.
                        One of the ways to store large records in a sequential file is to use a
                        special inter-field delimiter (such as |, or CHR$(0), or CHR$(9) ).
                        Then only the actual data is stored, with the special character separating fields.
                        When the data is read from disk, the record is processed, one field at a time,
                        into an instance of a UDT.
                        Of course, using such a system, you could do a search for a particular field
                        within a record using INSTR to find the "N"th occurrence of the field
                        delimiter. The delimiter idea would prevent the use of anything but string
                        data in the record. Using Lance's suggestion of a separate index (or indices) would
                        allow fast access to a particular record.
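                        Splitting one such delimited record back into fields with INSTR might look like this (a sketch with made-up data):

                        ```basic
                        DIM fld$(1 TO 3)
                        rec$ = "12345|A Customer, Inc.|1 (800) 555-1212"
                        p = 1
                        FOR f = 1 TO 3
                          q = INSTR(p, rec$, "|")
                          IF q = 0 THEN q = LEN(rec$) + 1   ' last field has no trailing delimiter
                          fld$(f) = MID$(rec$, p, q - p)
                          p = q + 1
                        NEXT f
                        ```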

                        Just some thoughts,

                        [email protected]
                        :) IRC :)


                        • #13
                          Inquiring mind wants to know!
                          Let me address a couple of your points.

                          First off, a "dynamic UDT" is inherently oxymoronic. UDTs are programmer conveniences for fixed-size data structures, enabling the programmer to gain the performance benefits of literal offset values and data type conversions without requiring the programmer to 'do the math' to calculate data element sizes and offsets, or to invoke the run-time string engine to handle MID$ and CVL and the like.

                          Second, it is possible to use tables (arrays) to store data types and lengths, and then use pointers to accomplish the same thing. Given that you can identify when a new structure applies to the data stream, you simply adjust the "data description array" and go on your merry way. (However, a UNION is a much easier way to do this if the range of possible data descriptions is finite.)
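                          For the finite case, a UNION over the known layouts might look like this (hypothetical formats, padded to a common size):

                          ```basic
                          TYPE FormatA
                            IDNo AS LONG
                            Body AS STRING * 124
                          END TYPE

                          TYPE FormatB
                            IDNo AS LONG
                            Code AS INTEGER
                            Body AS STRING * 122
                          END TYPE

                          UNION RecordUNION
                            a AS FormatA      ' interpret the same 128 bytes as FormatA ...
                            b AS FormatB      ' ... or as FormatB, depending on the stream
                          END UNION
                          ```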

                          Third, as far as approved, lossless compression for use with Uncle HCFA (for health care tyros, HCFA is the US Government Agency which defines all the data standards for health care transactions): How about what's already approved: ANSI X.12 EDI!

                          Currently, I handle the ANSI EDI programming for the USA's largest Medicare part A TPA, have written (with PB/DLL) an ANSI decoder for the 835 remittance advice document, and am a consultant to a service bureau which is gearing up for the full implementation of the HIPAA (lots of ANSI X12). For a price, I can provide assistance to your firm in this arena as well.

                          Right now, too, XML to replace ANSI as a data format is a much-discussed topic; but XML does not compress (quite the contrary!) and the lack of industry-wide DTDs tells me XML as a day-to-day format won't happen until I am ready to go on Medicare myself, in the year 2026.


                          Michael Mattias
                          Racine WI USA
                          [email protected]


                          • #14
                            Mike M ..

                            Currently, I handle the ANSI EDI programming for the USA's largest Medicare part A TPA, have written (with PB/DLL)
                            an ANSI decoder for the 835 remittance advice document, and am a consultant to a service bureau which is gearing
                            up for the full implementation of the HIPAA (lots of ANSI X12). For a price, I can provide assistance to your firm in this
                            arena as well.
                            I've, I think .. got the 835/837 pair, perhaps crosswalked to my code, with an HCFA1500 segment running now and
                            the UB92 extrapolation of that waiting .. groan. We run a real-time facility management template that also does
                            total on-line near real-time full double-entry accounting, inventory control, scheduling and the case module in
                            simultaneous work. You can pull a full standard income statement and balance sheet on the facility at any minute
                            of the day - or a recap sheet and it will be correct. That's been done through a collection of some 105 major
                            executables and reasonably well thought out common library modules in PB 3.5 for DOS. It now comprises over
                            650,000 lines of PB 3.5 source with about 100,000 yet to go to complete the agreed-upon stage level for this
                            goal for the product. I coded the 835/837 efforts, as well as the HCFA1500 interface in UDT format.

                            My problem is that for the final move the information storage requirements are so huge and the user field is so
                            large that it appears that only DB/2 will be able to handle what we need. I have access to a particularly good
                            Oracle specialist. She took one really close look at what is on the table and gasped. After a few moments of
                            stunned silence, she told me not to even try Oracle, only DB2 would handle where I want to go. The projected
                            first year-end storage and site load is about 10,000 sites and perhaps three terabytes a day worth of I/O with
                            a ramp-up of considerably more than that in a few years hence.

                            More important .. for some darned good reasons .. the only final choice for the operating system platform,
                            appears to be either UNIX (Although LINUX can be used for much of that) or .. OS/2. The M/S Win-xx platform
                            isn't, frankly, available. Thus .. for very real reasons, PB/DLL and that arena isn't even a consideration
                            as to what can be used.

                            It may be that the rumored LINUX version of PB will solve the problem of what to do next, but likely, only the
                            movement of the code to a compiler that can handle both UNIX *AND* OS/2 will be required. As I close these
                            last roughly 100,000 lines we're working on now there is some real soul searching going on, I assure you, Mike.
                            Actually, that phase of the work is really fairly simple. It's the professional management template part of
                            the code we're refining now that is taking the time. Somebody recently observed to me that what I've embarked
                            upon is called an Enterprise Resource Planning (ERP) project; however, I never thought of it as such until that comment
                            was made.

                            Whoever said you can't do things in BASIC was slightly misinformed..

                            There is some real soul searching going on among a few more folks than Mikey here about what to do next. It
                            is time to take the creature out from the development cage and dress it up for action. And, again .. for some
                            *VERY* good reasons, full DB/2, IBM oriented operations .. and, sigh, it seems OS/2 is an absolute must do.

                            What to do? I happen to be in love with PowerBASIC, even though it might not look like that on the surface.
                            Without what Bob Zale has offered us and given to us all .. I and a *TON* of others would be just lost waifs
                            in an endless sea of misery..

                            What to do, oh what to do? Whither goest we?

                            Right now, too, XML to replace ANSI as a data format is a much-discussed topic; but XML does not compress (quite
                            the contrary!) and the lack of industry-wide DTDs tells me XML as a day-to-day format won't happen until I am ready
                            to go on Medicare myself, in the year 2026.
                            I, unfortunately, am only about 5 years away from Medicare, but I have even less chance of seeing it than you
                            do .. I reckon ..

                            Mike Luther
                            [email protected]