file access modes


  • Mike Luther
    replied
    Mike M ..

    > Currently, I handle the ANSI EDI programming for the USA's largest
    > Medicare part A TPA, have written (with PB/DLL) an ANSI decoder for
    > the 835 remittance advice document, and am a consultant to a service
    > bureau which is gearing up for the full implementation of HIPAA (lots
    > of ANSI X12). For a price, I can provide assistance to your firm in
    > this arena as well.
    I've, I think .. got the 835/837 pair, perhaps crosswalked to my code, with an HCFA1500 segment running now and
    the UB92 extrapolation of that waiting .. groan. We run a real-time facility management template that also does
    total on-line near real-time full double-entry accounting, inventory control, scheduling and the case module in
    simultaneous work. You can pull a full standard income statement and balance sheet on the facility at any minute
    of the day - or a recap sheet and it will be correct. That's been done through a collection of some 105 major
    executables and reasonably well thought out common library modules in PB 3.5 for DOS. It now comprises over
    650,000 lines of PB 3.5 source with about 100,000 yet to go to complete the agreed-upon stage level
    for the product. I coded the 835/837 efforts, as well as the HCFA1500 interface, in UDT format.

    My problem is that for the final move the information storage requirements are so huge and the user field is so
    large that it appears that only DB/2 will be able to handle what we need. I have access to a particularly good
    Oracle specialist. She took one really close look at what is on the table and gasped. After a few moments of
    stunned silence, she told me not to even try Oracle, only DB2 would handle where I want to go. The projected
    first year-end storage and site load is about 10,000 sites and perhaps three terabytes a day worth of I/O with
    a ramp-up of considerably more than that in a few years hence.

    More important .. for some darned good reasons .. the only final choice for the operating system platform,
    appears to be either UNIX (Although LINUX can be used for much of that) or .. OS/2. The M/S Win-xx platform
    isn't, frankly, available. Thus .. for very real reasons, PB/DLL and that arena isn't even a consideration
    as to what can be used.

    It may be that the rumored LINUX version of PB will solve the problem of what to do next, but likely, only the
    movement of the code to a compiler that can handle both UNIX *AND* OS/2 will be required. As I close these
    last roughly 100,000 lines we're working on now there is some real soul searching going on, I assure you, Mike.
    Actually, that phase of the work is really fairly simple. It's the professional management template part of
    the code we're refining now that is taking the time. Somebody recently observed to me that what I've embarked
    upon is called an Enterprise Resource Planning (ERP) project, however I never thought of it as such until that comment
    was made.

    Whoever said you can't do things in BASIC was slightly misinformed..

    There is some real soul searching going on among a few more folks than Mikey here about what to do next. It
    is time to take the creature out from the development cage and dress it up for action. And, again .. for some
    *VERY* good reasons, full DB/2, IBM oriented operations .. and, sigh, it seems OS/2 is an absolute must do.

    What to do? I happen to be in love with PowerBASIC, even though it might not look like that on the surface.
    Without what Bob Zale has offered us and given to us all .. I and a *TON* of others would be just lost waifs
    in an endless sea of misery..

    What to do, oh what to do? Whither goest we?

    > Right now, too, XML to replace ANSI as a data format is a
    > much-discussed topic; but XML does not compress (quite the contrary!)
    > and the lack of industry-wide DTDs tells me XML as a day-to-day format
    > won't happen until I am ready to go on Medicare myself, in the year
    > 2026.
    I, unfortunately, am only about 5 years away from Medicare, but I have even less chance of seeing it than you
    do .. I reckon ..



    ------------------
    Mike Luther
    [email protected]



  • Michael Mattias
    replied
    Inquiring mind wants to know!
    Let me address a couple of your points.

    First off, a "dynamic UDT" is inherently oxymoronic. UDTs are programmer conveniences for fixed-size data structures, enabling the programmer to gain the performance benefits of literal offset values and data type conversions without requiring the programmer to 'do the math' to calculate data element sizes and offsets, or to invoke the run-time string engine to handle MID$ and CVL and the like.

    Second, it is possible to use tables (arrays) to store datatypes and lengths, and then use pointers to accomplish the same thing. Given that you can identify when a new structure applies to the data stream, you simply adjust the "data description array" and go on your merry way (see the sketch below). (However, a UNION is a much easier way to do this, if the range of possible data descriptions is finite.)
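
    To make that concrete, here is a minimal, untested sketch of the table-driven idea; every name in it is invented. It uses MID$ and CVL for brevity; with PB/DLL pointers the same table would drive the offset arithmetic without the string engine:

    Code:
    ' Layout table: type code 1 = LONG (4 bytes), 2 = fixed-length string.
    ' If the rules change, only the table changes -- the loop stays put.
    nFlds = 3
    DIM TypeCode(1 TO nFlds), FldLen(1 TO nFlds)
    TypeCode(1) = 1: FldLen(1) = 4     ' customer number (LONG)
    TypeCode(2) = 2: FldLen(2) = 20    ' customer name
    TypeCode(3) = 2: FldLen(3) = 16    ' phone
    
    ' Build a sample record image, just as PUT would see it
    Buffer$ = MKL$(123456&)
    Buffer$ = Buffer$ + LEFT$("A Customer, Inc." + SPACE$(20), 20)
    Buffer$ = Buffer$ + LEFT$("1 (800) 555-1212" + SPACE$(16), 16)
    
    ' Decode by walking the table
    Offset = 1
    FOR i = 1 TO nFlds
        Fld$ = MID$(Buffer$, Offset, FldLen(i))
        IF TypeCode(i) = 1 THEN PRINT CVL(Fld$) ELSE PRINT Fld$
        Offset = Offset + FldLen(i)
    NEXT i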

    Third, as far as approved, lossless compression for use with Uncle HCFA (for health care tyros, HCFA is the US Government Agency which defines all the data standards for health care transactions): How about what's already approved: ANSI X12 EDI!

    Currently, I handle the ANSI EDI programming for the USA's largest Medicare part A TPA, have written (with PB/DLL) an ANSI decoder for the 835 remittance advice document, and am a consultant to a service bureau which is gearing up for the full implementation of HIPAA (lots of ANSI X12). For a price, I can provide assistance to your firm in this arena as well.

    Right now, too, XML to replace ANSI as a data format is a much-discussed topic; but XML does not compress (quite the contrary!) and the lack of industry-wide DTDs tells me XML as a day-to-day format won't happen until I am ready to go on Medicare myself, in the year 2026.

    MCM



    ------------------
    Michael Mattias
    Racine WI USA
    [email protected]



  • Ian Cairns
    replied
    Mike,
    You present a very sticky point. Where disk space is huge compared to the
    size of the data being saved, a UDT, even with gaping holes in it, is the
    easiest route to take. Now when you are talking about files that may occupy
    gigabytes, or larger, wasted space becomes critical. I believe that is
    why Lotus (.WK?) files used a "data packet" format. They were binary, with
    each record containing a code for the data type and length, then the data
    itself.
    This allows for a "sparse array" to be efficiently packaged. The downside is
    that the entire file had to be read into memory at one "swell foop". So the data
    size was limited to memory size.
    One of the ways to store large records in a sequential file is to use a
    special field delimiter (such as |, or CHR$(0), or CHR$(9)). Then only
    the actual data is stored, with the special character separating fields.
    When the data is read from disk, the record is processed, one field at a
    time, into an instance of a UDT.
    Of course, using such a system, you could do a search for a particular
    field within a record using INSTR to find the "N"th occurrence of the
    field delimiter (see the sketch below). The delimiter idea would prevent
    the use of anything but string data in the record. Using Lance's
    suggestion of a separate index (or indices) would allow fast access to a
    particular record.
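
    To make that concrete, here is one possible shape for such a field search (a sketch, not debugged, and the name NthField$ is made up):

    Code:
    ' Return the Nth field of a delimited record; returns "" if the
    ' record has fewer than N fields.
    FUNCTION NthField$(Rec$, Delim$, N)
        StartPos = 1
        FOR i = 2 TO N
            StartPos = INSTR(StartPos, Rec$, Delim$) + 1
            IF StartPos = 1 THEN EXIT FUNCTION   ' delimiter not found
        NEXT i
        EndPos = INSTR(StartPos, Rec$, Delim$)
        IF EndPos = 0 THEN EndPos = LEN(Rec$) + 1
        NthField$ = MID$(Rec$, StartPos, EndPos - StartPos)
    END FUNCTION

    For example, NthField$("Smith" + CHR$(9) + "John" + CHR$(9) + "555-1212", CHR$(9), 2) returns "John".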

    Just some thoughts,


    ------------------
    [email protected]



  • Mike Luther
    replied
    Lance ..

    Looking at the new ASC X12 medical record format for all new med
    record communication in the USA ref governmental interface leaves
    me wondering about something here. The creature, in the last run
    at the public comment from last year, in the thirty (30) days we
    were given to comment on the final NPRM, had some 938 dictionary
    definitions. It had a completely variable-length string format to
    the transmission "standard" that is expected in, for example, the
    Type 837 request for payment and Type 835 reply from Deity!

    Worse, it has loops inside of loops in it. That means that for
    a complete variable roll of 0 (no instances) to XMAX (max instances)
    you can have iterative instances of more complete variable length
    chunks of the records. Some groups of cases will have full sets
    of these; some have none!

    The reason for the variable-length operation is cost-effective
    transmission, so said. However, just because you never took an air
    ambulance ride for your case does not mean that the possibility of
    having been carried in a chopper isn't always
    present in each case! For payment from 'd guvmint, suh, every
    possible thing that can be done and charged for is in every record,
    all the time, obviously ... 99.99 percent Ivory Soap pure wasted
    space, Charlie!

    Sigh..

    Never mind that the Food and Drug Administration, per their Chair
    of the Software Standards Committee back in 1994, told me that they
    insist they will eventually pass on all forms of compression used
    in medical records. That's a separate tale all by itself!

    It "appears" that the total length of any given record for a case
    will likely be just under 16K in size .. virtually all of it empty.
    It appears that, even in one's wildest dreams, it will never get
    larger than 32K, surely not larger than 64K for any given record.

    Thus, at present size, I've got it crosswalked to a UDT for all the
    fields which is under 16K in size, fixed-element lengths and all for
    the current status of ZIPLOG here. Obviously, as has been suggested
    here already in the thread, one way of keeping track of such data
    is simply to create the needed UDT, stuff it as needed, then put
    it to the disk, empty holes and all. That's currently the way
    I'm doing it, as I suspect many would. However, that's not what
    a good programmer would do, I suspect, *IF* smart enough.

    The thread here started out on exactly the right "I wonder how to"
    do this in the most practical use of disk space. The suggestion
    has already been made to use a head and tail pointer marker to
    indicate the length of a random SEEK read to get a chunk. From
    there we proceed into the file, always reading the pointer to
    the place where the next chunk is needed.

    Now .. let's expand our horizon a bit. What if we have both a
    data compression tool *AND* a mapping table for the UDT which was
    used to create a *VARIABLE LENGTH COMPRESSED CHUNK* of data? In
    use, we create the data record, based on the applicable dictionary,
    using the UDT.

    Then, using a compression function, built into the program code,
    we do a *LOSSLESS* compression on this UDT. We then store the
    compressed *VARIABLE LENGTH STRING* on the disk, using a LONG
    INTEGER for a pointer. Can we, at that point, use a simple
    NULL CHARACTER .. perish the thought here, in "C/C++" style in
    the 'disk record', for a curious safety reason? If we wish to
    create an index file for this file, as was suggested, we can
    build a separate file as an index if we wish. However for some
    purposes, corruption and loss being what it is, we might be
    able to read the ruins and at least establish what can possibly
    be salvaged from a file, if we can re-index the index, or,
    for example, to build a new index from files we might want to
    splice together even at a later point!
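
    Here is a rough sketch of that storage scheme. Squish$ and UnSquish$ stand in for whatever lossless compressor finally passes muster -- they, and every other name here, are hypothetical:

    Code:
    ' Append one compressed, length-prefixed chunk and record its offset
    OPEN "claims.dat" FOR BINARY AS #1
    OPEN "claims.idx" FOR BINARY AS #2
    
    Chunk$ = Squish$(RecImage$)           ' compress the filled-in UDT image
    Offset& = LOF(1) + 1                  ' next free byte in the data file
    SEEK #1, Offset&
    PUT$ #1, MKL$(LEN(Chunk$)) + Chunk$   ' LONG length prefix, then the data
    PUT$ #2, MKL$(Offset&)                ' index entry: where this chunk starts
    
    ' Read chunk number n& back later
    SEEK #2, (n& - 1) * 4 + 1
    GET$ #2, 4, ofs$
    SEEK #1, CVL(ofs$)
    GET$ #1, 4, pfx$                      ' the length prefix...
    GET$ #1, CVL(pfx$), Chunk$            ' ...then exactly that many bytes
    RecImage$ = UnSquish$(Chunk$)
    
    CLOSE #1: CLOSE #2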

    A dragon may have invaded our dungeon!

    In essence, what we are doing is creating a standard sequential
    file, the hard way. That's what it will look like on the disk.
    The use of the index makes this an Indexed Sequential Access
    Method (ISAM) file, as far as I know. That tool was available
    to the Microsoft PDS 7 crew ages ago, although I never used it.

    The only twist to the technique I ponder here is the use of a
    compression algorithm to keep from wasting disk space, to get
    a UDT stored in the most compressed fashion possible on the disk,
    and ...

    ----------> automatically create the transmission block
    ----------> in the shortest possible size to move the
    ----------> data over the IP circuit as well!

    Moving large groups of these requests and replies around becomes
    nothing more than simply sending a flat file, already compressed,
    then decoding it at the other end. We won't get into what happens
    to modem traffic on encoding and decoding. The FDA has told me
    that, in the end, every modem and network, since it uses compression
    inherent in the system, will be licensed as well. I was advised
    to stay out of the modem and network business .. period! Was
    told to stick to making sure a human being authorized to make up
    a record was at each site that was going to have a permanent
    record, and no push would be allowed without that human letting
    it happen .. Oh well ..

    Last thought. Phew! All this done, how do we choose, on the fly,
    in the real world, a way to dynamically alter the format of the
    needed UDT at runtime in PowerBasic?

    As far as I can see, UDT's being fixed at compile time, the only
    way to dynamically alter a long-term application, is to use an
    on-the-fly switch to select one or more canonical conventions
    or crosswalk tools, every time the rules change from Deity!

    Maybe there is a way to get dynamically changeable UDTs in some
    future incarnation of PowerBasic! That would be NEAT!



    When you used the word "much" to illustrate how things get far
    more complicated in the world of disk files, and you also said
    that people should spend a *LOT* of time studying how they want
    to store things, you were spot on, to use your lingo.

    The file format in a well-designed management template really does
    not change more than a few times as a general rule, even in big
    projects, if they are well thought out. Thus,
    as a last step in the choice of creating a major storage design
    for data, I suspect we also need to maintain a key, in our record,
    as to *WHAT KIND OF UDT* was used to work with the data and what
    kind of compression format was used to smunch the data and needs
    to be used to unsmunch it!
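
    One cheap way to carry that key, extending the sketch above (the version numbers are invented):

    Code:
    ' Two-byte format key ahead of each chunk: which UDT crosswalk built
    ' the record, and which compressor smunched it.
    UdtVer = 3
    Method = 1
    PUT$ #1, CHR$(UdtVer) + CHR$(Method) + MKL$(LEN(Chunk$)) + Chunk$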

    Yes, there are utilities that do this, however, the code internal
    to any such utility used has to be available down to the source
    code level to eventually pass the FDA for use in medical records.
    That is because any compression and de-compression algorithm, per
    my information from them, must be licensed to prove that it is
    lossless. Has anyone here worked out a publicly available utility,
    the source for which is there so as to submit this with one's
    PowerBasic source, if needed?

    Inquiring mind wants to know!

    ------------------
    Mike Luther
    [email protected]



  • Eric Pearson
    replied
    Ben --

    Addressing the second code sample that you posted (OPEN custno$ FOR RANDOM...)

    It looks to me like you are confusing records and fields. The file that code would create would look like this (with < denoting the end of each 30-byte record)

    Code:
    123456                       <
    A Customer, Inc.             <
    [email protected]        <
    1 (800) 555-1212             <
    (and so on)
    ...and so on. Each individual piece of data would be in its own record. What I suspect you really want is something like this...

    Code:
    12345   A Customer, Inc.             [email protected]   1 (800) 555-1212 <
    12346   Acme Anvils, Inc.            [email protected]       1 (800) BEEPBEEP <
    ...where each record is a customer, and each field within a record is a piece of data about that customer. To do that...

    Code:
    OPEN custno$ FOR RANDOM AS #1 LEN = 256
    FIELD #1, 8 AS custno$, 32 AS custname$, 24 AS custemail$, 20 AS custphone$, 32 AS custshipto$ (and so on)
    LSET custno$ = "123456"
    LSET custname$ = "A Customer, Inc."
    LSET custemail$ = "[email protected]"
    LSET custphone$ = "1 (800) 555-1212"
    (and so on)
    'put all of that data into record 1...
    PUT #1, 1
    I made up all of the numbers. As long as the FIELD widths don't exceed the LEN, it should work.

    Then, on the other side, when you GET a record, all of the variables will be automatically "filled" with the correct data. GET a different record number, and all of the variables will be filled with the data for that customer.

    A more "modern" approach would be to use a User Defined Type structure instead of using FIELD, as Ian suggested. Each GET would fill the UDT stucture, and you would get the individual pieces of data by using variable names like Customer.IDNo, Customer.cName, and so on (to use Ian's example).

    -- Eric


    ------------------
    Perfect Sync: Perfect Sync Development Tools
    Email: [email protected]



  • Lance Edmonds
    replied
    > On the minus side, to retrieve data, you have to start with the 1st
    > record and proceed down to the record you want to retrieve. This may
    > slow things down a bit.
    By using a separate "index" file to store the byte locations of each record in the main data file, you can greatly speed up the chore of locating a particular record. This is something that PowerTREE can help with, but there are other methods that can be used to implement a simple form of indexing.

    However, there is a separate problem that can make the "dynamic record size" approach much more complicated... How should it be handled if
    (1) a record size changes (especially if it grows), or
    (2) a record needs to be "inserted", or
    (3) a record needs to be deleted?
    The amount of code necessary to handle these types of tasks can make simple data file manipulation code *much* more complicated. For example, you could maintain a list of empty "slots", along with a form of linked-list to minimize the need to store the records in any particular sorted order (see the sketch below)... it starts getting complicated very quickly, and often this additional code slows the final application down significantly.
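
    For a taste of what that bookkeeping looks like, here is a bare-bones sketch of a reusable-slot list (all names invented; a real implementation needs far more care):

    Code:
    TYPE SlotTYPE
        Offset AS LONG     ' where the freed chunk starts
        Size   AS LONG     ' how many bytes it occupied
    END TYPE
    DIM FreeSlot(1 TO 100) AS SlotTYPE
    
    ' On delete: remember the hole instead of rewriting the file
    nFree = nFree + 1
    FreeSlot(nFree).Offset = oldOffset&
    FreeSlot(nFree).Size = oldSize&
    
    ' On insert: reuse the first hole big enough, else append to the file
    reuse& = 0
    FOR i = 1 TO nFree
        IF FreeSlot(i).Size >= newSize& THEN reuse& = FreeSlot(i).Offset: EXIT FOR
    NEXT i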

    Using a fixed-length record, however, solves many of these problems and makes the chore of data manipulation very simple, but the fixed-length approach adds to the overall storage size of the entire data file.

    Therefore, the decision often comes down to a choice of performance vs. data size. The design choice is *yours* and you should choose carefully - changing a database design at a later date is usually much harder than implementing it the right way to start with.



    ------------------
    Lance
    PowerBASIC Support
    [email protected]



  • Mel Bishop
    replied
    I have "discovered" another way of storing and retrieving records (fields) of random length. This technique is as follows:

    Get the length of a field and store it as a char$(). Record that on disk then the data of the record (field). For example:

    open "sometext.dat" for binary as #1
    a$ = "LastName"
    a = len(a$)
    c$ = chr$(a)
    put$ #1,c$ + a$
    close #1

    To retrieve:
    open "sometext.dat" for binary as #1
    get$ #1,1,c$
    c = asc(c$)
    get$ #1,c,a$
    close #1


    This technique has pluses and minuses. On the plus side, you will save a large amount of hard drive real estate since you don't have to "pad" a field to a fixed length.

    On the minus side, to retrieve data, you have to start with the 1st record and proceed down to the record you want to retrieve. This may slow things down a bit. Also, using a one byte field length, you are limited to 255 bytes per record. You can, of course, go to 64K record length if you use a two-byte field length tag.
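
    The two-byte variant looks almost the same; a sketch using MKI$/CVI to pack and unpack the length (note CVI is signed, so the practical per-field limit is 32767 bytes):

    open "sometext.dat" for binary as #1
    a$ = "a field that can now run longer than 255 characters"
    put$ #1, mki$(len(a$)) + a$     ' two-byte length tag, then the data
    seek #1, 1
    get$ #1, 2, c$                  ' read the two-byte tag
    get$ #1, cvi(c$), a$            ' read exactly that many data bytes
    close #1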

    As noted previously, speed of data access will be greatly increased if you can copy your data to RAM disk and access it from there.


    ------------------



  • Ian Cairns
    replied
    Ben, this is the way I would approach it. (I didn't run this to debug it!)
    First create a User defined TYPE that describes your data. Then fill an instance of the type with your data and put the whole customer array into your file one item at a time.

    Code:
    TYPE CustomerTYPE
      IDNo    AS LONG
      cName   AS STRING * 20
      Email   AS STRING * 30
      phone   AS STRING * 13 'ie. (804)123-4567
      shipTo  AS STRING * 50
      billTo  AS STRING * 50
      cType   AS STRING * 5
      ExpDate AS STRING * 10 'ie. 12/05/2001
      cNum    AS LONG
    END TYPE
    
    noCustomers = 100
    DIM customer(1 TO noCustomers) AS CustomerTYPE
    ' Get customer information
    
    fhandle = FREEFILE
    OPEN "customers" FOR RANDOM AS #fhandle LEN = LEN(customer(1))
    FOR i = 1 TO noCustomers
       PUT #fhandle, i, customer(i)
    NEXT i
    CLOSE #fhandle
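
    Reading the file back is the mirror image; a sketch, equally undebugged:

    Code:
    fhandle = FREEFILE
    OPEN "customers" FOR RANDOM AS #fhandle LEN = LEN(customer(1))
    FOR i = 1 TO noCustomers
       GET #fhandle, i, customer(i)
    NEXT i
    CLOSE #fhandle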
    regards,

    ------------------
    [email protected]



  • Guest
    replied
    This is the file I am creating. I now have two questions.

    1. Can I store the records in the file according to the different lengths of the variables I am using, (i.e., custno$, custname$) or do I have to define the record size in the OPEN statement? Specifying 30 characters for all records (when there are some as small as 1 character) is very wasteful.

    2. Now what is the syntax to open the very same file to retrieve all of the records into variables?

    Thanks, Ben

    OPEN custno$ FOR RANDOM AS #1 LEN = 30
    FIELD #1, 30 as filebuffer$
    LSET filebuffer$ = custno$
    PUT #1, 1
    LSET filebuffer$ = custname$
    PUT #1, 2
    LSET filebuffer$ = custemail$
    PUT #1, 3
    LSET filebuffer$ = custphone$
    PUT #1, 4
    LSET filebuffer$ = custshipto$
    PUT #1, 5
    LSET filebuffer$ = custbillto$
    PUT #1, 6
    LSET filebuffer$ = cctype$
    PUT #1, 7
    LSET filebuffer$ = ccexpdate$
    PUT #1, 8
    LSET filebuffer$ = ccnumber$
    PUT #1, 9
    CLOSE #1

    ------------------



  • Walt Decker
    replied
    FYI, FIELD can be used with variables to produce a user-defined record. For example:

    DEFINT A-Z

    I = 10
    DIM FieldSize(1 TO I), SomeData$(1 TO I), FieldData$(1 TO I)

    '.... put some field sizes in FieldSize() and add them up to get
    '     DataLen, the total size of the data record + 2

    '.... put some data into SomeData$()

    FileHandle = FREEFILE
    L = UBOUND(FieldSize)

    OPEN SomeFile$ FOR RANDOM AS FileHandle LEN = DataLen

    ' Each FIELD statement starts over at the beginning of the record
    ' buffer, so pad past the fields already defined.
    OffSet = 0
    FOR I = 1 TO L
        IF OffSet = 0 THEN
            FIELD FileHandle, FieldSize(I) AS FieldData$(I)
        ELSE
            FIELD FileHandle, OffSet AS Pad$, FieldSize(I) AS FieldData$(I)
        END IF
        OffSet = OffSet + FieldSize(I)
    NEXT I

    Then you can store your data and easily retrieve it without resorting to defining a UDT beforehand.



    ------------------



  • Michael Mattias
    replied
    With PB, you need to specify the LEN of a file opened for RANDOM unless you accept the default of 128. This won't make any "real" functional difference as long as you always open the file with the same LEN parameter, but an oversized LEN will waste disk space.

    e.g., OPEN "foo.dat" FOR RANDOM AS #1 LEN=10

    MCM



  • Guest
    replied
    Thanks. I've been away from PowerBASIC for about 5 years and have forgotten a bunch of things! I wasn't really being specific about the code because I knew that the response I'd get would be helpful enough.

    Thanks.

    ------------------



  • Eric Pearson
    replied
    Ben --

    No offense, but that code is a mess. I tried to fix it for you but it's not entirely clear what you're trying to do. You need to change a bunch of things, including:

    1) You can't do FIELD #1 until file #1 is open. Move the OPEN statement to the top, followed by FIELD, then do the rest of the stuff.

    2) After a FIELD, you must use LSET or RSET instead of doing an assignment with "=" or the FIELD will become "disconnected" from the file.

    3) WIDTH is a reserved word so you can't use it as a variable name.

    4) You are assigning a string value ("whatever") to a variable called "bufferwidth$" (?) but then you are using PUT$ to put a variable called Junk$ into the file.

    5) If you use FIELD you should use PUT not PUT$.

    If you can describe exactly what you're trying to accomplish we can be of more help. What is the end result that you want?

    Is it something like this...?

    Code:
    open "random.dat" for random as #1
    field #1, 10 as buffer$
    lset buffer$ = "whatever"
    put #1,1
    close #1
    I'm not sure what you had in mind with the bufferwidth$ = "10" line, so I assume that's supposed to be the width of the records in the file. (?)

    HTH.

    -- Eric

    P.S. Don't let my kidding put you off from asking questions here! You're in the right place if you want to learn or re-learn PB.


    ------------------
    Perfect Sync: Perfect Sync Development Tools
    Email: [email protected]



    [This message has been edited by Eric Pearson (edited June 02, 2000).]



  • Guest
    started a topic: file access modes

    Hi:

    I've been out of practice for a year or two and am getting back into PB for DOS.

    I want to set up and store junk to a file that I can retrieve junk from at any file position I choose, i.e., #1, 645. I could use a working example of some code so that I could figure out what I'm doing. :-)

    I'm having trouble using a random access file. This is my syntax, which is causing problems:

    junk$ = "whatever"
    bufferwidth$ = "10"
    field #1, width as bufferwidth$
    lset bufferwidth$ = junk$

    open "random.dat" for random as #1
    put$ #1, junk$
    close #1

    Thanks, Ben