
managing many large files on HD as strings


  • managing many large files on HD as strings

    I have to process a file that has data for about 600 different
    products. I have no idea which ones any user may need so I have
    to process all of them! urgh

    The problem is that there is one file for each day. So each of
    the daily files has to be read from top to bottom, and the data
    in it sent to one of the 600 destination files.

    I began by doing a straightforward "read the source file line by
    line and open the appropriate destination file and write line by
    line". This was fine when the destination file did not have too
    much existing data in it. But once a few weeks of data are in each
    of the destination files, I have to wade thru it all to insert or
    append the new data.

    Next I tried reading the entire destination file into memory and
    flying thru it with pointers. Then transferring the file up to the
    insert point to a TempStr, adding the new data, then adding the
    rest of the file. Then I write the whole file to HD again.

    This is faster but I am still moving each one of the 600 files
    around too much. They can each be about 1.6MB!
    That's about 960MB I have to read and write for each new day of data,
    and that's not including some other string processing that has to
    be done on each one. So it's sloooooow.

    If I want to drag and drop a week's data at a time, that's over
    6GB that has to be moved and processed.

    So now I'm realizing I need to stop moving the data. I could detect
    whether the data needs to be inserted into or appended to the existing
    data. If I re-write the app (for the 4th time) I could make the majority
    an append operation. Then I could just open the file and APPEND the
    new day's data.
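
    A minimal sketch of that append-only idea, in Python purely for
    illustration (the thread does not show the app's code). It assumes
    each source line starts with a product ID followed by a comma and
    that destinations are named "<product>.txt"; both are hypothetical,
    since the real record layout is not given. The function name
    append_daily_file is also made up for this example.

```python
import os
from collections import defaultdict


def append_daily_file(source_path, dest_dir):
    """Route each line of one daily file to its product file, append-only.

    Assumption (not from the thread): every line looks like
    "PRODUCT_ID,rest of record" and destinations are "<PRODUCT_ID>.txt".
    """
    # Group lines in memory first so each destination is opened once,
    # instead of once per source line.
    by_product = defaultdict(list)
    with open(source_path, "r", encoding="ascii") as src:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            product_id = line.split(",", 1)[0]
            if not line.endswith("\n"):
                line += "\n"  # last line of the file may lack a newline
            by_product[product_id].append(line)

    os.makedirs(dest_dir, exist_ok=True)
    for product_id, lines in by_product.items():
        dest_path = os.path.join(dest_dir, product_id + ".txt")
        # Mode "a" writes at the end of the file; existing data is not re-read.
        with open(dest_path, "a", encoding="ascii") as dest:
            dest.writelines(lines)

    # Number of records appended per product, handy for logging.
    return {pid: len(lines) for pid, lines in by_product.items()}
```

    Grouping before writing means one open/close per destination per
    day, and nothing already on disk is rewritten. Out-of-order data
    would still need the slower insert path.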

    I am worried this will produce 600 fragments for each day's data,
    as the files are written one at a time for one day and then
    one at a time for the next day and so on.

    Am I going in the right direction with this? I can't think of a
    better solution, but you guys always seem to have great ideas,
    so I thought I would throw it out there and see what you think.

    Kind Regards

    [This message has been edited by Mike Trader (edited August 24, 2001).]

  • #2
    A couple of thoughts...

    1. What about reading the input files, sorting them and consolidating like items? Without a description of the fields in the files, I can't really tell if this is a good solution.

    1a. Can you similarly consolidate the destination files?

    2. Just looking at your volumes, I think you have outgrown sequential processing and it is now time to add some indexes.
    Not knowing how these data are accessed by other programs, I can't tell how much additional work it's going to take in other programs, but if those other programs access the data with nicely-defined functions (e.g., "CALL getRecord (KeyOrRecordNo, BufferToFill)"), I don't think it would be too bad. (If these data are accessed 'in-line', that's another story.)

    Possibilities here include indexing the destination file to facilitate the adds; or eliminating the "daily" files by adding a date field and indexing on that.
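
    One way to picture "indexing on a date field" is a small lookup
    from date to byte offset, so a reader can seek straight to a day
    instead of scanning from the top. The sketch below is in Python for
    illustration only. It assumes each record begins with a date field
    such as "20010824," (a hypothetical layout), and the names
    build_date_index and read_from_date are invented for this example.
    A real indexing product or RDBMS would do far more than this.

```python
def build_date_index(path):
    """Map each date to the byte offset of its first record in `path`.

    Assumption (not from the thread): each line begins with a date field
    followed by a comma. The file is read in binary so offsets are exact.
    """
    index = {}
    offset = 0
    with open(path, "rb") as f:
        for raw in f:
            date = raw.split(b",", 1)[0].decode("ascii")
            index.setdefault(date, offset)  # keep only the first offset per date
            offset += len(raw)
    return index


def read_from_date(path, index, date):
    """Return every line from the first record of `date` to end of file."""
    if date not in index:
        return []
    with open(path, "rb") as f:
        f.seek(index[date])  # jump directly, no scan of earlier days
        return [raw.decode("ascii").rstrip("\r\n") for raw in f]
```

    In practice the index would be saved alongside the data file and
    updated on each append, rather than rebuilt by a full scan.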

    Bottom line, I think you need to schedule a redesign of your file structures. The use of 'daily' files getting appended to 'master' files was once a pretty popular design, but with today's RDBMS's and indexing products, IMO that method has outlived its usefulness.

    (Blatant self-promotion alert). You may wish to get an outside consultant to review your problem and design in detail. That 'fresh set of eyes' can be invaluable.

    Michael Mattias
    Racine WI USA
    [email protected]

    [This message has been edited by Michael Mattias (edited August 24, 2001).]
    Michael Mattias
    Tal Systems (retired)
    Port Washington WI USA
    [email protected]


    • #3
      Whether or not fragmentation will occur will depend on the size of the
      data you append to each file. Fragmentation is measured in disk sectors,
      while file appends are measured in records. If each append is relatively
      small (less than a sector) then you will not get 600 fragments per day.
      In all likelihood, however, you will get some fragmentation, but so what?
      Processing 600 files is going to bounce the disk head all over the drive
      anyway. Plus, since you are reading the files sequentially, fragmentation won't
      significantly impact what you are doing. It will impact your defrag time,
      which you will probably have to schedule regularly.

      I would go ahead and append your files.


      John Kovacich
      Ivory Tower Software


      • #4
        Indexing. Great idea. I need to read up on that.

        OK, that makes sense. The data added to each file ranges from
        about 4 lines to 405 lines per day.
        Each line is about 70 chars long. I guess a line = a record?
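
        A rough back-of-envelope for those figures, written as a few
        lines of Python. It assumes a 2-byte CR/LF line ending and a
        512-byte sector; neither is stated in the thread. The file
        system actually allocates in clusters, whose size varies, so
        treat this only as a gauge of how big each daily append is.

```python
LINE_BYTES = 70 + 2    # ~70 chars plus CR/LF (assumed line ending)
SECTOR_BYTES = 512     # assumed sector size


def sectors_needed(lines):
    """Sectors covered by one append of `lines` records, rounded up."""
    total = lines * LINE_BYTES
    return -(-total // SECTOR_BYTES)  # ceiling division


for lines in (4, 405):
    print(lines, "lines ->", lines * LINE_BYTES, "bytes,",
          sectors_needed(lines), "sector(s)")
```

        Under those assumptions the smallest daily append (4 lines) fits
        inside one sector, while the largest (405 lines) spans dozens.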

        Kind Regards