Curious results from binary File I/O speed tests.

  • #1

    I wrote a routine to read and convert SYLK (Multiplan) data files,
    but I was unsatisfied with the conversion speed, so I have been
    experimenting with different buffer sizes.

    I open the file as binary and specify a buffer length with the GET$
    statement. On mainframes, it is usually beneficial to make the buffer
    size a multiple of the track or cylinder size. I expected to find that
    in this case the buffer size should be a multiple of the block size, but
    then I started to wonder whether block size is even meaningful on a FAT32
    system.

    My results show that a one-megabyte buffer performs best. I was a bit
    surprised to find a maximum workable size for the buffer, and even more
    surprised that the largest buffer I could read was not a power of two,
    but an in-between value.

    This data is purely anecdotal: I made no attempt to stop background
    processes or clear disk caches. Also, as I said, the program is
    decoding the SYLK format, not purely reading data off the disk.

    The file size is 1,616 KB; the test system is a Pentium 266 laptop.

    Buffer size   Time in seconds
        32767     12.8
       131072     12.7
       262144     12.2
       524288     12.3
       786432     12.5
      1048576      8.6
      1310720     10.4
      1572864     12.4
      2097152     failed to run correctly (0.3 sec return)

    I have run these several times to make sure that the sub 9 second result
    is correct, and it is. Can anyone explain why this one buffer size would
    make such a dramatic difference? Does it relate to a block size? Can
    I expect it to change from machine to machine?
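
    For comparison, the experiment can be sketched in Python (illustrative
    only, not the original PowerBASIC; the file name test.slk and the
    buffer sizes come from this thread, while the timing harness itself is
    an assumption):

```python
import time

def read_in_chunks(path, bufsize):
    """Read the whole file in bufsize-byte chunks; return total bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            total += len(chunk)
    return total

def benchmark(path, sizes):
    """Time read_in_chunks once per candidate buffer size."""
    results = {}
    for size in sizes:
        start = time.perf_counter()
        read_in_chunks(path, size)
        results[size] = time.perf_counter() - start
    return results

if __name__ == "__main__":
    # Buffer sizes taken from the table above.
    sizes = [32767, 131072, 262144, 524288, 786432, 1048576]
    for size, secs in benchmark("test.slk", sizes).items():
        print("%8d  %.3f s" % (size, secs))
```

    Note that this only measures raw reading, not the SYLK decoding that
    runs alongside it in the original test.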

    ------------------
    Thanks,

    John Kovacich
    Ivory Tower Software

  • #2
    Originally posted by John Kovacich:
    1048576 8.6

    I have run these several times to make sure that the sub 9 second result
    is correct, and it is. Can anyone explain why this one buffer size would
    make such a dramatic difference? Does it relate to a block size? Can
    I expect it to change from machine to machine?
    Well, 1048576 is evenly divisible by 1024, so 1 meg seems to be a magic number of some kind.

    Do you have some sample code you can post?

    --Dave


    ------------------
    Home of the BASIC Gurus
    www.basicguru.com

    Comment


    • #3
      Sure Dave, though I never intended this as a code problem. I have removed
      the code that is not directly involved in the file-reading process, so
      this would qualify as a snippet.

      What is the code to properly format code on this forum?

      For each test I ran, I simply changed the size value, compiled and executed.

      #COMPILE EXE "ReadSylk.exe"

      %True  = 1
      %False = 0
      %hellfreezesover = 0
      %progress = 0
      '%size = 32767    '12.8 seconds
      '%size = 131072   '12.7 seconds
      '%size = 262144   '12.2 seconds
      '%size = 524288   '12.29 seconds
      '%size = 786432   '12.5 seconds
      '%size = 1048576  '8.6 seconds
      '%size = 1310720  '10.4 seconds
      %size = 1572864   '12.4+ seconds
      '%size = 2097152  'failed to operate correctly
      '%size = 4194304  'failed to operate correctly

      GLOBAL charptr AS BYTE PTR
      GLOBAL rec$
      GLOBAL f AS LONG
      GLOBAL cr AS STRING
      GLOBAL lf AS STRING
      GLOBAL crlf AS STRING
      GLOBAL nullchar AS STRING
      GLOBAL delim$
      GLOBAL MaxPtr AS LONG

      DECLARE FUNCTION LoadSylkTable ALIAS "LoadSylkTable" (fromfile AS STRING) AS LONG


      #IF %PB_EXE
      FUNCTION PBMAIN
          DIM fromfile AS STRING
          DIM x AS LONG

          cr   = CHR$(13)
          lf   = CHR$(10)
          crlf = CHR$(13,10)
          nullchar = CHR$(0)

          delim$ = ";" & crlf

          fromfile = "test.slk"

          x = LoadSylkTable(fromfile)
      END FUNCTION
      #ENDIF

      FUNCTION LoadSylkTable ALIAS "LoadSylkTable" (fromfile AS STRING) EXPORT AS LONG

          DIM FmtCmd AS STRING
          DIM operand AS STRING
          DIM X AS LONG
          DIM f AS LONG
          DIM mode AS STRING
          DIM quoteflag AS LONG
          DIM start AS DOUBLE
          DIM finish AS DOUBLE

          start = TIMER

          f = FREEFILE
          OPEN fromfile FOR BINARY AS #f

          GET$ #f, %size, rec$

          charptr = STRPTR(rec$)
          MaxPtr  = charptr + LEN(rec$)

          WHILE NOT EOF(f)
              GOSUB GetSylkParms
              CALL ProcessSylkFormat
          WEND

          finish = TIMER

          MSGBOX "Elapsed time = " & STR$(finish - start)

          EXIT FUNCTION

      GetSylkParms:
          REM assemble sylk parameters
          ' . . .
          REM assemble operand
          WHILE CharPtr <= MaxPtr AND INSTR(delim$, CHR$(@CharPtr)) = 0
              INCR CharPtr
              IF CharPtr > MaxPtr THEN
                  GOSUB LoadSylkBuffer
              END IF
          WEND
          RETURN

      LoadSylkBuffer:
          REM this is where we load the next buffer
          GET$ #f, %size, rec$
          CharPtr = STRPTR(rec$)
          MaxPtr  = CharPtr + LEN(rec$)
          RETURN

      END FUNCTION

      John



      ------------------
      Thanks,

      John Kovacich
      Ivory Tower Software

      Comment


      • #4
        I wrote a 16-bit SYLK reader, and it was terribly slow!
        It used Seek() to remember the position where a row started.
        (For VB3)

        In 32 bits I would certainly do it all in memory.
        My suggestion: read it all into a string and process it using some
        INSTR()/PARSE$ technique.

        Avoid the disk I/O if you can!

        The file can easily be put in an array this way.
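
        As a rough sketch of that slurp-then-parse approach (Python here,
        standing in for the INSTR()/PARSE$ technique; this is deliberately
        simplified, a real SYLK reader needs more than a semicolon split,
        and the sample records are assumed):

```python
def parse_sylk(data):
    """Split SYLK text into (record_type, fields) tuples, one per line."""
    records = []
    for line in data.splitlines():
        if not line:
            continue
        fields = line.split(";")
        records.append((fields[0], fields[1:]))
    return records

# Usage: slurp the whole file once, then work purely in memory.
# with open("test.slk", "r", encoding="ascii") as f:
#     records = parse_sylk(f.read())
```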


        ------------------
        [email protected]
        hellobasic

        Comment


        • #5
          Edwin,
          Reading it into memory is essentially what I am doing, except, as I have
          shown, it was faster to read it in chunks than all at once. Converting the
          functions to GOSUBs also helped gain some speed, though I couldn't
          tell you why.

          The file, which as I stated is 1.6 MB, consists of 77,760 lines which
          are read into a 2286 x 33 array. That's almost 10,000 lines per second,
          which is much better than the initial 6,000 lines per second.

          My question is: why would 1048576 be the magic number? Most of my sizes
          were powers of two, and 1048576 is 1024^2. When reading chunks of data in
          binary access mode, what physical characteristic determines the speed?
          Are we bound to some kind of paging number in memory management? Is it
          related to the disk track size? Is it a magic number for FAT32? Is it
          related to the physical buffer size of my hard drive?

          There must be some cause, all I have identified here is the effect.

          Lastly, a question to the PB staff:
          If 1048576 is more than 50% faster than 32767, can the LEN parameter
          of input files be increased to allow this large a buffer?

          OPEN "datafile" FOR INPUT AS #f LEN = 1048576

          John



          ------------------
          Thanks,

          John Kovacich
          Ivory Tower Software

          Comment


          • #6
            My understanding from past discussions with R&D is that the limit is imposed to ensure compatibility with all possible flavors of Win32, as a problem was encountered during the product development cycle.

            I'll check back with them on whether it is still necessary to impose this limit; i.e., I'll ask for this to be added to the wish list.

            That said, using such large buffer sizes is a bad choice if the application will run on a network, as it can cause saturation. Experienced network application programmers tend to use 8 KB buffers (at the sacrifice of a small amount of performance) as they are the most "network friendly" and provide the best overall result.

            Also, I fully expect that you'll find the "magic buffer length" varies dramatically from system to system, and is also affected by multitasking issues, plus numerous other factors. These are some of the prime reasons for choosing an 8 KB buffer size.
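
            That 8 KB pattern can be sketched like this (Python for
            illustration; the buffer size and CR/LF delimiter are
            assumptions, and the carry variable plays roughly the same
            role as the buffer reload in John's LoadSylkBuffer routine, so
            a record split across two reads is never lost):

```python
import io

BUFSIZE = 8192  # the "network friendly" size suggested above

def iter_records(f, delim=b"\r\n"):
    """Yield complete delimited records, reading BUFSIZE bytes at a time."""
    carry = b""
    while True:
        chunk = f.read(BUFSIZE)
        if not chunk:
            if carry:           # flush a final, undelimited record
                yield carry
            return
        carry += chunk
        parts = carry.split(delim)
        carry = parts.pop()     # last piece may be incomplete; keep it
        for part in parts:
            yield part

# Usage with an in-memory file standing in for an open disk file:
sample = io.BytesIO(b"ID;PWXL\r\nC;Y1;X1\r\nE")
print(list(iter_records(sample)))  # [b'ID;PWXL', b'C;Y1;X1', b'E']
```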



            ------------------
            Lance
            PowerBASIC Support
            mailto:[email protected][email protected]</A>
            Lance
            mailto:[email protected]

            Comment
