Reading a whole text file... market data

  • #21
    Michael
    Some quick quotes from Microsoft's own descriptions of MMFs:
    One advantage to using MMF I/O is that the system performs all data transfers for it in 4K pages of data.
    As I think you pointed out, that's a small input buffer.
    While no gain in performance is observed when using MMFs for simply reading a file into RAM, ...
    Like sequential files that are only read in their entirety once.
    Since Windows NT is a page-based virtual-memory system, memory-mapped files represent little more than an extension of an existing, internal memory management component... when a process starts, pages of memory are used to store static and dynamic data for that application. Once committed, these pages are backed by the system pagefile,
    Why would you bother backing memory pages up to the pagefile when they are only being read once?
    There is no question MMFs are an important part of the O/S, but not for the uses you keep suggesting.
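
    For reference, a minimal sketch of the MMF technique under discussion, using the standard Win32 calls (the file path is hypothetical, and I assume WIN32API.INC is available):

    Code:
    #COMPILE EXE
    #DIM ALL
    #INCLUDE "WIN32API.INC"

    FUNCTION PBMAIN () AS LONG
       LOCAL hFile, hMap, pView AS DWORD

       ' Open the file read-only (path is hypothetical)
       hFile = CreateFile("C:\BinFil08.dat", %GENERIC_READ, %FILE_SHARE_READ, _
                          BYVAL %NULL, %OPEN_EXISTING, %FILE_ATTRIBUTE_NORMAL, %NULL)
       IF hFile = %INVALID_HANDLE_VALUE THEN EXIT FUNCTION

       ' Map the whole file; the system pages it in 4K at a time as it is touched
       hMap = CreateFileMapping(hFile, BYVAL %NULL, %PAGE_READONLY, 0, 0, BYVAL %NULL)
       IF hMap THEN
          pView = MapViewOfFile(hMap, %FILE_MAP_READ, 0, 0, 0)
          IF pView THEN
             ' The file contents are now addressable starting at address pView
             UnmapViewOfFile BYVAL pView
          END IF
          CloseHandle hMap
       END IF
       CloseHandle hFile
    END FUNCTION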
    John



    • #22
      Originally posted by John Petty View Post
      Doug
      I don't understand why you are using LINE INPUT and parsing. Old-school BASIC has always separated input fields with commas or CRLF combinations, so why are you bringing these 8 variables per line into a string and then parsing them? It is far more efficient to let BASIC do the parsing automatically as it reads the file. Here is a small code example assuming there are only 3 variables per line. Of course this could also be done with a single UDT array. As for the disk, yes, increase the buffer size.
      John,
      It's just to keep it clean visually.
      Some of the records have a time field and others have additional volume fields - so that's not so much of a speed thing.
      However, I am using the Parse$ function.
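
      For illustration, a tiny sketch of pulling one field out of a record with PARSE$ (the sample line is taken from the data posted later in this thread):

      Code:
      LOCAL sLine, sClose AS STRING
      sLine  = "06/12/2009,1430,940.42,942.15,940.01,940.53,0,-999999"
      sClose = PARSE$(sLine, ",", 6)   ' sixth comma-delimited field -> "940.53"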
      Thanks!
      Doug



      • #23
        Other than to clean house and not be wasteful, is there any performance reason to use redim preserve?



        • #24
          No, it just frees up a bit of memory space, that's all. The main reason I would do it is that when the array is used later you can rely on its UBOUND and not have to worry about whether the counter (x in this case) has changed.
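
          A minimal sketch of that pattern (the array name and sizes are illustrative): over-allocate up front, fill, then trim so UBOUND reflects the real count.

          Code:
          LOCAL x, i AS LONG
          DIM rec(100000) AS STRING
          ' ... read records into rec(1) .. rec(x), incrementing x ...
          REDIM PRESERVE rec(x)      ' trim the unused tail; the contents are kept
          FOR i = 1 TO UBOUND(rec)   ' safe even if x is changed later
             ' process rec(i)
          NEXT i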



          • #25
            Other than to clean house and not be wasteful, is there any performance reason to use redim preserve?
            ???

            REDIM PRESERVE is not used to clean house or eliminate resource profligacy; it's used to resize an array without losing the current contents.
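
            A minimal sketch of the distinction (the values are illustrative):

            Code:
            DIM a(2) AS LONG
            a(0) = 10 : a(1) = 20 : a(2) = 30
            REDIM PRESERVE a(5)   ' grow: a(0)..a(2) keep their values, a(3)..a(5) are zero
            REDIM a(5)            ' without PRESERVE, all elements are reset to zero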
            Michael Mattias
            Tal Systems (retired)
            Port Washington WI USA
            [email protected]
            http://www.talsystems.com



            • #26
              Michael
              Is this yet another profligate post by you?
              My understanding is that Redim Preserve will return the wastefully extravagant portion of the original Redim to the usable 2 GB memory space of a program.
              You really must learn all the meanings of these fancy words you have started using.
              John
              PS: I am still trying to understand what my intellectual creativity has to do with having children.



              • #27
                My understanding is that Redim Preserve will return the wastefully extravagant portion of the original Redim to the usable 2 GB memory space of a program
                Then you suffer a bad case of misunderstanding.

                When using PB arrays, there is no overallocation; the compiler only allocates space for as many elements as requested by the programmer in the DIM or REDIM statement.

                And regardless, all array data allocations always come out of your 2GB user limit. The space required for any increase or decrease in the number of elements requested subtracts from or adds to what remains available to your process.

                If you, the programmer, allocate more elements than you need, that is your own promiscuous behavior, not the compiler's or the operating system's.
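
                A quick sketch of that accounting (the sizes are illustrative):

                Code:
                DIM v(999) AS DOUBLE    ' 1,000 elements * 8 bytes = 8,000 bytes taken from the 2GB user space
                REDIM PRESERVE v(499)   ' now 500 * 8 = 4,000 bytes; the difference is returned to the process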

                MCM
                Last edited by Michael Mattias; 21 Jun 2009, 11:52 AM.
                Michael Mattias
                Tal Systems (retired)
                Port Washington WI USA
                [email protected]
                http://www.talsystems.com



                • #28
                  Originally posted by Michael Mattias View Post
                  The space required for any increase or decrease in the number of elements requested subtracts from or adds to what remains available to your process.
                  MCM
                  Note: the bolding is mine.
                  So you are agreeing with me!! What is your point? Who said it had anything to do with the O/S or compiler? How simple can I make it for you? If I REDIM PRESERVE to the actual array size needed, then the unused portion of memory is returned to the program for other uses. That's what your statement says, that's what Doug said, and it is what I have said, so what is your argument?



                  • #29
                    There are always two things to consider when programming this:

                    1)
                    Make as few memory allocations as possible.
                    This is not about the size of the allocations but about their number.
                    REDIM PRESERVE means: make a copy of the existing array in a new memory allocation (see the sketch below).

                    2)
                    Disk reads: it is better to read one large block than many little parts each time.
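
                    As a sketch of point 1 (the names and sizes are illustrative), grow the array in chunks so the number of REDIM PRESERVE copies stays small:

                    Code:
                    LOCAL count AS LONG
                    DIM buf(1023) AS STRING
                    DO UNTIL EOF(#1)                ' assumes a text file is already open as #1
                       INCR count
                       IF count > UBOUND(buf) THEN  ' double the size instead of growing per item
                          REDIM PRESERVE buf(UBOUND(buf) * 2 + 1)
                       END IF
                       LINE INPUT #1, buf(count)
                    LOOP
                    REDIM PRESERVE buf(count)       ' final shrink-to-fit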
                    hellobasic



                    • #30
                      Originally posted by John Gleason View Post
                      I'm sad to report I made a mistake in my timing code, but happy to report that it works fine using individual record reads!
                      I think you still have a problem with your new test.

                      Code:
                      #COMPILE EXE
                      #DIM ALL
                      
                      FUNCTION PBMAIN () AS LONG
                         LOCAL allFile AS STRING, iRecs, ii AS LONG
                         LOCAL high, t AS DOUBLE
                      
                         OPEN "C:\BinFil08.dat" FOR INPUT AS #2 LEN = &h00800 'I DID see speed increases with larger rec sizes.
                         t = TIMER
                         DO
                            LINE INPUT #2, allFile
                         LOOP UNTIL EOF(#2)
                         ? "done individual rec read: " & STR$(TIMER - t)
                         CLOSE
                      
                         OPEN "C:\BinFil08.dat" FOR BINARY AS #1
                         t = TIMER
                         GET$ #1, LOF(#1), allFile
                         iRecs = PARSECOUNT(allFile, $CRLF)
                         DIM arrOfRecs(iRecs) AS STRING
                         PARSE allFile, arrOfRecs(), $CRLF
                      
                          ? "done whole file read: " & STR$(TIMER - t)
                         WAITKEY$ 
                      END FUNCTION
                      In your first test, all you are doing is reading in each record and not keeping any of those records but the last one in memory. In the second, you are reading in the entire contents of the file, but then going on to calculate the number of records, dimensioning an array to hold the parsed data, and then parsing the data.

                      Not a very fair test if you ask me. Here is another test program that will give the results for reading the entire contents of the file into an array using either LINE INPUT or PARSE, which is a fairer test.

                      Code:
                      #COMPILE EXE
                      #DIM ALL
                      
                      FUNCTION PBMAIN () AS LONG
                      
                          LOCAL allFile   AS STRING
                          LOCAL iRecs     AS LONG
                          LOCAL ii        AS LONG
                          LOCAL hFile     AS DWORD
                          LOCAL oneRec    AS STRING
                          LOCAL t         AS DOUBLE
                          LOCAL Pathname  AS STRING
                      
                          DISPLAY OPENFILE %HWND_DESKTOP, , , "", "", "All files (*.*)|*.*", "", "", %OFN_PATHMUSTEXIST TO Pathname
                          IF Pathname = "" THEN
                              EXIT FUNCTION
                          END IF
                      
                          hFile = FREEFILE
                          OPEN Pathname FOR INPUT AS #hFile LEN = &h00800
                              t = TIMER
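                               ' FILESCAN counts the records first so the array can be sized in one allocation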
                              FILESCAN #hFile, RECORDS TO iRecs
                              DIM arrOfRecs(iRecs) AS STRING
                              FOR ii = 1 TO iRecs
                                  LINE INPUT #hFile, arrOfRecs(ii)
                              NEXT ii
                              ? "done individual rec read to array: " & STR$(TIMER - t)
                          CLOSE #hFile
                      
                          hFile = FREEFILE
                          OPEN Pathname FOR BINARY AS #hFile
                              t = TIMER
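                               ' One GET$ pulls the entire file into memory in a single disk read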
                              GET$ #hFile, LOF(#hFile), allFile
                              iRecs = PARSECOUNT(allFile, $CRLF)
                              DIM arrOfRecs(iRecs) AS STRING
                              PARSE allFile, arrOfRecs(), $CRLF
                              ? "    done whole file read to array: " & STR$(TIMER - t)
                          CLOSE #hFile
                      
                      END FUNCTION
                      Here are the results I got when using a 10.4 MB text file of lines of up to 80 characters followed by CRLF.

                      Code:
                      done individual rec read to array:  .281000000001921
                          done whole file read to array:  .156999999999243
                      Jeff Blakeney



                      • #31
                        John & Jeff
                        Both most interesting and probably useful info, but neither addresses what Doug is trying to do. He actually wants the individual fields. The methods you have both presented require reading the data twice to break it up into strings, either by FILESCAN or by parsing in memory. To get what he wants he then needs to parse each individual string again to get the separate fields (a third read and a second parse), so why not do it all in the first pass, as BASIC has always done (thus my code example)?
                        Doug has pointed out that there may be different numbers of fields in different reports; the one I looked at only had 7 where his example has 8. Simply do a PARSECOUNT on the heading line first and adjust the number of INPUT fields.
                        Actually, for this type of data there should be no difference past the heading line whether the files are CSV or TXT.
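
                        A sketch of that single-pass idea (the path and variable names are mine; the field layout follows the 8-field sample posted below):

                        Code:
                        LOCAL hFile AS LONG
                        LOCAL sDate AS STRING, sTime AS STRING
                        LOCAL nOpen AS DOUBLE, nHigh AS DOUBLE, nLow AS DOUBLE, nClose AS DOUBLE
                        LOCAL nVol AS LONG, nOpenInt AS LONG

                        hFile = FREEFILE
                        OPEN "C:\quotes.txt" FOR INPUT AS #hFile
                        DO UNTIL EOF(#hFile)
                           ' INPUT# splits each comma-delimited record as it is read
                           INPUT# hFile, sDate, sTime, nOpen, nHigh, nLow, nClose, nVol, nOpenInt
                        LOOP
                        CLOSE #hFile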
                        John



                        • #32
                          Originally posted by John Petty View Post
                          John & Jeff
                          Both most interesting and probably useful info, but neither addresses what Doug is trying to do. He actually wants the individual fields. The methods you have both presented require reading the data twice to break it up into strings, either by FILESCAN or by parsing in memory. To get what he wants he then needs to parse each individual string again to get the separate fields (a third read and a second parse), so why not do it all in the first pass, as BASIC has always done (thus my code example)?
                          Doug has pointed out that there may be different numbers of fields in different reports; the one I looked at only had 7 where his example has 8. Simply do a PARSECOUNT on the heading line first and adjust the number of INPUT fields.
                          Actually, for this type of data there should be no difference past the heading line whether the files are CSV or TXT.
                          John
                          What I was addressing was the fact that John ran a test that had LINE INPUT loading the data from disk faster than using GET$ to get the whole thing at once. His test was biased because he did more work on the GET$ side of his test program, so I gave him another test program to show that LINE INPUT is slower than GET$.

                          Once the data is in memory, it will take the same amount of time to parse out the individual fields with either method. Doug was looking for a way to make sure it ran as fast as possible and he can save a bit of time by loading the entire file into memory using GET$ and then parsing the data into his arrays.

                          However, in this case, the data he's getting is quite small. I went to the site that Doug posted and grabbed the daily data from 1950 to the present, and it is only 788 KB. My test showed that just loading the data for a 10.4 MB file took only .28 seconds in the worst case, so a 788 KB file would take even less.
                          Jeff Blakeney



                          • #33
                            06/11/2009,1600,949.65,949.98,943.75,944.89,0,-999999
                            06/12/2009,1030,943.44,943.44,935.66,939.03,0,-999999
                            06/12/2009,1130,938.76,943.24,938.46,940.33,0,-999999
                            06/12/2009,1230,940.3,941.06,938.98,939.99,0,-999999
                            06/12/2009,1330,940.0,941.42,938.88,940.41,0,-999999
                            06/12/2009,1430,940.42,942.15,940.01,940.53,0,-999999
                            One needs to examine the data more carefully. Notice the extra commas inserted, and ask whether it will always be in that format.
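
                            A small sketch of that check (the expected count of 8 matches the sample above):

                            Code:
                            ' sLine holds one record, e.g. "06/12/2009,1230,940.3,941.06,938.98,939.99,0,-999999"
                            IF PARSECOUNT(sLine, ",") <> 8 THEN
                               ' unexpected layout: log it, skip it, or re-check the heading line
                            END IF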
                            Last edited by Nick Luick; 23 Jun 2009, 03:44 PM.

