Problem with read in large freefile

  • #41
    ???
    CSV is a string file format.
    DOUBLE is a floating point binary type format.
    I don't see any VAL() functions for converting number strings in a file (like ", 15.345,") to binary, so I am confused.

    Cheers,
    Dale



    • #42
      The CSV has been read into memory in an earlier SUB() and converted to DOUBLE.
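      In outline, that field-to-DOUBLE step is what VAL() does: it takes the text of a numeric field and returns its binary floating point value. A rough equivalent for illustration only, sketched in Python rather than the PowerBASIC used in this thread (`val_like` and the sample row are made up here):

```python
def val_like(field: str) -> float:
    """Parse a CSV field such as " 15.345" into a binary double, returning
    0.0 when the text doesn't parse.  (PowerBASIC's VAL() likewise returns 0
    for non-numeric text, though it also tolerates trailing garbage.)"""
    try:
        return float(field.strip())
    except ValueError:
        return 0.0

# Converting one record's fields to DOUBLEs:
row = " 15.345, 2.5, 100".split(",")
doubles = [val_like(f) for f in row]   # [15.345, 2.5, 100.0]
```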



      • #43
        Okay. Except for page 1, I only saw Stuart use VAL().

        Cheers,
        Dale



        • #44
          'today' is similarly not explained. Is it some kind of GLOBAL?
          Michael Mattias
          Tal Systems (retired)
          Port Washington WI USA
          [email protected]
          http://www.talsystems.com



          • #45
            Originally posted by Michael Mattias
            'today' is similarly not explained. Is it some kind of GLOBAL?
            It's clearly explained if you care to actually read the code. It's a parameter to the function.

            FUNCTION percentAway(today AS DOUBLE, yest AS DOUBLE, dr AS LONG) AS DOUBLE

            And is a value from the data array passed as:

            percentAway(csvdata(mp, checkstatscol), REFCOLMIN(p), 2)



            • #46
              It's a parameter to the function.
              Oh boy, and they say the memory is the second thing to go, but I'm more thinking it's the eyesight.

              Duh.

              It's clearly explained if you care to actually read the code.
              And just how did I find the variable 'today' was missing? BY READING, however incorrectly.


              Michael Mattias
              Tal Systems (retired)
              Port Washington WI USA
              [email protected]
              http://www.talsystems.com



              • #47
                I have not really followed this thread in detail, but confronted with a task of this type and trying to stick to normal BASIC capabilities, I would be inclined to chop the original file into manageable chunks (10 MB - 100 MB), replace the ASCII 10s in each chunk with CRLF pairs, then write each converted chunk back to a new file. You could then use LINE INPUT #1 to process the file line by line. There are certainly more exotic ways to do this, but if you want to stick to normal BASIC, it can do this job OK.

                I don't know what is being done with each line of data, but I would imagine that processing is slower than the read time for each line, so LINE INPUT #1 is probably fast enough.
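                For illustration only, the same chunk-and-replace idea sketched in Python rather than PowerBASIC (file names here are made up). Because an ASCII 10 is a single byte, a replacement can never straddle a chunk boundary, so no state needs to be carried between chunks:

```python
CHUNK = 10 * 1024 * 1024  # 10 MiB per chunk, in the suggested 10 MB - 100 MB range

def lf_to_crlf(src: str, dst: str, chunk_size: int = CHUNK) -> None:
    """Convert an LF-delimited file to CRLF by processing fixed-size chunks,
    so the whole file never has to fit in memory."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            # LF is one byte, so the replace is safe at chunk boundaries.
            fout.write(chunk.replace(b"\n", b"\r\n"))
```

                The converted copy can then be processed line by line, which is what LINE INPUT #1 gives you in BASIC.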
                hutch at movsd dot com
                The MASM Forum

                www.masm32.com



                • #48
                  I have written a quick tool that should be useful in converting ASCII 10 (Unix-style) delimiters to standard ASCII 13+10 delimiting. There are two files here: for testing, the first writes a 20-million-line test file that is ASCII 10 delimited. The second converts the ASCII 10 delimited file to standard 13+10 delimiting.

                  The file that writes the 20 million lines is slow, but at least it has a crude progress indicator. The second file, which does the conversion, is fast enough to be useful. The output file can then be opened and run through using LINE INPUT #1 (or whatever file number).

                  Code:
                  ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                  
                      #compile exe "mkfile.exe"
                  
                  ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                  
                  FUNCTION PBmain as LONG
                  
                      LOCAL icnt as DWORD
                      LOCAL tcnt as DWORD
                  
                      icnt = 0
                      tcnt = 0
                  
                      Open "bigfile.txt" for Output as #1
                  
                      StdOut "Warning, this is a slow operation writing 20 million lines"+$CRLF
                  
                      Do
                        tcnt = tcnt + 1
                        If tcnt > 250000 Then
                          Stdout ".";                             ' progress indicator
                          tcnt = 0
                        End If
                  
                        Print #1, "Record number"+str$(icnt)+" Your Record Data"+chr$(10);
                        icnt = icnt + 1
                      Loop While icnt < 20000000
                  
                      Close #1
                  
                  End FUNCTION
                  
                  ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                  The convertor.
                  Code:
                  ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                  
                      #compile exe "cnvt.exe"
                  
                      MACRO BufSize = 1024*1024*10
                  
                  ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                  
                  FUNCTION PBmain as LONG
                  
                      LOCAL cloc as QUAD
                      LOCAL flen as QUAD
                      LOCAL lcnt as QUAD
                  
                      Open "bigfile.txt" for Binary as #1
                      Open "outfile.txt" for Append as #2
                  
                      flen = lof(1)                                       ' get the file length
                      lcnt = flen \ BufSize                               ' integer divide
                      lcnt = lcnt + 1                                     ' plus 1
                  
                      cloc = 1                                            ' first byte of a Binary file is position 1
                  
                      Do
                        Seek #1, cloc                                     ' seek from current location
                        Get$ #1,BufSize,FileChunk$                        ' get user defined chunk
                        Replace chr$(10) with chr$(13,10) in FileChunk$   ' replace ascii 10 with ascii 13 10
                        Print #2, FileChunk$;                             ' append the chunk to the last one
                        cloc = cloc + BufSize                             ' increment current location by buffer size
                        lcnt = lcnt - 1
                      Loop while lcnt > 0
                  
                      Close #2
                      Close #1
                  
                  End FUNCTION
                  
                  ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                  hutch at movsd dot com
                  The MASM Forum

                  www.masm32.com



                  • #49
                    This will cut down the file-creation time by using a string builder to fill a buffer, writing once every %RecordsPerWrite records to reduce the number of writes.
                    If %RecordsPerWrite is too many millions, the program will crash without notice. Going over a million will not increase performance.
                    The string builder automatically writes the correct number of bytes on the last write if any bytes are left in the buffer.
                    Added TIMER to show how long it takes.
                    Increasing sb.capacity had little impact on the performance times, so I did not include it. Maybe it should be increased from 1024 bytes.
                    Code:
                    %Records         = 20000000 '20-million records
                    %RecordsPerWrite = 400000   '15 seconds on i7-2600K
                    
                    FUNCTION PBMAIN AS LONG
                     LOCAL x  AS LONG
                     LOCAL StartTime AS SINGLE
                     LOCAL sb AS ISTRINGBUILDERA
                     sb = CLASS "STRINGBUILDERA"
                     OPEN "bigfile.txt" FOR OUTPUT AS #1
                     StartTime = TIMER
                     FOR x = 1 TO %Records
                      sb.add CHR$("Record number",STR$(x)," Your Record Data",CHR$(10))
                      IF x MOD %RecordsPerWrite = 0 THEN PRINT #1,sb.string;:sb.clear:SLEEP 10
                     NEXT
                     IF sb.len THEN PRINT #1,sb.string;:sb.clear 'last write(if needed)
                     ? USING$("Bytes #,  Time #.##",LOF(1),TIMER-StartTime),%MB_SYSTEMMODAL
                     CLOSE #1
                    END FUNCTION
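                    The same batching idea sketched in Python for comparison (the STRINGBUILDERA class is PowerBASIC-specific; `write_records` and its defaults here are illustrative):

```python
def write_records(path: str, records: int, per_write: int = 400_000) -> None:
    """Accumulate lines in a list and flush them with one write every
    `per_write` records, cutting the number of file writes from `records`
    down to roughly records / per_write."""
    buf = []
    with open(path, "w", newline="") as f:
        for x in range(1, records + 1):
            buf.append(f"Record number {x} Your Record Data\n")
            if x % per_write == 0:
                f.write("".join(buf))
                buf.clear()
        if buf:                     # final partial buffer, like the sb.len check
            f.write("".join(buf))
```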



                    • #50
                      Originally posted by Stuart McLachlan
                      (If your files are CRLF and you don't have to convert them in the application, it will work with any size of data files - the only memory constraints are in the LFtoCRLF function)
                      '
                      Code:
                       FUNCTION LFtoCRLF(s AS STRING) AS LONG
                       LOCAL ff AS LONG
                       LOCAL strBuf AS STRING
                       ff = FREEFILE
                       OPEN s FOR BINARY AS #ff
                       GET$ #ff,LOF(#ff),strBuf
                       REPLACE $LF WITH $CRLF IN strBuf
                       CLOSE #ff
                       OPEN "crlf-" & s FOR BINARY AS #ff
                       PUT$ #ff, strBuf
                       CLOSE #ff
                       END FUNCTION
                      '
                      To overcome any memory problems with the LF to CRLF conversion, this does the job for any size of file
                      (function wrapped in a compilable "test harness"):
                      '
                      Code:
                      #COMPILE EXE
                      #DIM ALL
                      FUNCTION PBMAIN AS LONG
                          LOCAL t AS DOUBLE
                          LOCAL strResult AS STRING
                          t = TIMER
                          strResult = LFToCRLF( "MyData1.csv")
                          t = TIMER - t
                          ? strResult & " created in " & FORMAT$(t,"0.0") & " seconds"
                      END FUNCTION
                      
                      FUNCTION LFtoCRLF(s AS STRING) AS STRING
                          LOCAL qFileLen AS QUAD
                           LOCAL lngChunkSize, lngLoops, lngLoopCount, ff1, ff2 AS LONG
                           LOCAL strOut, strChunk AS STRING
                      
                          lngChunkSize = 10485760  '10 MiBytes
                          strOut = PATHNAME$(PATH,s) & PATHNAME$(NAME,s) & "-CRLF" & PATHNAME$(EXTN,s)
                          IF ISFILE(strOut) THEN KILL strOut
                           ff1 = FREEFILE
                           OPEN s FOR BINARY AS #ff1
                           ff2 = FREEFILE                  ' FREEFILE returns the same number until a file is opened on it
                           OPEN strOut FOR BINARY AS #ff2
                          qFileLen = LOF(#ff1)
                          lngLoopCount = (qFileLen \ lngChunkSize ) + 1 'Number of full sized chunks to fetch plus final partial chunk if not exact match
                      
                           FOR lngLoops = 1 TO lngLoopCount
                            GET$ #ff1,lngChunkSize,strChunk 'Get from first byte or first byte past last GET$
                            REPLACE $LF WITH $CRLF IN strChunk
                            PUT$ #ff2, strChunk
                          NEXT
                          CLOSE #ff2:CLOSE #ff1
                          FUNCTION = strOut
                      END FUNCTION
                      '
                      Last edited by Stuart McLachlan; 15 Sep 2021, 06:24 AM.
