Put large string into an array. Can't remember...

  • Put large string into an array. Can't remember...

    Is there a command (that I have forgotten) to split a big delimited string and put it in an array with one command?

    I know about PARSE$ and a loop. Thought I remembered seeing a way to do it in one step.

    AH!!! FOUND IT!!
    PARSE start$, target$()
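
    A minimal sketch of that pattern (illustrative data; the array is sized with PARSECOUNT first):

    Code:
    LOCAL sList AS STRING
    LOCAL n AS LONG
    sList = "red,green,blue"
    n = PARSECOUNT(sList)         ' comma is the default delimiter
    DIM sField(1 TO n) AS STRING
    PARSE sList, sField()         ' sField(1) = "red", sField(2) = "green", ...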

    Problem is this code hangs at GET$ if the file is huge.

    Code:
    #COMPILE EXE
    #DIM ALL
    
    FUNCTION PBMAIN () AS LONG
    
    LOCAL IN_STRING AS STRING
    LOCAL COUNTER AS LONG
    
    OPEN "output.txt" FOR BINARY AS #1
    
    COUNTER = LOF(1)
    
    GET$ #1, LOF(1), IN_STRING
    
    CLOSE #1
    
    END FUNCTION

  • #2
    How "huge"? Are you using #OPTION LARGEMEM32?

    What's the structure of "output.txt"? How are you creating it?

    • #3

      Problem is this code hangs at GET$ if the file is huge.
      If the file is that big and you want to put it into an array, that is even more memory. What is the purpose?
      1) OptTech sort could create an index for any or all of the columns, or just sort the whole file on any columns with one line of code.
      2) SQLite, Tsunami or Cheetah, and never need delimited columns.
      3) Disk index with Ptree, DvBtree, memory mapped, roll your own.

      • #4
        Check out FILESCAN.
        Rod
        In some future era, dark matter and dark energy will only be found in Astronomy's Dark Ages.

        • #5
          Is the file size sometimes larger than (2^31) - 1 ?

          A single string can be 2G, so Count& in GET$ is a long. Files may be larger, so LOF() can return a QUAD.

          Just can't put a larger file in a single string.
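
          As a sketch, the size check can be done in a QUAD before attempting a single-string read (file name and limit shown are illustrative):

          Code:
          LOCAL qSize AS QUAD
          LOCAL sData AS STRING
          OPEN "output.txt" FOR BINARY AS #1
          qSize = LOF(1)              ' QUAD, so files over 2 GiB report correctly
          IF qSize > &H7FFFFFFF THEN  ' more than a LONG (and a single string) can hold
            ? "Too large for one dynamic string"
          ELSE
            GET$ #1, qSize, sData
          END IF
          CLOSE #1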

          Cheers,
          Dale

          • #6
            Originally posted by Dale Yarker View Post
            A single string can be 2G,
            Further discussion on practical limits to dynamic string size here:
            https://forum.powerbasic.com/forum/u...ile#post790717

            Note the last post where I ran some tests and fell short of the 2GiB string size by 262,182 bytes.
            I just ran the same test on a different laptop and this time it fell short by 1,799,318 bytes.

            so LOF() can return a QUAD.
            Actually "does", not just "can".

            • #7
              Read any size csv file into a SQLite table.
              Many optimizations could be done and are welcome. Goal was to allow any file size.
              Reads first line of csv file to determine number of columns and creates table with column names C1, C2, etc.

              In the real world, the CREATE statement should be written manually with correct names and column types,
              since columns are not automatically created as INTEGER or TEXT. Say column 3, 'C3', is a numeric column.
              Example: SELECT c1, c2, cast(c3 as integer) as Balance from table1 where Balance > 0 order by Balance
              Code:
              #DIM ALL       'csv2sqlite.bas  1/4/22
              #INCLUDE "sqlitening.inc"
              FUNCTION PBMAIN AS LONG
                LOCAL s, sCSVFile,sDataBaseName,sTableName AS STRING
                LOCAL counter,DropTable,QuoteNumbers AS LONG
              
                sDataBaseName = "junk.db3"  'sqlite database to write to
                sTableName    = "table1"    'table to write to
                DropTable     = 1           '0=append, 1= start fresh
                QuoteNumbers  = 1           'enclose all columns with $SQ (only while testing, not needed)
              
                'create csv file
                sCSVFile      = "CSV.csv"   'input csv file
                OPEN sCSVFile FOR OUTPUT AS #1
              
                FOR counter = 1 TO 15
                 s = FORMAT$(counter)
                 WRITE #1, s,"now","brown","3","Mike","was",9,"Here"
                NEXT
                CLOSE #1
              
                Csv2Sqlite sCsvFile,sDatabaseName,sTableName,DropTable,QuoteNumbers
              
                ? slSelStr("select count(*) from " + sTableName),,"Records"
              
               END FUNCTION
              '
              FUNCTION Csv2SQLite ( _
                sCSVFILE AS STRING, _
                sDataBaseName AS STRING , _
                sTableName    AS STRING,  _
                DropTable     AS LONG,    _
                QuoteNumbers  AS LONG) AS LONG
              
                'Create table with sColumnNames$ = "C1,C2, ..."
              
                LOCAL x, hFile,cols AS LONG
                LOCAL sInputLine,sColumnNames,sOutputLine,s,sInsert AS STRING
                hFile = FREEFILE
                 OPEN sCSVFile FOR INPUT AS #hFile
                 IF ERR THEN ? ERROR$(ERRCLEAR):EXIT FUNCTION
                LINE INPUT #hFile, sInputLine 'read first line
                cols = PARSECOUNT(sInputLine) 'number of columns
                CLOSE #hFile
              
                'first line could be column names, using C1,C2, ..."
                FOR x = 1 TO cols
                  sColumnNames = sColumnNames + "C" + FORMAT$(x)+ ","
                NEXT
                sColumnNames = LEFT$(sColumnNames,-1) 'your column names
              
                slOpen sDataBaseName,"C"
                IF DropTable THEN slexe "Drop table if exists " + sTableName
                slexe "Create Table if not exists " + sTableName + "(" + sColumnNames + ")"
                slexe "BEGIN EXCLUSIVE"
                hFile = FREEFILE
                OPEN sCSVFile FOR INPUT AS #hFile
                DO UNTIL EOF(#hFile) 'process line
                  LINE INPUT #hFile, sInputLine
                  RESET sOutputLine
                  FOR x = 1 TO cols
                     s = PARSE$(sInputLine,x)
                     IF QuoteNumbers THEN         'enclose all columns in $SQ
                       REPLACE $SQ WITH $DQ IN s
                       s = $SQ & s & $SQ
                     ELSEIF IsNumeric(s) = 0 THEN 'only enclose strings with $SQ
                       REPLACE $SQ WITH $DQ IN s
                       s = $SQ & s & $SQ
                     END IF
                     sOutputLine = sOutputLine + s + ","
                  NEXT
                  sOutPutLine = LEFT$(sOutputLine,-1)
                  sInsert = "Insert into " + sTableName + " values (" + sOutputline + ")"
                  REM ? sOutputLine:end 'look at first output record and end
                  slExe sInsert
                  IF slGetChangeCount <> 1 THEN
                    ? sInsert,,"Error":slExe "Abort":CLOSE #hFile:EXIT FUNCTION
                  END IF
                LOOP
                slEXE "END"
              
              END FUNCTION
              '
              FUNCTION isNumeric(Answer AS STRING) AS LONG
                Answer = TRIM$(Answer) 'may be required if right-justified text
                IF (LEN(Answer) = 0)                  OR _
                   (VERIFY (Answer , ".-0123456789")) OR _
                   (RIGHT$(Answer,1) = ".")           OR _
                   (INSTR(-1,Answer,"-") > 1)         OR _
                   (TALLY(Answer,".") > 1) THEN
                   EXIT FUNCTION  'exit, return 0 not-numeric
                END IF
                FUNCTION = -1 'numeric
              END FUNCTION
              Mike Doty
              Member
              Last edited by Mike Doty; 4 Jan 2022, 09:53 AM.

              • #8
                Originally posted by Mike Doty View Post
                Read any size csv file into a SQLite table. csv2Sqlite.bas 1/4/22
                Many optimizations could be done to this, but it does the job.
                Reads first line of csv file to determine number of columns and creates table with column names C1, C2, etc.
                Code:
                ...
                OPEN sCSVFile FOR INPUT AS #hFile
                .....
                LINE INPUT #hFile, sInputLine 'read first line
                cols = PARSECOUNT(sInputLine) 'number of columns
                ..
                That presupposes that Dave's file is a plain ANSI file composed of a series of CRLF-delimited lines/records containing comma-separated values/fields.
                All we've been told is that he has a "huge" file which presumably contains his "delimited string" (format unspecified).

                • #9
                  WRT the supposed 2GiB string limit, according to PB's documentation:
                  If a string is shorter than 65535 bytes, it starts with a 2-byte length WORD followed by the string data. Otherwise, it will start with a 2-byte value of 65535, followed by a DWORD indicating the string length, then finally the string data itself.
                  Since a DWORD has a maximum value of 2^32 - 1 (4,294,967,295 bytes, just under 4 GiB), if enough of the approximately 3GiB of memory available with #OPTION LARGEMEM32 were free, you should actually be able to have a string larger than 2GiB.

                  • #10
                    In 64-bit you would just plonk the data in memory, but with the slightly-under-2-gig absolute limit in Win32 you need another approach if the data is over 2 gig. In days of old you would open a file, write sequentially to it, and read it back as you needed it, using the seek capability for location. There is another option, a memory-mapped file with a sliding window, but it's fussy stuff to write and easy to make a mess of. Put the data on a gen 4 NVMe drive and you can probably get reasonable performance.
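
                    The sequential read-back can be sketched like so (chunk size and file name are illustrative; the processing step is a stub):

                    Code:
                    LOCAL sChunk AS STRING
                    LOCAL qPos, qSize AS QUAD
                    OPEN "output.txt" FOR BINARY AS #1
                    qSize = LOF(1)
                    qPos = 1
                    DO WHILE qPos <= qSize
                      GET$ #1, MIN(16777216, qSize - qPos + 1), sChunk  ' 16 MiB per pass
                      '...process sChunk here...
                      qPos = qPos + LEN(sChunk)
                    LOOP
                    CLOSE #1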
                    hutch at movsd dot com
                    The MASM Forum

                    www.masm32.com

                    • #11
                      Notes on post #7 of the SQLitening CSV file to table. No change to the code.
                      In the real world, the CREATE statement should be written manually with correct names and column types,
                      since columns are not automatically created as INTEGER or TEXT. Say column 3, 'C3', is a balance column.

                      This statement was generated
                      Create Table Table1(c1,c2,c3) 'automatically generated

                      Create Table Table1(c1 text,c2 text, c3 integer) 'better
                      Create Table Table1(FirstName Text, LastName Text, Balance Integer) 'best

                      Examples, if the CREATE statement was automatically generated, of getting correct results using CAST on a numeric column while testing:
                      SELECT c1, c2, cast(c3 as integer) from table1 where c3 > 0 order by c3;
                      SELECT c1 as FirstName,c2 as LastName, Cast(c3 as integer) as Balance from table1 where Balance > 0 order by Balance;
                      SELECT c1 as FirstName,c2 as LastName, Cast(c3 as integer) as Balance from table1 where Balance > 0 order by Balance Desc;

                      • #12
                        Thanks everybody.

                        Looks like #OPTION LARGEMEM32 is the solution. I have never used that before!
                        Next I will try it with a bigger file!

                        I actually have it in SQL server to mess with. I was playing with how fast arrays can be compared to a database.

                        This is what the data looks like:
                        C:\DumpStack.log.tmp
                        C:\hiberfil.sys
                        C:\pagefile.sys
                        C:\swapfile.sys
                        C:\Asus\E16228_ROG_ZENITH_II_EXTREME_ALPHA_UM_WEB.pdf
                        C:\Asus\New Text Document.txt
                        C:\Asus\web_location.txt
                        C:\David Personal\Eye_Doctor\12_10_2020.pdf
                        C:\Everything\Everything.db
                        C:\Everything\Everything.exe

                        I am making database rows of every file on every disk in my network.


                        • #13
                          Problem is this code hangs at GET$ if the file is huge.
                          Why do you need the entire file data in a string at one time? Any chance you can work on it in pieces? How about working on it 'as it lays' on disk?

                          (I won't ask you where your error handler is... or at least your check of COUNTER for being "too big" to load.)
                          Michael Mattias
                          Tal Systems (retired)
                          Port Washington WI USA
                          [email protected]
                          http://www.talsystems.com

                          • #14
                            Read any size csv file into a SQLite table.
                            Using either OLE/ADO (MS-Text provider) or the Microsoft Text ODBC driver, you can work on it using SQL statements without the need to load it to a database.
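
                            For reference, a typical connection string for the Microsoft Text ODBC driver looks something like this (directory and file names are illustrative; each file in the Dbq folder is treated as a table):

                            Code:
                            Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=C:\data\;Extensions=asc,csv,tab,txt;

                            SELECT COUNT(*) FROM output.txt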

                            Just sayin'...
                            Michael Mattias
                            Tal Systems (retired)
                            Port Washington WI USA
                            [email protected]
                            http://www.talsystems.com

                            • #15
                              Michael: Mostly I am just fooling around seeing what I can do with large arrays and large chunks of data without using a database.

                              "Using either OLE/ADO (MS-Text provider) or the Microsoft Text ODBC driver, you can work on it using SQL statements without the need to load it to a database."
                              That is something I didn't know! Thanks!


                              This code runs on 316472 rows in a split second!

                              Code:
                              #COMPILE EXE
                              #DIM ALL
                              
                              FUNCTION PBMAIN () AS LONG
                              
                              ON ERROR GOTO Error_Control
                              
                              LOCAL IN_STRING AS STRING
                              LOCAL COUNTER AS LONG
                              
                              OPEN "output.txt" FOR BINARY AS #1
                              
                              COUNTER = LOF(1)
                              
                              GET$ #1, LOF(1), IN_STRING
                              
                              CLOSE #1
                              
                              COUNTER = PARSECOUNT(IN_STRING, $CRLF)
                              
                              MSGBOX FORMAT$(COUNTER)
                              EXIT FUNCTION
                              
                              Error_Control:
                               MSGBOX "Error " + FORMAT$(ERR)
                              
                              END FUNCTION

                              • #16
                                Current example:


                                Code:
                                #COMPILE EXE
                                #DIM ALL
                                
                                FUNCTION PBMAIN () AS LONG
                                
                                ON ERROR GOTO Error_Control
                                
                                LOCAL IN_STRING AS STRING
                                LOCAL COUNTER AS LONG
                                LOCAL SCAN_RESULT AS LONG
                                
                                OPEN "output.txt" FOR BINARY AS #1
                                
                                COUNTER = LOF(1)
                                
                                GET$ #1, LOF(1), IN_STRING
                                
                                IN_STRING = RTRIM$(IN_STRING, $CRLF)
                                
                                CLOSE #1
                                
                                COUNTER = PARSECOUNT(IN_STRING, $CRLF)
                                
                                 DIM TEST_ARRAY(1 TO COUNTER) AS STRING
                                
                                PARSE IN_STRING, TEST_ARRAY(), $CRLF
                                
                                MSGBOX TEST_ARRAY(COUNTER)
                                
                                ARRAY SCAN TEST_ARRAY(), COLLATE UCASE, = "C:\ZTree\ZTWIN.BAT", TO SCAN_RESULT
                                MSGBOX FORMAT$(SCAN_RESULT)
                                
                                
                                MSGBOX FORMAT$(COUNTER)
                                EXIT FUNCTION
                                
                                Error_Control:
                                 MSGBOX "Error " + FORMAT$(ERR)
                                
                                END FUNCTION

                                • #17
                                  David,

                                  You should be aware that when you parse the string into an array, you need enough available memory to store the data twice - once for the imported string and once for the array. I suspect that you will run into problems with any more than about 800MB of data in the input file.
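
                                  A sketch of trimming the peak once parsing is done (names follow the earlier posts; during PARSE itself the data still exists twice):

                                  Code:
                                  PARSE IN_STRING, TEST_ARRAY(), $CRLF
                                  IN_STRING = ""  ' release the source string; only the array copy remains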

                                  • #18
                                    Should check for file-not-found and 0-length files to avoid errors.
                                    FILESCAN in the help file does not check for errors.
                                    Added 2 more error checks.
                                    Code:
                                    #DIM ALL
                                    FUNCTION PBMAIN AS LONG
                                     LOCAL ecode AS LONG
                                     LOCAL sFile AS STRING
                                     LOCAL sArray() AS STRING
                                     ecode = FileScan2(sFile,sArray())
                                    END FUNCTION
                                    
                                    FUNCTION FileScan2(sFileName AS STRING, TheData() AS STRING) AS LONG
                                     LOCAL hFile AS LONG
                                     LOCAL lcount AS LONG
                                     IF ISFILE(sFileName) = 0 THEN         FUNCTION = -53:EXIT FUNCTION
                                      hFile = FREEFILE   'get a free file number
                                      OPEN sFileName FOR INPUT AS #hFile
                                     IF ERR THEN                           FUNCTION = ERR:EXIT FUNCTION      'added this later  1          
                                     IF LOF(hFile) = 0 THEN CLOSE #hFile  :FUNCTION = -99:EXIT FUNCTION
                                     FILESCAN #hFile, RECORDS TO lcount
                                     IF lCount = 0 THEN CLOSE hFile       :FUNCTION =-100:EXIT FUNCTION
                                     DIM TheData(1 TO lcount) AS STRING
                                     LINE INPUT #hFile, TheData() TO lcount
                                     IF ERR THEN                           FUNCTION = ERR   'added this later  2          
                                     CLOSE hFile
                                    END FUNCTION
                                    Mike Doty
                                    Member
                                    Last edited by Mike Doty; 4 Jan 2022, 11:29 PM.

                                    • #19
                                      Good points guys!
                                      Thanks!

                                      • #20
                                        Originally posted by Stuart McLachlan View Post
                                        David,

                                        You should be aware that when you parse the string into an array, you need enough available memory to store the data twice - once for the imported string and once for the array. I suspect that you will run into problems with any more than about 800MB of data in the input file.
                                         One function for normal files and another for large files (based upon the LOF of the file).
                                         Putting FILESCAN into another function, called CountLines, might release the extra temporary memory on END FUNCTION?

                                         David,
                                         Curious if you can create much larger arrays with this?
                                         If this helps, the next step would be optimizing the line-at-a-time read; code for that has been posted somewhere on the forum.

                                        Code:
                                        #DIM ALL
                                        FUNCTION PBMAIN AS LONG
                                         LOCAL sFileName AS STRING
                                         LOCAL lcount,linenum,hfile,maxbytes AS LONG
                                        
                                         sFileName = "david.txt"
                                         lcount = CountLines(sFileName)
                                         IF lcount < 1 THEN ? "Error" + STR$(lcount):EXIT FUNCTION
                                        
                                         hFile = FREEFILE
                                         OPEN sFileName FOR INPUT AS #hFile
                                         REDIM TheData(1 TO lcount) AS STRING
                                        
                                         MaxBytes = 1   'set to something like 800,000,000
                                        
                                         'LARGE file read
                                         IF LOF(#hfile) > MaxBytes THEN        'large file function
                                          FOR linenum = 1 TO lcount            'not optimized
                                           LINE INPUT #hFile, TheData(linenum) 'fill element
                                          NEXT
                                          ? "Lines"+STR$(lcount),,"Read a line at a time"
                                        
                                         'NORMAL file read
                                         ELSE  
                                          LINE INPUT #hFile, TheData() TO lcount
                                          ? "Lines"+STR$(lcount),,"Read all at once"
                                         END IF
                                        
                                         CLOSE #hFile
                                        END FUNCTION
                                        '_________________________________________________________________
                                        FUNCTION CountLines(sFile AS STRING) AS LONG
                                         LOCAL hFile,lcount  AS LONG
                                         hFile = FREEFILE
                                         OPEN sFile FOR INPUT AS #hFile
                                         IF ERR THEN FUNCTION = -ERR:EXIT FUNCTION
                                          IF LOF(hFile) = 0 THEN CLOSE #hFile:FUNCTION = -100:EXIT FUNCTION
                                         FILESCAN #hFile,RECORDS TO lcount
                                         CLOSE #hFile
                                         FUNCTION = lCount
                                        END FUNCTION
