Announcement

Collapse
No announcement yet.

ARRAY DELETE on selected rows

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Mike Doty
    replied
    Code:
    'John Gleason code put into a SUB for reuse and timing tests
    '766 ticks on this machine
     
    #COMPILE EXE  'ParseCSVDump.Bas
    #DIM ALL
    %TimeIt = 1
    DECLARE FUNCTION GetTickCount LIB "KERNEL32.DLL" ALIAS "GetTickCount" () AS DWORD
    FUNCTION PBMAIN () AS LONG
        LOCAL ii AS LONG
        LOCAL lArraySize AS LONG
        LOCAL sMatchString AS STRING
     
        lArraySize = 500000
        sMatchString = "68"
        DIM arrCsv(lARRAYSIZE) AS STRING
        FOR ii = 0 TO lARRAYSIZE     'creates data array where ~1/2 the records have 12th field = 68
           arrCsv(ii) = USING$("#",ii) & ",10001,10002,10000,10001,10002,10000,10001,10002,10000,10001," & _
           USING$("#", RND(68, 69)) & ",10003,10004,10005"
        NEXT
     
        ParseDump arrCSV(),12,sMATCHSTRING
     
        ? "Elements after parse" + STR$(UBOUND(arrCSV))'249842 elements in 765 ticks
        SLEEP 3000  'for PBCC users
        'then do calculations
        'finally do the sort and save.
     END FUNCTION
     
    SUB ParseDump(arrCSV() AS STRING, FieldNumber AS LONG, sMatchString AS STRING)
       'Code by John Gleason placed into a SUB with optional timing test
     
       LOCAL ii              AS LONG
       LOCAL WriteElement    AS LONG
     
       #IF %TimeIt
         'LOCAL CycleCount      AS QUAD  'TIX
         LOCAL StartTickCount  AS DWORD
         LOCAL EndTickCount    AS DWORD
         StartTickCount = GetTickCount
       #ENDIF
     
        'now quickly dump any records that have 12th field <> sMatchString and compress the empty space...
        FOR ii = 0 TO UBOUND (arrCSV)
           IF PARSE$(arrCsv(ii), ",", FieldNumber) = sMATCHSTRING THEN
              arrCsv(writeElement) = arrCsv(ii)
              INCR writeElement
           END IF
        NEXT
        REDIM PRESERVE arrCsv(writeElement - 1)
     
        #IF %TimeIt
          EndTickCount = GetTickCount
          'TIX END CycleCount&&  'TIX method
          '249842 elements in 766 ticks
           ? "Elements" + STR$(UBOUND(arrCSV)) +  "  ticks" + STR$(EndTickCount-StartTickCount)
        #ENDIF
     
    END SUB

    Leave a comment:


  • John Gleason
    replied
    Here's a more complete example:
    Code:
    #COMPILE EXE
    #DIM ALL
    %ARRAYSIZE = 500000
    $MATCHSTRING = "68"
    
    FUNCTION PBMAIN () AS LONG
        LOCAL ii, writeElement AS LONG
        DIM arrCsv(%ARRAYSIZE) AS STRING
    
        FOR ii = 0 TO %ARRAYSIZE     'creates data array where ~1/2 the records have 12th field = 68
           arrCsv(ii) = USING$("#",ii) & ",10001,10002,10000,10001,10002,10000,10001,10002,10000,10001," & _
           USING$("#", RND(68, 69)) & ",10003,10004,10005"
        NEXT
        ? "Created test data, ready to go..."
    
        'now quickly dump any records that have 12th field <> 68 and compress the empty space...
        FOR ii = 0 TO %ARRAYSIZE
           IF PARSE$(arrCsv(ii), ",", 12) = $MATCHSTRING THEN
              arrCsv(writeElement) = arrCsv(ii)
              INCR writeElement
           END IF
        NEXT
    
        REDIM PRESERVE arrCsv(writeElement - 1)
        ? "Done." & STR$(UBOUND(arrCsv())) & " final array elements"
        
        'then do calculations
        'finally do the sort and save.
    
    END FUNCTION

    Leave a comment:


  • Erich Schulman
    replied
    This project is unrelated to the one from last year, even though they do a few similar things.

    I read in the CSV file into the array and discard the lines I don't need. I use the shortened array as though it were the original. Then, with a parse-replace routine, I perform calculations and other transformations on each line as needed. The sort comes last, before the new CSV is written.

    Leave a comment:


  • Michael Mattias
    replied
    > so I can reuse your answer at .....

    That answer was based on your question as stated: READING a CSV-file in an order dependent on the values of multiple columns.

    You don't talk about DELETING at all.

    See my 'updated' post at #7 here... that's the kind of stuff which happens when you try to use previously-developed code for another purpose.

    Here of course, to delete, I would just use the sort key: by adding the 'target' columns to the key, you could get 'em all together and not have to traverse the array twice.

    That, or maybe just not include in the sort key any rows which would be deleted based on selected columns' values. Why bother sorting 'em if they will be gone anyway?

    MCM

    Leave a comment:


  • John Gleason
    replied
    >you made the job more complicated than it needed to be...

    Complicated can be in the eyes of the beholder, but depending on whether you're removing a few or a lot of records, the code I posted may perform a couple orders of magnitude faster.

    Leave a comment:


  • Erich Schulman
    replied
    Originally posted by Michael Mattias View Post
    Personally I doubt I would have even considered working with an array of CSV-strings if I needed to delete rows, except perhaps to use a TWO-dimensional array subscripted on row and column.
    I am using the single dimension array so I can reuse your answer at http://www.powerbasic.com/support/pb...ad.php?t=37839

    Sorting is the last thing done before writing to disk. The filtering of rows (the 68 thing) is done early on.

    Leave a comment:


  • Michael Mattias
    replied
    >you asked to delete the element if the column <> 68....

    <Stuff> happens... like not recognizing you made the job more complicated than it needed to be...

    Code:
    ub    = UBOUND (myArray,1) 
    lb     = LBOUND(myarray,1)
    iSub = lb
    DO  WHILE iSub <=Ub 
       IF PARSE$(MyArray (iSub), column_no) <> 68 THEN 
             ARRAY DELETE MyArray (iSub) 
             DECR ub 
      ELSE 
             INCR iSUB
      END IF
    LOOP 
    IF Lb = 0 THEN 
         REDIM PRESERVE myArray (ub)
    ELSE
         REDIM PRESERVE myArray (lb TO UB) 
    END IF
    MCM

    Leave a comment:


  • John Gleason
    replied
    Here is a way:
    Code:
    #COMPILE EXE
    #DIM ALL
    
    FUNCTION PBMAIN () AS LONG
        LOCAL ii, writeElement AS LONG
        DIM arrCsv(10000) AS STRING
        
        FOR ii = 0 TO 10000
           arrCsv(ii) = "10000,10001,10002," & USING$("#", ii AND 127) & ",10003,10004,10005"
        NEXT
    
        'now dump any records that have 4th field [COLOR="Red"]<>[/COLOR] 68 and compress the empty space...
    
        FOR ii = 0 TO 10000
           IF INSTR(18, arrCsv(ii), "68") [COLOR="Red"]<>[/COLOR] 0 THEN
              arrCsv(writeElement) = arrCsv(ii)
              INCR writeElement
           END IF
        NEXT
    
        
        REDIM PRESERVE arrCsv(writeElement - 1)
        ? "done"
    
    END FUNCTION
    added: I just noticed, you asked to delete the element if the column <> 68, but I deleted if it = 68. So I changed it above in red.
    Last edited by John Gleason; 27 Feb 2009, 01:08 PM.

    Leave a comment:


  • Michael Mattias
    replied
    >But that just seems like a terrible way to accomplish the task

    Well, maybe you can think of another way.

    Personally I doubt I would have even considered working with an array of CSV-strings if I needed to delete rows, except perhaps to use a TWO-dimensional array subscripted on row and column.

    know that I could DIM a temporary array of equal size, copy the rows I want to keep, REDIM the original array without PRESERVE, copy back from the temp array, and ERASE the temp array.
    You don't need a temp array at all.
    Code:
    ub    = UBOUND (myArray,1) 
    iSub = LBOUND (Myarray,1) 
    nDel = 0 
    DO 
       IF PARSE$(MyArray (iSub), column_no) <> 68 THEN 
             ARRAY DELETE MyArray (iSub) 
             INCR nDel 
      ELSE 
             INCR iSUB
      END IF
    LOOP UNTIL iSub > (ub - ndel)    ' Might need to add or subtract one here 
    REDIM PRESERVE myArray (ub - nDel)
    MCM
    Last edited by Michael Mattias; 27 Feb 2009, 12:20 PM. Reason: fixed error on last code line the REDIM PRESERVE

    Leave a comment:


  • Erich Schulman
    replied
    Originally posted by Michael Mattias View Post
    Or, did you mean each array element is a CSV-string representing a row of a table?
    Yes, that is what I am working with. I don't have to do this using SQL. I thought posting the SQL line would explain my intentions more succinctly than in English.

    I know that I could DIM a temporary array of equal size, copy the rows I want to keep, REDIM the original array without PRESERVE, copy back from the temp array, and ERASE the temp array. But that just seems like a terrible way to accomplish the task.

    Leave a comment:


  • Michael Mattias
    replied
    Or, did you mean each array element is a CSV-string representing a row of a table?

    In that case....

    1. Write Array to disk, (JOIN$(Myarray(), $CRLF).
    2a. Set up an ODBC datasource; use the Microsoft text driver and write all the SQL you want using your favorite techniques (ODBC API or ADO).
    2b. If you can't be bothered to set up an ODBC datasource, simply look here....
    http://www.codemaker.co.uk/it/tips/a...CDriverForText
    .. where you will find the "DSN-less" connection string for use with ADO.... and use all the SQL statements you want

    3. When done manipulating the data with SQL, reload remaining rows into your PB Array.

    MCM

    PS:

    4. Post demo of same in source code forum
    Last edited by Michael Mattias; 27 Feb 2009, 11:25 AM.

    Leave a comment:


  • Michael Mattias
    replied
    Code:
    nElement =  UBOUND (MyArray,1) + 1    ' assume zero-based array
    DO 
        ARRAY SCAN MyArray(0) FOR nElement, <> 68,  to iHit 
         IF ISTRUE iHit 
             DECR   iHit        ' correct to allow for 0-based array 
             ARRAY DELETE MyArray (iHit) 
             DECR nElement 
         ELSE
           EXIT DO   ' no more hits, all done
        END IF 
    LOOP
    REDIM PRESERVE myArray (nElement -1)
    Wrap and gift-wrap as desired.

    MCM

    Leave a comment:


  • Erich Schulman
    started a topic ARRAY DELETE on selected rows

    ARRAY DELETE on selected rows

    Consider an array of CSV rows obtained through FILESCAN and LINE INPUT#. SQL is the simplest way to show my intentions:
    Code:
    DELETE FROM MyArray WHERE CSVcolumn12 <> 68
    /*
    Being able to run SQL against an array.... Might be a good library idea?
    */

    The deleted rows will never be replaced by anything, so I want to be able to REDIM PRESERVE to the smaller size. This is not only to save memory but also because the UBOUND is used elsewhere.
Working...
X