Announcement

Collapse

Forum Guidelines

This forum is for finished source code that is working properly. If you have questions about this or any other source code, please post it in one of the Discussion Forums, not here.
See more
See less

find unique items in large list

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • find unique items in large list

    I needed to find unique values in a large list containing many
    instances of the same values.
    Here is what I came up with.
    This program extracts the unique values with only one pass
    through the large list.
    (I used it to extract unique customer record numbers from a large
    random access file.)
    John Tate

    Code:
    FUNCTION PBMAIN()
    DATA "A","B","C","A","D","B","C","C","A","B","D"
    LOCAL list() AS STRING, Items() AS STRING
    LOCAL I AS LONG, J AS LONG ,K AS LONG
    REDIM list(1  :DATACOUNT):REDIM items(1  :DATACOUNT)
    FOR I = 1 TO DATACOUNT
        List(I)= READ$(i)
        NEXT                 'now have list in array
    items(1)= List(1)       'put first item in items array
                            'to have first item for comparison
    
    K = 1                   'there is only one item to compare to
    
    FOR I = 1 TO DATACOUNT  'compare large list to unique list
        FOR J = 1 TO K
            IF List(i) = items(J) THEN EXIT FOR  'if a match, then List variable already in items list
            NEXT
            IF J > K THEN INCR K:Items(K) = List(i)  'if true, then no match was found-
                                                     'add new unique item to items list
    NEXT                                             'items list grows only when no match is found
    REDIM PRESERVE items(1:K)
    FOR I = 1 TO K
        PRINT Items(i)
        NEXT
    WAITKEY$
    
        END FUNCTION  'end of main

    ------------------




    [This message has been edited by John Tate (edited December 02, 2006).]

  • #2
    Here's a way to do it using ARRAY SCAN...
    (This is a simple test for uniqueness, does not build a list of the unique items)

    Code:
    ' test_IsitemUnique.bas
    ' 12/2/06
    ' Author: Michael Mattias Racine WI
    ' Public Domain
    ' compiler: PB.Windows 8.03 but should work with anything PB/DLL 5+
    
    #COMPILE EXE
    #DIM     ALL
    %FALSE = 0
    %TRUE  = NOT (%FALSE)
    
    
    ' === FUNCTION TO DETERMINE IF ITEM IS UNIQUE IN A LIST =================
    ' NOTE: DOES **NOT** test if item is in the list to begin with.
    ' returns: TRUE, item is unique. FALSE, either not unique or not in list.
    ' -----------------------------------------------------------------------
    FUNCTION IsItemUnique (Value AS STRING, List() AS STRING) AS LONG
    
      LOCAL lb AS LONG, iCount AS LONG, istart AS LONG, iHit AS LONG
    
      lb      = LBOUND(List)  ' only hit aray engine once for this
    
      iStart  = lb
      iCount  =  0
    
      DO
        ARRAY SCAN List(iStart), = Value, TO iHit
        IF iHit THEN
          INCR iCount
          IF iCount > 1 THEN
            FUNCTION = %FALSE          'value is not unique in list() no sense looking any more
            EXIT FUNCTION
          ELSE
             ' set up to look some more.
             iHit   = iHit + lb - 1&  ' Adjust one-based hit to subscript..
             iStart = iHit + 1        ' ... and start next scan at next element
          END IF
        ELSE                      ' no more occurrences of value in list()
          FUNCTION = (icount=1)
          EXIT FUNCTION
        END IF
      LOOP
    
    END FUNCTION
    
    FUNCTION PBMAIN() AS LONG
    
      LOCAL TheList() AS STRING, TheValue AS STRING, nItem AS LONG
      LOCAL i AS LONG
    
      'load List array
       nItem = DATACOUNT
       
       ' ------------------------
       ' for zero-based array:
       ' ------------------------
       REDIM   TheList (nItem-1)
       FOR    i = 0 TO nItem-1
         TheList (i) = READ$(I+1)
       NEXT
       ' -----------------------
       ' for 1 based arrays (Yes, I tested)
       ' -----------------------
       'REDIM   TheList (1 TO nItem)
       'FOR    i = 1 TO nItem
       '  TheList (i) = READ$(I)
       'NEXT
    
       ' ---------------------
       ' do some testing
       ' ---------------------
       DO
         TheValue = INPUTBOX$("Enter value for which to search (null quits) ", "Test for uniqueness demo")
         IF theValue = "" THEN
           EXIT DO
         ELSE
           I = IsItemUnique (TheValue, TheList())
           MSGBOX  USING$( "'&' is " & IIF$(I, "", "not") & " unique in TheList()", TheValue)
         END IF
    
       LOOP
    
    DATA "red", "blue", "green", "purple", "orange", "yellow", "pink", "black", "white"
    ' add some duplicates
    DATA "red", "blue", "green"
    ' add some REALLY duplicated items
    DATA "red", "red", "red"
    
    END FUNCTION

    [This message has been edited by Michael Mattias (edited December 02, 2006).]
    Michael Mattias
    Tal Systems (retired)
    Port Washington WI USA
    [email protected]
    http://www.talsystems.com

    Comment


    • #3
      Code:
      SUB Filter4Unique(psaData() AS STRING)
          LOCAL llCount1, llCount2, llLBound, llUBound AS LONG
          
          llLBound = LBOUND(psaData())
          llUBound = UBOUND(psaData())
          ARRAY SORT psaData()
          
          llCount2 = llLBound
          FOR llCount1 = llLBound + 1 TO llUBound
              IF psaData(llCount1) <> psaData(llCount1 - 1) THEN
                  INCR llCount2
                  IF llCount2 <> llCount1 THEN
                      psaData(llCount2) = psaData(llCount1)
                  END IF
              END IF
          NEXT llCount1
          
          REDIM PRESERVE psaData(llLBound TO llCount2)
      END SUB
          
      FUNCTION PBMAIN
          LOCAL llCount1 AS LONG
          LOCAL lsaData() AS STRING
          REDIM lsaData(1 TO 100)
          
          RANDOMIZE TIMER
          FOR llCount1 = 1 TO 100
              lsaData(llCount1) = CHR$(RND(65, 90))
          NEXT llCount1
          MSGBOX JOIN$(lsaData(), $TAB), , "Initial Sample Data"
          
          Filter4Unique lsaData()
          MSGBOX JOIN$(lsaData(), $TAB), , "Filtered Sample Data"
          
      END FUNCTION
      Regards,
      Colin Schmidt
      skyhawk76(at)yahoo.com

      Comment


      • #4
        Another way (only tested with zero-based arrays)
        Code:
        ' test_ExtractUnique.bas
        ' 12/2/06
        ' Author: Michael Mattias Racine WI
        ' Public Domain
        ' compiler: PB.Windows 8.03 but should work with anything PB/DLL 5+
        '
        #COMPILE EXE
        #DIM     ALL
        ' === FUNCTION TO EXTRACT UNIQUE ITEMS FROM A LIST=================
        ' RawList() Input
        ' UniqueOnly()  contains array (LBOUND:UBOUND) filled with unique items
        
        ' ***************************************************************
        ' ** NOTE THAT THE INPUT LIST IS SORTED DURING THIS OPERATION ***
        ' ***************************************************************
        
        FUNCTION ExtractUniqueItems (RawList() AS STRING, UniqueOnly() AS STRING) AS LONG
        
            LOCAL    i AS LONG, nUnique AS LONG, lastValue AS STRING
        
            LOCAL    lb AS LONG, ub AS LONG
        
            lb    = LBOUND (RawList)
            ub    = UBOUND (RawList)
        
            REDIM UniqueOnly (ub)  ' allow for worst-case scenario, every entry is unique
        
            ARRAY SORT RawList()
        
            UniqueOnly (lb) = RawList(lb)
            nUnique         = 1
            LastValue       = Rawlist(lb)
        
            FOR i = lb +1  TO Ub
                IF  RawList (I) <> LastValue THEN
                    INCR nUnique
                    Uniqueonly  (lb+nunique -1) = RawList (I)
                    LastValue   = RawList(I)
                END IF
           NEXT
        
           ' resize the unique array to hold only the unique items
           REDIM PRESERVE UniqueOnly (lb+nUnique -1)
        
           ' return: Number of unique items found
        
           FUNCTION =  nUnique
        END FUNCTION
        
        
        FUNCTION PBMAIN() AS LONG
        
          LOCAL TheList() AS STRING, TheValue AS STRING, nUNique AS LONG
          LOCAL i AS LONG, nItem AS LONG
          LOCAL W AS STRING
          LOCAL Unique() AS STRING
        
          'load List array
           nItem = DATACOUNT
        
           ' ------------------------
           ' for zero-based array:
           ' ------------------------
           REDIM   TheList (nItem-1)
           FOR    i = 0 TO nItem-1
             TheList (i) = READ$(I+1)
           NEXT
           
           ' get and show number of unique items in list
           ' create destination array. Will be resized by function
           REDIM Unique(0)
        
           nUnique  =   ExtractUniqueItems (TheList(), Unique())
           IF nUnique THEN
               W = USING$ ("# Unique Items extracted", nUnique) & $CRLF
        
               FOR I = LBOUND(Unique) TO UBOUND(Unique)
                   W = W & Unique (I)
                   IF I <> UBOUND(Unique) THEN
                       W = W & $CRLF
                   END IF
               NEXT
           ELSE
               W = w &  "None"
           END IF
           MSGBOX  W,,"Results"
        
        DATA "red", "blue", "green", "purple", "orange", "yellow", "pink", "black", "white"
        ' add some duplicates
        DATA "red", "blue", "green"
        ' add some REALLY duplicated items
        DATA "red", "red", "red"
        
        END FUNCTION
        MCM

        Michael Mattias
        Tal Systems (retired)
        Port Washington WI USA
        [email protected]
        http://www.talsystems.com

        Comment


        • #5
          Not fluent enough in PB to knock out an example in the time available, but an efficient method would be to build a tree structure using linked list technology (unless it's been patented), each node having the data value and an associated count.

          I mean a binary tree.
          ------------------


          [This message has been edited by Chris Holbrook (edited December 03, 2006).]

          Comment

          Working...
          X