First: ARRAY SORT text strings by length...

  • Michael Mattias
    replied
    >TAGARRAY was virtually instantaneous.

    I have found that as a rule, what comes out of the PB box is usually the fastest and most reliable way to do anything.


  • John Montenigro
    replied
    Thanks for clarifying that - but I must admit, that was a very precise description and was not at all ambiguous!!


  • Gösta H. Lovgren-2
    replied
    Originally posted by John Montenigro
    Anyway, re-visiting your code, I agree that it's straightforward. (No need for that gesture, thank you!)
    I apologize John. It wasn't meant for you. I just saw a chance to return a little dig towards HeWhoMustNotBeNamed. (The opportunity arises so seldom, dontcha know.)

    ====================================
    "Efforts and courage are not enough
    without
    purpose and direction."
    John F. Kennedy
    ====================================
    Last edited by Gösta H. Lovgren-2; 31 Jan 2009, 08:53 PM. Reason: Proope Name


  • John Montenigro
    replied
    Originally posted by Gösta H. Lovgren-2
    The original data isn't altered at all.
    Yep, you're right. My apologies. I had watches on both arrays while I was stepping through the code, and I thought it was the original data being modified.

    As for "limited experience", those were someone else's words, and if they apply to anyone, I know it's not you!

    Anyway, re-visiting your code, I agree that it's straightforward. (No need for that gesture, thank you!)


    Here are the results of testing Mike Doty's "loop w/TAGARRAY", Vidar Hanto's "COLLATE with spaces", your "prefix with length", and my "custom sort", all using the same ~150,000 word list. Numbers are TIX (CPU cycles).

    Code:
                 MD loops wTAGARRAY   VH collate wWeight     GL prefix Length      jhm custom sort
    prep1                         -                    -                    -                    -
    prep2                16,353,599                    -          376,464,044                    -
    sort                193,547,453          669,189,779          356,911,332       43,800,129,044
    display                       -                    -                    -                    -

    TOTAL:              209,901,052          669,189,779          733,375,376       43,800,129,044
    The "custom sort" method that I used took about 23 seconds, whereas (despite the prep loop of obtaining and storing lengths) TAGARRAY was virtually instantaneous.

    I love the flexibility and power of the "custom sort" capability of ARRAY SORT, and I'm probably going to experiment with it more in the future. I hope that my use of it helps others to remember that ARRAY SORT has a powerful "custom sort" capability, and that if I can use it, anyone can!


    I appreciate everyone's comments, insights, and contributions, and I appreciate being able to study the variety of ideas that can be applied. I've learned a lot!

    Thanks!!!


  • Gösta H. Lovgren-2
    replied
    Originally posted by John Montenigro
    This is an interesting approach, but I'll be frank with you: I don't think I could ever actually use it. I can not even think of modifying my input data, not even if it made the sorting process 10 times faster. I feel strongly about maintaining the integrity of the data, and only performing modifications that support the user's needs, not mine as programmer. That may sound altruistic, but it helps me sleep at night...
    I must be missing something. I don't see any data corruption. The original data isn't altered at all. It's the same method (except that I used a typed array) I use to sort a 110,000+ word file by length. Simple coding, easy to understand, very fast.
    THIS I can use!!
    Oh well, I'm good for something anyway, given my limited experience. {Make fist, place thumb on inside middle section of forefinger and make jabbing motions rotating same while wincing.}

    =================================
    "Silence is argument
    carried out by other means."
    Ernesto"Che"Guevara (1928-1967)
    =================================
    Last edited by Gösta H. Lovgren-2; 30 Jan 2009, 09:03 PM. Reason: A nice gesture.


  • John Montenigro
    replied
    Originally posted by Gösta H. Lovgren-2
    Code:
    'now put string length in front for sorting
    '
    This is an interesting approach, but I'll be frank with you: I don't think I could ever actually use it. I can not even think of modifying my input data, not even if it made the sorting process 10 times faster. I feel strongly about maintaining the integrity of the data, and only performing modifications that support the user's needs, not mine as programmer. That may sound altruistic, but it helps me sleep at night...



    Code:
    End Function 'Applikation kerschplunckened
    '
    THIS I can use!!


  • Gösta H. Lovgren-2
    replied
    Here's another method to sort by word length (quick and easy, though probably too simple for advanced minds).

    '
    Code:
    'PBWIN 9.00 - WinApi 05/2008 - XP Pro SP3
    #Compile Exe                                
    #Dim All 
    #Include "WIN32API.INC"
    #Include "COMDLG32.INC"
    
    Function PBMain         
      ErrClear   
      Local s, aWordList(), awl_by_Length() As String
      Local ctr As Long
      
      ReDim aWordList(1 To 7), awl_by_Length(1 To 7)
      aWordList(1) = "Long"
      aWordList(2) = "Shorter"
      aWordList(3) = "Terribly"
      aWordList(4) = "Longest"
      aWordList(5) = "Short"
      aWordList(6) = "Go"
      aWordList(7) = "Snort"
      s$ = "##"
      'now put string length in front for sorting
      For ctr = LBound(aWordList()) To UBound(aWordList())
         awl_by_Length(ctr) = Using$(s$, Len(aWordList(ctr))) & aWordList(ctr)
      Next ctr   
    '  
      Array Sort awl_by_Length()
      
      Reset s$ 'Show results
      For ctr = LBound(aWordList()) To UBound(aWordList())
                  'skip over length 
         s$ = s$ & Mid$(awl_by_Length(ctr), 3) & $CrLf 
      Next ctr   
      
      ? s$,,"testing"
    End Function 'Applikation kerschplunckened
    '
    ======================================
    I don't want any yes-men around me.
    I want everybody to tell me the truth
    even if it costs them their jobs.
    Samuel Goldwyn
    ======================================


  • John Montenigro
    replied
    Originally posted by John Gleason
    Indeed the COLLATE idea is a thing of beauty, however it does run significantly slower than TAGARRAY on larger arrays...
    OK, I'm going to have to put another yellow sticky on my monitor that reminds me of just how fast the TAGARRAY process is.

    And then, I'm going to have to remember to act upon that information!

    Thanks for the reminder!
    -jhm


  • John Montenigro
    replied
    Originally posted by Vidar Hanto
    Code:
      i = 0
      ' The same COLLATE string trick may be used for ARRAY SCAN.
      ' 'i' will hold the first entry holding a 5 char word
      ARRAY SCAN aWordList(), COLLATE sCharWeight, =SPACE$(5), TO i
    
      IF i > 0 THEN  ' i < 1 --> No 5-char word found
        j = 0
        ' 'j'-1 will hold the last entry holding a 5 char word
        ' You must start at i - 1 since i may be the only 5-char word,
        ' and there is BASE 0 unless you set it otherwise
        ARRAY SCAN aWordList(), FROM i - 1 TO UBOUND(aWordList), COLLATE sCharWeight, >SPACE$(5), TO j  ' <-- the line in question
      END IF
    Vidar,
    Thanks, that's more in line with what I had originally hoped to do, but I couldn't see how to sort on length. Setting the COLLATE string to spaces for the comparison is excellent.

    There is one problem in your ARRAY SCAN syntax, however (the second ARRAY SCAN, marked above). The FROM/TO clause limits the comparison to a range of character positions within each string, whereas we want to change the range of elements that are scanned. To do that, give the starting element as the array subscript:

    Code:
          ARRAY SCAN aWordList(i - 1), COLLATE sCharWeight, >SPACE$(5), TO j
    Your code only ran correctly because there were only two 5-letter strings that differed in the right place in their spelling. Change the word to something completely different, or add more words, and it won't find the first word larger than 5 chars...

    It took me a while to figure out, but with the change above it works as you intended.

    Thanks!!
    -jhm
    Last edited by John Montenigro; 31 Jan 2009, 12:34 PM. Reason: deleted a note that I had added; things work fine


  • Michael Mattias
    replied
    > .. thing of beauty, however ...

    I swear, some people can find a cloud surrounding ANY silver lining...


  • John Gleason
    replied
    Indeed the COLLATE idea is a thing of beauty, however it does run significantly slower than TAGARRAY on larger arrays:
    Code:
    #COMPILE EXE
    #DIM ALL
    
    FUNCTION PBMAIN () AS LONG
        DIM str(150000) AS STRING, strLen(150000) AS LONG
        LOCAL ii AS LONG, t1, t2 AS QUAD, sCharWeight AS STRING
    
        FOR ii = 0 TO 150000
           str(ii) = STRING$(RND(0, 800), RND(32, 126))
           strLen(ii) = LEN(str(ii))          
        NEXT
        ? "ok, 60MB of unsorted strings are loaded, let's sort ascending using COLLATE..."
    
        sCharWeight = SPACE$(256)
    
      TIX t1
        ARRAY SORT str(), COLLATE sCharWeight', DESCEND
      TIX END t1
      ? "That took" & STR$(t1) & " ticks."
      ? "Now sort the strLen array and TAGARRAY the strings..."
    
        RESET str(), strLen()
    
        FOR ii = 0 TO 150000
           str(ii) = STRING$(RND(0, 800), RND(32, 126))
           strLen(ii) = LEN(str(ii))          'peek(long, strptr(str(ii)) - 4)
        NEXT
        ? "ok, 60MB of NEW unsorted strings are loaded, let's sort ascending again..."
    
      TIX t2
        ARRAY SORT strLen(), TAGARRAY str()
      TIX END t2
      ? "That took " & STR$(t2) & " ticks. So TAGARRAY was" & STR$((t1 / t2), 4) & " times faster than COLLATE."
    '  WAITKEY$
    
    END FUNCTION


  • John Montenigro
    replied
    Originally posted by Michael Mattias
    Code:
    ' Let sCharWeight make any character look like a space to ARRAY SORT.
      ' In this way, only the length is considered
    Now THAT is clever!

    MCM
    I completely agree!
    -jhm


  • Michael Mattias
    replied
    Code:
    ' Let sCharWeight make any character look like a space to ARRAY SORT.
      ' In this way, only the length is considered
    Now THAT is clever!

    MCM


  • Vidar Hanto
    replied
    How about this? Everything is done using ARRAY SORT and ARRAY SCAN:
    Code:
    ' Code shown for CC5 but should work for any CC version
    ' Sorting words according to length
    #COMPILE EXE
    #DIM ALL
    
    FUNCTION PBMAIN () AS LONG
    
      DIM aWordList() AS STRING
      LOCAL sCharWeight AS STRING
      LOCAL i, j AS LONG
    
      REDIM aWordList(1 TO 7)
      aWordList(1) = "Long"
      aWordList(2) = "Shorter"
      aWordList(3) = "Terribly"
      aWordList(4) = "Longest"
      aWordList(5) = "Short"
      aWordList(6) = "Go"
      aWordList(7) = "Snort"
    
      ' Let sCharWeight make any character look like a space to ARRAY SORT.
      ' In this way, only the length is considered
      sCharWeight = SPACE$(256)
    
      ARRAY SORT aWordList(), COLLATE sCharWeight, DESCEND
      FOR i = 1 TO UBOUND(aWordList)
        PRINT aWordList(i)
      NEXT i
    
      PRINT
    
      ARRAY SORT aWordList(), COLLATE sCharWeight, ASCEND
      FOR i = 1 TO UBOUND(aWordList)
        PRINT aWordList(i)
      NEXT i
    
      i = 0
      ' The same COLLATE string trick may be used for ARRAY SCAN.
      ' 'i' will hold the first entry holding a 5 char word
      ARRAY SCAN aWordList(), COLLATE sCharWeight, =SPACE$(5), TO i
    
      IF i > 0 THEN  ' i < 1 --> No 5-char word found
        j = 0
        ' 'j'-1 will hold the last entry holding a 5 char word
        ' You must start at i - 1 since i may be the only 5-char word,
        ' and there is BASE 0 unless you set it otherwise
        ARRAY SCAN aWordList(), FROM i - 1 TO UBOUND(aWordList), COLLATE sCharWeight, >SPACE$(5), TO j
      END IF
    
      PRINT
      IF j > 0 THEN
        PRINT "First 5 char entry is"; i
        PRINT "Last 5 char entry is"; j - 1
      ELSE
        PRINT "The only 5 char entry is"; i
      END IF
      WAITKEY$
    
    END FUNCTION
    If speed is important, you should avoid repeating expressions such as SPACE$() and UBOUND(). Set the value once and access it as a 'constant' from then on, as I do with sCharWeight, instead of calculating or looking it up every time you need it.

    ViH
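
    One way to read that advice, as a hedged sketch for the console compiler. The names sFive, nLast, and nFound are illustrative and do not appear in the listing above, and the three-word list is sample data. Each value is computed once and then reused:

    ```powerbasic
    ' Sketch only: sFive, nLast and nFound are made-up names for illustration.
    #COMPILE EXE
    #DIM ALL

    FUNCTION PBMAIN () AS LONG

      DIM aWordList() AS STRING
      LOCAL sCharWeight, sFive AS STRING
      LOCAL i, nLast, nFound AS LONG

      REDIM aWordList(1 TO 3)
      aWordList(1) = "Longest"
      aWordList(2) = "Go"
      aWordList(3) = "Short"

      sCharWeight = SPACE$(256)         ' built once, reused by every COLLATE clause
      sFive       = SPACE$(5)           ' built once, reused wherever a 5-char pattern is needed
      nLast       = UBOUND(aWordList)   ' looked up once, reused as the loop limit

      ' Sort by length only, as in the listing above
      ARRAY SORT aWordList(), COLLATE sCharWeight, ASCEND
      FOR i = 1 TO nLast
        PRINT aWordList(i)
      NEXT i

      ' Scan for the first 5-char entry using the stored pattern
      ARRAY SCAN aWordList(), COLLATE sCharWeight, =sFive, TO nFound
      IF nFound > 0 THEN
        PRINT "First 5 char entry is"; nFound
      ELSE
        PRINT "No 5 char entry found"
      END IF
      WAITKEY$

    END FUNCTION
    ```

    Whether this makes a measurable difference depends on how often the values are reused; wrapping the section in TIX / TIX END, as done elsewhere in this thread, would show it.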


  • John Montenigro
    replied
    Ah, got it. I thought he was editing, and I didn't realize that he was adding on.
    Thanks,
    -jhm


  • Mike Doty
    replied
    You also asked how to get the last word of a given length, so he scanned for the first word of the next greater length; subtracting 1 from that index gives you the last word of the length you want.
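
    A minimal sketch of that idea, modeled on the TAGARRAY listing in this thread rather than on anyone's actual program. The names iFirst, iNext, and iLast and the sample word list are made up for illustration. It also handles a case the snippets in this thread leave open: if no word is longer than the target length, the second scan reports 0, and the run of 5-letter words then extends to the end of the array.

    ```powerbasic
    ' Sketch only: iFirst, iNext, iLast and the sample data are illustrative.
    #COMPILE EXE
    #DIM ALL

    FUNCTION PBMAIN () AS LONG
      LOCAL NumWords, i, iFirst, iNext, iLast AS LONG
      LOCAL s AS STRING

      s = "BUT MOUSE HAD FOOD BAGEL BANANA HAND"
      NumWords = PARSECOUNT(s, $SPC)
      REDIM Words(1 TO NumWords) AS STRING
      REDIM WordLength(1 TO NumWords) AS LONG   ' the tag array of lengths
      PARSE s, Words(), $SPC

      FOR i = 1 TO NumWords
        WordLength(i) = LEN(Words(i))
      NEXT i
      ARRAY SORT WordLength(), TAGARRAY Words()

      ' Both arrays start at subscript 1, so the position that
      ' ARRAY SCAN reports can be used directly as a subscript.
      ARRAY SCAN WordLength(), = 5, TO iFirst   ' first 5-letter word, 0 if none
      ARRAY SCAN WordLength(), > 5, TO iNext    ' first longer word, 0 if none

      IF iFirst THEN
        IF iNext THEN
          iLast = iNext - 1      ' last 5-letter word sits just before the longer one
        ELSE
          iLast = NumWords       ' nothing longer: the run ends with the array
        END IF
        FOR i = iFirst TO iLast
          PRINT Words(i)
        NEXT i
      ELSE
        PRINT "No 5-letter words found"
      END IF
      WAITKEY$
    END FUNCTION
    ```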


  • John Montenigro
    replied
    I don't think I mentioned it outright, but it was embedded in comments in the original code: the work of this program is to isolate and work on only the 5-letter words in a list that contains words with lengths from 2 to 28...

    What is gained with the change from "=" to ">" ????


  • Michael Mattias
    replied
    Code:
    ARRAY SCAN WordLength(), = x, TO i
        IF i THEN ? "First word with length of";x; "is in element"; i; words(i)
    ===>

    Code:
    ARRAY SCAN WordLength(), > x, TO i
        IF i THEN ? "Last word with length of at most";x; "is in element"; i-1; words(i-1)


  • John Montenigro
    replied
    Originally posted by Mike Doty
    Code:
      FOR i = 1 TO NumWords: WordLength(i) = LEN(Words(i)): NEXT
      ARRAY SORT WordLength(), TAGARRAY Words()
    ...
    
      FOR x = WordLength(1) TO WordLength(NumWords)
        ARRAY SCAN WordLength(), = x, TO i
        IF i THEN ? "First word with length of";x; "is in element"; i; words(i)
      NEXT
    Wow!!! To my surprise, running the two For/Next loops with the tagarray is a MUCH faster approach than my original approach of "ARRAY SORT, USING". I had anticipated that the loops would be slower. (It always pays to test one's hypothesis!)

    When used against the full 150,000 word list, my original approach (which did not yet include the "SCAN for size") took over 35 seconds, but Mike Doty's approach (which does include the "SCAN for size") takes under 2 seconds!

    Very cool! Thanks Mike and Mike!
    -John


    ALSO: With reference to another recent thread I've had help with (http://www.powerbasic.com/support/pb...ad.php?t=39612), I've added these macros...

    Code:
    MACRO TB =   TIX Cycles : StartTime = TIMER
    
    MACRO TE(prm1)
       TIX END Cycles : EndTime = TIMER
       ? : ? "Elapsed number of CPU cycles used " prm1 ": " ; Cycles ; "("; STR$(EndTime-StartTime) ;" rough seconds)"  
       ? "Press a key to continue..."
       WAITKEY$
    END MACRO
    and use them this way (only one invocation shown; others are coded as needed)

    Code:
       TB
       NumWords = PARSECOUNT(g_All_Words, $SPC)
       REDIM Words (1 TO NumWords) AS STRING
       REDIM WordLength (1 TO NumWords) AS LONG  'the Tag Array
       PARSE g_All_Words, Words(), $SPC
       TE("for Parsecount, Dim, and Parse")
    Now that may not be special to you, but I'm very proud of my success in not only using a multi-line macro, but also parameters!!!
    Last edited by John Montenigro; 27 Jan 2009, 09:36 AM. Reason: added link to other thread


  • Mike Doty
    replied
    Fill array with lengths, sort by length with TAGARRAY words

    Code:
    #COMPILE EXE
    #DIM ALL
    FUNCTION PBMAIN&()
      LOCAL NumWords&,i&,s$, x&
      s = TRIM$(" BUT MOUSE HAD FOOD LOVE AND WAS BANANA FOOT WHILE FRUFRU MONEY STAPLE BAGEL CEREAL HAND ")
      NumWords = PARSECOUNT(s, $SPC)
      REDIM Words (1 TO NumWords) AS STRING
      REDIM WordLength (1 TO NumWords) AS LONG
      PARSE s, Words(), $SPC
      FOR i = 1 TO NumWords: WordLength(i) = LEN(Words(i)): NEXT
      ARRAY SORT WordLength(), TAGARRAY Words()
      ? "Element","Word","Length"
      FOR i = 1 TO NumWords: ? FORMAT$(i),Words(i), FORMAT$(WordLength(i)):NEXT
    ' "OK, my NEXT step is to figure out how to use ARRAY SCAN to find the first and last of the 5-letter words."
      FOR x = WordLength(1) TO WordLength(NumWords)
        ARRAY SCAN WordLength(), = x, TO i
        IF i THEN ? "First word with length of";x; "is in element"; i; words(i)
      NEXT
     
      WAITKEY$
      ' "Well, I can dream, can't I?"
      'LOL
    END FUNCTION
    Last edited by Mike Doty; 26 Jan 2009, 09:47 PM.
