ARRAY SORT bug

  • ARRAY SORT bug

    There is a bug with ARRAY SORT when trying to do a case-insensitive sort. For certain pairs of characters, the array will be sorted one way, but sorting manually, using a ">" comparison, would sort it the other way. This could create a serious problem if you use a search routine that assumes the data is sorted in a certain way (probably binary search) and that makes use of "<". Here's an example --



    FUNCTION PBMAIN () AS LONG


    DIM arr() AS STRING
    DIM i AS INTEGER
    DIM msg AS STRING




    'create a 2-element array --
    REDIM arr(1 TO 2)

    arr(1) = CHR$(225)
    arr(2) = CHR$(200)

    GOSUB show_array



    'sort it --
    ARRAY SORT arr(), COLLATE UCASE

    GOSUB show_array



    'the sorted array should have element#1 < element#2, but when you test for it . . .
    IF UCASE$(arr(1)) > UCASE$(arr(2)) THEN
        MSGBOX "problem"
    ELSE
        MSGBOX "no problem"
    END IF


    EXIT FUNCTION

    '----------------------------------------------------------------------------------

    show_array:

    msg = ""

    FOR i = 1 TO 2
        msg = msg & STR$(i) & " " & arr(i) & CHR$(10)
    NEXT

    MSGBOX msg

    RETURN

    END FUNCTION



    ==================================================================
    The problem shows up only when using COLLATE UCASE. If you do a case-sensitive sort, no problem.

    And all the problems occur after ASCII 128, so maybe it's not much of a concern (assuming you're working in English). Still it seems like the fix would be fairly easy for someone who knows the insides of ARRAY SORT. It would be nice if the PB people could get everything coordinated -- One less thing for everyone else to have to consider.
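
    One way to sidestep the mismatch is sketched below. It assumes that a plain (case-sensitive) ARRAY SORT agrees with ">", as noted above, and that the search routine compares UCASE$ values. The keys() array is a helper introduced only for this illustration: it holds the UCASE$ form of each element, and TAGARRAY reorders arr() in step with it.

    ```powerbasic
    FUNCTION PBMAIN () AS LONG

        DIM arr(1 TO 2) AS STRING
        DIM keys(1 TO 2) AS STRING   ' hypothetical helper array of sort keys
        DIM i AS LONG

        arr(1) = CHR$(225)
        arr(2) = CHR$(200)

        ' build the keys with the same function the later comparison uses
        FOR i = 1 TO 2
            keys(i) = UCASE$(arr(i))
        NEXT

        ' plain sort on the keys; arr() is rearranged to match
        ARRAY SORT keys(), TAGARRAY arr()

        ' same test as in the example above
        IF UCASE$(arr(1)) > UCASE$(arr(2)) THEN
            MSGBOX "problem"
        ELSE
            MSGBOX "no problem"
        END IF

    END FUNCTION
    ```

    The cost is a second array of the same size, but the sort order and the comparison then come from the same function.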

    For a listing of the ~600 combinations where ARRAY SORT works one way and "<" works the other way, run the following code . . .




    FUNCTION PBMAIN () AS LONG


    DIM s() AS STRING
    DIM i AS INTEGER
    DIM j AS INTEGER
    DIM s1 AS STRING
    DIM s2 AS STRING
    DIM errors AS LONG
    DIM msg AS STRING
    DIM EOL AS STRING

    EOL = CHR$(13) & CHR$(10)



    'create an array of the 256 ascii chars --
    REDIM s(0 TO 255)

    FOR i = 0 TO 255
        s(i) = CHR$(i)
    NEXT



    'sort it --
    ARRAY SORT s(), COLLATE UCASE


    'compare each char. of the sorted array with all the other chars, looking for instances where the lower char in the array is found to be ">" the upper char --
    FOR i = 0 TO 255
        s1 = s(i)
        msg = msg & "-------------------------------------" & EOL

        FOR j = i + 1 TO 255
            s2 = s(j)
            IF UCASE$(s1) > UCASE$(s2) THEN
                errors = errors + 1
                msg = msg & STR$(i) & " " & STR$(j) & " " & s1 & " " & s2 & EOL
            END IF
        NEXT
    NEXT



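    'now write msg to a .txt file for easy viewing (the file name is only an example) --
    OPEN "sortbug.txt" FOR OUTPUT AS #1
    PRINT #1, msg
    PRINT #1, "errors:" & STR$(errors)
    CLOSE #1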

    END FUNCTION

  • #2
    Bill, you have not shown a bug here. The UCASE$ function simply adds 32 to the ASCII value, so adding 32 to the value 225 wraps around to a value of 1.

    Just replace your statement:
    IF UCASE$(arr(1)) > UCASE$(arr(2)) THEN

    with:
    MSGBOX STR$(ASC(CHR$(ASC(arr(1))+32)))+" " + STR$(ASC(CHR$(ASC(arr(2))+32)))
    IF CHR$(ASC(arr(1))+32) > CHR$(ASC(arr(2))+32) THEN

    and you'll see what I mean.

    PowerBasic is a very well developed compiler and I am extremely hesitant to refer to any anomaly I experience as a 'bug'.



    • #3
      Hi Bill,
      This is normal behaviour for <<ARRAY SORT SomeArray$(), COLLATE UCASE>>
      because it capitalizes only the characters a to z,
      not accented ones like é or à.

      When sorting international characters, those above CHR$(127),
      you could use <<ARRAY SORT SomeArray$(), COLLATE $String>>.

      See the PowerBASIC help file about it.
      Also this demo might help.
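
      As a rough sketch of the collate-string approach (not taken from the help file or the demo), the code below builds a 256-character string by appending the codes in order of their UCASE$ value. The name cstring is just illustrative. The assumption is that ARRAY SORT ranks characters by their position in that string, so for single-character elements the sort should agree with a UCASE$ comparison. Note that "A" and "a" still get different ranks, so for longer strings this is not a true case-insensitive sort.

      ```powerbasic
      FUNCTION PBMAIN () AS LONG

          LOCAL i AS LONG
          LOCAL u AS LONG
          LOCAL cstring AS STRING      ' illustrative name for the collate string
          DIM s(0 TO 255) AS STRING

          ' append each code in order of its UCASE$ value;
          ' every code 0-255 ends up in the string exactly once
          FOR u = 0 TO 255
              FOR i = 0 TO 255
                  IF ASC(UCASE$(CHR$(i))) = u THEN cstring = cstring & CHR$(i)
              NEXT
          NEXT

          ' the same 256-character test array used earlier in the thread
          FOR i = 0 TO 255
              s(i) = CHR$(i)
          NEXT

          ARRAY SORT s(), COLLATE cstring

          ' the collate string must contain exactly 256 characters
          MSGBOX "collate string length:" & STR$(LEN(cstring))

      END FUNCTION
      ```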



      • #4
        Originally posted by Bill Kadenhead
        There is a bug with ARRAY SORT ... It would be nice if the PB people could get everything coordinated...
        Actually, I'm fairly certain the PB people have everything coordinated already! {smile} It would probably be a real good idea to read the PowerBASIC Doc on ARRAY SORT, don't you think? It says...

        "COLLATE cstring is used to specify an entirely new sorting order. This can be used for a variety of purposes, the most obvious of which is the case of international character sets. The collate string cstring must contain exactly 256 characters, one for each of the ASCII codes 0-255, in the order that they would be sorted (from lowest to highest, if an ascending sort were performed on them)."

        Best regards,

        Bob Zale
        PowerBASIC Inc.



        • #5
          Charles,
          The UCASE$ function simply adds 32 to the ASCII value
          so that adding 32 to the 225 value wraps around to a value of 1.
          Not quite: ASC(UCASE$(CHR$(225))) = 193 with PB 8.02 (ANSI UCASE$ is the default),
          and it stays at 225 with prior versions, which
          did not handle the international character set.

          UCASE$ subtracts 32 from unaccented lowercase letters below 127,
          but there is more to it; you may want to have a look at
          this thread.



          • #6
            Pierre, you're right... my logic was totally flawed. Looking again in light of Bob Zale's post, it seems to me that Bill's problem lay with his 'manual sort'. As the PowerBasic documentation for UCASE$ clearly states, the function is only valid for ASCII characters from CHR$(0) to CHR$(127).

            The following code raises the question of what the compiler does with the higher ASCII codes.

            FUNCTION PBMAIN
            LOCAL c1, c2 AS STRING
            LOCAL i, j AS LONG

            c1 = CHR$(225)
            c2 = CHR$(220)
            i = (c1 > c2)
            j = (UCASE$(c1) > UCASE$(c2))
            MSGBOX STR$(i)+" "+STR$(j) ' -1 0 is displayed

            END FUNCTION

            If the value 225 is changed to 223 then -1 -1 is displayed

            or if the value 220 is changed to 224 then -1 -1 is displayed

            It's kind of what led me to my flawed solution in the first place.
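
            To see what UCASE$ actually does with the codes used in this thread, a small sketch like the one below lists each code next to the code of its UCASE$ result. The list of codes is just the ones mentioned above. Results will depend on the compiler version and UCASE$ mode: post #5 reports 193 for 225 under PB 8.02, and 225 unchanged in earlier versions.

            ```powerbasic
            FUNCTION PBMAIN () AS LONG

                LOCAL i AS LONG
                LOCAL n AS LONG
                LOCAL msg AS STRING

                ' walk through the codes discussed in this thread
                FOR i = 1 TO 5
                    n = VAL(PARSE$("200,220,223,224,225", i))
                    ' show the original code and the code UCASE$ maps it to
                    msg = msg & STR$(n) & " ->" & STR$(ASC(UCASE$(CHR$(n)))) & CHR$(13) & CHR$(10)
                NEXT

                MSGBOX msg

            END FUNCTION
            ```

            Any code whose value changes here is one where a ">" test on UCASE$ values can disagree with a ">" test on the raw characters.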
