Euro-Symbol (€) changes to Chr$(135) after LCase$


  • Euro-Symbol (€) changes to Chr$(135) after LCase$

    I just found out that the LCase$ function converts chr$(128) into chr$(135).
    Can anyone confirm this? Is this a bug in the compiler or am I missing something?

    Thanx in advance,

    Michael



    ------------------

  • #2
    Yes, PB's LCase, MCase and UCase routines don't handle the extended
    character set properly. I have some fast (actually faster) ASM
    replacement functions that work okay, available in a sample
    at my PB page, at http://www.tolken99.com/pb/pbfile_e.htm


    ------------------



    • #3
      > Yes, PB's LCase, MCase and UCase routines don't handle the extended
      > character set properly

      That is purely a matter of opinion...

      ------------------
      Lance
      PowerBASIC Support


      • #4
        Well, Lance, I think that's not really funny. I mean, the dollar or pound sign isn't changed at all.
        Will this be corrected in the near future or do we all have to write our own UCase/MCase/LCase functions
        to get it working?

        Michael

        ------------------


        [This message has been edited by MT Harrer (edited October 22, 2000).]



        • #5
          Without speaking on behalf of R&D, I would say the problem is much more complex than you may think. For example, the result of any conversion depends entirely on the font used to represent a given character code, and PowerBASIC has no way of knowing how or where such a string is going to be displayed, with what font, etc. The low-order characters are not much of a problem, but the upper ones certainly are.

          However, you do not need to write your own... Borje has made his versions available to you - just follow the link above!

          Alternatively, you could try using the Charxxxx() API functions (CharUpper(), CharLower(), etc)



          ------------------
          Lance
          PowerBASIC Support


          • #6
            Well, Lance, it should not be a problem for you to get it fixed.
            I've tried it with Visual Basic 6.0, PB 3.20 for DOS, Visual C++, even with Java.
            All of the above-mentioned compilers (yes, I know, VB isn't a real compiler) handle the Euro correctly.

            Moreover, it can't be that complicated to tell your code to not change a chr$(128) into something else,
            it simply must not be changed.

            Greetings,

            Michael

            ------------------


            [This message has been edited by MT Harrer (edited October 23, 2000).]



            • #7
              In many fonts ASCII 128 = Ç (capital C with cedilla) and ASCII 135 = ç (small c with cedilla).
              In that case the conversion is correct.
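
A minimal sketch of why both readings of byte 128 can be right. It is written in Python rather than PowerBASIC, and it assumes the two interpretations in play are the OEM code pages 437/850 and the Windows code page 1252; the thread itself does not say which table LCase$ uses.

```python
# Decode the same two byte values under two OEM code pages and one
# Windows (ANSI) code page. The choice of code pages is an assumption.
BYTE_128 = bytes([128])
BYTE_135 = bytes([135])


def describe(byte_value: bytes) -> dict:
    """Return the character each code page assigns to a single byte."""
    return {cp: byte_value.decode(cp) for cp in ("cp437", "cp850", "cp1252")}


glyphs_128 = describe(BYTE_128)
glyphs_135 = describe(BYTE_135)

# Under the OEM pages, byte 135 is the lowercase form of byte 128.
oem_pair_matches = glyphs_128["cp437"].lower() == glyphs_135["cp437"]

print(glyphs_128)        # C-cedilla under cp437/cp850, euro sign under cp1252
print(glyphs_135)        # c-cedilla under cp437/cp850, double dagger under cp1252
print(oem_pair_matches)  # True
```

Seen this way, a 128 to 135 mapping is a correct lowercase conversion for OEM text and a wrong one for text that is meant as Windows-1252.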

              ------------------
              Peter.

              [This message has been edited by Peter Lameijn (edited October 23, 2000).]
              Regards,
              Peter

              "Simplicity is a prerequisite for reliability"



              • #8
                Peter, I get your point. But Micro$oft at some point defined ascii 128 as the euro symbol.

                "Ç" and "ç", are they used in the French language? Then where is their euro?

                I simply have to find a good solution to this problem.

                ------------------



                • #9
                  Hmmmm, I just made an experiment. The result was - at least for me - interesting:
                  If you enter Alt 1 2 8 and let VB print the ASC of the just created character - displayed as "Ç" - you can read ascii 199 !?!
                  But if you enter the Euro-symbol by pressing the corresponding key and let VB display the ascii-code, you get 128 as a result.

                  Confusing...
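
One possible explanation for the 199, sketched in Python. The assumption here (not stated in the thread) is that an Alt code typed without a leading zero is looked up in the OEM code page and then translated to the matching character in the Windows code page; `alt_code_to_ansi` is a hypothetical helper name.

```python
# Assumed code pages: OEM 437 for Alt codes without a leading zero,
# Windows-1252 for what VB's Asc() reports.
OEM_PAGE = "cp437"
ANSI_PAGE = "cp1252"


def alt_code_to_ansi(code: int) -> int:
    """Look a byte up in the OEM page, then re-encode that character in the ANSI page."""
    char = bytes([code]).decode(OEM_PAGE)
    return char.encode(ANSI_PAGE)[0]


alt_128 = alt_code_to_ansi(128)            # C-cedilla, re-encoded for Windows
euro_ansi = "\u20ac".encode(ANSI_PAGE)[0]  # the euro key, encoded directly

print(alt_128)    # 199
print(euro_ansi)  # 128
```

Under those assumptions the character typed with Alt 1 2 8 is Ç, which Windows-1252 stores as 199, while the euro key produces 128 directly.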

                  BTW: Now I think MS stands for multiple sclerosis...

                  ------------------



                  • #10
                    Just a little note regarding the CharUpper and CharLower APIs.
                    While they convert all characters correctly, they are
                    very slow. For single actions now and then, they are all right,
                    but not for repeated actions in a loop.

                    BTW, for the numeric keypad, I think you must type Alt + 0 1 2 8
                    to get the proper character.


                    ------------------



                    • #11
                      First of all, the Euro character isn't located at ASCII 128 in all languages...

                      I've seen a solution (I don't remember which BASIC it was) where you can supply a 256-byte string to replace the default XLAT table.
                      This can be done at runtime with an extra function, at program start (or whenever your language changes), so the rest of your code can be left unchanged. LCASE$(), UCASE$() etc. will then use the new table.

                      I don't think this should be too hard to add into the next version???
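
The replaceable 256-byte table idea can be sketched as follows. This is Python, not PowerBASIC, and `build_lower_table` and `lcase` are hypothetical names; it only illustrates the technique of a per-code-page lookup table, not how any BASIC runtime actually implements it.

```python
def build_lower_table(codepage: str) -> bytes:
    """Build a 256-byte lowercase lookup table for a single-byte code page."""
    table = bytearray(range(256))  # start with the identity mapping
    for code in range(256):
        try:
            char = bytes([code]).decode(codepage)
            encoded = char.lower().encode(codepage)
        except (UnicodeDecodeError, UnicodeEncodeError):
            continue  # undefined byte, or no lowercase form in this page
        if len(encoded) == 1:
            table[code] = encoded[0]
    return bytes(table)


def lcase(data: bytes, table: bytes) -> bytes:
    """Lowercase a byte string through the supplied translation table."""
    return data.translate(table)


ANSI_LOWER = build_lower_table("cp1252")  # table for Windows text
OEM_LOWER = build_lower_table("cp437")    # table for DOS/OEM text

print(lcase(b"\x80UR", ANSI_LOWER))  # b'\x80ur'  (euro byte left alone)
print(lcase(b"\x80A", OEM_LOWER))    # b'\x87a'   (C-cedilla lowered)
```

Swapping the table at program start changes the behaviour of every later call without touching the calling code, which is the point made above.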


                      Peter.


                      ------------------


                      • #12
                        > "Ç" and "ç", are they used in the French language? Then where is their euro?

                        Those characters have been a standard part of the IBM PC's extended ASCII character set (code page 437) from the very beginning, long before the Euro symbol was created. I still use an ASCII table that was part of the TurboBASIC manual (c. 1986) and those two characters are shown.

                        In addition to problems caused by different fonts being used, different "code pages" can also affect your program. The same character number in the same font can appear different if a different code page is used.

                        With only 256 characters to work with, frankly Microsoft didn't have much choice. Eventually they created Unicode, which provides 64k different characters. But even then, the same character number in different fonts can look different.

                        -- Eric


                        ------------------
                        Perfect Sync: Perfect Sync Development Tools

                        "Not my circus, not my monkeys."



                        • #13
                          Unicode is a MBCS-style character set, unfortunately. That is, some
                          codes aren't characters, but flags that mean the next code(s) are
                          actually part of a different character table. It's an appallingly
                          inefficient and awkward design, with all the same flaws of the
                          ASCII/ANSI-based character sets it was designed to replace.

                          Eh, I don't think Unicode was designed by Microsoft... I
                          know they don't implement it according to the standards
                          recommendations, at least as far as the leading endian flag is
                          concerned.
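
A small Python illustration of the two points above, on the assumption that the "flags" refer to UTF-16 surrogate pairs and the "leading endian flag" to the byte-order mark (BOM). Neither term is used in the post, so this is an interpretation.

```python
import codecs

# The byte-order mark that may lead a UTF-16 stream.
bom_le = codecs.BOM_UTF16_LE  # b'\xff\xfe'
bom_be = codecs.BOM_UTF16_BE  # b'\xfe\xff'


def utf16_units(text: str) -> int:
    """Count the 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


euro_units = utf16_units("\u20ac")      # inside the 64k range: one unit
clef_units = utf16_units("\U0001D11E")  # outside it: a surrogate pair

print(bom_le, bom_be)
print(euro_units, clef_units)  # 1 2
```

The euro sign fits in a single 16-bit unit, so for the characters discussed in this thread the pairing mechanism never comes into play.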

                          ------------------
                          Tom Hanlin
                          PowerBASIC Staff



                          • #14
                            How can they be slow? The ASCII set was designed so that any upper- and lowercase conversion can be done with a single bit change (bit 5, value 32, I think).
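
For reference, a Python sketch of the single-bit trick and of where it stops working. The bit that separates "A" from "a" in 7-bit ASCII is bit 5 (value 0x20); the comparison with bytes 128 and 135 assumes the code page 437 pairing mentioned earlier in the thread.

```python
CASE_BIT = 0x20  # bit 5: the only difference between 'A' (65) and 'a' (97)


def ascii_lower(code: int) -> int:
    """Set the case bit, but only for the 26 uppercase ASCII letters."""
    return code | CASE_BIT if ord("A") <= code <= ord("Z") else code


def ascii_upper(code: int) -> int:
    """Clear the case bit, but only for the 26 lowercase ASCII letters."""
    return code & ~CASE_BIT if ord("a") <= code <= ord("z") else code


letter_diff = ord("A") ^ ord("a")  # 0x20: a single bit
extended_diff = 128 ^ 135          # 7: three bits differ, so no single-bit flip

print(chr(ascii_lower(ord("Q"))), chr(ascii_upper(ord("q"))))  # q Q
print(letter_diff, extended_diff)                              # 32 7
```

Above 127 the upper/lower pairs are not laid out 32 apart in every code page, which is why a lookup table or an API call is needed there.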

                            I imagine, though, that other character-mapping peculiarities stem from whichever code page you are using. My system reads DBCS Japanese, so the emails above had a phonetic "Nu" for 128 and a dot for 135, because a DBCS system also reads the next byte whenever a byte is over 128.

                            PLEASE CAN WE HAVE UNICODE SUPPORT IN THE NEXT VERSION!!!!
                            MANY OF THESE PROBLEMS GO AWAY THEN (I hate DBCS, whoever thought of it should be strung up)

                            ------------------

                            Paul Dwyer
                            Network Engineer
                            Aussie in Tokyo
                            (Paul282 at VB-World)



                            • #15
                              The subject of Unicode has been raised before a few times (it's Deja-Vu all over again! )

                              Adding Unicode support to PowerBASIC would likely add a *significant* overhead to the final EXE/DLL - memory consumption for Unicode strings would be at least double, and (as I see it) R&D would need to add a new/separate data type just for Unicode strings, therefore, many of the existing string functions would need to be effectively duplicated to handle such a datatype. Unless this was done, it would break almost all existing code if the current string types were changed from ASCII/ANSI to Unicode. Unless the Unicode section of the RTL was to be made optional (this is called RTL granularity), the added overhead would punish those that did not want/use Unicode in their applications in terms of the EXE/DLL size.

                              While I cannot pre-empt what R&D may or may not be planning for the future (Unicode is definitely on the Wish List), Unicode can be handled *right now* with the current version of the compiler, simply by using the various Unicode APIs provided by Windows. You just need to use normal string buffers that are large enough to hold the multi-byte representation.
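
A rough Python sketch of the buffer-size point. It assumes the ANSI text is Windows-1252 and the Unicode form is 16-bit little-endian, as Windows uses; `ansi_to_wide` is a hypothetical helper that mimics the kind of conversion a Windows API call would perform, not a real API.

```python
def ansi_to_wide(data: bytes, codepage: str = "cp1252") -> bytes:
    """Convert single-byte text to 16-bit little-endian Unicode bytes."""
    return data.decode(codepage).encode("utf-16-le")


price = b"\x80 9.99"        # euro sign followed by an amount, in Windows-1252
wide = ansi_to_wide(price)

print(len(price), len(wide))  # 6 12
print(wide[:2])               # b'\xac ' (U+20AC, low byte first)
```

For a single-byte code page like this one, a buffer of twice the ANSI length (plus room for a terminator, if the receiving API expects one) is enough to hold the converted string.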

                              My $0.02.


                              ------------------
                              Lance
                              PowerBASIC Support


                              • #16
                                I would imagine that Unicode support would be added to PB the way it was added to the ANSI C standard, in the form of wchar_t.

                                The book recommended by PB for Windows programming (Programming Windows by Charles Petzold) has a whole section dedicated to how Unicode is implemented in C and why it is so critical to Windows programming that it is implemented. I can post the chapter if you like.
                                If implemented properly, there is no reason it would cause any code to need updating.

                                Still, I guess, as you say, they are aware of the issues, and whinging here is not likely to help the cause, although it never hurts to try.

                                It'd probably be worth me getting off my *** and putting an INC together myself for a UDT and some string functions.



                                ------------------

                                Paul Dwyer
                                Network Engineer
                                Aussie in Tokyo
                                (Paul282 at VB-World)
