I just found out that the LCase$ function converts chr$(128) into chr$(135).
Can anyone confirm this? Is this a bug in the compiler or am I missing something?
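The 128 -> 135 result is what you would get if the case table in use were the OEM code page 437 table, where 128 is "Ç" and 135 is "ç". In the Windows ANSI code page 1252, 128 is the Euro sign, which has no case. The Python sketch below is only an illustration. The idea that LCase$ uses a code page 437 table is an assumption inferred from the reported behavior, not something taken from PowerBASIC documentation:

```python
# Sketch only. Assumption: LCase$ applies an OEM code page 437 case table;
# this is inferred from the reported 128 -> 135 result, not documented.

def char_for(byte_value: int, codepage: str) -> str:
    """Return the character a single byte represents in a code page."""
    return bytes([byte_value]).decode(codepage)

# Code page 437 (OEM): 128 is C-cedilla, and its lower-case form,
# c-cedilla, is stored at 135. So 128 -> 135 is a valid conversion there.
oem_char = char_for(128, "cp437")                      # '\u00c7'
oem_lowered = oem_char.lower().encode("cp437")[0]      # 135

# Code page 1252 (Windows ANSI): 128 is the Euro sign, which has no
# lower-case form, so a 1252-aware routine leaves it unchanged.
ansi_char = char_for(128, "cp1252")                    # '\u20ac'
ansi_lowered = ansi_char.lower().encode("cp1252")[0]   # 128
```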
Yes, PB's LCase, MCase and UCase routines don't handle the extended
character set properly. I have some fast (actually faster) ASM
replacement functions that work okay; they are available as a sample
on my PB page at http://www.tolken99.com/pb/pbfile_e.htm
Well, Lance, I don't think that's really funny. After all, the dollar and pound signs aren't changed at all.
Will this be corrected in the near future, or do we all have to write our own UCase/MCase/LCase functions to get it working?
Michael
------------------
[This message has been edited by MT Harrer (edited October 22, 2000).]
Without speaking on behalf of R&D, I would say the problem is much more complex than you may think. For example, the result of any conversion depends entirely on the font used to represent a given character code, and PowerBASIC has no way of knowing how, where, or in what font such a string is going to be displayed. The low-order characters are not much of a problem, but the upper ones certainly are.
However, you do not need to write your own: Borje has made his versions available to you. Just follow the link above!
Alternatively, you could try using the Charxxxx() API functions (CharUpper(), CharLower(), etc.).
Well, Lance, it should not be a problem for you to get this fixed.
I've tried it with Visual Basic 6.0, PB 3.20 for DOS, Visual C++, and even Java.
All of the above-mentioned compilers (yes, I know, VB isn't a real compiler) handle the Euro correctly.
Moreover, it can't be that complicated to tell your code not to change chr$(128) into something else; it simply must not be changed.
Greetings,
Michael
------------------
[This message has been edited by MT Harrer (edited October 23, 2000).]
Hmmmm, I just did an experiment. The result was, at least for me, interesting:
If you enter Alt 1 2 8 and let VB print the ASC of the character just created (displayed as "Ç"), you get 199!?
But if you enter the Euro symbol by pressing the corresponding key and let VB display the character code, you get 128.
Confusing...
BTW: Now I think MS stands for multiple sclerosis...
Just a little note regarding the CharUpper and CharLower APIs.
While they convert all characters correctly, they are
very slow. For an occasional single call they are all right,
but not for repeated calls in a loop.
BTW, on the numeric keypad, I think you must type Alt + 0 1 2 8
to get the proper character.
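The two Alt-code results above are consistent with Windows keeping two code pages: Alt plus digits without a leading zero enters a character from the OEM code page, while a leading zero selects from the ANSI code page. A small Python sketch, assuming the usual Western defaults (OEM 437, ANSI 1252); other locales would give different numbers:

```python
# Assumption: OEM code page 437 and ANSI code page 1252 (typical US and
# Western European defaults; other locales use different pages).
OEM, ANSI = "cp437", "cp1252"

# Alt + 1 2 8 (no leading zero): character 128 of the OEM code page...
alt_128 = bytes([128]).decode(OEM)              # C-cedilla
# ...which an ANSI program such as VB then sees under its ANSI code.
alt_128_ansi_code = alt_128.encode(ANSI)[0]     # 199

# Alt + 0 1 2 8 (leading zero): character 128 of the ANSI code page.
alt_0128 = bytes([128]).decode(ANSI)            # Euro sign
alt_0128_ansi_code = alt_0128.encode(ANSI)[0]   # 128
```

So Alt 1 2 8 yields "Ç", which VB reports as 199, while Alt 0 1 2 8 yields ANSI character 128, the Euro sign.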
First of all, the Euro character isn't located at position 128 in every language's code page...
I've seen a solution (I don't remember which BASIC it was) where you can supply a 256-byte string to replace the default XLAT table.
This can be done at run time with an extra function, at program start (or whenever your language changes), so the rest of your code can be left unchanged. LCASE$(), UCASE$() etc. will then use the new table.
I don't think this should be too hard to add in the next version?
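A replaceable table of this kind can be sketched in a few lines. This is only an illustration in Python: build_lower_table, set_lower_table and lcase are hypothetical names, not PowerBASIC functions, and the table contents come from Python's code page codecs:

```python
# Sketch: a replaceable 256-byte lower-case translation table.
# build_lower_table, set_lower_table and lcase are hypothetical names.

def build_lower_table(codepage: str) -> bytes:
    """Map each byte to its lower-case equivalent within one code page.

    Bytes undefined in the code page, or whose lower-case form is not a
    single byte in that same page, map to themselves.
    """
    table = bytearray(range(256))
    for b in range(256):
        try:
            lowered = bytes([b]).decode(codepage).lower().encode(codepage)
        except UnicodeError:
            continue  # undefined byte, or lower case not representable
        if len(lowered) == 1:
            table[b] = lowered[0]
    return bytes(table)

# Default table, built once; replace it at start-up or on a language change.
_lower_table = build_lower_table("cp1252")

def set_lower_table(codepage: str) -> None:
    """Swap in the table for another code page; callers of lcase are unchanged."""
    global _lower_table
    _lower_table = build_lower_table(codepage)

def lcase(data: bytes) -> bytes:
    """Lower-case a byte string with one table lookup per byte."""
    return data.translate(_lower_table)
```

With the cp1252 table, lcase(b"ABC\x80") leaves the Euro byte alone, while after set_lower_table("cp437") byte 128 becomes 135, matching the behavior reported at the top of the thread.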
> "Ç" and "ç", are they used in the French language? Then where is their Euro?
Those characters have been a standard part of the IBM PC extended ASCII character set (code page 437) from the very beginning, long before the Euro symbol was created. I still use an ASCII table that was part of the TurboBASIC manual (c. 1986), and those two characters are shown.
In addition to problems caused by different fonts being used, different "code pages" can also affect your program. The same character number in the same font can appear different if a different code page is used.
With only 256 characters to work with, frankly, Microsoft didn't have much choice. Eventually they created Unicode, which in its original 16-bit form provided 64K (65,536) code values. But even then, the same character number can look different in different fonts.
Unicode is an MBCS-style character set, unfortunately. That is, some
codes aren't characters, but flags that mean the next code(s) are
actually part of a different character table. It's an appallingly
inefficient and awkward design, with all the same flaws as the
ASCII/ANSI-based character sets it was designed to replace.
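For comparison, in UTF-16 the codes in question are surrogates. Instead of switching to another table, a high surrogate (0xD800-0xDBFF) followed by a low surrogate (0xDC00-0xDFFF) together encode one code point above U+FFFF. A minimal Python sketch of that arithmetic, cross-checked against Python's own UTF-16 encoder:

```python
# Sketch: how UTF-16 represents a code point above U+FFFF with a
# surrogate pair (a high surrogate followed by a low surrogate).

def to_surrogate_pair(code_point: int) -> tuple:
    """Split a supplementary code point (U+10000..U+10FFFF) into two UTF-16 units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low

high, low = to_surrogate_pair(0x1F600)  # (0xD83D, 0xDE00)

# Cross-check against Python's own UTF-16 (little-endian) encoder.
encoded = chr(0x1F600).encode("utf-16-le")
assert encoded == high.to_bytes(2, "little") + low.to_bytes(2, "little")
```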
Eh, I don't think Unicode was designed by Microsoft... I
know they don't implement it according to the standard's
recommendations, at least as far as the leading byte-order mark
(the endianness flag) is concerned.
How can they be slow? The ASCII set was designed so that upper/lower case conversion of the letters A-Z can be done with a single bit change (bit 5, value 32).
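The bit in question is bit 5 (0x20): "A" is 0x41 and "a" is 0x61. The trick only holds for the 26 ASCII letters, which is why extended characters need a per-code-page table. A short Python sketch with hypothetical helper names:

```python
CASE_BIT = 0x20  # bit 5: 'A' = 0x41, 'a' = 0x61

def ascii_upper(data: bytes) -> bytes:
    """Clear bit 5 on ASCII lower-case letters only; other bytes pass through."""
    return bytes(b & ~CASE_BIT if 0x61 <= b <= 0x7A else b for b in data)

def ascii_lower(data: bytes) -> bytes:
    """Set bit 5 on ASCII upper-case letters only; other bytes pass through."""
    return bytes(b | CASE_BIT if 0x41 <= b <= 0x5A else b for b in data)

# Outside A-Z the rule breaks down: in code page 437, C-cedilla (0x80) and
# c-cedilla (0x87) differ by 0x07, not 0x20, so toggling bit 5 would be wrong.
assert 0x80 ^ 0x87 == 0x07
```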
I imagine, though, that other character-mapping peculiarities stem from which code page you are using. My system reads DBCS Japanese, so the emails above showed a phonetic "Nu" for 128 and a dot for 135, because it reads the next byte as well if a byte is over 128.
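This is also why byte-at-a-time case conversion is risky on DBCS systems. A routine has to skip the trail byte of each double-byte character, because in Shift-JIS (code page 932) trail bytes range over 0x40-0x7E and 0x80-0xFC, which includes the ASCII letters. A Python sketch, assuming the Shift-JIS lead-byte ranges 0x81-0x9F and 0xE0-0xFC (the function names are hypothetical):

```python
# Sketch: ASCII upper-casing that leaves double-byte characters intact.
# Assumption: Shift-JIS / code page 932 lead bytes are 0x81-0x9F and 0xE0-0xFC.

def is_sjis_lead_byte(b: int) -> bool:
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC

def dbcs_ascii_upper(data: bytes) -> bytes:
    """Upper-case ASCII letters, copying both bytes of DBCS characters unchanged."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if is_sjis_lead_byte(b) and i + 1 < len(data):
            out += data[i:i + 2]   # lead byte + trail byte, untouched
            i += 2
        else:
            out.append(b - 0x20 if 0x61 <= b <= 0x7A else b)
            i += 1
    return bytes(out)
```

A naive bytes.upper() would turn the trail byte 0x61 in b"\x82\x61" into 0x41 and corrupt the character; dbcs_ascii_upper leaves that pair alone.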
PLEASE CAN WE HAVE UNICODE SUPPORT IN THE NEXT VERSION!!!!
MANY OF THESE PROBLEMS GO AWAY THEN (I hate DBCS, whoever thought of it should be strung up)
------------------
Paul Dwyer
Network Engineer
Aussie in Tokyo
(Paul282 at VB-World)
The subject of Unicode has been raised a few times before (it's déjà vu all over again!)
Adding Unicode support to PowerBASIC would likely add *significant* overhead to the final EXE/DLL. Memory consumption for Unicode strings would be at least double, and (as I see it) R&D would need to add a new, separate data type just for Unicode strings, so many of the existing string functions would effectively need to be duplicated to handle that data type. Without that, changing the current string types from ASCII/ANSI to Unicode would break almost all existing code. And unless the Unicode section of the RTL were made optional (this is called RTL granularity), the added overhead would penalize, in EXE/DLL size, those who do not want or use Unicode in their applications.
While I cannot pre-empt what R&D may or may not be planning for the future (Unicode is definitely on the Wish List), Unicode can be handled *right now* with the current version of the compiler by using the various Unicode APIs provided by Windows. You simply need to use normal string buffers that are large enough to hold the wide-character representation, at two bytes per character.
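On Windows the conversion itself would be done with API calls such as MultiByteToWideChar; the Python sketch below only illustrates the buffer arithmetic, including the "at least double" memory point. The helper names and the cp1252 default are assumptions for illustration:

```python
# Sketch: bytes needed for a null-terminated UTF-16 string, the form the
# Windows "W" APIs use. Each UTF-16 code unit is 2 bytes, and characters
# above U+FFFF take two units (a surrogate pair).

def utf16_buffer_bytes(text: str) -> int:
    """Buffer size in bytes, including a 2-byte null terminator."""
    code_units = len(text.encode("utf-16-le")) // 2
    return (code_units + 1) * 2

def ansi_buffer_bytes(text: str, codepage: str = "cp1252") -> int:
    """Buffer size in bytes for a single-byte code page, including the null."""
    return len(text.encode(codepage)) + 1

sample = "Euro \u20ac"
ansi_size = ansi_buffer_bytes(sample)   # 7
wide_size = utf16_buffer_bytes(sample)  # 14
```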
I would imagine that Unicode support would be added to PB the way it was added to the ANSI C standard, in the form of wchar_t.
The book recommended by PB for Windows programming (Programming Windows by Charles Petzold) has a whole section dedicated to how Unicode is implemented in C and why implementing it is so critical to Windows programming. I can post the chapter if you like.
If it is implemented properly, there is no reason it should require any existing code to be updated.
Still, I guess, as you say, they are aware of the issues, and whinging here is not likely to help the cause, although it never hurts to try.
It'd probably be worth me getting off my *** and putting an INC together myself for a UDT and some string functions.
------------------
Paul Dwyer