Is there an API to test the storagetype of the (text)file?
Determining file saved as unicode?
-
Something like this, I guess:
Code:
Local T As String
T = VD_LoadFromFile( "C:\mydoc.txt" )
MsgBox Format$( FileIsUnicode( T ) ) & $CrLf & Left$( T, 10 )

Function FileIsUnicode( ByVal sLeftData As String * 6 ) As Long
    Local pByte As Byte Ptr
    sLeftData = Left$( sLeftData, 3 ) & String$( 3, 0 )
    pByte = VarPtr( sLeftData )
    If ( @pByte[0] = &HFF And @pByte[1] = &HFE ) _
    Or ( @pByte[0] = &HFE And @pByte[1] = &HFF ) Then
        Function = 1
    End If
End Function
-
Get yourself a hex editor, eheh. Very useful for these sorta situations. I use Hex Workshop.
To test those two bytes it's best to use a WORD PTR rather than a BYTE PTR ... I'd go with something like ...
Code:
FUNCTION IsFileUnicode(BYVAL hFile AS DWORD) AS DWORD
    LOCAL sBuf AS STRING * 2, wPtr AS WORD PTR
    SEEK #hFile, 1: GET$ #hFile, 2, sBuf
    wPtr = VARPTR(sBuf)
    IF @wPtr = &h0000FFFE OR @wPtr = &h0000FEFF THEN FUNCTION = 1
END FUNCTION

FUNCTION PBMAIN() AS LONG
    LOCAL hFile AS DWORD
    hFile = FREEFILE
    OPEN "c:\unicode.txt" FOR BINARY ACCESS READ LOCK SHARED AS #hFile
    IF IsFileUnicode(BYVAL hFile) = 1 THEN
        MSGBOX "File is unicode"
    ELSE
        MSGBOX "Not unicode"
    END IF
    CLOSE #hFile
END FUNCTION
Last edited by Wayne Diamond; 30 Jul 2009, 11:35 AM.
-
The WinAPI function IsTextUnicode() sounds handy for your task.
If you are on Win9x, here is what you MUST use to call this API: "PB/CC: IsTextUnicode with Microsoft Unicode Layer for Win95/98/ME" (April 11, 2002).
Should work on anything after 9x with only minor tweaks.
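For anyone who wants to see the shape of the call, here is a minimal sketch, not tested, for NT-family Windows only (no Unicode layer). The DECLARE line and the two equates are written out by hand as assumptions; WIN32API.INC normally supplies its own versions, so drop these if you include that file. The file name is the same placeholder used earlier in the thread, and the 4096-byte read size is arbitrary.

```powerbasic
#COMPILE EXE
#DIM ALL

' Assumed hand-written declaration; WIN32API.INC may declare it differently.
DECLARE FUNCTION IsTextUnicode LIB "ADVAPI32.DLL" ALIAS "IsTextUnicode" _
    (BYVAL lpv AS DWORD, BYVAL iSize AS LONG, BYREF lpiResult AS LONG) AS LONG

%IS_TEXT_UNICODE_SIGNATURE         = &H0008   ' buffer starts with FF FE
%IS_TEXT_UNICODE_REVERSE_SIGNATURE = &H0080   ' buffer starts with FE FF

FUNCTION PBMAIN() AS LONG
    LOCAL hFile AS LONG, sBuf AS STRING, lFlags AS LONG

    hFile = FREEFILE
    OPEN "c:\unicode.txt" FOR BINARY ACCESS READ LOCK SHARED AS #hFile
    GET$ #hFile, 4096, sBuf          ' read up to 4096 bytes from the start
    CLOSE #hFile

    IF LEN(sBuf) = 0 THEN
        MSGBOX "Empty file"
        EXIT FUNCTION
    END IF

    ' In: the tests to run. Out: the subset of those tests that passed.
    lFlags = %IS_TEXT_UNICODE_SIGNATURE OR %IS_TEXT_UNICODE_REVERSE_SIGNATURE
    IsTextUnicode STRPTR(sBuf), LEN(sBuf), lFlags

    IF (lFlags AND %IS_TEXT_UNICODE_SIGNATURE) THEN
        MSGBOX "UTF-16 little-endian byte order mark"
    ELSEIF (lFlags AND %IS_TEXT_UNICODE_REVERSE_SIGNATURE) THEN
        MSGBOX "UTF-16 big-endian byte order mark"
    ELSE
        MSGBOX "No byte order mark found"
    END IF
END FUNCTION
```

Only the two signature tests are requested here, so this does the same job as checking the first two bytes by hand. The other IS_TEXT_UNICODE_* flags in the API documentation ask for the statistical tests.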
MCM
Michael Mattias
Tal Systems (retired)
Port Washington WI USA
http://www.talsystems.com
-
IsTextUnicode is probably the better way to go if you need specific info about the file. There's a fair amount to it at the assembly level, though, mainly because it can tell you exactly what type of file it is (see the IsTextUnicode documentation). Here is just a small fraction from the start of ntdll's RtlIsTextUnicode:
Code:
77F612E5 > 55             push ebp
77F612E6   8BEC           mov ebp, esp
77F612E8   83EC 5C        sub esp, 5C
77F612EB   8B4D 0C        mov ecx, dword ptr ss:[ebp+C]
77F612EE   53             push ebx
77F612EF   33DB           xor ebx, ebx
77F612F1   56             push esi
77F612F2   8BF1           mov esi, ecx
77F612F4   D1EE           shr esi, 1
77F612F6   B8 00010000    mov eax, 100
77F612FB   3BF0           cmp esi, eax
77F612FD   895D CC        mov dword ptr ss:[ebp-34], ebx
77F61300   895D D0        mov dword ptr ss:[ebp-30], ebx
77F61303   895D D4        mov dword ptr ss:[ebp-2C], ebx
77F61306   895D D8        mov dword ptr ss:[ebp-28], ebx
77F61309   895D DC        mov dword ptr ss:[ebp-24], ebx
77F6130C   895D AC        mov dword ptr ss:[ebp-54], ebx
77F6130F   895D B0        mov dword ptr ss:[ebp-50], ebx
77F61312   895D BC        mov dword ptr ss:[ebp-44], ebx
77F61315   895D C0        mov dword ptr ss:[ebp-40], ebx
77F61318   895D C4        mov dword ptr ss:[ebp-3C], ebx
77F6131B   895D C8        mov dword ptr ss:[ebp-38], ebx
77F6131E   895D E0        mov dword ptr ss:[ebp-20], ebx
77F61321   895D B4        mov dword ptr ss:[ebp-4C], ebx
77F61324   895D B8        mov dword ptr ss:[ebp-48], ebx
77F61327   895D A8        mov dword ptr ss:[ebp-58], ebx
77F6132A   895D F0        mov dword ptr ss:[ebp-10], ebx
77F6132D   895D E8        mov dword ptr ss:[ebp-18], ebx
77F61330   895D EC        mov dword ptr ss:[ebp-14], ebx
77F61333   895D E4        mov dword ptr ss:[ebp-1C], ebx
77F61336   895D F4        mov dword ptr ss:[ebp-C], ebx
77F61339   895D FC        mov dword ptr ss:[ebp-4], ebx
77F6133C   8975 A4        mov dword ptr ss:[ebp-5C], esi
77F6133F   8945 F8        mov dword ptr ss:[ebp-8], eax
77F61342   77 03          ja short ntdll.77F61347
77F61344   8975 F8        mov dword ptr ss:[ebp-8], esi
77F61347   83F9 02        cmp ecx, 2
77F6134A ^ 0F82 0C6CFFFF  jb ntdll.77F57F5C
77F61350   8B55 08        mov edx, dword ptr ss:[ebp+8]
77F61353   0F84 96330200  je ntdll.77F846EF
77F61359   83F9 02        cmp ecx, 2
77F6135C   76 18          jbe short ntdll.77F61376
77F6135E   3BF0           cmp esi, eax
77F61360   77 14          ja short ntdll.77F61376
77F61362   F645 0C 01     test byte ptr ss:[ebp+C], 1
77F61366   75 0E          jnz short ntdll.77F61376
77F61368   8B45 F8        mov eax, dword ptr ss:[ebp-8]
77F6136B   F64442 FF FF   test byte ptr ds:[edx+eax*2-1], 0FF    ; <-- Check for the FF magic byte
77F61370   0F84 91330200  je ntdll.77F84707
-
Yo, Wayne, your "check the disk file header" code may be working now, but there is a bug waiting to bite you in a tender area....
Code:
OPEN "c:\unicode.txt" FOR BINARY ACCESS READ LOCK SHARED AS #hFile
...
SEEK #hFile, 0
Or, you can remove the guesswork entirely (recommended):
Code:
SEEK #hFile, FILEATTR(hFile, -2&)
Last edited by Michael Mattias; 30 Jul 2009, 09:47 AM.
Michael Mattias
-
Also, if you want to get the halfword at the start of the file, there's no sense creating a string and then requesting the address of the data...
Code:
LOCAL FileHeaderWord AS WORD
SEEK ...
GET hFile, ,FileHeaderWord
You might have to tweak the order of the bytes in your comparison.
Not that it matters, because OPENing and reading the file takes a LOT more time than does building a temp string and this is really moot performance-wise. But I do believe fewer steps to be a tad less cryptic when you go back to this code in the future.
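To make the byte-order remark concrete: x86 is little-endian, so a file that begins with the bytes FF FE reads back as the WORD value &HFEFF, and one that begins with FE FF reads back as &HFFFE. A minimal sketch, assuming a compiler where MSGBOX is available. The two-byte string is only a stand-in for the first two bytes of a file.

```powerbasic
#COMPILE EXE
#DIM ALL

FUNCTION PBMAIN() AS LONG
    LOCAL w AS WORD
    ' Stand-in for the first two bytes of a file: FF FE, in that order.
    w = CVWRD(CHR$(&HFF, &HFE))
    ' The first byte lands in the low half of the WORD, so this shows FEFF.
    MSGBOX HEX$(w, 4)
END FUNCTION
```

Comparing against both &HFEFF and &HFFFE, as the IsFileUnicode function earlier in the thread does, sidesteps the question when all you need is a yes/no answer.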
MCM
Michael Mattias