Absolute assurance of file uniqueness.
-
See PowerBASIC Help for ASM
PowerBASIC recognizes either an apostrophe ( ' ) or a semi-colon ( ; ) to specify a comment after a line of assembler code:
! PUSH EAX ; save the EAX register
! PUSH EBX ' save the EBX register
-
Thank you, everyone, for the code and the explanations.
One question about ASM: does a semicolon ( ; ) indicate a comment, as in the code below?
Code:
! mul edi ;eax = eax * FNV_32_PRIME
Can we use an apostrophe ( ' ) instead of a semicolon ( ; ) for ASM comments?
Code:
! mul edi ' eax = eax * FNV_32_PRIME
-
I have these functions from a project and I am sure I got them from the forum so credit is due to someone else...
Code:
FUNCTION FNV32( BYVAL dwOffset AS DWORD, BYVAL dwLen AS DWORD, BYVAL offset_basis AS DWORD ) AS DWORD
    'Note: dwLen must be at least 1. With dwLen = 0 the counter wraps around
    'and the loop reads far past the end of the buffer.
    #REGISTER NONE
    ! mov esi, dwOffset       ;esi = ptr to buffer
    ! mov ecx, dwLen          ;ecx = length of buffer (counter)
    ! mov eax, offset_basis   ;set to 0 for FNV-0, or 2166136261 for FNV-1
    ! mov edi, &h01000193     ;FNV_32_PRIME = 16777619
    ! xor ebx, ebx            ;ebx = 0
  nextbyte:
    ! mul edi                 ;eax = eax * FNV_32_PRIME
    ! mov bl, [esi]           ;bl = byte from esi
    ! xor eax, ebx            ;eax = eax xor ebx (only the low byte changes)
    ! inc esi                 ;esi = esi + 1 (buffer pos)
    ! dec ecx                 ;ecx = ecx - 1 (counter)
    ! jnz nextbyte            ;if ecx is not 0, jmp to nextbyte
    ! mov FUNCTION, eax       ;else, function = eax
END FUNCTION
'----------------------------------------------------------------------------(')
FUNCTION String2FNV32( BYVAL strText AS STRING ) AS DWORD
    FUNCTION = FNV32( BYVAL STRPTR( strText ), LEN( strText ), 2166136261 )
END FUNCTION
-
Hi George,
What is this FNV-1a? Is there any code to show how it works?
-
In this case you don't need cryptographic strength; you could even use FNV-1a, which is fast and produces small output values (standard implementations are as small as a DWORD). There is a small chance of duplicates, but if you treat the file size as part of your comparison, the potential for collisions is reduced.
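For readers asking how FNV-1a works, here is a minimal Python sketch of the 32-bit variants. It is an illustration, not the code anyone in this thread is running. The constants are the ones used by the FNV32 routine posted in this thread; the function names `fnv1_32` and `fnv1a_32` are made up for this example. The only difference between the two is the order of the multiply and the XOR: FNV-1 multiplies first (as the assembler routine does), while FNV-1a XORs first.

```python
FNV_32_OFFSET_BASIS = 0x811C9DC5  # 2166136261, the value passed by String2FNV32
FNV_32_PRIME = 0x01000193         # 16777619


def fnv1_32(data: bytes) -> int:
    """FNV-1: multiply by the prime, then XOR in the next byte."""
    h = FNV_32_OFFSET_BASIS
    for byte in data:
        h = (h * FNV_32_PRIME) & 0xFFFFFFFF  # keep the result in 32 bits
        h ^= byte
    return h


def fnv1a_32(data: bytes) -> int:
    """FNV-1a: XOR in the next byte, then multiply by the prime."""
    h = FNV_32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV_32_PRIME) & 0xFFFFFFFF  # keep the result in 32 bits
    return h
```

For an empty input both functions simply return the offset basis. Pairing the hash with the file size, as suggested above, amounts to comparing `(size, hash)` tuples instead of the hash alone.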
-
Originally posted by Eric Pearson
You're doing a quick test for equal file length before doing any hash calculations, right?
BTW just for fun... I have a number of 19702x10462 TIFF files on my local drive, up to 600 MB each. Photoshop doesn't like them very much.
I thought my 20013 x 10427 pixel "Admiralty_Chart_No_5308_The_World_Sailing_Ship_Routes,_Published_1946.jpg" was big at 36 MB
(but it too is 600 MB in memory when loaded in Irfanview - guess if I saved it as Tiff it would be the same)
-
"You're doing a quick test for equal file length before doing any hash calculations, right?"
Yes!
-
You're doing a quick test for equal file length before doing any hash calculations, right?
BTW just for fun... I have a number of 19702x10462 TIFF files on my local drive, up to 600 MB each. Photoshop doesn't like them very much.
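The file-length pre-test can be sketched roughly as follows, in Python for illustration only. The helper name `group_by_size` is hypothetical. Files whose size is unique cannot have a duplicate, so only the groups returned here would need to be hashed at all.

```python
import os
from collections import defaultdict


def group_by_size(paths):
    """Return lists of paths that share the same file size.

    Any file whose size is unique is dropped, because it cannot be a
    duplicate of another file in the input.
    """
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)
    return [group for group in by_size.values() if len(group) > 1]
```

With very large files such as the 600 MB images mentioned above, skipping the hash for every uniquely sized file avoids reading those files from disk at all.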
-
Thanks All! Interesting thoughts. Much to ponder.
And yes, I could do a byte by byte comparison in a pinch.
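A byte-by-byte comparison is the only check that gives absolute assurance, and it is cheap when it only runs on files that already match in size and hash. A minimal Python sketch, assuming ordinary files on disk (the name `files_identical` and the 1 MB chunk size are arbitrary choices for this example):

```python
def files_identical(path_a, path_b, chunk_size=1 << 20):
    """Compare two files chunk by chunk; stop at the first difference."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # differing content or differing length
            if not chunk_a:
                return True   # both files ended at the same point
```

Reading in chunks keeps memory use flat even for files of several hundred megabytes.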
-
On paper MD5 is faster than SHA256 but in practice an application which is reading many files from a drive will not see the 'paper' difference. AES128 is seven times faster than AES256 on paper. A few years ago I was using AES128 in an application and was streaming large files. Out of curiosity I switched to AES256 and found that the 'edge' of AES128 had been greatly diminished. I stayed with AES256.
If David uses the Microsoft APIs he can easily switch from MD5 to SHA256 and may find that the performance hit is nothing like what it says on 'paper'. Bear in mind the title of this thread: "Absolute assurance of file uniqueness."
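The point about switching algorithms easily can be illustrated with a short sketch. This uses Python's `hashlib` rather than the Microsoft APIs mentioned above, so it only shows the idea: when the algorithm is a parameter, moving from MD5 to SHA256 is a one-word change, and the two can then be timed on real files. The function name `file_digest` is made up for this example.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks with the named algorithm, e.g. "md5" or "sha256"."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Calling `file_digest(path, "md5")` and `file_digest(path, "sha256")` on the same large file is a simple way to see how much of the 'paper' difference survives once disk reads are included.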
-
You cannot allow perfection to be the enemy of the perfectly acceptable.
-
I'd certainly understand if SHA256 was used for the software that runs nuclear power stations, but when looking for duplicate image files your first sentence says it all for me. I'd want my PowerBASIC program to run faster.
-
As far as I know no one has experienced an MD5 collision in the 'wild' yet.
We have a similar argument with random number generators. A graphics programmer may want blindingly fast numbers but is not concerned with top drawer randomness. In that case, an LCG will do. On the other hand, someone may want top drawer randomness, in which case an LCG will not do.
Someone may want a blindingly fast hash function but is not concerned with the odd collision. On the other hand, someone may not want collisions at all.
If collisions are out of the question then use SHA256. It is as simple as that. Well, not quite - Blake2b is better and Blake3 is better still. Good luck on implementing them.
-
I'm no expert, but don't I remember reading that the chance of a non-intentional MD5 collision is 1 in a quadrillion squared, or something absurd like that?
I also remember a long-ago forum thread about CRC32. Ah, the good old days.
-
Originally posted by Stuart
It doesn't matter if your hash function has zero security.
From the SeaHash website: "It aims to have high quality pseudorandom output and few collisions, as well as being fast."
With SHA256 the likelihood of a collision is very nearly zero. Nobody has managed to 'manufacture' a collision with MD5 HMAC yet.
A few years ago, quite a few, a guy on this forum had an application which had worked for years without issue and started to have issues. His latest system had many more files than his old system had when he wrote the application. He was using CRC32. With a stronger hash the problem ceased.
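Why CRC32 starts to fail as the number of files grows can be estimated with the standard birthday approximation, assuming hash values behave as if uniformly random (an idealization, not a measured property of any hash named here). The Python sketch below is illustrative and the function name is hypothetical.

```python
import math


def collision_probability(num_files: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_files share a hash value.

    Birthday approximation: p = 1 - exp(-n(n-1) / 2^(bits+1)),
    assuming uniformly distributed hash values.
    """
    pairs = num_files * (num_files - 1) / 2
    # expm1 keeps precision when the probability is extremely small
    return -math.expm1(-pairs / 2.0 ** hash_bits)
```

Under this approximation a 32-bit hash reaches roughly even odds of at least one accidental collision at around 77,000 files, whereas for a 128-bit or 256-bit hash the probability stays vanishingly small for any realistic number of files.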
-
Originally posted by David Roberts
MD5 has a security level of 64-bit which is woefully weak today. SHA256 has a security level of 128-bit and should hold us in good stead for a few years yet.
It doesn't matter if your hash function has zero security. All you need is a fast non-cryptographic hash, something like SeaHash or xxHash, for example. (Actually, since they are about 50 times as fast as MD5/SHA1 et al., you could run both and still be a lot faster than a single cryptographic hash: use one on the first pass and then the other one on any collisions.)
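The two-pass idea can be sketched in Python as follows. SeaHash and xxHash are not in the Python standard library, so this sketch substitutes `zlib.crc32` for the first pass and `hashlib.sha256` for the second purely as stand-ins; all function names are made up for the example. The structure is the point: the second hash is only computed for files whose first hash collides.

```python
import hashlib
import zlib
from collections import defaultdict


def _read_chunks(path, chunk_size=1 << 20):
    """Yield the file's contents in chunks."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk


def first_pass_hash(path):
    """Stand-in for a fast non-cryptographic hash (assumption: CRC32)."""
    crc = 0
    for chunk in _read_chunks(path):
        crc = zlib.crc32(chunk, crc)
    return crc


def second_pass_hash(path):
    """Stand-in for the second, independent hash (assumption: SHA-256)."""
    h = hashlib.sha256()
    for chunk in _read_chunks(path):
        h.update(chunk)
    return h.digest()


def _collisions(paths, key):
    """Group paths by key(path) and keep only groups with 2+ members."""
    buckets = defaultdict(list)
    for path in paths:
        buckets[key(path)].append(path)
    return [group for group in buckets.values() if len(group) > 1]


def likely_duplicates(paths):
    """Run the second hash only on files that collide on the first."""
    result = []
    for group in _collisions(paths, first_pass_hash):
        result.extend(_collisions(group, second_pass_hash))
    return result
```

Files that still match after both passes are very likely identical; a final byte-by-byte comparison would remove the remaining doubt.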
-
MD5 has a security level of 64-bit which is woefully weak today. SHA256 has a security level of 128-bit and should hold us in good stead for a few years yet.
2^128 x 2^128 = 2^256.
Use SHA256.
If you must use 128-bit, then use MD5 HMAC.