Absolute assurance of file uniqueness.


  • Tim Lakinir
    replied
    Thank you Frank



  • Frank Rogers
    replied
    See PowerBASIC Help for ASM
    PowerBASIC recognizes either an apostrophe ( ' ) or a semi-colon ( ; ) to specify a comment after a line of assembler code:

    ! PUSH EAX ; save the EAX register

    ! PUSH EBX ' save the EBX register



  • Tim Lakinir
    replied
    Thank you, everyone, for the code and explanations.

    One question for ASM: does a semicolon ( ; ) indicate a comment, as shown in the code below?

    Code:
    ! mul edi ;eax = eax * FNV_32_PRIME

    Can we use an apostrophe ( ' ) instead of a semicolon ( ; ) for ASM comments?

    Code:
    ! mul edi                '   eax = eax * FNV_32_PRIME



  • Frank Rogers
    replied
    2002 post by Wayne Diamond
    2014 post by Wayne Diamond



  • George Bleck
    replied
    I have these functions from a project and I am sure I got them from the forum so credit is due to someone else...

    Code:
    FUNCTION FNV32( BYVAL dwOffset AS DWORD, BYVAL dwLen AS DWORD, BYVAL offset_basis AS DWORD ) AS DWORD
    #REGISTER NONE
      ! mov esi, dwOffset ;esi = ptr to buffer
      ! mov ecx, dwLen ;ecx = length of buffer (counter)
      ! mov eax, offset_basis ;set to 0 for FNV-0, or 2166136261 for FNV-1
      ! mov edi, &h01000193 ;FNV_32_PRIME = 16777619
      ! xor ebx, ebx ;ebx = 0
    nextbyte:
      ! mul edi ;eax = eax * FNV_32_PRIME
      ! mov bl, [esi] ;bl = byte from esi
      ! xor eax, ebx ;eax = eax xor ebx (only the low byte changes)
      ! inc esi ;esi = esi + 1 (buffer pos)
      ! dec ecx ;ecx = ecx - 1 (counter)
      ! jnz nextbyte ;if ecx is not 0, jmp to nextbyte
      ! mov FUNCTION, eax ;else, function = eax
    END FUNCTION
    
    
    
    '----------------------------------------------------------------------------(')
    
    
    
    FUNCTION String2FNV32( BYVAL strText AS STRING ) AS DWORD
      FUNCTION = FNV32( BYVAL STRPTR( strText ), LEN( strText ), 2166136261 )
    END FUNCTION
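For comparison, here is a minimal Python sketch (Python rather than PowerBASIC, purely so it is easy to run) of the two 32-bit variants. The assembler above multiplies first and then XORs each byte, which is the FNV-1 order; FNV-1a, mentioned elsewhere in this thread, XORs first and then multiplies. The names `fnv1_32` and `fnv1a_32` are illustrative and do not come from the original posts.

```python
FNV_32_PRIME = 0x01000193          # 16777619, the same prime as in the assembler
FNV_32_OFFSET_BASIS = 2166136261   # 0x811C9DC5, the FNV-1 / FNV-1a starting value
MASK_32 = 0xFFFFFFFF               # keep results in DWORD range


def fnv1_32(data: bytes, h: int = FNV_32_OFFSET_BASIS) -> int:
    """FNV-1: multiply, then XOR (the order used by FNV32 above)."""
    for byte in data:
        h = (h * FNV_32_PRIME) & MASK_32
        h ^= byte
    return h


def fnv1a_32(data: bytes, h: int = FNV_32_OFFSET_BASIS) -> int:
    """FNV-1a: XOR, then multiply."""
    for byte in data:
        h ^= byte
        h = (h * FNV_32_PRIME) & MASK_32
    return h


if __name__ == "__main__":
    print(hex(fnv1_32(b"a")))
    print(hex(fnv1a_32(b"a")))
```

For empty input both functions simply return the offset basis. The assembler version, by contrast, assumes dwLen is at least 1, because it decrements the counter before testing it.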



  • Stuart McLachlan
    replied
    Originally posted by Tim Lakinir
    Hi George,
    What is this FNV-1a ? any code to show how it works?
    Try https://tinyurl.com/y4dqv989



  • Tim Lakinir
    replied
    Hi George,
    What is this FNV-1a ? any code to show how it works?



  • George Bleck
    replied
    In this case you don't need cryptographic strength. You could even use FNV-1a, which is fast and produces small output values (standard implementations go as small as a DWORD). There is a small chance of duplicates, but if you also treat the file size as part of the comparison, the number of potential collisions drops further.



  • Stuart McLachlan
    replied
    Originally posted by Eric Pearson
    You're doing a quick test for equal file length before doing any hash calculations, right?

    BTW just for fun... I have a number of 19702x10462 TIFF files on my local drive, up to 600 MB each. Photoshop doesn't like them very much.
    Ouch,

    I thought my 20013 x 10427 pixel "Admiralty_Chart_No_5308_The_World_Sailing_Ship_Routes,_Published_1946.jpg" was big at 36 MB
    (but it too is 600 MB in memory when loaded in IrfanView; I guess if I saved it as TIFF it would be the same)




  • David Clarke
    replied
    "You're doing a quick test for equal file length before doing any hash calculations, right?"

    Yes!
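The length test discussed here could look roughly like the following Python sketch: group the candidate files by size and only pass groups with two or more members on to any hashing. The helper name `group_by_size` is hypothetical, and the demo file names are made up for illustration.

```python
import os
import tempfile
from collections import defaultdict


def group_by_size(paths):
    """Return lists of paths that share a file size; files with a unique size are dropped."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)
    return [group for group in by_size.values() if len(group) > 1]


if __name__ == "__main__":
    # Small self-contained demo using throwaway files.
    with tempfile.TemporaryDirectory() as tmp:
        contents = {"a.bin": b"12345", "b.bin": b"abcde", "c.bin": b"xyz"}
        paths = []
        for name, data in contents.items():
            path = os.path.join(tmp, name)
            with open(path, "wb") as f:
                f.write(data)
            paths.append(path)
        for group in group_by_size(paths):
            print([os.path.basename(p) for p in group])
```

A file whose size is unique cannot have a duplicate, so it never needs to be read at all.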



  • Eric Pearson
    replied
    You're doing a quick test for equal file length before doing any hash calculations, right?

    BTW just for fun... I have a number of 19702x10462 TIFF files on my local drive, up to 600 MB each. Photoshop doesn't like them very much.



  • David Clarke
    replied
    Thanks All! Interesting thoughts. Much to ponder.

    And yes, I could do a byte by byte comparison in a pinch.
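A byte-by-byte comparison of two candidate files might be sketched like this in Python. The function name `files_identical` and the 1 MB chunk size are assumptions for illustration; Python's standard library offers `filecmp.cmp(a, b, shallow=False)` for the same job.

```python
import os
import tempfile


def files_identical(path_a, path_b, chunk_size=1 << 20):
    """Compare two files chunk by chunk; stop at the first difference."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False  # different lengths can never be identical
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files reached end of file together
                return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first.bin")
        second = os.path.join(tmp, "second.bin")
        for path in (first, second):
            with open(path, "wb") as f:
                f.write(b"same bytes")
        print(files_identical(first, second))
```

Unlike any hash, this check cannot report a false match, which is why it suits a final confirmation on the few files that survive the earlier filters.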



  • David Roberts
    replied
    On paper MD5 is faster than SHA256 but in practice an application which is reading many files from a drive will not see the 'paper' difference. AES128 is seven times faster than AES256 on paper. A few years ago I was using AES128 in an application and was streaming large files. Out of curiosity I switched to AES256 and found that the 'edge' of AES128 had been greatly diminished. I stayed with AES256.

    If David uses the Microsoft APIs he can easily switch from MD5 to SHA256 and may find that the performance hit is nothing like what it says on 'paper'. Bear in mind the title of this thread: "Absolute assurance of file uniqueness."
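The idea of switching algorithms without touching the rest of the program can be illustrated with a short sketch. It uses Python's `hashlib` as a stand-in, not the Microsoft APIs mentioned above, and the function name `file_digest` and the chunk size are assumptions. The file is streamed in chunks so that large images never have to fit in memory.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks; the algorithm is just a name, e.g. "md5" or "sha256"."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.bin")
        with open(path, "wb") as f:
            f.write(b"abc")
        # Changing the algorithm is a one-word change at the call site.
        print(file_digest(path, "md5"))
        print(file_digest(path, "sha256"))
```

Timing both calls on real files is the simplest way to see how much of the 'paper' difference survives once disk reads are included.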



  • Michael Mattias
    replied
    You cannot allow perfection to be the enemy of the perfectly acceptable.



  • Eric Pearson
    replied
    I'd certainly understand if SHA256 was used for the software that runs nuclear power stations, but when looking for duplicate image files your first sentence says it all for me. I'd want my PowerBASIC program to run faster.



  • David Roberts
    replied
    As far as I know no one has experienced an MD5 collision in the 'wild' yet.

    We have a similar argument with random number generators. A graphics programmer may want blinding fast numbers but is not concerned with top drawer randomness. In that case, an LCG will do. On the other hand, someone may want top drawer randomness in which case an LCG will not do.

    Someone may want a blindingly fast hash function but is not concerned with the odd collision. On the other hand, someone may not want collisions at all.

    If collisions are out of the question then use SHA256. It is as simple as that. Well, not quite - Blake2b is better and Blake3 is better still. Good luck on implementing them.



  • Eric Pearson
    replied
    I'm no expert, but don't I remember reading that the chance of a non-intentional MD5 collision is 1 in a quadrillion squared, or something absurd like that?

    I also remember a long-ago forum thread about CRC32. Ah, the good old days.



  • David Roberts
    replied
    Originally posted by Stuart
    It doesn't matter if your hash function has zero security.
    Agreed.

    From the SeaHash website: "It aims to have high quality pseudorandom output and few collisions, as well as being fast."

    With SHA256 the likelihood of a collision is very nearly zero. Nobody has managed to 'manufacture' a collision with MD5 HMAC yet.

    A few years ago, quite a few, a guy on this forum had an application which had worked for years without issue and started to have issues. His latest system had many more files than his old system had when he wrote the application. He was using CRC32. With a stronger hash the problem ceased.
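Why a 32-bit hash starts to misbehave as the file count grows can be estimated with the usual birthday approximation. The Python sketch below assumes hash outputs are uniformly distributed, which is an idealisation (CRC32 was not designed as a hash), and the function name `collision_probability` is hypothetical.

```python
import math


def collision_probability(num_files: int, hash_bits: int) -> float:
    """Birthday approximation: p ~ 1 - exp(-n(n-1) / (2 * 2**bits))."""
    x = num_files * (num_files - 1) / (2.0 * 2.0 ** hash_bits)
    return -math.expm1(-x)  # 1 - exp(-x), computed accurately for tiny x


if __name__ == "__main__":
    for bits in (32, 128, 256):
        for n in (10_000, 100_000, 1_000_000):
            p = collision_probability(n, bits)
            print(f"{bits:3d}-bit hash, {n:9,d} files: p = {p:.3g}")
```

Under that assumption, a 32-bit hash reaches roughly even odds of at least one accidental collision somewhere around 77,000 files, while a 128-bit hash stays vanishingly unlikely to collide by accident even at a million files.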



  • Stuart McLachlan
    replied
    Originally posted by David Roberts
    MD5 has a security level of 64-bit which is woefully weak today. SHA256 has a security level of 128-bit and should hold us in good stead for a few years yet.
    Security level is completely irrelevant when all you are doing is comparing hashes of multiple image files to identify duplicates.
    It doesn't matter if your hash function has zero security. All you need is a fast non-cryptographic hash, something like SeaHash or xxHash, for example. (Actually, since they are about 50 times as fast as MD5/SHA1 et al, you could use both a lot faster than a single cryptographic hash: use one on the first pass and then the other one on any collisions.)
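The two-pass idea can be sketched as follows in Python. SeaHash and xxHash are not in the Python standard library, so this sketch swaps in `zlib.crc32` for the first pass and a small FNV-1a for the second; both are stand-ins chosen only to keep the example self-contained. The names `fnv1a_32` and `likely_duplicates` are hypothetical, and the data is held in memory for brevity.

```python
import zlib
from collections import defaultdict


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a, used here as the second, independent hash."""
    h = 2166136261
    for byte in data:
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF
    return h


def likely_duplicates(blobs):
    """blobs maps a name to its bytes; return groups of names that agree on both hashes."""
    first_pass = defaultdict(list)
    for name, data in blobs.items():
        first_pass[zlib.crc32(data)].append(name)

    groups = []
    for names in first_pass.values():
        if len(names) < 2:
            continue  # unique first-pass hash: no second hash needed
        second_pass = defaultdict(list)
        for name in names:
            second_pass[fnv1a_32(blobs[name])].append(name)
        groups.extend(g for g in second_pass.values() if len(g) > 1)
    return groups


if __name__ == "__main__":
    sample = {"a.jpg": b"hello", "b.jpg": b"hello", "c.jpg": b"world"}
    print(likely_duplicates(sample))
```

Agreement on both hashes still only makes a match very likely; a byte-by-byte comparison of the surviving groups is what settles it.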



  • David Roberts
    replied
    MD5 has a security level of 64-bit which is woefully weak today. SHA256 has a security level of 128-bit and should hold us in good stead for a few years yet.

    2^128 x 2^128 = 2^256.

    Use SHA256.

    If you must use 128-bit, then use MD5 HMAC.


