Announcement

Collapse

Forum Guidelines

This forum is for finished source code that is working properly. If you have questions about this or any other source code, please post it in one of the Discussion Forums, not here.
See more
See less

Huffman encoding/decoding

Collapse
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Huffman encoding/decoding

    Note, Huffman compression works good for text files, images (such as BMP) and typically is used in combination with LZ methods.
    Code:
       ' Huffman Encoding / Decoding Class
       ' -------------------------------
       ' Based on three variants of VB code, found at planet-source-code.com
    
       #Compile Exe
       #Register None
       #Dim All
       #Include "Win32Api.Inc"
    
       $SourceFile = "C:\huf.src" ' <---- Change me
       $TargetFile = "C:\huf.dst"
       %nTest = 1
    
       Type tagHuffmanTree
          ParentNode As Integer
          RightNode  As Integer
          LeftNode   As Integer
          Value      As Integer
          Weight     As Long
       End Type
    
       Type tagHuffmanChar
          Bits(255) As Byte
       End Type
    
       Sub ErrMsg(szText As String)
          MsgBox szText, %MB_ICONEXCLAMATION Or %MB_TASKMODAL, "Critical error"
       End Sub
    
       Sub CreateBitSequences(Nodes() As tagHuffmanTree, ByVal NodeIndex As Integer, Bytes As tagHuffmanChar, CharValue() As tagHuffmanChar)
          Local NewBytes As tagHuffmanChar
    
          ' If this is a leaf we set the characters bit sequence in the CharValue array
          If Nodes(NodeIndex).Value >= 0 Then CharValue(Nodes(NodeIndex).Value) = Bytes: Exit Sub
    
          ' Traverse the left child
          If Nodes(NodeIndex).LeftNode > = 0 Then
             NewBytes = Bytes
             Incr NewBytes.Bits(0)
             NewBytes.Bits(NewBytes.Bits(0)) = 0
             CreateBitSequences Nodes(), Nodes(NodeIndex).LeftNode, NewBytes, CharValue()
          End If
    
          ' Traverse the right child
          If Nodes(NodeIndex).RightNode >= 0 Then
             NewBytes = Bytes
             Incr NewBytes.Bits(0)
             NewBytes.Bits(NewBytes.Bits(0)) = 1
             CreateBitSequences Nodes(), Nodes(NodeIndex).RightNode, NewBytes, CharValue()
          End If
       End Sub
    
       Function Compress_Huffman (InBuf As String, OutBuf As String) As Long
          Dim i As Local Dword, j As Local Dword, k As Local Dword
          Dim lNode1 As Local Long, lNode2 As Local Long, lNodes As Local Long, NodesCount As Local Integer
          Dim lWeight1 As Local Long, lWeight2 As Local Long
          Dim CharCount(255) As Local Dword
          Dim Nodes(511) As Local tagHuffmanTree
          Dim CharValue(255) As Local tagHuffmanChar
          Dim Bytes As Local tagHuffmanChar
          Dim bInBuf() As Local Byte, bOutBuf() As Local Byte
          Dim InBufSize As Local Dword, OutBufSize As Local Long
          Dim BitValue As Local Byte, Count As Local Integer, ByteValue As Local Byte
                      
          InBufSize = Len(InBuf)
          ReDim bInBuf(0) At StrPtr(InBuf)
    
          If InBufSize = 0 Then OutBuf ="": Function = -1: Exit Function
    
          ' Count the frequency of each ASCII code
          For i = 0 To InBufSize - 1: Incr CharCount(bInBuf(i)): Next
    
          ' Create a leaf for each character
          For i = 0 To 255
             If CharCount(i) Then
                Nodes(NodesCount).Weight = CharCount(i)
                Nodes(NodesCount).Value = i
                Nodes(NodesCount).LeftNode = -1
                Nodes(NodesCount).RightNode = -1
                Nodes(NodesCount).ParentNode = -1
                NodesCount = NodesCount + 1
             End If
          Next
    
          ' Create the Huffman Tree
          For lNodes = NodesCount To 2 Step -1
             ' Get the two leafs with the smallest weights
             lNode1 = -1: lNode2 = -1
             For i = 0 To NodesCount - 1
                If Nodes(i).ParentNode < 0 Then
                   If lNode1 < 0 Then
                      lWeight1 = Nodes(i).Weight
                      lNode1 = i
                   ElseIf (lNode2 = -1) Then
                      lWeight2 = Nodes(i).Weight
                      lNode2 = i
                   ElseIf Nodes(i).Weight < lWeight1 Then
                      If Nodes(i).Weight < lWeight2 Then
                         If (lWeight1 < lWeight2) Then
                            lWeight2 = Nodes(i).Weight
                            lNode2 = i
                         Else
                            lWeight1 = Nodes(i).Weight
                            lNode1 = i
                         End If
                      Else
                         lWeight1 = Nodes(i).Weight
                         lNode1 = i
                      End If
                   ElseIf (Nodes(i).Weight < lWeight2) Then
                      lWeight2 = Nodes(i).Weight
                      lNode2 = i
                   End If
                End If
             Next
    
             ' Create a new leaf
             Nodes(NodesCount).Weight = lWeight1 + lWeight2
             Nodes(NodesCount).LeftNode = lNode1
             Nodes(NodesCount).RightNode = lNode2
             Nodes(NodesCount).ParentNode = -1
             Nodes(NodesCount).Value = -1
    
             ' Set the parentnodes of the two leafs
             Nodes(lNode1).ParentNode = NodesCount
             Nodes(lNode2).ParentNode = NodesCount
    
             ' Increase the node counter
             Incr NodesCount
          Next
    
          ' Traverse the tree to get the bit sequence for each character
          CreateBitSequences Nodes(), NodesCount - 1, Bytes, CharValue()
    
          ' Calculate the length of the destination string after encoding
          j = 0: k = 0: Count = 0 ' chars in use
          For i = 0 To 255
             If CharCount(i) Then
                Incr Count ' number of characters used
                j = j + CharValue(i).Bits(0)
                k = k + CharValue(i).Bits(0) * CharCount(i)
             End If
          Next
          j = (j + 7) \ 8: k = (k + 7) \ 8 ' size in bytes
    
          OutBuf = Space$(5 + 2 * Count + j + k)
          ReDim bOutBuf(4 + 2 * Count + j + k) At StrPtr(OutBuf)
    
          ' Store the length of the source string and the number of characters used
          CopyMemory ByRef bOutBuf(0), ByRef InBufSize, 4
          bOutBuf(4) = Count - 1
          OutBufSize = 5
    
          ' Store the used characters and the length of their respective bit sequences
          Count = 0
          For i = 0 To 255
             If CharValue(i).Bits(0) Then
                bOutBuf(OutBufSize) = i: Incr OutBufSize
                bOutBuf(OutBufSize) = CharValue(i).Bits(0): Incr OutBufSize
                Count = Count + 16 + CharValue(i).Bits(0)
             End If
          Next
    
          ' Store the Huffman Tree into the result converting the bit sequences into bytes
          BitValue = 1
          ByteValue = 0
          For i = 0 To 255
             If CharValue(i).Bits(0) Then
                For j = 1 To CharValue(i).Bits(0)
                   If (CharValue(i).Bits(j)) Then ByteValue = ByteValue + BitValue
                   If (BitValue = 128) Then
                      bOutBuf(OutBufSize) = ByteValue: Incr OutBufSize: ByteValue = 0
                      BitValue = 1
                   Else
                      BitValue = BitValue + BitValue
                   End If
                Next
             End If
          Next
          If BitValue > 1 Then bOutBuf(OutBufSize) = ByteValue: Incr OutBufSize
    
          ' Encode the data by exchanging each ASCII byte
          BitValue = 1
          k = 0
          For i = 0 To InBufSize - 1
             For j = 1 To CharValue(bInBuf(i)).Bits(0)
                If (CharValue(bInBuf(i)).Bits(j) = 1) Then k = k + BitValue
                If (BitValue = 128) Then
                   bOutBuf(OutBufSize) = k
                   OutBufSize = OutBufSize + 1
                   BitValue = 1
                   k = 0
                Else
                   BitValue = BitValue + BitValue
                End If
             Next
          Next
          ' Add the last byte
          If (BitValue > 1) Then bOutBuf(OutBufSize) = k: Incr OutBufSize
    
       End Function
    
       '===============================================
       Sub CreateTree(Nodes() As tagHuffmanTree, NodesCount As Integer, Char As Long, Bytes As tagHuffmanChar)
          Local i As Integer
          Local NodeIndex As Long
    
          NodeIndex = 0
          For i = 1 To Bytes.Bits(0)
             If (Bytes.Bits(i) = 0) Then
                ' Left node
                 If (Nodes(NodeIndex).LeftNode = -1) Then
                    Nodes(NodeIndex).LeftNode = NodesCount
                    Nodes(NodesCount).ParentNode = NodeIndex
                    Nodes(NodesCount).LeftNode = -1
                    Nodes(NodesCount).RightNode = -1
                    Nodes(NodesCount).Value = -1
                    NodesCount = NodesCount + 1
                 End If
                 NodeIndex = Nodes(NodeIndex).LeftNode
              Else
                 ' Right node
                 If (Nodes(NodeIndex).RightNode = -1) Then
                    Nodes(NodeIndex).RightNode = NodesCount
                    Nodes(NodesCount).ParentNode = NodeIndex
                    Nodes(NodesCount).LeftNode = -1
                    Nodes(NodesCount).RightNode = -1
                    Nodes(NodesCount).Value = -1
                    NodesCount = NodesCount + 1
                 End If
                 NodeIndex = Nodes(NodeIndex).RightNode
             End If
          Next
          Nodes(NodeIndex).Value = Char
       End Sub
    
       Function UnCompress_Huffman (InBuf As String, OutBuf As String) As Long
          Dim i As Local Dword, j As Local Dword, k As Local Dword
          Dim lNode1 As Local Long, lNode2 As Local Long, lNodes As Local Long, NodesCount As Local Integer
          Dim lWeight1 As Local Long, lWeight2 As Local Long
          Dim CharCount(255) As Local Dword
          Dim Nodes(511) As Local tagHuffmanTree
          Dim CharValue(255) As Local tagHuffmanChar
          Dim Bytes As Local tagHuffmanChar
          Dim bInBuf() As Local Byte, bOutBuf() As Local Byte
          Dim InBufSize As Local Dword, OutBufSize As Local Long
          Dim BitValue As Local Byte, Count As Local Integer, ByteValue As Local Byte
          Dim InBufPos As Dword
          Dim NodeIndex As Long
          Dim lOutBufSize As Long
          
    
          InBufSize = Len(InBuf)
          If InBufSize < 5 Then OutBuf = "": Function = -1: Exit Function
          ReDim bInBuf(0) At StrPtr(InBuf)
    
          ' Extract the length of the original string and the number of characters used
          CopyMemory ByRef OutBufSize, ByRef bInBuf(0), 4
          Count = bInBuf(4) + 1
          InBufPos = 5
    
          OutBuf = Space$(OutBufSize)
          ReDim bOutBuf(OutBufSize - 1) At StrPtr(OutBuf)
          lOutBufSize = OutBufSize
    
          ' Get the used characters and their respective bit sequence lengths
          For i = 1 To Count
             CharValue(bInBuf(InBufPos)).Bits(0) = bInBuf(InBufPos + 1): InBufPos = InBufPos + 2
          Next
    
          ' Extract the Huffman Tree, converting the byte sequence to bit sequences
          ByteValue = bInBuf(InBufPos): Incr InBufPos
    
          BitValue = 1
          For i = 0 To 255
             If CharValue(i).Bits(0) Then
                For j = 1 To CharValue(i).Bits(0)
                   If (ByteValue And BitValue) Then CharValue(i).Bits(j) = 1
                   If (BitValue = 128) Then
                      ByteValue = bInBuf(InBufPos): Incr InBufPos
                      BitValue = 1
                   Else
                      BitValue = BitValue + BitValue
                   End If
                Next
             End If
          Next
          If (BitValue = 1) Then Decr InBufPos
    
          ' Create the Huffman Tree
          NodesCount = 1
          Nodes(0).LeftNode = -1
          Nodes(0).RightNode = -1
          Nodes(0).ParentNode = -1
          Nodes(0).Value = -1
          For i = 0 To 255
             CreateTree Nodes(), NodesCount, i, CharValue(i)
          Next
    
          ' Decode the actual data
          OutBufSize = 0
          Do
             If InBufPos >= InBufSize Then Exit Do
             ByteValue = bInBuf(InBufPos)
             BitValue = 1
             Do
                If (ByteValue And BitValue) Then NodeIndex = Nodes(NodeIndex).RightNode Else _
                                               NodeIndex = Nodes(NodeIndex).LeftNode
                If Nodes(NodeIndex).Value >= 0 Then
                   bOutBuf(OutBufSize) = Nodes(NodeIndex).Value: Incr OutBufSize
                   If OutBufSize = lOutBufSize Then InBufPos = InBufSize - 1: Exit Do
                   NodeIndex = 0
                End If
                If BitValue = 128 Then Exit Do Else BitValue = BitValue + BitValue
             Loop
             Incr InBufPos
          Loop
    
       End Function
    
    
       Function PbMain
          Local f As Long, t1 As Single, t2 As Single, tt As Dword
          Local InBuf As String, OutBuf As String, InBufC As String
    
          f = FreeFile: ErrClear
          Open $SourceFile For Binary As #f
          If Err = 0 Then If Lof(f) = 0 Then Err = 1 Else Get$ #f, Lof(1), InBuf
          Close #f
          If Err Then ErrMsg "Can't read the source file": Exit Function
    
          t1 = Timer
          For tt = 1 To %nTest
             If Compress_HuffMan (InBuf, OutBuf) < 0 Then ErrMsg "Can't compress": Exit Function
          Next
          t1 = Timer - t1
    
          f = FreeFile: ErrClear
          Open $TargetFile For Output As #f
          If Err = 0 Then Print #f, OutBuf;
          Close #f
          If Err Then ErrMsg "Can't write the target file": Exit Function
    
          t2 = Timer
          For tt = 1 To %nTest
             If UnCompress_HuffMan (OutBuf, InBufC) < 0 Then ErrMsg "Can't decompress": Exit Function
          Next
          t2 = Timer - t2
          
          If InBuf <> InBufC Then MsgBox "Not equal": Exit Function
          
          MsgBox "Encode: " + Format$(1000 * t1 / %nTest, "#") + " ms" + $CrLf + _
                 "Decode: " + Format$(1000 * t2 / %nTest, "#") + " ms" + $CrLf + _
                 "Aspect ratio: " + Format$(100 * Len(OutBuf) / Len(InBuf), "#.##") + " %"
       End Function
    ------------------
    E-MAIL: matus@perevozki.ru

  • #2
    Thanks Semen,
    Huffman compression seems to work fast!

    As I was converting your sample into a modular huffman-code I ran into a little
    "bug". If you're including this code in an already existing code with other open files/databases, you should
    change this line:

    If Err = 0 Then If Lof(f) = 0 Then Err = 1 Else Get$ #f, Lof( 1 ), InBuf

    to

    If Err = 0 Then If Lof(f) = 0 Then Err = 1 Else Get$ #f, Lof(f), InBuf

    as Lof() needs to point to f not 1.

    For whatever it's worth; if someone else wants to make the Huffman code part of
    your own project; replaced Pbmain with these two functions:
    Code:
    Function SaveAsHuffman(ByVal SourceFile As String,ByVal TargetFile As String) As Long
          Local f As Long, InBuf As String, OutBufC As String
                
          'Read sourceFile and Compress it:
          f = FreeFile: ErrClear
          Open SourceFile For Binary As #f 
          If Err = 0 Then If Lof(f) = 0 Then Err = 1 Else Get$ #f, Lof(f), InBuf
          Close #f
          If Err Then ErrMsg "Can't read the source file": Exit Function
                                                                    
          'Compress source file:
          If Compress_HuffMan (InBuf, OutBufC) < 0 Then ErrMsg "Can't compress": Exit Function
          
          'Save Compressed file:
          f = FreeFile: ErrClear
          Open TargetFile For Output As #f
          If Err = 0 Then Print #f, OutBufC;
          Close #f
          If Err Then ErrMsg "Can't write the target file": Exit Function
          Function = 1
       End Function         
       
       Function RetrieveFromHuffman(ByVal SourceFile As String,ByVal TargetFile As String) As Long
          Local f As Long, InBufC As String, OutBuf As String
          
          'Read Compressed SourceFile:
          f = FreeFile: ErrClear
          Open SourceFile For Binary As #f
          If Err = 0 Then If Lof(f) = 0 Then Err = 1 Else Get$ #f, Lof(f), InBufC
          Close #f
          If Err Then ErrMsg "Can't read the source file": Exit Function 
          
          'Decompress:      
          If UnCompress_HuffMan (InBufC, OutBuf) < 0 Then ErrMsg "Can't decompress": Exit Function
    
          'Save Decompressed Huffman:
          f = FreeFile: ErrClear
          Open TargetFile For Output As #f
          If Err = 0 Then Print #f, OutBuf;
          Close #f
          If Err Then ErrMsg "Can't write the target file": Exit Function
          Function = 1
       End Function
    ------------------
    Henning
    Henning

    Comment


    • #3
      hi semen,
      ur huffman implementation has some memory bugs.
      u have to dim/redim ur tables to allocate memory for buffering.

      to do this, replace the redim statement in huffman compress/decompress
      ReDim bInBuf(0) At StrPtr(InBuf)
      with:
      REDIM bInBuf(InBufSize-1) AT STRPTR(InBuf)

      than the code worx fine


      ------------------

      Comment


      • #4
        Tobias --
        I thought that Redim At doesn't allocate a memory at all
        IMO. PB should "eat" wrong dimensions w/o side effect.


        ------------------
        E-MAIL: matus@perevozki.ru

        Comment


        • #5
          IMHO with side effects
          if u enable error checking by setting $DEBUG ERROR ON,
          compress/decompress routines failed.
          "$DEBUG ERROR ON checks for certain types of errors,
          it is best to enable error-checking when developing a program"
          *blabla*, c dox for details...
          each access of bInBuf(), u dimmed with zero elements,
          causes error 9 [Subscript / Pointer out of range].
          i know that u use the InBuf as base adress of that array and
          that´s why mem is already allocated, but if u have error checking in the
          project enabled, obvious errors will occur...

          ------------------

          Comment


          • #6
            Honestly, I never was interesting what does a debugger, because I do not use it at all (old "Print" school).
            "Subscript / Pointer out of range" doesn't occur in run-time, because resulting code doesn't test boundaries.
            Guess, Bob/Lance/Tom will confirm this fact.

            PS. Lance/Tom - pls, move our discussion into Programming form.



            ------------------
            E-MAIL: matus@perevozki.ru

            Comment


            • #7
              hi semen,
              i know resulting code doesn´t test boundaries,
              cuz i wanna point this out at my last post...
              next time i copy & paste full docu text

              i only speak about compiling with DEBUG ERROR ON
              OR DEBUGGING the routine with the debugger.

              btw: i hope u know that the debugger uses advanced error checking like DEBUG ERROR ON while debugging.

              i compiled this code in an existing application. in this app
              debug error was set to on... the routine returns strings
              filled with space$ for example.
              debugging your example has the same effect.


              but i don´t wanna waste my & your time discussing about this,
              i only want to tell other guys here that they don´t compile this
              code with debug error on or debugging your routine ... OR generally
              to debug apps with DIM/REDIM AT and not correct bounds.
              btw: many programmers use the debugger to unterstand 3rd party code,
              especially those who have bought cc 3.0 or/and pbwin 7.0 yet, because
              of the nice debugging feature "Animate"

              ps:
              why don´t u use the DIM AT syntax directly at the first DIM statement?
              ur code :
              Dim bInBuf() As Local Byte
              ...
              ReDim bInBuf(0) At StrPtr(InBuf)

              do it like thiz:
              DIM bInBuf() AT STRPTR(InBuf)
              or like thiz while _DEBUGGING_:
              DIM bInBuf(InBufSize-1) AT STRPTR(InBuf)




              ------------------

              Comment

              Working...
              X