letter: s frequency: 3
letter: t frequency 3
letter: a frequency 1
letter: e frequency 1
letter: h frequency 1
letter: i frequency 2
letter: frequency 3
Then we have to create a binary tree, this could be done by sorting this list by frequency and making the two lowest elements into leaves, creating a parent node with a frequency that is the sum of the two lower element's frequencies. The two elements are removed from the list and the new parent node is inserted into the list by frequency and we have to sort the list again and repeat this process until the there is only one element left in the list, that is the root.
The idea behind Huffman coding is based upon the frequency of a symbol in a sequence. The symbol that is the most frequent in that sequence gets a new code that is very small, in the other hand the symbols that have a small frequency will get a code that is very long.
In my first attempt I was using only list and dictionaries but I get a mess when a I want to get the code of each letter because every time I have 2 nodes with the same value the tree traversal take wrong turns; so I decide to use a structure to create every node. With this structure I know which letter it has, its frequency, its code and the node that is on its left branch and on its right branch.
This implementation was simpler to traverse the tree but at first for me it was confused because I didn't know how to print every node.
This is my implementation in python:
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
class Node: | |
def __init__(self,letter, frequency, left, right): | |
self.letter = letter | |
self.frequency = frequency | |
self.left = left | |
self.right = right | |
self.code = '' | |
def __str__(delf): | |
return self.letter | |
def lettersFrequency(text): | |
freq = {} | |
for c in text: | |
if c not in freq.keys(): | |
f = text.count(c) | |
freq[c] = f | |
return freq.items() | |
def orderFrequencies(freq): | |
nodes = [] | |
freq = sorted(freq, key=lambda f:f[1]) | |
for f in freq: | |
node = Node(f[0], f[1], None, None) | |
nodes.append(node) | |
return nodes | |
def getInfo(node): | |
if len(node.letter) == 1: | |
print 'letter:',node.letter | |
print 'frequency', node.frequency | |
print 'code:', node.code, '\n' | |
if node.left != None: | |
getInfo(node.left) | |
if node.right != None: | |
getInfo(node.right) | |
def getNodes(tree): | |
while len(tree) > 1: | |
left = tree.pop(0) | |
right = tree.pop(0) | |
letter = left.letter + right.letter | |
value = left.frequency + right.frequency | |
left.code = '0' | |
right.code = '1' | |
node = Node(letter, value, left, right) | |
tree.append(node) | |
tree.sort() | |
return tree | |
#the first node that receives is the root of the tree | |
def getCodes(node): | |
if node.left != None: | |
node.left.code = node.code + node.left.code | |
getCodes(node.left) | |
if node.right != None: | |
node.right.code = node.code + node.right.code | |
getCodes(node.right) | |
def huffmanTree(node, data): | |
if not data.has_key(node.letter): | |
data[node.letter] = node | |
if node.left != None: | |
data = huffmanTree(node.left, data) | |
if node.right != None: | |
data = huffmanTree(node.right, data) | |
return data | |
def compress(data, text): | |
codesTable = {} | |
compressed = '' | |
nodes = data.keys() | |
for n in nodes: | |
node = data[n] | |
codesTable[node.letter] = node.code | |
for c in text: | |
compressed += codesTable[c] | |
return codesTable, compressed | |
def decompress(node, compressed): | |
root = node | |
ctext = [] | |
for c in compressed: | |
ctext.append(c) | |
msg = '' | |
while len(ctext) > 0: | |
n = int(ctext.pop(0)) | |
if(n==0): | |
if (node.left != None and len(node.left.letter) == 1): | |
msg += node.left.letter | |
node = root | |
else: | |
node = node.left | |
else: | |
if (node.right != None and len(node.right.letter) == 1): | |
msg += node.right.letter | |
node = root | |
else: | |
node = node.right | |
return msg | |
if __name__ == '__main__': | |
text = raw_input('Text to compress: ') | |
text = text.replace('\t',' ').replace('\n',' ') | |
freq = lettersFrequency(text) | |
freq = orderFrequencies(freq) | |
tree = getNodes(freq) | |
getCodes(tree[0]) | |
data = huffmanTree(tree[0], data = {}) | |
codesTable, compressed = compress(data, text) | |
print 'Original Text: ', text,'\n' | |
print 'Compressed text: ',compressed,'\n' | |
getInfo(tree[0]) | |
msg = decompress(tree[0], compressed) | |
print 'Decompressed text: ', msg |
This are some of the tests:
![]() |
Test text: This is a test |
![]() |
Test text: I would trade all my technology for an afternoon with Socrates. |
![]() |
Test text: I would trade all my technology for an afternoon with Socrates. |
No estás en realidad pasado la salida a algo ni sacando tasas de compresión... El reporte no corresponde a lo que se esperaba. 6+4 pts.
ReplyDelete