Is it possible to define a codec from a table of prefixes? #11

Open
RigacciOrg opened this issue Nov 20, 2020 · 1 comment

Comments

@RigacciOrg
Is it possible to define a codec starting from a pre-made table of symbols, prefixes and values? I don't have frequencies, etc.; I just have a table like the one attached below.

I inspected the pickle objects provided for the predefined frequency tables (json, xml, etc.). I think I can manage to create the code_table part, but I cannot figure out how to build type, concat and metadata. It would be nice if I could just declare a dictionary or something similar in the code, instead of integrating a pickle object into the library.

ECG default Huffman table

P.S. The table above is the default Huffman table used to compress electrocardiography data in the SCP-ECG standard.

@soxofaan
Owner

soxofaan commented Apr 6, 2021

Yes, it's possible if you use PrefixCodec, which is the parent class of HuffmanCodec.
HuffmanCodec only takes care of converting a frequency table to a prefix code table; PrefixCodec handles the actual prefix code encoding and decoding.
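
To illustrate the conversion step mentioned above (frequency table to prefix code table), here is a minimal standard-library sketch of Huffman's algorithm. It is not dahuffman's actual code: the function name huffman_table and the sample frequencies are made up for illustration, and only the output shape, symbol -> (bits, value), follows the table format used in this thread.

```python
import heapq
from itertools import count


def huffman_table(frequencies):
    """Build {symbol: (bits, value)} from {symbol: frequency}.

    Hypothetical helper for illustration, not part of dahuffman.
    """
    if len(frequencies) < 2:
        raise ValueError("need at least two symbols")
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(freq, next(tiebreak), {sym: ""}) for sym, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; prefix their codes with 0 and 1.
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    codes = heap[0][2]
    return {sym: (len(code), int(code, 2)) for sym, code in codes.items()}


# Made-up frequencies: frequent symbols end up with shorter code words.
table = huffman_table({"a": 5, "b": 2, "c": 1, "d": 1})
print({sym: bits for sym, (bits, _) in table.items()})
```

If you already have a code table, as in this issue, this step is simply skipped.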

Suppose you have, for example, this code table (based on your screenshot):

symbol  bits  value (binary)
1       1     0     (0)
2       3     4     (100)
3       3     5     (101)
4       4     12    (1100)
5       4     13    (1101)
6       5     28    (11100)
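
Before handing a hand-written table like this to a codec, it can be worth checking that it really is a prefix code, i.e. that no code word is the beginning of another. The sketch below does that with the standard library only; the helper names to_bit_strings and is_prefix_free are hypothetical and not part of dahuffman.

```python
# Code table from the comment above: symbol -> (number of bits, integer value).
TABLE = {
    1: (1, 0),
    2: (3, 4),
    3: (3, 5),
    4: (4, 12),
    5: (4, 13),
    6: (5, 28),
}


def to_bit_strings(table):
    """Render each (bits, value) pair as a zero-padded binary string."""
    return {
        sym: format(value, "0{}b".format(bits))
        for sym, (bits, value) in table.items()
    }


def is_prefix_free(table):
    """Return True if no code word is a prefix of another code word."""
    words = sorted(to_bit_strings(table).values())
    # In sorted order, a word that is a prefix of any other word is also a
    # prefix of its immediate successor, so comparing neighbours is enough.
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))


print(to_bit_strings(TABLE))
print(is_prefix_free(TABLE))
```

For the table above the check passes; a table containing both 0 and 01, for instance, would fail it.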

You can then build a codec like this:

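from dahuffman.huffmancodec import PrefixCodec

# Code table: symbol -> (number of bits, integer value of the code word)
table = {
    1: (1, 0),   # 0
    2: (3, 4),   # 100
    3: (3, 5),   # 101
    4: (4, 12),  # 1100
    5: (4, 13),  # 1101
    6: (5, 28),  # 11100
}

# Symbol 6 serves as the end-of-file marker
codec = PrefixCodec(table, eof=6)

# Round trip: encode a sequence of symbols, then decode it again
encoded = codec.encode([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5])
print(codec.decode(encoded))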

One possible problem is that the current implementation requires an "end of file" (eof) symbol in the table. It marks the end of the bit stream when the encoded bits do not align with a byte boundary. In this example I used symbol 6 as eof.
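
To see why an eof symbol matters, here is a small standard-library sketch, not dahuffman's implementation. The encode and decode helpers are hypothetical, and padding the last byte with zero bits is an assumption made for the illustration. Because symbol 1 has the code word 0, padding bits are indistinguishable from real data unless an eof code word stops the decoder first.

```python
TABLE = {1: (1, 0), 2: (3, 4), 3: (3, 5), 4: (4, 12), 5: (4, 13), 6: (5, 28)}
EOF = 6


def code_word(table, sym):
    """Return the code word of sym as a zero-padded binary string."""
    bits, value = table[sym]
    return format(value, "0{}b".format(bits))


def encode(symbols, table, eof=None):
    """Concatenate code words, optionally append eof, zero-pad to full bytes."""
    stream = "".join(code_word(table, s) for s in symbols)
    if eof is not None:
        stream += code_word(table, eof)
    stream += "0" * (-len(stream) % 8)  # assumed padding: zero bits
    return bytes(int(stream[i:i + 8], 2) for i in range(0, len(stream), 8))


def decode(data, table, eof=None):
    """Decode bit by bit; stop at eof if given, else consume every bit."""
    lookup = {code_word(table, s): s for s in table}
    out, word = [], ""
    for byte in data:
        for bit in format(byte, "08b"):
            word += bit
            if word in lookup:
                sym = lookup[word]
                if eof is not None and sym == eof:
                    return out
                out.append(sym)
                word = ""
    return out


message = [1, 2, 3]  # 0 + 100 + 101 = 7 bits, so one padding bit is needed
print(decode(encode(message, TABLE), TABLE))                    # [1, 2, 3, 1]
print(decode(encode(message, TABLE, eof=EOF), TABLE, eof=EOF))  # [1, 2, 3]
```

Without eof, the single padding bit decodes as a spurious trailing symbol 1. With eof, decoding stops before the padding is reached.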
