proxygen
proxygen::huffman::HuffTree Class Reference

#include <Huffman.h>

Inheritance diagram for proxygen::huffman::HuffTree:
TestingHuffTree

Public Member Functions

 HuffTree (const uint32_t *codes, const uint8_t *bits)
 
 HuffTree (HuffTree &&tree)=default
 
 ~HuffTree ()
 
bool decode (const uint8_t *buf, uint32_t size, folly::fbstring &literal) const
 
uint32_t encode (folly::StringPiece literal, folly::io::QueueAppender &buf) const
 
uint32_t getEncodeSize (folly::StringPiece literal) const
 
std::pair< uint32_t, uint8_tgetCode (uint8_t ch) const
 
const uint32_tcodesTable () const
 
const uint8_tbitsTable () const
 

Protected Member Functions

 HuffTree (const HuffTree &tree)
 

Protected Attributes

SuperHuffNode table_ [46]
 

Private Member Functions

void fillIndex (SuperHuffNode &snode, uint32_t code, uint8_t bits, uint8_t ch, uint8_t level)
 
void buildTree ()
 
void insert (uint32_t code, uint8_t bits, uint8_t ch)
 

Private Attributes

uint32_t nodes_ {0}
 
const uint32_tcodes_
 
const uint8_tbits_
 

Detailed Description

Immutable Huffman tree used in the process of decoding. Traditionally the huffman tree is binary, but using that approach leads to major inefficiencies since it's using per-bit level processing and needs to perform several memory accesses and bit operations for every single bit. This implementation is using 8-bit level indexing and uses aggregated nodes that link up to 256 other nodes. The complexity for lookup is reduced from O(bits) to O(Bytes) which is 1 or 2 for most of the printable characters. The tradeoff of using this approach is using more memory and generating the tree is more laborious since we need to fill all the subtrees denoted by a character code, which is an unique prefix.

Example

bit stream: 00001111 1111010

  1. our lookup key is 00001111 which will point to character '/', since the entire subtree with prefix 0000 points to it. We know the subtree has just 4 bits, we remove just those from the current key. bit stream: 11111111 010
  2. key is 11111111 which points to a branch, so we go down one level bit stream: 010
  3. we don't have enough bits, so we use paddding and we get a key of 01011111, which points to '(' character, like any other node under the subtree '010'.

Definition at line 84 of file Huffman.h.

Constructor & Destructor Documentation

proxygen::huffman::HuffTree::HuffTree ( const uint32_t codes,
const uint8_t bits 
)
explicit

the constructor assumes the codes and bits tables will not be freed, ideally they are static

Definition at line 21 of file Huffman.cpp.

References buildTree().

22  : codes_(codes), bits_(bits) {
23  buildTree();
24 }
const uint8_t * bits_
Definition: Huffman.h:148
const uint32_t * codes_
Definition: Huffman.h:147
proxygen::huffman::HuffTree::HuffTree ( HuffTree &&  tree)
explicitdefault
proxygen::huffman::HuffTree::~HuffTree ( )
inline

Definition at line 92 of file Huffman.h.

References proxygen::huffman::HuffNode::ch, fizz::decode(), encode(), folly::size(), uint32_t, and uint8_t.

92 {}
proxygen::huffman::HuffTree::HuffTree ( const HuffTree tree)
explicitprotected

Definition at line 26 of file Huffman.cpp.

References buildTree().

26  :
27  codes_(tree.codes_), bits_(tree.bits_) {
28  buildTree();
29 }
const uint8_t * bits_
Definition: Huffman.h:148
const uint32_t * codes_
Definition: Huffman.h:147

Member Function Documentation

const uint8_t * proxygen::huffman::HuffTree::bitsTable ( ) const

Definition at line 102 of file Huffman.cpp.

References bits_.

102  {
103  return bits_;
104 }
const uint8_t * bits_
Definition: Huffman.h:148
void proxygen::huffman::HuffTree::buildTree ( )
private

initializes and builds the huffman tree

Definition at line 126 of file Huffman.cpp.

References bits_, codes_, i, insert(), proxygen::huffman::kTableSize, and uint32_t.

Referenced by HuffTree().

126  {
127  // create the indexed table
128  for (uint32_t i = 0; i < kTableSize; i++) {
129  insert(codes_[i], bits_[i], i);
130  }
131 }
void insert(uint32_t code, uint8_t bits, uint8_t ch)
Definition: Huffman.cpp:78
const uint8_t * bits_
Definition: Huffman.h:148
const uint32_t * codes_
Definition: Huffman.h:147
const uint32_t kTableSize
Definition: Huffman.h:21
const uint32_t * proxygen::huffman::HuffTree::codesTable ( ) const

Definition at line 98 of file Huffman.cpp.

References codes_.

98  {
99  return codes_;
100 }
const uint32_t * codes_
Definition: Huffman.h:147
bool proxygen::huffman::HuffTree::decode ( const uint8_t buf,
uint32_t  size,
folly::fbstring literal 
) const

decode bitstream into a string literal

Parameters
bufstart of a huffman-encoded bit stream
sizesize of the buffer
literalwhere to append decoded characters
Returns
true if the decode process was successful

Definition at line 31 of file Huffman.cpp.

References proxygen::huffman::HuffNode::bits, proxygen::huffman::HuffNode::ch, proxygen::huffman::HuffNode::data, i, proxygen::huffman::SuperHuffNode::index, proxygen::huffman::HuffNode::isLeaf(), proxygen::huffman::HuffNode::metadata, folly::basic_fbstring< E, T, A, Storage >::push_back(), proxygen::huffman::HuffNode::superNodeIndex, table_, uint32_t, and uint8_t.

Referenced by TEST_F().

33  {
34  const SuperHuffNode* snode = &table_[0];
35  uint32_t w = 0;
36  uint32_t wbits = 0;
37  uint32_t i = 0;
38  while (i < size || wbits > 0) {
39  // decide if we need to load more bits using an 8-bit chunk
40  if (i < size && wbits < 8) {
41  w = (w << 8) | buf[i];
42  wbits += 8;
43  i++;
44  }
45  // key is used for performing the indexed lookup
46  uint32_t key;
47  if (wbits >= 8) {
48  key = w >> (wbits - 8);
49  } else {
50  // this the case we're at the end of the buffer
51  uint8_t xbits = 8 - wbits;
52  w = (w << xbits) | ((1 << xbits) - 1);
53  key = w;
54  wbits = 8;
55  }
56  // perform the indexed lookup
57  const HuffNode& node = snode->index[key];
58  if (node.isLeaf()) {
59  // final node, we can emit the character
60  literal.push_back(node.data.ch);
61  wbits -= node.metadata.bits;
62  snode = &table_[0];
63  } else {
64  // this is a branch, so we just need to move one level
65  wbits -= 8;
66  snode = &table_[node.data.superNodeIndex];
67  }
68  // remove what we've just used
69  w = w & ((1 << wbits) - 1);
70  }
71  return true;
72 }
constexpr auto size(C const &c) -> decltype(c.size())
Definition: Access.h:45
void push_back(const value_type c)
Definition: FBString.h:1437
SuperHuffNode table_[46]
Definition: Huffman.h:152
uint32_t proxygen::huffman::HuffTree::encode ( folly::StringPiece  literal,
folly::io::QueueAppender buf 
) const

encode string literal into huffman encoded bit stream

Parameters
literalstring to encode
bufwhere to append the encoded binary data

Definition at line 133 of file Huffman.cpp.

References bits_, ch, codes_, i, folly::io::detail::Writable< Derived >::push(), folly::Range< Iter >::size(), uint32_t, uint8_t, and folly::io::detail::Writable< Derived >::writeBE().

Referenced by TEST_F().

134  {
135  uint32_t code; // the huffman code of a given character
136  uint8_t bits; // on how many bits code is represented
137  uint32_t w = 0; // 4-byte word used for packing bits and write it to memory
138  uint8_t wbits = 0; // how many bits we have in 'w'
139  uint32_t totalBytes = 0;
140  for (size_t i = 0; i < literal.size(); i++) {
141  uint8_t ch = literal[i];
142  code = codes_[ch];
143  bits = bits_[ch];
144 
145  if (wbits + bits < 32) {
146  w = (w << bits) | code;
147  wbits += bits;
148  } else {
149  uint8_t xbits = wbits + bits - 32;
150  w = (w << (bits - xbits)) | (code >> xbits);
151  // write the word into the buffer by converting to network order, which
152  // takes care of the endianness problems
153  buf.writeBE<uint32_t>(w);
154  totalBytes += 4;
155  // carry for next batch
156  wbits = xbits;
157  w = code & ((1 << xbits) - 1);
158  }
159  }
160  // we might have some padding at the byte level
161  if (wbits & 0x7) {
162  // padding bits
163  uint8_t padbits = 8 - (wbits & 0x7);
164  w = (w << padbits) | ((1 << padbits) - 1);
165 
166  wbits += padbits;
167  }
168  // we need to write the leftover bytes, from 1 to 4 bytes
169  if (wbits > 0) {
170  uint8_t bytes = wbits >> 3;
171  // align the bits to the MSB
172  w = w << (32 - wbits);
173  // set the bytes in the network order and copy w[0], w[1]...
174  w = htonl(w);
175  // we need to use memcpy because we might write less than 4 bytes
176  buf.push((uint8_t*)&w, bytes);
177  totalBytes += bytes;
178  }
179  return totalBytes;
180 }
const uint8_t * bits_
Definition: Huffman.h:148
constexpr size_type size() const
Definition: Range.h:431
void push(const uint8_t *buf, size_t len)
Definition: Cursor.h:755
auto ch
const uint32_t * codes_
Definition: Huffman.h:147
void writeBE(T value)
Definition: Cursor.h:744
void proxygen::huffman::HuffTree::fillIndex ( SuperHuffNode snode,
uint32_t  code,
uint8_t  bits,
uint8_t  ch,
uint8_t  level 
)
private

recursive function for generating subtrees

Definition at line 109 of file Huffman.cpp.

References proxygen::huffman::HuffNode::bits, proxygen::huffman::HuffNode::ch, ch, proxygen::huffman::HuffNode::data, proxygen::huffman::SuperHuffNode::index, proxygen::huffman::HuffNode::metadata, and uint8_t.

Referenced by insert().

110  {
111  if (level == 8) {
112  snode.index[code].data.ch = ch;
113  snode.index[code].metadata.bits = bits;
114  return;
115  }
116  // generate the bit at the current level
117  code = code << 1;
118  for (uint8_t bit = 0; bit <= 1; bit++) {
119  fillIndex(snode, code | bit, bits, ch, level + 1);
120  }
121 }
auto ch
void fillIndex(SuperHuffNode &snode, uint32_t code, uint8_t bits, uint8_t ch, uint8_t level)
Definition: Huffman.cpp:109
pair< uint32_t, uint8_t > proxygen::huffman::HuffTree::getCode ( uint8_t  ch) const

get the binary representation for a given character, as a 32-bit word and a number of bits is represented on (<32). The code is aligned to LSB.

Parameters
chASCII character
Returns
pair<word, bits>

Example: 'e' will be encoded as 1 using 4 bits: 0001

Definition at line 196 of file Huffman.cpp.

References bits_, and codes_.

196  {
197  return std::make_pair(codes_[ch], bits_[ch]);
198 }
const uint8_t * bits_
Definition: Huffman.h:148
auto ch
const uint32_t * codes_
Definition: Huffman.h:147
uint32_t proxygen::huffman::HuffTree::getEncodeSize ( folly::StringPiece  literal) const

get the encode size for a string literal, works as a dry-run for the encode useful to allocate enough buffer space before doing the actual encode

Parameters
literalstring literal
Returns
size how many bytes it will take to encode the given string

Definition at line 182 of file Huffman.cpp.

References bits_, ch, i, folly::size(), folly::Range< Iter >::size(), uint32_t, and uint8_t.

Referenced by TEST_F().

182  {
183  uint32_t totalBits = 0;
184  for (size_t i = 0; i < literal.size(); i++) {
185  // we just need the number of bits
186  uint8_t ch = literal[i];
187  totalBits += bits_[ch];
188  }
189  uint32_t size = totalBits >> 3;
190  if (totalBits & 0x07) {
191  ++size;
192  }
193  return size;
194 }
const uint8_t * bits_
Definition: Huffman.h:148
constexpr size_type size() const
Definition: Range.h:431
auto ch
constexpr auto size(C const &c) -> decltype(c.size())
Definition: Access.h:45
void proxygen::huffman::HuffTree::insert ( uint32_t  code,
uint8_t  bits,
uint8_t  ch 
)
private

insert a new character into the tree, identified by an unique code, a number of bits to represent it. The code is aligned at LSB.

Definition at line 78 of file Huffman.cpp.

References proxygen::huffman::HuffNode::data, fillIndex(), proxygen::huffman::SuperHuffNode::index, proxygen::huffman::HuffNode::isLeaf(), proxygen::huffman::HuffNode::isSuperNode, proxygen::huffman::HuffNode::metadata, nodes_, proxygen::huffman::HuffNode::superNodeIndex, table_, uint32_t, and x.

Referenced by buildTree().

78  {
79  SuperHuffNode* snode = &table_[0];
80  while (bits > 8) {
81  uint32_t mask = 0xFF << (bits - 8);
82  uint32_t x = (code & mask) >> (bits - 8);
83  // mark this node as branch
84  if (snode->index[x].isLeaf()) {
85  nodes_++;
86  HuffNode& node = snode->index[x];
87  node.metadata.isSuperNode = true;
88  node.data.superNodeIndex = nodes_;
89  }
90  snode = &table_[snode->index[x].data.superNodeIndex];
91  bits -= 8;
92  code = code & ~mask;
93  }
94  // fill the node with all the suffixes
95  fillIndex(*snode, code, bits, ch, bits);
96 }
Definition: InvokeTest.cpp:58
const int x
auto ch
void fillIndex(SuperHuffNode &snode, uint32_t code, uint8_t bits, uint8_t ch, uint8_t level)
Definition: Huffman.cpp:109
SuperHuffNode table_[46]
Definition: Huffman.h:152

Member Data Documentation

const uint8_t* proxygen::huffman::HuffTree::bits_
private

Definition at line 148 of file Huffman.h.

Referenced by bitsTable(), buildTree(), encode(), getCode(), and getEncodeSize().

const uint32_t* proxygen::huffman::HuffTree::codes_
private

Definition at line 147 of file Huffman.h.

Referenced by buildTree(), codesTable(), encode(), and getCode().

uint32_t proxygen::huffman::HuffTree::nodes_ {0}
private

Definition at line 146 of file Huffman.h.

Referenced by insert().

SuperHuffNode proxygen::huffman::HuffTree::table_[46]
protected

Definition at line 152 of file Huffman.h.

Referenced by decode(), and insert().


The documentation for this class was generated from the following files: