(Derived from an assignment by Professor Vijay Raghunathan) Please read the entire README (including FAQ at the end) before starting your assignment, or asking for help. // ~ Overview ~ // Huffman encoding is a loss-less compression technique that is used widely in such things as JPEG compression and MP3 compression. It was developed in the 1950s (!) by David Huffman, who was doing his PhD at MIT. Huffman encoding is still used because it is simple and, under certain plausible conditions, it is optimal, which means that someone *proved* that there is no better way to compress data. (!) Please read the entire attached PDF file from Spring 2013. This file includes the relevant theory for Huffman encoding. To complete this assignment, you must write ten functions, as listed in "answer11.h". // ~ Hard ~ // THIS ASSIGNMENT IS HARD. DO NOT LEAVE IT TO THE LAST MINUTE. // ~ Learning Goals ~ // (1) Recursion (2) Linked-lists (3) Binary Trees (4) Binary files (5) Bit twiddling (6) Memory management // ~ Assignment Files ~ // + answer11.h: Header file contains the declarations that you must implement. These functions must be implemented in a file called "answer11.c". + treefun.c: This source file contains some auxiliary functions that are useful for displaying trees, and linked-lists of trees. There is also a function that decodes a Huffman file, once the header has been read. This last function should be useful to look at, because it shows an example of reading a file bit by bit. + treefun.h: Header file for treefun.c. There is no need to edit this file. + example.c: Some example code that shows how to write a program that reads in a Huffman encoded file (either text or binary header), and prints information about it. This code cannot be run until after implementation of the functions in "answer11.c" + read-header-example.text: This file shows how a Huffman encoding tree is built step-by-step. Please see the "Hints" for more information. + README: this file. + tester: for testing your assignment + figures: a folder of jpeg images that illustrates how a tree is built. + inputs: a folder that contains 13 input testcases. Six files have text headers, and seven files have binary headers. You will need to be able to process all of these files without any errors, including memory errors. + expected: a folder that shows the expected output for each input file, assuming you ran the program in 'example.c' + output-tester: When you run the tester, it will save the output of each testcase and the valgrind log for each testcase in separate files inside of this directory. Please review these files if you fail a testcase, so that you can recreate the problem yourself. // ~ Submitting Your Assignment ~ // You must submit one zip file to Blackboard. This zip file must contain: (1) answer11.c You create the zip file using the following command. > zip pa11.zip answer11.c If your zip file does not meet the above specification, then you may get zero for this assignment. YOU HAVE BEEN WARNED. Following the instructions is *part* of getting the assignment correct. So please follow the instructions. // ~ Determining Your Mark ~ // The tester program is there to ensure that you have followed the instructions correctly. Run the tester program as follows: > ./tester // ~ Hints ~ // (*) Help, I don't know where to start? Start by writing *and* test every function except for: (1) void Stack_popPopCombinePush(Stack * stack); (2) HuffNode * HuffTree_readTextHeader(FILE * fp); (3) HuffNode * HuffTree_readBinaryHeader(FILE * fp); There is no point attempting the last two functions until after you are certain that you have everything else correct. Many of these functions are extremely short and simple. See the last hint for more information on (1) and (2). -------------------------------------------------------------------------------- (*) What is the point of creating a struct "Stack" that only contains a single pointer? This is a subtle design issue that illustrates how *encapsulation* can simplify code. The Stack interface is six very simple functions that operate on a Stack pointer. The person who uses this code (okay yourself) never needs to worry about the underlying implementation. Simply use the six functions. When you modify the stack, you never need to manage the pointer to the head element, or consider the case when it is NULL. Simple. (!) By encapsulating the Stack logic, your code will be simpler, and easier to keep error free. -------------------------------------------------------------------------------- (*) How do I read a byte or a bit from a binary file? Firstly, if you haven't completed *and* tested HuffTree_readTextHeader(...), then you are wasting your time. Do that first. A big hint for doing this is contained in "treefun.c". If you look at the Huffman_applyTree(...) function, you will see an example of how to read a file bit by bit. Don't forget that a byte is just 8 bits. Therefore, if you want to read a byte, you simply read 8 bits, and compose them into a byte. -------------------------------------------------------------------------------- (*) Compose bits into a byte? This bit twiddling is confusing! If you want to set a bit, use logical or: '|' If you want to clear bits, use logical and: '&' If you want to position a bit, use bit-shifts: '<<' and '>>' So, for example, let: unsigned char X = 0b10010110; To set the smallest (last) bit: printBits(X | 0b00000001); // 0b10010111 To clear the last four bits: printBits(X & 0b11110000); // 0b10010000 To position a bit in the sixth position: unsigned char six = 1 << 6; printBits(six); // 0b01000000 Note that I've numbered the bits as follows: // Bit# 76543210 // 0b01000000 The reason is that it follows the value of each bit. For example, if bits "0, 1 and 5" are set, then the value of the number is: Bit# 76543210 0b00100011 = 2^5 + 2^1 + 2^0 = 32 + 2 + 1 = 35 To set the sixth bit: // Bit# 76543210 // 0b10010111 // or 0b01000000 // = // 0b11010110 printBits(X | six); To clear every bit except for the sixth // 0b10010111 // and 0b01000000 // = // 0b00000000 printBits(X & six); To test if the sixth bit is set: if(X & six != 0) { // The sixth bit was set. } else { // Was not set } The code for printBits is as follows: /** * Print the bits of X * In the for loop, we * (1) take the value of X and right shift it by 'ind' ((X >> ind) * (2) clear out all bits except for the zeroth bit. & 0x01) * 0x01 is hexidecimal for 1, and in binary is: 0b00000001 */ void printBits(unsigned char X) { printf("0b"); int ind; for(ind = 7; ind >= 0; --ind) printf("%d", ((X >> ind) & 0x01)); printf("\n"); } -------------------------------------------------------------------------------- (*) Okay, I'm really, really, stuck on reading the text header, and with pop-pop-combine-push(...) There is no point reading this section until you have implemented the first seven functions in answer11.h, and furthermore, that you have read the pdf document, and consulted the class notes. Please look at the file "read-header-example.text". It shows how the stack and encoding tree change as the header is read for the file: "inputs/01-tibbs.txt-huff". Your code should replicate exactly the same stack of trees as it reads bytes from the header of this file. The function "Stack_print(...)" is your friend. Also, ask questions on Blackboard, and show up during office hours.