Information theory analyzer

Hello :)

I’ve taken an exam this summer called “Information theory” and, as a project, I developed, with the help of Alfredo Aiello, a program that evaluates the entropy (simple, joint and conditional) and the mutual information of random variables (and subsets of them) from a sample set.

Like every other time, I want to share the code (licensed under the GNU General Public License 3.0) with the whole world, so here is a zip file containing all the code.
There isn’t a makefile right now, but you can easily find a way to compile it; the only external dependency is my ArgumentList library.

After compiling it, you can use it on the command line by typing:

./ITA <[-e] | [-j] | [-c] | [-m]> -n <null_char> <set_file> <input_file> <output_file>

Obviously, you have to use the name you gave your binary :)

The first parameter selects the operation (-e for entropy, -j for joint entropy, -c for conditional entropy and -m for mutual information), the second sets the null char of the sample set, and the last three are file names.
The first of them (set_file) lets you control which variable of the sample set goes in which subset; I know this sounds a little enigmatic, but I’ll provide an example later. The second (input_file) is the file containing the sample set, and the third (output_file) is where you want the output to be written.
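For reference, these are the standard definitions behind the four operations, with X and Y being the two subsets of columns treated as compound variables and the probabilities estimated from sample frequencies. The base-2 logarithm (results in bits) is an assumption: the post doesn't say which unit the program reports.

```latex
\begin{aligned}
H(X)        &= -\sum_{x} p(x)\,\log_2 p(x) \\
H(X,Y)      &= -\sum_{x,y} p(x,y)\,\log_2 p(x,y) \\
H(X \mid Y) &= H(X,Y) - H(Y) \\
I(X;Y)      &= H(X) + H(Y) - H(X,Y)
\end{aligned}
```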

The sample set format is really simple: it is a text file where the columns are the variables and the rows are the samples, something like this:

1 0 n 1
n 2 n n
a 0 2 2
2 c 1 2

This is the description of a sample set with 4 samples and 4 variables.
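To make the format concrete, here is a minimal reader sketch. It is not the program's own parser: whitespace-separated tokens and a fixed number of columns per row are assumptions, and `parse_sample_set` is a hypothetical name.

```python
def parse_sample_set(text):
    """Split a sample-set file into rows of string tokens.

    Assumes (hypothetically) whitespace-separated tokens, one sample per
    non-empty line, and the same number of columns on every line.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    width = len(rows[0]) if rows else 0
    for i, row in enumerate(rows):
        if len(row) != width:
            raise ValueError(f"row {i} has {len(row)} columns, expected {width}")
    return rows


# The sample set from the post.
example = """\
1 0 n 1
n 2 n n
a 0 2 2
2 c 1 2
"""

samples = parse_sample_set(example)
print(len(samples), "samples,", len(samples[0]), "variables")  # 4 samples, 4 variables
```

Symbols are kept as strings, so values like `a`, `c` and the null char `n` need no special treatment at this stage.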
The format for the set_file is really simple too: again a text file, where the first number is the size of the first subset and all the others are the indexes (starting from 0) of the columns, first those of the first subset and then those of the second one.
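A minimal sketch of reading a set file, assuming whitespace-separated integers. It treats every index after the first subset as belonging to the second subset, which matches the `3 0 2 3 1` example that follows. How the program reads the file for single-subset operations such as -e isn't described, so here the second subset is simply allowed to be empty. `parse_set_file` is a hypothetical name.

```python
def parse_set_file(text):
    """Return (first_subset, second_subset) as lists of 0-based column indexes.

    Layout: the first number is the size of the first subset, followed by
    the indexes of the first subset and then those of the second one.
    """
    numbers = [int(tok) for tok in text.split()]
    if not numbers:
        raise ValueError("empty set file")
    size_x, indexes = numbers[0], numbers[1:]
    if size_x > len(indexes):
        raise ValueError("first subset is longer than the index list")
    return indexes[:size_x], indexes[size_x:]


x, y = parse_set_file("3 0 2 3 1")
print(x, y)  # [0, 2, 3] [1]
```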

Let’s, for example, evaluate the conditional entropy between these two subsets of the sample set given above:

X = {0, 2, 3}
Y = {1}

Then we’ll have to write a file (let’s call it example.set) like this one:

3 0 2 3 1

And, if our sample set is called example.txt, call the program like this:

./ITA -c -n n example.set example.txt out.txt
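The estimates themselves come from counting symbol frequencies. Below is a minimal plug-in (maximum-likelihood) sketch, not the program's actual code. The post doesn't state which estimator ITA uses, how it treats null chars, which log base it reports, or whether -c gives H(X|Y) or H(Y|X). As hypothetical choices, this sketch drops samples with the null char in any selected column, uses base-2 logs, and reports H(X|Y). `entropy` and `analyze` are illustrative names.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Plug-in (maximum-likelihood) entropy, in bits, of a list of symbols."""
    n = len(symbols)
    return sum(c / n * log2(n / c) for c in Counter(symbols).values())


def analyze(samples, x_cols, y_cols, null):
    """Estimate H(X), H(Y), H(X,Y), H(X|Y) and I(X;Y) for two column subsets.

    Hypothetical null handling: a sample is dropped when any column of X or Y
    holds the null char, so every quantity is computed on the same samples.
    """
    kept = [row for row in samples
            if all(row[c] != null for c in x_cols + y_cols)]
    xs = [tuple(row[c] for c in x_cols) for row in kept]  # compound variable X
    ys = [tuple(row[c] for c in y_cols) for row in kept]  # compound variable Y
    h_x, h_y = entropy(xs), entropy(ys)
    h_xy = entropy(list(zip(xs, ys)))  # joint entropy H(X,Y)
    return {
        "H(X)": h_x,
        "H(Y)": h_y,
        "H(X,Y)": h_xy,
        "H(X|Y)": h_xy - h_y,  # chain rule
        "I(X;Y)": h_x + h_y - h_xy,
    }


# The example sample set from the post, already split into columns.
samples = [
    ["1", "0", "n", "1"],
    ["n", "2", "n", "n"],
    ["a", "0", "2", "2"],
    ["2", "c", "1", "2"],
]
print(analyze(samples, x_cols=[0, 2, 3], y_cols=[1], null="n"))
```

With this null handling only the last two samples survive, so H(X|Y) comes out as 0; treating the null char as an ordinary symbol instead would give different numbers.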

Enjoy!



2 Responses to “Information theory analyzer”

  1. epokh Says:

    Hey, I haven’t seen the code, but are you using a biased or an unbiased estimator?
    Can I also choose the word size of the stream to encode?
    Can I also estimate the sublinear and extensive entropy?
    I work quite a lot with entropy estimation and I wrote my own functions, but I also use Matlab.
    Cheerz.

  2. Alessio Sclocco Says:

    Hey epokh.

    The code was specifically designed to be 100% compatible (in a black-box sense) with an old C+Java program that was a BSc thesis, so it doesn’t offer many features. For example, I first designed it to deal with words of a user-defined size, but that wasn’t useful for the Professor’s goal, so I removed it; I’m sure it can easily be reintroduced, though.
    Apart from that, it simply does what is written on this page: you give it a file with some samples (the original application was designed for medical samples) and it evaluates the desired function. It works with a finite stream.
    By the way, today, reading your comment, I got interested again in this “old” (?) code, and if it can be useful for something I would like to add features or turn it into a library (it would be easy: the design isn’t perfect, but it isn’t bad either).

    Ciao :)
