import numpy as np

data1 = np.random.normal(0, 1, 1000)            # one variable, 1000 samples
data2 = np.random.normal(0, 1, size=(2, 1000))  # two variables, 1000 samples each
data3 = np.random.normal(0, 1, size=(4, 1000))  # four variables, 1000 samples each

Shannon entropy

Shannon entropy \(H\) is given by the formula \(H=-\sum_{i}p_{i}\log_{b}(p_{i})\), where \(p_{i}\) is the probability of character number \(i\) appearing in the stream of characters of the message and \(b\) is the base of the logarithm.
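
Before turning to the library, it can help to see the formula applied directly. The helper below is a minimal, hypothetical sketch (the name shannon_entropy and the histogram-based estimate of \(p_{i}\) are illustrative choices, not the gcpds.entropies implementation); the 16-bin default only mirrors the library's documented default.

def shannon_entropy(x, bins=16, base=2):
    # Estimate p_i by binning the sample into a histogram.
    counts, _ = np.histogram(x, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]  # empty bins contribute 0 * log(0) = 0
    return -np.sum(p * np.log(p)) / np.log(base)

print(shannon_entropy(data1))  # close to the Shannon(data1) value shown below, up to binning details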

Consider a simple digital circuit which has a two-bit input (\(X\), \(Y\)) and a two-bit output (\(X\) and \(Y\), \(X\) or \(Y\)). Assuming that the two input bits \(X\) and \(Y\) have mutually independent chances of 50% of being HIGH, the input combinations \((0,0)\), \((0,1)\), \((1,0)\), and \((1,1)\) each have a 1/4 chance of occurring, so the circuit's Shannon entropy on the input side is \(H(X,Y)=4{\Big (}-{1 \over 4}\log _{2}{1 \over 4}{\Big )}=2\).

The possible output combinations are \((0,0)\), \((0,1)\), and \((1,1)\), with respective chances of 1/4, 1/2, and 1/4 of occurring, so the circuit's Shannon entropy on the output side is \(H(X{\text{ and }}Y,X{\text{ or }}Y)=2{\Big (}-{1 \over 4}\log _{2}{1 \over 4}{\Big )}-{1 \over 2}\log _{2}{1 \over 2}=1+{1 \over 2}={3 \over 2}\). The circuit therefore reduces (or “orders”) the information going through it by half a bit of Shannon entropy due to its logical irreversibility.
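
As a quick numerical check of the arithmetic above (independent of gcpds.entropies), the two entropies can be evaluated from the stated probabilities:

p_in = np.array([1/4, 1/4, 1/4, 1/4])  # (0,0), (0,1), (1,0), (1,1)
p_out = np.array([1/4, 1/2, 1/4])      # (0,0), (0,1), (1,1)

H = lambda p: -np.sum(p * np.log2(p))
print(H(p_in), H(p_out))  # 2.0 and 1.5 bits, as derived above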

from gcpds.entropies import Shannon

ent = Shannon(data1)
print(f"Input data shape: {data1.shape}")
print(f"Entropy: {ent}", end='\n\n')
Input data shape: (1000,)
Entropy: 3.4263432580844695
Shannon(data1, base=10)  # the default logarithm base is 2
Shannon(data1, bins=12)  # the default number of bins used to estimate the distribution is 16
3.0154281530433003
Shannon(data2, conditional=1)  # conditional entropy (see the "Conditional entropy" section below)
3.348718451910984
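
Changing base only rescales the result, since \(\log_{b}(p)=\log_{2}(p)/\log_{2}(b)\). Assuming Shannon(data1, base=10) keeps the same estimated distribution and only changes the logarithm base, its value follows directly from the base-2 value reported above:

h_bits = 3.4263432580844695  # base-2 entropy of data1 from the first example
print(h_bits / np.log2(10))  # the same entropy expressed in base 10, about 1.03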

Joint entropy

For two random variables \(X\) and \(Y\):
\({\displaystyle \mathrm {H} (X,Y)=-\sum _{x\in {\mathcal {X}}}\sum _{y\in {\mathcal {Y}}}P(x,y)\log _{2}[P(x,y)]}\)
ent = Shannon(data2)

print(f"Input data shape: {data2.shape}")
print(f"Entropy: {ent}")
Input data shape: (2, 1000)
Entropy: 6.454989877719003
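
For comparison, the same joint entropy can be estimated directly from a 2-D histogram. The helper below (joint_entropy_2d is a hypothetical name, and its binning need not match gcpds.entropies exactly) simply applies the formula above to the relative frequencies:

def joint_entropy_2d(x, y, bins=16):
    # Estimate P(x, y) from a 2-D histogram and apply the joint-entropy formula.
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

print(joint_entropy_2d(data2[0], data2[1]))
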
For more than two random variables \(X_{1},\ldots,X_{n}\) this expands to
\({\displaystyle \mathrm {H} (X_{1},...,X_{n})=-\sum _{x_{1}\in {\mathcal {X}}_{1}}...\sum _{x_{n}\in {\mathcal {X}}_{n}}P(x_{1},...,x_{n})\log _{2}[P(x_{1},...,x_{n})]}\)
ent = Shannon(data3)

print(f"Input data shape: {data3.shape}")
print(f"Entropy: {ent}")
Input data shape: (4, 1000)
Entropy: 9.801254959649105
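
The same idea extends to \(n\) variables with an \(n\)-dimensional histogram. Note that with 1000 samples and 16 bins per axis a 4-D histogram is very sparse, so this kind of plug-in estimate is biased; the sketch below (joint_entropy_nd is a hypothetical helper, not the library's method) is only meant to mirror the formula:

def joint_entropy_nd(data, bins=16):
    # `data` has shape (n_variables, n_samples); histogramdd expects samples as rows.
    counts, _ = np.histogramdd(data.T, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

print(joint_entropy_nd(data3))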

Conditional entropy

Joint entropy is used in the definition of conditional entropy:
\({\displaystyle \mathrm {H} (X|Y)=\mathrm {H} (X,Y)-\mathrm {H} (Y)}\)
ent = Shannon(data3, conditional=0)  # `conditional` is an index of the input array

print(f"Input data shape: {data3.shape}")
print(f"Entropy: {ent}")
Input data shape: (4, 1000)
Entropy: 6.332143741478394
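
Independently of the library, the identity \(\mathrm{H}(X|Y)=\mathrm{H}(X,Y)-\mathrm{H}(Y)\) can be checked with the histogram-based helpers sketched earlier (shannon_entropy and joint_entropy_2d are the hypothetical helpers from the previous snippets, not part of gcpds.entropies):

def conditional_entropy(x, y, bins=16):
    # H(X|Y) = H(X, Y) - H(Y), both terms estimated from histograms with the same bins
    return joint_entropy_2d(x, y, bins=bins) - shannon_entropy(y, bins=bins)

print(conditional_entropy(data2[0], data2[1]))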
