Naive information theory says it should be the same as 24 bit, since you need 12 bits x 2 to represent it. But... many combinations of bits produce the same averaged colour, which reduces the number of _unique_ points of the 24 bit space you can reach. So it's actually much smaller.
You can easily show that each colour channel can take every value from 0 to 2^4-1, so the sum of the two pixels (ignoring the halving) can take every value from 0 to (2^4-1)*2 = 30, i.e. 31 possible values.
Thus, you have a colour space of 31^3 = 29791.
And my maths and your program agree
Or in other words, compared with true 24 bit, dithering leaves you with only 29791 of the 2^24 = 16777216 colours, so you are disposing of 99.82% of the information.
If instead you dither across N elements, each channel has 1+N*(2^4-1) possible sums, so you get (1+N*(2^4-1))^3 effective colour values.
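
If you want to check the counting yourself, here is a minimal Python sketch. It is not the program referred to above, just a brute-force count under the same assumptions (4 bits per channel, colours combined by summing, halving ignored). The names `channel_levels` and `effective_colours` are made up for illustration.

```python
from itertools import combinations_with_replacement

BITS = 4  # bits per colour channel (12 bit colour = 3 channels x 4 bits)

def channel_levels(n: int, bits: int = BITS) -> int:
    """Brute force: count the distinct sums of n channel values in 0 .. 2**bits - 1."""
    values = range(2 ** bits)
    return len({sum(combo) for combo in combinations_with_replacement(values, n)})

def effective_colours(n: int, bits: int = BITS) -> int:
    """Closed form from the post: (1 + n*(2**bits - 1))**3."""
    return (1 + n * (2 ** bits - 1)) ** 3

# Compare the brute-force count with the closed form for a few values of N.
for n in (1, 2, 3):
    print(n, channel_levels(n) ** 3, effective_colours(n))

# Fraction of the 24 bit space that two-element dithering cannot reach.
unreachable = 1 - effective_colours(2) / 2 ** 24
print(f"{unreachable:.2%}")
```

For N=2 both counts come out at 29791, and the last line prints the 99.82% figure.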