Hi Karl,
I believe you're correct that information is a kind of emergent property that arises only in the context of multiple data. From what I can gather, it's primarily the distinctness of the data that gives rise to high information content per datum. I'm not sure how deep you have gotten into the math of information theory, so please forgive me for going into further detail: there will be a point.
Consider receiving a string of symbols (data) consisting of "0123". This string contains 4 distinct symbols that all occur with the same frequency (1/4th of the time), which produces a Shannon entropy (average information content per symbol) of S = ln(4). Converting that to binary entropy, it's S_b = S / ln(2) = 2. The same entropy is produced by the semi-randomized string "0312", which still contains 4 distinct symbols that occur with the same frequency -- the order of the symbols doesn't affect the result, only the distinctness. What the binary entropy effectively means is that if we wish to represent 4 distinct symbols that occur with the same frequency, then we're going to need at least ln(4)/ln(2) = 2 bits per symbol. This notion becomes very clear if we're already familiar with integer data types on the computer, where we know that a 2-bit integer type can represent 2^2 = 4 distinct symbols (0 through 3). Likewise, a 3-bit integer can represent 2^3 = 8 symbols (0 through 7), and so on.
A symbol by itself does not have information content -- the extremely short string "0" produces an entropy of S = ln(1) = 0. There is no distinctness to be had because there is nothing else in the string to compare our single symbol to.
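If you'd like to check those figures yourself, here's a minimal Python sketch of the calculation (the helper name shannon_entropy is just my own label, nothing standard):

```python
import math
from collections import Counter

def shannon_entropy(s):
    """Average information content per symbol, in nats (natural log)."""
    n = len(s)
    return sum((c / n) * math.log(n / c) for c in Counter(s).values())

for string in ("0123", "0312", "0"):
    s_nats = shannon_entropy(string)
    print(string, s_nats, s_nats / math.log(2))  # entropy in nats, then in bits
```

Both "0123" and "0312" come out to ln(4) nats = 2 bits, while the lone "0" comes out to exactly 0.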
To consider randomization further, consider the string "1010010101010110101010101010110101100010100111010001001101011010", which is 64 symbols long. It's apparent that the string cannot be considered totally random just at the root level -- there are only two root symbols ("0" and "1"), so any repetition of a symbol is a pattern, and patterns are anathema to randomness. Secondly, we notice that there are patterns in the adjacency of the symbols as well: the composite symbols "00", "01", "10", and "11" appear quite often. Since we clearly cannot avoid patterns, we must instead look for balance (equal frequency) of the patterns at all possible levels in order to test for randomness.
As for actual figures regarding the frequencies of our string, we get...
For 0th level adjacency (singles), we go through the string one step at a time from left to right. This gives us the singles "1", "0", "1", "0", "0", etc:
The root symbol "0" occurred 32/64 of the time, as did the root symbol "1".
Balance in frequency occurred.
As for 1st level adjacency (pairs), we go through the string one step at a time from left to right. This gives us the pairs "10", "01", "10", "00", "01", etc:
The composite symbol "00" occurred 7/63 of the time.
The composite symbol "01" occurred 24/63 of the time.
The composite symbol "10" occurred 25/63 of the time.
The composite symbol "11" occurred 7/63 of the time.
Imbalance in frequency occurred.
As for 2nd level adjacency (triplets), we likewise go through the string one step at a time from left to right. This gives us the triplets "101", "010", "100", "001", "010", etc:
The composite symbol "000" occurred 2/62 of the time.
The composite symbol "001" occurred 5/62 of the time.
The composite symbol "010" occurred 18/62 of the time.
The composite symbol "011" occurred 6/62 of the time.
The composite symbol "100" occurred 5/62 of the time.
The composite symbol "101" occurred 19/62 of the time.
The composite symbol "110" occurred 6/62 of the time.
The composite symbol "111" occurred 1/62 of the time.
Imbalance in frequency occurred.
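Here's a small Python sketch that reproduces all of the counts above by sliding a window across the string one symbol at a time (ngram_frequencies is my own name for it):

```python
from collections import Counter

s = "1010010101010110101010101010110101100010100111010001001101011010"

def ngram_frequencies(s, level):
    """Count the overlapping (level+1)-symbol windows in s."""
    width = level + 1
    windows = [s[i:i + width] for i in range(len(s) - width + 1)]
    return Counter(windows), len(windows)

for level in range(3):  # 0th, 1st, and 2nd level adjacency
    counts, total = ngram_frequencies(s, level)
    print(f"level {level}:")
    for symbol in sorted(counts):
        print(f"  {symbol}: {counts[symbol]}/{total}")
```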
We will skip detailing the 3rd, 4th, and higher levels of adjacency. The point is that at the n-th level of adjacency there are 2^(n+1) distinct symbols (root symbols for the 0th level, composite symbols for the 1st level and higher), and we can consider the whole string to be random only if there is perfect balance in the frequency of the symbols at *all* levels of adjacency. Since there are imbalances in the frequency at the 1st and 2nd levels, the string cannot be considered random.
Of course, if our string was indeed generated by a random process, then we would simply need to keep making the string longer, and the frequencies at every level of adjacency would naturally drift toward balance as time goes on. This is interesting to note because it means that we simply cannot know whether a binary-number-generating process is truly random until we can analyze the symbols it puts out at an infinite level of adjacency, which requires that our string be infinite in length. It's literally impossible to tell with perfect certainty that a string of "0"s and "1"s was randomly generated if the string is finite in length.
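As a quick illustration of that convergence, here's a sketch that uses Python's pseudorandom generator as a stand-in for a truly random process and watches the worst frequency imbalance at the 2nd level of adjacency shrink as the string grows (max_imbalance is again just my own name):

```python
import random
from collections import Counter

def max_imbalance(s, level):
    """Worst deviation of any (level+1)-symbol window frequency from uniform."""
    width = level + 1
    windows = [s[i:i + width] for i in range(len(s) - width + 1)]
    counts = Counter(windows)
    uniform = 1 / 2 ** width
    return max(abs(counts.get(format(k, f"0{width}b"), 0) / len(windows) - uniform)
               for k in range(2 ** width))

random.seed(0)
for n in (64, 4096, 262144):
    s = "".join(random.choice("01") for _ in range(n))
    print(n, max_imbalance(s, level=2))  # shrinks roughly like 1/sqrt(n)
```

The imbalance keeps shrinking but never reaches exactly zero at any finite length, which is precisely your point.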
In essence, when someone says "quantum physics is random" and shows some data as evidence, it's still only an assumption, because we do not have an infinite number of measurements to verify that statement. We can become more and more confident as we make more and more measurements (and find that the frequencies of the patterns are balanced at higher and higher levels of adjacency), but we can never be absolutely certain until we take an infinite number of measurements. Consequently, if we were to make billions and billions of measurements and found an unexpectedly large imbalance in frequency at, say, the 23324324th level of adjacency, and this imbalance simply did not go away when we made billions and billions more measurements, then we could say with a high degree of confidence that something deterministic is likely going on there, deep at the heart of things.
In other words, when you make mention of randomized strings, your essay indirectly puts a spotlight on one of our basic physical assumptions: "quantum physics is random". I think that's a pretty important observation, and some of your readers might pick up on it right away. This is another reason why I liked your essay a lot.