Hi Erik, very well written essay. You do not mention feedback from higher to lower levels at all, and so just have the ascending ladder of micro to macro giving causality. However, for example, social relations and/or a complex environment can cause stress to an organism, which then affects the concentrations of biochemicals such as hormones and neurotransmitters in the individual, which can subsequently affect its (the organism's) behaviour. So tracing the behaviour only to the biochemical concentrations misses out on why the levels are what they are, and misses an important feedback in the causal chain. Kind regards Georgina

    Thanks for reading Georgina.

    The reason I focus on an ascending ladder is that it's easy to slide between definitions when talking about this stuff, and working with discrete finite models with discrete finite sets of supervening scales forces rigor. For example, sometimes people talk about "top-down causation" when they are really talking about something that's probably more clearly described as "whole/part" causation. Your example of an organism that's stressed by the surrounding social relations is more like how a part might influence a greater whole (the social structure). Talking about feedback makes a lot of sense in that regard. However, wholes don't share an identity with their individual parts, whereas a higher scale does seem to have something like an identity relationship with its lower scales (there may be some caveats to this, but in general for discrete finite systems this seems solid). So what would it mean for there to be feedback between a thing and itself across scales? My point is not that it can't possibly happen, but rather that a) it's different from the more obvious whole/part feedback and b) there are simpler (i.e., more rigorous) ways of investigating the phenomenon first. I think you are completely correct that this isn't the final story, and you're right to ask the question: what about the intermediary scales? What you refer to as the "causal chain" is something I mention in terms of future directions in some of the supplementary material (E).

    2 months later

    If you're listening, there may be more comments here because of the Quanta post https://www.quantamagazine.org/a-theory-of-reality-as-more-than-the-sum-of-its-parts-20170601/. At this point the contest looks to be going well for you, so an early congratulations for that at least, as well as for the mention at Quanta.

    I have a query about your 4x4 matrix on page 4, and its reduction to 2x2. Surely the choice to make the reduction by projection to 3+1 dimensional subspaces instead of to 1+3 or to 2+2 introduces significant information? Indeed, a projection to arbitrary subspaces could be introduced, resulting in almost any 2x2 matrix. Your text offers that "a macroscale is constructed of a grouping (coarse-grain) of the first three microstates (it is multiply realizable)", which uses information contained in the matrix to choose the particular 3+1 projection. The choice of "multiple realization" would seem to identify a particular algorithm for identifying subspaces, with the implication of some degree of algorithmic complexity. I'm sadly no expert on information theory, however.

    If you answer this, I may formulate further questions or comments.

      Hey Peter - yup, still receiving the emails if someone posts here. Thanks for the congrats about Quanta; I was pretty happy with the level of complexity at which the author wrote about the research, and hopefully it attracts more researchers into the area.

      To answer your query about the 4x4 matrix: if I understand you correctly, your intuition is right that there are many possible projections, many of which may lead to different values. The conceptually simplest way to deal with this is to brute force it: just try all possible projections, and one will have the most information. However, I'm not sure what you mean by saying that the choice of multiple realization would identify a particular algorithm for identifying subspaces. There's a certain mapping associated with it, but I wouldn't call it an algorithm. You are definitely correct that all this has some connection to algorithmic complexity - in general, program length should decrease for any macroscale.
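
      To make the brute-force idea concrete, here's a minimal sketch in Python (this isn't code from the paper; the function names, the uniform averaging used to build the macro matrix, and measuring effective information in bits are just one reasonable set of choices):

      ```python
      import numpy as np

      def effective_information(tpm):
          """EI of a transition probability matrix (rows = current state,
          columns = next state) under a uniform intervention distribution:
          the mean KL divergence between each row and the average row, in bits."""
          tpm = np.asarray(tpm, dtype=float)
          effect = tpm.mean(axis=0)
          with np.errstate(divide="ignore", invalid="ignore"):
              kl = np.where(tpm > 0, tpm * np.log2(tpm / effect), 0.0)
          return kl.sum(axis=1).mean()

      def coarse_grain(tpm, partition):
          """Macro transition matrix for a partition of the micro states
          (a list of tuples of indices), averaging uniformly within each group."""
          tpm = np.asarray(tpm, dtype=float)
          m = len(partition)
          macro = np.zeros((m, m))
          for a, group_a in enumerate(partition):
              for b, group_b in enumerate(partition):
                  macro[a, b] = np.mean([tpm[i, list(group_b)].sum() for i in group_a])
          return macro

      def all_partitions(items):
          """Every way of grouping the micro states (brute force; Bell-number
          growth, so only feasible for small toy systems)."""
          if len(items) == 1:
              yield [tuple(items)]
              return
          first, rest = items[0], items[1:]
          for smaller in all_partitions(rest):
              for k in range(len(smaller)):
                  yield smaller[:k] + [(first,) + smaller[k]] + smaller[k + 1:]
              yield [(first,)] + smaller

      def best_macro(tpm):
          """Try every grouping and keep the one with maximal effective information."""
          tpm = np.asarray(tpm, dtype=float)
          best_ei = effective_information(tpm)
          best_partition = [(i,) for i in range(tpm.shape[0])]
          for partition in all_partitions(list(range(tpm.shape[0]))):
              ei = effective_information(coarse_grain(tpm, partition))
              if ei > best_ei:
                  best_ei, best_partition = ei, partition
          return best_ei, best_partition
      ```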

      I think it is, but I won't go to the wall over whether the choice of mapping as a nonlinear function of the matrix elements is an algorithm; in any case this is mostly an illustrative toy model for you.

      I left a comment at Quanta. The first paragraph is specific to the Quanta article, and I'll paraphrase the second paragraph as saying that I think you are too dependent on conventional thinking about real-space renormalization (which is by its nature rather ad-hoc, but there's largely no alternative), but I'll repeat the third paragraph here:

      ««The idea that macroscopic observables might be more than is given by microscopic observables can be presented much more abstractly in the context of the Haag-Kastler axioms as a failure of Additivity (which is that the algebra of observables associated with a union of two or more regions is the same as the algebra that is freely generated by the algebras of observables associated with each region separately). A weakening of the Wightman axioms is also possible (which is inevitably much more concrete, but it would need a dozen pages to describe). Hoel will eventually need something like such constructions in order to make contact with the causal structure of quantum (field) theory. I think Hoel's constructions here and in the FQXi essay are very helpful, nonetheless they seem to me ad-hoc enough relative to the underlying cellular, biochemical, and atomic structure as we understand them to make it difficult to make precise claims about causality.»»

      All except the last sentence of that is so specific to my perspective on QFT and (correct me, but I suppose) out of your range of expertise that I don't expect an answer. The last sentence is mostly for future reference; IMO, claims about causality have to be very carefully formulated.

      Natalie Wolchover is one of the very best science writers. You were very lucky to get her. I presume that she represented Scott Aaronson's response more-or-less accurately; I take his (somewhat but not totally dismissive) response partly to reflect my feeling that you don't yet have a formal enough presentation to make the suggestiveness of your approach really stick.

      I hope that's helpful, but I am a crazy old bat. I'm fairly close to you, in New Haven, so if it seems curious enough I would be interested to meet you.

      Over on Shtetl-Optimized, you mention "I'm not answering all things people toss out as I don't want to spam the thread." I expect you're not even referring to me, but FWIW I'll repeat my comment there (which reels in a little my comment above that I wouldn't go to the wall over the choice of mapping):

      «Hoel works with Markov processes, but specializes to systems that have an exact macro-state separation: that is, having a diagonal block-matrix presentation. Physical systems, however, are only separable in this way for limited lengths of time. There are small probabilities, for example, of interactions between my fingers, subsystems of my body, and my toes; infrequently, I have to cut my toenails. In general, every entry in a Markov matrix is likely to be non-zero and different from other entries.

      To identify how to coarse-grain in the general case, we have to consider either the matrix entries or we have to consider whatever information is not encoded in the Markov matrix (physical adjacency, for example). If the former, an algorithm might, for example, compute an eigenvector basis (over the complex numbers) and compute which to discard (but there's likely a better algorithm!); the algorithm is clearly a source of extra information. If the latter, there's an even clearer source of extra information. In both cases, the Markov process is embedded in a larger system of other degrees of freedom, but we can't just move Hoel's argument to that larger system, because the same problem applies to it (perhaps more so, because now either the matrix or the external information is more complex).»
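
      To make the eigenvector idea a little more concrete, here is one very crude sketch of the sort of algorithm I have in mind (my own illustration only, not a worked-out proposal; a serious version would need much more care with complex and degenerate eigenvalues):

      ```python
      import numpy as np

      def spectral_grouping_hint(tpm, n_slow=2):
          """Toy illustration: take the left eigenvectors of a Markov matrix,
          keep the slowest non-stationary modes (|eigenvalue| closest to 1),
          and group states whose entries share a sign pattern across those
          modes. The choice of basis and of what to discard is itself a
          source of extra information, which is the point being made above."""
          tpm = np.asarray(tpm, dtype=float)
          evals, evecs = np.linalg.eig(tpm.T)           # columns = left eigenvectors of tpm
          order = np.argsort(-np.abs(evals))            # slowest modes first
          slow = np.real(evecs[:, order[1:n_slow + 1]]) # skip the stationary mode
          groups = {}
          for state, row in enumerate(slow):
              signature = tuple(np.sign(np.round(row, 8)))
              groups.setdefault(signature, []).append(state)
          return list(groups.values())
      ```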

      Perhaps even this engages with your approach too little, or misunderstands it too much, for it to be fruitful to respond to? I always thought my reach into Axiomatic QFT (or more generally, any algebra-of-observables approach) would likely be too much in my own way of thinking for you, but I'm curious whether you think this later thought illuminates your formalism.

      Hey Peter - I definitely wasn't referring to anyone in particular. I didn't want to spam Scott's thread with replies to questions, so I was honest that I wouldn't be answering everyone. I'm happy to answer anything posted here, however.

      To your question about whether physical systems are separable enough for coarse-graining (if I understood you correctly): there's no requirement that groupings don't interact, or that they stay steady across time. But you're right that in the examples everything is pretty stationary. It's interesting to think about what could happen. Groupings could change moment to moment, but this gets more complicated because there are also spatiotemporal groupings. I'm not sure off the top of my head if it applies, but there's an interesting thing called a Markov blanket that might be relevant here.

      As you suggest, it is possible to use something like physical proximity in the calculations. But I'd view it as a heuristic aspect of the calculation, not something to directly weight the result by. However, the theory is physics-neutral, so perhaps something about our particular physics would need to make that true (just throwing stuff out there).

      All the best! Thanks for reaching out - Erik

        Thanks, that seems a good answer. It's rather open-ended, but that's OK.

        It felt to me that Scott was talking rather at cross-purposes to you, and particularly that he and other commenters on the thread were mostly too far away from the Markovian process mathematics for the discussion to be very productive, so kudos for removing yourself from it so cleanly.

        As I've mentioned above, my focus is on algebra-of-observables approaches to physics, including algebraic QFT but also classical random fields, where the relationship to space-time is firmly specified; Markovian processes are certainly useful as models, but they are rather removed from the QFTs that physicists take seriously as fundamental physical models, almost to the exclusion of all other models. Not that I take physicists necessarily to be right about that, but I suppose that if one wants to talk seriously to physicists, and to talk about mind supervening on physics, some moderately precise link to space-time does have to be made (though at the Planck scale total precision is unlikely). The link between Markov models and consciousness also seems too tenuous to me, but I don't have a mathematical or other framework in which to discuss how it is tenuous.

        Perhaps finally, thanks for mentioning the "Markov blanket" idea, which I had not come across before.

        I'm going to put my worries about the construction in terms I'm familiar with, quantum mechanics, but which I hope will not be too unfamiliar for you, Erik, instead of in the terms you have been using, which I take to be stochastic matrices and stochastic vectors representing states.

        In the QM context of Hilbert spaces and the representation of states by density matrices, von Neumann entropy, [math]\mathsf{vN}(\hat\rho)=-\mathsf{Tr}[\hat\rho\ln\hat\rho],[/math] is perhaps the commonest measure of information (there are certainly others, but just replace von Neumann by some other measure everywhere in what follows). If we take coarse-graining to be a map, which in general will be nonlinear, from n-by-n density matrices to m-by-m density matrices, [math]X:\mathcal{D}(\mathcal{H}_n)\rightarrow\mathcal{D}(\mathcal{H}_m);\hat\rho\mapsto X(\hat\rho),[/math] then the von Neumann entropy of the coarse-grained system will be [math]\mathsf{vN}(X(\hat\rho))=-\mathsf{Tr}[X(\hat\rho)\ln X(\hat\rho)].[/math] It seems clear(?) that how the von Neumann entropy of the coarse-grained model differs from that of the pre-coarse-grained model will depend both on what choice we make for X and on what density matrices occur as pre-coarse-grained models.
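
        For concreteness, here is the kind of computation I mean, using the simplest possible (linear) choice of X, a partial trace over a subsystem; the function names and the toy Bell-state check are my own:

        ```python
        import numpy as np

        def von_neumann_entropy(rho):
            """vN(rho) = -Tr[rho ln rho], computed from the eigenvalues of rho."""
            evals = np.linalg.eigvalsh(rho)
            evals = evals[evals > 1e-12]          # drop numerical zeros
            return float(-np.sum(evals * np.log(evals)))

        def coarse_grain_partial_trace(rho, dim_a, dim_b):
            """One concrete choice of X: trace out subsystem B of a density matrix
            on a (dim_a * dim_b)-dimensional space, leaving a dim_a-by-dim_a one."""
            rho = np.asarray(rho).reshape(dim_a, dim_b, dim_a, dim_b)
            return np.einsum("ijkj->ik", rho)

        # Toy check: a pure, maximally entangled two-qubit state has vN entropy 0,
        # but its coarse-grained single-qubit state has entropy ln 2, so how the
        # entropy changes under X depends on both X and the state it acts on.
        bell = np.zeros((4, 4))
        bell[0, 0] = bell[0, 3] = bell[3, 0] = bell[3, 3] = 0.5
        print(von_neumann_entropy(bell))                                    # ~0.0
        print(von_neumann_entropy(coarse_grain_partial_trace(bell, 2, 2)))  # ~0.693
        ```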

        I think my problem is that it's not clear to me that a detailed accounting of where there is more or less information is a specially good way to think about causality or consciousness. In QM, what "causes" what is systematically described by the Hamiltonian, defined as the infinitesimal generator of the evolution of vector states as a function of time; if one works with a Hilbert space of vector states and with density matrices, as Physics now kinda does, that's it.

        Now, not very seriously, I'll go off the deep end... There are possibilities for QM to model consciousness insofar as we can construct, not for a toy model of a few dimensions, perhaps, but for a sufficiently large Hilbert space, a measurement operator C that returns "1" if there is at least one conscious agent or "0" if not, as well as more physical measurements such as the intensity of the electric field. The algebraic relationships of C with the Hamiltonian and with whatever coarse-graining operators we use would determine what measurement results would be expected in a given state and at different levels of coarse-graining (which would include, for example, correlations between consciousness and the intensity of the electric field). Ensuring such models are empirically useful is of course not so easy.

        I believe it was David Hume who pointed out that murder doesn't exist at the micro level--where at a physics level does an ethical wrong get encoded?

        I'm wondering about macro items other than agents--the meaning of a printed word doesn't exist at the micro level, nor does a photograph exist in its pixels. Temperature--or any average or similar calculation--exists at a different "level" than that of the atoms whose root mean square velocity it measures.

        I'm not clear on how the exclusion principle can exclude things that can't exist at micro levels, such as an image, a murder, or a feeling.

        Maybe I'm completely missing something?

        Your article is so beautifully and clearly written.

        Thanks!

        Sanjay

          Did not know that Hume quote! Thank you Sanjay.

          Like most contemporary arguments in philosophy, the exclusion principle is a re-statement of things people have talked about for a while. It was given its contemporary incarnation in a debate about mental causation. However, others have pointed out that the argument easily generalizes to all macro-properties (because the central argument is that anything that supervenes on anything else can't really be a cause). What the argument would say in the cases that you give (an image, a murder, a feeling) is that those things are merely epiphenomenal higher-level descriptions, which may be useful but don't actually cause anything. So an image is just a set of pixels, a murder is just a particular set of atomic trajectories, and a feeling just some neurons firing (or you could go lower). All the "causal work" that brings them about is being done at the lower scale.

          You bring up a very good point about temperature - there do seem to be macro-states that ignore (or mostly ignore) the underlying micro-states. What the theory of causal emergence captures is how, precisely because of this, the causal structure can be different at the higher scale.

          Thanks for your essay,

          1. Fisher proposed how nuclear (quantum) entanglement might be stable over mesoscopic distances in the brain [1].

          2. I don't understand causality in quantum mechanics too well, but this is cool: they've demonstrated a computational speedup for a parity task with a single qutrit [2]. It takes one evaluation in the quantum case but two evaluations in the classical computation. It seems like there's a notion of 'quantum causality' going on that's not an averaging over (or function of) the microcausal chains, since they take too long.

          [1] https://www.theatlantic.com/science/archive/2016/11/quantum-brain/506768/

          [2] https://www.nature.com/articles/srep14671

          Paul

            Thanks for the links, Paul - they were interesting. It's possible that quantum effects in the brain turn out to have something to do with consciousness, although my personal rating of that probability is extremely low as of now, for a few reasons.

            Appreciate the kind words,

            Erik

            Thank you so much for this explanation. I get it. Utter reductionism.

            OK, so as cars pull up to a stop sign, and one by one, stop, the exclusion principle is telling us that the configuration of the word STOP, and its processing by each driver's brain, is in no way causative of the drivers' stopping; that the word STOP is an epiphenomenon of the underlying physics between the light waves reflecting from the stop sign, the drivers' brains, and the brake pads (all of which are epiphenomena too).

            Furthermore, if I can reliably predict that certain large groupings of atoms, whom I call competent drivers of functional cars, will "stop" at this other grouping I call a "stop sign," I seem to be encoding a level of information that is real and substantive, as proven by my ability to do it. If not, then proof, math, all human concepts disappear into the exclusion principle as meaningless, leaving us with no "real" basis for our lived sense of meaning in the world.

            If you have in fact found a way to bridge this gap, and resurrect the reality of human (and other agents') thoughts and experiences, I'm in awe. I'm re-reading now, trying to better understand.

            Hume's argument is in Book III, Part I of "A Treatise of Human Nature" (1739):

            Take any action allow'd to be vicious: Willful murder, for instance. Examine it in all lights, and see if you can find that matter of fact, or real existence, which you call vice. In which-ever way you take it, you find only certain passions, motives, volitions and thoughts. There is no other matter of fact in the case. The vice entirely escapes you, as long as you consider the object. You never can find it, till you turn your reflexion into your own breast, and find a sentiment of disapprobation, which arises in you, towards this action. Here is a matter of fact; but 'tis the object of feeling, not of reason. It lies in yourself, not in the object. So that when you pronounce any action or character to be vicious, you mean nothing, but that from the constitution of your nature you have a feeling or sentiment of blame from the contemplation of it. Vice and virtue, therefore, may be compar'd to sounds, colours, heat and cold, which, according to modern philosophy, are not qualities in objects, but perceptions in the mind...

            7 days later

            Would it change anything if, instead of giving the state of group A as {1,0,0,1,1,1,0,1,1,0} or as the coarse-grained {6}, you gave it as a function of both, {1,0,0,1,1,1,0,1,1,0,6}?

              Thanks Paul - it depends on precisely what's being done, but if you give the future state as a function of the microscale {0.... 1} then it will have low effective information. If you give it as a macroscale {6} then it will have higher effective information. Since both sets cover the same space of possibilities but one (the macroscale) has more information, there won't be any extra information gained by including the microscale. However, I think in general your intuition is correct that we can think about causation at multiple scales in some cases, although I also think this won't be true in all cases.
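
              As a self-contained toy version of that point (my own illustrative numbers, not your ten-unit example; it uses the same uniform-intervention definition of effective information as the sketch further up the thread):

              ```python
              import numpy as np

              def ei_bits(tpm):
                  """Effective information (bits) under a uniform intervention:
                  mean KL divergence between each row and the average row."""
                  tpm = np.asarray(tpm, dtype=float)
                  effect = tpm.mean(axis=0)
                  with np.errstate(divide="ignore", invalid="ignore"):
                      kl = np.where(tpm > 0, tpm * np.log2(tpm / effect), 0.0)
                  return kl.sum(axis=1).mean()

              # Micro scale: three states that hop noisily among themselves plus one
              # deterministic state. Macro scale: group the three noisy states into one.
              micro = np.array([
                  [1/3, 1/3, 1/3, 0],
                  [1/3, 1/3, 1/3, 0],
                  [1/3, 1/3, 1/3, 0],
                  [0,   0,   0,   1],
              ])
              macro = np.array([
                  [1.0, 0.0],
                  [0.0, 1.0],
              ])
              print(ei_bits(micro))  # roughly 0.81 bits at the microscale
              print(ei_bits(macro))  # 1.0 bit at the macroscale, which covers the
                                     # same possibilities with more effective information
              ```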
