Dear Edwin,

Sorry for the delayed response. I had a conference paper deadline and finally have some time to myself. Thank you for your detailed response and encouraging comments. It fills me with greater confidence to keep working harder at a solution.

I really enjoyed your discussion of the effect of the consciousness field on electrons vs. ions. It is an interesting point you make. Some colleagues of mine in the department are working on ion-based memristor devices, which might actually serve as a better substrate for interacting with a consciousness field than an electronic device. Furthermore, I completely agree with you on the concept of a 3D structure, rather than the traditional 2D architecture. I too am convinced that any system capable of a comparable consciousness should have some kind of 3D structure. Interestingly, I am in discussion with them about possibly constructing a 3D array of sorts with these ionic memristors, with the type of constraints that I talk about in my essay (if we figure out how to impose them), and just letting it run in an input environment to see what it does. It should be very interesting, I think.

I am about 6-7 months from finishing and in full writing mode, but I will definitely take a look at the resources you mentioned (especially the ones on pattern recognition). One can never learn enough, and I am sure they will provide some new insight for me. Thanks.

Natesh

PS: I finally got around to rating your essay. I would appreciate it if you rate mine, if you haven't already. If you have already, thank you very much!

Dear Robert,

Thank you for your encouraging reply. I am happy to hear that you liked my submission. Yes, I do think that "minimal dissipation" might provide a sufficient condition for emergence of goal-oriented agency.

Yes, I have come across Bennett's work! I think he has been one of the most influential thinkers of our time!! I study the fundamental thermodynamic limits of computing as part of my dissertation, and I often use the works of Landauer and Bennett. I also like his work on reversible computing, and am hoping the field will gain more momentum. My favorite paper of his is "Dissipation-error tradeoff in proofreading." (Apologies for my long-winded rant.)

Good luck on the contest. I will definitely take a look at your submission. Thanks.

Natesh

PS: My title is actually a play on the title of Landauer's famous paper "Information is Physical".

Dear Natesh,

I have now rated you (10). Past experience has indicated that there may be turbulence in the final hours, so I had planned to hold off to help you then, but perhaps increased visibility will help now. Some earlier essays that I pushed up for visibility were immediately given '1's by whatever trolls lurk in low places.

The final decisions are made by FQXi judges, and I think they will judge your work well.

I am very glad that you agree about the 3-D structure. What you say about ionic memristors is very interesting! I'm glad to hear this. I hope we stay in touch.

Best,

Edwin Eugene Klingman

Dear Natesh --

Let me ask a very basic question. Say I take a simple Newtonian system, two planets orbiting around each other.

I hit one with a rock, and thereby change the orbital parameters. There's a map from the parameters that describe the incoming rock to the resulting shift in the system. The system appears to have "learned" something about the environment with minimal (in fact, zero) dissipation.

If I let the rock bounce off elastically, then there is strictly no change in entropy. I could probably arrange the environment in such a way that the system would show decreasing amounts of change in response to rocks flying in at random times from a particular direction. In general, there will be nice correlations between the two systems.

Why is this open system not an inferential agent?

I suppose I'm trying to get a sense of where the magic enters for you. I think you're cashing out efficiency in terms of the KL distance between the "predictor" at time t and the world at time t+1, presumably with some mapping to determine which states are to correspond. This seems to work very well in a lot of situations. But you can also construct cases where it seems to fail. Perhaps because the notion of computational complexity doesn't appear?
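For concreteness, a toy sketch of the kind of score I have in mind (the distributions, the state correspondence, and the Python here are purely my own illustration, nothing from your essay):

from math import log2

def kl_divergence(p, q):
    # D_KL(p || q) in bits; assumes q is nonzero wherever p is.
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Predictor's distribution over (mapped) states at time t,
# versus the world's distribution over its states at time t+1.
predictor_t = [0.7, 0.2, 0.1]
world_t1 = [0.6, 0.3, 0.1]
print(f"D_KL = {kl_divergence(predictor_t, world_t1):.4f} bits")

The smaller the divergence, the better the "predictor" tracks the world under the chosen correspondence.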

Thank you for a stimulating read. It's a pleasure to see the Friston work cited alongside (e.g.) Jeremy England.

Yours,

Simon

    Dear Simon,

    Thank you for your comments and questions. This is a nice coincidence. I just finished reading about the Borgesian library and am currently on section 2, "the physics of the gap". It is a great piece of writing, and I will reach out on your page once I am done reading, re-reading, and digesting it.

    "Why is this open system not an inferential agent?"

    --> Yes, it technically is, for that very particular environment providing those particular input signals. If those planets saw ONLY the type of conditions that allowed them to maintain a specific macrostate at minimal dissipation, then we might have to entertain the possibility that they form an inferential agent in that environment. In section 2 of my submission, I introduced the notion of the link between minimal dissipation and learning. I added section 4 not only to show the link to England's work, but also to explain why we should focus on systems that are minimally dissipative across all the input signals they might encounter from their environment as they maintain their macrostate. For example, if we thought about a system that was minimally dissipative for one input signal but not the rest, I would think that system is not an inferential agent, unless the probability of that particular signal goes to 1.

    "This seems to work very well in a lot of situations. But you can also construct cases where it seems to fail. Perhaps because the notion of computation complexity doesn't appear?"

    --> Can you please give me a simple example? I don't seem to be following here. If you mean that it is possible to construct simple cases of systems that are minimally dissipative in a particular environment and do not learn anything, my first guess is that such a system does not possess the sufficient complexity to do so, hence not satisfying that constraint of the hypothesis. After all, there are long periods of blissful, dreamless, unconscious sleep where we don't learn or infer anything either, which would be explained by changes to our computational complexity while maintaining the minimal dissipation.

    On a side note, I do wonder whether our finite computational complexity, together with our brain indeed being a minimally dissipative system, might serve to explain why there are some computational problems that our brain simply cannot solve by itself.

    I agree that both Friston's and England's works are very influential, and they drove me to look for a link between the two. Hopefully I have satisfactorily answered your great questions. If I have not, please let me know and I will take another crack at it.

    Cheers

    Natesh

    PS: I am continuing to work on an updated version of the essay to better clarify and explain myself without the constraints of a word limit. The questions you have asked are very useful, and I will include explanations in that version to better address them.

    Dear Edwin,

    Thank you for your kind rating. Yes, I agree with you about the sad trolling that has been going on, which I fear is hurting the contest overall. I was hit with five consecutive 1's without any feedback, which sent my essay into freefall and left me disheartened earlier. Hopefully I will have the opportunity to have the work judged by the FQXi panel. Good luck in the contest; I would very much like to stay in touch. Thanks.

    Natesh

    Dear Natesh -- thank you for your very thoughtful response.

    You asked me about this remark:

    "I think you're cashing out efficiency in terms of KL distance between "predictor" at t and world at time t+1, presumably with some mapping to determine which states are to correspond. This seems to work very well in a lot of situations. But you can also construct cases where it seems to fail."

    Saying "Can you please give me a simple example."

    So an example would be running two deterministic systems with identical initial conditions, with one started a second after the first. The first machine would be a fantastic predictor and learner. There's correlation there, but some kind of causal connection, once initial conditions are fixed, is missing from the pair. Minimally dissipative.
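    A toy version of this, purely my own illustration:

    # Two copies of one deterministic map; the second starts one step later.
    # System A's state at time t equals system B's state at time t+1, so A
    # "predicts" B perfectly with no causal coupling and no dissipation.
    def step(x):
        return 3.9 * x * (1.0 - x)  # logistic map as stand-in dynamics

    x0 = 0.123
    a = [x0]
    for _ in range(10):
        a.append(step(a[-1]))
    b = [x0] + a[:-1]  # system B: same trajectory, lagged one step
    print(all(a[t] == b[t + 1] for t in range(10)))  # True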

    Another example (more complicated, but it works for probabilistic/non-deterministic evolution) would be the Waterfall (or Wordstar) problems. With a lot of work, I can create a map between any two systems. It might require strange disjunctive unions of things ("System 1 State A corresponds to System 2 State B at time t, C at time t+1, W or X at time t+2...") and be very hard to compute, but it's there. I'm not sure how dissipative the two could be, but my guess is that it's hard to rule out the possibility that the coarse-grained state spaces the maps imply could have low dissipation.

    (Scott Aaronson has a nice piece on computational complexity and Waterfall problems--http://www.scottaaronson.com/papers/philos.pdf)

    You see a version of this in the ways in which deep learning algorithms are able to do amazing prediction/classification tasks. System 1, it turns out, with a lot of calculations and work, really does predict System 2. But if System 1 is the X-ray image of an aircraft part and System 2 is in-flight airplane performance, does it really make sense to say that System 1 has "learned", or is inferring, or doing anything agent-like? Really, the effort is in the map-maker.

    Yours,

    Simon

    Dear Simon,

    I address your comments/questions below:

    "So an example would be running two deterministic systems, with identical initial conditions, and with one started a second after the first. The first machine would be a fantastic predictor and learner. There there's correlation, but some kind of causal connection, once initial conditions are fixed, is missing from the pair. Minimally dissipative."

    --> Please bear with me; I take my time in understanding all the tiny details completely. Correct me if I am wrongly characterizing what you are saying: if the two systems are run in the manner that you describe, are you saying that the joint system is minimally dissipative, or just the second system? If they are jointly minimally dissipative, then the correlation between the two would be plastic, as expected. I mention this at the start of section 5, where I discuss how subsystem relationships should be plastic if the joint system is minimally dissipative. The correlation will hence vary depending upon the input provided. Does that answer your point?

    "Another example (more complicated, but works for proabilistic/non-deterministic evolution) would be the Waterfall (or Wordstar) problems."

    --> Let me get back to you on this once I have a firmer grasp on what these problems are exactly. I remember reading about them on Aaronson's blog a while ago, and I need to revisit it. Thank you for that particular link. I am an avid fan of his blog and work, and the updated version of the essay has references to his blog post on Integrated Information Theory.

    "You see a version of this in the ways in which deep learning algorithms are able to do amazing prediction/classification tasks. System 1 it turns out, with a lot of calculations and work, really does predict System 2. But if System 1 is the X-ray image of an aircraft part and System 2 is in-flight airplane performance, does it really make sense to say that System 1 has "learned", or is inferring, or doing anything agent-like? Really the effort is in the map-maker."

    --> I agree that while deep learning networks learn in a manner similar to us, there are large differences between us and deep learning algorithms. Along the lines of John Searle's Chinese room argument, I would argue that such algorithms are only syntactic and there are no semantics there. Furthermore, running such algorithms on von Neumann architecture GPUs (as they traditionally are) means these are not minimally dissipative systems. I think plastic subsystem connections are needed for any system to be minimally dissipative, and the von Neumann architecture does not have that. If we went to systems with a neuromorphic architecture, then it becomes a lot more interesting, I think.

    I agree with you that the effort is really in the map-making, and this is why I am very interested in unsupervised learning with an array of devices called memristors (look up Prof. Yang's group at UMass Amherst; they are doing cool things like this). Short of starting with an artificial primordial soup and evolving/self-organizing an artificial brain on silicon in an accelerated manner, I think such an approach is the best way to test my ideas and build an agent remotely close to us. (Since we know some things about the final product, aka our brain, we can cheat and start with an array of memristors, since they can behave as neurons and synapses. How to impose other thermodynamic constraints on this array is something I am thinking about now.) We just set up the array of physical devices without any preprogramming or map-making, let it run and supply it with inputs, and it is allowed to make its own maps and provide outputs. If such a system is able to answer questions about flight performance based on an X-ray image of the airplane, I think (a) that would be amazing, and (b) we would have to seriously entertain the possibility that it is an agent like us. (I am not touching the question of whether such an agent is conscious or not with a 10-foot pole, haha.)

    I hope I didn't miss anything and have answered your questions. Let me know if I need to clarify anything further.

    Cheers

    Natesh

    PS: In all of this, I think I might have to seriously step back and see if there is some fundamental difference between self-organized systems and systems designed by other 'intelligent' systems, and whether that changes things.

    Hi Natesh,

    The posts in this blog are as interesting a conversation as any in this contest. In particular, your conversation with Ines Samengo is most interesting. More on that in a moment.

    The wording of FQXi.org's contest is nebulous unless you realize it is about Tegmark's MUH (Mathematical Universe Hypothesis). Tegmark's emphasis is on Mathematics. Landauer's emphasis is on Information. Your emphasis is on Intention. My emphasis is on how we choose. I would make a hierarchy as shown below:

    "Mathematics is Physical"...........Tegmark

    "Information is Physical"..............Landauer

    "Intention is Physical"..................Ganesh

    "Choice (intention from a personal viewpoint) is Physical (maybe), but we can never know it"......Limuti

    I did read your essay, and honestly I had trouble following it (I did, however, spot the insulated-gate MOSFET structures :))

    The image your essay triggered in me was Valentino Braitenberg's book "Vehicles: Experiments in Synthetic Psychology". It is easy to make the vehicles look as if they had "emergent" goals.

    Non-equilibrium thermodynamics as treated by you and Ines was interesting. Ines brought out the memory clearing needed by Maxwell's demon to control the entropy (I think I got that right). Perhaps this memory clearing is why we can't know how we choose. For example, move your finger. How did you do that? Do not point to MRIs or brain function. I maintain that you have no direct experiential record (knowledge or memory) of how you moved your finger. I believe the answer is that you moved your finger, but you do not know directly how you did that. Was Maxwell's demon involved? I know this is a bit esoteric, but I would like to know what you think.

    In my essay I hoped to get across how convoluted the language of determinism and free will is. Don and Lexi each took a side. However, each also unconsciously used the other viewpoint during the conversation.

    You forced me to think....minor miracle. Therefore this is a super essay!

    Thanks,

    Don Limuti

      Dear Natesh --

      It's fun to go back and forth on this.

      If the time-delayed system is indeed learning according to your scheme, this seems to be a problem for your scheme. Two independently evolving systems should not be described as one "learning" the other. Of course, it is a limit case, perhaps most useful for pointing out what might be missing from the story rather than claiming there's something bad about the story.

      The machine learning case provides a different challenge, I think. You seem to agree that the real difficulty is contained in the map-making. But then this makes the prediction/learning story hard to get going without an external goal for the map-maker. Remember, without some attention to the mapping problem, the example of X-ray images predicting in-flight behavior implies that the X-ray images themselves are predicting/learning/in a goal-directed relationship with the in-flight behavior; not the algorithm, which is just the discovery of a mapping. More colloquially, when my computer makes a prediction, I have to know how to read it off the screen (printout, graph, alarm-bell sequence). Without knowledge of the code (learned or discovered post hoc), the prediction is in theory only.

      You write, "In all of this I think I might have to seriously step back and see if there is some fundamental difference between self-organized systems and those systems which are designed by another 'intelligent' systems, and if that changes things." I think that might be the main point of difference. I'm happy to use the stories you tell to determine whether an engineered system is doing something, and this seems like a really interesting criterion. Yet I'm just not sure how to use your prescriptions in the absence of (for example) a pre-specified agent who has desires and needs satisfied by the prediction.

      Thank you again for a provocative and interesting essay.

      Yours,

      Simon

      Hi Don,

      Thank you for your very kind comments. I am glad to see that you liked the essay. Ines's work was outstanding and it was very insightful to discuss ideas with her.

      "The image your essay triggered in me was Valentino Braitenber's book "Vehicles, Experiments in Synthetic Psychology". It is easy to make the vehicles look as if they had "emergent" goals. "

      --> I will check this book out.

      "In my essay I hoped to get across how convoluted the language of determinism and freewill is. Don and Lexi each took a side. However, each also used Unconsciously the other viewpoint during the conversation."

      --> Ha!! Wonderful. I did not immediately get that, but it adds much more to your submission. Thanks.

      Cheers

      Natesh

      Dear Simon,

      "It's fun to go back and forth on this."

      -->Agreed.

      "If the time-delayed system is indeed learning according to your scheme, this seems to be a problem for your scheme. Two independently-evolving systems should not be described as one "learning" the other."

      --> I think I misunderstood the problem you had presented (a simple case of lost in translation, I guess). If the two systems are evolving independently and there are no inputs being presented to either one of them, then I am not sure what it is that they can learn in the first place. But then again, if this is a limiting case of no inputs at all, then I must think about this further. Since my derivations start with the assumption that there are external inputs affecting the physical system in question, I would say that a system just evolving without being affected by external inputs, while dissipating minimally, is not learning anything. This is further captured by the fact that the mutual information complexity measure can serve as a measure of memory/history in the system.

      "Remember, without some attention to the mapping problem, the example of X-ray images predicting in-flight behavior implies that the X-ray images themselves are predicting/learning/in goal directed relationship to the in-flight behavior; not the algorithm, which is just the discovery of a mapping."

      --> I agree that the X-ray image itself cannot predict, but a minimally dissipative system which is presented with the X-ray image as an input that affects its state transitions might be capable of learning and predicting from the input image.

      "Yet I'm just not sure how to use your prescriptions in the absence of (for example) a pre-specified agent who has desires and needs satisfied by the prediction."

      --> I argue that my constraints specify which systems could be goal-oriented agents in the first place, and that goals and desires are created and evolve as such systems interact with their input environment.

      "Thank you again for a provocative and interesting essay."

      --> Thanks for a very stimulating discussion. I am pretty convinced that I should rename the minimal dissipation hypothesis to something like the "dissipation-complexity tradeoff" principle to reduce confusion.

      Cheers

      Natesh

      Dear Natesh --

      I'm just finishing up an article on learning and thermodynamic efficiency (using the Still et al. framework of driving), so I think my head's full of a set of ideas that are competing and overlapping with your insights here. To be clear, I think this is a fantastic piece, and one of the most provocative in a (very good) bunch.

      I hope we see more cross-over work at the interface of origin of life, thermodynamics, and machine learning, and I encourage you to publish a version of this in a journal (you might consider Phys. Rev., or perhaps the journal Entropy).

      Yours,

      Simon

      Dear Natesh,

      thanks for your kind comments on my page, which led me to your interesting essay.

      I'm afraid you lose me on page 2. What is $\mathcal{S}$? A Hilbert space? What are the $\sigma$? A basis for this Hilbert space? Similarly, what are $\mathcal{R}$ and the $\hat{x}$? What does $\mathcal{R}_0\mathcal{R}_1$ mean? Is that some kind of product? The transition mappings $\mathcal{L}$, are they unitary, stochastic, or...? You write that some time evolution is governed by a Schrödinger equation; what's the corresponding Hamiltonian? How is this Hamiltonian related to the $\mathcal{L}$?

      Or maybe we can go back one step, away from the technical details: What does it mean that a system has "constraints on its finite complexity"? And can I think of dissipation as energy transfer from the system to the heat bath?

      Sorry for so many questions; I just feel I can't get the message when I don't even understand the terminology on the first few pages.

      Cheers, Stefan

      PS: Sorry for the rendering - I don't know how to do inline math here. Each equation-tag causes a linebreak. :-(

        Hi Stefan,

        No problem at all. I had the same problem and pretty much gave up on using LaTeX in this forum :D. Given the word limit, I could not get into explaining all the terms you listed in detail, but here is a paper with all the details: Ganesh, Natesh, and Neal G. Anderson. "Irreversibility and dissipation in finite-state automata." Physics Letters A 377.45 (2013): 3266-3271. Let me know if you have trouble accessing it.

        The paper was written for deterministic automata, but the extensions to stochastic mappings hold. The entire referent-system-bath universe evolves unitarily, but the system evolution can be non-unitary (and probably is). The short version: $\mathcal{S}$ is the system in which the FSA is instantiated, with states $\sigma$. $\mathcal{R} = \mathcal{R}_0\mathcal{R}_1$ is the joint system of past inputs $\mathcal{R}_0$ and present input $\mathcal{R}_1$, with $\hat{x}$ being a string from that distribution of inputs (in the classical case, all of these are essentially random variables). $\mathcal{L}$ is the transition mapping, for which a corresponding Hamiltonian of the global joint system can be constructed so as to achieve the necessary state transition.
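        To picture the standard open-systems construction behind this (schematic notation of my own here, not equations lifted from the paper): the system's reduced dynamics arise from a global unitary $U$,

        \[ \rho_{\mathcal{S}}' = \mathrm{Tr}_{\mathcal{R},B}\!\left[\, U \left( \rho_{\mathcal{R}} \otimes \rho_{\mathcal{S}} \otimes \rho_{B} \right) U^{\dagger} \,\right], \]

        which is why the transition map $\mathcal{L}$ on the system alone can be non-unitary even though the whole referent-system-bath universe evolves unitarily.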

        "What does it mean that a system has constraints on its finite complexity"?

        --> If the complexity of the system can be captured by a mutual information measure, then a finite-state automaton with a finite number of states can only have finite complexity. When we optimize a variable while keeping another condition constant, we call it constrained optimization, and the condition a constraint.
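        As a toy illustration of that measure (my own sketch in Python, not code from the paper): the mutual information between the input and the automaton's resulting state captures how much of the input the machine "remembers".

        import random
        from collections import Counter
        from math import log2

        def mutual_information(pairs):
            # Plug-in estimate of I(X;Y) in bits from a list of (x, y) samples.
            n = len(pairs)
            joint = Counter(pairs)
            px = Counter(x for x, _ in pairs)
            py = Counter(y for _, y in pairs)
            return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
                       for (x, y), c in joint.items())

        random.seed(0)
        bits = [random.randint(0, 1) for _ in range(100000)]
        latch = [(b, b) for b in bits]                      # one-bit latch stores the last input
        ignore = [(b, random.randint(0, 1)) for b in bits]  # state that ignores the input
        print(f"latch : {mutual_information(latch):.3f} bits")   # ~1.000: one bit of memory
        print(f"ignore: {mutual_information(ignore):.3f} bits")  # ~0.000: no memory

        A machine with $N$ states can never store more than $\log_2 N$ bits this way, which is one way to see the finite complexity constraint.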

        "And can I think of dissipation as energy transfer from the system to the heat bath?"

        ---> Yes! That's exactly what it is. The details are in that paper again.

        Thanks for your questions. I wish I had more space to explain all the terms in detail. I am working on a more formal paper now, and hopefully I can be a lot more detailed in it so as to avoid confusion. Let me know if there are any more points to be clarified and I shall be happy to do it.

        Cheers

        Natesh

        Dear Natesh,

        thanks, after reading your Phys. Lett. A article, I think I understand the definitions. I think I also understand roughly how you obtain the bound (3) in your article. There is a similar (but not identical?) bound on page 2 of your essay, which I think is neither derived in your article nor in your essay -- or did I overlook anything?

        Cheers, Stefan

        Hi Stefan,

        "There is a similar (but not identical?) bound on page 2 of your essay, which I think is neither derived in your article nor in your essay -- or did I overlook anything?"

        --> Yes, the bound in the essay is not derived there but is an extension of the bound in the Phys. Lett. paper. The bound in that paper was derived for independent inputs, i.e., R0 and R1 independent. The bound in the essay is derived for correlated R0 and R1, thus generalizing the bound from the Phys. Lett. paper (I am writing a new paper on this generalization, but it will hold if you follow the same set of steps from the earlier paper). The bound in the essay reduces to the one in the 2013 paper if you assume R0 and R1 have zero correlation, with the last term in equation (3) going to zero. I hope that explains everything. I am glad to see you are being extremely rigorous with the essay. Please keep the questions and comments coming.
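        Schematically, the structure is the following (writing only the shape of the result here, not the actual terms of equation (3)):

        \[ \langle E_{\mathrm{diss}} \rangle \;\geq\; k_B T \ln 2 \,\Big[\, \big(\text{the 2013 independent-input terms}\big) \;+\; \big(\text{a correlation term that vanishes when } I(R_0;R_1)=0\big) \,\Big], \]

        so assuming zero correlation between R0 and R1 kills the last term and recovers the earlier bound.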

        Cheers

        Natesh

        Hi Cristi,

        Thanks for your comments. It is definitely a very interesting idea, and I intend to keep working on it. I have read and rated your essay; it was a very good piece of work. Good luck in the competition. Thanks.

        Cheers

        Natesh

        PS: Kindly rate my essay if you haven't already. If you have, thank you very much for doing so.

        Dear Natesh,

        With great interest I read your essay, which of course is worthy of the highest rating.

        I'm glad that you have your own position

        «I will present the fundamental relationship between energy dissipation and learning dynamics in physical systems. I will use this relationship to explain how intention is physical, and present recent results from non-equilibrium thermodynamics to unify individual learning with dissipation driven adaptation.»

        «I will refer to as the minimal dissipation hypothesis»

        Your assumptions are very close to mine: «the phase space characterization of self-organized systems which dissipate minimally, improved understanding of internal control mechanisms to maintain criticality, and detailed formulations of cognitive states as phase transitions in a (non-chaotic strange) attractor.»

        You might also like reading my essay, where it is claimed that quantum phenomena occur in the macrocosm due to the dynamism of the phase state of the elements of the medium, in the form of de Broglie waves of electrons, where parametric resonance and solitons occur; this mechanism of operation is analogous to the principle of a heat pump. At the same time, «the minimal dissipation hypothesis» is realized.

        I wish you success in the contest.

        Kind regards,

        Vladimir