Thanks so much, George! Larry was actually on my PhD thesis committee at UW-Madison. He does excellent work.
There are a handful of analytic philosophers who have thought about these issues, starting with Yablo. There are also List and Menzies, as well as Shapiro and Sober. All of them do incredible work and have touched on issues related to causal emergence at some point or another, although most focus more on problems of mental causation. As far as I know, none has argued explicitly for the theory laid out here and elsewhere.
One constant problem I have with this research comes from framing it in terms of the exclusion problem. It's a good framing because it hammers the problem home, but it's a bad one because the exclusion argument is a well-known philosophical issue, so people immediately assume this is a philosophical solution to a philosophical problem. As I indicate in the essay, though, I'm using the exclusion argument as a stand-in for a more general issue concerning causal structure, information, and model choice.
Ultimately, I think this requires a scientific (or mathematical) theory, composed of: A) formalizing supervenience as changes in scale or as highlighting only subsets of the system's state-space; B) some sensitive measure of causation and/or information (I've used information theory and Pearl's causal calculus) that can handle things like noise, is proven to be related to various important causal properties, doesn't give nonsensical answers for simple scenarios, etc.; C) actually checking and proving that the measure in B can be higher at some of the scales constructed with A; D) explaining why it's theoretically even possible that the macro can beat the micro; E) hopefully some applications.
For D, we originally argued in 2013 that macroscales reduce the noise in the system (over both the past and the future), and that's how causal emergence occurs. I think there's another interesting way of framing it, which is that macroscales can be thought of as codes (as I argue here and elsewhere), and the macro can beat the micro because of Shannon's noisy-channel coding theorem. Hopefully both of these explanations help with E: actual applications.
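To make A through C concrete, here is a minimal sketch in Python. It rests on assumptions not spelled out in this comment: the measure in B is taken to be effective information, computed as the average KL divergence between each state's transition distribution and the average transition distribution when every state is intervened on with equal probability. The four-state transition matrix, the grouping, and the function names `effective_information` and `coarse_grain` are hypothetical, chosen only for illustration.

```python
from math import log2


def effective_information(tpm):
    """Effective information in bits (assumed measure for B).

    Each state is set by intervention with equal probability; the result is
    the mean KL divergence of each row from the average row.
    """
    n = len(tpm)
    avg = [sum(row[j] for row in tpm) / n for j in range(len(tpm[0]))]
    total = 0.0
    for row in tpm:
        # Terms with p == 0 contribute nothing to the KL divergence.
        total += sum(p * log2(p / q) for p, q in zip(row, avg) if p > 0)
    return total / n


def coarse_grain(tpm, groups):
    """Build a macro transition matrix from groups of micro states (step A).

    Setting a macro state is treated as setting its micro states uniformly.
    """
    macro = []
    for src in groups:
        row = []
        for dst in groups:
            # Probability of landing anywhere in dst, averaged over src.
            row.append(sum(tpm[i][j] for i in src for j in dst) / len(src))
        macro.append(row)
    return macro


# Hypothetical micro system: states 0-2 hop among themselves at random
# (noise), while state 3 always maps to itself.
micro = [
    [1 / 3, 1 / 3, 1 / 3, 0.0],
    [1 / 3, 1 / 3, 1 / 3, 0.0],
    [1 / 3, 1 / 3, 1 / 3, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# Macro scale: lump the three noisy states into one macro state.
groups = [[0, 1, 2], [3]]
macro = coarse_grain(micro, groups)

ei_micro = effective_information(micro)
ei_macro = effective_information(macro)

print(f"micro EI: {ei_micro:.3f} bits")
print(f"macro EI: {ei_macro:.3f} bits")
print("macro beats micro:", ei_macro > ei_micro)
```

In this toy case the micro scale comes out at roughly 0.81 bits and the macro scale at 1 bit (step C). The grouping absorbs the random hopping among states 0-2, so the two-state macro description is deterministic, which is the noise-reduction explanation for D in miniature.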