Stefan,
An inconsistency does indeed remain. But that too is caused by additional misinterpretations of what is happening. Physicists have been using the wrong analogies to think about this for a very long time.
I want you to look at a diagram, while I describe this. But to avoid violating someone's copyright, I don't want to copy and paste it here. Please open another browser window. Under Wikipedia's page for "Wheeler's delayed choice experiment", there is a heading for "External links", the first of which is "Wheeler's Classic Delayed Choice Experiment by Ross Rhodes". If you click on that, it should take you to "www.bottomlayer.com/bottom/basic_delayed_choice.htm". The figure of interest is on that page.
Imagine "1" to be a radio transmitter antenna. "5" are two radio receivers, attached to highly directional, large, parabolic antennae. The slits at "2" create "Multi-Path" Interference. The radio antenna is simultaneously transmitting multiple stations, or channels. So there is also "Multi-Channel" Interference.
To receive a signal with no interference, you must use a frequency band-limited filter to remove the "Multi-Channel" interference, and a spatial beam-limited filter to remove the "Multi-Path" interference. The telescopes (large parabolic antennae) at "5" act as the spatial filters.
Next, consider that, in addition to producing the "Multi-Path" interference, the two slits act as a crude diffraction grating, dispersing the spectrum into the interference "fringes". These fringes are the different "channels" that, when combined together, result in the "Multi-Channel" interference. So by "tuning" to just one "fringe", you have effectively tuned to just one channel, and thereby eliminated the "Multi-Channel" interference. Consequently, it is no surprise that you see no interference, since the apparatus filtered out both the "Multi-Path" and the "Multi-Channel" interference. So-called "quantum erasers", in effect, merely remove the filters to restore the interference.
The last sentence in the Introduction, on the Wiki page cited above, reads "The fundamental lesson of Wheeler's delayed choice experiment is that the result depends on whether the experiment is set up to detect waves or particles." Yes indeed it does, and for exactly the reasons given in my essay and the related posts under both my essay and Lorraine Ford's. More specifically, it depends on whether the experiment was set up to detect a Fourier Superposition (waves) or a single frequency (particle). Thus, on page 6 of my essay, I stated that "...the correct model for the observations is not a superposition, but is indeed a single frequency wave..." Having the correct model matters. It matters a LOT; why it matters was discussed, at some length, in the posts under Lorraine Ford's essay. Basically, it boils down to this: knowing what to look for enables one to recognize and filter out all the "crap" that does not look like what one is looking for.
This brings us to the two Gaussian filters described in my essay. Look again at the figure cited above. Replace each of the two receiver/telescopes at "5" with a pair of identical receivers/telescopes. Carefully design the frequency filter passbands to have Gaussian responses, as described in the essay. Now, as described, you have a pair of particle counters that can be used to INFER, not measure, the single frequency, via the ratio of particle counts, with an accuracy that greatly exceeds the uncertainty principle. That is why having the correct model matters.
Next, consider the following:
The Fourier transform of a Gaussian function is itself another Gaussian function. Furthermore, a Gaussian function yields the minimum possible time-bandwidth product, and thus the minimum "uncertainty" in the Fourier uncertainty principle.
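Both statements can be checked numerically. The following is only a minimal sketch in Python, not anything taken from the essay: the width SIGMA = 1, the integration grids, and all function names are illustrative assumptions, and "duration" and "bandwidth" are taken to be RMS widths of the squared magnitudes, which is one common convention. Under that convention the minimum time-bandwidth product is 1/(4 pi).

```python
import cmath
import math

SIGMA = 1.0  # assumed width of the time-domain Gaussian (arbitrary units)


def gaussian(t, sigma=SIGMA):
    """Time-domain Gaussian g(t) = exp(-t^2 / (2 sigma^2))."""
    return math.exp(-t * t / (2.0 * sigma * sigma))


def fourier_at(f, sigma=SIGMA, t_max=8.0, dt=0.01):
    """Approximate G(f) = integral of g(t) exp(-2 pi i f t) dt by a Riemann sum."""
    steps = int(round(2.0 * t_max / dt))
    total = 0j
    for k in range(steps + 1):
        t = -t_max + k * dt
        total += gaussian(t, sigma) * cmath.exp(-2j * math.pi * f * t)
    return total * dt


def rms_width(xs, weights):
    """RMS width of a distribution sampled at points xs with the given weights."""
    total = sum(weights)
    mean = sum(x * w for x, w in zip(xs, weights)) / total
    var = sum((x - mean) ** 2 * w for x, w in zip(xs, weights)) / total
    return math.sqrt(var)


def time_bandwidth_product(sigma=SIGMA):
    """RMS duration times RMS bandwidth, both measured on squared magnitudes."""
    times = [-8.0 * sigma + 0.01 * sigma * k for k in range(1601)]
    freqs = [(-1.0 + 0.01 * k) / sigma for k in range(201)]
    power_t = [gaussian(t, sigma) ** 2 for t in times]
    power_f = [abs(fourier_at(f, sigma, 8.0 * sigma, 0.01 * sigma)) ** 2
               for f in freqs]
    return rms_width(times, power_t) * rms_width(freqs, power_f)


# The numerical transform should match the analytic Gaussian
# sigma * sqrt(2 pi) * exp(-2 pi^2 sigma^2 f^2) at any test frequency.
f_test = 0.1
numeric = fourier_at(f_test).real
analytic = (SIGMA * math.sqrt(2.0 * math.pi)
            * math.exp(-2.0 * (math.pi * SIGMA * f_test) ** 2))
product = time_bandwidth_product()
print("G(0.1): numeric", numeric, "analytic", analytic)
print("time-bandwidth product", product, "vs 1/(4 pi)", 1.0 / (4.0 * math.pi))
```

Changing SIGMA trades duration against bandwidth, but the product should stay put; that is the sense in which the Gaussian sits at the minimum.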
Returning, for the moment, to the radio signal analogy, remove the slits at "2", digitize the signal, multiply it by a Gaussian-shaped "window" (windowing), and compute a Discrete Fourier Transform (DFT). Voila! You have just constructed not just a pair of Gaussian filters, but an entire "filter bank", each consecutive pair of which can be used as described in my essay. Each of these pairs can be used to estimate the signal frequency. But, of course, due to the narrow bandwidths, pairs that are tuned closest to the signal frequency have much higher signal-to-noise ratios, and thus provide much more accurate frequency estimates. In the formula for a Fourier Transform, multiplying by the complex exponential represents a tuning operation. The subsequent integration is a lowpass filtering operation. Multiplying by the Gaussian shapes the frequency response of that lowpass filter to be another Gaussian function. Hence, a Fourier Transform can be viewed as a filter bank: a large number of tuned receivers. In a DFT, these receivers may be numbered 1, 2, 3, ... n, and are spaced "D" Hz apart.
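The "tune, then lowpass" reading of the DFT can be sketched directly in Python. This is a hedged illustration only: the test tone (40.3 Hz), the sample rate (256 Hz), the record length, the window width, and every function name are hypothetical choices, not values from the essay. Each receiver is written out as an explicit multiply-and-sum rather than an FFT call, so that the tuning step and the integration step are both visible.

```python
import cmath
import math


def gaussian_window(n_samples, sigma_samples):
    """Gaussian window centered on the middle of the record."""
    mid = (n_samples - 1) / 2.0
    return [math.exp(-0.5 * ((k - mid) / sigma_samples) ** 2)
            for k in range(n_samples)]


def tuned_receiver(samples, window, n):
    """Magnitude of DFT bin n: tune with a complex exponential, then integrate."""
    n_samples = len(samples)
    acc = 0j
    for k in range(n_samples):
        # Tuning: the complex exponential shifts receiver n's frequency to 0 Hz.
        tuner = cmath.exp(-2j * math.pi * n * k / n_samples)
        # Lowpass: the Gaussian-weighted sum keeps only what is near 0 Hz.
        acc += samples[k] * window[k] * tuner
    return abs(acc)


def filter_bank(samples, sigma_samples):
    """Outputs of receivers 0 .. N/2 - 1 for a real-valued input signal."""
    window = gaussian_window(len(samples), sigma_samples)
    return [tuned_receiver(samples, window, n)
            for n in range(len(samples) // 2)]


# Hypothetical test signal: one tone at 40.3 Hz, sampled at 256 Hz for 1 second.
SAMPLE_RATE = 256.0
N = 256
D = SAMPLE_RATE / N  # receiver spacing in Hz
TONE_HZ = 40.3
signal = [math.cos(2.0 * math.pi * TONE_HZ * k / SAMPLE_RATE) for k in range(N)]

bank = filter_bank(signal, sigma_samples=N / 10.0)
peak = max(range(len(bank)), key=lambda n: bank[n])
print("spacing D =", D, "Hz; strongest receiver is number", peak)
```

The receivers on either side of the strongest one still respond, just more weakly, and it is that graded response across neighboring receivers that the paired-filter estimate relies on.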
In the formula in the essay, f = nD + D/2 - (cD/2) ln(a(f)/b(f)), the term nD + D/2 is thus the center frequency of a receiver pair. The amplitude ratio term, in effect, yields a fine frequency interpolation. The reason I mention all this is to point out that the amplitude (particle count) ratio only yields the frequency offset from the pair's center frequency. The complex exponential in the formula for the Fourier Transform acts as a single-stage frequency tuner, known in the old radio receiver literature as a heterodyne. Now put the two-slit "diffraction grating" back in place and you have introduced a second stage of frequency tuning, a superheterodyne.
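The interpolation formula can be tried out on a Gaussian-windowed DFT. The sketch below is only an illustration under stated assumptions. The constant c is not defined in this post, so it is assumed here to be c = 2 sigma_f^2 / D^2, where sigma_f is the standard deviation (in Hz) of each receiver's Gaussian amplitude response; that is the value for which ln(a/b) is exactly linear in f for two Gaussian responses centered at nD and (n+1)D. For a Gaussian window of standard deviation sigma_t seconds, sigma_f = 1/(2 pi sigma_t). The test tones, sample rate, and function names are hypothetical.

```python
import cmath
import math


def tone(freq_hz, sample_rate, n_samples):
    """Hypothetical test signal: a single real-valued cosine."""
    return [math.cos(2.0 * math.pi * freq_hz * k / sample_rate)
            for k in range(n_samples)]


def bin_magnitude(samples, window, n):
    """Amplitude seen by receiver n of a windowed DFT."""
    n_samples = len(samples)
    return abs(sum(samples[k] * window[k]
                   * cmath.exp(-2j * math.pi * n * k / n_samples)
                   for k in range(n_samples)))


def estimate_frequency(samples, sample_rate, sigma_samples):
    """Apply f = nD + D/2 - (cD/2) ln(a/b) to the receiver pair nearest the peak."""
    n_samples = len(samples)
    mid = (n_samples - 1) / 2.0
    window = [math.exp(-0.5 * ((k - mid) / sigma_samples) ** 2)
              for k in range(n_samples)]
    d = sample_rate / n_samples  # receiver spacing D in Hz
    mags = [bin_magnitude(samples, window, n) for n in range(n_samples // 2)]

    # Pick the adjacent pair (n, n + 1) that straddles the spectral peak.
    peak = max(range(1, len(mags) - 1), key=lambda n: mags[n])
    n = peak if mags[peak + 1] >= mags[peak - 1] else peak - 1
    a, b = mags[n], mags[n + 1]

    # ASSUMPTION: c = 2 sigma_f^2 / D^2 for Gaussian responses of width sigma_f.
    sigma_f = sample_rate / (2.0 * math.pi * sigma_samples)
    c = 2.0 * sigma_f ** 2 / d ** 2
    return n * d + d / 2.0 - (c * d / 2.0) * math.log(a / b)


# Receivers are spaced D = 1 Hz apart; the tones fall between receivers.
for true_hz in (40.3, 40.8):
    est = estimate_frequency(tone(true_hz, 256.0, 256), 256.0, 25.6)
    print("true", true_hz, "Hz, estimated", est, "Hz")
```

In this noise-free case the estimate should land far closer to the true frequency than the 1 Hz receiver spacing. How well it holds up once noise is added to the amplitudes is a separate question that this sketch does not address.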
In effect, this entire apparatus is nothing more than a crude superheterodyne tuner, feeding into a spatial filter, followed by an AM receiver. Adding the paired AM receivers converts it into an FM receiver.
Rob McEachern