Thanks, Sean. I've just printed your essay, and am looking forward to reading it.
As for your questions, the primary concept in the paper is that of a nonlocal constraint, not "nonlocality" in general. The Gauss Law constraints are local, by my usage, even though they are expressed in terms of derivatives. E.g., the Gauss Law for the electric field insists that the charge density at a point is equal (up to a constant factor that depends on the units) to the divergence of the electric field at that point. What makes it local is that all the quantities involved concern only an infinitesimal neighborhood of each point. On the other hand, if the constraint tied the electric field at a point to charges a finite distance away, we would have an example of a nonlocal constraint.
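In symbols, the contrast might be sketched roughly as follows. The first line is the usual Gauss Law in SI units. The second is purely a made-up illustration of what a nonlocal version could look like: the kernel K is a hypothetical object introduced here for the comparison, not something taken from the paper.

```latex
% Local constraint: both sides are evaluated at the same point x.
\nabla \cdot \mathbf{E}(\mathbf{x}) = \frac{\rho(\mathbf{x})}{\varepsilon_0}

% Hypothetical nonlocal constraint: the divergence at x is tied to the
% charge density at other points x' through an assumed kernel K.
\nabla \cdot \mathbf{E}(\mathbf{x})
  = \frac{1}{\varepsilon_0} \int K(\mathbf{x}, \mathbf{x}')\,
    \rho(\mathbf{x}')\, d^3x'
```

If K were a delta function, K(x, x') = δ³(x − x'), the second line would collapse back to the first, which is one way of seeing that locality is the special case.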
There's a somewhat more interesting example of a nonlocal constraint in the paper, in which one has a universe with timelike compactification. Spatial compactification will also do, and has in fact been studied. Either way, the periodicity one finds in these models means that the matter configuration at one point may completely determine the matter configuration at other points. That is a kind of nonlocal constraint.
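As a rough illustration of the periodicity, take a single matter field, which I'll call φ, with compactification period T in time or L in one spatial direction. These symbols are assumptions chosen for the sketch, not the paper's notation.

```latex
% Timelike compactification with assumed period T:
\phi(t + T, \mathbf{x}) = \phi(t, \mathbf{x})

% Spatial compactification along one direction with assumed period L:
\phi(t, x + L, y, z) = \phi(t, x, y, z)
```

Each condition relates the field at points a finite interval apart, which is what makes it nonlocal in the sense used above.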
In GR, I think the nonlocality is of a different sort, though I'm not sure I follow the question entirely. Offhand, I'd say that the nonlocality of the observables has to do with the diffeomorphism invariance (hence the physical meaninglessness of talking about properties "at a point"), but that the solvability of the constraints would be distinct. In any case, I'd say that GR is local in that if you're giving data on a Cauchy surface, it can be freely varied from point to point (at least as long as it's sufficiently differentiable).