To me, the simplest way to think about Bell inequalities is as (hyper-)planes delimiting the set of convex combinations of value assignments to the possible measurements. That is, for e.g. the CHSH setup, you have four measurements with two outcomes each, and hence sixteen value assignments, from (0,0,0,0) to (1,1,1,1) (or with -1 in place of 0, which is the convention I use in the paper). Then, the general state of the system is a 16-dimensional vector with non-negative entries and unit 1-norm, i.e. a probability distribution yielding the probability of finding each of the sixteen possible value assignments. The states which have only one entry equal to 1, and the rest equal to 0, then form the vertices of a convex polytope; this convex polytope (more precisely, its image in the space of jointly observable statistics) can equally well be described in terms of its facets, which are the Bell inequalities of this setting.
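To make the polytope picture concrete, here is a minimal sketch that enumerates the sixteen value assignments in the -1/+1 convention and evaluates the usual CHSH combination on each of them. The function names (`chsh_value`, `vertices`, `chsh_of_mixture`) and the particular sign placement in the CHSH expression are choices made for this illustration, not something taken from the paper.

```python
from itertools import product

def chsh_value(a0, a1, b0, b1):
    # CHSH combination for one fixed value assignment (all values are -1 or +1).
    return a0 * b0 + a0 * b1 + a1 * b0 - a1 * b1

def vertices():
    # The sixteen deterministic value assignments (a0, a1, b0, b1).
    return list(product((-1, 1), repeat=4))

def chsh_of_mixture(weights):
    # CHSH value of a convex mixture: weights is a probability vector
    # over the sixteen vertices, in the order returned by vertices().
    return sum(w * chsh_value(*v) for w, v in zip(weights, vertices()))

vertex_values = [chsh_value(*v) for v in vertices()]
print(len(vertices()), min(vertex_values), max(vertex_values))

# A uniform mixture over all vertices, as one example of a general state.
uniform = [1 / 16] * 16
print(chsh_of_mixture(uniform))
```

Every vertex gives a value of either -2 or +2, and since a mixture is a weighted average of vertex values, no probability distribution over the sixteen assignments can leave the interval from -2 to 2. That interval is exactly what the CHSH inequality expresses.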
Given this, I think how Bell inequalities are violated in my setting becomes readily apparent: if all Bell inequalities are obeyed, then you can construct a description in terms of the above, as a convex mixture of fixed value assignments. But the diagonal argument shows precisely that you can't make such an assignment. Hence, in some cases at least, it follows that we can't formulate a description of the system in the above terms; but then, in these cases, some Bell inequality must be violated.
Of course, this doesn't get me anywhere near deriving the Tsirelson bound. Non-computability lurks there, too, as was just recently shown (https://arxiv.org/abs/2001.04383).
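For orientation, the value of the Tsirelson bound for CHSH, 2*sqrt(2), can at least be checked numerically. The sketch below assumes the standard textbook singlet-state correlation E(x, y) = -cos(x - y) for spin measurements along directions at angles x and y in a plane; that formula, the function names and the choice of angles are assumptions brought in for illustration, not something derived from my f(n,k).

```python
import math

def singlet_correlation(x, y):
    # Textbook quantum prediction for the singlet state: the expected product
    # of the two spin outcomes for measurement angles x and y (in radians).
    return -math.cos(x - y)

def chsh_quantum(a0, a1, b0, b1):
    # Same CHSH combination as for fixed value assignments, but built
    # from correlations instead of products of definite values.
    return (singlet_correlation(a0, b0) + singlet_correlation(a0, b1)
            + singlet_correlation(a1, b0) - singlet_correlation(a1, b1))

# A standard choice of angles that maximises the violation.
S = chsh_quantum(0.0, math.pi / 2, math.pi / 4, -math.pi / 4)
print(abs(S), 2 * math.sqrt(2))
```

The magnitude exceeds 2, so these correlations cannot come from any convex mixture of fixed value assignments, yet it stops at 2*sqrt(2) rather than reaching the algebraic maximum of 4.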
As for counterfactual definiteness, I think a strength of my approach is that it gives a straightforward explanation of where and when it is applicable: namely, only when reasoning about values explicitly provided by my f(n,k). We can talk counterfactually about the value of the spin (in some particular direction) of a distant particle, reasoning that it would have been the same even had we made a different local measurement, only if there is a definite value provided by the maximum information attainable about the system; but if, for example, that information is instead taken up by yielding a definite value for the correlation between two observables, then such talk becomes meaningless.
So if our knowledge about the system is given by (x-spin 1 is up, x-spin 2 is down), we can consider that x-spin 2 would have been down, even if we had made a different measurement on 1; but if it's instead given by (x-spin 1 is up, x-spin 2 is opposite that of 1), then the fact that the x-spin of 1 is some particular way is a necessary prerequisite for being able to reason about x-spin 2, a prerequisite that we lose if we imagine that we had made some other measurement on 1.