Abstract

Goodman’s grue paradox is unassailable if we hold that instances confirm generalizations, for the evidence at hand is both an instance of ‘All emeralds are green’ and ‘All emeralds are grue’. But if we consider what bearing the denials of the two hypotheses have on the evidence, a very different picture emerges. This paper argues that the denial of ‘All emeralds are grue’ is more positively relevant to the evidence to date than the denial of ‘All emeralds are green’ is to the evidence and that therefore ‘All emeralds are green’ is better supported by the evidence than ‘All emeralds are grue’. The measure of support we employ—S(h|e) = p(e|h) – p(e|~h)—is motivated by the familiar relevance condition of confirmation, namely e confirms h only if p(h|e) > p(h).

I. The Definition of Support

If evidence e confirms hypothesis h, it seems reasonable to suppose that e raises the probability of h over its prior value; that is, e makes it more likely that h is true. Other conditions may be necessary for e to confirm h, but what’s called the relevance condition, namely p(h|e) > p(h), seems to be a bare minimum. So we affirm the following axiom:

**Axiom 1**. e confirms h only if p(h|e) > p(h)

With Bayes’ Theorem and a little manipulation (proof in section VI), this necessary condition can also be stated as:

**Lemma 1.** e confirms h only if p(e|h) > p(e|~h).

This motivates the following definition of degree of support. The degree of support e gives to h is: S(h|e) = p(e|h) – p(e|~h). From this it follows that e supports h more than e_{a} supports h_{a}, iff p(e|h) – p(e|~h) > p(e_{a} | h_{a}) – p(e_{a} | ~h_{a}). In this paper we use this formalism to solve Goodman’s New Riddle of Induction, though Hume’s ‘old riddle of induction’ will remain unsolved.
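The definition can be made concrete with a minimal Python sketch. The function names and the likelihood values are illustrative assumptions of mine, chosen only to show how two support values are compared; they are not figures from the argument.

```python
from fractions import Fraction


def support(p_e_given_h, p_e_given_not_h):
    """Degree of support S(h|e) = p(e|h) - p(e|~h)."""
    return p_e_given_h - p_e_given_not_h


def supports_more(pair, rival_pair):
    """True if the first (p(e|h), p(e|~h)) pair yields strictly greater support."""
    return support(*pair) > support(*rival_pair)


# Made-up likelihoods: both hypotheses entail their evidence (likelihood 1),
# but the evidence is likelier under the denial of the rival hypothesis.
s_h = support(Fraction(1), Fraction(1, 5))        # 1 - 1/5
s_rival = support(Fraction(1), Fraction(1, 2))    # 1 - 1/2
print(s_h, s_rival)                               # prints: 4/5 1/2
print(supports_more((Fraction(1), Fraction(1, 5)),
                    (Fraction(1), Fraction(1, 2))))   # prints: True
```

When both hypotheses entail their evidence, the comparison turns entirely on which denial makes the evidence less likely.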

II. Goodman’s Riddle

Hume’s old riddle of induction is to give a justification for thinking that the future will resemble the past. Goodman suggests that Hume’s answer, that observed regularities create a habit in the mind to expect similar outcomes in the future, is incomplete because it fails to give conditions under which some regularities, as opposed to others, get habituated. Specifying those conditions is Goodman’s new riddle of induction. Goodman motivates the riddle as follows.

Goodman asks us to consider two competing hypotheses: h = ‘All emeralds are green’ and h_{a} = ‘All emeralds are grue’, where x is grue iff x is examined before some future time t and is green or x is not examined before t and is blue. Coming at the problem afresh, we would no doubt start by examining, say, n emeralds and noting that they are green. Let ‘The n emeralds examined have been green’ be our evidence e. Goodman would have us note that since it is before t, the n emeralds can also be described as grue; that is, e is true iff e_{a} is true, where e_{a} is ‘The n emeralds examined have been grue’. But if instances confirm generalizations, as seems at first reasonable, e_{a} confirms h_{a} just as e confirms h, though h and h_{a} yield very divergent projections about our inspection of emeralds after time t. We want to say e confirms h but e_{a} doesn’t confirm h_{a}, that green is projectible to future instances while grue is not. At first, the grounds for these assertions seem straightforward: h_{a} uses a funny bifurcated predicate that makes explicit reference to a time instant; we might think that’s why h_{a} is not confirmed by e_{a} and grue is not projectible to future instances.

But Goodman has a ready response: if we define bleen so that x is bleen iff x is examined before t and is found to be blue or is not examined before t and is green, and take grue and bleen as basic, then our familiar green and blue will have ‘funny’ definitions. Thus x is green iff x is examined before t and found to be grue or is not examined before t and is bleen. So it seems our initial objection is question-begging: with both pairs green/blue and grue/bleen equally ‘funny’ as defined from the other vantage point, why should we prefer one predicate and its associated hypothesis over the other? Goodman’s riddle is to give non-question-begging criteria for excluding predicates like grue and bleen from hypotheses. His basic answer is that green is projectible while grue is not because green is entrenched in the sense of being more often used in successful past projections.

The problem with Goodman’s solution is that it is a ‘skeptical’ solution: entrenchment doesn’t seem to put us on the kind of solid footing we had hoped for as a basis for science. Moreover, Goodman gives no justification for choosing a green predicate over a grue one when both are equally entrenched or neither entrenched at all as would be the case at the dawn of a science. Below we give a ‘straight’ solution of the paradox based on a little application of probability theory. Lest the notation seem intimidating, let me just mention the basic idea is as simple as knowing that there are more ways of taking say 2 things out of 6 things than taking them out of 4 things.

III. The Argument

We’ve already defined e, e_{a}, h and h_{a}. Let’s first note that both pairs, e and h, and e_{a} and h_{a}, satisfy the modified version of the relevance condition, namely p(e|h) > p(e|~h) and p(e_{a}|h_{a}) > p(e_{a}|~h_{a}). Both inequalities hold because the left sides are equal to 1 while the right sides are less than 1; the left sides equal 1 because h and h_{a} entail e and e_{a} respectively. This alone doesn’t mean e_{a} confirms h_{a}, or for that matter that e confirms h. In fact, that it is satisfied for both pairs should warn us that the relevance condition alone is not sufficient for confirmation. Still, armed with our concept of support, we should be able to ask whether e supports h more firmly than e_{a} supports h_{a}.

But first we need to state the assumptions we will use in the argument. They are:

1. There are N emeralds in the universe.
2. An emerald’s color doesn’t change; it’s just that under the grue hypothesis, the emeralds examined before t are green and those not examined by t are blue. Note: the possibility that the color could change is Hume’s old riddle of induction, which we can say with Goodman is dissolved if not solved.
3. Sampling in e is done without replacement as is generally the case in science where specimens are ‘tagged’; the argument can work with replacement as well though I’ve not shown it.
4. Let Dj be the hypothesis ‘Exactly j emeralds are not green’ and let D_{a}j be the hypothesis ‘Exactly j emeralds are not grue’. Note: ~h = U Dj and ~h_{a} = U D_{a}j with j ranging from 1 to N. Also note: all the Dj are mutually exclusive and all the D_{a}j are mutually exclusive. We assume p(Dj) = p(D_{a}j) for all j. Due to the mutual exclusivity of the Dj’s and also of the D_{a}j’s, this implies the key prior probabilities are equal, i.e. p(~h) = p(~h_{a}) and p(h) = p(h_{a}), conceding something to the skeptic.

We argue as follows:

By definition we have that e supports h more than e_{a} supports h_{a} iff:

p(e|h) – p(e|~h) > p(e_{a}| h_{a}) – p(e_{a}| ~h_{a}).

In our case, since p(e|h) = p(e_{a}|h_{a}) = 1, we have e supports h more than e_{a} supports h_{a} if p(e|~h) < p(e_{a}|~h_{a}).

Now using the expansion described in assumption 4, we have e supports h more than e_{a} supports h_{a} if p(e| U Dj) < p(e_{a}| U D_{a}j).

We progress further by stating the following theorem dependent on assumption 4. The theorem is proved in section VI.

**Theorem 1.** If p(e|Di) < p(e_{a}|D_{a}i) for some i and p(e|Dj) ≤ p(e_{a}|D_{a}j) for all j (and D and D_{a} defined as in assumption 4), then p(e| U Dj) < p(e_{a}| U D_{a}j).

With this theorem, we only need to show p(e|Di) < p(e_{a}|D_{a}i) for some i and p(e|Dj) ≤ p(e_{a}|D_{a}j) for all j in order to show n green emeralds support ‘All emeralds are green’ more than n grue emeralds support ‘All emeralds are grue’.
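To see what Theorem 1 claims in a small case, here is a hedged numerical sketch. The helper name `likelihood_given_union` and all the numbers are assumptions made up for illustration; the only feature taken from the text is that both families of cases share the same priors (assumption 4).

```python
from fractions import Fraction


def likelihood_given_union(likelihoods, priors):
    """p(e | U Dj) for mutually exclusive Dj:
    the sum of p(e|Dj) p(Dj) divided by the sum of p(Dj)."""
    joint = sum(l * p for l, p in zip(likelihoods, priors))
    return joint / sum(priors)


# Made-up numbers for three cases D1..D3; the priors are shared by both families.
priors = [Fraction(1, 10)] * 3
p_e = [Fraction(1, 2), Fraction(1, 5), Fraction(1, 10)]    # p(e|Dj)
p_ea = [Fraction(3, 5), Fraction(2, 5), Fraction(1, 5)]    # p(e_a|D_a j)

lhs = likelihood_given_union(p_e, priors)     # p(e | U Dj)
rhs = likelihood_given_union(p_ea, priors)    # p(e_a | U D_a j)
print(lhs, rhs, lhs < rhs)                    # prints: 4/15 2/5 True
```

Because the priors are the same on both sides, a case-by-case advantage in likelihood carries over to the likelihood given the whole disjunction.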

Now p(e|Dj) is the probability that the n examined green emeralds come from the N-j green emeralds in the whole population (by hypothesis, exactly j emeralds are not green). Thus p(e|Dj) = C(N-j,n) / C(N,n) where C(x,y) is the combination of x things taken y at a time.
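The counting formula is easy to evaluate directly. The sketch below uses Python’s standard `math.comb`; the function name `p_green_sample` and the population sizes are my own illustrative choices, not values from the argument.

```python
from fractions import Fraction
from math import comb


def p_green_sample(N, j, n):
    """p(e|Dj): chance that n emeralds drawn without replacement are all green
    when exactly j of the N emeralds are not green."""
    return Fraction(comb(N - j, n), comb(N, n))


# The basic idea: more ways to take 2 things out of 6 than out of 4.
print(comb(6, 2), comb(4, 2))             # prints: 15 6

# Illustrative population: N = 10 emeralds, n = 3 examined.
for j in (0, 2, 8):
    print(j, p_green_sample(10, j, 3))    # prints: 0 1, then 2 7/15, then 8 0
```

As j grows, fewer green emeralds remain to draw from, so the probability of an all-green sample falls, reaching 0 once fewer than n green emeralds are left.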

Now what about p(e_{a}|D_{a}j)? I submit that p(e_{a}|D_{a}j) = C(N-j+X,n)/C(N,n) with X > 0 for all j. If true, this would mean p(e|Dj) < p(e_{a}|D_{a}j) for all j, which by Theorem 1 would imply p(e| U Dj) < p(e_{a}| U D_{a}j). By definition of the Dj’s this is equivalent to p(e|~h) < p(e_{a}|~h_{a}). This, given our definition of support and the fact that p(e|h) = p(e_{a}|h_{a}) = 1, would mean e supports h more than e_{a} supports h_{a}.

But why does p(e_{a}|D_{a}j) = C(N-j+X,n)/C(N,n)? What is the X? Well p(e_{a}|D_{a}j) is the probability that n grue emeralds are observed given that there are exactly j non-grue emeralds. I.e. the n grue emeralds can come out of N-j grue emeralds in the whole population. But there is another source for grue emeralds before t: of the j unexamined non-grue emeralds, there is a positive probability that some of them in not being blue after t will in fact be green. And they would’ve been observed to be green if they had been observed prior to t because, under pain of reduction to the old riddle, emeralds don’t change color.

This point can be made more boldly by considering another hypothesis: ‘All emeralds are GNG’, where x is GNG if it’s examined before t and found to be green or it’s not examined by t and is not-green. This is after all the essence of Goodman’s paradox; grue is just a rhetorical device. But what is p(e_{a}|D_{a}j) for this hypothesis? It’s simply

C(N-j+j,n)/C(N,n) or 1. This is intuitive because the probability of observing n emeralds to be GNG is 1, regardless of how many GNG emeralds there are (though there should be some emeralds). Admittedly, the GNG hypothesis is not entirely vacuous: it specifies a time instant before which emeralds are green and at or after which they are non-green. But that makes no difference to our evidence statement. If an emerald is GNG and no mention is made of the time that it’s observed, it can come from anywhere in the population no matter how many emeralds are GNG. It might be urged that our evidence statement should mention the time by which the emeralds have been observed. But if the statement, ‘n green emeralds have been observed’ is sufficient without any mention of time—why should the observation of n GNG emeralds or grue emeralds be incomplete without mention of time?

*Key Point*: Denying ‘All emeralds are grue’ is more positively relevant to our evidence than denying ‘All emeralds are green’. This is because the n grue emeralds observed before t could have come from the N-j grue emeralds or from the j non-grue emeralds unobserved by t that are green.

The X in the formula for p(e_{a}|D_{a}j) addresses the second possibility. The moral is unobserved cases have a bearing on the probability of the observed cases being a certain way. More unobserved green emeralds in the population make observing green emeralds more likely. “They also serve who only stand and wait.” (John Milton, Sonnet XIX).
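The three formulas discussed so far can be compared side by side in a short sketch. The function name `p_sample`, the sizes N = 10 and n = 3, and above all the choice X = j // 2 are assumptions for illustration only; the text does not fix how many of the j non-grue emeralds are green.

```python
from fractions import Fraction
from math import comb


def p_sample(N, j, n, X=0):
    """C(N-j+X, n) / C(N, n): chance that all n examined emeralds fit the
    predicate when j emeralds fail it and X of those j are green anyway."""
    if not 0 <= X <= j <= N:
        raise ValueError("need 0 <= X <= j <= N")
    return Fraction(comb(N - j + X, n), comb(N, n))


N, n = 10, 3                       # illustrative sizes, not from the text
for j in range(1, N + 1):
    X = j // 2                     # hypothetical count of green emeralds among the j
    green = p_sample(N, j, n)      # p(e|Dj)
    grue = p_sample(N, j, n, X)    # p(e_a|D_a j) with X green ones
    gng = p_sample(N, j, n, X=j)   # GNG case: X = j, so the ratio is 1
    print(j, green, grue, gng)
```

With these assumed values the green column never exceeds the grue column, the two coincide only where X = 0 (here j = 1), and the GNG column is 1 throughout.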

Since this argument shows p(e| U Dj) < p(e_{a}| U D_{a}j), we have shown e supports h more than e_{a} supports h_{a}.
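The whole comparison can be run end to end on a toy population. Everything specific here is an assumption for illustration: six emeralds, two examined, uniform priors over the cases, and X = j // 2 green emeralds among the j non-grue ones. The helper name `likelihood_under_denial` is also mine.

```python
from fractions import Fraction
from math import comb


def likelihood_under_denial(N, n, prior, extra_green):
    """p(evidence | ~hypothesis) = sum_j p(evidence|Dj) p(Dj) / sum_j p(Dj),
    with p(evidence|Dj) = C(N - j + extra_green(j), n) / C(N, n)."""
    cases = range(1, N + 1)
    joint = sum(Fraction(comb(N - j + extra_green(j), n), comb(N, n)) * prior(j)
                for j in cases)
    return joint / sum(prior(j) for j in cases)


N, n = 6, 2                                    # toy sizes


def prior(j):
    """Assumed uniform prior, p(Dj) = p(D_a j) = 1/(N+1) for every j."""
    return Fraction(1, N + 1)


p_e_not_h = likelihood_under_denial(N, n, prior, lambda j: 0)          # green
p_ea_not_ha = likelihood_under_denial(N, n, prior, lambda j: j // 2)   # grue, assumed X

support_green = 1 - p_e_not_h      # S(h|e), since p(e|h) = 1
support_grue = 1 - p_ea_not_ha     # S(h_a|e_a), since p(e_a|h_a) = 1
print(p_e_not_h, p_ea_not_ha)      # prints: 2/9 19/45
print(support_green, support_grue) # prints: 7/9 26/45
```

On these assumptions the green hypothesis receives support 7/9 against 26/45 for the grue hypothesis; different priors or a different X would change the numbers but not the direction, so long as X is never negative and is positive for some j with nonzero prior.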

IV. Some ‘Tidying Up’

An objection may be raised that p(e_{a}|D_{a}j) just equals C(N-j,n)/C(N,n) because the supposition of D_{a}j means there are exactly j non-grue emeralds. To get around this objection, a clarification needs to be made: the numerator for p(e_{a}|D_{a}j) shouldn’t be the number of ways n grue emeralds can be taken out of the grue emeralds there *are*; it should be the number of ways that n grue emeralds could be taken out of the grue emeralds that *could’ve been*. This is really the correct way to compute this probability. We just typically ignore the more complicated wording because normally whether to count something as an instance doesn’t depend on when it was observed.

Another clarification needs to be made about p(e_{a}|D_{a}j): p(e_{a}|D_{a}j) = C(N-j+X,n)/C(N,n) if there are X green emeralds among the j non-grue emeralds in the model population. But what if it just so happens that there are no green emeralds among the j non-grue emeralds as would be true for small j? Then p(e_{a}|D_{a}j) can’t be shown to be greater than p(e|Dj) for that j using simple counting. This presents no serious problem for two reasons:

- p(e|Dj) can be shown to be less than p(e_{a}|D_{a}j) for even that j because the idea remains intuitive. Intuitively, denying all emeralds are grue partly raises the probability of an emerald being green by denying a contrary (that an emerald is blue) for some emeralds. This doesn’t happen when we deny all emeralds are green. A more complicated formula that ‘cashes out’ this intuition can no doubt be worked out but it needn’t concern us here; and
- The consequent of Theorem 1 goes through so long as p(e|Di) < p(e_{a}|D_{a}i) for some i because p(e|Dj) will be less than or equal to p(e_{a}|D_{a}j) for all j.

With these issues tidied up, let us turn now to some objections a Goodmanian might make.

V. Some Goodmanian Objections

A Goodmanian may argue as follows: emeralds don’t change color; grue emeralds stay grue and non-grue emeralds stay non-grue regardless of when they’re observed; we green/blue theorists only think they change color…just as a grue/bleen theorist would think that to suppose emeralds stay green is to suppose they change color from grue if they had been examined before t to bleen if they were not examined by t. But this is a bigger skeptic than I’m willing to tackle. To suggest that a given emerald could go from green to blue as would have to be the case if it was to remain grue is to simply pose Hume’s old riddle of induction which asks us to give a justification for why the future will resemble the past. This I’m prepared to concede has only a skeptical solution. Goodman’s riddle, in contradistinction to Hume’s, is to give conditions under which some regularities, as opposed to others, get habituated and are projectible. Goodman’s solution is ‘skeptical’ insofar as his conditions under which regularities are projectible are essentially the same as the conditions under which they get habituated, i.e. they have been successfully projected in the past. In the case of green emeralds as opposed to grue emeralds, I’ve given a straight, non-skeptical solution: green emeralds support ‘All emeralds are green’ more than grue emeralds support ‘All emeralds are grue’. This is because the evidence, ‘n emeralds have been observed to be green’ is less likely given the denial of the green hypothesis than, ‘n emeralds have been observed to be grue’ is given the denial of the grue hypothesis. This in turn is due to the fact that the n emeralds observed to be grue could’ve come from green emeralds before t, blue emeralds at or after t *and* green emeralds after t (had they been observed before t).

Goodman’s standard grue/bleen move is so familiar by now that some may think it’s begging to be used. Let’s see what happens if we ‘grue/bleen’ our argument for p(e|Dj). Would we also get p(e|Dj) = C(N-j+X,n)/C(N,n) for some positive X? That this doesn’t happen can be seen as follows.

Our argument for why p(e_{a}|D_{a}j) = C(N-j+X,n)/C(N,n) for some positive X was: p(e_{a}|D_{a}j) is the probability that n grue emeralds are observed given that there are exactly j non-grue emeralds. I.e. the n grue emeralds can come out of N-j grue emeralds in the whole population. But there is another source for grue emeralds before t: of the j unexamined non-grue emeralds, there is a positive probability that some of them in not being blue after t will in fact be green. And they would’ve been observed to be green if they had been observed prior to t because, under pain of reduction to the old riddle, emeralds don’t change color.

Doing a grue/bleen parallel we have: p(e|Dj) is the probability that n green emeralds are observed given that there are exactly j non-green emeralds. I.e. the n green emeralds can come out of the N-j green emeralds in the whole population. But there is another source for green emeralds before t: of the j unexamined non-green emeralds, there is a positive probability that some of them in not being bleen after t will in fact be grue. *And they would’ve been observed to be grue if they had been observed prior to t because, under pain of reduction to the old riddle, emeralds don’t change color.*

But the parallel fails because the italicized sentence is false: if an emerald is grue after t, it is blue, and it would’ve been observed to be blue if it had been observed prior to t because under pain of reduction to the old riddle, emeralds don’t change color. This means it would’ve been observed to be blue or bleen prior to t not grue or green so there is no additional source for green emeralds before t when we’re computing p(e|Dj).

The parallel construction fails because, while a green emerald observed after t would’ve been green if observed before t, a grue emerald observed after t would’ve been bleen if observed before t. Under pain of reduction to the old riddle, once a green emerald, always a green emerald. On the other hand, assuming emeralds don’t change color, doesn’t commit us to grue emeralds remaining grue regardless of when they are observed—in fact, it commits us to just the opposite: grue emeralds observed after t would’ve been blue or at worst bleen if observed before t. Thus they don’t raise the probability p(e|Dj) over the simple C(N-j,n) / C(N,n), making it less than p(e_{a}|D_{a}j) for at least some j which proves our result.

Now it just remains to prove Lemma 1 and Theorem 1 stated in the body of the paper.

VI. The Proofs

First we prove Lemma 1, namely:

**Lemma 1.** e confirms h only if p(e|h) > p(e|~h).

proof:

1. e confirms h (Assumption)

2. p(h|e) > p(h) (From 1, Axiom 1)

3. 1 – p(h|e) < 1 – p(h) (From 2)

4. p(~h|e) < p(~h) (From 3)

5. p(~h|e) = p(e|~h) p(~h) / p(e) (Bayes’ Theorem)

6. p(e|~h) p(~h) / p(e) < p(~h) (From 4, 5)

7. p(e|~h) < p(e) (From 6)

8. p(e) = p(e|h) p(h) + p(e|~h) p(~h) (From conditionalizing e on h and ~h)

9. p(e|~h) < p(e|h) p(h) + p(e|~h) p(~h) (From 7, 8)

10. p(e|~h)(1 – p(~h)) < p(e|h) p(h) (From 9)

11. p(e|~h) p(h) < p(e|h) p(h) (From 10)

12. p(e|~h) < p(e|h) (From 11. QED.)
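As a sanity check on the algebra, and not as a substitute for the proof, the two conditions can be compared on randomly chosen probabilities. This is a sketch under my own assumptions: all three inputs lie strictly between 0 and 1, exact rationals are used to avoid rounding, and the seed is fixed so the run is repeatable.

```python
import random
from fractions import Fraction

rng = random.Random(0)  # fixed seed so the check is repeatable


def random_probability():
    """A random rational strictly between 0 and 1."""
    return Fraction(rng.randint(1, 99), 100)


mismatches = 0
for _ in range(10_000):
    p_h, p_e_h, p_e_not_h = (random_probability() for _ in range(3))
    p_e = p_e_h * p_h + p_e_not_h * (1 - p_h)    # step 8: total probability
    p_h_e = p_e_h * p_h / p_e                    # Bayes' Theorem
    # Relevance condition versus the likelihood form used in Lemma 1.
    if (p_h_e > p_h) != (p_e_h > p_e_not_h):
        mismatches += 1
print(mismatches)    # prints: 0
```

For priors strictly between 0 and 1 the two conditions always agree in this check, which is slightly more than the ‘only if’ direction that Lemma 1 needs.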

Lastly we prove Theorem 1, namely:

**Theorem 1.** If p(e|Di) < p(e_{a}|D_{a}i) for some i and p(e|Dj) ≤ p(e_{a}|D_{a}j) for all j (and D and D_{a} defined as in assumption 4), then p(e| U Dj) < p(e_{a}| U D_{a}j).

proof:

1. (Assumption 4 restated) Let Dj be the hypothesis ‘Exactly j emeralds are not green’ and let D_{a}j be the hypothesis ‘Exactly j emeralds are not grue’. Note: ~h = U Dj and ~h_{a} = U D_{a}j with j ranging from 1 to N. Also note: all the Dj are mutually exclusive and all the D_{a}j are mutually exclusive. We assume p(Dj) = p(D_{a}j) for all j (note: in the below we omit the phrase ‘for all j’). This implies the key prior probabilities are equal, i.e. p(~h) = p(~h_{a}) and p(h) = p(h_{a}).

2. p(e|Di) < p(e_{a}|D_{a}i) for some i (Assumption)

3. p(e|Dj) ≤ p(e_{a}|D_{a}j) (Assumption)

4. Σ p(e|Dj) < Σ p(e_{a}|D_{a}j) (From 2, 3)

5. Σ p(e ∩ Dj) < Σ p(e_{a} ∩ D_{a}j) (From 4, the definition of conditional probability, and p(Dj) = p(D_{a}j) by Assumption 4)

6. p(e|U Dj) = p(e ∩ (U Dj)) / p(U Dj) (Definition of conditional probability)

7. p(e|U Dj) = p(U (e ∩ Dj)) / p(U Dj) (From 6, distribution of ∩ over U)

8. p(e|U Dj) = Σ p(e ∩ Dj) / p(U Dj) (From 7, definition of U and mutual exclusivity of the Dj)

9. p(e|U Dj) < Σ p(e_{a} ∩ D_{a}j) / p(U Dj) (From 5, 8)

10. p(e|U Dj) < Σ p(e_{a} ∩ D_{a}j) / p(U D_{a}j) (From 9 and p(~h) = p(~h_{a}))

11. p(e|U Dj) < p(U (e_{a} ∩ D_{a}j)) / p(U D_{a}j) (From 10, definition of U and mutual exclusivity of the D_{a}j)

12. p(e|U Dj) < p(e_{a} ∩ (U D_{a}j)) / p(U D_{a}j) (From 11 and reverse of distribution of ∩ over U)

13. p(e|U Dj) < p(e_{a}|U D_{a}j) (From 12 and the definition of conditional probability. QED.)

References:

Goodman, Nelson. 1955. *Fact, Fiction, and Forecast*. Cambridge, Mass.: Harvard University Press.
