It is almost axiomatic that after any team match, players from at least one pair (and often from both) feel that they “played better” than at the other table. In some round-robin and Swiss tournaments, the organizers go to the trouble of providing Butler scores.
It is undeniable that in the short run, Butler scores are only an indication. For instance, should you be penalized if your opponents stumble into a poor four hearts contract that happens to make on three finesses? Or if you were fixed by your no-trump range? But in the long run, luck tends to even out.
As I was analyzing my team’s results in the 2004 World Team Bridge Olympiad, it occurred to me that it ought to be possible to calculate more detailed scores at the pairs level and indeed at the individual level. This article describes the scoring technique I arrived at, which I feebly call the “Valet score” to stay with the connotation of service. The Butler score was probably named after a Mr. Butler, but then “valet” in French means “jack.”
With this, a new level of analysis (and no doubt finger-pointing) is possible. In the process I derived some empirical statistics on the relative importance of bidding, play and defense. To my knowledge, neither of these two topics has previously been discussed in print.
The Valet score requires detailed information not only about the scores at each table, but also about the contract. Such information is rarely recorded electronically except at the highest levels of play. Even so, it is by no means self-evident that high-quality data be made available. I thank the ECats website for providing such records in electronically readable PDF format.
Example 1. Imagine that 101 pairs all bid to the same contract of four hearts. At a hundred tables declarer makes the obvious ten tricks for +420, but your partner contrives to go down one for a well-deserved minus 10 IMPs. Surely you mentally charge this to partner’s declarer play. But it shows up in your combined Butler score just the same.
Example 2. Imagine that the one hundred other pairs all bid to four hearts, but you and your partner stop in three hearts. All pairs make ten tricks. This time your Butler score is minus 6 IMPs, and it should rightly be charged specifically to your bidding.
Example 3. Finally, imagine that 101 pairs all bid to four hearts, but you alone blow a trick on defense, allowing the contract to make, while it fails by one trick at all other tables. The 10-IMP Butler loss should be charged specifically to your defense.
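The Butler figures in these three examples can be checked with a short Python sketch. It is an illustration only, not the method used for the results in this article. It assumes the standard IMP scale, a datum that is the plain unrounded mean of all 101 scores (real Butler calculations often drop extreme scores and round the datum), and non-vulnerable scores as implied by the +420. The names `imps` and `butler_score` are made up for this sketch.

```python
from bisect import bisect_left

# Upper bound (in points) of each band of the standard IMP scale.
# A difference above the last bound is worth 24 IMPs.
IMP_UPPER_BOUNDS = [
    10, 40, 80, 120, 160, 210, 260, 310, 360, 420, 490, 590,
    740, 890, 1090, 1290, 1490, 1740, 1990, 2240, 2490, 2990, 3490, 3990,
]


def imps(diff):
    """Convert a point difference into signed IMPs."""
    magnitude = bisect_left(IMP_UPPER_BOUNDS, abs(diff))
    return magnitude if diff >= 0 else -magnitude


def butler_score(own_score, all_scores):
    """IMPs of own_score against the datum (assumed: plain mean of all scores)."""
    datum = sum(all_scores) / len(all_scores)
    return imps(own_score - datum)


# Example 1: partner goes down one (-50) while 100 declarers score +420.
example_1 = butler_score(-50, [420] * 100 + [-50])
# Example 2: we stop in three hearts (+170) while 100 pairs score +420.
example_2 = butler_score(170, [420] * 100 + [170])
# Example 3: we let four hearts make (-420); 100 other defenders score +50.
example_3 = butler_score(-420, [50] * 100 + [-420])
print(example_1, example_2, example_3)  # -10 -6 -10
```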
The Valet score attempts to allocate your Butler score to four categories: your combined bidding, your declarer play, partner’s declarer play, and your combined defense. On each board, a bidding score and one other score (either a declarer score or a defense score) are generated for each pair. If your side declared, the player who declared gets a declarer score, and the defending side gets the opposite score as a defense score.
The key is to decompose the Butler score into a bidding component and a play/defense component that add up to the Butler score. Once this is achieved, it is obvious how to allocate the play/defense component to the right category. The allocation suffers from the same shortcomings as the original score, but again luck evens out in the long run.
The bidding component is calculated as the IMP equivalent of a synthetic score relative to the Butler datum score. The synthetic score is calculated by ignoring what happened in the play at your own table and instead examining what happened in the play at all other tables that played in the same denomination from the same side. If you played as North-South in three hearts, all the tables that played hearts from the North-South side are included, whether they played in two hearts, four hearts or six hearts doubled.
In the unlikely event that some tables played hearts from the East-West side, these are ignored. The tables that played in no-trump, diamonds etc. are also ignored.
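The selection rule can be sketched in a few lines of Python. The record layout (`Result` and its fields) and the sample results are invented for illustration; the actual tournament records are not described at this level of detail here.

```python
from collections import Counter, namedtuple

# Hypothetical record layout, one record per table for a given board.
Result = namedtuple("Result", "table declaring_side strain level tricks")


def comparable_tricks(results, own_table, declaring_side, strain):
    """Histogram of tricks taken at the OTHER tables that played the same
    strain from the same side, whatever the level of the contract."""
    return Counter(
        r.tricks
        for r in results
        if r.table != own_table
        and r.declaring_side == declaring_side
        and r.strain == strain
    )


# Made-up results for one board.
results = [
    Result(1, "NS", "H", 3, 10),   # our own table: ignored
    Result(2, "NS", "H", 4, 10),   # included
    Result(3, "NS", "H", 2, 9),    # included, although the level differs
    Result(4, "EW", "H", 2, 6),    # hearts from the other side: ignored
    Result(5, "NS", "NT", 3, 8),   # different denomination: ignored
]
histogram = comparable_tricks(results, own_table=1, declaring_side="NS", strain="H")
print(dict(histogram))  # {10: 1, 9: 1}
```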
One might object that declarers in two hearts and four hearts might adopt very different lines of play. This is certainly true. But first, we usually won’t get enough comparable results if we only include those tables that played the exact same contract from the same side of the table. And second, even so the declarers may have different inferences available from the bidding. In practice, it is rather reasonable to look only at the number of tricks taken, regardless of the contract. And if your partner finds a 99.5% safety play in four hearts for ten tricks, while the other declarers took a 99% line for eleven tricks, partner’s play actually loses IMPs in the long run.
Once we have a list (a histogram) of tricks taken available, we calculate a hypothetical score for each result. If our declarer was in four hearts and 50% of the other declarers took nine tricks while 50% took ten tricks, our declarer is assigned a synthetic “bidding score” of 260 (the average of +620 and –100). However, a number of pairs instead tried three no-trumps which invariably failed, so the Butler datum score is actually somewhat lower at 140.
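As a rough check of the 260 figure, here is a minimal Python sketch of the synthetic score. It assumes our own contract is undoubled (doubled and redoubled scoring is left out to keep it short) and that we are vulnerable, as the +620 and –100 imply. The function names are hypothetical.

```python
def contract_score(level, strain, vulnerable, tricks):
    """Duplicate score of an UNDOUBLED contract, from declarer's side."""
    needed = level + 6
    if tricks < needed:
        # Undoubled undertricks: 100 each vulnerable, 50 each otherwise.
        return -(needed - tricks) * (100 if vulnerable else 50)
    per_trick = 20 if strain in ("C", "D") else 30
    trick_points = level * per_trick + (10 if strain == "NT" else 0)
    if trick_points >= 100:
        bonus = 500 if vulnerable else 300       # game bonus
    else:
        bonus = 50                               # part-score bonus
    if level == 6:
        bonus += 750 if vulnerable else 500      # small slam bonus
    elif level == 7:
        bonus += 1500 if vulnerable else 1000    # grand slam bonus
    return trick_points + (tricks - needed) * per_trick + bonus


def synthetic_score(level, strain, vulnerable, trick_histogram):
    """Average score of OUR contract over the tricks taken at other tables."""
    tables = sum(trick_histogram.values())
    total = sum(
        contract_score(level, strain, vulnerable, tricks) * count
        for tricks, count in trick_histogram.items()
    )
    return total / tables


# Half of the other heart declarers took nine tricks, half took ten.
print(synthetic_score(4, "H", True, {9: 5, 10: 5}))  # 260.0
```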
So far, it matters not whether our declarer took nine or ten tricks. Our side’s Valet score for the bidding is the IMP equivalent of the difference between 260 and 140, or +3 IMPs. This is our reward for bidding to the contract that at least has some chances. It is the Butler score that we would have got on average if our declarer had become indisposed and we had instead picked an average declarer of a hearts contract at another table to play for us.
If our declarer actually took ten tricks, we got a Butler score of +10 IMPs for the 480-point plus. If our declarer went down one, we got a Butler score of –6 IMPs. In the former case, we assign a declarer Valet score of +7 IMPs (the difference between +10 IMPs for the whole board and +3 IMPs for the bidding part). The opponents get a –7 IMPs Valet defense score for their troubles. In the latter case, we get a declarer Valet score of –9 IMPs and the opponents get a defense Valet score of +9 IMPs.
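The arithmetic of this example can be reproduced with the following Python sketch. It assumes the standard IMP scale and takes the three point scores (actual result, synthetic bidding score and datum) as given. The function `valet_split` is a name invented for this illustration.

```python
from bisect import bisect_left

# Upper bound (in points) of each band of the standard IMP scale.
IMP_UPPER_BOUNDS = [
    10, 40, 80, 120, 160, 210, 260, 310, 360, 420, 490, 590,
    740, 890, 1090, 1290, 1490, 1740, 1990, 2240, 2490, 2990, 3490, 3990,
]


def imps(diff):
    """Convert a point difference into signed IMPs."""
    magnitude = bisect_left(IMP_UPPER_BOUNDS, abs(diff))
    return magnitude if diff >= 0 else -magnitude


def valet_split(actual, synthetic, datum):
    """Bidding-first split: returns (butler, bidding, play) in IMPs.

    The play component is whatever remains of the Butler score once the
    bidding component has been taken out, so the two always add up.
    """
    butler = imps(actual - datum)
    bidding = imps(synthetic - datum)
    return butler, bidding, butler - bidding


made = valet_split(actual=620, synthetic=260, datum=140)
down = valet_split(actual=-100, synthetic=260, datum=140)
print(made)  # (10, 3, 7)
print(down)  # (-6, 3, -9)
```

The defenders' Valet defense score is simply the negative of the last number in each tuple.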
Probably our actual score was not due to both brilliant declarer play and poor defense, so it is not fair in the short run to assign the Valet score both to declarer and to the defense. But this is no different for the normal Butler score.
Let us now return to the three examples at the beginning of this section. In the first example, your bidding score is 0 IMPs as any other declarer would have made your contract. Partner’s declarer play is charged with the loss (and the opponents get a 10-IMP windfall in their defense Valet score).
In the second example, your synthetic bidding score is +170 as all other declarers would have made ten tricks in your lowly three hearts contract. Therefore your Valet bidding score is –6 IMPs, the whole of your Butler loss. Your Valet declarer score is 0 IMPs as you took the normal number of tricks.
In the third example, your synthetic score is +50, as the contract failed by one trick at every other table, so your bidding score is 0 IMPs. Your defense score is –10 IMPs (and declarer gets a windfall score). In summary, the Valet score behaves sensibly in these examples, subject to the same shortcomings as the Butler score.
One might object that the brilliant pair that bids to the thin three no-trump contract does so in the knowledge that they will also play the cards better. Yet the Valet score will charge them with poor bidding (as the average no-trump declarer in the room would go down) and give them a very good declarer-play score. This is true, and at the top end of the Valet rankings the bidding scores are probably understated and the declarer-play scores overstated.
One might also object that the allocation of the Butler score to two Valet components is distorted by the nonlinearity of the IMP scale. In the example above with some declarers in four hearts and some in three no-trumps, we started by allocating an IMP score to the bidding component and then allocated the remaining IMPs to the play/defense component. We could also have started by allocating the difference between the actual result and the bidding datum score to the play/defense component, and then allocated the rest to the bidding component. If declarer makes four hearts, he would then get the difference between 620 and 260 (the bidding datum score), or +8 IMPs for his play. The remaining +2 IMPs would be allocated to the bidding. This 2-8 split is in contrast to the 3-7 split above. Similarly, if declarer goes down, the play component would be the difference between –100 and 260, or –8 IMPs, so the split would be +2 to –8 instead of +3 to –9 IMPs.
In practice it seems most logical to allocate the bidding component first. However, one can also do it the other way round, or split the total available IMPs proportionally to the component IMP scores that one would get by calculating them relative to the bidding datum score, or in some other way. It turns out to make very little difference numerically.
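For comparison, the play-first order described above can be sketched in Python as follows. It assumes the standard IMP scale. `play_first_split` is a hypothetical name, and the point scores are those of the four hearts example.

```python
from bisect import bisect_left

# Upper bound (in points) of each band of the standard IMP scale.
IMP_UPPER_BOUNDS = [
    10, 40, 80, 120, 160, 210, 260, 310, 360, 420, 490, 590,
    740, 890, 1090, 1290, 1490, 1740, 1990, 2240, 2490, 2990, 3490, 3990,
]


def imps(diff):
    """Convert a point difference into signed IMPs."""
    magnitude = bisect_left(IMP_UPPER_BOUNDS, abs(diff))
    return magnitude if diff >= 0 else -magnitude


def play_first_split(actual, synthetic, datum):
    """Play-first split: returns (butler, bidding, play) in IMPs.

    The play component is scored against the synthetic bidding score,
    and the bidding component is whatever remains of the Butler score.
    """
    butler = imps(actual - datum)
    play = imps(actual - synthetic)
    return butler, butler - play, play


# Declarer makes four hearts: actual 620, synthetic 260, datum 140.
print(play_first_split(620, 260, 140))  # (10, 2, 8)
```

The same function handles the case where declarer goes down by passing –100 as the actual score. Because the IMP scale is not linear, the two orders can disagree by an IMP or so on a single board.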
The 2004 Olympiad featured 72 national teams divided into four groups of 18 teams each, playing a full round-robin tournament with 17 rounds of 20 boards each. If a team consisted of three pairs playing equally, each pair would thus play about 220 boards, enough for good statistics.
The Hackett twins from England did best overall with an impressive 1.25 Butler score, split with a 0.31 Valet score in the bidding, 0.60 in the play and 0.24 on defense—a great all-round performance. But not all top pairs were as versatile. Fifth-ranked Camberos and Bianchedi from Argentina notched a 1.05 Butler score, composed of an amazing 1.17 in the bidding, -0.46 in the play and 0.34 on defense. They only played 120 boards together, so the statistics may be slightly less reliable.
The highest declarer Valet score was achieved by Banerjee from India with 0.40. His partnership with Mukherjee achieved an overall Butler score of –0.03, mostly due to a defensive Valet score of –0.53.
Zia and Rosenberg ranked thirteenth overall with a Butler score of 0.86, composed of a bidding score of –0.18, a declarer score of 0.12 for Zia and 0.34 for Rosenberg (fourth in the declarer department), and a defense score of 0.58.
Overall the correlations between different components of the Valet score were not very strong. For instance, the correlation between the bidding component and the combined rest of the Valet score was 0.36.
These results were obtained with an Excel spreadsheet and some lengthy Excel macros, combined with the PDF files from ECats. The spreadsheet is available upon request from the author.
It is often held that bridge at the highest level is a bidder’s game. The Olympiad may not be the very highest level due to the varying level of the national teams, but it is certainly a strong tournament, as evidenced by some strong pairs with middling scores in the Butler rankings.
One may measure the variability of a category of scores by its standard deviation. The standard deviations of the Valet scores for bidding, declarer play (for both players of a pair combined) and defense are 0.34, 0.24 and 0.23 IMPs, respectively. (The standard deviations of declarer play and defense do not have to come out the same, but it is pleasing that they are almost the same.) Expressed in percentage terms, the split is 42%, 30% and 28%, respectively. By this measure, bidding is the largest component but does not swamp the game.
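The percentage split is just each standard deviation divided by the sum of the three, as this small Python sketch shows. It uses only the three figures quoted above; the dictionary keys are labels chosen for the illustration.

```python
# Standard deviations of the Valet components, in IMPs per board.
std_devs = {"bidding": 0.34, "declarer play": 0.24, "defense": 0.23}

total = sum(std_devs.values())
# Share of each component in the SUM of standard deviations
# (not in the variance, which would weight bidding more heavily).
shares = {name: round(100 * sd / total) for name, sd in std_devs.items()}
print(shares)  # {'bidding': 42, 'declarer play': 30, 'defense': 28}
```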
I wrote the above in 2004 and was reminded of it when I read this post: "Recalculating butler scores to delineate declaring vs defending proficiency". My Excel spreadsheet including the VBA macros is available by personal message, and if there is enough interest, perhaps we'll find a place to put it up. I also have the Valet scores from the 2004 Olympiad. Re-reading the results, knowing what we know today, does raise a few interesting questions...
Butler scores were named after Geoffrey Butler, an English administrator of the 1950s and 1960s.