For those who have competed at Regionals (this year or any other year), I’d like to hear whether anyone else finds that the extreme variation in judges’ scores raises questions about the process. I’ve competed for many years, and the scores for an entry are usually within a few points of each other. Two questions: 1) Has anyone else experienced wild differences in scoring? 2) Should the AHA statistically analyze the scores to see if some judges are skewing the results? At this level of competition, the judges should be very close in their scoring and not show huge differences across their scoresheets. Statistically evaluating the judges should be considered to see if something is amiss.
How much of a spread are you seeing?
While scores should be close, there is an intrinsic flaw in attempting to take something qualitative and convert it into something quantitative.
Personally, I find it as useful as pissing in the wind. The judges’ comments are the most useful part, and even then they are subjective.
I agree. We enter with high hopes but in reality it’s a crap shoot.
I recently asked a very successful competitor what the secret was to getting a beer to score well. His answer illustrates the point: enter the same beer in multiple competitions. Since implementing his advice, scores from the low 30s to the mid 40s for the same beer have been routine. One outlier was a beer that scored a 25 and a 42, brewed and packaged together, shipped, and judged on the same day in different competitions earlier this year.
As far as judge comments go, I consider the source. There are some well-known, highly ranked judges whose comments you can take to the bank. Comments from unranked or low-ranked judges get glossed over fairly quickly, especially if the beer is a known high scorer.
The AHA needs to step up its game and perform some statistical analysis to determine which judges may not be good at the job. Statistics would expose judges who consistently differ from the rest of the panel when scoring the same entry.
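To make that concrete, here’s a minimal sketch of the kind of check I mean. The scoresheet rows and the flag threshold below are invented for illustration; the idea is just to measure, entry by entry, how far each judge sits from the rest of their panel, then look for judges who land high or low consistently rather than occasionally.

```python
# Sketch: flag judges whose scores consistently deviate from their panels.
# The (entry, judge, score) rows below are made up; real input would come
# from competition scoresheets.
from collections import defaultdict
from statistics import mean

scores = [
    ("E1", "J1", 39), ("E1", "J2", 41), ("E1", "J3", 32),
    ("E2", "J1", 35), ("E2", "J2", 36), ("E2", "J3", 28),
    ("E3", "J1", 42), ("E3", "J2", 40), ("E3", "J3", 33),
]

# Group scores by entry so each judge can be compared to their panel.
by_entry = defaultdict(list)
for entry, judge, score in scores:
    by_entry[entry].append((judge, score))

# For each score, record how far the judge sat from the other panelists.
deviations = defaultdict(list)
for panel in by_entry.values():
    for judge, score in panel:
        others = [s for j, s in panel if j != judge]
        if others:
            deviations[judge].append(score - mean(others))

# A large mean deviation sustained across many entries is worth a look.
# The 5-point threshold is arbitrary; on small panels one deviant judge
# also drags everyone else's numbers, so it shouldn't be set too tight.
for judge, devs in sorted(deviations.items()):
    bias = mean(devs)
    flag = "  <- consistently off the panel" if abs(bias) > 5 else ""
    print(f"{judge}: mean deviation {bias:+.1f} over {len(devs)} entries{flag}")
```

On this toy data, J3 averages about 8 points below the rest of the panel and gets flagged; a one-off disagreement wouldn’t.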
Even the BJCP guidelines for testing judges say they don’t expect two scores on the same entry to differ by more than seven points. And even seven points seems like a pretty big gap.
While I agree with the sentiment, the typical competition is begging for judges and probably wouldn’t turn many away. It might still be a good tool for helping judges understand whether their skills need refinement.
If the AHA doesn’t analyze differences statistically, how does anyone know their skills need refinement? I think the AHA needs to step up and take on some responsibility.
The common mistake made by competitors is that they expect too much from the judges. I am a seasoned competitor with over 175 medals, many BOS and a handful of NHC final medals. I have come to expect that if the beer is good, it will advance. If it’s not as good as the competition I’m up against, then I won’t medal.
The judges are human; some days are better than others. I know I’ve tasted my own beers and had different opinions about them from day to day. I don’t put much stock in scores; one judge will pick up a flaw while another finds a positive attribute. What I look for is whether a beer gets similar feedback between two competitions, or whether I recognize the judge. I also need to know my own beers and be humble enough to admit when it’s not my best work.
The key to winning is consistently brewing and entering good beer and just plain good luck. If you hit the right set of judges and they like your beer, there’s a good chance of winning.
Don’t fall in love with your own beer. I have beers I love to drink that never go anywhere; they just can’t win, for a multitude of reasons. Each style can have multiple interpretations, and judges vary in their skill and, just as important, their knowledge of the style.
It can get frustrating at times. I have had beers score a 38 and win BOS. I have also had a beer score a 48 that got second. My attitude is, don’t tell me my score, tell me if I medaled.
These judges are volunteers, and I have grown to appreciate them and cut them some slack. I tried my own hand at judging and found it more difficult than I was expecting. The other thing to consider is that it is getting harder and harder to find judges, so please don’t chase away the ones we have, even if the process is less than perfect.
Lastly, this is supposed to be fun. I try not to take it too seriously.
The observation that judges are both human and volunteers is key here. They are in short supply, and no amount of statistical analysis will change that. I’ll also add that judging is work that has to be experienced to be appreciated. Becoming a judge yourself can tell you a lot.
Human and volunteer or not, seeing very large differences in scores, or comments that have nothing to do with the BJCP guidelines, lowers the bar to “you get what you pay for.” In the past I’ve had comments saying an entry needed more plum, raisin, or … quality, as if the judge didn’t even know what beer style they were judging. Nothing in the style guidelines suggests anything about plum, raisin, or … for that style. I wrote it off as the judge being drunk or incompetent.
Maybe you should become a judge. It’s not as easy as you imagine.
I judge everything I drink, and my scores are far more accurate than some judges’. I hardly drink for pleasure anymore because I criticize everything and focus on the flaws; I’m not negative about what I consume, I just no longer enjoy the process outside of competing. When I submit an entry, I know what the score should be, +/- 1.5 points either way. I wouldn’t submit an entry unless I knew it was well above 40 points, and receiving scores in the low 30s from one judge and something much higher from another suggests that the AHA needs to start evaluating how individuals score entries. A history of comments claiming an entry has, or lacks, qualities that aren’t even part of the BJCP guidelines should also raise concerns about whether judges understand the guidelines (or even know which subcategory they are judging). Self-delusion, in the context of a judge’s qualifications, is the tendency to minimize one’s own faults by manufacturing faults in the entries being judged. I’m not here to run the show; the AHA should be stepping up to figure out if there is a problem.
I judge my beers too; my ratings are almost 100% accurate!
Look, I get it, but no judge is ever going to be 100% on the money under any scientific analysis. That’s just not how the human palate or brewing works. I can have the same beer twice in two different contexts and will perceive it differently. Maybe my nose is different, maybe the environment is different, maybe the bottle is different, for a thousand different reasons.
Hell, even the big guys have trouble with flawless packaging consistency, and they throw all the science and big money at it.
And again, to reiterate: everything about a hobby organization, including this one, runs on the policy of “see a problem, help be the solution.”
Look at the BJCP website: there are fewer than 6,500 active BJCP-certified judges across all ranks on the entire planet, including the ones you’re upset at. There are even fewer of the higher-ranked and more experienced judges. I consider myself a solid judge (National ranked, yada yada yada), but I also guarantee that you could feed me the same beer a week apart and I might score it differently.
When my club runs a competition, we have to beg, wheedle, bribe, cajole, and drive beers around to get enough judges in place to finish a 300-500 entry comp. The NHC is that on steroids, and if you accepted only the most qualified judges, the judging would never get finished.
A statistical analysis of the relative differences between subjective scores.
I will get my smoke grinder out and start ciphering
Review the BJCP guidelines for category C2F; they are pretty broad. No judge should apply their own stricter, superseding guidelines to fit their beliefs. So far, across two of three competitions, three judges averaged 41 on the entry, and one judge at the Regionals gave it a 32. That is a nine-point difference. I will get the final results from the third competition later this month. If one score of 32 against three other judges averaging 41 (39, 41, 43) isn’t a statistical difference, then I should ask my university for a refund. When I self-scored the entry, I told friends 40.5 (+/- 1.5 points). It seems odd that one person is that far off when you compare all four judges’ rankings.
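For what it’s worth, here’s the back-of-the-envelope arithmetic behind that claim. With only four scores, no formal outlier test has much power, so treat this as illustration rather than proof:

```python
# Compare the one low score to the three agreeing judges.
from statistics import mean, stdev

panel = [39, 41, 43]   # the three agreeing judges
suspect = 32           # the one Regionals score

m, s = mean(panel), stdev(panel)   # 41 and 2.0
gap = m - suspect                  # 9 points
print(f"panel mean {m}, stdev {s}")
print(f"low score sits {gap} points ({gap / s:.1f} panel-stdevs) below the mean")
```

The 9-point gap is also wider than the 7-point consensus window mentioned above, for whatever that comparison is worth.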
Seriously, you need to drink a sixer of your 41-point beer and call it a day.
So true, for this issue and so many others. Drew, thanks for all the free work you do on behalf of homebrewers. It does not go unnoticed.
There’s nothing about the numbers you’ve put up that is surprising to me or out of whack with sensory analysis over time and across judges.
In the professional wine-judging world, where real money is on the table and the judges are credentialed in a way that makes the BJCP look lightweight, judges still demonstrate those levels of variance: the same judge, in the same tasting session (or across sessions), shows remarkable variance in scoring.
In a perfect world, you’d be able to place a beverage in front of a judge and get a consistent, reliable score, as if you’d put the sample before an electronic palate (which people have been trying to build for the last 40 years). But that doesn’t happen, even with the professionals.
I understand that it’s frustrating to get variability in the scores, but even at a large national competition that’s to be expected, whether from judge bias, blind spots or sensitivities, the mood of the judge, the warmth of the room, or how much of a “pfft” the container makes when it gets opened.
I’ll give you an example from a flight I just judged. We had a beer come to us that was full of diacetyl. To me (very diacetyl sensitive) it tasted like I was swimming in a pool of movie-theater popcorn. My fellow judge didn’t perceive it the same way, so I scored the beer in the high 20s while they were up in the high 30s, about an 11-point variance. We brought it in line after discussion, but judged separately, or with different judges, it would have ended up with differing scores.