Statistical Evaluation of Competition Numbers

Thank you! There’s a lot of engines making this stuff happen and we’ll need even more as the org goes fully independent!

Totally agree. We live in an era with instant replay for all sports, robots will replace umpires, and we’re trying to scrub all subjectivity out of everything in the interest of absolute “fairness.”

But if somebody invented a machine that chemically analyzed a beer sample and could grade its adherence to BJCP style guidelines with 100% consistency, should we use it? Might as well send it out to a lab, and skip the competitions altogether.

I like the fact that everyone’s taste buds are a little different, and some of us are (congenitally?) more or less able to detect certain flavors, aromas, mouthfeel, etc.

Sounds like one damn good cider to drink! Don’t send it, just enjoy it.

I appreciate feedback. Some of it is more constructive than the rest. I view this through the eyes of a woman who coached Little League when my kids were little. It takes immense effort to run a competition. No one gets paid. Add an ounce of appreciation, a heaping teaspoon of respect, and a dash of humility, and everything works well. Eliminate these things, and coaches quit. Good judges quit for similar reasons.

3 Likes

I’m not asking for perfection, but it sure seems that the AHA needs to analyze the judges’ performance statistically. The judges should be getting judged by the AHA through statistical analysis.

The thing that brings the numbers into any relevance is how the beer was rated in mini BOS relative to the other high scoring beers that advanced. For example, I had a beer score 42 in a competition, but not place. I know that there were several great nationally renowned homebrewers in my category. They just had better beers when tasted side by side. I can live with that. My beer was good enough to win other smaller competitions, just not that one.

2 Likes

You may be unaware that, other than NHC, the AHA generally has little to do with comps and nothing to do with judges.

7 Likes

So somehow those data are made available, and an analysis is performed. What is the outcome? Are certain judges publicly shamed? Does the BJCP have to produce a performance enhancement program for those judges? Who pays for this?

6 Likes

My entry was in the wild specialty beer category and my scores were within 4 points of each other. This was my first entry, so I don’t have any past history to pull from, but I felt like both my scores and the comments I was given with each score were pretty accurate, if not more positive than I was expecting.

2 Likes

Well put.

I entered my first competition recently (a best bitter), and my scores were within four points of each other. The narrative feedback was really helpful, too, and combined with informal club feedback, will help me focus on areas of improvement and areas where I seem to be doing OK. I am super appreciative of not only the judges who took their time with thorough notes, but also the club members who invested effort into organizing and conducting the competition.

2 Likes

I primarily enter for feedback, as I don’t have a great sense of smell and taste. Still, it is nice to keep track of judges’ scoring over different contests to see what their tendencies are. Sorta like knowing which teachers are hard graders.

2 Likes

You kind of have an answer to the question right here - you’ve got a cider in a category with broad guidelines to fit many different potential entries. Those types of categories are very difficult to judge imo. Where is the ideal intersection of wood/tannin to spirit to cider to acid to whatever else has been declared? If it’s scoring in the 40’s anywhere (assuming they are good judges outside NHC, bc NHC competition has good judges) then it’s a pretty solid cider to be sure. But you never know if something happened with that particular bottle. Or maybe you had NHC judges with finer palates and/or more experience with ciders than the other competitions you entered.

I’ve had a similar experience with fruited meads. Local comps won’t have many BJCP judges with lots of mead experience unless you’re in a few hot spots around the country. My meads will score much higher in the local comps than NHC, Valkyrie’s Horn, etc. where you’ve got really strong judges converging on those entries. The gold is the feedback from those strong judges.

1 Like

The issue is that not only did this entry win a ‘Double Gold’ earlier in the year, it just received a ‘Gold’ at the California State Fair. Third place in the category was Annie Johnson’s entry. I know the score that this entry should have received. A score of 32 without any real negative comments suggests something is seriously wrong with how this judge scores entries given the guidelines. This judge created their own guidelines to generate a score of 32. I can’t wait to get the judge sheets from the CA State Fair, as they will confirm my contention that a 32 represents something wrong with how this judge is doing their job.

So you have an axe to grind over a judge that gave you a low score.

BTW it’s not the judges “job” it’s a hobby and this judge is giving back to the brewing community by judging competitions. I am sure he/she has better things to do but decided to judge a home brew competition instead.

Give it a rest

4 Likes

It is a hobby for me also, but to have a judge make such a poor analysis undermines the enjoyment of it being a hobby.

I’ve always figured statistically that about 40% of judges are either idiots, unqualified, or having a bad day, or their scoresheets are otherwise unrepresentative of their nominal performance (perhaps they were rushed by the organizers, or ate garlic before judging, or who knows). Their scoresheets can safely be ignored and tossed into the recycle bin and forgotten forever. You’ll know which ones are among the 40% and statistically you’re probably correct at least 75% of the time, maybe more.

Could it be you’re taking it way too seriously?

6 Likes

The range every judge scores within is not regulated. Ultimately, the best beers go to mini-BOS, and then scores become irrelevant. Essentially, they are a tool to differentiate, not a direct measure of quality.

2 Likes

So you’ve identified a flaw in the judging process. Let’s say there was some analysis done and it finds that judges don’t have a very high probability of agreement in scores (say 50% of the time they are within some acceptable range of one another).

What do you suggest happen as a result?
Would your suggestion change if the analysis didn’t agree with your presumption?

I think the majority of commenters here will agree that judging and competitions aren’t perfect. What changes would you like to see?
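For what it’s worth, the kind of agreement analysis being proposed is easy to sketch. Here’s a minimal illustration in Python, using entirely made-up score pairs; the 7-point tolerance reflects the common BJCP practice of paired judges reconciling scores to within about 7 points, but both the data and the function are hypothetical:

```python
# Hypothetical sketch: estimate how often two judges on the same flight
# score an entry within an "acceptable" range of each other.
# All score data below is invented for illustration.

def agreement_rate(score_pairs, tolerance=7):
    """Fraction of paired scoresheets whose scores differ by <= tolerance.

    BJCP practice has paired judges reconcile to within ~7 points,
    so that is used as the default acceptable range here.
    """
    if not score_pairs:
        return 0.0
    within = sum(1 for a, b in score_pairs if abs(a - b) <= tolerance)
    return within / len(score_pairs)

# Invented example: (judge A score, judge B score) for five entries
pairs = [(42, 38), (32, 41), (35, 33), (28, 39), (45, 44)]
print(agreement_rate(pairs))  # 3 of 5 pairs within 7 points -> 0.6
```

Run across a competition’s full scoresheet data, a number like this would at least quantify the disagreement everyone is arguing about, even if it doesn’t settle what to do about it.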

3 Likes

Where is the upvote button when you need it? This right here.