Monday, September 20, 2010

regional bias in the AP poll?

If there's one thing college football fans like to do besides watch their team win, it's bitch about their team getting screwed. And besides referees, what better place is there to get screwed than in the polls? We're all sure that someone's out to get us, particularly Pac-10 fans who are convinced of an East Coast bias. The Michigan fan in me doesn't know whether or not Phil Fulmer actually did vote Michigan 6th after their undefeated 1997 season in retaliation for having his quarterback lose the Heisman to Charles Woodson, but sure likes to believe it anyway.

So, I took it upon myself to see if this bias actually exists, at least in the AP poll. These results use last week's poll, the individual results of which can be found here, or they could until the new poll came out anyway. Wish I knew where they kept poll archives, but anyway I started this last week so what I have works. Obviously one week's poll isn't a sample size worth having, so the results of this little experiment don't say anything about the poll in general. But I think they do say something about last week.

Methodology: There are 60 ballots in the AP poll, and I divided them up into regions based on the six BCS conferences. There are five national-media ballots in a separate category, and I also created a "crossover" category for voters whose newspaper doesn't fall into just one region - for example, the Macon Telegraph and St. Pete Times, which could be ACC or SEC, or the Pittsburgh Post-Gazette, which could be Big Ten or Big East. I thought about putting them in both categories, but I wanted to make sure that when averaging things out, each ballot was only counted once. Still, in certain places where it makes logical sense to do so, these crossover ballots are counted in both of their regions.

Teams getting votes also fall into these regions; most are in the actual BCS conferences so their placement is easy, and others like Fresno State or Houston get placed wherever seems logical. (In those two cases, Fresno State is considered Pac-10 and Houston is considered Big 12.) And in this little study, the rankings are considered to extend to every team that got votes, so Pittsburgh, for example, is considered to be ranked 26th since they were the team not in the top 25 who still got the most votes.
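For the spreadsheet-averse, here's a rough sketch of how that extended ranking could be built. It assumes the usual AP scoring (25 points for a first-place vote down to 1 for a 25th-place vote); the ballot layout, the function names, and the alphabetical tie-break are my own assumptions for illustration, not anything the AP publishes.

```python
from collections import defaultdict


def ballot_points(rank):
    """AP-style points: 25 for a 1st-place vote down to 1 for a 25th-place vote."""
    return 26 - rank


def extended_ranking(ballots):
    """Rank every team that got at least one vote, not just the top 25.

    ballots: list of {team: rank on that ballot} dicts (hypothetical layout).
    Ties are broken alphabetically, which is an assumption made for this sketch.
    """
    totals = defaultdict(int)
    for ballot in ballots:
        for team, rank in ballot.items():
            totals[team] += ballot_points(rank)
    ordered = sorted(totals, key=lambda team: (-totals[team], team))
    return {team: position for position, team in enumerate(ordered, start=1)}


# Made-up two-ballot example, not real poll data.
example_ballots = [{"A": 1, "B": 2}, {"B": 1, "C": 2}]
example_ranking = extended_ranking(example_ballots)
```

With real ballots, whichever team lands 26th in this ordering is the "first team out," which is how Pittsburgh gets its number.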

The hypothesis to be tested is that yes, some regional bias exists. It seems logical to think so: your local beat writer concerns himself first with the team he covers and then with the teams they're likely to play. On Saturdays he's probably not flipping idly through the channels for the most entertaining game, he's doing work. He's in the press box, and if he follows any out-of-town scores, they're probably next week's opponent. He knows Florida and USC and Oklahoma, of course, but he's got deadlines and crap all weekend and doesn't take the time to really get deep and weigh the intricacies of Oregon State against Auburn.

Hell, I admit to a little regional bias when I vote, which usually manifests itself more strongly in the early going and tends to disappear by about halfway through the season as the data gets more extensive. Why? Because by the time it's time to vote, I know all about the ridiculous experience on FSU's O-line and the depth in Clemson's secondary and all of that, and nothing even remotely close to that about any other conference except maybe the Big Ten. The preseason ballot methodology of adding up points based on the trophy watch lists was an attempt to mitigate that.

So I settled on two different ways of testing regional bias in the poll. The first is born of the occasional accusation that poll voters either believe their region is superior (the SEC comes to mind, and you can't blame people if they think the SEC media is biased), or try to hype up "their" teams based on a rooting interest or simply to sell papers.

You can certainly see where people get this idea. The second-highest vote for 26th-ranked Pittsburgh? Came from Ray Fittipaldo of the Pittsburgh Post-Gazette. The only vote for NC State? From Bill Cole of the Winston-Salem Journal. The poll isn't exactly saturated with this stuff, but there's enough anecdotal evidence for people to seize on.

So, I simply counted what I called "overvotes," which are defined as any time a team got a vote five or more places higher than where it was actually ranked. The chart below shows the results. Here's how to read it:

- Each row is a team, and each column is a region. Each cell is the number of overvotes that team got from that region. For example: South Carolina appeared on three ballots at least five places higher than their final ranking of 13th; that is, 8th or higher. One of those was a Pac-10 media outlet, one was an ACC outlet, and one was a Big Ten outlet.

- Yellow highlights indicate where a regional outlet overvoted a team from its own region. For example: Two Pac-10 outlets overvoted Utah.

- The first bold black line marks the end of the top 25 and the beginning of the "also receiving votes" teams; the second marks the teams ranked below 29th, for whom simply getting a vote is an overvote.

- At the bottom is the number of outlets in each region; it's there to show you why it's no surprise that (for example) the Big 12 has the most overvotes. I didn't average them out; it's not necessary.
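If you'd rather see the overvote rule as code than as a chart legend, here's a minimal sketch. The data layout, the names, and the sample ballots are all made up for illustration; only the five-place threshold and South Carolina's 13th-place ranking come from the poll.

```python
from collections import Counter

OVERVOTE_MARGIN = 5  # "five or more places higher" than the final ranking


def is_overvote(ballot_rank, final_rank, margin=OVERVOTE_MARGIN):
    """True if a ballot placed the team at least `margin` spots above its final rank."""
    return final_rank - ballot_rank >= margin


def count_overvotes(ballots, final_ranks, voter_regions):
    """Tally overvotes per team, broken out by the voter's region.

    ballots:       {voter: {team: rank on that ballot}}  (hypothetical layout)
    final_ranks:   {team: rank in the full poll, extended past 25}
    voter_regions: {voter: region label}
    """
    counts = {}
    for voter, ranks in ballots.items():
        region = voter_regions[voter]
        for team, rank in ranks.items():
            if is_overvote(rank, final_ranks[team]):
                counts.setdefault(team, Counter())[region] += 1
    return counts


# Made-up example data, not the real ballots. "Team X" is a placeholder.
final_ranks = {"South Carolina": 13, "Team X": 30}
voter_regions = {"voter_a": "Pac-10", "voter_b": "SEC"}
ballots = {
    "voter_a": {"South Carolina": 8},                # 13 - 8 = 5, an overvote
    "voter_b": {"South Carolina": 9, "Team X": 25},  # only the Team X vote counts
}
overvotes = count_overvotes(ballots, final_ranks, voter_regions)
```

Note that a team ranked 30th or lower clears the threshold with any vote at all, since the worst spot on a ballot is 25th. That's the reason for the second bold line.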



Things that stuck out:

- The first thing that jumped out at me was the identity of the teams that got the most overvotes. In the top 25: Michigan, USC, Florida. Outside it: Georgia, Florida State. These are the outliers. Except for Georgia (and Georgia fans would probably vehemently disagree so they count half): blue-blood, big-name teams all. Could it be that voters do indeed have at least a subconscious bias that, obvious things like record being equal, a team named Michigan or USC is probably better than one named Stanford or Wisconsin?

- The rough bell-curve-ish appearance of the totals shouldn't be a surprise, nor should the fact that a disproportionate number - over a third - of the overvotes are for teams for whom simply ranking them is an overvote. Keep this in mind when contemplating the large number of yellow highlights in that area.

- Florida got a huge number of overvotes, especially for their position in the poll (9th) where it's really hard to do that. Yet none are from the SEC. More on that later.

So is there a regional bias in this poll?

In general, no. It took until the 14th-ranked team to find an overvote from the same region as the team. Of the 75 overvotes for top-25 teams, just 8, or 10.7%, are regionally-based; of the 193 total overvotes, just 38, or 19.7%, are regionally-based. That's about what you'd expect if they were randomly scattered.
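For anyone who wants to check the arithmetic, here's a quick sketch. The only regional ballot share stated in this post is the SEC's 10 of 60 (it comes up further down), so I use it as a rough yardstick for what "randomly scattered" might look like; the other regions' counts are on the chart and aren't assumed here.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


top25_home_share = pct(8, 75)      # home-region overvotes among top-25 teams
overall_home_share = pct(38, 193)  # home-region overvotes among all teams
sec_ballot_share = pct(10, 60)     # rough yardstick: one region's share of ballots

print(f"top 25:  {top25_home_share:.1f}%")
print(f"overall: {overall_home_share:.1f}%")
print(f"SEC share of ballots: {sec_ballot_share:.1f}%")
```

If a region holds roughly a sixth of the ballots, you'd expect something in that neighborhood of a team's overvotes to come from home by chance alone, which is why neither number screams bias.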

However, there are some interesting data points which merit another look:

- In the top 29 teams - where a team must get a vote higher than 25th in order for it to be an overvote - the prevalence of Pac-10 teams getting overvotes from Pac-10 media is noteworthy. Five teams from the Pac-10 region (Utah, USC, Stanford, Fresno State, Cal) get Pac-10 media overvotes. The only other regions to give even a single overvote to a team from their own region are the Big Ten (three to Michigan) and the Big 12 (one to Air Force). That's one team each, and those are the only other instances of a region's media overvoting a team from that region. And it should also be pointed out that of Air Force's five overvotes, three came from Pac-10 outlets.

It's also worth noting that Fresno State appeared on 10 total ballots, 5 of which were Pac-10 ballots.

This does seem a decent indicator of West Coast bias toward its own teams; the West Coast version of these events would be that this is proof of East Coast bias, as East Coast media doesn't have a handle on the quality of these teams because they're on late and not being watched.

- Of the five teams ranked 30-34 - that is, the five highest ones for whom a single vote is an overvote - a significant number of their votes come from their home region media:

Georgia: 5 of 15 overvotes come from SEC papers, twice the 10-of-60 rate of SEC representation (two are in the crossover category.) Georgia also received one-third of their total points from SEC papers.
Missouri: 3 of 9 overvotes from Big 12 outlets, again approximately twice the representation rate.
Clemson: a whopping 5 of 8 overvotes, and 18 of their 25 points, from ACC media.
Overall: 32% of these five teams' overvotes (Florida State and Georgia Tech also included) come from home-region media, a huge number compared to other teams, and more than twice the equivalent percentage outside these five teams.

- Of the four teams receiving just one vote (Baylor, Boston College, NC State, Northwestern) three of them received it from their home region.

So there's definitely an indication that regional bias occurs at times, though the poll at large doesn't seem greatly affected by it. So we go to the second method, asking the question: Is there a pattern of regional media ranking "their" teams higher, on average, than the rest of the country? I calculated the average ranking for each team in each region, and the overall average, to find the answer. The second chart is below. Reading guide:

- Only the top 25 teams this time. Again, row = team, column = region.

- There's a yellow or orange highlight to indicate a team's average in their home region. Yellow means the average ranking was lower than the national average, orange means the average ranking was higher.

- Red and green on the ends mean the same thing: If the team's average ranking from home media was higher, the box is green; if it was lower, red.

- The bottom number in the Delta column is the average of the differences.

- 26 is the rank assigned to every team left off a ballot, by the reasoning that had I tallied points as the AP does, instead of averaging ranks, unranked teams would all get zero.
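Here's a rough sketch of the averaging described above. The data layout and names are hypothetical, and the sign convention (overall average minus home-region average, so a positive delta means the home media ranks the team better) is my reading of how the Delta column works. It also ignores the wrinkle of crossover ballots counting in two regions.

```python
UNRANKED_RANK = 26  # rank assigned to a team left off a ballot


def average_rank(team, ballots):
    """Mean rank for `team` across ballots, treating unranked as 26."""
    ranks = [ballot.get(team, UNRANKED_RANK) for ballot in ballots]
    return sum(ranks) / len(ranks)


def home_delta(team, home_region, ballots_by_region):
    """Overall average rank minus home-region average rank.

    Positive means the home region ranks the team better (a smaller number)
    than the country as a whole; negative means the home region is harsher.
    """
    all_ballots = [b for group in ballots_by_region.values() for b in group]
    overall = average_rank(team, all_ballots)
    home = average_rank(team, ballots_by_region[home_region])
    return overall - home


# Made-up ballots for a placeholder team, not real poll data.
ballots_by_region = {
    "SEC": [{"Team X": 12}, {"Team X": 10}],
    "Pac-10": [{"Team X": 6}, {}],  # the empty ballot left Team X unranked
}
delta = home_delta("Team X", "SEC", ballots_by_region)
```

In this toy example the one ballot that leaves the team off drags the overall average down to 13.5, while the home average is 11, so the delta comes out positive even though the best single vote came from outside the region. That's the kind of distortion the 26 rule can introduce near the bottom of the poll.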



Things that stick out:

- Florida is graded very severely by the SEC and national media, as compared to the rest of the country. By far the hugest negative delta. Do they know something everyone else doesn't?

- You'd expect smaller delta numbers at the top and bottom. It's a lot easier to create huge variance below the top-ranked teams, because you can move teams off the low end of the poll but not the high end. And there's almost always some consensus as to who the top-five teams are, but very little as to who is #15-20. As for the low end, all unranked teams get assigned #26 because voters don't rank below #25, so that too allows for less variation. So LSU's number is not too extraordinary, but Arizona's, while very similar, is actually really surprising.

That said, the only possible conclusion from this chart: Regional bias does not affect the top 25, at least in the Week 2 poll.

Sure, there are some outliers, and some anecdotal evidence of it. South Carolina is treated very kindly by SEC voters, and Utah gets similar special treatment from Pac-10 voters. But the yellow and orange, the red and green, and the average delta don't lie. There are 12 teams with negative delta and 13 with positive, and the average is a very small number. It's pretty strongly suggestive of a random distribution, even to this D- scholar in statistics. What I expected to see was a consistent pattern of small positive differences between a team's average rank in all ballots and their average rank in home region ballots, not a near-random bell-curve-ish distribution.

Even so, there are enough signs of regional bias creeping into certain elements of the poll to make this worth watching. And one week, as I said, is a really lame sample size. So I do intend to make this an every-week thing. I say "intend to" rather than "will" because I "intend" to do a lot of things that don't happen. But this is the kind of worthy project that I'm pretty good about sticking with.

1 comment:

Winfield Featherston said...

It's all in the name. The name "Notre Dame", "USC" [insert SEC team] goes a lot farther than anyone else. When you hear "Alabama vs. Tennessee" what's your initial thought? "That's gonna be a good game!" When in fact, it's going to be a massacre because the Vols suck.