Does Data Matter in March?

Mar 8, 2018 | Blog

Data is the new currency of the world economy. It is the lifeblood of IoT. It is the seed of every major innovation. We are told time and time again of the power data holds: that decisions supported by data are inherently superior, and that the data never lies.

Then, March Madness happens.

No matter how many queries I run, limits I impose, variables I account for, or comparisons I make…my bracket seems immune to the supposed “power” of data.

Why exactly is this? Many factors go into perpetuating the anomaly that is the NCAA Tournament.

For starters, the teams that qualify are rarely consistent over any meaningful period of time, and consistency is a necessity for data collection. There are some exceptions to that rule: the Dukes and North Carolinas of the sport have established decade-long streaks of tournament play. Despite their steady tournament appearances, other important metrics are often far from consistent, even among these top-tier teams.

It is nearly impossible to establish a consistent and accurate trend around rankings, regular season statistics, player age, strength of schedule, and so on. You can’t weigh an undefeated season in the Mid-Atlantic the same as an undefeated ACC regular season. You can’t expect a school with high turnover, such as Kentucky, not to have a rebuilding year after sending four starters to the NBA. Accounting for all of those variables is a herculean task.

The Cinderella phenomenon occurs in nearly every tournament, further disrupting an already fragile data pool. George Mason, Florida Gulf Coast, NC State, Butler (go Dogs)…the Cinderella team is one with little to no chance of a successful tournament run. Against all data-driven odds, it manages to rise to the occasion. On paper, these teams have almost no chance to advance. Yet year after year, they demolish statistically superior opponents and leap into the history books.

No one would look at the data and say 12th-seeded Arkansas-Little Rock should beat 5th-seeded Purdue, or that 12th-seeded Yale could beat 5th-seeded Baylor (unless you are me and called both correctly, my greatest accomplishment to date). Every year, the month of March makes it very clear that there will always be variables data cannot capture.

What does this say about analytics and data modeling? March Madness teaches us the importance of understanding the fallibility of data and planning accordingly. Adjusting data queries, leveraging machine learning, and collecting data intentionally all continually improve your data integrity. Understanding how to contextualize data improves your ability to apply and correlate it, resulting in enhanced modeling capabilities.

Data may never be able to give me the perfect bracket, but it may be able to help me understand other aspects of the game that improve my odds. If I continually evaluate, improve, and apply my data to the right models, the value of the available information will make itself known.