We decided to create our model because we noticed an unfilled niche in the field of election prediction. Despite the rapid growth of machine learning (ML) algorithms over the past decade, few (if any) groups applied them to election prediction. Many people believe that complex ML algorithms are difficult, if not impossible, to interpret. We have worked hard to ensure that our predictions are easy to understand while remaining extremely accurate. Here's a brief summary of how the 24cast.org model works:
In our case, we trained 1000 decision-tree models. Each branch of this tree looks at where data tends to “split”. For example, that might be in expert ratings: there is often a significant difference between races with a “Lean R” rating and a “Toss-up” rating. The model would (and has) learned this difference and analyzes the two sets of data separately. After many splits, the data is more easily distinguished—for example, into “Clear Democrat” races versus “Toss-ups”. To predict future data, it simply sends the data down the tree and gives the final result. By itself, one tree is fairly useless, but, combined, they can learn complex relationships between noisy data (like polls) and achieve high accuracy in areas other models cannot. We used LightGBM, a tree-based model designed in 2017 that quickly rose to stardom in machine learning competitions.
Using these already-trained models, we saw what they would have predicted in 2002-2022. Based on the errors of those predictions, we trained a second model: this time, to predict the standard deviation, assuming a normal distribution of predictions for each individual race. In math terms, we trained a second model to maximize the mean log-likelihood of the standard deviation, given the true margins and the predicted margins. After training this model, we ran it on 2024 predictions to calculate the final standard deviation. That's what allows us to take simulations from each race. The farther away the election is, the more we multiply this standard deviation, to reflect the innate uncertainty of predicting an election so early in the season.
With a normal distribution for each race (and correlations, given by SHAP–see below) we create a multivariate normal distribution describing every race in the 2024 election. By taking samples from this distribution, we effectively “simulate” a possible election. By doing this many times, we can understand the likelihood of events on a national scale, even without assuming normality.
We used data from FiveThirtyEight, Cook Political Report, Sabato's Crystal Ball, and FRED, among many others. A list of attributed data sources can be found in our GitHub. Our final dataset had more than 100 columns, with data ranging from polling averages to campaign finance to voting restrictions. The full list can be found here, but there are a few specific columns that merit additional discussion on this page.
Below, we describe how we get pollster ratings. However, we must aggregate these ratings into a single number afterward for them to be useful in any way. We do so by conducting a meta-analysis of the polls, in two different ways:
Since online/phone polls often reach completely different results based on the age groups they contact more heavily, we conduct additional meta-analyses to get the expected online poll margin and phone poll margin. This allows our model to correct for any major discrepancies in young/old voters between different polling methods.
The generic ballot (defined as the national Democratic % - Republican %) is a key feature in our predictions. Though it does not play a large role by itself, when added to past elections (for example, how much more Democratic a state is relative to the generic ballot), it becomes very useful. As such, we define a series of different generic ballots, allowing the model to decide which is more important. They are:
In addition to including incumbency in our model, we also include how much better a given incumbent performs than we'd expect, given their jurisdictions Cook PVI and Generic Ballot. This allows us to more accurately predict races like Vermont's Gubernatorial where a Republican will almost certainly win despite the state itself being a safe Democratic seat for presidency.
To analyze polls, we decided to create our own pollster ratings. If we created one pollster rating for 2024, we could not apply that rating to previous elections since we would be using the results of polls from those years to predict those same elections. In the data science world, this is called “data leakage” and it's a problem we faced at every stage of our work. We wanted to be as sure as possible that our predictions reflected the data we would have prior to the election, and data leakage would destroy that very important assumption.
To avoid data leakage here, we created a new set of pollster ratings for every election year, based on all polls within the last 12 years of that election. So, in our training data for the 2020 election, we used polls from 2008-2018 to get pollster ratings. We developed a 4-step process for determining pollster ratings for an election year.
After all of this math, what we get is a worst-case guess for how a pollster fares compared to others using the same methodology. A positive value in this weight gives us confidence that the pollster has conducted enough high-quality polls that it will continue to perform better than other pollsters with the same methodology. For those who want to see who these pollsters are, check out our Pollster Ratings file on our GitHub repository.
We then ran these models on every election going back to 2002. The pollster ratings used for our current predictions come from polls conducted between 2010-2022.
Though we did technically create a BPR-affiliated model for the 2022 Senate elections, 24cast.org is the first iteration that utilizes the full extent of modern ML methods for prediction and interpretation. To ensure our model would perform well for 2024 elections, we could not rely on our past predictions as we didn't have any! As such, we decided to backtest on the past two elections and compare our results to other models. We tested our model several different times (effectively, choosing random elections from our simulations) and found that we were more accurate than every other election model 60% of the time in 2020 and 70% of the time in 2022. With the addition of 2022 training data, we expect our 2024 model to outperform our back-tested models. However, election prediction is a probabilistic endeavor, not a guarantee – which is why we have distributions of possible outcomes.
Our model updates each day at midnight EDT with the latest data. Our predictions will constantly update as polls, campaign finance, and expert ratings change. Our model will finish updating on the day before the election. We use GitHub Actions and AWS with DynamoDB to gather new data and update our API, and R and Python to clean/analyze the incoming data while producing up-to-date predictions.
To interpret our results, we used a method called SHAP, or Shapley Additive Explanations (a weird acronym, we know). Shapley values originate from cooperative game theory, where a “game” involves a set of players who cooperate, and the goal is to fairly distribute the “payout” among them based on their contribution. The math of SHAP values is too detailed to include here, but they can be easily found in their journal paper. In simple terms, a SHAP value for a single feature is determined by what the model would predict if that feature was not included in the prediction. SHAP values are especially useful for tree-based models like LightGBM. In our case, SHAP gives importance values for each feature and each prediction, allowing for easy interpretation of how features like campaign finance affect margins. We hope that our work to make our predictions interpretable gives our readers a better understanding of the factors that most heavily impact the results of elections.
SHAP values are also useful because they provide an easy way to understand how races are interlinked. When we have local importances for each race, we can determine the correlation between races. North Carolina and Georgia, for example, are very similar states and thus have close correlations. SHAP agrees with this – past elections play a huge role in both states with a similar magnitude/direction.
While we aim to be as transparent as possible, there were so many minute issues we faced as we created this model that describing all of them would take an entire book. If you're interested in looking at our code, check out our GitHub! If you have more specific math questions (or if you're confused by any of our code) feel free to reach out to our team. We're always willing to nerd out about data and politics, so we'll try and respond ASAP!