We decided to create our model because we noticed an unfilled niche in the field of election prediction. Despite the rapid growth of machine learning (ML) algorithms over the past decade, few (if any) groups have applied them to election prediction. Many people believe that complex ML algorithms are difficult, if not impossible, to interpret. We have worked hard to ensure that our predictions are easy to understand while remaining extremely accurate. Here's a brief summary of how the 24cast.org model works:
In our case, we trained 1000 decision-tree models using an RMSE loss function. Each branch of a tree looks for a point where the data tends to “split”. For example, that might be in expert ratings: there is often a significant difference between races with a “Lean R” rating and a “Toss-up” rating. The model learns this difference and analyzes the two sets of data separately. After many splits, the data is more easily distinguished—for example, into “Clear Democrat” races versus “Toss-ups”. To predict a new race, the model simply sends that race's data down the tree and returns the final result. By itself, one tree is fairly useless, but combined, the trees can learn complex relationships in noisy data (like polls) and achieve high accuracy in areas other models cannot. We used LightGBM, a tree-based model released in 2017 that quickly rose to stardom in machine learning competitions.
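To make this concrete, here is a minimal sketch of what training one of these gradient-boosted tree models looks like with LightGBM. The file path, column names, and hyperparameters are placeholders, not our production settings.

```python
# Minimal sketch of training a gradient-boosted tree model with LightGBM and a
# squared-error (RMSE) objective. All names and settings here are illustrative.
import lightgbm as lgb
import pandas as pd
from sklearn.model_selection import train_test_split

races = pd.read_csv("historical_races.csv")      # hypothetical training data
features = races.drop(columns=["margin"])        # ~100 feature columns
target = races["margin"]                         # Democratic % minus Republican %

X_train, X_val, y_train, y_val = train_test_split(features, target, test_size=0.2)

model = lgb.LGBMRegressor(
    objective="regression",    # squared-error loss, i.e. minimizing RMSE
    n_estimators=500,
    learning_rate=0.05,
)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], eval_metric="rmse")

predicted_margins = model.predict(X_val)
```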
Using these already-trained models, we looked at what they would have predicted for the 2002-2022 elections. Based on the errors of those predictions, we trained a second model: this time, to predict the standard deviation of each individual race, assuming the prediction for each race follows a normal distribution. In math terms, we trained the second model to maximize the mean log-likelihood of the true margins under those normal distributions, given the predicted margins and the predicted standard deviations. After training this model, we ran it on our 2024 predictions to calculate the final standard deviation for each race. That's what allows us to draw simulations for each race.
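In symbols (our notation here, not necessarily the exact form used in our code): if $y_r$ is the true margin of race $r$, $\hat{y}_r$ is the first model's predicted margin, and $\sigma_r$ is the second model's output, the second model is trained to maximize

$$\frac{1}{n}\sum_{r=1}^{n}\log \mathcal{N}\!\left(y_r \mid \hat{y}_r,\ \sigma_r^2\right) \;=\; \frac{1}{n}\sum_{r=1}^{n}\left[-\frac{1}{2}\log\!\left(2\pi\sigma_r^2\right) - \frac{(y_r-\hat{y}_r)^2}{2\sigma_r^2}\right].$$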
With a normal distribution for each race (and correlations between races, given by SHAP; see below), we create a multivariate normal distribution describing every race in the 2024 election. By taking samples from this distribution, we effectively “simulate” a possible election. By doing this many times, we can estimate the likelihood of events on a national scale, even without assuming normality at the national level.
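Here is a minimal sketch of that simulation step with NumPy, using made-up numbers for three hypothetical races; the real model covers every 2024 race and uses the SHAP-derived correlations described below.

```python
# Sketch of simulating elections from a multivariate normal over race margins.
# `means`, `std_devs`, and `correlations` are placeholders for the per-race
# predicted margins, predicted standard deviations, and correlation matrix.
import numpy as np

rng = np.random.default_rng(0)

means = np.array([2.1, -0.5, 4.3])          # predicted margins (Dem - Rep)
std_devs = np.array([3.0, 2.5, 4.0])        # per-race standard deviations
correlations = np.array([[1.0, 0.6, 0.3],
                         [0.6, 1.0, 0.4],
                         [0.3, 0.4, 1.0]])  # race-to-race correlations

# Covariance = D * R * D, where D is the diagonal matrix of standard deviations.
cov = np.outer(std_devs, std_devs) * correlations

simulations = rng.multivariate_normal(means, cov, size=10_000)  # 10k "elections"
dem_wins_per_sim = (simulations > 0).sum(axis=1)
print((dem_wins_per_sim >= 2).mean())  # e.g. P(Dems win at least 2 of 3 races)
```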
We used data from FiveThirtyEight, Cook Political Report, Sabato's Crystal Ball, and FRED, among many others. A list of attributed data sources can be found in our GitHub. Our final dataset had more than 100 columns, with data ranging from polling averages to campaign finance to voting restrictions. The full list can be found here, but there are a few specific columns that merit additional discussion on this page.
The generic ballot (defined as the national Democratic % minus Republican %) is a key feature in our predictions. Though it does not play a large role by itself, it becomes very useful when combined with past election results (for example, how much more Democratic a state is relative to the generic ballot). As such, we define a series of different generic ballots and allow the model to decide which is most important. They are:
In addition to including incumbency in our model, we also include how much better a given incumbent performs than we'd expect given their jurisdiction's Cook PVI and the generic ballot. This allows us to more accurately predict races like Vermont's gubernatorial election, where a Republican will almost certainly win despite the state being safely Democratic at the presidential level.
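As a rough sketch in our own notation (the exact construction in our code may differ), the feature looks something like

$$\text{overperformance}_i = \text{margin}_i - \left(\text{PVI}_i + \text{generic ballot}\right),$$

where $\text{margin}_i$ is the incumbent's margin in their most recent race, $\text{PVI}_i$ is their jurisdiction's Cook PVI, and the generic ballot captures the national environment at the time.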
After September 11, 2024 (see details in our changelog), our polling averaging methodology became significantly more accurate. It has also become significantly more complicated. We're providing both a high-level overview of our method and a low-level mathematical description for those who may be interested.
High-Level Overview:
Our main goal was to maximize the amount of information we can get out of every poll. Polls report a detailed methodology, sample size, conflicts of interest, etc. There is a significant amount of data for those who are willing to look—so that's exactly what we did!
Previously, we only took three things into account in our pollster averages:
This creates a fairly good polling average! However, it leaves out key information (what about other methodologies, or partisanship in pollsters?) that we simply could not fit into our already-large model without significantly increasing error. We've instead created machine learning models that predict how well each poll will perform, given its methodology, pollster history, sample size, time since the poll was conducted, and the partisanship of the pollster. We then combine these polls to get what is effectively the most mathematically accurate polling average possible given the information we have.
Low-Level Math:
Let $X_i$, a random variable, be the margin of the $i$th poll for a given race. We can represent $X_i$ as a function of $\theta_i$, where $\theta_i$ represents the different factors in a poll: methodology, time until the election, pollster, sample size, etc.
Let $\mu$ be the true margin of the race. Our goal is to get an unbiased estimate of $\mu$ with minimum variance. To do this, we need two things for each poll $X_i$: its bias $b_i = \mathbb{E}[X_i] - \mu$ and its variance $\sigma_i^2 = \mathrm{Var}(X_i)$.
We'd then use a tool called bias-adjusted inverse-variance weighting, and get the estimate via the following formula:

$$\hat{\mu} = \frac{\sum_i (X_i - b_i)/\sigma_i^2}{\sum_i 1/\sigma_i^2}$$
This formula finds exactly what we want: an unbiased, minimum-variance estimate for the margin of the race these polls are trying to predict. All that's left is to find a way to determine the bias and variance for every poll. This is... easier said than done.
Bias is relatively simple: we create an algorithm (in this case, using SVM, since it achieved the lowest error and is quite good at predicting outside the training set) that takes in $\theta_i$ and predicts $b_i$ for each poll.
Variance is more difficult. The variance of a poll does not depend on the true margin, but on counterfactual worlds where the result of the poll differs due to pure randomness. Of course, some polls may have different variances due to methodology, etc.
Instead of creating an algorithm to predict the variance of each poll directly (an impossible task), we predicted the MSE (mean squared error): $\mathrm{MSE}_i = \mathbb{E}\big[(X_i - \mu)^2\big]$.
A well-known mathematical formula called the bias-variance decomposition relates these quantities. In particular, it says that $\mathrm{MSE}_i = \sigma_i^2 + b_i^2$. We've got the bias, and we've got the MSE, so we can easily calculate the variance of each poll.
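For readers who want the missing step, the derivation is short: expanding $X_i - \mu$ around $\mathbb{E}[X_i]$,

$$\mathbb{E}\big[(X_i-\mu)^2\big] = \mathbb{E}\big[(X_i - \mathbb{E}[X_i])^2\big] + \big(\mathbb{E}[X_i]-\mu\big)^2 = \sigma_i^2 + b_i^2,$$

since the cross term $2\,\mathbb{E}\big[X_i-\mathbb{E}[X_i]\big]\big(\mathbb{E}[X_i]-\mu\big)$ is zero.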
With that done, we can use bias-adjusted inverse-variance weighting and get what is essentially the mathematically best possible polling average.
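Putting the pieces together, here is a hedged end-to-end sketch of the approach: two predictive models (SVM regressors here, as mentioned above) estimate each poll's bias and MSE, we back out the variance, and we take the bias-adjusted inverse-variance weighted average. The features, training data, and tuning are placeholders, not our production values.

```python
# Sketch of the bias-adjusted inverse-variance weighted polling average.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Hypothetical poll features: methodology, days to election, sample size, etc.
poll_features_train = rng.random((200, 5))
signed_errors = rng.normal(0, 3, 200)          # poll margin minus true margin
squared_errors = signed_errors**2

bias_model = SVR().fit(poll_features_train, signed_errors)   # predicts b_i
mse_model = SVR().fit(poll_features_train, squared_errors)   # predicts MSE_i

def polling_average(poll_margins, poll_features):
    """Bias-adjusted inverse-variance weighted average of poll margins."""
    bias = bias_model.predict(poll_features)
    mse = mse_model.predict(poll_features)
    variance = np.clip(mse - bias**2, 1e-6, None)   # Var = MSE - Bias^2
    weights = 1.0 / variance
    return np.sum((poll_margins - bias) * weights) / np.sum(weights)

# Toy usage: three polls of the same race.
margins = np.array([1.5, 3.0, -0.5])
features = rng.random((3, 5))
print(polling_average(margins, features))
```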
There are a couple more specifics:
Once we're done with all this math, we create a series of polling-related variables, including the following:
Obviously, this polling methodology is not perfect. While it did reduce our average error by 10%, it's only as good as the data it is given. Each year, the political environment of America finds new ways to mess up polling—and there's effectively nothing aggregators can do about it. What we can do is look to the past and take every bit of knowledge we can glean, and that's exactly what this change does!
Though we did technically create a BPR-affiliated model for the 2022 Senate elections, 24cast.org is the first iteration that utilizes the full extent of modern ML methods for prediction and interpretation. To ensure our model would perform well for 2024 elections, we could not rely on our past predictions as we didn't have any! As such, we decided to backtest on the past two elections and compare our results to other models. We tested our model several different times (effectively, selecting random elections from the pre-2020 training data, in a method called bootstrapping) and found that we were more accurate than every other election model 60% of the time in 2020 and 70% of the time in 2022. With the addition of 2022 training data, we expect our 2024 model to outperform our back-tested models. However, election prediction is a probabilistic endeavor, not a guarantee – which is why we have distributions of possible outcomes.
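For the curious, the comparison works roughly like the sketch below: resample held-out races with replacement and record how often our average error beats a competing model's. The error arrays here are placeholders, not real backtest numbers.

```python
# Sketch of a bootstrap comparison between two models' per-race errors.
import numpy as np

rng = np.random.default_rng(0)
our_errors = np.abs(rng.normal(0, 3.0, 100))    # |predicted - actual| per race
their_errors = np.abs(rng.normal(0, 3.5, 100))  # a competing model's errors

wins = 0
n_resamples = 1_000
for _ in range(n_resamples):
    idx = rng.integers(0, len(our_errors), size=len(our_errors))  # sample with replacement
    if our_errors[idx].mean() < their_errors[idx].mean():
        wins += 1

print(wins / n_resamples)  # fraction of resamples where we were more accurate
```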
Our model updates each day at midnight EDT with the latest data. Our predictions will constantly update as polls, campaign finance, and expert ratings change. Our model will finish updating on the day before the election. We use GitHub Actions and AWS with DynamoDB to gather new data and update our API, and R and Python to clean/analyze the incoming data while producing up-to-date predictions.
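As one hedged example of the "update our API" step, writing a race's daily prediction to DynamoDB with boto3 might look like the following; the table name, region, and item schema are hypothetical.

```python
# Sketch of storing a day's prediction in DynamoDB. Names are placeholders.
import boto3
from datetime import date

table = boto3.resource("dynamodb", region_name="us-east-1").Table("predictions")

def upload_prediction(race_id: str, margin: str, std_dev: str) -> None:
    """Store one race's prediction, keyed by race and date."""
    table.put_item(Item={
        "race_id": race_id,                 # e.g. "PA-SEN" (hypothetical key)
        "date": date.today().isoformat(),
        "predicted_margin": margin,         # stored as strings, since DynamoDB
        "std_dev": std_dev,                 # does not accept Python floats
    })

upload_prediction("PA-SEN", "1.8", "3.2")
```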
We will update the website with a purple banner at the top of the page whenever our predictions change significantly. This could be due to an influx of new polls, a change in expert ratings, or a new campaign finance report. We will also update the website with a banner when we release a new feature to our model, such as our campaign finance simulator.
To interpret our results, we used a method called SHAP, or SHapley Additive exPlanations (a weird acronym, we know). Shapley values originate from cooperative game theory, where a “game” involves a set of players who cooperate, and the goal is to fairly distribute the “payout” among them based on their contributions. The math behind SHAP values is too detailed to include here, but it can easily be found in the original journal paper. In simple terms, a SHAP value for a single feature is determined by what the model would predict if that feature were not included in the prediction. SHAP values are especially useful for tree-based models like LightGBM. In our case, SHAP gives importance values for each feature and each prediction, allowing for easy interpretation of how features like campaign finance affect margins. We hope that our work to make our predictions interpretable gives our readers a better understanding of the factors that most heavily impact the results of elections.
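Here is a minimal sketch of how SHAP values can be computed for a LightGBM model with the shap package, reusing the (hypothetical) names from the LightGBM sketch above; in practice you would pass the races you want to explain.

```python
# Sketch of computing SHAP values for a trained LightGBM regressor.
import shap

explainer = shap.TreeExplainer(model)        # exact, fast explainer for tree ensembles
shap_values = explainer.shap_values(X_val)   # shape: (n_races, n_features)

# shap_values[r, f] is feature f's contribution (in margin points) to race r's
# prediction, relative to the model's average output.
```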
SHAP values are also useful because they provide an easy way to understand how races are interlinked. Because we have local importances for each race, we can determine the correlation between races. North Carolina and Georgia, for example, are very similar states and thus are closely correlated. SHAP agrees with this: past elections play a huge role in both states, with a similar magnitude and direction.
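As a hedged sketch of that idea, one can correlate the per-race SHAP rows directly; our production pipeline may process these values further before they feed the covariance matrix described earlier.

```python
# Turn per-race SHAP rows into a race-to-race correlation matrix.
# `shap_values` is the (n_races, n_features) array from the previous sketch.
import numpy as np

race_correlations = np.corrcoef(shap_values)   # shape: (n_races, n_races)

# Races whose predictions are pushed by the same features in the same direction
# (e.g. North Carolina and Georgia) come out highly correlated.
```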
While we aim to be as transparent as possible, there were so many minute issues we faced as we created this model that describing all of them would take an entire book. If you're interested in looking at our code, check out our GitHub! If you have more specific math questions (or if you're confused by any of our code) feel free to reach out to our team. We're always willing to nerd out about data and politics, so we'll try and respond ASAP!
September 11: Consistent website visitors (or viewers of our new historical graphs) may notice a change in our predictions on September 11. In preparation for the final 50 days, we've made three under-the-hood changes that have especially impacted our predictions for the Presidential and House elections. As a fully open-source model, we want to make sure everybody knows exactly what these changes are and why we made them. We're particularly proud of the first of these changes: a multi-month, combined effort by our mathematics specialists to design a method that extracts the maximum value from every single poll.
Poll Averaging:
We’ve completely revamped our polling averaging methodology. Read our Polling Average section in this methodology for a detailed explanation of our new methods and their mathematical underpinnings.
Generic Ballot (and regex…):
While searching through our codebase again to remove any possible problems, we noticed that one of our regex lines used 2020 generic ballot results instead of 2022 generic ballot results for some House races. This meant that we had erroneously underestimated the strength of strong House incumbents. The combination of this fix and our improved polling averages pushed our predictions for some House races a few points to the left. We now think Democrats are clear favorites to take the House.
Candidates that dropped out:
Some candidates, such as Chris Sununu (R-NH), recently announced they would not seek reelection despite having the opportunity to. We've removed incumbency status from these races. This affected around 20 races in total across the Senate, House, and governor races.
August 20: Our team identified that, though we had filtered Vice President Harris from state-specific polls before President Biden's withdrawal from candidacy, generic ballot polls for Harris pre-withdrawal remained in our dataset. We have updated our codebase to remove all pre-withdrawal presidential polls—both state-specific and generic. This change and an influx of new polls during and directly before the DNC have resulted in a significant shift leftward for congressional and presidential races.
August 3: Previously, our model predicted the outcome of the upcoming November 5 election by incorporating uncertainty into the polls to account for increased unpredictability as we move further from the election date. However, we have decided to eliminate this added uncertainty and instead predict the election results as if the election were held today. This change minimizes assumptions and more accurately reflects the raw output of our machine learning algorithm.