Methodology

The TL;DR

  • We gather and clean data from 2002 to 2022, including factors like incumbency, campaign finance, polls, and expert ratings. We then train our model on this data to predict the difference between the Democrat and Republican vote proportions.
  • Using the trained model and the same data from 2024, we predict the margin for 2024. Our model also gives the uncertainty for each prediction, based on the randomness of data like polling and campaign finance.
  • Given this uncertainty, we simulate many different possible elections and find how often each party will win a race.

Overall Model

We decided to create our model because we noticed an unfilled niche in the field of election prediction. Despite the rapid growth of machine learning (ML) algorithms over the past decade, few (if any) groups applied them to election prediction. Many people believe that complex ML algorithms are difficult, if not impossible, to interpret. We have worked hard to ensure that our predictions are easy to understand while remaining extremely accurate. Here's a brief summary of how the 24cast.org model works:

  1. Input training data: This data contains both the input variables (e.g. polls) and the output variable (in our case, Democrat % − Republican %, or the “margin”). This data comes from already-known elections—specifically, elections from 2002 to 2022.
  2. Train the model on this data: The model (hopefully) learns key relationships between the input and output variables. The simplest example of such a model is a linear regression. By fitting a line through data, one can somewhat easily understand relationships such as “when Republicans are ahead in polls, they often win elections”.
  3. Predict new results: In this case, there is no output variable, only input variables. We don't know the margins of the 2024 elections (if we did, all of this would be unnecessary!). What we do have is a list of all the polls/campaign finance/etc that is cleaned and filtered in the same manner as before. The model then uses this information to predict what it thinks the output variable (the margin) will be. Ideally, the more a model can understand complex and unique relationships between variables, the more accurate it will be in predicting the results.

In our case, we trained 1000 decision-tree models. Each branch of a tree looks at where data tends to “split”. For example, that might be in expert ratings: there is often a significant difference between races with a “Lean R” rating and a “Toss-up” rating. The model learns this difference and analyzes the two sets of data separately. After many splits, the data is more easily distinguished (for example, into “Clear Democrat” races versus “Toss-ups”). To predict future data, a tree simply sends the data down its branches and returns the final result. By itself, one tree is fairly useless, but, combined, they can learn complex relationships between noisy data (like polls) and achieve high accuracy in areas other models cannot. We used LightGBM, a tree-based gradient boosting framework that was described in a 2017 paper and quickly rose to stardom in machine learning competitions.
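To make the idea of a “split” concrete, here is a minimal sketch of how a single tree node could choose a threshold on one feature by minimizing squared error. It is only an illustration: the function names, the numeric encoding of expert ratings, and the margins are all made up, and LightGBM's real split search (histogram-based, across many features) is considerably more involved.

```python
def sum_squared_error(targets):
    """Squared error of a group of targets around the group's own mean."""
    center = sum(targets) / len(targets)
    return sum((t - center) ** 2 for t in targets)


def best_split(values, targets):
    """Return (threshold, error) for the one split that minimizes squared error."""
    pairs = sorted(zip(values, targets))
    best_threshold, best_error = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between two identical feature values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        error = sum_squared_error(left) + sum_squared_error(right)
        if error < best_error:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_error = error
    return best_threshold, best_error


# Hypothetical encoding: -1 = "Lean R", 0 = "Toss-up", 1 = "Lean D".
ratings = [-1, -1, -1, 0, 0, 0, 1, 1]
# Made-up margins (Democrat % minus Republican %) for those races.
margins = [-8.0, -6.5, -7.0, 0.5, -1.0, 1.5, 6.0, 7.5]

threshold, error = best_split(ratings, margins)
print(threshold)  # the split falls between "Lean R" and "Toss-up"
```

A full tree repeats this search inside each resulting group, and a boosted ensemble fits each new tree to the errors left by the previous ones.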

Using these already-trained models, we saw what they would have predicted in 2002-2022. Based on the errors of those predictions, we trained a second model: this time, to predict the standard deviation, assuming a normal distribution of predictions for each individual race. In math terms, we trained a second model to output the standard deviation that maximizes the mean log-likelihood of the true margins, given a normal distribution centered on each predicted margin. After training this model, we ran it on 2024 predictions to calculate the final standard deviation. That's what allows us to draw simulations for each race.
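The objective described above can be sketched in a few lines. This is an assumption-laden illustration rather than the actual training code: the function name and the numbers are hypothetical, and it uses one shared standard deviation for every race, whereas the real second model predicts a separate one per race from its features.

```python
import math


def mean_log_likelihood(true_margins, predicted_margins, sigmas):
    """Mean log-density of each true margin under Normal(predicted, sigma^2)."""
    total = 0.0
    for y, mu, sigma in zip(true_margins, predicted_margins, sigmas):
        total += -0.5 * math.log(2 * math.pi * sigma ** 2)
        total -= (y - mu) ** 2 / (2 * sigma ** 2)
    return total / len(true_margins)


# Made-up backtest results for four races.
true_margins = [4.0, -2.0, 10.0, -7.0]
predicted_margins = [2.0, -1.0, 7.0, -3.0]

# With a single shared sigma, the likelihood is maximized at the RMSE.
n = len(true_margins)
rmse = math.sqrt(
    sum((y - mu) ** 2 for y, mu in zip(true_margins, predicted_margins)) / n
)

for sigma in (0.8 * rmse, rmse, 1.2 * rmse):
    score = mean_log_likelihood(true_margins, predicted_margins, [sigma] * n)
    print(round(sigma, 3), round(score, 4))
```

A standard deviation that is too small is punished heavily whenever a race misses badly, while one that is too large is punished on every race, which is what pushes the model toward honest uncertainty.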

With a normal distribution for each race (and correlations between races, given by SHAP; see below), we create a multivariate normal distribution describing every race in the 2024 election. By taking samples from this distribution, we effectively “simulate” a possible election. By doing this many times, we can estimate the likelihood of events on a national scale without assuming that those national outcomes are themselves normally distributed.
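The simulation step can be sketched using only the standard library. Everything specific here is an assumption for illustration: the three races, their means, standard deviations and correlations are invented, and the helper names are hypothetical. Correlated draws are produced by multiplying independent standard normal draws by the Cholesky factor of the covariance matrix.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular L with L * L^T == matrix (matrix must be positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def simulate_elections(means, sigmas, corr, n_sims, seed=0):
    """Draw n_sims correlated margin vectors, one margin per race."""
    rng = random.Random(seed)
    n = len(means)
    cov = [[corr[i][j] * sigmas[i] * sigmas[j] for j in range(n)] for i in range(n)]
    lower = cholesky(cov)
    sims = []
    for _ in range(n_sims):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sims.append(
            [means[i] + sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        )
    return sims


# Made-up inputs: predicted margins, standard deviations, race correlations.
means = [1.0, -0.5, 3.0]
sigmas = [4.0, 4.0, 5.0]
corr = [[1.0, 0.8, 0.3], [0.8, 1.0, 0.3], [0.3, 0.3, 1.0]]

sims = simulate_elections(means, sigmas, corr, n_sims=20000)
dem_wins = [sum(margin > 0 for margin in sim) for sim in sims]
p_majority = sum(wins >= 2 for wins in dem_wins) / len(sims)
print(p_majority)  # share of simulations where Democrats win at least 2 of 3 races
```

The seat count in each simulation is a discrete, possibly lopsided quantity, so its distribution is read directly off the simulations instead of being assumed normal.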

Data Used

We used data from FiveThirtyEight, Cook Political Report, Sabato's Crystal Ball, and FRED, among many others. A list of attributed data sources can be found in our GitHub. Our final dataset had more than 100 columns, with data ranging from polling averages to campaign finance to voting restrictions. The full list can be found here, but there are a few specific columns that merit additional discussion on this page.

Generic Ballot

The generic ballot (defined as the national Democratic % - Republican %) is a key feature in our predictions. Though it does not play a large role by itself, when added to past elections (for example, how much more Democratic a state is relative to the generic ballot), it becomes very useful. As such, we define a series of different generic ballots, allowing the model to decide which is more important. They are:

  1. Generic ballot via polls: By conducting a similar meta-analysis on generic ballot polls, we can calculate two different poll-based generic ballots: unweighted and weighted.
  2. Generic ballot via campaign finance: In our initial runs of the model, we noticed that campaign finance explains a significant amount of variance in elections–far more than we had expected. We calculate campaign finance ratios via the following formula: A * log(total Democrat receipts / total Republican receipts). We don't know what A is, so we included multiple different versions of A and let our model determine which was most effective for us.
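The campaign finance formula above translates directly into code. In this sketch the function name, the candidate values of A, and the dollar amounts are hypothetical; the source does not say which values of A were tried.

```python
import math


def finance_ratio(dem_receipts, rep_receipts, a=1.0):
    """A * log(total Democrat receipts / total Republican receipts)."""
    return a * math.log(dem_receipts / rep_receipts)


# Hypothetical scale factors; each one becomes its own model feature.
candidate_scales = [0.5, 1.0, 2.0]
features = {
    f"finance_ratio_a{a}": finance_ratio(3_000_000, 2_000_000, a)
    for a in candidate_scales
}
print(features)
```

The log makes the feature symmetric: a two-to-one Democratic advantage and a two-to-one Republican advantage produce values of equal size and opposite sign. The sketch does not handle a party with zero receipts, which would need a separate rule.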

Incumbent Differential

In addition to including incumbency in our model, we also include how much better a given incumbent performs than we'd expect, given their jurisdiction's Cook PVI and the generic ballot. This allows us to more accurately predict races like Vermont's gubernatorial election, where a Republican will almost certainly win despite the state being safely Democratic at the presidential level.
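One simple way to picture this feature is as a residual. The sketch below assumes the expected margin is just a partisan lean (already expressed as a margin) plus the generic ballot; that additive baseline, the function name, and the numbers are all assumptions made for illustration, not the formula used in the model.

```python
def incumbent_differential(actual_margin, partisan_lean, generic_ballot):
    """Incumbent's past margin minus the margin the jurisdiction 'should' produce.

    All values are Democrat % minus Republican %, in points.
    """
    expected_margin = partisan_lean + generic_ballot  # assumed additive baseline
    return actual_margin - expected_margin


# Made-up example: a Republican governor wins by 40 points in a jurisdiction
# that leans Democratic by 30 points, in a year with a D+3 generic ballot.
differential = incumbent_differential(
    actual_margin=-40.0, partisan_lean=30.0, generic_ballot=3.0
)
print(differential)  # strongly negative: the incumbent far outran the baseline
```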

Polls

After September 11, 2024 (see details in our changelog), our polling averaging methodology became significantly more accurate. It has also become significantly more complicated. We're providing both a high-level overview of our method and a low-level mathematical description for those who may be interested.

High-Level Overview:

Our main goal was to maximize the amount of information we can get out of every poll. Polls report a detailed methodology, sample size, conflicts of interest, etc. There is a significant amount of data for those who are willing to look—so that's exactly what we did!

Previously, we only took three things into account in our pollster averages:

  1. Sample size
  2. Online/offline methodology
  3. How good a pollster is, keeping all other factors constant (like methodology)

This creates a fairly good polling average! However, it leaves out key information (what about other methodologies, or partisanship in pollsters?) that we simply could not fit in our already-large model without significantly increasing error. We've instead created machine learning models that predict how well each poll will perform, given their methodology, pollster history, sample size, time since poll conducted, and partisanship of the pollster. We then combine these polls to get what is effectively the most mathematically accurate pollster averages given the information we have.

Low-Level Math:

Let $X_i$, a random variable, be the margin of the $i$th poll for a given race. We can represent $X_i = X_i(\vec{a})$, where $\vec{a}$ represents the different factors in a poll: methodology, time until the election, pollster, sample size, etc.

Let $\mu$ be the true margin of the race. Our goal is to get an unbiased estimate $\hat{\mu}$ with minimum variance. To do this, we need two things for $X_i$:

  1. Bias ($B_i$): $\mathbf{E}\left[X_i(\vec{a}) - \mu\right]$
  2. Variance ($\sigma^2_i$): $\mathbf{E}\left[(X_i(\vec{a}) - \mathbf{E}[X_i(\vec{a})])^2\right]$

We'd then use a tool called bias-adjusted inverse-variance weighting, and get the estimate via the following formula:

$$\hat{\mu} = \frac{\sum_{i=1}^n \frac{X_i - B_i}{\sigma^2_i}}{\sum_{i=1}^n \frac{1}{\sigma^2_i}}$$

This formula finds exactly what we want: an unbiased, minimum-variance estimate for the margin of the race these polls are trying to predict. All that's left is to find a way to determine the bias and variance for every poll. This is... easier said than done.
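The weighting formula can be written as a short function. The sketch assumes the bias and variance of each poll are already known; the function name and the three example polls are made up.

```python
def weighted_poll_average(margins, biases, variances):
    """Bias-adjusted inverse-variance weighted mean of poll margins."""
    weights = [1.0 / variance for variance in variances]
    numerator = sum(
        weight * (margin - bias)
        for margin, bias, weight in zip(margins, biases, weights)
    )
    return numerator / sum(weights)


# Three made-up polls of one race (Democrat % minus Republican %).
margins = [3.0, 5.0, -1.0]
biases = [1.0, 0.0, -2.0]      # predicted lean of each poll
variances = [4.0, 16.0, 8.0]   # predicted variance of each poll

estimate = weighted_poll_average(margins, biases, variances)
print(estimate)
```

The first poll has the smallest variance, so after its bias is subtracted it pulls the estimate toward its adjusted margin of 2 more strongly than the noisier polls do.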

Bias is relatively simple: we create an algorithm (in this case, using SVM, since it achieved lowest error and is quite good at predicting outside the training set) that takes in $\vec{a}$ and predicts $X_i - \mu$ for each poll.

Variance is more difficult. The variance of a poll does not depend on the true margin, but on counterfactual worlds where the result of the poll is different due to pure randomness. Of course, polls may differ in variance due to methodology, etc.

Instead of creating an algorithm to predict the variance for each poll (an impossible task), we instead predicted the MSE (mean-squared error): $(X_i - \mu)^2$.

A well-known mathematical formula called bias-variance decomposition relates these quantities. In particular, it says that $MSE = B^2 + \sigma^2$. We've got bias, and we've got MSE, so we can easily calculate the variance for each poll as $\sigma^2 = MSE - B^2$.
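Rearranging the decomposition gives the variance directly. In the sketch below the function name and the small positive floor are assumptions: because bias and MSE come from two separately trained models, the subtraction could in principle come out at or below zero, and the source does not say how that case is handled.

```python
def implied_variance(predicted_mse, predicted_bias, floor=1e-6):
    """Variance recovered from MSE = bias^2 + variance, kept strictly positive."""
    return max(predicted_mse - predicted_bias ** 2, floor)


print(implied_variance(13.0, 2.0))  # 13 - 2^2
print(implied_variance(1.0, 2.0))   # subtraction is negative, so the floor applies
```

Keeping the variance positive matters because it is later used as a divisor in the inverse-variance weights.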

With that done, we can use bias-adjusted inverse-variance weighting and get what is essentially the mathematically best possible polling average.

There are a couple more specifics:

  1. We use squared error as the loss function for these algorithms, since the prediction that minimizes expected squared error is the mean of the target (and that's exactly what we want, since bias and MSE are both defined as expected values).
  2. For our variables, we use: sample size, partisanship of the pollster, methodology (one-hot), how long since the poll was conducted, and which pollster conducted the poll (one-hot, only including pollsters with more than 20 historical polls).
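The pollster encoding in the second point can be sketched as follows. The function name and the pollster labels are hypothetical; the only detail taken from the text is that a pollster gets its own column when it has more than 20 historical polls.

```python
from collections import Counter


def one_hot_pollsters(pollsters, min_polls=20):
    """One-hot encode pollsters, keeping only those with more than min_polls polls."""
    counts = Counter(pollsters)
    kept = sorted(name for name, count in counts.items() if count > min_polls)
    rows = [[1 if pollster == name else 0 for name in kept] for pollster in pollsters]
    return kept, rows


# Made-up poll history: "A" has 21 polls, "B" has exactly 20, "C" has 3.
history = ["A"] * 21 + ["B"] * 20 + ["C"] * 3
columns, encoded = one_hot_pollsters(history)
print(columns)      # only "A" clears the threshold
print(encoded[0])   # a poll by "A"
print(encoded[-1])  # a poll by "C": all zeros, treated as a generic pollster
```

Polls from infrequent pollsters end up with an all-zero row, so the model falls back on their other attributes such as methodology and sample size.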

Once we're done with all this math, we create a series of different variables related to polling. We create the following:

  1. Bias-adjusted inverse-variance weighted mean estimate
  2. Bias-adjusted inverse-variance weighted lower estimate (95% CI)
  3. Bias-adjusted inverse-variance weighted upper estimate (95% CI)
  4. Unweighted mean estimate (a simple average of the polls)
  5. Unweighted lower estimate
  6. Unweighted upper estimate
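The six variables could be assembled along these lines. The text does not say how the 95% intervals are computed, so this sketch makes two labeled assumptions: the weighted estimate's standard error is taken as the square root of 1 / Σ(1/σ²), and the unweighted interval uses the sample standard deviation divided by √n. The function and key names are hypothetical.

```python
import math
from statistics import mean, stdev

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def polling_features(margins, biases, variances):
    """Weighted and unweighted poll estimates with assumed 95% intervals."""
    weights = [1.0 / v for v in variances]
    w_mean = sum(w * (x - b) for x, b, w in zip(margins, biases, weights)) / sum(weights)
    w_se = math.sqrt(1.0 / sum(weights))  # assumed standard error
    u_mean = mean(margins)
    # A single poll has no sample spread, so its interval collapses to a point.
    u_se = stdev(margins) / math.sqrt(len(margins)) if len(margins) > 1 else 0.0
    return {
        "weighted_mean": w_mean,
        "weighted_lower": w_mean - Z_95 * w_se,
        "weighted_upper": w_mean + Z_95 * w_se,
        "unweighted_mean": u_mean,
        "unweighted_lower": u_mean - Z_95 * u_se,
        "unweighted_upper": u_mean + Z_95 * u_se,
    }


# Three made-up polls with predicted biases and variances.
poll_features = polling_features([3.0, 5.0, -1.0], [1.0, 0.0, -2.0], [4.0, 16.0, 8.0])
for name, value in poll_features.items():
    print(name, round(value, 2))
```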

Obviously, this polling methodology is not perfect. While it did reduce our average error by 10%, it's only as good as the data it is given. Each year, the political environment of America finds new ways to mess up polling—and there's effectively nothing aggregators can do about it. What we can do is look to the past and take every bit of knowledge we can glean, and that's exactly what this change does!

Backtesting

Though we did technically create a BPR-affiliated model for the 2022 Senate elections, 24cast.org is the first iteration that utilizes the full extent of modern ML methods for prediction and interpretation. To ensure our model would perform well for 2024 elections, we could not rely on our past predictions as we didn't have any! As such, we decided to backtest on the past two elections and compare our results to other models. We tested our model several different times (effectively, selecting random elections from the pre-2020 training data, in a method called bootstrapping) and found that we were more accurate than every other election model 60% of the time in 2020 and 70% of the time in 2022. With the addition of 2022 training data, we expect our 2024 model to outperform our back-tested models. However, election prediction is a probabilistic endeavor, not a guarantee – which is why we have distributions of possible outcomes.

Predictions for 2024

Updating Data

Our model updates each day at midnight EDT with the latest data. Our predictions will constantly update as polls, campaign finance, and expert ratings change. Our model will finish updating on the day before the election. We use GitHub Actions and AWS with DynamoDB to gather new data and update our API, and R and Python to clean/analyze the incoming data while producing up-to-date predictions.

We will update the website with a purple banner at the top of the page whenever our predictions change significantly. This could be due to an influx of new polls, a change in expert ratings, or a new campaign finance report. We will also update the website with a banner when we release a new feature to our model, such as our campaign finance simulator.

Interpretation

To interpret our results, we used a method called SHAP, or SHapley Additive exPlanations (a weird acronym, we know). Shapley values originate from cooperative game theory, where a “game” involves a set of players who cooperate, and the goal is to fairly distribute the “payout” among them based on their contribution. The math of SHAP values is too detailed to include here, but it can be found in the original SHAP paper. In simple terms, a SHAP value for a single feature is determined by what the model would predict if that feature were not included in the prediction. SHAP values are especially useful for tree-based models like LightGBM. In our case, SHAP gives importance values for each feature and each prediction, allowing for easy interpretation of how features like campaign finance affect margins. We hope that our work to make our predictions interpretable gives our readers a better understanding of the factors that most heavily impact the results of elections.
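For a model with only a handful of features, Shapley values can be computed exactly by brute force, which makes the “what if this feature were not included” idea concrete. This sketch is not how the SHAP library works internally (it uses much faster algorithms for trees): here a missing feature is simply replaced by a baseline value, and the toy model, its coefficients, and the inputs are all invented.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values: average each feature's marginal effect over all orderings."""
    n = len(x)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)      # start with every feature "missing"
        previous = model(current)
        for i in order:
            current[i] = x[i]         # reveal feature i
            value = model(current)
            contributions[i] += value - previous
            previous = value
    return [c / len(orderings) for c in contributions]


def toy_model(features):
    """Made-up margin model with an interaction between polls and incumbency."""
    polls, finance, incumbent = features
    return 0.8 * polls + 2.0 * finance + 3.0 * incumbent * (polls > 0)


x = [4.0, 0.5, 1.0]          # hypothetical race: polls, finance ratio, incumbency
baseline = [0.0, 0.0, 0.0]   # assumed "feature absent" values
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The values always add up to the gap between the prediction for this race and the baseline prediction, and the incumbency bonus that only appears when the polls are favorable is shared equally between the two features that create it.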

SHAP values are also useful because they provide an easy way to understand how races are interlinked. When we have local importances for each race, we can determine the correlation between races. North Carolina and Georgia, for example, are very similar states and are thus closely correlated. SHAP agrees with this: past elections play a huge role in both states, with a similar magnitude and direction.
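One plausible way to turn local importances into a correlation is to compare the SHAP vectors of two races feature by feature. The sketch below uses a plain Pearson correlation for that; this choice, the race labels, and the SHAP numbers are all assumptions for illustration, since the text does not spell out the exact calculation.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up SHAP vectors: one value per feature, in the same feature order.
shap_by_race = {
    "race_a": [2.0, -1.0, 0.5, 3.0],
    "race_b": [1.8, -0.8, 0.4, 2.9],   # driven by the same features as race_a
    "race_c": [-3.0, 2.0, 0.1, -1.0],  # driven in the opposite direction
}

similar = pearson(shap_by_race["race_a"], shap_by_race["race_b"])
opposite = pearson(shap_by_race["race_a"], shap_by_race["race_c"])
print(round(similar, 3), round(opposite, 3))
```

Races whose predictions are pushed by the same features in the same direction come out close to 1, which is the kind of pairing the North Carolina and Georgia example describes.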

Learn More

While we aim to be as transparent as possible, there were so many minute issues we faced as we created this model that describing all of them would take an entire book. If you're interested in looking at our code, check out our GitHub! If you have more specific math questions (or if you're confused by any of our code) feel free to reach out to our team. We're always willing to nerd out about data and politics, so we'll try and respond ASAP!

Changelog

September 11: Consistent website visitors (or viewers of our new historical graphs) may notice a change of our predictions on September 11. In preparation for the final 50 days, we've made 3 under-the-hood changes that have especially impacted our predictions for the Presidential and House elections. As a fully open-source model, we want to make sure everybody knows exactly what these changes are and why we made them. We're particularly proud of the first of these changes, which was a multi-month, combined effort of mathematics specialists designing a method to get the maximum value out of every single poll.

Poll Averaging:

We’ve completely revamped our polling averaging methodology. Read the Polls section of this methodology for a detailed explanation of our new methods and their mathematical underpinnings.

Generic Ballot (and regex…):

When we searched through our codebase again to remove any possible problems, we noticed a bug in one of our regular expressions: it used 2020 generic ballot results instead of 2022 generic ballot results for some House races. This meant that we erroneously underestimated the strength of strong House incumbents. The combination of this and our improved polling averages pushed our predictions for some House races a few points to the left. We now think Democrats are clear favorites to take the House.

Candidates that dropped out:

Some candidates, such as Chris Sununu (NH-R), recently announced they would not seek reelection despite having the opportunity to. We've removed incumbency status from these races. This affected around 20 total races across all Senate, House, and Governor races.

August 20: Our team identified that, though we had filtered Vice President Harris from state-specific polls before President Biden's withdrawal from candidacy, generic ballot polls for Harris pre-withdrawal remained in our dataset. We have updated our codebase to remove all pre-withdrawal presidential polls—both state-specific and generic. This change and an influx of new polls during and directly before the DNC have resulted in a significant shift leftward for congressional and presidential races.

August 3: Previously, our model predicted the outcome of the upcoming November 5 election by incorporating uncertainty into the polls to account for increased unpredictability as we move further from the election date. However, we have decided to eliminate this added uncertainty and instead predict the election results as if the election were held today. This change minimizes assumptions and more accurately reflects the output of our machine learning algorithm without adding uncertainty.