Will Sorenson Machine Learning and Coding

In Search of the Most Promising Movie Remakes or Sequels

Over the past couple of years, Netflix and other tech firms have been enormously successful at producing TV series for their subscribers. What would this strategy look like for movies? I use publicly available data from Box Office Mojo and IMDB to find the films that are most likely to have a successful sequel.

Movies vs TV Series

For subscription services like HBO and Netflix, focusing on series rather than movies makes intuitive sense: both series and subscriptions recur on a regular schedule. Despite this intuition, history suggests that people are also willing to subscribe for movies. HBO relied solely on movies for the first 20 years of its existence.

Producing original content can also make a lot of sense. An expensive series like House of Cards costs about $60 million a season. Netflix has 33.3 million subscribers, and each new subscriber pays about $120 a year, so Netflix only has to grow its subscriber base by about 1.5% to break even. In the case of House of Cards, it seems likely that Netflix made a good profit on the series.

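costs = 60e6  # approximate cost of one season of House of Cards, in dollars
subscriber_price = 120  # annual revenue per subscriber, in dollars
sub_rev = subscriber_price * 33.3e6  # annual revenue from 33.3 million subscribers
print("{:.1%}".format(costs / sub_rev))  # growth needed to break even: 1.5%
print("${:,}".format(round(sub_rev * 0.015)))  # revenue from 1.5% more subscribers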

The break-even point for movies can be much lower. Summer blockbusters attract the most attention with their eye-watering budgets, but many movies cost $25 million or less. A $25 million film would break even if it increased the subscriber base by about 0.6%.
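The same break-even arithmetic can be wrapped in a small function and applied to both cases. This is a minimal sketch that reuses the figures from the text (33.3 million subscribers paying $120 a year) and assumes, as the text does, that each new subscriber pays for one full year with no churn; `breakeven_growth` is a hypothetical helper name.

```python
SUBSCRIBERS = 33.3e6  # Netflix subscriber count used in the text
ANNUAL_PRICE = 120    # dollars per subscriber per year, as assumed above


def breakeven_growth(cost, subscribers=SUBSCRIBERS, annual_price=ANNUAL_PRICE):
    """Fractional subscriber growth whose extra annual revenue covers `cost`.

    Assumes each new subscriber pays `annual_price` for one full year and
    ignores churn, discounting, and any other costs.
    """
    annual_revenue = subscribers * annual_price
    return cost / annual_revenue


for label, cost in [("House of Cards season", 60e6), ("$25 million film", 25e6)]:
    print(f"{label}: {breakeven_growth(cost):.2%}")
```

This prints 1.50% for the series and 0.63% for the film; the break-even growth scales linearly with the production cost.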

Stylized Facts

My investigation yields the following relevant results:

  1. The single most important factor is being beloved by the audience.
    • A 1% increase in audience rating is associated with a 6% increase in ROI.
    • Critic ratings are unimportant.
  2. Though budget does a great job of predicting box office sales, it is negatively correlated with ROI.
    • A 10% increase in production budget is associated with a 5% decrease in ROI.
  3. High-cost films are much more unpredictable than low-cost films.
  4. Audience reviews of the original do not predict audience reviews of the sequel.
  5. Month of release is unimportant once controlling for the above, except for October.
  6. Sequels have a high ROI.
    • Gross on average 5 times their production budgets
    • Under traditional assumptions: 278% ROI
    • Under realistic assumptions: 112% ROI


Finding the most promising remake candidates requires understanding the movie industry better. I use ROI to measure the success of a movie because it is an excellent proxy for how many people viewed the movie.

I train a linear regression model that predicts ROI from a basket of features retrieved from IMDB and Box Office Mojo. I divide my dataset of sequels into a training set (85% of the data) and a test set (15%). I then fit a linear regression with L1 regularization to find the factors most strongly correlated with ROI. My model assumes that there are no confounders ($E[\epsilon \mid X] = 0$). This is a major limitation of my model.
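The estimation step can be sketched without third-party libraries. The block below is a minimal sketch, not the author's code: it fits a Lasso (least squares with an L1 penalty) by cyclic coordinate descent on made-up data, after an 85/15 train/test split. The feature names, the penalty `lam=0.2`, and the data-generating coefficients are all hypothetical.

```python
import random
import statistics

random.seed(0)

# Made-up data, purely to exercise the code: 200 films with four hypothetical
# features. Only the first three affect the target; "runtime" has no effect.
FEATURES = ["log_budget", "audience_rating", "october", "runtime"]
TRUE_W = [-0.5, 2.0, 0.3, 0.0]
rows = [[random.gauss(0, 1) for _ in FEATURES] for _ in range(200)]
target = [1.0 + sum(c * x for c, x in zip(TRUE_W, r)) + random.gauss(0, 0.3) for r in rows]

# 85% / 15% train/test split after shuffling the row indices.
idx = list(range(len(rows)))
random.shuffle(idx)
cut = int(0.85 * len(idx))
train, test = idx[:cut], idx[cut:]

# Standardize every feature using the training set's mean and std only.
means = [statistics.fmean(rows[i][j] for i in train) for j in range(len(FEATURES))]
stds = [statistics.pstdev([rows[i][j] for i in train]) for j in range(len(FEATURES))]


def scale(r):
    return [(x - m) / s for x, m, s in zip(r, means, stds)]


X_train = [scale(rows[i]) for i in train]
y_train = [target[i] for i in train]
X_test = [scale(rows[i]) for i in test]
y_test = [target[i] for i in test]


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for
    (1 / (2n)) * sum((y - b0 - X w)^2) + lam * sum(|w_j|),
    assuming the columns of X have mean 0 and variance 1 (as X_train does)."""
    n, p = len(X), len(X[0])
    b0 = statistics.fmean(y)  # unpenalized intercept; exact for centered columns
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - b0 - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            w[j] = soft_threshold(rho, lam)  # no division needed: mean(x_j^2) == 1
    return b0, w


b0, w = lasso(X_train, y_train, lam=0.2)
test_mse = statistics.fmean(
    (yi - b0 - sum(wj * xj for wj, xj in zip(w, xi))) ** 2 for xi, yi in zip(X_test, y_test)
)
for name, coef in zip(FEATURES, w):
    print(f"{name:>16}: {coef:+.3f}")
print(f"test MSE: {test_mse:.3f}")
```

On this toy data the `runtime` coefficient is typically driven to exactly zero while the other coefficients are shrunk toward zero, which is the feature-selection behavior the L1 penalty provides. A library implementation such as scikit-learn's `Lasso`, with the penalty chosen by cross-validation, would be the usual choice on real data.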

Important Relationships

[Figure: the important relationships. More or less every regression looks like this; everything else is not significant.]

Important Features

I find that only budget, audience rating, and being released in October have a statistically significant effect on ROI. Surprisingly, neither the quality nor the box office of the original film seems to influence the results. My analysis suggests that it is difficult to predict the success of movies ex ante.

Factor                                       Change in ROI
Released in October
Budget (+10%)                                -5%
Audience rating (+1%)                        +6.3%
Months other than October                    0*
Runtime                                      0*
Genre                                        0*
Days since sequel                            0*
MPAA rating                                  0*
Critic rating                                0*
Change in production budget from original    0*
Audience rating of original                  0*

* Statistically insignificantly different from 0
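The budget and audience-rating rows pair a percentage change in the factor with a percentage change in ROI, which is how a log-log regression coefficient (an elasticity) is usually read. Assuming, which the post does not state, that ROI and these factors enter the model in logs, the reported numbers imply the slopes computed below; `implied_elasticity` and `roi_change` are hypothetical helpers.

```python
import math


def implied_elasticity(pct_change_x, pct_change_roi):
    """Exponent b such that (1 + dx)^b == 1 + droi, i.e. the log-log slope."""
    return math.log1p(pct_change_roi / 100) / math.log1p(pct_change_x / 100)


def roi_change(pct_change_x, b):
    """Percent change in ROI implied by a pct_change_x% change in x under slope b."""
    return 100 * ((1 + pct_change_x / 100) ** b - 1)


b_budget = implied_elasticity(10, -5)   # about -0.54
b_rating = implied_elasticity(1, 6.3)   # about 6.1
print(f"budget elasticity: {b_budget:.2f}")
print(f"audience-rating elasticity: {b_rating:.2f}")
# What a 20% larger budget would imply under the same slope.
print(f"+20% budget -> {roi_change(20, b_budget):+.1f}% ROI")
```

For small changes the exact slope is close to the simple ratio of the two percentages (-5 / 10 = -0.5 for budget), so either reading gives roughly the same picture.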

What Kinds of Movies Should be Remade?

  1. Lower risk: focus on cheaper films
    • Less likely to have a massively negative ROI.
    • Able to produce many cheap films rather than a few expensive films.
  2. Release the movie in October
    • Unmet demand before the busy months of November and December?
  3. Do a good job on the sequel
    • Audience opinion of a sequel is not correlated with the rating of the original film.
  4. Keep it short
    • Runtime is not associated with ROI, so there is no return to making a film longer.
    • Do not take this logic too far.

The Most Promising Sequels

Keeping the above in mind, these are the films with the best chance to succeed. Because many films (~200) satisfy all four conditions above, I rank them by their rule-of-thumb ROI.

  1. Gone with the Wind
  2. Juno
  3. Crocodile Dundee
  4. Blair Witch Project
  5. The Rocky Horror Picture Show
  6. National Lampoon’s Animal House
  7. Platoon
  8. Fahrenheit 9/11
  9. Magic Mike
  10. The Usual Suspects
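The filter-then-rank step behind this list can be sketched as follows. This is a hypothetical illustration, not the author's code: the `Film` record, its field names, the $25 million budget cut-off (taken from the earlier discussion of cheap films), and the 120-minute runtime cut-off are all assumptions, and the ROI field is only a placeholder for the rule-of-thumb ROI defined in the appendix.

```python
from dataclasses import dataclass


@dataclass
class Film:
    """Hypothetical record for an original film; not the author's schema."""
    title: str
    budget: float          # production budget, in dollars
    runtime_min: int       # runtime, in minutes
    thumb_rule_roi: float  # placeholder for the appendix's rule-of-thumb ROI


def is_candidate(film, max_budget=25e6, max_runtime=120):
    """Hypothetical cut-offs standing in for 'cheap' and 'short'."""
    return film.budget <= max_budget and film.runtime_min <= max_runtime


def rank_candidates(films, top_n=10):
    """Keep films that pass the filter, then sort by ROI, highest first."""
    passing = [f for f in films if is_candidate(f)]
    return sorted(passing, key=lambda f: f.thumb_rule_roi, reverse=True)[:top_n]


# Placeholder films with made-up numbers, only to show the call pattern.
films = [
    Film("Film A", 10e6, 100, 5.0),
    Film("Film B", 80e6, 100, 9.0),   # fails the budget filter
    Film("Film C", 5e6, 95, 12.0),
    Film("Film D", 20e6, 150, 20.0),  # fails the runtime filter
]
for rank, film in enumerate(rank_candidates(films), start=1):
    print(rank, film.title)
```

Filtering first and sorting second keeps the ranking criterion (ROI) separate from the eligibility rules, so either can be changed without touching the other.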


See the appendix and accompanying code here. The appendix explains how I computed ROI and covers the model's limitations, along with more descriptive statistics.