Over the past couple of years Netflix and other tech firms have been enormously successful producing TV series for their subscribers. What would this strategy look like in the case of movies? I use data publicly available from Box Office Mojo and IMDB to find the films that are most likely to have a successful sequel.
For subscription services like HBO and Netflix, focusing on series rather than movies makes intuitive sense: both series and subscriptions recur on a regular schedule. Despite this intuition, history suggests that people are willing to subscribe for movies as well. HBO relied solely on movies for the first 20 years of its existence.
Producing original content can also make a lot of sense. An expensive series like House of Cards costs about $60 million a season. Netflix has 33.3 million subscribers, and each new subscriber pays about $120 a year, so Netflix only has to increase its subscriber base by 1.5% to break even. In the case of House of Cards, it seems likely that Netflix made a good profit on the series.
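costs = 50e6; subscriber_cost = 120  # season cost and annual subscription price, in USD
sub_rev = subscriber_cost * 33.3e6  # annual revenue from the current subscriber base
print("${:,}".format(int(sub_rev * 0.015)))  # revenue added by 1.5% more subscribers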
$59,940,000
The break-even point for movies can be much lower. Summer blockbusters may attract the most attention with their eye-watering budgets, but many movies cost $25 million or less. With the same subscriber figures, a $25 million film would break even by increasing the subscriber base by about 0.6%.
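The break-even arithmetic generalizes to any production cost. The sketch below uses the figures quoted in this post (33.3 million subscribers at $120 a year); the function name `break_even_growth` and its parameters are my own illustration, and it assumes that one year of revenue from new subscribers is the only benefit counted.

```python
def break_even_growth(cost, subscribers=33.3e6, annual_fee=120.0):
    """Fraction by which the subscriber base must grow so that one year of
    new-subscriber revenue covers `cost` (all money in USD).

    Assumption: every new subscriber pays `annual_fee` for a full year.
    """
    if subscribers <= 0 or annual_fee <= 0:
        raise ValueError("subscribers and annual_fee must be positive")
    return cost / (subscribers * annual_fee)


for label, cost in [("series season", 60e6), ("mid-budget film", 25e6)]:
    print("{}: {:.2%}".format(label, break_even_growth(cost)))
```

Under these assumptions the season needs roughly 1.5% growth and the film roughly 0.6%.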
My investigation yields the following relevant results:
Finding the most promising remake candidates requires understanding the movie industry better. I use ROI to measure the success of a movie because it is an excellent proxy for how many people viewed the movie.
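For readers who want something concrete, here is a minimal sketch of one common ROI definition. This is an assumption on my part for illustration: the appendix describes how ROI was actually computed for this analysis, and that definition may differ (for example by accounting for marketing costs or the studio's share of ticket sales). The numbers in the example are hypothetical, not taken from the dataset.

```python
def roi(gross, budget):
    """Return on investment as a multiple of the budget.

    ASSUMPTION: ROI = (gross - budget) / budget. This is a common textbook
    definition, not necessarily the one used in the analysis.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    return (gross - budget) / budget


# Hypothetical film: $25 million budget, $75 million gross.
print(roi(gross=75e6, budget=25e6))  # 2.0, i.e. a 200% return
```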
I train a linear regression model that predicts ROI from a basket of features retrieved from IMDB and Box Office Mojo. I divide my dataset of sequels into a training set (85% of the data) and a test set (15% of the data). I then use a linear regression with L1 regularization to find the factors most strongly correlated with ROI. My model assumes that there are no confounders ($E[\epsilon \mid X] = 0$). This is a major limitation of my model.
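To make the procedure concrete, here is a minimal pure-Python sketch of an 85/15 split followed by a linear regression with an L1 penalty (the lasso), fitted by coordinate descent. It is not the code used for this post: the function names, the penalty strength, and the synthetic data are all assumptions for illustration. In practice one would standardize the features and use a library implementation.

```python
import random


def train_test_split(rows, test_fraction=0.15, seed=0):
    """Shuffle rows and hold out `test_fraction` of them as a test set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`; this produces exact zeros."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(X, y, penalty, n_iter=200):
    """Coordinate descent for (1/2n) * sum(residual^2) + penalty * sum(|beta_j|).

    X is a list of feature rows, y a list of targets. The intercept is
    not penalized.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    intercept = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = norm = 0.0
            for xi, yi in zip(X, y):
                fitted = intercept + sum(b * v for b, v in zip(beta, xi))
                # Residual with feature j's current contribution added back.
                partial_residual = yi - fitted + beta[j] * xi[j]
                rho += xi[j] * partial_residual
                norm += xi[j] ** 2
            beta[j] = soft_threshold(rho / n, penalty) / (norm / n) if norm else 0.0
        # Re-centre the intercept on the mean residual.
        intercept = sum(
            yi - sum(b * v for b, v in zip(beta, xi)) for xi, yi in zip(X, y)
        ) / n
    return intercept, beta


# Synthetic stand-in data (NOT the post's dataset): the target depends on the
# first feature only, so the second coefficient should be driven to zero.
rng = random.Random(1)
rows = []
for _ in range(200):
    x0, x1 = rng.gauss(0, 1), rng.gauss(0, 1)
    rows.append(([x0, x1], 3.0 * x0 + rng.gauss(0, 0.1)))

train, test = train_test_split(rows)
X_train = [x for x, _ in train]
y_train = [t for _, t in train]
intercept, beta = lasso_fit(X_train, y_train, penalty=0.5)
print(len(train), len(test), beta)
```

The L1 penalty is what sets uninformative coefficients to exactly zero, which is why it is useful for picking out the few factors that matter.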
More or less every regression looks like
I find that only budget, audience rating, and being released in October have a statistically significant effect on ROI. Surprisingly, neither the quality nor the box office of the original film seems to influence the results. My analysis suggests that it is difficult to predict the success of a movie ex ante.
Factor | Change in ROI |
---|---|
Released in Month of October | |
Budget 10% | |
Audience Rating 1% | 6.3% |
Runtime | 0* |
Genre | 0* |
Days since sequel | 0* |
Months other than October | 0* |
MPAA Rating | 0* |
Critic rating | 0* |
Change in PB from original | 0* |
Audience rating of original | 0* |
* Not statistically significantly different from 0
Keeping the above in mind, these are the films that have the best chance to succeed. As many films (~200) satisfy all 4 conditions above, films are ranked by their rule-of-thumb ROI.
See the appendix and accompanying code here. The appendix contains information like how I computed ROI as well as various limitations of the model and more descriptive statistics.
Written on October 22nd, 2015 by Will Sorenson