Using the Jupyter Notebook with an EC2 instance lets us analyze datasets that would be difficult or impossible to handle on our local machines. The most powerful (from a memory perspective) EC2 instance has 244 GB of RAM and 32 cores and often costs only 50 cents/hr.
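This tutorial explains how to install Jupyter on an EC2 instance, how to configure the notebook server, and how to work with spot instances.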
This post assumes you are already familiar with the basics of EC2. You should know how to start an instance and SSH into it.
Download the installer on your remote host with
wget link
and follow the directions on the page. The installation will ask whether to add conda to your PATH in .bashrc. Type in yes, then reload .bashrc (type . .bashrc).
The Jupyter Documentation provides an excellent walkthrough on configuring the server. The tl;dr is:
Run
jupyter notebook --generate-config
in order to generate the configuration file.
Start a Python session, run
from notebook.auth import passwd; passwd()
and then enter the password you want to use to access the notebook. Remember the password and copy the hash you get back.
Generate an SSL certificate using the following code:
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mycert.key -out mycert.pem
Open the configuration file you generated (~/.jupyter/jupyter_notebook_config.py by default) with your favorite text editor and add the following settings, pasting in the hash generated by passwd():
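c = get_config()
# Salted hash returned by passwd(); replace it with your own
c.NotebookApp.password = u'sha1:67c9e60bb8b6:9ffede0825894254b2e042ea597d771089e11aed'
# Listen on all network interfaces, not just localhost
c.NotebookApp.ip = '*'
# Do not try to open a browser on the remote host
c.NotebookApp.open_browser = False
# Port the notebook server listens on
c.NotebookApp.port = 9999
# SSL certificate and key generated with openssl (adjust the paths to wherever you saved them)
c.NotebookApp.certfile = u'/home/ubuntu/mycert.pem'
c.NotebookApp.keyfile = u'/home/ubuntu/mycert.key'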
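The password value in the config has three colon-separated parts: the hash algorithm, a random salt, and the hex digest. To make that format less mysterious, here is a minimal sketch of how such a string can be built and checked using only the standard library. The names make_hash and check_hash are hypothetical, and the sketch assumes the digest is computed over the passphrase followed by the salt, which is my understanding of what passwd() does. For the real config, keep using the hash that passwd() itself returns.

```python
import hashlib
import hmac
import secrets


def make_hash(passphrase, algorithm="sha1", salt_len=12):
    """Return a string of the form 'algorithm:salt:digest' (illustrative only)."""
    # token_hex(n) returns 2*n hex characters, so this yields salt_len characters
    salt = secrets.token_hex(salt_len // 2)
    # Assumption: the digest covers the passphrase followed by the salt
    digest = hashlib.new(algorithm, (passphrase + salt).encode("utf-8")).hexdigest()
    return ":".join((algorithm, salt, digest))


def check_hash(hashed, passphrase):
    """Return True if passphrase reproduces the digest stored in hashed."""
    try:
        algorithm, salt, digest = hashed.split(":", 2)
    except ValueError:
        # Not in 'algorithm:salt:digest' form
        return False
    expected = hashlib.new(algorithm, (passphrase + salt).encode("utf-8")).hexdigest()
    # Constant-time comparison of the two hex digests
    return hmac.compare_digest(expected.encode("utf-8"), digest.encode("utf-8"))
```

Calling make_hash("my password") gives a different string every time because the salt is random, which is why you copy the hash rather than retype it.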
In .bashrc, make the following alias:
alias jupyter='nohup jupyter notebook &'
Then reload .bashrc again. Because the alias uses nohup, your Jupyter session is also persistent: it will not stop running just because the SSH session is closed.
Now you can type jupyter in the terminal and the server should run. Then browse to https://ip_address:9999 and log in with the password you entered.
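If the page does not load, it helps to know whether the port is reachable at all before digging into the Jupyter config. Here is a small sketch that tries to open a TCP connection from your local machine. The function name port_is_open is hypothetical, and ip_address stands in for your instance's address as above. A False result can mean the server is not running, or that something between you and the instance (for example the instance's firewall rules) is blocking the port.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False


if __name__ == "__main__":
    # Replace ip_address with the public address of your instance
    print(port_is_open("ip_address", 9999))
```

This only tests that something accepts connections on the port; it does not check the certificate or the password.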
Spot instances are much cheaper than on-demand instances. The downside is that AWS can reclaim your instance when demand for spot capacity gets too high.
The best way to deal with the transient nature of spot instances is to use the following workflow:
This way we minimize the amount of time that we have to run the EC2 instance, and we don’t have to worry about losing any code because it is all on the local host.
If you end up writing novel code on your EC2 instance, I recommend keeping your code synced with GitHub.