Instabase Drive


Python_SQL_R_OnYourLaptop.ipynb

Python, SQL, and R on Your Laptop

How to do data processing, analysis, visualization, and machine learning, all on your own computer!

Steven Chen (stevenzc@stanford.edu)

We've seen how to use three different languages to work with big data: Python, SQL, and R. In this class, we've worked with Jupyter Notebooks in Instabase: this certainly has a lot of benefits:

  • No need to install anything on your computer.
  • All files are safe on the Instabase Drive.
  • Can access notebooks from any computer.

However, in practice, many of us work locally, on our own computer. This has a lot of benefits, is easy to set up, and is fun! This notebook walks through how to install and work with Python, SQL, and R on your own laptop.

Why would I want to run programs on my laptop?

Well, many reasons, actually!

  • Working with files can be much easier (no more uploading and downloading files)
  • Programs will run faster (Instabase uses very low-power machines to save money, your laptop is probably faster!)
  • Can easily run your own programs and others' programs

Part 1: Python using Anaconda

To run Python (in both Jupyter Notebooks and regular Python .py files), we'll install Anaconda, the most popular Python software for data scientists.

Anaconda is a great Python distribution that includes all the Python packages we've gotten to know and love (Pandas, Numpy, Scikit-learn, etc.) Anaconda also includes Jupyter Notebook, which allows you to create and run notebooks on your laptop!

  1. Download Anaconda from here: https://www.anaconda.com/distribution/. Download the Python 2.7, 64-bit Graphical Installer for your operating system.

  2. Install Anaconda. On Windows computers, this will install both Anaconda Navigator (a graphical interface where we can launch things like Jupyter Notebook), as well as Anaconda Prompt (a command line where we can install packages). On Mac, this will install Anaconda Navigator and Python inside your Terminal.

  3. Launch Anaconda Navigator. Find the button called Jupyter Notebook and click Launch. A Jupyter Notebook file browser should open up in your web browser! From this interface, you can create new notebooks, and open old ones.

Note: Our class's code will run perfectly on your computer, as long as you remember to do the following:

  • Put the data files (ex. Cities.csv, Players.csv) in the same folder as your .ipynb notebook that you create, so that the notebook can find the data.

Anaconda Navigator

Anaconda Navigator

Jupyter Browser

Jupyter Notebook Browser. You can create a new notebook using the New button at the top right.

Bonus: How to write and run a Python .py file

Many Python programs are contained in text files, called Python scripts. These are regular text files containing lines of Python code, just like the cells of Jupyter Notebooks. When we run a Python script, the program runs from top to bottom.

Anaconda comes with an IDE (integrated development environment) called Spyder, which allows you to create, edit, and run Python programs. You can launch it by opening Anaconda Navigator and clicking the square named Spyder.

Spyder IDE

The Spyder IDE. You can create and edit Python .py files on the left, and type in and run single lines of Python on the console in the bottom right. Variables that currently exist in the console are displayed in the top right.

Part 2: SQL using Anaconda and Jupyter Notebook

If you'd like to run SQL queries on a dataset, you can actually use Jupyter Notebook in Anaconda, just like on Instabase! Before we do so, we'll need to install one package:

WINDOWS INSTRUCTIONS:

  1. Find the Anaconda Prompt on your computer and launch it. You should get a black text window.

  2. Type in the following (without quotes) and press enter: pip install ipython-sql. If the command prompts for y/n, type in Y and press enter. Wait for the command to finish running. (This may take a while!)

  3. Create a new Jupyter Notebook using the Part 1 instructions and copy the first couple cells of a SQL notebook (ex. SQLAdvanced.ipynb) into your notebook, and run them. Replace the name of the file and database with your own. You should be good to go!

MAC INSTRUCTIONS:

  1. Launch Terminal. You should get a text window.

  2. Type in the following (without quotes) and press enter: conda install -c conda-forge ipython-sql. If the command prompts, type in Y and press enter. Wait for the command to finish running. (This may take a while!)

  3. Create a new Jupyter Notebook using the Part 1 instructions and copy the first couple cells of a SQL notebook (ex. SQLAdvanced.ipynb) into your notebook and run them. If they run without errors, you're good to go!

Note: This is the same process you would take if you'd like to install other packages (ones that Anaconda doesn't include). Find their name online and replace ipython-sql with the name of the package you would like to use!

Part 3: R using RStudio

If you like the R language and would like to use it more, we'd recommend installing R and RStudio on your computer. RStudio is an IDE for R (like Spyder for Python) that makes it easy to load data and run programs interactively in R.

  1. First install the R language for your operating system at this link: https://cran.cnr.berkeley.edu/
  2. Next, install RStudio for your operating system from here: https://www.rstudio.com/products/rstudio/download/#download (Choose the Installer, not the Zip)

Spyder IDE

The RStudio IDE.

Lots of great tutorials on R and RStudio exist online. Here's a good starting point if you'd like to see what RStudio is capable of: http://web.cs.ucla.edu/~gulzar/rstudio/basic-tutorial.html

Bonus: How was this presentation made?

This Jupyter Notebook is filled with Markdown, a document writing language! Double click any cell to see what it looks like, and Shift+Enter to make it look pretty. When making a cell, select Markdown instead of Jupyter to make your own.

In [ ]: