Class on February 1, 2018

Rob provided an overview of a useful online R versus Python debate:

Python vs R
  • R and Python are the two most popular programming languages used by data analysts and data scientists.
  • Both are free and open source (and available for use in a Jupyter Notebook), and both were developed in the early 1990s.
  • R is for statistical analysis (ad hoc analysis and exploring datasets).
  • Python is a general-purpose language (data manipulation and repeated tasks).
  • R has a steep learning curve, and people without programming experience may find it overwhelming.
  • Python is generally considered easier to pick up.
Working iteratively in a Jupyter Notebook relies on a handful of basic commands:
  • restart the kernel
  • interrupt the kernel
  • run selected cell
  • move selected cells down
  • copy selected cells
  • paste cells below
  • move selected cells up
  • cut selected cells
  • insert cell below
  • Save and Checkpoint
Rob then spent the rest of class illustrating how the Google Sheets exercises from the week before could be performed more efficiently with code in a Python Jupyter Notebook.

Rob emphasized the importance of good file management (ocg250, data, code, output) and noted that people often come to regret not setting it up mindfully from the start.
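
As a minimal sketch of that file layout, the folders could be created from a notebook cell with Python's standard pathlib module. The notes do not say how the four names are arranged, so this assumes ocg250 is the top-level course folder with data, code, and output inside it; the function name make_project_folders is made up for this illustration.

```python
from pathlib import Path


def make_project_folders(root, subfolders=("data", "code", "output")):
    """Create `root` and one subfolder per name; return the subfolder paths.

    Assumption: data, code, and output live directly inside the course
    folder. Adjust `subfolders` if your layout differs.
    """
    root = Path(root)
    created = []
    for name in subfolders:
        folder = root / name
        # parents=True also creates `root` if it is missing;
        # exist_ok=True makes the cell safe to re-run.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling make_project_folders("ocg250") creates the folders relative to the notebook's working directory. Because existing folders are left untouched, the cell can be re-run after restarting the kernel without harm.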

A Python Notebook tutorial along the lines of what students did in class is here: