Objectives

  1. What You Will Learn
  2. How will you succeed in this course?

What You Will Learn

Understand the importance of data and programming for data science

  • Understand the relationship between data and data science.
  • Understand how data is related to programming.
  • Know broadly what kinds of data exist.

Confidently work in an appropriate programming environment

  • Confidently write code in IDEs and at the command line.
  • Be exposed to Visual Studio Code, Jupyter Notebooks, and RStudio.
  • Understand which editor is appropriate to which task.
  • Find and use documents, data, and code online.

Recognize the importance of source control and how it is used

  • Understand how to initialize a git repository and commit items to it.
  • Grasp the relationship between a local git repository and a remote GitHub repository.
  • Know how to pull from, push to, and submit pull requests (PRs) for shared repositories.

Identify and use data types and data structures

  • Know the elementary data types for each language:
    • booleans, integers, floats, strings, etc.
  • Know the elementary data structures for each language:
    • Python: set, list, dictionary, and tuple.
    • R: vector, list, matrix, and factor.
  • Know some of the Python Scientific Stack:
    • NumPy
    • Pandas
  • Know and perform basic operations for each data type and structure.
  • Select and apply an appropriate data structure based on the problem requirements.
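
As a rough illustration of the four Python structures named above, the following sketch shows one typical use of each. The variable names and values are hypothetical, chosen only for this example, and are not taken from the course materials.

```python
# Hypothetical sample values, used only for illustration.
temps_list = [21.5, 19.0, 21.5, 23.1]   # list: ordered, mutable, allows duplicates
unique_temps = set(temps_list)          # set: unordered, keeps one copy of each value
station = ("CHO", 38.03, -78.48)        # tuple: ordered and immutable
readings = {"CHO": temps_list}          # dictionary: maps a key to a value

temps_list.append(18.2)                 # a list can grow in place
code, lat, lon = station                # a tuple unpacks into separate names
count_unique = len(unique_temps)        # the set collapsed the repeated 21.5
latest = readings["CHO"][-1]            # the dictionary holds the same list object
```

Choosing among these usually comes down to whether order matters, whether the contents must change, and whether items are looked up by position or by key.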

Read and write to and from various data formats

  • Read text and data files from disk.
  • Import data into a Pandas dataframe.

Confidently call and write functions and methods

  • Understand the structure and use of functions for programming.
  • Use built-in and imported functions to perform fundamental tasks.
  • Correctly pass parameters and retrieve function output(s).
  • Use built-in object methods for data types and structures, e.g. string methods and dataframe methods.
  • Know what vectorized functions and methods are.
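
To make passing parameters, retrieving multiple outputs, and calling built-in string methods concrete, here is a minimal sketch. The function name `summarize` and its `precision` parameter are hypothetical and exist only for this example.

```python
def summarize(values, precision=2):
    """Return the minimum, maximum, and rounded mean of a non-empty sequence."""
    mean = sum(values) / len(values)
    # Returning several values packs them into a tuple.
    return min(values), max(values), round(mean, precision)


# Positional argument for values, keyword argument for precision.
low, high, avg = summarize([3, 5, 10], precision=1)

# String methods can be chained because each one returns a new string.
label = "  Mean Value ".strip().lower().replace(" ", "_")
```

Here the three returned values are unpacked into `low`, `high`, and `avg` in a single assignment.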

Confidently write a class and call its methods

  • Understand the role of classes in organizing code.
  • Understand how classes group together variables as attributes and functions as methods into encapsulated components.
  • Understand how classes can inherit the variables and methods of other classes.
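
The ideas above (attributes, methods, and inheritance) can be sketched in a few lines of Python. The class names `Dataset` and `LabeledDataset` and the sample values are hypothetical, invented for this example only.

```python
class Dataset:
    """Groups data (attributes) with the functions that act on it (methods)."""

    def __init__(self, name, rows):
        self.name = name   # attribute
        self.rows = rows   # attribute

    def size(self):
        return len(self.rows)

    def describe(self):
        return f"{self.name}: {self.size()} rows"


class LabeledDataset(Dataset):
    """Inherits name, rows, and size() from Dataset and adds labels."""

    def __init__(self, name, rows, labels):
        super().__init__(name, rows)   # reuse the parent initializer
        self.labels = labels

    def describe(self):
        # Extend the parent's method rather than rewriting it.
        return f"{super().describe()}, {len(set(self.labels))} classes"


data = LabeledDataset("sample", [[5.1, 3.5], [6.2, 2.9]], ["a", "b"])
summary = data.describe()
```

The child class calls `size()` without defining it, which is the inheritance the last bullet describes.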

Use packages to augment existing data structures

  • In Python, learn NumPy and Pandas essentials (e.g. simple queries and small ML computations).
  • In Python and R, use a program API to call existing functions (e.g. assert statements).

Write your own modules of classes in Python

  • Write classes and organize them into modules to make your code more modular.
  • Make your modules sharable so that others can install them with Python’s setup and install functions.
  • Write documentation for your modules so that others can make sense of them.
  • Write test scripts to go with your modules.

How will you succeed in this course?

Participate. You are expected to participate actively in the course based on your own learning goals. Since you all come from different backgrounds and experiences of data science, your peers are valuable resources for learning. Don’t shortchange them and yourself by coming to class without preparing or by sitting quietly during class discussion.

Communicate. This course may be unlike any of your previous courses, with increasingly complex content and new kinds of technical challenges. I am committed to helping you address these new challenges, and therefore have an open-door policy in addition to class and office hours; I will meet with you or respond to your email within 24 hours whenever possible. You should let me know what ideas and tools are challenging to you and how you are doing in the class. If you start this habit early in the semester, then I will be able to better tailor our activities to help you learn. If you’re not comfortable with email or office hours, then post a comment in Anonymous Feedback on the class Canvas site.

Take risks. Programming often requires personal judgments about what to include or ignore, which structural approach to follow, and/or how to interpret complex data. Sometimes the “right” answer is unknown, incomplete, or even wrong! Nobel Prize breakthroughs have often resulted from attempting to support a “best guess” with incomplete data or from finding evidence to explain an “experiment gone wrong.” You will be rewarded for going out on a limb to defend your ideas as long as your assumptions and decision‐making process are transparent in your answers. If you’re not sure how to start a problem, don’t be scared to defend your assumptions and go for it!



Copyright © 2023 Neal Magee. Distributed by an MIT license.

Page last modified: Aug 16 2021 at 09:57 AM.