Introduction

Programming for Data Science is an introductory course focused on essential programming concepts, structures, and techniques. Students will develop the skills to write efficient code, as well as to read, understand, and debug it. The course is primarily based on Python and covers its basic and intermediate programming features. Additionally, it introduces some of the fundamental Python libraries for Data Science, namely NumPy and Pandas. Given the importance of R in Data Science, a brief introduction to this language is also included. The course concludes with an introduction to the Command Line and GitHub, two essential tools commonly used by Data Scientists in their daily work.

Learning Objectives

  • Understand the importance of programming for data science.

  • Confidently work in one of the most common programming environments for Data Science: Jupyter Notebooks.

  • Identify and use data types and data structures.

  • Read data from and write data to various file formats.

  • Confidently call and write functions and methods.

  • Understand the fundamentals of object-oriented programming.

  • Use and develop your own modules and packages to encapsulate and efficiently organize code.
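
As a rough preview of what several of these objectives can look like in practice, here is a minimal Python sketch. It is not taken from the course material: the `Measurement` class, the `mean_value` function, and the sample readings are all made up for illustration. It uses only the standard library and keeps the CSV data in memory instead of writing a file.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Measurement:
    """A tiny class (object-oriented programming): one labeled reading.

    Hypothetical example type, not part of any course library.
    """
    label: str
    value: float


def mean_value(measurements):
    """Return the average of the `value` attribute over a list of Measurement objects."""
    return sum(m.value for m in measurements) / len(measurements)


# A list (data structure) of objects built from strings and floats (data types).
readings = [Measurement("a", 1.0), Measurement("b", 2.0), Measurement("c", 6.0)]

# Write the readings in CSV format to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["label", "value"])
for m in readings:
    writer.writerow([m.label, m.value])

# Read the CSV text back and rebuild the objects, converting text to floats.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
restored = [Measurement(row["label"], float(row["value"])) for row in rows]

print(mean_value(restored))  # prints 3.0
```

Moving the class and the function into their own `.py` file and importing them from a notebook would be one simple way to start organizing code into modules.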

How You Will Know You Are Learning

This course combines hands-on tutorials with in-class coding activities. Throughout all class sessions, you should expect to follow along on your device. During in-class coding sessions, you will spend your time working through problems on your computer. The best way to become comfortable with the material is through continuous practice. The goal of this active learning class is to provide an environment where you can practice, ask questions, and troubleshoot. Working in small groups is encouraged. Not every class will go perfectly, but week by week you should feel more comfortable with the material.

For this course, there is no need to install anything locally. All code will be executed on Rivanna through the Open OnDemand service provided by UVA, minimizing the burden and potential challenges associated with local installations. While you are welcome to attempt installing the tools on your own time, please note that the instructors will not provide technical assistance for this.

This course introduces students to fundamental coding languages and techniques in data science, with a primary focus on Python and, to a lesser extent, R. Popular packages such as NumPy, Pandas, and tidyverse will be covered. Additionally, command-line interface (CLI) skills and software management tools like Git and GitHub will be included.

For students new to programming, this course will provide an introduction to basic operations and data structures in two languages commonly used in data science. We will also discuss and practice good habits in software development and management. As you will learn, there are several other essential components “around” the coding process, such as setting up and working within different environments, using source control tools like Git, and handling various types of data. These skills will be vital in our work with Python and R and will undoubtedly serve you well in future courses.

Students who successfully complete this course will be prepared to confidently and efficiently handle many fundamental tasks in data science, including data capture, cleaning, manipulation, and visualization.

How will you succeed in this course?

Participate. You are expected to actively participate in the course, guided by your own learning goals. Since you all come from diverse backgrounds and experiences in data science, your peers are invaluable resources for learning. Don’t shortchange them or yourself by coming to class unprepared or by sitting quietly during discussions.

Communicate. This course may differ from your previous experiences, with increasingly complex content and new technical challenges. The instructors are here to help you navigate these challenges and maintain an open-door policy in addition to class and office hours. Please keep the instructors informed about which ideas and tools are challenging for you and how you are progressing in the class. By starting this habit early in the semester, we can better tailor our activities to support your learning. If you’re uncomfortable with email or office hours, feel free to post a comment in the Anonymous Feedback section on the class Canvas site.

Take risks. Programming often requires making personal judgments about what to include or omit, which structural approach to take, and how to interpret complex data. Sometimes, the “right” answer is unknown, incomplete, or even incorrect! Nobel Prize breakthroughs have often resulted from attempts to support a “best guess” with incomplete data or from finding explanations for an “experiment gone wrong.” You will be rewarded for taking risks to defend your ideas, as long as your assumptions and decision-making process are transparent in your answers. If you’re unsure how to start a problem, don’t hesitate to defend your assumptions and give it a shot!

But above all, you’ll find that learning a programming language is much like learning a new spoken language. Before long, you’ll see yourself becoming capable of communicating with your computer and thinking logically. This will be both comforting and rewarding. So, ENJOY THE RIDE!