In-Class Activity: Reading and Processing Data

 


Introduction

The purpose of this activity is to explore how to read data in from a file and process that data.


Exercises

  1. Create a new notebook in Google Colab for the statements and functions you will write in this activity. Save this new notebook with a name representative of this activity.

  2. It is often useful to process data that we either generate from our own code or read in from a file, whether that means performing calculations, searching for particular values, or graphing. Download the file class-grades.txt and save it in your Google Drive in the same place where you are saving your notebooks. This data set is a set of grades from a Chemical Engineering course at McMaster University. It contains 99 rows and six columns. The columns give, for each student: a prefix denoting which year the student is in, the assignment grade, the tutorial grade, the midterm grade, the take-home exam grade, and the final exam grade.
    Add the following two lines in a Code cell at the beginning of your notebook to give your notebook access to files on your Google Drive:
                from google.colab import drive
                drive.mount('/drive')
                
    Run this cell. Follow the prompts to permit your notebook to access your Google Drive files.
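
    If you want to confirm that the notebook can see the file before you try to read it, a quick check along the following lines may help. This is only a sketch: file_is_visible is a hypothetical helper name, and the path shown is the example path used later in this activity, so you will probably need to change it to match your own Google Drive folder.

```python
import os

def file_is_visible(path):
    """Return True if a regular file exists at the given path."""
    return os.path.isfile(path)

# Example path only (an assumption); replace it with the location of your own copy.
grades_path = '/drive/My Drive/ColabNotebooks/class-grades.txt'

if file_is_visible(grades_path):
    print("Found", grades_path)
else:
    print("Could not find", grades_path, "- check the folder name and spelling.")
```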

  3. The following code may be used to read in the data. It reads the first line (which contains only column headings) and does nothing with it. It then reads in one line at a time and prints it. Copy this code into a Code cell. Before you run the code, edit the path to the class-grades.txt file to be the path to this file in your Google Drive. (Ask for help if you're not sure what this should be.) Now run the code. You should see the lines from the file being printed out.
                def readGrades():
                    # Create five empty lists here, 
                    # one for each of the sets of grades
                    # (assignment, tutorial, midterm, takehome exam, final exam grade)
    
                    with open('/drive/My Drive/ColabNotebooks/class-grades.txt', 'r') as f:
                        f.readline()   # read the first line (headings) and do nothing with it
                        for line in f:
                            print(line)
                            #line = line.strip('\n')
                            #grades = line.split('\t')
                            # Now do something here with the elements in grades
    
                    # Do some processing with the different lists of grades
                
    If you just ran the Code cell and didn't see anything happen, did you remember to call the function? (That is, don't forget to add the statement readGrades() at the end of the code, or in another Code cell, to call the function.)

  4. Now that you are able to read the data, we can put it into a form that can be processed and analyzed. At the beginning of the function, there is a comment to create five empty lists. Each list will hold the students' grades for a particular assignment or exam. Go ahead and create those lists at the beginning of the function.

  5. Run your code again to make sure it still works. You shouldn't see anything different happening yet. (It's good practice to test code after small changes to make sure nothing got broken.)

  6. Next, in the for loop, comment out the code that prints the line. Then uncomment the line that strips off the '\n' character at the end of each line, and uncomment the line that splits the line at each '\t' (tab) character, storing the pieces in the list called grades. Print out grades.

  7. Run the code again. Does the output look like you expect? Do you get the same result as when you just printed the line? What do you notice about the data types? (Hint: Everything is a string.)
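
    To see what strip and split do without opening the file, you can try them on a single line of text. The sample line below is made up for illustration (it is not taken from class-grades.txt); it only assumes the same layout of six tab-separated values followed by a newline.

```python
# A made-up line in the same tab-separated layout as the data file.
sample_line = "7\t80.0\t75.5\t60.0\t90.0\t85.25\n"

cleaned = sample_line.strip('\n')   # remove the trailing newline character
grades = cleaned.split('\t')        # break the line apart at each tab

print(grades)            # a list of six strings
print(type(grades[1]))   # each element is still a str, not a number

# Strings must be converted before doing arithmetic with them.
assignment_grade = float(grades[1])
print(assignment_grade + 1)
```

    Notice that '80.0' + 1 would be an error, while float('80.0') + 1 works.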

  8. After the comment "Now do something..." add lines of code to add the values from the grades list to the respective lists of assignment and exam grades. For example, if assignment was the name of one of the lists, we could use the line
                assignment.append(float(grades[1]))
                
    to add the student's assignment grade to the list of assignment grades. You can do something similar for each of the other grades for this student.
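
    The pattern of converting and appending inside the loop can be sketched with a couple of made-up lines held in a list instead of a file. The list names assignment and final_exam are only examples, and only two of the five lists are shown; in your function the loop would run over the lines of the file.

```python
# Two made-up lines standing in for lines read from the file.
sample_lines = [
    "7\t80.0\t75.5\t60.0\t90.0\t85.25\n",
    "8\t70.0\t65.0\t55.5\t88.0\t91.0\n",
]

assignment = []   # example list name
final_exam = []   # example list name

for line in sample_lines:
    grades = line.strip('\n').split('\t')
    # Index 0 is the year prefix; the grades start at index 1.
    assignment.append(float(grades[1]))
    final_exam.append(float(grades[5]))

print(assignment)
print(final_exam)
```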

  9. Add a statement to print the values for one of your assignment lists after all of the data has been read. This statement should go after the comment, "Do some processing...". Run your code. Does it look like you think it should? Why or why not?

  10. Now add a line of code to print the number of elements in one of your assignment lists. If the result is not 99, there is an error somewhere. Fix it before going on.

  11. If you are confident that your data has all been read correctly, go ahead and comment out the line that prints the entire grades list.

  12. In a new Code cell at the top of the file, define a function that takes a list (of numeric values) as a parameter and returns the average of the values in the list.
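
    One possible shape for such a function is sketched below. The name average and the choice to return 0.0 for an empty list are assumptions made for this sketch; you may prefer to handle an empty list differently.

```python
def average(values):
    """Return the average of a list of numbers."""
    if len(values) == 0:
        return 0.0   # assumption: treat the average of an empty list as 0.0
    total = 0.0
    for value in values:
        total += value
    return total / len(values)

print(average([80.0, 70.0, 90.0]))
```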

  13. In the readGrades function, after you have printed one of your assignment lists and its size, add lines to call your average function with each of the 5 different assignment lists. Print out messages to state the average of each assignment.
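
    If you would rather not write five nearly identical print statements, one option is to pair each list with a label and loop over the pairs. In this sketch the labels and values are made up, and a one-line average built on Python's sum stands in for the function you wrote in the previous exercise.

```python
def average(values):
    # Stand-in for your own average function (assumes a non-empty list).
    return sum(values) / len(values)

# Made-up labels and values for illustration.
named_lists = [
    ("assignment", [80.0, 70.0]),
    ("final exam", [85.0, 91.0]),
]

messages = []
for name, values in named_lists:
    # :.2f formats the number with two digits after the decimal point.
    message = f"Average {name} grade: {average(values):.2f}"
    messages.append(message)
    print(message)
```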

  14. In the same Code cell where you defined your average function, define a function that takes a list (of numeric values) as a parameter and returns the minimum value of that list. Do a visual check through the assignment grade lists to make sure you are getting the correct output.
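
    A minimum function can follow the "remember the smallest value seen so far" pattern. The sketch below uses the hypothetical name minimum and assumes the list has at least one element.

```python
def minimum(values):
    """Return the smallest value in a non-empty list of numbers."""
    smallest = values[0]          # assumption: the list is not empty
    for value in values[1:]:
        if value < smallest:
            smallest = value
    return smallest

print(minimum([3.5, 1.0, 2.0]))
```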

  15. After the lines of code where you called the average function with your assignment lists, call the minimum function with your assignment lists and print out the minimum value for each assignment.

  16. Similarly, create a function that takes a list (of numeric values) as a parameter and returns the maximum value in that list.

  17. Call this maximum function with your assignment lists and print the maximum values. Do a visual check through the assignment lists to ensure your code produced the correct result.
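
    In addition to the visual check, you can compare your result against Python's built-in max function, which is used here only as a cross-check and not as a replacement for your own function. The maximum function and the sample values below are an illustrative sketch that assumes a non-empty list.

```python
def maximum(values):
    """Return the largest value in a non-empty list of numbers."""
    largest = values[0]           # assumption: the list is not empty
    for value in values[1:]:
        if value > largest:
            largest = value
    return largest

sample = [80.0, 70.0, 95.5, 60.0]   # made-up values
print("maximum:", maximum(sample))
print("matches built-in max:", maximum(sample) == max(sample))
```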

  18. Add documentation to your file:
    • Add a Text cell at the very beginning of your file with your name, the date, and the activity description.
    • Add Text cells before each of your Code cells to explain what you are doing in the code. Alternatively, add comments at the beginning of each of your Code cells to explain what you are doing. Each function that you have defined should have a comment before it to explain what it does.

  19. (Optional Challenge): Define a function that will print out the number of A's, B's, C's, D's, and F's that were earned on a particular assignment. This function should take a list as a parameter and does not need to return anything.
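
    The activity does not say which scores count as an A, B, C, D, or F, so the sketch below assumes the common cutoffs of 90, 80, 70, and 60; adjust them to whatever scale your class uses. The function names are hypothetical. The counting is kept in a separate helper so it can be checked on its own, while print_letter_counts only prints and returns nothing.

```python
def letter_for(score):
    """Map a numeric score to a letter, using assumed cutoffs."""
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    elif score >= 70:
        return 'C'
    elif score >= 60:
        return 'D'
    else:
        return 'F'

def count_letter_grades(scores):
    """Return a dictionary with the number of each letter grade."""
    counts = {'A': 0, 'B': 0, 'C': 0, 'D': 0, 'F': 0}
    for score in scores:
        counts[letter_for(score)] += 1
    return counts

def print_letter_counts(scores):
    """Print the number of A's, B's, C's, D's, and F's in scores."""
    counts = count_letter_grades(scores)
    for letter in 'ABCDF':
        print(letter + ":", counts[letter])

# Made-up scores for illustration.
print_letter_counts([95.0, 82.5, 71.0, 64.0, 30.0, 88.0])
```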
