4. Day 2 Lecture 1

4.1. Basic NumPy

NumPy is a Python library for numerical computing, built around fast multi-dimensional arrays. Libraries are “add-ons” developed by other programmers that add extra functions and capabilities.

  1. Getting started

Import the NumPy package into Python:

import numpy as np

Now, whenever we want to use a NumPy function, we refer to it with the np. prefix (for example, np.array()).

  2. NumPy arrays

Use the np.array() function to create a NumPy array object:

numpy_array = np.array([[1, 2, 3], [3, 2, 1]])
print(numpy_array)
[[1 2 3]
 [3 2 1]]
  • Or you can convert a Python list into a NumPy array

python_list = [0, 0, 1]
numpy_array_from_list = np.array(python_list)
print(numpy_array_from_list)
[0 0 1]
  3. Basic attributes and methods of NumPy arrays

  • size: gives the total number of elements in the array

print(numpy_array.size)
6
  • shape: gives the dimensions of the array as a tuple, here (rows, columns)

print(numpy_array.shape)
(2, 3)
  • reshape: returns the array with a new shape; the total number of elements must stay the same

print(numpy_array.reshape(3, 2))
[[1 2]
 [3 3]
 [2 1]]
  • T: the .T attribute gives the transpose of the array

print(numpy_array.T)
[[1 3]
 [2 2]
 [3 1]]
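
As a minimal sketch of how these attributes and methods fit together (the variable names a, b and c are only for illustration): reshape leaves the original array's shape untouched, -1 asks NumPy to infer one dimension, and a shape with the wrong number of elements is rejected.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 2, 1]])

# size equals the product of the entries in shape
print(a.size == a.shape[0] * a.shape[1])

# reshape returns a reshaped array; `a` itself keeps its shape
b = a.reshape(3, 2)
print(a.shape, b.shape)

# -1 asks NumPy to infer that dimension from the total size
c = a.reshape(-1, 2)
print(c.shape)

# a shape that does not hold exactly 6 elements raises ValueError
try:
    a.reshape(4, 2)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("cannot reshape:", err)

# transposing swaps the axes, so the shape is reversed
print(a.T.shape)
```
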
  4. Basic arithmetic

  • Operations are carried out element-wise

  • Addition, multiplication

print("numpy array:\n", numpy_array)
addition = numpy_array + 5
print("after adding 5:\n", addition)
multiplication = numpy_array * 5
print("after multiplying by 5:\n", multiplication)
numpy array:
 [[1 2 3]
 [3 2 1]]
after adding 5:
 [[6 7 8]
 [8 7 6]]
after multiplying by 5:
 [[ 5 10 15]
 [15 10  5]]
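
The same element-wise rule applies between two arrays of the same shape. The sketch below uses a second array b that is made up for illustration, and contrasts the element-wise product with the matrix product written with the @ operator.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 2, 1]])
b = np.array([[10, 20, 30], [40, 50, 60]])  # illustrative values

# element-wise: each entry of a is combined with the matching entry of b
elementwise_sum = a + b
elementwise_product = a * b

# matrix product: shapes (2, 3) @ (3, 2) give a (2, 2) result
matrix_product = a @ b.T

print("element-wise sum:\n", elementwise_sum)
print("element-wise product:\n", elementwise_product)
print("matrix product:\n", matrix_product)
```

Note that * never performs matrix multiplication on NumPy arrays; use @ (or np.dot) for that.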

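But what about dot and cross products? These are not element-wise operations, so NumPy provides the dedicated functions np.dot() and np.cross().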

dot_product = np.dot([1, 2, 3], [1, 2, 3])
print("dot product:\n", dot_product)
cross_product = np.cross([3, 2, 1], [1, 2, 3])
print("cross product:\n", cross_product)
dot product:
 14
cross product:
 [ 4 -8  4]
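
To see what these functions compute, the sketch below works out both products by hand for the vectors [3, 2, 1] and [1, 2, 3] and compares them with NumPy's results. The names u and v are only for illustration.

```python
import numpy as np

u = np.array([3, 2, 1])
v = np.array([1, 2, 3])

# dot product: sum of the element-wise products
dot_by_hand = (u * v).sum()

# cross product: the component formula for 3-D vectors
cross_by_hand = np.array([
    u[1] * v[2] - u[2] * v[1],
    u[2] * v[0] - u[0] * v[2],
    u[0] * v[1] - u[1] * v[0],
])

print(dot_by_hand == np.dot(u, v))
print(np.array_equal(cross_by_hand, np.cross(u, v)))

# the cross product is perpendicular to both inputs
print(np.dot(cross_by_hand, u), np.dot(cross_by_hand, v))
```
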

4.2. Basic Pandas

  • Pandas is a library for working with tabular data

  • DataFrame is a datatype (data structure/object)

    • it is the main offering of the pandas library

4.3. Clear example of DataFrame

[Figure: example DataFrame]

  • Attributes

    • Column Indexes

    • Row Indexes

    • Multiple datatypes

    • The datatype is a function of the column (each column has its own datatype)

      • An entry can itself be a list of strings, as at [4, 'column_three']
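
These attributes can be inspected directly on a DataFrame. The sketch below uses made-up data: only the label column_three and row 4 come from the notes above, while the other column names and all values are assumptions for illustration.

```python
import pandas as pd

# hypothetical data; the figure's actual contents are not reproduced here
example_df = pd.DataFrame({
    "column_one": [1, 2, 3, 4, 5],
    "column_two": [0.5, 1.5, 2.5, 3.5, 4.5],
    "column_three": ["a", "b", "c", "d", ["x", "y"]],
})

print(example_df.columns)  # column index
print(example_df.index)    # row index
print(example_df.dtypes)   # one datatype per column

# a single entry, addressed by row label and column label
entry = example_df.loc[4, "column_three"]
print(entry)
```
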

4.4. How are they made?

  • They are declared, like any other Python object

  • The information can come from

    • a .csv file that is read in

    • a Python dictionary

    • a pickle (a special binary file)

    • a .json file

    • and more

For example, a DataFrame built from a Python dictionary:

import pandas as pd
mydataset = {
  'fruit': ["cherry", "strawberry", "apple"],
  'quantity': [3, 7, 2]
}

mydf = pd.DataFrame(mydataset)

print(mydf)
        fruit  quantity
0      cherry         3
1  strawberry         7
2       apple         2
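
The other sources listed above work in a similar way through pandas reader functions such as pd.read_csv, pd.read_json and pd.read_pickle. As a rough sketch that avoids assuming any file on disk, the example below writes the same data to CSV and JSON text in memory and reads it back; with a real file you would pass its path instead of the io.StringIO object.

```python
import io
import pandas as pd

mydf = pd.DataFrame({
    "fruit": ["cherry", "strawberry", "apple"],
    "quantity": [3, 7, 2],
})

# CSV round trip; index=False leaves the row index out of the text
csv_text = mydf.to_csv(index=False)
df_from_csv = pd.read_csv(io.StringIO(csv_text))

# JSON round trip
json_text = mydf.to_json()
df_from_json = pd.read_json(io.StringIO(json_text))

print(df_from_csv)
print(df_from_json)
```
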

Pandas Series

A Series is like a one-dimensional array with an index attached. Each column of a DataFrame is a Series.

pd_series = pd.Series([3, 7, 2])
print(pd_series)
0    3
1    7
2    2
dtype: int64
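
A Series differs from a plain array mainly through its index. As a small sketch (the fruit labels used as the index are an illustrative choice), values can be looked up by label, converted to a NumPy array, and used in element-wise arithmetic:

```python
import numpy as np
import pandas as pd

labelled_series = pd.Series([3, 7, 2], index=["cherry", "strawberry", "apple"])

# look up a value by its index label
strawberry_quantity = labelled_series["strawberry"]
print(strawberry_quantity)

# the underlying values as a NumPy array
values = labelled_series.to_numpy()
print(type(values))

# arithmetic is element-wise, and the index is kept
doubled = labelled_series * 2
print(doubled)
```
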

Locating a column in a DataFrame

.loc takes a row selector and a column selector, in that order; a bare : selects all rows.

print(mydf)
fruit_column = mydf.loc[:,"fruit"]
print(fruit_column)
quantity_column = mydf.loc[:, "quantity"]
print(quantity_column)
        fruit  quantity
0      cherry         3
1  strawberry         7
2       apple         2
0        cherry
1    strawberry
2         apple
Name: fruit, dtype: object
0    3
1    7
2    2
Name: quantity, dtype: int64
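
For a single column, square brackets on the DataFrame are a common shorthand for the .loc form above. The sketch below checks that both give the same Series, and shows that passing a list of column labels returns a DataFrame instead.

```python
import pandas as pd

mydf = pd.DataFrame({
    "fruit": ["cherry", "strawberry", "apple"],
    "quantity": [3, 7, 2],
})

fruit_loc = mydf.loc[:, "fruit"]
fruit_brackets = mydf["fruit"]
print(fruit_loc.equals(fruit_brackets))

# a list of labels selects several columns and returns a DataFrame
both_columns = mydf.loc[:, ["fruit", "quantity"]]
print(type(both_columns))
```
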

Locating a row in a DataFrame

Passing a single row label to .loc returns that row as a Series.

print(mydf)
first_row = mydf.loc[0]
print(first_row)
        fruit  quantity
0      cherry         3
1  strawberry         7
2       apple         2
fruit       cherry
quantity         3
Name: 0, dtype: object

You can also slice rows. Note that .loc slices by label and includes both endpoints, so 1:2 returns rows 1 and 2.

print(mydf)
slices = mydf.loc[1:2]
print(slices)
        fruit  quantity
0      cherry         3
1  strawberry         7
2       apple         2
        fruit  quantity
1  strawberry         7
2       apple         2
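
Because .loc slices by label, it behaves differently from ordinary Python slicing. The sketch below compares it with .iloc, which slices by integer position and excludes the end point like a Python list.

```python
import pandas as pd

mydf = pd.DataFrame({
    "fruit": ["cherry", "strawberry", "apple"],
    "quantity": [3, 7, 2],
})

# label-based: rows labelled 1 and 2, end point included
label_slice = mydf.loc[1:2]

# position-based: only position 1, end point excluded
position_slice = mydf.iloc[1:2]

print(label_slice)
print(position_slice)
```
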

Some basic pandas methods

  • mean

  • min/max

  • std

# first we locate the column
quantity_column = mydf.loc[:, "quantity"]
average_quantity = quantity_column.mean()
print("the average quantity is:", average_quantity)
the average quantity is: 4.0
max_val = quantity_column.max()
print("the max value is:", max_val)
min_val = quantity_column.min()
print("the min value is:", min_val)
the max value is: 7
the min value is: 2
std_quantity = quantity_column.std()
print("the standard deviation is:", std_quantity)
the standard deviation is: 2.6457513110645907
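
The value above is the sample standard deviation: pandas divides by n - 1 by default (ddof=1), whereas NumPy's np.std divides by n by default (ddof=0). A short sketch of the difference for the same three quantities:

```python
import numpy as np
import pandas as pd

quantity_column = pd.Series([3, 7, 2])

# pandas default: sample standard deviation, sqrt(14 / 2)
sample_std = quantity_column.std()

# population standard deviation, sqrt(14 / 3)
population_std = quantity_column.std(ddof=0)

# NumPy's default matches the population version
numpy_std = np.std(quantity_column.to_numpy())

print(sample_std, population_std, numpy_std)
```
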

Side note on output formatting

Use a %.3f format specifier when printing.

“.3” means 3 decimal places, “f” means floating point value.

The “%” inside the string marks where the value goes; the “%” after the string supplies the value.

print("the standard deviation is: %.3f" % std_quantity)
the standard deviation is: 2.646
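
The same ".3f" specification also works with str.format and with f-strings, which are the more common style in recent Python code. A brief sketch, with the value copied from the output above:

```python
std_quantity = 2.6457513110645907

# %-style formatting, as used above
old_style = "the standard deviation is: %.3f" % std_quantity

# str.format with the same format specification
with_format = "the standard deviation is: {:.3f}".format(std_quantity)

# f-string: the variable and its format sit together inside the braces
with_fstring = f"the standard deviation is: {std_quantity:.3f}"

print(old_style)
print(with_format)
print(with_fstring)
```
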