Tut#15-16: Pandas/Numpy CPSC 501 Dr. J. Hudson University of - - PowerPoint PPT Presentation

slide-1
SLIDE 1

Tut#15-16: Pandas/Numpy

CPSC 501

  • Dr. J. Hudson

University of Calgary

Arshia Hosseini

T01/T02

slide-2
SLIDE 2

Basics

  • The primary data structures in pandas are implemented as two classes:
  • DataFrame, which you can imagine as a relational data table, with rows and named columns.
  • Series, which is a single column. A DataFrame contains one or more Series and a name for each Series.
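The relationship between the two classes can be sketched as follows. The city names and population figures are illustrative values, not taken from the slides.

```python
import pandas as pd

# Illustrative values only; they are not part of the original slides.
city_names = pd.Series(["San Francisco", "San Jose", "Sacramento"])
population = pd.Series([852469, 1015785, 485199])

# A DataFrame maps each column name to one Series.
cities = pd.DataFrame({"City name": city_names, "Population": population})
print(cities)

# Selecting one column gives back a Series.
print(type(cities["Population"]))
```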

slide-3
SLIDE 3

Loading a file

  • The following example loads a file with California housing data.
  • The example uses DataFrame.describe to show interesting statistics about a DataFrame.
  • Another useful function is DataFrame.head, which displays the first few records of a DataFrame.
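A minimal sketch of loading, describing, and previewing data. The in-memory CSV below is a hypothetical stand-in for the housing file, and its rows and column names are assumptions made so that the sketch runs without a download.

```python
import io

import pandas as pd

# Hypothetical stand-in for the housing file: rows and column names are made up.
csv_text = """longitude,latitude,housing_median_age,median_house_value
-114.31,34.19,15.0,66900.0
-114.47,34.40,19.0,80100.0
-114.56,33.69,17.0,85700.0
-114.57,33.64,14.0,73400.0
"""

# read_csv accepts a file path, a URL, or a file-like object such as StringIO.
housing = pd.read_csv(io.StringIO(csv_text))

summary = housing.describe()  # count, mean, std, min, quartiles, max per numeric column
first_rows = housing.head(2)  # first two records; head() defaults to five

print(summary)
print(first_rows)
```

With a real file, the only change would be passing its path or URL to pd.read_csv instead of the StringIO object.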

slide-4
SLIDE 4

Cont’d

  • Another powerful feature of pandas is graphing. For example, DataFrame.hist lets you quickly study the distribution of values in a column.
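A small sketch of DataFrame.hist, assuming matplotlib is installed (pandas uses it for plotting). The column name and values are made up for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display

import pandas as pd

# Made-up ages, for illustration only.
housing = pd.DataFrame(
    {"housing_median_age": [15.0, 19.0, 17.0, 14.0, 20.0, 29.0, 25.0, 41.0]}
)

# Draw a histogram of the requested column; the return value holds the matplotlib Axes.
axes = housing.hist(column="housing_median_age", bins=4)
```

In a notebook the plot is displayed inline; in a script you would typically save it with matplotlib's savefig.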

slide-5
SLIDE 5

Accessing/Manipulating Data

  • Accessing DataFrame data works the same way as with Python’s list/dict operations.
  • You can also apply Python’s basic arithmetic operations to Series, or you can use them as arguments to NumPy functions.
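The access patterns and arithmetic described above might look like this. The DataFrame contents are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Illustrative values only.
cities = pd.DataFrame({
    "City name": ["San Francisco", "San Jose", "Sacramento"],
    "Population": [852469, 1015785, 485199],
})

names = cities["City name"]      # dict-style access returns a Series
second = cities["City name"][1]  # then index into that Series
first_two = cities[0:2]          # list-style slicing selects rows

in_thousands = cities["Population"] / 1000.0   # arithmetic applies to every element
log_population = np.log(cities["Population"])  # a Series passed to a NumPy function

print(second)
print(first_two)
print(in_thousands)
```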

slide-6
SLIDE 6

Modifying

  • Modifying is also straightforward, for example adding two Series to an existing DataFrame.
  • Both Series and DataFrame objects also define an index property that assigns an identifier value to each Series item or DataFrame row.
  • Call DataFrame.reindex to manually reorder the rows.
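
A short sketch of adding columns, inspecting the index, and reordering rows. The column names and values are assumptions chosen for illustration.

```python
import pandas as pd

# Illustrative values only.
cities = pd.DataFrame({
    "City name": ["San Francisco", "San Jose", "Sacramento"],
    "Population": [852469, 1015785, 485199],
})

# Add two Series to the existing DataFrame as new columns.
cities["Area square miles"] = pd.Series([46.87, 176.53, 97.92])
cities["Population density"] = cities["Population"] / cities["Area square miles"]

# The default index assigns 0, 1, 2 to the rows.
print(cities.index)

# reindex returns a new DataFrame with the rows in the requested order.
reordered = cities.reindex([2, 0, 1])
print(reordered)
```
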
slide-7
SLIDE 7

NumPy

  • NumPy’s main object is the n-dimensional array (ndarray). Dimensions are called axes. It also goes by its alias: numpy.array
  • Some of the attributes are:
  • ndarray.ndim: the number of axes (dimensions) of the array.
  • ndarray.shape: the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim.
  • ndarray.size: the total number of elements of the array. This is equal to the product of the elements of shape.
  • ndarray.dtype: an object describing the type of the elements in the array. One can create or specify dtypes using standard Python types. Additionally, NumPy provides types of its own; numpy.int32, numpy.int16, and numpy.float64 are some examples.
  • ndarray.itemsize: the size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (=64/8).
  • ndarray.data: the buffer containing the actual elements of the array. Normally, we won’t need to use this attribute because we will access the elements in an array using indexing facilities.

slide-8
SLIDE 8

An example
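
A minimal sketch of the attributes listed on the previous slide. The array contents are arbitrary, and the dtype is set explicitly because the default integer width can differ between platforms.

```python
import numpy as np

# 15 integers arranged as 3 rows and 5 columns.
a = np.arange(15, dtype=np.int64).reshape(3, 5)

print(a.ndim)      # 2
print(a.shape)     # (3, 5)
print(a.size)      # 15
print(a.dtype)     # int64
print(a.itemsize)  # 8
print(type(a))     # <class 'numpy.ndarray'>
```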


slide-9
SLIDE 9

Creation

  • You can create an array from a regular Python list or tuple using the array function. The type of the resulting array is deduced from the type of the elements in the sequences.
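A short sketch of array creation and type deduction. The variable names and values are illustrative.

```python
import numpy as np

ints = np.array([2, 3, 4])                   # integer dtype (exact width is platform dependent)
floats = np.array([1.2, 3.5, 5.1])           # float64
nested = np.array([(1.5, 2, 3), (4, 5, 6)])  # nested sequences give a 2-D array; one float makes it float64
as_complex = np.array([[1, 2], [3, 4]], dtype=complex)  # dtype can also be given explicitly

print(ints.dtype, floats.dtype, nested.dtype, as_complex.dtype)
```

Note that np.array takes one sequence argument: np.array([1, 2, 3]) works, while np.array(1, 2, 3) raises an error.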

slide-10
SLIDE 10

Cont’d

  • Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy offers several functions to create arrays with initial placeholder content. These minimize the necessity of growing arrays, an expensive operation.
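Three of the placeholder-creation functions, np.zeros, np.ones, and np.empty, can be sketched as follows. The shapes are chosen arbitrarily.

```python
import numpy as np

zeros = np.zeros((3, 4))                   # 3x4 array of 0.0 (float64 by default)
ones = np.ones((2, 3, 4), dtype=np.int16)  # dtype can be specified
uninit = np.empty((2, 3))                  # allocated but not initialized; contents are arbitrary

print(zeros)
print(ones.shape, ones.dtype)
print(uninit.shape)
```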

slide-11
SLIDE 11

Operations

  • Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.
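A minimal sketch of elementwise arithmetic, with values chosen only for illustration.

```python
import numpy as np

a = np.array([20, 30, 40, 50])
b = np.arange(4)   # [0, 1, 2, 3]

diff = a - b       # [20, 29, 38, 47]
squares = b ** 2   # [0, 1, 4, 9]
product = a * b    # elementwise product: [0, 30, 80, 150]
mask = a < 35      # [True, True, False, False]

print(diff, squares, product, mask)
```

Each expression returns a new array; a and b are left unchanged. For a matrix product, use the @ operator or the dot method instead of *.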

slide-12
SLIDE 12

  • By default, reduction operations such as sum, min, and max apply to the array as though it were a list of numbers, regardless of its shape. However, by specifying the axis parameter you can apply an operation along the specified axis of an array.
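The effect of the axis parameter can be sketched with a small 3x4 array. The array contents are arbitrary.

```python
import numpy as np

b = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

total = b.sum()            # 66: every element, regardless of shape
col_sums = b.sum(axis=0)   # sum down each column: [12, 15, 18, 21]
row_mins = b.min(axis=1)   # minimum of each row: [0, 4, 8]

print(total, col_sums, row_mins)
```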

slide-13
SLIDE 13

Universal Functions
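
Universal functions (ufuncs) such as np.exp, np.sqrt, and np.add operate elementwise on an array and produce a new array. A minimal sketch, with input values chosen only for illustration:

```python
import numpy as np

x = np.arange(3)                 # [0, 1, 2]
y = np.array([2.0, -1.0, 4.0])

exps = np.exp(x)      # e**0, e**1, e**2
roots = np.sqrt(x)    # square root of each element
sums = np.add(x, y)   # same as x + y: [2.0, 0.0, 6.0]

print(exps, roots, sums)
```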


slide-14
SLIDE 14

Iterating/Slicing
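
One-dimensional arrays can be indexed, sliced, and iterated over much like Python lists, while multidimensional arrays take one index or slice per axis. A minimal sketch with arbitrary values:

```python
import numpy as np

a = np.arange(10) ** 3   # [0, 1, 8, 27, 64, 125, 216, 343, 512, 729]
item = a[2]              # 8
part = a[2:5]            # [8, 27, 64]
backwards = a[::-1]      # reversed copy of the order (as a view)

b = np.arange(12).reshape(3, 4)
cell = b[1, 2]           # row 1, column 2: 6
column = b[:, 1]         # every row, column 1: [1, 5, 9]
rows = b[1:3, :]         # rows 1 and 2, all columns

# Iterating over a multidimensional array walks along the first axis (rows).
row_sums = [int(row.sum()) for row in b]

# b.flat iterates over every element.
flat_values = [int(v) for v in b.flat]

print(item, part, cell, column, row_sums)
```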


slide-15
SLIDE 15

Copying
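
When working with arrays, data is sometimes copied and sometimes shared. A minimal sketch of the three cases (plain assignment, a view, and a full copy), with arbitrary values:

```python
import numpy as np

a = np.arange(12)

# Plain assignment makes no copy: both names refer to the same array object.
b = a
same_object = b is a   # True

# view() creates a new array object that shares the same data.
v = a.view()
v[0] = 100             # also changes a[0]

# copy() makes a complete, independent copy of the array and its data.
c = a.copy()
c[1] = -1              # a[1] is unaffected

print(same_object, a[0], a[1], c[1])
```

Slicing an array also returns a view, so writing into a slice modifies the original array.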
