Welcome to the course! INTRODUCTION TO IMPORTING DATA IN PYTHON



slide-1
SLIDE 1

Welcome to the course!

INTRODUCTION TO IMPORTING DATA IN PYTHON

Hugo Bowne-Anderson

Data Scientist at DataCamp

slide-2
SLIDE 2

INTRODUCTION TO IMPORTING DATA IN PYTHON

Import data

Flat files, e.g. .txts, .csvs
Files from other software

slide-3
SLIDE 3

INTRODUCTION TO IMPORTING DATA IN PYTHON

Import data

Flat files, e.g. .txts, .csvs
Files from other software
Relational databases

slide-4
SLIDE 4

INTRODUCTION TO IMPORTING DATA IN PYTHON

Plain text files

slide-5
SLIDE 5

INTRODUCTION TO IMPORTING DATA IN PYTHON

Table data

Source: Kaggle


slide-6
SLIDE 6

INTRODUCTION TO IMPORTING DATA IN PYTHON

Table data

slide-7
SLIDE 7

INTRODUCTION TO IMPORTING DATA IN PYTHON

Table data

Flat file

slide-8
SLIDE 8

INTRODUCTION TO IMPORTING DATA IN PYTHON

Reading a text file

filename = 'huck_finn.txt'
file = open(filename, mode='r')  # 'r' is to read
text = file.read()
file.close()

slide-9
SLIDE 9

INTRODUCTION TO IMPORTING DATA IN PYTHON

Printing a text file

print(text)

YOU don't know about me without you have read a book by the name of The Adventures of Tom Sawyer; but that ain't no matter. That book was made by Mr. Mark Twain, and he told the truth, mainly. There was things which he stretched, but mainly he told the truth. That is nothing. I never seen anybody but lied one time or another, without it was Aunt Polly, or the widow, or maybe Mary. Aunt Polly--Tom's Aunt Polly, she is--and Mary, and the Widow Douglas is all told about in that book, which is mostly a true book, with some stretchers, as I said before.

slide-10
SLIDE 10

INTRODUCTION TO IMPORTING DATA IN PYTHON

Writing to a file

filename = 'huck_finn.txt'
file = open(filename, mode='w')  # 'w' is to write
file.close()
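The slide opens a file in write mode without writing anything to it. As a minimal sketch of a full write-then-read round trip, the example below writes two lines and reads them back. The file name `example.txt` and its contents are assumptions made for illustration, and a temporary directory is used so that no existing file is overwritten (opening with `'w'` truncates a file that already exists).

```python
import os
import tempfile

# Hypothetical file in a temporary directory, so nothing real is overwritten.
tmp_dir = tempfile.mkdtemp()
filename = os.path.join(tmp_dir, 'example.txt')

# 'w' creates the file, or empties it first if it already exists.
file = open(filename, mode='w')
file.write('first line\n')
file.write('second line\n')
file.close()

# Re-open in read mode to confirm what was written.
file = open(filename, mode='r')
text = file.read()
file.close()

print(text)
```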

slide-11
SLIDE 11

INTRODUCTION TO IMPORTING DATA IN PYTHON

Context manager with

with open('huck_finn.txt', 'r') as file:
    print(file.read())

YOU don't know about me without you have read a book by the name of The Adventures of Tom Sawyer; but that ain't no matter. That book was made by Mr. Mark Twain, and he told the truth, mainly. There was things which he stretched, but mainly he told the truth. That is nothing. I never seen anybody but lied one time or another, without it was Aunt Polly, or the widow, or maybe Mary. Aunt Polly--Tom's Aunt Polly, she is--and Mary, and the Widow Douglas is all told about in that book, which is mostly a true book, with some stretchers, as I said before.
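A context manager can also be used to read a file one line at a time instead of all at once. The sketch below is one possible way to do that with `readline()`; the file `sample.txt` and its three lines are made up for the example. It also checks that the file is closed automatically once the `with` block ends.

```python
import os
import tempfile

# Create a small hypothetical file to read from.
tmp_dir = tempfile.mkdtemp()
filename = os.path.join(tmp_dir, 'sample.txt')
with open(filename, 'w') as file:
    file.write('line one\nline two\nline three\n')

# Each call to readline() returns the next line, including its newline.
with open(filename, 'r') as file:
    first = file.readline()
    second = file.readline()

print(first, end='')
print(second, end='')

# The with statement closed the file on exit, so no explicit close() is needed.
print(file.closed)
```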

slide-12
SLIDE 12

INTRODUCTION TO IMPORTING DATA IN PYTHON

In the exercises, you’ll:

Print files to the console
Print specific lines
Discuss flat files

slide-13
SLIDE 13

Let's practice!

INTRODUCTION TO IMPORTING DATA IN PYTHON

slide-14
SLIDE 14

The importance of flat files in data science

INTRODUCTION TO IMPORTING DATA IN PYTHON

Hugo Bowne-Anderson

Data Scientist at DataCamp

slide-15
SLIDE 15

INTRODUCTION TO IMPORTING DATA IN PYTHON

Flat files

slide-16
SLIDE 16

INTRODUCTION TO IMPORTING DATA IN PYTHON

Flat files

slide-17
SLIDE 17

INTRODUCTION TO IMPORTING DATA IN PYTHON

Flat files

slide-18
SLIDE 18

INTRODUCTION TO IMPORTING DATA IN PYTHON

Flat files

slide-19
SLIDE 19

INTRODUCTION TO IMPORTING DATA IN PYTHON

Flat files

Text files containing records
That is, table data
Record: row of fields or attributes

slide-20
SLIDE 20

INTRODUCTION TO IMPORTING DATA IN PYTHON

Flat files

Text files containing records
That is, table data
Record: row of fields or attributes
Column: feature or attribute

slide-21
SLIDE 21

INTRODUCTION TO IMPORTING DATA IN PYTHON

Flat files

Text files containing records
That is, table data
Record: row of fields or attributes
Column: feature or attribute

slide-22
SLIDE 22

INTRODUCTION TO IMPORTING DATA IN PYTHON

Header

slide-23
SLIDE 23

INTRODUCTION TO IMPORTING DATA IN PYTHON

Header

slide-24
SLIDE 24

INTRODUCTION TO IMPORTING DATA IN PYTHON

File extension

.csv - Comma separated values
.txt - Text file
commas, tabs - Delimiters
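To make the terms header, record, and delimiter concrete, here is a small sketch using Python's standard `csv` module. The two text snippets and the helper `read_records` are hypothetical, invented for this illustration; they hold the same table once with commas and once with tabs as the delimiter.

```python
import csv
import io

# The same made-up table, comma-delimited and tab-delimited.
comma_text = "name,age\nAda,36\nGrace,45\n"
tab_text = "name\tage\nAda\t36\nGrace\t45\n"


def read_records(text, delimiter):
    """Return (header, records) parsed from delimited text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)   # first row holds the column names
    records = list(reader)  # each remaining row is one record
    return header, records


header_c, records_c = read_records(comma_text, ',')
header_t, records_t = read_records(tab_text, '\t')

print(header_c)
print(records_c)

# Only the delimiter differs; the parsed records are identical.
print(records_c == records_t)
```

Every field comes back as a string here; converting columns to numbers is left to the importing package.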

slide-25
SLIDE 25

INTRODUCTION TO IMPORTING DATA IN PYTHON

Tab-delimited file

slide-26
SLIDE 26

INTRODUCTION TO IMPORTING DATA IN PYTHON

Tab-delimited file

slide-27
SLIDE 27

INTRODUCTION TO IMPORTING DATA IN PYTHON

How do you import flat files?

Two main packages: NumPy, pandas
Here, you’ll learn to import:
Flat files with numerical data (MNIST)
Flat files with numerical data and strings (titanic.csv)

slide-28
SLIDE 28

Let's practice!

INTRODUCTION TO IMPORTING DATA IN PYTHON

slide-29
SLIDE 29

Importing flat files using NumPy

INTRODUCTION TO IMPORTING DATA IN PYTHON

Hugo Bowne-Anderson

Data Scientist at DataCamp

slide-30
SLIDE 30

INTRODUCTION TO IMPORTING DATA IN PYTHON

Why NumPy?

NumPy arrays: standard for storing numerical data

slide-31
SLIDE 31

INTRODUCTION TO IMPORTING DATA IN PYTHON

Why NumPy?

NumPy arrays: standard for storing numerical data
Essential for other packages: e.g. scikit-learn
loadtxt()
genfromtxt()

slide-32
SLIDE 32

INTRODUCTION TO IMPORTING DATA IN PYTHON

Importing flat files using NumPy

import numpy as np
filename = 'MNIST.txt'
data = np.loadtxt(filename, delimiter=',')
data

[[   0.    0.    0.    0.    0.]
 [  86.  250.  254.  254.  254.]
 [   0.    0.    0.    9.  254.]
 ...,
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]]

slide-33
SLIDE 33

INTRODUCTION TO IMPORTING DATA IN PYTHON

Customizing your NumPy import

import numpy as np
filename = 'MNIST_header.txt'
data = np.loadtxt(filename, delimiter=',', skiprows=1)
print(data)

[[   0.    0.    0.    0.    0.]
 [  86.  250.  254.  254.  254.]
 [   0.    0.    0.    9.  254.]
 ...,
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]]

slide-34
SLIDE 34

INTRODUCTION TO IMPORTING DATA IN PYTHON

Customizing your NumPy import

import numpy as np
filename = 'MNIST_header.txt'
data = np.loadtxt(filename, delimiter=',', skiprows=1, usecols=[0, 2])
print(data)

[[   0.    0.]
 [  86.  254.]
 [   0.    0.]
 ...,
 [   0.    0.]
 [   0.    0.]
 [   0.    0.]]

slide-35
SLIDE 35

INTRODUCTION TO IMPORTING DATA IN PYTHON

Customizing your NumPy import

data = np.loadtxt(filename, delimiter=',', dtype=str)
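The effect of `skiprows` and `dtype=str` can be seen side by side on a tiny in-memory table. This is only a sketch: the three-column data in `raw` is invented, and `io.StringIO` stands in for a file on disk.

```python
import io

import numpy as np

# Hypothetical flat file contents: one header row, then two numeric records.
raw = "a,b,c\n1,2,3\n4,5,6\n"

# Numeric import: skip the header so every remaining field parses as a float.
numeric = np.loadtxt(io.StringIO(raw), delimiter=',', skiprows=1)

# String import: keep every field, header included, as text.
as_text = np.loadtxt(io.StringIO(raw), delimiter=',', dtype=str)

print(numeric.dtype, numeric.shape)
print(as_text.shape)
print(as_text[0])
```

Without `skiprows=1` the numeric import would fail, because the header strings cannot be converted to floats.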

slide-36
SLIDE 36

INTRODUCTION TO IMPORTING DATA IN PYTHON

Mixed datatypes

Source: Kaggle


slide-37
SLIDE 37

INTRODUCTION TO IMPORTING DATA IN PYTHON

Mixed datatypes
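One way NumPy can cope with mixed datatypes is `np.genfromtxt()` with `dtype=None`, which infers a type per column and returns a structured array. The sketch below is an illustration under stated assumptions rather than the course's own example: the passenger-style data in `raw` is made up, and `io.StringIO` replaces a real file.

```python
import io

import numpy as np

# Hypothetical mixed-type table: a string column, an integer column, a float column.
raw = "name,age,fare\nAda,36,7.25\nGrace,45,71.28\n"

data = np.genfromtxt(
    io.StringIO(raw),
    delimiter=',',
    names=True,        # take column names from the first row
    dtype=None,        # infer a separate type for each column
    encoding='utf-8',  # read strings as text rather than bytes
)

# Columns of the structured array are accessed by name.
print(data.dtype.names)
print(data['age'])
print(data['name'][0])
```

Structured arrays work, but columns of different types are usually more convenient to handle in a pandas DataFrame.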

slide-38
SLIDE 38

Let's practice!

INTRODUCTION TO IMPORTING DATA IN PYTHON

slide-39
SLIDE 39

Importing flat files using pandas

INTRODUCTION TO IMPORTING DATA IN PYTHON

Hugo Bowne-Anderson

Data Scientist at DataCamp

slide-40
SLIDE 40

INTRODUCTION TO IMPORTING DATA IN PYTHON

What a data scientist needs

Two-dimensional labeled data structure(s)
Columns of potentially different types
Manipulate, slice, reshape, groupby, join, merge
Perform statistics
Work with time series data

slide-41
SLIDE 41

INTRODUCTION TO IMPORTING DATA IN PYTHON

Pandas and the DataFrame

slide-42
SLIDE 42

INTRODUCTION TO IMPORTING DATA IN PYTHON

Pandas and the DataFrame

slide-43
SLIDE 43

INTRODUCTION TO IMPORTING DATA IN PYTHON

Pandas and the DataFrame

DataFrame = pythonic analog of R’s data frame

slide-44
SLIDE 44

INTRODUCTION TO IMPORTING DATA IN PYTHON

Pandas and the DataFrame

slide-45
SLIDE 45

INTRODUCTION TO IMPORTING DATA IN PYTHON

Manipulating pandas DataFrames

Exploratory data analysis
Data wrangling
Data preprocessing
Building models
Visualization
Standard and best practice to use pandas

slide-46
SLIDE 46

INTRODUCTION TO IMPORTING DATA IN PYTHON

Importing using pandas

import pandas as pd
filename = 'winequality-red.csv'
data = pd.read_csv(filename)
data.head()

   volatile acidity  citric acid  residual sugar
0              0.70         0.00             1.9
1              0.88         0.00             2.6
2              0.76         0.04             2.3
3              0.28         0.56             1.9
4              0.70         0.00             1.9

data_array = data.values
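The same import can be tried without the CSV file on disk. The sketch below feeds the first three rows shown on the slide to `pd.read_csv()` through `io.StringIO` (an assumption made to keep the example self-contained) and converts the result to a NumPy array with `to_numpy()`, which pandas offers alongside the `.values` attribute.

```python
import io

import pandas as pd

# First three rows from the slide, held in memory instead of a file.
raw = (
    "volatile acidity,citric acid,residual sugar\n"
    "0.70,0.00,1.9\n"
    "0.88,0.00,2.6\n"
    "0.76,0.04,2.3\n"
)

data = pd.read_csv(io.StringIO(raw))

# head(2) shows only the first two records.
print(data.head(2))

# Convert the DataFrame to a plain NumPy array.
data_array = data.to_numpy()
print(data_array.shape)
```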

slide-47
SLIDE 47

INTRODUCTION TO IMPORTING DATA IN PYTHON

You’ll experience:

Importing flat files in a straightforward manner
Importing flat files with issues such as comments and missing values
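Comments and missing values can often be handled directly through `pd.read_csv()` arguments. The following is a sketch under assumptions: the tab-delimited data is invented, `'#'` is assumed to mark comment lines, and the string `'Nothing'` is assumed to stand for a missing value.

```python
import io

import pandas as pd

# Hypothetical tab-delimited file with a comment line and a missing value marker.
raw = (
    "# hypothetical passenger extract\n"
    "name\tage\tfare\n"
    "Ada\t36\t7.25\n"
    "Grace\tNothing\t71.28\n"
)

data = pd.read_csv(
    io.StringIO(raw),
    sep='\t',               # fields are separated by tabs
    comment='#',            # ignore anything after a '#'
    na_values=['Nothing'],  # treat this string as missing (NaN)
)

print(data)

# Count the missing entries in the age column.
print(data['age'].isna().sum())
```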

slide-48
SLIDE 48

Let's practice!

INTRODUCTION TO IMPORTING DATA IN PYTHON

slide-49
SLIDE 49

Final thoughts on data import

INTRODUCTION TO IMPORTING DATA IN PYTHON

Hugo Bowne-Anderson

Data Scientist at DataCamp

slide-50
SLIDE 50

INTRODUCTION TO IMPORTING DATA IN PYTHON

Next chapters:

Import other file types: Excel, SAS, Stata
Feather
Interact with relational databases

slide-51
SLIDE 51

INTRODUCTION TO IMPORTING DATA IN PYTHON

Next course:

Scrape data from the web
Interact with APIs

slide-52
SLIDE 52

Let's practice!

INTRODUCTION TO IMPORTING DATA IN PYTHON