SLIDE 1

Keeping objects secure

INTRODUCTION TO AWS BOTO IN PYTHON

Maksim Pecherskiy

Data engineer

SLIDE 2

Why care about permissions?

import pandas as pd

df = pd.read_csv('https://gid-staging.s3.amazonaws.com/potholes.csv')

SLIDE 3

Why care about permissions?

Permission Allowed!

import boto3

# Generate the boto3 client for interacting with S3
s3 = boto3.client('s3', region_name='us-east-1',
                  aws_access_key_id=AWS_KEY_ID,
                  aws_secret_access_key=AWS_SECRET)

# Use client to download a file
s3.download_file(
    Filename='potholes.csv',
    Bucket='gid-requests',
    Key='potholes.csv')

SLIDE 4

AWS Permissions Systems

SLIDE 5

AWS Permissions Systems

SLIDE 6

ACLs

SLIDE 7

ACLs

Upload File

s3.upload_file(
    Filename='potholes.csv',
    Bucket='gid-requests',
    Key='potholes.csv')

Set ACL to 'public-read'

s3.put_object_acl(
    Bucket='gid-requests',
    Key='potholes.csv',
    ACL='public-read')

SLIDE 8

Setting ACLs on upload

Upload file with 'public-read' ACL

s3.upload_file(
    Bucket='gid-requests',
    Filename='potholes.csv',
    Key='potholes.csv',
    ExtraArgs={'ACL': 'public-read'})
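The two ACL patterns above (set on upload, or change afterwards) can be wrapped in small helpers. This is a minimal sketch, not part of the slides: the function names upload_with_acl and set_visibility are hypothetical, and the s3 argument is assumed to be a client created with boto3.client('s3'). On buckets where ACLs are disabled, these calls are expected to be rejected.

```python
def upload_with_acl(s3, filename, bucket, key, public=False):
    """Upload a local file and set its canned ACL in the same call."""
    acl = 'public-read' if public else 'private'
    s3.upload_file(
        Filename=filename,
        Bucket=bucket,
        Key=key,
        ExtraArgs={'ACL': acl})
    return acl


def set_visibility(s3, bucket, key, public):
    """Switch an existing object between 'public-read' and 'private'."""
    acl = 'public-read' if public else 'private'
    s3.put_object_acl(Bucket=bucket, Key=key, ACL=acl)
    return acl
```

For example, set_visibility(s3, 'gid-requests', 'potholes.csv', public=False) would make the object private again.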

SLIDE 9

Accessing public objects

S3 Object URL Template

https://{bucket}.s3.amazonaws.com/{key}

URL for Key='2019/potholes.csv'

https://gid-requests.s3.amazonaws.com/2019/potholes.csv

SLIDE 10

Generating public object URL

Generate Object URL String

url = "https://{}.s3.amazonaws.com/{}".format(
    "gid-requests",
    "2019/potholes.csv")

'https://gid-requests.s3.amazonaws.com/2019/potholes.csv'

# Read the URL into Pandas
df = pd.read_csv(url)
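Keys that contain spaces or other special characters need percent-encoding before they go into the URL template. The sketch below uses the same template as the slide; the helper name public_object_url is hypothetical, and keeping '/' unescaped is an assumption made so that prefixes stay readable.

```python
from urllib.parse import quote


def public_object_url(bucket, key):
    """Build the public object URL from the slide's template."""
    # Percent-encode the key, but leave '/' alone so prefixes stay readable
    return "https://{}.s3.amazonaws.com/{}".format(bucket, quote(key, safe='/'))


url = public_object_url("gid-requests", "2019/potholes.csv")
```

For the key used on the slide, nothing needs encoding, so the result is the same string that .format() produces.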

SLIDE 11

How access is decided

SLIDE 12

How access is decided

SLIDE 13

Review

SLIDE 14

Review

Set ACL to 'public-read'

s3.put_object_acl(
    Bucket='gid-requests',
    Key='potholes.csv',
    ACL='public-read')

Set ACL to 'private'

s3.put_object_acl(
    Bucket='gid-requests',
    Key='potholes.csv',
    ACL='private')

SLIDE 15

Review

Upload file with 'public-read' ACL

s3.upload_file(
    Bucket='gid-requests',
    Filename='potholes.csv',
    Key='potholes2.csv',
    ExtraArgs={'ACL': 'public-read'})

SLIDE 16

Review

Generate Object URL String

url = "https://{}.s3.amazonaws.com/{}".format(
    "gid-requests",
    "2019/potholes.csv")

'https://gid-requests.s3.amazonaws.com/2019/potholes.csv'

# Read the URL into Pandas
df = pd.read_csv(url)

SLIDE 17

Let's practice!

INTRODUCTION TO AWS BOTO IN PYTHON

SLIDE 18

Accessing private objects in S3

INTRODUCTION TO AWS BOTO IN PYTHON

Maksim Pecherskiy

Data Engineer

SLIDE 19

Downloading a private file

df = pd.read_csv('https://gid-staging.s3.amazonaws.com/potholes.csv')

SLIDE 20

Downloading private files

Download File

s3.download_file(
    Filename='potholes_local.csv',
    Bucket='gid-staging',
    Key='2019/potholes_private.csv')

Read From Disk

pd.read_csv('./potholes_local.csv')

SLIDE 21

Accessing private files

Use '.get_object()'

obj = s3.get_object(Bucket='gid-requests', Key='2019/potholes.csv')

print(obj)

SLIDE 22

Accessing private files

SLIDE 23

Accessing private files

Get the object

obj = s3.get_object(
    Bucket='gid-requests',
    Key='2019/potholes.csv')

Read StreamingBody into Pandas

pd.read_csv(obj['Body'])
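The 'Body' in the response is a stream, so it can be consumed only once. The sketch below shows the same idea without Pandas, using the standard csv module; it is an alternative to the slide's approach, not a replacement. The helper name read_csv_rows and the UTF-8 default are assumptions, and s3 is assumed to be a boto3 S3 client.

```python
import csv
import io


def read_csv_rows(s3, bucket, key, encoding='utf-8'):
    """Fetch a private CSV object and return its rows as dictionaries."""
    obj = s3.get_object(Bucket=bucket, Key=key)
    # Read the whole stream into memory, then decode it to text
    text = obj['Body'].read().decode(encoding)
    return list(csv.DictReader(io.StringIO(text)))
```

This reads the entire object into memory, which is fine for small files; for large ones the download-then-open approach from the earlier slide may be a better fit.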

SLIDE 24

Pre-signed URLs

Expire after a certain timeframe
Great for temporary access

Example

https://s3.amazonaws.com/?AWSAccessKeyId=12345&Signature=rBmnrwutb6VkJ9hE8Uub%2BBYA9mY%3D&Expires=155

SLIDE 25

Pre-signed URLs

Upload a file

s3.upload_file(
    Filename='./potholes.csv',
    Key='potholes.csv',
    Bucket='gid-requests')

SLIDE 26

Pre-signed URLs

Generate Presigned URL

share_url = s3.generate_presigned_url(
    ClientMethod='get_object',
    ExpiresIn=3600,
    Params={'Bucket': 'gid-requests', 'Key': 'potholes.csv'}
)

Open in Pandas

pd.read_csv(share_url)
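ExpiresIn is given in seconds, so 3600 means the link works for one hour. A small wrapper can make the lifetime explicit. This is a sketch under assumptions: the name share_link and the hours parameter are hypothetical, and s3 is assumed to be a boto3 S3 client.

```python
def share_link(s3, bucket, key, hours=1):
    """Return a pre-signed GET URL that expires after the given hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return s3.generate_presigned_url(
        ClientMethod='get_object',
        ExpiresIn=int(hours * 3600),  # ExpiresIn is expressed in seconds
        Params={'Bucket': bucket, 'Key': key})
```

The object itself stays private; only holders of the URL can read it until it expires.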

SLIDE 27

Load multiple files into one DataFrame

# Create list to hold our DataFrames
df_list = []

# Request the list of csv's from S3 with prefix; Get contents
response = s3.list_objects(
    Bucket='gid-requests',
    Prefix='2019/')

# Get response contents
request_files = response['Contents']

SLIDE 28

Load multiple files into one DataFrame

# Iterate over each object
for file in request_files:
    obj = s3.get_object(Bucket='gid-requests', Key=file['Key'])

    # Read it as DataFrame
    obj_df = pd.read_csv(obj['Body'])

    # Append DataFrame to list
    df_list.append(obj_df)

SLIDE 29

Load multiple files into one DataFrame

# Concatenate all the DataFrames in the list
df = pd.concat(df_list)

# Preview the DataFrame
df.head()
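A single list call returns at most 1,000 keys, and the response has no 'Contents' entry when nothing matches the prefix. The sketch below handles both cases with a paginator. It swaps in list_objects_v2 instead of the list_objects call used on the slides, and the helper name iter_keys is hypothetical; s3 is assumed to be a boto3 S3 client.

```python
def iter_keys(s3, bucket, prefix):
    """Yield every object key under a prefix, one page at a time."""
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is missing when a page has no matching objects
        for item in page.get('Contents', []):
            yield item['Key']
```

The loop on the slide could then iterate over iter_keys(s3, 'gid-requests', '2019/') and pass each key to get_object.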

SLIDE 30

Review - Accessing private objects in S3

Download then open

s3.download_file()

Open directly

s3.get_object()

Generate presigned URL

s3.generate_presigned_url()

SLIDE 31

Review - Sharing URLs

PUBLIC FILES: PUBLIC OBJECT URL
Generate using .format()

'https://{bucket}.s3.amazonaws.com/{key}'

PRIVATE FILES: PRESIGNED URL
Generate using .generate_presigned_url()

'https://s3.amazonaws.com/?AWSAccessKeyId=12345&Signature=rBmnrwutb6VkJ9hE8Uub%2BBYA9mY%

SLIDE 32

Let's practice!

INTRODUCTION TO AWS BOTO IN PYTHON

SLIDE 33

Sharing files through a website

INTRODUCTION TO AWS BOTO IN PYTHON

Maksim Pecherskiy

Data Engineer

SLIDE 34

Serving HTML Pages

SLIDE 35

HTML table in Pandas

Convert DataFrame to html

df.to_html('table_agg.html')

SLIDE 36

HTML Table in Pandas with links

Convert DataFrame to html

df.to_html('table_agg.html', render_links=True)

SLIDE 37

Certain columns to HTML

Convert DataFrame to html

df.to_html('table_agg.html',
           render_links=True,
           columns=['service_name', 'request_count', 'info_link'])

SLIDE 38

Borders

Convert DataFrame to html

df.to_html('table_agg.html',
           render_links=True,
           columns=['service_name', 'request_count', 'info_link'],
           border=0)

SLIDE 39

Uploading an HTML file to S3

Upload an HTML file to S3

s3.upload_file(
    Filename='./table_agg.html',
    Bucket='datacamp-website',
    Key='table.html',
    ExtraArgs={
        'ContentType': 'text/html',
        'ACL': 'public-read'}
)

SLIDE 40

Accessing HTML file

S3 Object URL Template

https://{bucket}.s3.amazonaws.com/{key}

https://datacamp-website.s3.amazonaws.com/table.html

SLIDE 41

HTML Page

SLIDE 42

Uploading other types of content

Upload an image file to S3

s3.upload_file(
    Filename='./plot_image.png',
    Bucket='datacamp-website',
    Key='plot_image.png',
    ExtraArgs={
        'ContentType': 'image/png',
        'ACL': 'public-read'}
)

SLIDE 43

IANA Media Types

JSON : application/json
PNG : image/png
PDF : application/pdf
CSV : text/csv

http://www.iana.org/assignments/media-types/media-types.xhtml
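Rather than typing the media type by hand for every upload, the ContentType can be guessed from the file extension with Python's standard mimetypes module. This is a sketch: the helper name public_upload_args and the fallback type are assumptions, not something the slides prescribe.

```python
import mimetypes


def public_upload_args(filename, default='application/octet-stream'):
    """Build ExtraArgs for a public upload, guessing the media type."""
    # guess_type returns (type, encoding); type is None for unknown extensions
    content_type, _encoding = mimetypes.guess_type(filename)
    return {'ContentType': content_type or default, 'ACL': 'public-read'}
```

The result can be passed directly as ExtraArgs, for example ExtraArgs=public_upload_args('./plot_image.png').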

SLIDE 44

Generating an index page

# List the gid-reports bucket objects starting with 2019/
r = s3.list_objects(Bucket='gid-reports', Prefix='2019/')

# Convert the response contents to DataFrame
objects_df = pd.DataFrame(r['Contents'])

SLIDE 45

Generating an index page

# Create a column "Link" that contains website url + key
base_url = "http://datacamp-website.s3.amazonaws.com/"
objects_df['Link'] = base_url + objects_df['Key']

# Write DataFrame to html
objects_df.to_html('report_listing.html',
                   columns=['Link', 'LastModified', 'Size'],
                   render_links=True)
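To see what the link rendering amounts to, here is a Pandas-free sketch that builds a bare HTML table of links from a list of keys. It is an illustration only, not what to_html emits: the function name render_index and the single-column layout are assumptions.

```python
from html import escape


def render_index(keys, base_url):
    """Render a minimal HTML table with one link per object key."""
    rows = []
    for key in keys:
        link = base_url + key
        # Escape both the href and the visible text to keep the HTML valid
        rows.append('<tr><td><a href="{0}">{1}</a></td></tr>'.format(
            escape(link, quote=True), escape(key)))
    return '<table>\n' + '\n'.join(rows) + '\n</table>'
```

The returned string could be written to report_listing.html and uploaded in the same way as the Pandas output.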

SLIDE 46

Uploading index page

Upload an HTML file to S3

s3.upload_file(
    Filename='./report_listing.html',
    Bucket='datacamp-website',
    Key='index.html',
    ExtraArgs={
        'ContentType': 'text/html',
        'ACL': 'public-read'}
)

https://datacamp-website.s3.amazonaws.com/index.html

SLIDE 47

Review

HTML Table in Pandas ( df.to_html('table.html') )
Upload HTML file ( ContentType: text/html )
Upload Image file ( ContentType: image/png )
Share the URL for our html page!

SLIDE 48

Let's practice!

INTRODUCTION TO AWS BOTO IN PYTHON

SLIDE 49

Case Study: Generating a Report Repository

INTRODUCTION TO AWS BOTO IN PYTHON

Maksim Pecherskiy

Data Engineer

SLIDE 50

Final product

SLIDE 51

The steps

Prepare the data

Download files for the month from the raw data bucket
Concatenate them into one csv
Create an aggregated DataFrame

SLIDE 52

The steps

Create the report

Write the DataFrame to CSV and HTML
Generate a Bokeh plot, save as HTML

SLIDE 53

The steps

Upload report to shareable website

Create gid-reports bucket
Upload all the three files for the month to S3
Generate an index.html file that lists all the files
Get the website URL!

SLIDE 54

Raw data bucket

Private files
Daily CSVs of requests from the App
Raw data

SLIDE 55

Read raw data files

# Create list to hold our DataFrames
df_list = []

# Request the list of csv's from S3 with prefix; Get contents
response = s3.list_objects(
    Bucket='gid-requests',
    Prefix='2019_jan')

# Get response contents
request_files = response['Contents']

SLIDE 56

Read raw data files

# Iterate over each object
for file in request_files:
    obj = s3.get_object(Bucket='gid-requests', Key=file['Key'])

    # Read it as DataFrame
    obj_df = pd.read_csv(obj['Body'])

    # Append DataFrame to list
    df_list.append(obj_df)

SLIDE 57

Read raw data files

# Concatenate all the DataFrames in the list
df = pd.concat(df_list)

# Preview the DataFrame
df.head()

SLIDE 58

Create aggregated reports

Perform some aggregation

df.to_csv('jan_final_report.csv')
df.to_html('jan_final_report.html')
jan_final_chart.html

SLIDE 59

Report bucket

Bucket website
Publicly Accessible
Aggregated data and HTML reports

SLIDE 60

Upload Aggregated CSV

# Upload Aggregated CSV to S3
s3.upload_file(Filename='./jan_final_report.csv',
               Key='2019/jan/final_report.csv',
               Bucket='gid-reports',
               ExtraArgs={'ACL': 'public-read'})

SLIDE 61

Upload HTML Table

# Upload HTML table to S3
s3.upload_file(Filename='./jan_final_report.html',
               Key='2019/jan/final_report.html',
               Bucket='gid-reports',
               ExtraArgs={
                   'ContentType': 'text/html',
                   'ACL': 'public-read'})

SLIDE 62

Upload HTML Chart

# Upload Aggregated Chart to S3
s3.upload_file(Filename='./jan_final_chart.html',
               Key='2019/jan/final_chart.html',
               Bucket='gid-reports',
               ExtraArgs={
                   'ContentType': 'text/html',
                   'ACL': 'public-read'})
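The three uploads differ only in file name, key, and content type, so they can be driven from one list. The sketch below reuses the file names and keys from the slides; the REPORTS list and the upload_reports helper are hypothetical names, and s3 is assumed to be a boto3 S3 client.

```python
# (local file, S3 key, content type or None), taken from the three upload slides
REPORTS = [
    ('./jan_final_report.csv', '2019/jan/final_report.csv', None),
    ('./jan_final_report.html', '2019/jan/final_report.html', 'text/html'),
    ('./jan_final_chart.html', '2019/jan/final_chart.html', 'text/html'),
]


def upload_reports(s3, bucket, reports=REPORTS):
    """Upload each report as a public object and return the uploaded keys."""
    uploaded = []
    for filename, key, content_type in reports:
        extra_args = {'ACL': 'public-read'}
        # Only set ContentType when one is given, as on the CSV slide
        if content_type is not None:
            extra_args['ContentType'] = content_type
        s3.upload_file(Filename=filename, Bucket=bucket, Key=key,
                       ExtraArgs=extra_args)
        uploaded.append(key)
    return uploaded
```

Calling upload_reports(s3, 'gid-reports') would then perform the same three uploads in order.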

SLIDE 63

Uploaded reports

SLIDE 64

Create index.html

# List the gid-reports bucket objects starting with 2019/
r = s3.list_objects(Bucket='gid-reports', Prefix='2019/')

# Convert the response contents to DataFrame
objects_df = pd.DataFrame(r['Contents'])

# Create a column "Link" that contains website url + key
base_url = "https://gid-reports.s3.amazonaws.com/"
objects_df['Link'] = base_url + objects_df['Key']

SLIDE 65

Create index.html

# Write DataFrame to html
objects_df.to_html('report_listing.html',
                   columns=['Link', 'LastModified', 'Size'],
                   render_links=True)

SLIDE 66

Upload index.html

# Upload the file to gid-reports bucket root.
s3.upload_file(
    Filename='./report_listing.html',
    Key='index.html',
    Bucket='gid-reports',
    ExtraArgs={
        'ContentType': 'text/html',
        'ACL': 'public-read'
    })

SLIDE 67

Get the URL of the index!

Bucket website URL *

"http://gid-reports.s3.amazonaws.com/index.html"

SLIDE 68

Let's tweak!

INTRODUCTION TO AWS BOTO IN PYTHON