slide-1
SLIDE 1

Labs #4

slide-2
SLIDE 2

APIs

slide-3
SLIDE 3

API Lab #1

• Previously, rendering of the Guestbook was done in Flask with Jinja templates returning HTML
• Recall the client-side rendering approach
  • Front-end UI code runs entirely in the browser (i.e. client-side rendering)
  • Back-end model exposed via a REST API
• Create a REST API implementing direct access to the Guestbook backend
• In Cloud Shell, clone the course repository and view the source code

git clone https://bitbucket.org/wuchangfeng/cs430-src
cd cs430-src/API_Functions_Guestbook

• Specify packages needed for the Python-based Cloud Function in requirements.txt
  • flask
  • google-cloud-datastore
• Modify gbmodel/model_datastore.py
  • Update YOUR_PROJECT_ID to point to your Cloud Datastore backend

slide-4
SLIDE 4

• main.py
  • Single function supporting two methods to handle the API endpoint
  • GET method
    • Pulls all entries from the model and creates a dictionary of enumerated entries
    • Formats and returns a JSON response with an HTTP status code of 200 (OK)

Portland State University CS 430P/530 Internet, Web & Cloud Systems

from flask import make_response, abort
import gbmodel, json

def gb(request):
    """ Guestbook API endpoint
    :param request: flask.Request object
    :return: flask.Response object (in JSON), HTTP status code
    """
    model = gbmodel.get_model()
    if request.method == 'GET':
        entries = [dict(name=row[0], email=row[1], signed_on=str(row[2]), message=row[3])
                   for row in model.select()]
        entries_dict = {i: x for i, x in enumerate(entries, 1)}
        response = make_response(json.dumps(entries_dict))
        response.headers['Content-Type'] = 'application/json'
        return response, 200

slide-5
SLIDE 5

• POST method to handle submissions to the Guestbook formatted in JSON
  • Retrieve the POST data sent as a JSON object
  • Validate that the required keys are included (i.e. 'name', 'email', 'message'), then insert into the model
  • Return the added entry as a response with HTTP status code 201 (Created)

Portland State University CS 430P/530 Internet, Web & Cloud Systems

    if request.method == 'POST' and \
       request.headers['content-type'] == 'application/json':
        request_json = request.get_json(silent=True)
        if all(key in request_json for key in ('name', 'email', 'message')):
            model.insert(request_json['name'], request_json['email'], request_json['message'])
        else:
            raise ValueError("JSON missing name, email, or message property")
        response = make_response(request_json)
        response.headers['Content-Type'] = 'application/json'
        return request_json, 201

slide-6
SLIDE 6

• Deploy as a Cloud Function in Cloud Shell
  • Specifies the gb function for responding to requests from the endpoint
  • Specifies that the function will be triggered via a web request (http)

gcloud functions deploy gb --runtime python37 --trigger-http

• Allow unauthenticated invocations and then wait
• Show the HTTPS trigger URL (i.e. your API_EndPoint) and the function in the console

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-7
SLIDE 7

• Click on the function and show the amount of RAM allocated to it
• Bring up the "Trigger" tab
  • Click on the trigger URL and show the results returning all entries as a JSON object
• Bring up the "Testing" tab
  • Specify a JSON payload with the appropriate fields and a message of "API Lab #1"
  • Click on "Test the Function", then show the results returned in the "Output" field

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-8
SLIDE 8

• Use the API from a Python interpreter to dump the entries in the Guestbook
  • In Cloud Shell, bring up the interactive interpreter for Python 3
  • Import requests and hit the API with a GET request

import requests
resp = requests.get('<API_EndPoint>')

  • Show the response status (resp.status_code)
  • Show the response headers (resp.headers) and the data type of the response headers (via type())
  • Show the response text (resp.text) and its data type
  • Show the response parsed as JSON (resp.json()) and its type()
  • Assign resp.json() to a variable
  • Use the interpreter to individually print the name, email, signed_on, and message of the first Guestbook entry returned

Portland State University CS 430P/530 Internet, Web & Cloud Systems
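For reference, a minimal sketch of the interpreter session described above; <API_EndPoint> remains the placeholder for your HTTPS trigger URL, and the '1' key reflects the enumeration used by the GET handler shown earlier.

import requests

# GET all Guestbook entries from the Cloud Function endpoint (placeholder URL)
resp = requests.get('<API_EndPoint>')
print(resp.status_code)        # expect 200
print(type(resp.headers))      # a dict-like headers object
print(type(resp.text))         # str
entries = resp.json()          # parsed into a Python dict
print(type(entries))           # dict

# Entries were enumerated starting at 1 by the GET handler; after the JSON
# round-trip the keys are strings, so the first entry is under '1'
first = entries['1']
print(first['name'], first['email'], first['signed_on'], first['message'])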

slide-9
SLIDE 9

• Within the same Python interpreter session, use the API to insert a Guestbook entry
  • Import the JSON package

import json

  • Create a dictionary with keys 'name', 'email', and 'message', with the name/email set to your own and the message's value set to 'API Lab #2'
    • e.g. my_dict = {'foo':'bar'} creates a dictionary with a single entry with key 'foo' and value 'bar'
  • The post method in requests has a keyword argument json that allows one to specify a dictionary that is converted into a JSON object as the payload
  • Submit a POST request to the endpoint with the dictionary

resp = requests.post('<API_EndPoint>', json=my_dict)

  • Show the response status, the response headers, and the response text that indicate a successful insertion

Portland State University CS 430P/530 Internet, Web & Cloud Systems
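A sketch of those steps end to end (the dictionary values and the placeholder <API_EndPoint> are illustrative):

import requests

# Build the payload; requests serializes it to JSON because of the json= keyword
my_dict = {'name': 'Your Name', 'email': 'you@pdx.edu', 'message': 'API Lab #2'}

resp = requests.post('<API_EndPoint>', json=my_dict)
print(resp.status_code)             # expect 201 on a successful insert
print(resp.headers['Content-Type'])
print(resp.text)                    # the inserted entry echoed back by the function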

slide-10
SLIDE 10

API Lab #1

• Revisit the Guestbook API to show the new entries
• Use any of the methods from prior labs to show the new entries
  • Compute Engine, App Engine, Cloud Run, Kubernetes, Cloud Shell Dev Server
• Cleanup
  • Delete the function via the UI or gcloud functions delete gb

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-11
SLIDE 11

ML APIs Labs

slide-12
SLIDE 12

ML APIs Lab #1

 Integrating Machine Learning APIs (25 min)

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-13
SLIDE 13

Enable APIs

• Skip the project creation step (use your course project)
• Ensure the Cloud Speech, Cloud Translation, and Cloud Natural Language Processing APIs are enabled
  • As done with Cloud Vision previously (via the web console)
  • Or via individual gcloud commands within Cloud Shell

Portland State University CS 430P/530 Internet, Web & Cloud Systems

gcloud services enable speech.googleapis.com
gcloud services enable translate.googleapis.com
gcloud services enable language.googleapis.com
gcloud services enable vision.googleapis.com
gcloud services enable videointelligence.googleapis.com

slide-14
SLIDE 14

Setup

• Create a service account for accessing the ML APIs and generate a service account key to authenticate with
  • This can also be done via the console, with the JSON file cut-and-pasted into Cloud Shell
  • Note I've chosen to name it cs430mlapis
• Account bound to your PROJECT_ID and service account key placed in a JSON file

Portland State University CS 430P/530 Internet, Web & Cloud Systems

cd $HOME
gcloud iam service-accounts create cs430mlapis
gcloud iam service-accounts keys create cs430mlapis.json --iam-account cs430mlapis@$DEVSHELL_PROJECT_ID.iam.gserviceaccount.com
gcloud projects add-iam-policy-binding ${DEVSHELL_PROJECT_ID} --member serviceAccount:cs430mlapis@${DEVSHELL_PROJECT_ID}.iam.gserviceaccount.com --role roles/editor
slide-15
SLIDE 15

Setup

• You should get a message that cs430mlapis@cs430-odinid.iam.gserviceaccount.com was created
• Set an environment variable to point to your created credential file
  • Note that this only sets the variable for your current session
  • Update ~/.bashrc to set credentials for each Cloud Shell session
• Important: if you get a 403 error upon running your applications, it is likely because this environment variable is either not set or set improperly

Portland State University CS 430P/530 Internet, Web & Cloud Systems

export GOOGLE_APPLICATION_CREDENTIALS=$HOME/cs430mlapis.json

# added to ~/.bashrc so each new session picks up the credentials
export GOOGLE_APPLICATION_CREDENTIALS=$HOME/cs430mlapis.json
source /google/devshell/bashrc.google

google.api_core.exceptions.PermissionDenied: 403 Your application has authenticated using end user credentials from the Google Cloud SDK or Google Cloud Shell which are not supported by the texttospeech.googleapis.com. We recommend that most server applications use service accounts instead. For more information about service accounts and how to use them in your application, see https://cloud.google.com/docs/authentication/.

slide-16
SLIDE 16

Cloud Vision via Python

• If you haven't downloaded the Python samples, do so
• Install the Cloud Vision package
• Go to the Vision cloud-client code
• Run a detection that returns the labels generated for an image of a bird given its URI; show the output

Portland State University CS 430P/530 Internet, Web & Cloud Systems

pip3 install --upgrade google-cloud-vision --user
git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
cd ~/python-docs-samples/vision/cloud-client/detect
python3 detect.py labels-uri gs://ml-api-codelab/birds.jpg

slide-17
SLIDE 17

• Examine detect.py
• Look for code that allows you to run the script to detect logos in images given a URI
• Then, use the script to run a detection on a logo you find on the Internet

Portland State University CS 430P/530 Internet, Web & Cloud Systems
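If you want to try the underlying call outside detect.py, a minimal sketch is below; the image URL is a placeholder for the logo you found, and the client usage assumes the same google-cloud-vision release used elsewhere in these labs.

from google.cloud import vision

client = vision.ImageAnnotatorClient()
image = vision.types.Image()
image.source.image_uri = 'https://example.com/some_logo.png'  # placeholder logo URL

# Logo detection returns annotations with a description and a confidence score
resp = client.logo_detection(image=image)
for logo in resp.logo_annotations:
    print(logo.description, logo.score)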

slide-18
SLIDE 18

Cloud Speech via Python

• Install the Cloud Speech package
• Go to the Speech cloud-client code
• Fix line 73 of transcribe.py (delete the encoding and sample_rate_hertz settings, set the language to tr-TR)

Portland State University CS 430P/530 Internet, Web & Cloud Systems

pip3 install --upgrade google-cloud-speech --user
cd ~/python-docs-samples/speech/cloud-client
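As a rough sketch of what the adjusted configuration looks like after that edit (a sketch only: newer google-cloud-speech releases expose these classes at the top level as shown, while older ones place them under speech.types, so check transcribe.py itself for the exact surrounding code):

from google.cloud import speech

client = speech.SpeechClient()

# Point at the Turkish sample and let the service infer encoding/sample rate
audio = speech.RecognitionAudio(uri='gs://ml-api-codelab/tr-ostrich.wav')
config = speech.RecognitionConfig(language_code='tr-TR')

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)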

slide-19
SLIDE 19

 Run transcribe.py on the given URI and show output

Portland State University CS 430P/530 Internet, Web & Cloud Systems

python3 transcribe.py gs://ml-api-codelab/tr-ostrich.wav

slide-20
SLIDE 20

Cloud Translate via Python

• Install the Cloud Translate package
• Go to the Translate cloud-client code
• Examine the code

Portland State University CS 430P/530 Internet, Web & Cloud Systems

pip3 install --upgrade google-cloud-translate --user
cd ~/python-docs-samples/translate/cloud-client

slide-21
SLIDE 21

 Run snippets.py on the text string and show output

Portland State University CS 430P/530 Internet, Web & Cloud Systems

python3 snippets.py translate-text en '你有沒有帶外套'
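The same translation can be reproduced directly in the interpreter; a minimal sketch assuming the v2 Translate client (the same translate_v2 module referenced later in the Integration section):

from google.cloud import translate_v2 as translate

client = translate.Client()
# Translate the Chinese sample sentence into English
result = client.translate('你有沒有帶外套', target_language='en')
print(result['translatedText'])
print(result['detectedSourceLanguage'])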

slide-22
SLIDE 22

Cloud Natural Language via Python

• Install the Cloud Natural Language package
• Go to the Natural Language cloud-client code

Portland State University CS 430P/530 Internet, Web & Cloud Systems

pip3 install --upgrade google-cloud-language --user
cd ~/python-docs-samples/language/cloud-client/v1

slide-23
SLIDE 23

 Examine code for entity analysis

Portland State University CS 430P/530 Internet, Web & Cloud Systems
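For orientation while reading snippets.py, the core of an entity analysis call reduces to the sketch below (the input sentence is illustrative, and the names assume the language_v1 client; older google-cloud-language releases use language.types and enums instead):

from google.cloud import language_v1

client = language_v1.LanguageServiceClient()
document = language_v1.Document(
    content='Guido van Rossum created Python at CWI in the Netherlands.',
    type_=language_v1.Document.Type.PLAIN_TEXT)

# Entity analysis finds named things in the text along with a salience score
response = client.analyze_entities(document=document)
for entity in response.entities:
    print(entity.name, entity.salience)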

slide-24
SLIDE 24

 Examine code for sentiment analysis

Portland State University CS 430P/530 Internet, Web & Cloud Systems
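Likewise, sentiment analysis is a single call on a Document; a sketch under the same client-version assumption, using one of the test strings from the next slide:

from google.cloud import language_v1

client = language_v1.LanguageServiceClient()
document = language_v1.Document(content='homework is awesome!',
                                type_=language_v1.Document.Type.PLAIN_TEXT)

# score ranges from -1.0 (negative) to 1.0 (positive);
# magnitude reflects the overall strength of emotion in the text
sentiment = client.analyze_sentiment(document=document).document_sentiment
print(sentiment.score, sentiment.magnitude)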

slide-25
SLIDE 25

• Run the entities-text function in snippets.py and show the output
• Edit the string used in sentiment_text() and run the script using the following strings; show how the sentiment score varies each time

Portland State University CS 430P/530 Internet, Web & Cloud Systems

python snippets.py entities-text
python snippets.py sentiment-text

text = 'homework is awful!'
text = 'homework is awesome?'
text = 'homework is awesome.'
text = 'homework is awesome!'

slide-26
SLIDE 26

Integration

• See if words in a recording (spoken in a foreign language) describe an object in an image
• Previous calls modified to return results as text (vs. print)
  • Audio transcription → translation → NLP to obtain entities
  • Image analysis to obtain labels
  • Comparison to determine a match

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-27
SLIDE 27

Setup

• Clone the repository

git clone https://github.com/googlecodelabs/integrating-ml-apis

• In the repository, edit solution.py to use the older translate version
  • Replace: from google.cloud import translate
  • With: from google.cloud import translate_v2 as translate

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-28
SLIDE 28

• tr-TR speech samples:
  • gs://ml-api-codelab/tr-ball.wav
  • gs://ml-api-codelab/tr-bike.wav
  • gs://ml-api-codelab/tr-jacket.wav
  • gs://ml-api-codelab/tr-ostrich.wav
• de-DE speech samples:
  • gs://ml-api-codelab/de-ball.wav
  • gs://ml-api-codelab/de-bike.wav
  • gs://ml-api-codelab/de-jacket.wav
  • gs://ml-api-codelab/de-ostrich.wav

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-29
SLIDE 29

Integration

• See the code for the modifications to transcribe_gcs() (Speech), translate_text() (Translate), entities_text() (Natural Language), and detect_labels_uri() (Vision)

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-30
SLIDE 30

ML APIs Lab #1

• Run at least 3 pairs other than the one given in the walk-through
• Integrating Machine Learning APIs (25 min)
  • https://codelabs.developers.google.com/codelabs/cloud-ml-apis

Portland State University CS 430P/530 Internet, Web & Cloud Systems

python3 solution.py tr-TR gs://ml-api-codelab/tr-ball.wav gs://ml-api-codelab/football.jpg

slide-31
SLIDE 31

ML APIs Lab #2

• Using the (rest of the) Vision API with Python (8 min)
  • Optical Character Recognition (OCR) (text detection)
  • Landmark detection
  • Sentiment analysis (face detection)

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-32
SLIDE 32

Setup

• Skip Steps 2, 3, and 4 (re-use the setup from ML APIs Lab #1)
• Copy the image files into your own bucket
• For the rest of the examples, my project ID is used in the gs:// URIs; use yours instead

Portland State University CS 430P/530 Internet, Web & Cloud Systems

gsutil cp gs://cloud-vision-codelab/otter_crossing.jpg gs://$DEVSHELL_PROJECT_ID
gsutil cp gs://cloud-vision-codelab/eiffel_tower.jpg gs://$DEVSHELL_PROJECT_ID
gsutil cp gs://cloud-vision-codelab/face_surprise.jpg gs://$DEVSHELL_PROJECT_ID
gsutil cp gs://cloud-vision-codelab/face_no_surprise.png gs://$DEVSHELL_PROJECT_ID

$ echo $DEVSHELL_PROJECT_ID
cs410c-wuchang-201515

slide-33
SLIDE 33

 Make bucket publicly readable via console UI by giving

allUsers the Storage Object Viewer role

 https://cloud.google.com/storage/docs/access-control/making-data-public

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-34
SLIDE 34

Launch interactive ipython for the lab

• Show the full image of the Otter Crossing sign via your bucket
• Then use Vision's text_detection() to perform an OCR operation on a picture of the sign (substitute your bucket name in the gs:// URL)

Portland State University CS 430P/530 Internet, Web & Cloud Systems

wuchang@cloudshell:~ (cs430-wuchang-201515)$ ipython
...
In [1]: from google.cloud import vision
from google.cloud.vision import types

client = vision.ImageAnnotatorClient()
image = vision.types.Image()
image.source.image_uri = 'gs://cs430-wuchang-201515/otter_crossing.jpg'
resp = client.text_detection(image=image)
print('\n'.join([d.description for d in resp.text_annotations]))

slide-35
SLIDE 35

• Show the full Eiffel Tower image in your bucket
• Then use Vision's landmark_detection() to identify it as a famous place (substitute your bucket name in the gs:// URL)

Portland State University CS 430P/530 Internet, Web & Cloud Systems

from google.cloud import vision
from google.cloud.vision import types

client = vision.ImageAnnotatorClient()
image = vision.types.Image()
image.source.image_uri = 'gs://cs430-wuchang-201515/eiffel_tower.jpg'
resp = client.landmark_detection(image=image)
print(resp.landmark_annotations)

slide-36
SLIDE 36

• Show the two face images in your bucket
• Then use Vision's face_detection() to annotate the images (substitute your bucket name in the gs:// URL)
  • See the likelihood of the faces showing surprise

Portland State University CS 430P/530 Internet, Web & Cloud Systems

from google.cloud import vision
from google.cloud.vision import types

client = vision.ImageAnnotatorClient()
image = vision.types.Image()
likelihood_name = ('UNKNOWN', 'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY')
for pic in ('face_surprise.jpg', 'face_no_surprise.png'):
    image.source.image_uri = 'gs://cs430-wuchang-201515/' + pic
    resp = client.face_detection(image=image)
    faces = resp.face_annotations
    for face in faces:
        print(pic + ': surprise: {}'.format(likelihood_name[face.surprise_likelihood]))

slide-37
SLIDE 37

ML APIs Lab #2

• Using the Vision API with Python (8 min)
  • https://codelabs.developers.google.com/codelabs/cloud-vision-api-python

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-38
SLIDE 38

ML APIs Lab #3

• Video Intelligence API (20 min)
• Ensure the API is enabled in the API Library

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-39
SLIDE 39

Setup credentials in Cloud Shell

• In Cloud Shell, create a service account named videolab
• Create a policy that specifies a role of project viewer and attach it to the service account created in the previous step
• Create and download a service account key in JSON for applications to use in order to take on roles associated with the service account
• Set a local environment variable that points to the file from the previous step
  • The Python script will access credentials via this environment variable and file

Portland State University CS 430P/530 Internet, Web & Cloud Systems

gcloud iam service-accounts create videolab --display-name "Video Lab"
gcloud projects add-iam-policy-binding ${DEVSHELL_PROJECT_ID} --member=serviceAccount:videolab@${DEVSHELL_PROJECT_ID}.iam.gserviceaccount.com --role roles/viewer
export GOOGLE_APPLICATION_CREDENTIALS="/home/${USER}/videolab.json"
gcloud iam service-accounts keys create /home/${USER}/videolab.json --iam-account videolab@${DEVSHELL_PROJECT_ID}.iam.gserviceaccount.com

slide-40
SLIDE 40

Cloud Video Intelligence

• Video labeling code labels.py
• Labeling function analyze_labels()
  • Create the client and set the features to extract
  • Call annotate_video with the storage location of the video
  • Get the result of the annotation (allow 90 seconds)

Portland State University CS 430P/530 Internet, Web & Cloud Systems

cd ~/python-docs-samples/video/cloud-client/labels
# python labels.py gs://cloud-ml-sandbox/video/chicago.mp4

import argparse
from google.cloud import videointelligence

def analyze_labels(path):
    video_client = videointelligence.VideoIntelligenceServiceClient()
    features = [videointelligence.enums.Feature.LABEL_DETECTION]
    operation = video_client.annotate_video(path, features=features)
    print('\nProcessing video for label annotations:')
    result = operation.result(timeout=90)
    print('\nFinished processing.')

slide-41
SLIDE 41

• analyze_labels()
  • Go through the labels returned in JSON
  • Cycle through the labels and print each entity in the video and each entity's category
  • For each entity, output the times in the video and the confidence in detection

Portland State University CS 430P/530 Internet, Web & Cloud Systems

    # first result is retrieved because a single video was processed
    segment_labels = result.annotation_results[0].segment_label_annotations
    for i, segment_label in enumerate(segment_labels):
        print('Video label description: {}'.format(segment_label.entity.description))
        for category_entity in segment_label.category_entities:
            print('\tLabel category description: {}'.format(category_entity.description))
        for i, segment in enumerate(segment_label.segments):
            start_time = (segment.segment.start_time_offset.seconds +
                          segment.segment.start_time_offset.nanos / 1e9)
            end_time = (segment.segment.end_time_offset.seconds +
                        segment.segment.end_time_offset.nanos / 1e9)
            positions = '{}s to {}s'.format(start_time, end_time)
            confidence = segment.confidence
            print('\tSegment {}: {}'.format(i, positions))
            print('\tConfidence: {}'.format(confidence))
            print('\n')

slide-42
SLIDE 42

• Set up the environment and install packages to run the code
• Copy the video to a storage bucket
  • Video from https://youtu.be/k2pBvCtwli8
  • Also at https://thefengs.com/wuchang/courses/cs430/SportsBloopers2016.mp4

Portland State University CS 430P/530 Internet, Web & Cloud Systems

virtualenv -p python3 env
source env/bin/activate
pip install -r requirements.txt
curl https://thefengs.com/wuchang/courses/cs430/SportsBloopers2016.mp4 | gsutil -h "Content-Type:video/mp4" cp - gs://<BUCKET_NAME>/SportsBloopers2016.mp4

slide-43
SLIDE 43

• Run the code to perform the analysis

Portland State University CS 430P/530 Internet, Web & Cloud Systems

$ python labels.py gs://cs410c-wuchang-201515/SportsBloopers2016.mp4
Processing video for label annotations:
Finished processing.
Video label description: hockey
	Label category description: sports
	Segment 0: 0.0s to 178.9788s
	Confidence: 0.837484955788
Video label description: sports
	Segment 0: 0.0s to 178.9788s
	Confidence: 0.927089214325

slide-44
SLIDE 44

 Watch video and answer the following questions

 Which sports did the API properly identify?  Which sports did the API fail to identify?

 Upload a short (< 2 min) video of your own to a Cloud Storage

bucket and run the label script on it

 You can find one on

YouTube then use this site to pull it out as an mp4 that can be uploaded to your bucket

 https://youtubemp4.to/

 Ensure the file in the bucket is publicly readable as before via

command-line or web UI

 See ML APIs Lab #2  https://cloud.google.com/storage/docs/access-control/making-data-public

 Note: if you get a permissions error, you may need to restart Cloud

Shell

 Answer the following questions

 Show an example of a properly identified entity in your video  Show an example of a missed or misclassified entity in your video

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-45
SLIDE 45

Optional (FYI)

• If you wish to explore more (for example, as a final project), see analyze.py for examples of detecting explicit content in video and labeling shots within a video
  • shots breaks clips into parts based on camera shots
  • explicit_content detects adult material

Portland State University CS 430P/530 Internet, Web & Cloud Systems

$ cd ~/python-docs-samples/video/cloud-client/analyze
# python analyze.py labels gs://cloud-ml-sandbox/video/chicago.mp4
# python analyze.py labels_file resources/cat.mp4
# python analyze.py shots gs://demomaker/gbikes_dinosaur.mp4
# python analyze.py explicit_content gs://demomaker/gbikes_dinosaur.mp4

slide-46
SLIDE 46

ML APIs Lab #3

• Clean-up
• Video Intelligence API (20 min)

Portland State University CS 430P/530 Internet, Web & Cloud Systems

rm /home/${USER}/videolab.json
gcloud iam service-accounts delete videolab@${DEVSHELL_PROJECT_ID}.iam.gserviceaccount.com
gcloud projects remove-iam-policy-binding ${DEVSHELL_PROJECT_ID} --member=serviceAccount:videolab@${DEVSHELL_PROJECT_ID}.iam.gserviceaccount.com --role=roles/viewer

slide-47
SLIDE 47

ML APIs Lab #4

• Deploying a Python Flask Web Application to App Engine (24 min)
  • Note that the codelab link has you deploy using the flexible environment, which is not needed and is more expensive
  • We will modify app.yaml to run on standard
• Skip steps 2, 3, and 5
• Do Step 4 in Cloud Shell

Portland State University CS 430P/530 Internet, Web & Cloud Systems

cd python-docs-samples/codelabs/flex_and_vision

slide-48
SLIDE 48

• Enable APIs (most likely already done)
• Create a service account for the lab
• Create a policy and attach it to the service account to allow the application to view and store objects in buckets

Portland State University CS 430P/530 Internet, Web & Cloud Systems

gcloud services enable vision.googleapis.com
gcloud services enable storage-component.googleapis.com
gcloud services enable datastore.googleapis.com
gcloud iam service-accounts create flexvisionlab --display-name "Flex Vision Lab"
gcloud projects add-iam-policy-binding ${DEVSHELL_PROJECT_ID} --member serviceAccount:flexvisionlab@${DEVSHELL_PROJECT_ID}.iam.gserviceaccount.com --role roles/storage.admin

slide-49
SLIDE 49

• Create a policy and attach it to the service account to allow the application to access Cloud Datastore
• In IAM, view the roles that have been attached to this service account in the web UI to ensure the roles have been enabled before issuing a key
• Create a JSON key file used by the application to authenticate itself as the service account
• Set your GOOGLE_APPLICATION_CREDENTIALS environment variable to point the application to the key

Portland State University CS 430P/530 Internet, Web & Cloud Systems

gcloud projects add-iam-policy-binding ${DEVSHELL_PROJECT_ID} --member serviceAccount:flexvisionlab@${DEVSHELL_PROJECT_ID}.iam.gserviceaccount.com --role roles/datastore.user
gcloud iam service-accounts keys create /home/${USER}/flexvisionlab.json --iam-account flexvisionlab@${DEVSHELL_PROJECT_ID}.iam.gserviceaccount.com
export GOOGLE_APPLICATION_CREDENTIALS="/home/${USER}/flexvisionlab.json"

slide-50
SLIDE 50

• Set the location of the Cloud Storage bucket for the app's images via an environment variable
• If you have deleted your gs://${DEVSHELL_PROJECT_ID} bucket, create it again
• Then set the environment variable to point to it

Portland State University CS 430P/530 Internet, Web & Cloud Systems

gsutil mb gs://${DEVSHELL_PROJECT_ID}
export CLOUD_STORAGE_BUCKET=${DEVSHELL_PROJECT_ID}

slide-51
SLIDE 51

• Create a python3 environment to test locally
• Run the app on the dev server
  • Note: If you get an error, exit the application and wait for IAM credentials to fully propagate

Portland State University CS 430P/530 Internet, Web & Cloud Systems

virtualenv -p python3 env
source env/bin/activate
pip install -r requirements.txt
python main.py

slide-52
SLIDE 52

 Test the app via the web preview or clicking on the link returned by

python (http://127.0.0.1:8080)

 Upload a photo to detect joy in faces

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-53
SLIDE 53

Code for default route

Portland State University CS 430P/530 Internet, Web & Cloud Systems

@app.route('/')
def homepage():
    # Create a Cloud Datastore client.
    datastore_client = datastore.Client()

    # Use the Cloud Datastore client to fetch
    # information from Datastore about each photo.
    query = datastore_client.query(kind='Faces')
    image_entities = list(query.fetch())

    # Pass image_entities to the Jinja2 template to render.
    return render_template('homepage.html', image_entities=image_entities)

slide-54
SLIDE 54

upload_photo()

• Code for uploading new images

Portland State University CS 430P/530 Internet, Web & Cloud Systems

from google.cloud import datastore
from google.cloud import storage
from google.cloud import vision

CLOUD_STORAGE_BUCKET = os.environ.get('CLOUD_STORAGE_BUCKET')

@app.route('/upload_photo', methods=['GET', 'POST'])
def upload_photo():
    photo = request.files['file']       # File from form submission
    storage_client = storage.Client()   # Create storage client.
    # Get bucket
    bucket = storage_client.get_bucket(CLOUD_STORAGE_BUCKET)
    # Create blob to store uploaded content, then upload content to it
    blob = bucket.blob(photo.filename)
    blob.upload_from_string(photo.read(), content_type=photo.content_type)
    # Make blob publicly available
    blob.make_public()

slide-55
SLIDE 55

• Code for getting face annotations from the Vision API

Portland State University CS 430P/530 Internet, Web & Cloud Systems

    # Create a Cloud Vision client.
    vision_client = vision.ImageAnnotatorClient()

    # Use the Cloud Vision client to detect a face for our image.
    source_uri = 'gs://{}/{}'.format(CLOUD_STORAGE_BUCKET, blob.name)
    image = vision.types.Image(
        source=vision.types.ImageSource(gcs_image_uri=source_uri))
    faces = vision_client.face_detection(image).face_annotations

    # If a face is detected, store the likelihood that the face
    # displays 'joy' based on Vision's annotations
    if len(faces) > 0:
        face = faces[0]
        # Convert the likelihood string.
        likelihoods = ['Unknown', 'Very Unlikely', 'Unlikely', 'Possible', 'Likely', 'Very Likely']
        face_joy = likelihoods[face.joy_likelihood]
    else:
        face_joy = 'Unknown'

slide-56
SLIDE 56

• Code to insert an entry into Datastore (including a link to the file in Cloud Storage)

Portland State University CS 430P/530 Internet, Web & Cloud Systems

    datastore_client = datastore.Client()   # Create datastore client.
    current_datetime = datetime.now()       # Fetch current date / time.
    kind = 'Faces'                          # Set kind for new entity.
    name = blob.name                        # Set name/ID for new entity.
    # Create the Cloud Datastore key for the new entity.
    key = datastore_client.key(kind, name)

    # Construct the new entity using the key as a dictionary,
    # including the "face_joy" label from Cloud Vision face detection
    entity = datastore.Entity(key)
    entity['blob_name'] = blob.name
    entity['image_public_url'] = blob.public_url
    entity['timestamp'] = current_datetime
    entity['joy'] = face_joy

    # Save the new entity to Datastore.
    datastore_client.put(entity)

    return redirect('/')

slide-57
SLIDE 57

App Engine configuration

• Modify app.yaml (use the standard environment with 1 f1-micro, configure the storage bucket)

Portland State University CS 430P/530 Internet, Web & Cloud Systems

#runtime: python
#env: flex
runtime: python37
env: standard
entrypoint: gunicorn -b :$PORT main:app
runtime_config:
  python_version: 3
env_variables:
  CLOUD_STORAGE_BUCKET: <YOUR_STORAGE_BUCKET>
  # CLOUD_STORAGE_BUCKET: cs410c-wuchang-201515
manual_scaling:
  instances: 1
resources:
  cpu: 1
  memory_gb: 0.5
  disk_size_gb: 10

slide-58
SLIDE 58

Deploy app

• Deactivate the development environment
• Deploy
  • Note the custom container built and pushed into gcr.io to support the flexible environment's deployment onto App Engine
• Show the application running at
  • https://<PROJECT_ID>.appspot.com

Portland State University CS 430P/530 Internet, Web & Cloud Systems

deactivate
gcloud app deploy

slide-59
SLIDE 59

ML APIs Lab #4

• Cleanup
• Deploying a Python Flask Web Application to App Engine (24 min)
  • https://codelabs.developers.google.com/codelabs/cloud-vision-app-engine

Portland State University CS 430P/530 Internet, Web & Cloud Systems

rm /home/${USER}/flexvisionlab.json
gcloud projects remove-iam-policy-binding ${DEVSHELL_PROJECT_ID} --member=serviceAccount:flexvisionlab@${DEVSHELL_PROJECT_ID}.iam.gserviceaccount.com --role=roles/storage.admin
gcloud projects remove-iam-policy-binding ${DEVSHELL_PROJECT_ID} --member=serviceAccount:flexvisionlab@${DEVSHELL_PROJECT_ID}.iam.gserviceaccount.com --role=roles/datastore.user
gcloud iam service-accounts delete flexvisionlab@${DEVSHELL_PROJECT_ID}.iam.gserviceaccount.com

slide-60
SLIDE 60

AutoML Lab #1

• Upload a dataset of labeled cloud images to Cloud Storage and use AutoML to create a custom model to recognize clouds
  • Data encoded as a CSV file that contains labels and paths to individual files in the bucket
• Go to APIs & Services → Library, search for AutoML, and enable the API

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-61
SLIDE 61

• Visit the AutoML Vision console and allow access
  • https://cloud.google.com/automl/ui/vision
• Specify your Project ID (cs410c-wuchang-201515), then click "Set Up Now"
• Go to Cloud Storage → Browser and ensure the bucket gs://${DEVSHELL_PROJECT_ID}-vcm has been created
• Launch Cloud Shell and copy the training set from Google's public storage bucket into yours

Portland State University CS 430P/530 Internet, Web & Cloud Systems

export BUCKET=${DEVSHELL_PROJECT_ID}-vcm
gsutil -m cp -r gs://automl-codelab-clouds/* gs://${BUCKET}

slide-62
SLIDE 62

• Refresh the bucket to see 3 directories of cloud images of different types
• Copy the dataset CSV file
  • Each row in the CSV contains the URL of an image and its associated label
  • Change the bucket location to point to your bucket above before copying it to your bucket

Portland State University CS 430P/530 Internet, Web & Cloud Systems

gsutil cp gs://automl-codelab-metadata/data.csv .
sed -i -e "s/placeholder/${BUCKET}/g" ./data.csv
gsutil cp ./data.csv gs://${BUCKET}

slide-63
SLIDE 63

• Show the CSV file in the bucket, then open the CSV file and show the format of the dataset

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-64
SLIDE 64

 Go back to AutoMLVision console

https://cloud.google.com/automl/ui/vision

 Create new dataset and specify the CSV file, select Multi-Label

classification, then "Create Dataset"

 Wait for images to be imported

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-65
SLIDE 65

 Scroll through images to ensure they

imported properly

 Click on Train, then "Start Training"  AutoML will create a custom model

 Go get some coffee  Takes a while to complete

 Show the full evaluation of model

including the confusion matrix

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-66
SLIDE 66

 Visit the cloud image gallery at UCAR

 https://scied.ucar.edu/cloud-image-gallery

 Download one image for each type trained and one image that is

not any of the three

 Click on "Predict" and upload the 4 images  Show the results of prediction

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-67
SLIDE 67

AutoML Lab #1

• https://codelabs.developers.google.com/codelabs/cloud-automl-vision-intro

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-68
SLIDE 68

Firebase Labs

slide-69
SLIDE 69

Firebase Lab #1

• Firebase Web Codelab (39 min)
• Create a project in the Google Firebase Console (different from the Google Cloud Console)
  • Call it firebaselab-<OdinID>

Portland State University CS 430P/530 Internet, Web & Cloud Systems

https://console.firebase.google.com/

slide-70
SLIDE 70

Register app

 Click on </> to register a new web app

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-71
SLIDE 71

 Register app, but skip the next steps for including Firebase in your

app and continue to console

 We will do this in Cloud Shell

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-72
SLIDE 72

Enable use of Google authentication

• From the console, Develop => Authentication => Sign-In Method
• Enable Google account logins for your web app and call it FriendlyChat

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-73
SLIDE 73

Enable real-time database

• From the console, Develop => Database
• Scroll down to the Cloud Firestore database
• Then, Create database
• Enable "Start in test mode…"

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-74
SLIDE 74

Enable use of Cloud Storage

• Note: the bucket is initially wide-open
• Develop => Storage => Get Started => Next
• Set the storage region to the default

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-75
SLIDE 75

Setup code

• Go to console.cloud.google.com to find the project created (firebaselab-<OdinID>)
• Visit Compute Engine and enable billing on the project

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-76
SLIDE 76

Setup code

• Launch Cloud Shell
• Clone the repository
• Use npm to install the Firebase CLI in Cloud Shell
• To verify that the CLI has been installed correctly, run firebase --version

Portland State University CS 430P/530 Internet, Web & Cloud Systems

git clone https://github.com/firebase/friendlychat-web
cd friendlychat-web/web-start/public
npm -g install firebase-tools
firebase --version

slide-77
SLIDE 77

Install the Firebase CLI

• Authorize the Firebase CLI to deploy the app by running the command below
• Visit the URL given and log in to your pdx.edu account
  • Note that you may need to cut-and-paste the entire URL given in the console
• Allow access

Portland State University CS 430P/530 Internet, Web & Cloud Systems

firebase login --no-localhost

slide-78
SLIDE 78

 Get authorization code and paste it in to complete login

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-79
SLIDE 79

Set up Firebase for the app

• Make sure you are in the web-start directory, then set up Firebase to use your project
• Use the arrow keys to select your Project ID and follow the instructions given

Portland State University CS 430P/530 Internet, Web & Cloud Systems

firebase use --add

slide-80
SLIDE 80

Examine Firebase code in the web app

• Use Cloud Shell's code editor to view index.html
  • Note that the developer has only added the Firebase components used by the app to the page, for efficiency
  • Note the inclusion of init.js, which is created via the firebase use command and contains the project's Firebase credentials

Portland State University CS 430P/530 Internet, Web & Cloud Systems

edit index.html

slide-81
SLIDE 81

Run the app from Cloud Shell

• Use the Firebase hosting emulator to deliver the app locally

Portland State University CS 430P/530 Internet, Web & Cloud Systems

firebase serve --only hosting

slide-82
SLIDE 82

View running test application

 Click on link or go to Web Preview, change port to 5000, and

preview

 App not fully functional yet

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-83
SLIDE 83

View your credentials

• From the web app
  • View source, then
  • Click on the init.js link
  • See the project credentials
• Go back to Cloud Shell and "Control-C" to terminate the server

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-84
SLIDE 84

Part 1: Add Firebase Authentication

• In scripts/main.js, modify the signIn function to configure authentication using Google as the identity (OAuth) provider
• Similarly, set the signOut function just below

Portland State University CS 430P/530 Internet, Web & Cloud Systems

// Signs-in Friendly Chat.
function signIn() {
  // Sign in Firebase w/ popup auth and Google as the identity provider.
  var provider = new firebase.auth.GoogleAuthProvider();
  firebase.auth().signInWithPopup(provider);
}

// Signs-out of Friendly Chat.
function signOut() {
  // Sign out of Firebase.
  firebase.auth().signOut();
}

slide-85
SLIDE 85

• Register a callback function (authStateObserver) in initFirebaseAuth that updates the UI whenever the authentication state of a user changes
  • The function will update the profile photo and name of the (now) authenticated user using data from the OAuth provider (Google)

Portland State University CS 430P/530 Internet, Web & Cloud Systems

// Initiate firebase auth.
function initFirebaseAuth() {
  // Listen to auth state changes.
  firebase.auth().onAuthStateChanged(authStateObserver);
}

slide-86
SLIDE 86

• Implement the calls from authStateObserver for getting the profile picture and name from the OAuth provider
• Implement the check for login

Portland State University CS 430P/530 Internet, Web & Cloud Systems

// Returns the signed-in user's profile Pic URL.
function getProfilePicUrl() {
  return firebase.auth().currentUser.photoURL || '/images/profile_placeholder.png';
}

// Returns the signed-in user's display name.
function getUserName() {
  return firebase.auth().currentUser.displayName;
}

// Returns true if a user is signed-in.
function isUserSignedIn() {
  return !!firebase.auth().currentUser;
}

slide-87
SLIDE 87

• If you want to test with the development server (i.e. via firebase serve), you will need to authorize the appspot domain it is served from
  • Firebase => Authentication => Sign-in Method => Authorized Domains
• Note that the domain used on a firebase deploy is enabled by default ($PROJECT_ID.firebaseapp.com)

Portland State University CS 430P/530 Internet, Web & Cloud Systems

(figure: authorized domains for firebase serve vs. firebase deploy)

slide-88
SLIDE 88

 Ensure that third-party cookies are enabled on your browser  In Chrome=>Settings=>Advanced=>Privacy and Security=>Site

Settings=>Cookies and site data=>Block Third Party Cookies (disable setting)

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-89
SLIDE 89

Test Signing In to the App

• Update the app
• Click on the link or go to Web Preview and change to port 5000
• Sign in with Google
• Show that the Google profile picture and name of the user are displayed

Portland State University CS 430P/530 Internet, Web & Cloud Systems

firebase serve --only hosting

slide-90
SLIDE 90

Par Part t 2: Impl plem emen ent t me mess ssage ge se sending ding

 Update saveMessage to use add() to store messages into

real-time database upon "Send" being clicked

Portland State University CS 430P/530 Internet, Web & Cloud Systems

// Saves a new message to your Cloud Firestore database. function saveMessage(messageText) { // Add a new message entry to the database. return firebase.firestore().collection('messages').add({ name: getUserName(), text: messageText, profilePicUrl: getProfilePicUrl(), timestamp: firebase.firestore.FieldValue.serverTimestamp() }).catch(function(error) { console.error('Error writing new message to database', error); }); }

slide-91
SLIDE 91

Part 2: Implement message receiving

• Modify the loadMessages function in main.js
  • Synchronize messages on the app across clients
  • Add listeners that trigger when changes are made to the data
  • Listeners update the UI element for showing messages
  • Only display the last 12 messages of the chat for a fast load
  • (See next slide)

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-92
SLIDE 92

Portland State University CS 430P/530 Internet, Web & Cloud Systems

// Loads chat message history and listens for upcoming ones.
function loadMessages() {
  // Create query to load the last 12 messages and listen for new ones.
  var query = firebase.firestore()
                  .collection('messages')
                  .orderBy('timestamp', 'desc')
                  .limit(12);

  // Start listening to the query.
  query.onSnapshot(function(snapshot) {
    snapshot.docChanges().forEach(function(change) {
      if (change.type === 'removed') {
        deleteMessage(change.doc.id);
      } else {
        var message = change.doc.data();
        displayMessage(change.doc.id, message.timestamp, message.name,
                       message.text, message.profilePicUrl, message.imageUrl);
      }
    });
  });
}

slide-93
SLIDE 93

Test

• Update your app
• Sign in to Google
• Click on the Message box, type a message, and click Send
• The message will be inserted into the real-time database
• The UI will automatically update with the message and the account profile picture

Portland State University CS 430P/530 Internet, Web & Cloud Systems

firebase serve

slide-94
SLIDE 94

 Show the message in the database  Note

 One can mock up an iOS or Android client version to interoperate (see

the two other codelabs)

 https://codelabs.developers.google.com/codelabs/firebase-android  https://codelabs.developers.google.com/codelabs/firebase-ios-swift

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-95
SLIDE 95

Test real-time database updates

• Go back to the Firebase Database web UI to view messages in the database
• We will manually add a message and it will update the UI in real time automatically
• Click "Add document"

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-96
SLIDE 96

 Click on "Auto ID" for Document ID, then enter the fields for the

document

 name (string)

 Wu

 profilePicURL (string)

 https://lh3.googleusercontent.com/a-

/AAuE7mAaCBS0jz6HPgy_NW_UAlaoETpPoNZHTo2McVTAQQ

 text (string)

 Pretend the instructor added a message

 timestamp (timestamp)

 Set to today's date and time

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-97
SLIDE 97

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-98
SLIDE 98

Part 4: Implement image sending

• Update saveImageMessage to store images into the real-time database
  • Initially create the message with a loading icon
  • Take the file parameter and store it in Firebase storage
  • Get the URL for the file in Firebase storage
  • Update the message from step 1 with the URL to show the image in the UI

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-99
SLIDE 99

Portland State University CS 430P/530 Internet, Web & Cloud Systems

// Saves a new message containing an image in Firebase.
// This first saves the image in Firebase storage.
function saveImageMessage(file) {
  // 1 - We add a message with a loading icon that will get updated with the shared image.
  firebase.firestore().collection('messages').add({
    name: getUserName(),
    imageUrl: LOADING_IMAGE_URL,
    profilePicUrl: getProfilePicUrl(),
    timestamp: firebase.firestore.FieldValue.serverTimestamp()
  }).then(function(messageRef) {
    // 2 - Upload the image to Cloud Storage.
    var filePath = firebase.auth().currentUser.uid + '/' + messageRef.id + '/' + file.name;
    return firebase.storage().ref(filePath).put(file).then(function(fileSnapshot) {
      // 3 - Generate a public URL for the file.
      return fileSnapshot.ref.getDownloadURL().then((url) => {
        // 4 - Update the chat message placeholder with the image's URL.
        return messageRef.update({
          imageUrl: url,
          storageUri: fileSnapshot.metadata.fullPath
        });
      });
    });
  }).catch(function(error) {
    console.error('There was an error uploading a file to Cloud Storage:', error);
  });
}

slide-100
SLIDE 100

UI

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-101
SLIDE 101

 Show message in Database with link to file

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-102
SLIDE 102

 Show file in Firebase storage

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-103
SLIDE 103

Firebase Lab #1

• Skip step 11: (Part 5) Enabling notifications (walkthrough included)
• Skip steps 12-13: Locking down security rules
• Skip step 14: Performance
• Step 15
  • Create a deployment manifest in web-start/firebase.json
  • Deploy the app to Firebase static hosting
  • Send the URL to a partner or the instructor so they can add messages via their Google account
  • Show a screenshot of messages sent by multiple users
• https://codelabs.developers.google.com/codelabs/firebase-web (39 min)

Portland State University CS 430P/530 Internet, Web & Cloud Systems

{
  "hosting": {
    "public": "./public"
  }
}

firebase deploy --except functions

slide-104
SLIDE 104

Data Labs

slide-105
SLIDE 105

Cloud Dataproc Lab #1

• Calculate π via massively parallel dart throwing
• Two ways (27 min)
  • Command-line interface
  • Web UI
Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-106
SLIDE 106

Computation for calculating π

• Square with sides of length 1 (Area = 1)
• Circle within has diameter 1 (radius = ½)
• Area of the circle is?
  • π · (½)² = π/4
• Randomly throw darts into the square
• What does the ratio of darts in the circle to the total darts correspond to?
  • The ratio of the areas (i.e. π/4)
• What expression as a function of darts approximates π?
  • (Darts in Circle) / (Total Darts) = π/4
  • π ≈ 4 · (Darts in Circle) / (Total Darts)

Portland State University CS 430P/530 Internet, Web & Cloud Systems
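To make the arithmetic concrete before distributing it, here is a small plain-Python sketch of the same estimate (the sample count is illustrative; the Spark version on the next slide parallelizes the same idea):

import random

NUM_SAMPLES = 1_000_000

def inside(_):
    # Throw one dart into the unit quadrant; True if it lands inside the circle
    x, y = random.random(), random.random()
    return x * x + y * y < 1

count = sum(1 for i in range(NUM_SAMPLES) if inside(i))
print("Pi is roughly %f" % (4.0 * count / NUM_SAMPLES))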

slide-107
SLIDE 107

• Algorithm
  • Spawn 1000 dart-throwers (map)
  • Collect counts (reduce)
• Modified computation on a quadrant from (0,0) to (1,1)
  • Calculate "inside" to get the ratio
  • Randomly pick x and y uniformly between 0 and 1
  • Dart is inside the quarter circle when x² + y² < 1
• Perform the parallel computation

Portland State University CS 430P/530 Internet, Web & Cloud Systems

def inside(p):
    x, y = random.random(), random.random()
    return x*x + y*y < 1

count = sc.parallelize(xrange(0, NUM_SAMPLES)).filter(inside).count()
print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)

slide-108
SLIDE 108

Version #1: Command-line interface

• Provisioning and Using a Managed Hadoop/Spark Cluster with Cloud Dataproc (Command Line) (20 min)
• Enable the API
• Skip to the end of Step 4
  • Set the zone to us-west1-b (substitute this zone for the rest of the lab)
  • Set the name of the cluster in the CLUSTERNAME environment variable to <username>-dplab

Portland State University CS 430P/530 Internet, Web & Cloud Systems

gcloud config set compute/zone us-west1-b
CLUSTERNAME=${USER}-dplab
gcloud services enable dataproc.googleapis.com

slide-109
SLIDE 109

• Create a cluster with tag "codelab" in us-west1-b
• If you get quota errors, use the machine-type and boot-disk-size flags below as well
• View the cluster in Compute Engine

Portland State University CS 430P/530 Internet, Web & Cloud Systems

gcloud dataproc clusters create ${CLUSTERNAME} \
  --scopes=cloud-platform \
  --tags codelab \
  --zone=us-west1-b \
  --master-machine-type=n1-standard-2 \
  --worker-machine-type=n1-standard-2 \
  --master-boot-disk-size=10GB \
  --worker-boot-disk-size=10GB
slide-110
SLIDE 110

• Note the current time, then submit the job, specifying
  • 1000 workers
  • stdout and stderr sent to output.txt via >&
  • Command placed in the background via a trailing &
• List the jobs periodically via gcloud dataproc jobs list
• When done, note the time. How long did it take?
• Examine output.txt via less to find the string "Pi is"
  • Show the estimate for π

Portland State University CS 430P/530 Internet, Web & Cloud Systems

gcloud dataproc jobs submit spark --cluster ${CLUSTERNAME} \
  --class org.apache.spark.examples.SparkPi \
  --jars file:///usr/lib/spark/examples/jars/spark-examples.jar -- 1000 \
  >& output.txt &

gcloud dataproc jobs list --cluster ${CLUSTERNAME}

slide-111
SLIDE 111

• Show (describe) the cluster to find the numInstances used for the master and the workers (save to a file if necessary)
• Allocate two pre-emptible machines to the cluster
• Repeat the listing to see which Config section they show up in
• Show them in Compute Engine

Portland State University CS 430P/530 Internet, Web & Cloud Systems

gcloud dataproc clusters describe ${CLUSTERNAME}
gcloud dataproc clusters update ${CLUSTERNAME} --num-preemptible-workers=2
gcloud dataproc clusters describe ${CLUSTERNAME}

slide-112
SLIDE 112

Repeat with new setup

• Note the current time, then submit the job again, saving the result to a different file
• List the jobs periodically
• When done, note the time. How long did it take?
• Show the estimate for π

Portland State University CS 430P/530 Internet, Web & Cloud Systems

gcloud dataproc jobs submit spark --cluster ${CLUSTERNAME} \
  --class org.apache.spark.examples.SparkPi \
  --jars file:///usr/lib/spark/examples/jars/spark-examples.jar -- 1000 \
  >& output2.txt &

gcloud dataproc jobs list --cluster ${CLUSTERNAME}

slide-113
SLIDE 113

• ssh into the master node
• Once logged in, get the hostname
• List the cluster to show all VMs
• Then log out
• Skip Step 10
• Delete the cluster in Cloud Shell
• Ensure no instances from the cluster are running on Compute Engine before continuing to Step 12

Portland State University CS 430P/530 Internet, Web & Cloud Systems

gcloud compute ssh ${CLUSTERNAME}-m --zone=us-west1-b
hostname
gcloud dataproc clusters list
gcloud dataproc clusters delete ${CLUSTERNAME}

slide-114
SLIDE 114

Version #1: Command-line interface

• In Step 12
  • Repeat the lab via the web console (Step 12 of the codelab, "Getting Started…")
• Version #1: Provisioning and Using a Managed Hadoop/Spark Cluster with Cloud Dataproc (Command Line) (20 min)
  • https://codelabs.developers.google.com/codelabs/cloud-dataproc-gcloud

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-115
SLIDE 115

Version #2: Web UI

• Skip steps 1, 2, 3
• Step 4
  • Go to Cloud Dataproc
  • Create a cluster in us-west1-b with master and worker nodes set to n1-standard-2 VMs

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-116
SLIDE 116

 Click "Submit a Job", choose region and cluster just created  Set job type to Spark  Set name of main jar

 Java version

 Set args to 1000

 # of tasks

 Set location of jar

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-117
SLIDE 117

• Start the job and wait a minute for completion
• Upon completion, click on the job, then click on Line wrapping to see the output

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-118
SLIDE 118

Version #2: Web UI

• Delete the cluster

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-119
SLIDE 119

Cloud Dataproc Lab #1

• Version #2: Introduction to Cloud Dataproc: Hadoop and Spark on Google Cloud Platform (7 min)
  • https://codelabs.developers.google.com/codelabs/cloud-dataproc-starter

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-120
SLIDE 120

Cloud Dataproc Lab #2

• Distributed Image Processing in Cloud Dataproc
  • Perform face detection on images in parallel using a Dataproc cluster
  • Data Science Quest lab #3

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-121
SLIDE 121

• You will use sbt, an open source build tool, to build the JAR for the job you will submit to the Cloud Dataproc cluster
• Build the code

Portland State University CS 430P/530 Internet, Web & Cloud Systems

echo "deb https://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 642AC823
sudo apt-get update
sudo apt-get install -y scala apt-transport-https sbt
git clone https://github.com/GoogleCloudPlatform/cloud-dataproc
cd cloud-dataproc/codelabs/opencv-haarcascade
sbt assembly

slide-122
SLIDE 122

View code

• cloud-dataproc/codelabs/opencv-haarcascade/FeatureDetector.scala
  • Creates the Spark context to be used for the job, copies the classifier to the cluster machines
  • Then maps the function processImage onto all files in the bucket
  • Specifies the output directory to place the result (reduce)

Portland State University CS 430P/530 Internet, Web & Cloud Systems

def main(args: Array[String]) {
  val conf = new SparkConf()
  val sc = new SparkContext(conf)
  ...
  val classifier = downloadToCluster(classifierPath, sc)
  ...
  sc.parallelize(filePaths).foreach({
    processImage(_, classifier.getName(), outputDirName)
  })
}

slide-123
SLIDE 123

• processImage creates an OpenCV CascadeClassifier, then calls detectFeatures with an input image from the imgs directory to generate an output image in the out directory of the bucket

Portland State University CS 430P/530 Internet, Web & Cloud Systems

import org.bytedeco.javacpp.opencv_objdetect.CascadeClassifier

def processImage(...): String = {
  val classifierName = SparkFiles.get(classifier)
  ...
  val inImg = imread(localIn.getPath())
  val detector = new CascadeClassifier(classifierName)
  val outImg = detectFeatures(inImg, detector)
  ...
}

slide-124
SLIDE 124

• detectFeatures allocates a vector of rectangles, then calls OpenCV's detector to populate it with features
  • Copies the image via clone()
  • Then draws green rectangles on the copy before returning

Portland State University CS 430P/530 Internet, Web & Cloud Systems

def detectFeatures(img: Mat, detector: CascadeClassifier): Mat = {
  val features = new RectVector()
  detector.detectMultiScale(img, features)
  val numFeatures = features.size().toInt
  val outlined = img.clone()
  // Draws green rectangles on the detected features.
  val green = new Scalar(0, 255, 0, 0)
  for (f <- 0 until numFeatures) {
    val currentFeature = features.get(f)
    rectangle(outlined, currentFeature, green)
  }
  return outlined
}

slide-125
SLIDE 125

• Generate a random name for the storage bucket for the lab
• Create the bucket
• Copy the images to analyze into the bucket (in the imgs/ directory)
• List the contents of the bucket

Portland State University CS 430P/530 Internet, Web & Cloud Systems

MYBUCKET="${USER/_/-}-images-${RANDOM}"
echo MYBUCKET=${MYBUCKET}
gsutil mb gs://${MYBUCKET}

curl https://www.publicdomainpictures.net/pictures/20000/velka/family-of-three-871290963799xUk.jpg | gsutil -h "Content-Type:image/jpeg" cp - gs://${MYBUCKET}/imgs/family-of-three.jpg
curl https://www.publicdomainpictures.net/pictures/10000/velka/african-woman-331287912508yqXc.jpg | gsutil -h "Content-Type:image/jpeg" cp - gs://${MYBUCKET}/imgs/african-woman.jpg
curl https://www.publicdomainpictures.net/pictures/10000/velka/296-1246658839vCW7.jpg | gsutil -h "Content-Type:image/jpeg" cp - gs://${MYBUCKET}/imgs/classroom.jpg

gsutil ls -R gs://${MYBUCKET}

slide-126
SLIDE 126

 Create the name for your compute cluster for Dataproc  Create the cluster in us-west1-b  Download and copy face detection configuration file to your bucket

 Contains classifier (e.g. model data) to perform feature detection

(~1MB)

Portland State University CS 430P/530 Internet, Web & Cloud Systems

MYCLUSTER="${USER/_/-}-codelab" echo MYCLUSTER=${MYCLUSTER} gcloud config set compute/zone us-west1-b gcloud dataproc clusters create --worker-machine-type=n1-standard-2 ${MYCLUSTER}

curl https://raw.githubusercontent.com/opencv/opencv/master/data/haarcascades/haarca scade_frontalface_default.xml | gsutil -h "Content-Type:application/xml" cp - gs://${MYBUCKET}/haarcascade_frontalface_default.xml

slide-127
SLIDE 127

 Run Dataproc job  Go to Dataproc→Clusters. Show the Jobs detail, the VM instances

created, and the configuration

Portland State University CS 430P/530 Internet, Web & Cloud Systems

gcloud dataproc jobs submit spark \
  --cluster ${MYCLUSTER} \
  --jar target/scala-2.10/feature_detector-assembly-1.0.jar -- \
  gs://${MYBUCKET}/haarcascade_frontalface_default.xml \
  gs://${MYBUCKET}/imgs/ \
  gs://${MYBUCKET}/out/

slide-128
SLIDE 128

 Go to Dataproc→Jobs and show the job output and configuration

via screenshot

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-129
SLIDE 129

 Download and screenshot the processed images in the out/ folder  Go to the Cloud Vision page, scroll down to "Try the API",

download one of the 3 original images used, and use the Vision API to perform the same operation. Compare the results

Portland State University CS 430P/530 Internet, Web & Cloud Systems
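For reference, the same comparison can be scripted against the Vision API instead of using the "Try the API" widget. The snippet below is a minimal sketch, assuming the google-cloud-vision client library (v2 or later) is installed, application default credentials are available, and one of the original images has been downloaded locally; the filename shown is illustrative.

# Minimal sketch (not part of the lab files): detect faces in one image
# with the Cloud Vision API client library. Assumes google-cloud-vision >= 2.0
# and that 'family-of-three.jpg' has been downloaded locally.
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open('family-of-three.jpg', 'rb') as f:
    image = vision.Image(content=f.read())

response = client.face_detection(image=image)
for face in response.face_annotations:
    # Each annotation carries a bounding polygon and a detection confidence,
    # which can be compared against the rectangles drawn by the Dataproc job.
    vertices = [(v.x, v.y) for v in face.bounding_poly.vertices]
    print('face at', vertices, 'confidence', face.detection_confidence)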

slide-130
SLIDE 130

 Clean up

 Delete cluster  Delete storage bucket  Delete cloud-dataproc  https://codelabs.developers.google.com/codelabs/scd-dataproc

Portland State University CS 430P/530 Internet, Web & Cloud Systems

gcloud dataproc clusters delete ${MYCLUSTER}
gsutil rm "gs://${MYBUCKET}/**"
gsutil rb gs://${MYBUCKET}
rm -rf ~/cloud-dataproc

Cloud Dataproc Lab #2
slide-131
SLIDE 131

Cloud Dataflow Lab #1

 Simple Cloud Dataflow pipeline in Python (grep)

 Find all imports of a particular package in Java source code  Two versions

 Local Apache Beam pipeline grep.py  Pipeline mapped to Cloud Dataflow grepc.py

 Check out source code and change into lab directory  Install packages

Portland State University CS 430P/530 Internet, Web & Cloud Systems

git clone https://github.com/GoogleCloudPlatform/training-data-analyst.git
cd training-data-analyst/courses/machine_learning/deepdive/04_features/dataflow/python/
sudo ./install_packages.sh

slide-132
SLIDE 132

 List the APIs to see the range of services available

 To enable a service like the Cloud Datastore API, the command would

be

 From the list, enable the following services if not already enabled

 Google Dataflow API  Stackdriver Logging API  Google Cloud Storage  Google Cloud Storage JSON API  Google Cloud Pub/Sub API

 Ensure they are set via

Portland State University CS 430P/530 Internet, Web & Cloud Systems

gcloud services list --available
gcloud services enable datastore.googleapis.com
gcloud services list --enabled

slide-133
SLIDE 133

Portland State University CS 430P/530 Internet, Web & Cloud Systems

import apache_beam as beam
import sys

def my_grep(line, term):
   if line.startswith(term):
      yield line

p = beam.Pipeline(argv=sys.argv)
input = '../javahelp/src/main/java/com/google/cloud/training/dataanalyst/javahelp/*.java'
output_prefix = '/tmp/output'
searchTerm = 'import'

# find all lines that contain the searchTerm
(p
   | 'GetJava' >> beam.io.ReadFromText(input)
   | 'Grep' >> beam.FlatMap(lambda line: my_grep(line, searchTerm))
   | 'write' >> beam.io.WriteToText(output_prefix)
)
p.run().wait_until_finish()

slide-134
SLIDE 134

 Run the pipeline locally  View the output and print the number of lines in it  For locally executing pipeline, data is read in from the local

filesystem

 Now, we will run a version that can be mapped into a parallel

dataflow pipeline

 Note: code is modified to perform I/O to and from a bucket instead of

the local file system.

Portland State University CS 430P/530 Internet, Web & Cloud Systems

python3 grep.py
cat /tmp/output*
wc -l /tmp/output*

slide-135
SLIDE 135

Portland State University CS 430P/530 Internet, Web & Cloud Systems

PROJECT='your-project'
BUCKET='your-bucket'
argv = [
   '--project={0}'.format(PROJECT),
   '--job_name=examplejob2',
   '--save_main_session',
   '--staging_location=gs://{0}/staging/'.format(BUCKET),
   '--temp_location=gs://{0}/staging/'.format(BUCKET),
   '--runner=DataflowRunner'
]
p = beam.Pipeline(argv=argv)
input = 'gs://{0}/javahelp/*.java'.format(BUCKET)
output_prefix = 'gs://{0}/javahelp/output'.format(BUCKET)
searchTerm = 'import'

# find all lines that contain the searchTerm
(p
   | 'GetJava' >> beam.io.ReadFromText(input)
   | 'Grep' >> beam.FlatMap(lambda line: my_grep(line, searchTerm))
   | 'write' >> beam.io.WriteToText(output_prefix)
)
p.run()

slide-136
SLIDE 136

 Create a bucket for placing input files and output file  Copy the source files into the bucket  Edit grepc.py to replace PROJECT (your-project) and

BUCKET (your-bucket) with your project and bucket names

 Then run the program (ignore Deprecation warning)

 Note: if you get permission errors on the storage bucket, ensure that

GOOGLE_APPLICATION_CREDENTIALS is either unset or set to point to a JSON file with valid service account credentials

Portland State University CS 430P/530 Internet, Web & Cloud Systems

gsutil mb gs://[BUCKET]
gsutil cp ../javahelp/src/main/java/com/google/cloud/training/dataanalyst/javahelp/*.java gs://[BUCKET]/javahelp
python3 grepc.py

slide-137
SLIDE 137

 Screenshot the job in the Dataflow console and its details including

the resources it brought up

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-138
SLIDE 138

 View the results in the storage bucket  Because the pipeline distributes workload across workers based on

individual files, the resulting output is distributed across multiple files

 Match one of the output files with its corresponding input file and

screenshot the matching lines in each

 e.g. StreamDemoConsumer.java and its corresponding output file

Portland State University CS 430P/530 Internet, Web & Cloud Systems
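One way to pair output shards with their inputs is to list and skim the shards programmatically. The snippet below is a minimal sketch, assuming a recent google-cloud-storage client library is installed; the bucket name is a placeholder to replace with your own.

# Sketch only: list the WriteToText output shards in the bucket and print
# the first few matched lines from each. 'your-bucket' is a placeholder.
from google.cloud import storage

client = storage.Client()
for blob in client.list_blobs('your-bucket', prefix='javahelp/output'):
    print('===', blob.name)
    for line in blob.download_as_text().splitlines()[:5]:
        print(line)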

slide-139
SLIDE 139

 Next, we will implement a Map-Reduce computation in Dataflow

 Note that, unlike Cloud Dataproc, intermediate results are consumed

within workers and *not* stored out

 is_popular.py for determining the top packages included in a

source tree

 Code for pipeline functions  startsWith()

 Grep code from before (looks for "import" in our example)

 packageUse()

 Produces all packages and sub-packages used for a single import statement

Portland State University CS 430P/530 Internet, Web & Cloud Systems

def startsWith(line, term):
   if line.startswith(term):
      yield line

def packageUse(line, keyword):
   packages = getPackages(line, keyword)
   for p in packages:
      yield (p, 1)

slide-140
SLIDE 140

 Strip off 'import' keyword (getPackages) and trailing ';'  Call splitPackageName to recursively output package name

prefixes using occurrences of '.' until end

Portland State University CS 430P/530 Internet, Web & Cloud Systems

def getPackages(line, keyword):
   start = line.find(keyword) + len(keyword)
   end = line.find(';', start)
   if start < end:
      packageName = line[start:end].strip()
      return splitPackageName(packageName)
   return []

def splitPackageName(packageName):
   """e.g. given com.example.appname.library.widgetname
      returns com
              com.example
              com.example.appname
      etc.
   """
   result = []
   end = packageName.find('.')
   while end > 0:
      result.append(packageName[0:end])
      end = packageName.find('.', end+1)
   result.append(packageName)
   return result

slide-141
SLIDE 141

 by_value()

 Takes two key-value pairs and returns whether the first has a smaller

occurrence count than the second

 Used in pipeline to sort results to get top packages

Portland State University CS 430P/530 Internet, Web & Cloud Systems

def by_value(kv1, kv2):
   key1, value1 = kv1
   key2, value2 = kv2
   return value1 < value2

slide-142
SLIDE 142

 GetJava: parallel reading of input files  GetImports: pull out all lines that perform an import (map)  PackageUse: pull out all packages and sub-packages (map)  TotalUse: CombinePerKey to sum up occurrences of each package

(shuffle, reduce)

 Top_5: Combine (reduce) keys and sort by_value to return Top 5  Write out to specified file

Portland State University CS 430P/530 Internet, Web & Cloud Systems

p = beam.Pipeline(argv=pipeline_args)
input = '{0}*.java'.format(options.input)
output_prefix = options.output_prefix
keyword = 'import'

# find most used packages
(p
   | 'GetJava' >> beam.io.ReadFromText(input)
   | 'GetImports' >> beam.FlatMap(lambda line: startsWith(line, keyword))
   | 'PackageUse' >> beam.FlatMap(lambda line: packageUse(line, keyword))
   | 'TotalUse' >> beam.CombinePerKey(sum)
   | 'Top_5' >> beam.transforms.combiners.Top.Of(5, by_value)
   | 'write' >> beam.io.WriteToText(output_prefix)
)
p.run().wait_until_finish()

slide-143
SLIDE 143

 Run the pipeline locally  View the output and show the top 5 packages  Note

 No need to cleanup aside from deleting storage bucket  Compare serverless Dataflow workloads vs. Dataproc workloads that

map into fixed cluster resources

Portland State University CS 430P/530 Internet, Web & Cloud Systems

python3 is_popular.py
cat /tmp/output*

slide-144
SLIDE 144

Cloud Dataflow Lab #1

 Simple Cloud Dataflow pipeline in Python (grep)

 https://codelabs.developers.google.com/codelabs/mlimmersion-simplepipeline-python

 Simple Cloud Dataflow MapReduce job in Python (isPopular)

 https://codelabs.developers.google.com/codelabs/mlimmersion-mapreduce-python

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-145
SLIDE 145

BigQuery, Data Notebooks Labs

slide-146
SLIDE 146

BigQuery Lab #1

 Create datasets and run queries on BigQuery (25 min)  Launch Cloud Shell  List the APIs to see the range of services available

 To enable a specific service, the command would be  From the list, enable the BigQuery API

Portland State University CS 430P/530 Internet, Web & Cloud Systems

gcloud services list --available
gcloud services enable <service_name>.googleapis.com

slide-147
SLIDE 147

 Go to console, and menu of services  BigQuery

 Click on drop-down next to project name and create dataset  For Dataset ID, type cp100

Portland State University CS 430P/530 Internet, Web & Cloud Systems
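The lab creates the dataset through the console UI; for reference, a minimal programmatic equivalent (a sketch, assuming google-cloud-bigquery is installed and the active project is your lab project) is:

# Sketch only: programmatic equivalent of creating the cp100 dataset in the UI.
from google.cloud import bigquery

client = bigquery.Client()
client.create_dataset('cp100', exists_ok=True)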

slide-148
SLIDE 148

 Copy file from bucket into Cloud Shell and take a look

gsutil cp gs://cloud-training/CP100/Lab12/yob2014.txt .
head -3 yob2014.txt
wc -l yob2014.txt

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-149
SLIDE 149

 Click on cp100 under your project and create a table from the file in the bucket

 Specify input file location in bucket (yob2014.txt) and select CSV format
 Specify table name (namedata), table type (native) and schema columns and types
 Edit schema to add fields for name and gender as STRING, count as INTEGER
 Field delimiter as a Comma, then Create Table (a programmatic sketch of the same load follows below)

Portland State University CS 430P/530 Internet, Web & Cloud Systems
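The table creation above is done through the console; the sketch below shows a programmatic equivalent with the BigQuery client library, loading the same file straight from the training bucket. It is an illustration only, not a lab requirement, and assumes google-cloud-bigquery is installed with the cp100 dataset already created.

# Sketch: load yob2014.txt from the training bucket into cp100.namedata
# with the same schema the UI step defines (name, gender, count).
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    field_delimiter=',',
    schema=[
        bigquery.SchemaField('name', 'STRING'),
        bigquery.SchemaField('gender', 'STRING'),
        bigquery.SchemaField('count', 'INTEGER'),
    ],
)
load_job = client.load_table_from_uri(
    'gs://cloud-training/CP100/Lab12/yob2014.txt',
    'cp100.namedata',
    job_config=job_config,
)
load_job.result()  # waits for the load to finish
print(client.get_table('cp100.namedata').num_rows, 'rows loaded')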

slide-150
SLIDE 150

 Once you see your table has been created, click on it and go to

Preview, show the number of rows in Details

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-151
SLIDE 151

Querying method #1

 In the Query editor for your table

 Run a query that lists the 20 most popular female names in 2014

 Notice the Validator with a green checkmark to see how much data you will hit

when the query is run

 Can hide editor to see your query results if you click “hide editor”  Table names must be escaped with back ticks in the UI  Screenshot your results.  Example on next slide; a query sketch also follows below.

Portland State University CS 430P/530 Internet, Web & Cloud Systems
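The exact query is left for you to write. As a rough illustration of the shape such a query takes (here the top 20 names overall, without the gender filter the lab asks for), the SQL string below can be pasted into the editor or run from Python; it is a sketch only and assumes google-cloud-bigquery is installed.

# Illustrative sketch only: top 20 names overall in cp100.namedata.
# The lab's query additionally filters on the gender column.
from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT name, `count` AS num
FROM `cp100.namedata`
ORDER BY num DESC
LIMIT 20
"""
for row in client.query(sql).result():
    print(row.name, row.num)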

slide-152
SLIDE 152

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-153
SLIDE 153

 Via command-line in Google Cloud Shell

 Run query to get the 20 least popular boys names in 2014  Screenshot output

Portland State University CS 430P/530 Internet, Web & Cloud Systems

Querying method #2

slide-154
SLIDE 154

 Via BigQuery shell (bq shell)  Run a query to find 20 most popular male names in 2014.  Screenshot output

Portland State University CS 430P/530 Internet, Web & Cloud Systems

Querying method #3

slide-155
SLIDE 155

 Is your name in the dataset for 2014? How popular was it?

Screenshot your results.

Portland State University CS 430P/530 Internet, Web & Cloud Systems
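One way to look a single name up is with a parameterized query, so the name can be swapped without editing the SQL. The sketch below assumes google-cloud-bigquery is installed; the name shown is only a placeholder.

# Sketch: parameterized lookup of one name in cp100.namedata.
# 'Alex' is a placeholder; substitute your own name.
from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT name, gender, `count` AS num
FROM `cp100.namedata`
WHERE name = @name
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[bigquery.ScalarQueryParameter('name', 'STRING', 'Alex')]
)
for row in client.query(sql, job_config=job_config).result():
    print(row.name, row.gender, row.num)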

slide-156
SLIDE 156

BigQuery Lab #1

 Keep project  Create datasets and run queries on BigQuery

 https://codelabs.developers.google.com/codelabs/cp100-big-query/

(25 min)

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-157
SLIDE 157

BigQuery Lab #2

 Query Github Data Using BigQuery (8 min)  (Not in Codelab) Visit the public dataset containing all of the blocks

and transactions on the Bitcoin block-chain https://bigquery.cloud.google.com/dataset/bigquery-public-data:bitcoin_blockchain

 Click on the tables and then click Preview to find the number of blocks

that are currently being stored on a full node.

 Click on Details to find the size of the block-chain in BigQuery

(uncompressed). Screenshot both results.

Portland State University CS 430P/530 Internet, Web & Cloud Systems
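If you prefer querying over the Preview tab, a rough cross-check of the block count can be obtained with a query against the same public dataset. This is a sketch only; it assumes google-cloud-bigquery is installed, and the table naming reflects the legacy bigquery-public-data:bitcoin_blockchain layout, which may change over time.

# Sketch: count rows in the public Bitcoin blocks table as a cross-check
# against the row count shown in the table's Details tab.
from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT COUNT(*) AS num_blocks
FROM `bigquery-public-data.bitcoin_blockchain.blocks`
"""
print(list(client.query(sql).result())[0].num_blocks)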

slide-158
SLIDE 158

 Visit dataset containing all github commits

 https://bigquery.cloud.google.com/table/bigquery-public-data:github_repos.commits

 Click on Preview and examine the columns associated with commits  Click on Details to find the size of the commits table  Screenshot your results.

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-159
SLIDE 159

 Go to console, and open a BigQuery window  Click on "Compose Query"  Click Show Options  Unclick Legacy SQL to use standard SQL (IMPORTANT: you

will need to make sure this is unchecked every time you make a query for this lab)

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-160
SLIDE 160

 Enter a query to find commits with duplicate subject lines (commit

messages)

 Run the query to find commits with duplicate subject lines  What is the most common subject message used? Answer this in

your lab notebook

 Screenshot your results.

Portland State University CS 430P/530 Internet, Web & Cloud Systems

#standardSQL
SELECT subject AS subject, COUNT(*) AS num_duplicates
FROM `bigquery-public-data.github_repos.commits`
GROUP BY subject
ORDER BY num_duplicates DESC
LIMIT 100

slide-161
SLIDE 161

 Run query to find projects with the most contributors

 What project has the top number of contributors? Screenshot your

results. Extract the name of the repo from the repo_name path.

 Run query to find most popular languages used in pull requests.

What language is this? Screenshot.

Portland State University CS 430P/530 Internet, Web & Cloud Systems

#standardSQL
SELECT COUNT(DISTINCT author.email) AS num_authors,
  REGEXP_EXTRACT(repo_name[ORDINAL(1)], r"([^/]+)$") AS repo
FROM `bigquery-public-data.github_repos.commits`
GROUP BY repo
ORDER BY num_authors DESC
LIMIT 1000

#standardSQL
SELECT COUNT(*) pr_count,
  JSON_EXTRACT_SCALAR(payload, '$.pull_request.base.repo.language') lang
FROM `githubarchive.month.201901`
WHERE JSON_EXTRACT_SCALAR(payload, '$.pull_request.base.repo.language') IS NOT NULL
GROUP BY lang
ORDER BY pr_count DESC
LIMIT 10

slide-162
SLIDE 162

BigQuery Lab #2

 Query Github Data Using BigQuery (8 min)

 https://codelabs.developers.google.com/codelabs/bigquery-github

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-163
SLIDE 163

BigQuery Lab #3

 Looking at campaign finance with BigQuery (14 min)

 First 8 steps

 Skip step 2 (should already be done)  Create a dataset via bq command-line interface  Source of campaign finance data and its format are at

 http://www.fec.gov/finance/disclosure/ftpdet.shtml

 Copy uncompressed version from a GCS bucket and examine the last

several entries with tail

Portland State University CS 430P/530 Internet, Web & Cloud Systems

DATASET=campaign_funding
bq mk -d ${DATASET}
gsutil cp gs://campaign-funding/indiv16.txt .
tail indiv16.txt

slide-164
SLIDE 164

BigQuery Lab #3

 Use du and wc to find out how large the file is and how many

individual contributions were made

 Contribution data definitions by individuals (indiv16.txt), by

committees, and by candidates available at

 https://classic.fec.gov/finance/disclosure/metadata/DataDictionaryContributionsbyIndividuals.shtml

 https://classic.fec.gov/finance/disclosure/metadata/DataDictionaryCommitteeMaster.shtml

 https://classic.fec.gov/finance/disclosure/metadata/DataDictionaryCandidateMaster.shtml

 We will be linking a BigQuery table with these definitions to the

downloaded files stored in GCS

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-165
SLIDE 165

 Create a BigQuery definition specifying CSV data from the bucket

location via command-line and obtain data definition JSON output

 Note that file is not actually in CSV format

 Data separated by pipe character '|'  In Line #6, change fieldDelimiter to indicate this  Or run…

Portland State University CS 430P/530 Internet, Web & Cloud Systems

bq mkdef --source_format=CSV gs://campaign-funding/indiv*.txt \
  "CMTE_ID, AMNDT_IND, RPT_TP, TRANSACTION_PGI, IMAGE_NUM, TRANSACTION_TP, ENTITY_TP, NAME, CITY, STATE, ZIP_CODE, EMPLOYER, OCCUPATION, TRANSACTION_DT, TRANSACTION_AMT:FLOAT, OTHER_ID, TRAN_ID, FILE_NUM, MEMO_CD, MEMO_TEXT, SUB_ID" > indiv_def.json
sed -i 's/"fieldDelimiter": ","/"fieldDelimiter": "|"/g; s/"quote": "\\""/"quote":""/g' indiv_def.json

slide-166
SLIDE 166

 Copy similarly modified definition files for committee and candidate

data

 Create BigQuery tables with the definitions

 Note that because BigQuery tables are linked to flat files, queries will

not perform well for large data

Portland State University CS 430P/530 Internet, Web & Cloud Systems

gsutil cp gs://campaign-funding/candidate_def.json .
gsutil cp gs://campaign-funding/committee_def.json .

bq mk --external_table_definition=indiv_def.json -t ${DATASET}.transactions
bq mk --external_table_definition=committee_def.json -t ${DATASET}.committees
bq mk --external_table_definition=candidate_def.json -t ${DATASET}.candidates

slide-167
SLIDE 167

 Go to the BigQuery UI and run a simple query

 Note that because we pointed BigQuery to files in a storage bucket, the

validator will not be able to estimate the amount of data that will be processed for the query

Portland State University CS 430P/530 Internet, Web & Cloud Systems

SELECT * FROM [campaign_funding.transactions]
WHERE EMPLOYER contains "GOOGLE"
ORDER BY TRANSACTION_DT DESC
LIMIT 100

slide-168
SLIDE 168

 Then run the following query to obtain party-based contributions

for those with an engineering occupation

 IMPORTANT: DO check Legacy SQL for this query

Portland State University CS 430P/530 Internet, Web & Cloud Systems

SELECT affiliation, SUM(amount) AS amount FROM (
  SELECT * FROM (
    SELECT t.amt AS amount, t.occupation AS occupation, c.affiliation AS affiliation,
    FROM (
      SELECT trans.TRANSACTION_AMT AS amt, trans.OCCUPATION AS occupation, cmte.CAND_ID AS CAND_ID
      FROM [campaign_funding.transactions] trans
      RIGHT OUTER JOIN EACH (
        SELECT CMTE_ID, FIRST(CAND_ID) AS CAND_ID
        FROM [campaign_funding.committees]
        GROUP EACH BY CMTE_ID) cmte
      ON trans.CMTE_ID = cmte.CMTE_ID) AS t
    RIGHT OUTER JOIN EACH (
      SELECT CAND_ID, FIRST(CAND_PTY_AFFILIATION) AS affiliation,
      FROM [campaign_funding.candidates]
      GROUP EACH BY CAND_ID) c
    ON t.CAND_ID = c.CAND_ID)
  WHERE occupation CONTAINS "ENGINEER")
GROUP BY affiliation
ORDER BY amount DESC

slide-169
SLIDE 169

 Query needs to join with committees table

(Republican/Democratic) and candidates table to associate candidate to party for individual contribution

 Repeat previous query on any other profession besides Engineer to

find

 A profession that has more Republican contributions than Democratic  A profession that has more Democratic contributions than Republican  Some professions may be case-sensitive  Screenshot your results

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-170
SLIDE 170

BigQuery Lab #3

 Looking at campaign finance with BigQuery (14 min)

 First 8 steps  https://codelabs.developers.google.com/codelabs/cloud-bq-campaign-finance

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-171
SLIDE 171

Cloud Datalab Lab #1

 Analyzing data using Datalab and BigQuery (11 min)  In Cloud Shell, launch Cloud Datalab docker container onto a VM

instance nearby

 If you get a Cloud Source Repositories error, go to the console UI and

create a default repository

 Go to next step while waiting (takes > 5 min to get the message below)

Portland State University CS 430P/530 Internet, Web & Cloud Systems

datalab create mydatalabvm --zone us-west1-b

slide-172
SLIDE 172

 In BigQuery UI, run standard SQL query to list delayed departures,

uncheck use Legacy SQL and screenshot your results.

 Run query to find 20 most popular flights and screenshot your results

Portland State University CS 430P/530 Internet, Web & Cloud Systems

SELECT departure_delay, COUNT(1) AS num_flights,
  APPROX_QUANTILES(arrival_delay, 4) AS arrival_delay_quantiles
FROM `bigquery-samples.airline_ontime_data.flights`
GROUP BY departure_delay
HAVING num_flights > 100
ORDER BY departure_delay ASC

SELECT departure_airport, arrival_airport, COUNT(1) AS num_flights
FROM `bigquery-samples.airline_ontime_data.flights`
GROUP BY departure_airport, arrival_airport
ORDER BY num_flights DESC
LIMIT 20

slide-173
SLIDE 173

 Go back to Cloud Shell that launched Cloud Datalab  Go to Web Preview of shell, change port to 8081, and preview to

pull up Cloud Datalab UI

 Start a new notebook called 'flights'

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-174
SLIDE 174

 Paste Python code into notebook cell and run it

 Note that df is a pandas data-frame  Get count of flight departure delays and their associated arrival delays,

then run

Portland State University CS 430P/530 Internet, Web & Cloud Systems

query=""" SELECT departure_delay, COUNT(1) AS num_flights, APPROX_QUANTILES(arrival_delay, 10) AS arrival_delay_deciles FROM `bigquery-samples.airline_ontime_data.flights` GROUP BY departure_delay HAVING num_flights > 100 ORDER BY departure_delay ASC """ import google.datalab.bigquery as bq df = bq.Query(query).execute().result().to_dataframe() df.head()

slide-175
SLIDE 175

 Append a new code cell to notebook  Paste Python code to create deciles on arrivals in next notebook

cell and run it

 Paste Python code to plot delays into next notebook cell and run it

 Show the plot

Portland State University CS 430P/530 Internet, Web & Cloud Systems

import pandas as pd
percentiles = df['arrival_delay_deciles'].apply(pd.Series)
percentiles = percentiles.rename(columns = lambda x : str(x*10) + "%")
df = pd.concat([df['departure_delay'], percentiles], axis=1)
df.head()

without_extremes = df.drop(['0%', '100%'], 1)
without_extremes.plot(x='departure_delay', xlim=(-30,50), ylim=(-50,50));

slide-176
SLIDE 176

Cloud Datalab Lab #1

 Skip Step #5  Analyzing data using Datalab and BigQuery (11 min)  Link

 https://codelabs.developers.google.com/codelabs/mlimmersion-data-analysis/

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-177
SLIDE 177

Cloud Datalab #2

 Anomaly Detection in HTTP logs  Steps through work flow of a data-scientist

 Other notebooks included in samples directory

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-178
SLIDE 178

 In Cloud Datalab, click on the Home icon, then navigate to datalab/docs/samples  Click on Anomaly Detection in HTTP Logs.ipynb  In the notebook, clear all cells  The notebook will take HTTP log data stored in BigQuery and run

queries against it to detect anomalies in requests

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-179
SLIDE 179

Cloud Datalab #2

 Individually select code cells and click Run  Show all graphs that are generated by the notebook  Clean-up

Portland State University CS 430P/530 Internet, Web & Cloud Systems

datalab delete mydatalabvm