

CS 327E Class 9 November 11, 2019
● Grading update
● What to expect from remaining Milestones:
● Milestone 9: Find dataset2 + ingest into BQ + model the data
● Milestone 10: Create Beam pipelines + cross-dataset queries
● Milestone 11: Orchestrate workflow
● Milestone 12: Present your project
● Review your dataset2 selection: sign-up sheet
1) A data warehouse provides _____________
A. a centralized and consolidated data platform by integrating data from different sources and in different formats.
B. an operational data platform with guaranteed consistency during transaction processing.

2) What are the most common schemas of a data warehouse?
A. Star and Snowflake schemas
B. Fact and Dimension schemas
C. Normalized and Denormalized schemas

3) In this Saber data warehouse schema, which column stores a fact/measure?
A. Car-Nr
B. Cust-Nr
C. Sales in Euros
D. None of the above

4) What are some important considerations when designing a data warehouse schema?
A. The grain of the Fact table(s)
B. Identifying the Dimension tables
C. Handling slowly changing dimensions
D. All of the above

5) What activity can consume 80% of the time when building a data warehouse?
A) Designing the data warehouse schema
B) Building the ETL process
C) Creating the BI reports

6) Just like a data warehouse, a data lake is a central repository of data. Unlike a data warehouse, a data lake stores data in its raw form and its primary users are data scientists.
A) True
B) False
Classic Star Schema
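The slide's diagram is not reproduced here, so the following is only a minimal sketch of what a star schema looks like in practice: one fact table holding a numeric measure, surrounded by dimension tables it references by key. The table names (`fact_sales`, `dim_product`, `dim_store`), the columns, and the sample rows are all hypothetical and are not taken from the lecture. It uses Python's built-in `sqlite3` rather than BigQuery so that it runs anywhere.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Hypothetical star schema: two dimension tables and one fact table.
# The fact table's grain is one row per product, store, and day.
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    sale_date  TEXT,
    amount     REAL  -- the measure
);
INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
INSERT INTO dim_store VALUES (10, 'Austin'), (20, 'Dallas');
INSERT INTO fact_sales VALUES
    (1, 10, '2019-11-01', 12.5),
    (1, 20, '2019-11-01', 7.5),
    (2, 10, '2019-11-02', 30.0);
""")


def sales_by_category(connection):
    """Aggregate the measure by a dimension attribute (one join per dimension)."""
    return connection.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON f.product_id = p.product_id
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()


print(sales_by_category(conn))
```

The point of the layout is that every analytical query follows the same shape: start from the fact table, join outward to whichever dimensions are needed, and aggregate the measure.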
Data Integration Challenge
SELECT ...
FROM Source1.Account AS A1
JOIN Source2.Account AS A2
  ON A1.c1 = A2.c1 AND A1.c2 = A2.c2 ...

SELECT employer_name, registration_date
FROM Employer
JOIN Corporate_Registrations
  ON employer_name = corporation_name
  AND employer_city = corporation_city
  AND employer_state = corporation_state

Results:
● 2% matches between Employer and Corporate_Registrations
● Punctuation characters in corporation_name and corporation_city
● Suffixes in corporation_name (e.g. LLC, INC)
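One way to raise the match rate, given the problems listed in the results, is to normalize the join keys before comparing them. The sketch below is an assumption about how that cleanup might look, not the approach used in the course: the helper names (`strip_punctuation`, `normalize_name`, `match`), the suffix list, and the sample rows are all hypothetical. The slide names only LLC and INC as example suffixes.

```python
import string

# Hypothetical suffix list; the slide gives only LLC and INC as examples.
SUFFIXES = {"LLC", "INC"}

# Translation table that deletes every ASCII punctuation character.
_NO_PUNCT = str.maketrans("", "", string.punctuation)


def strip_punctuation(text: str) -> str:
    """Upper-case the text, delete punctuation, and collapse whitespace."""
    return " ".join(text.upper().translate(_NO_PUNCT).split())


def normalize_name(name: str) -> str:
    """Clean a company name and drop trailing corporate suffixes."""
    tokens = strip_punctuation(name).split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def match(employers, registrations):
    """Join the two sources on normalized (name, city, state) keys."""
    index = {}
    for corp_name, corp_city, corp_state, reg_date in registrations:
        key = (normalize_name(corp_name), strip_punctuation(corp_city), corp_state.upper())
        index[key] = reg_date
    results = []
    for emp_name, emp_city, emp_state in employers:
        key = (normalize_name(emp_name), strip_punctuation(emp_city), emp_state.upper())
        if key in index:
            results.append((emp_name, index[key]))
    return results


# Made-up rows that would not match on raw string equality.
employers = [("Acme Widgets", "Austin", "TX")]
registrations = [("ACME WIDGETS, INC.", "Austin,", "TX", "2015-03-01")]
print(match(employers, registrations))
```

In BigQuery the same idea could be expressed with SQL string functions applied to both sides of the join; the Python version is used here only to keep the example self-contained.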
dataset2
1. Upload dataset2 files to Cloud Storage bucket
2. Create staging area in BigQuery
3. Load data files into BigQuery as staging tables
4. Create modeled area in BigQuery
5. Identify Entity Types and create modeled tables
6. Identify relationships between tables
7. Identify Primary and Foreign Keys
Same steps as dataset1, except using a Jupyter Notebook.
Jupyter Notebooks
● Project Jupyter is open-source software
● Widely used for developing data science projects
● A web-based environment for creating notebooks
● Integrates code and its output into a single document, saved in .ipynb file
● Notebook is made up of cells
● Cell: block of code to be executed or container for text to be displayed
● Two types of cells: Code and Markdown
● Kernel: computation engine that executes the code in a notebook
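To make the cell and kernel vocabulary concrete: an .ipynb file is a JSON document, so its structure can be inspected with the standard library alone. The sketch below builds a tiny two-cell notebook in the version 4 format and lists its cell types. The cell contents and the helper name `cell_types` are made up for illustration, and a real notebook written by Jupyter will carry more metadata than is shown here.

```python
import json

# A minimal notebook in nbformat 4: one Markdown cell and one Code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {
        # The kernelspec names the kernel that executes the code cells.
        "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}
    },
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Milestone 9 notes"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,  # not yet run
            "outputs": [],            # outputs are stored next to the code
            "source": ["print('hello')"],
        },
    ],
}


def cell_types(nb: dict) -> list:
    """Return the type of each cell in order."""
    return [cell["cell_type"] for cell in nb["cells"]]


# Round-trip through JSON text, as if the notebook had been saved and reopened.
text = json.dumps(notebook, indent=1)
print(cell_types(json.loads(text)))
```

Because outputs live in the same file as the code, a saved notebook records both what was run and what it produced.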
Jupyter Notebook Demo
http://www.cs.utexas.edu/~scohen/milestones/Milestone9.pdf
