Introduction to JSON
Streamlined Data Ingestion with pandas
Amany Mahfouz, Instructor

JavaScript Object Notation (JSON)
- Common web data format
- Not tabular
  - Records don't all have to have the same set of attributes
- Data organized into collections of objects
- Objects are collections of attribute-value pairs
- Nested JSON: objects within objects
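As a small illustration of these points, the sketch below parses a made-up JSON string with Python's standard json module. The second record and its rating attribute are invented for illustration and are not part of the course data.

```python
import json

# Hypothetical JSON text: a collection (array) of two objects.
raw = """
[
  {"name": "City Lights", "location": {"city": "San Francisco"}},
  {"name": "Corner Books", "rating": 4.5}
]
"""

records = json.loads(raw)

# Arrays become Python lists; objects become dicts of attribute-value pairs.
print(type(records).__name__, type(records[0]).__name__)

# Records do not need to share the same set of attributes.
print(sorted(records[0].keys()), sorted(records[1].keys()))

# Nested JSON: the value of "location" is itself an object.
print(records[0]["location"]["city"])
```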

Reading JSON Data
- read_json()
  - Takes a string path to JSON _or_ JSON data as a string
  - Specify data types with dtype keyword argument
  - orient keyword argument to flag uncommon JSON data layouts
  - Possible values in pandas documentation
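A minimal sketch of the dtype keyword argument, assuming a tiny record-oriented JSON string modeled on the death-causes data shown later in the deck. The JSON text is wrapped in io.StringIO here because recent pandas releases warn when literal JSON text is passed to read_json() directly; that wrapper is an addition, not something the slide uses.

```python
from io import StringIO

import pandas as pd

# Toy record-oriented JSON; every value is stored as a string.
raw = (
    '[{"deaths": "32", "death_rate": "6.2", "sex": "F"},'
    ' {"deaths": "87", "death_rate": "8.3", "sex": "F"}]'
)

# dtype maps column names to the types they should be loaded as.
df = pd.read_json(
    StringIO(raw),
    dtype={"deaths": "int64", "death_rate": "float64"},
)

print(df.dtypes)
print(df["deaths"].sum())
```

Columns left out of the dtype dictionary are still subject to pandas' own type inference.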

Data Orientation
- JSON data isn't tabular
  - pandas guesses how to arrange it in a table
- pandas can automatically handle common orientations

Record Orientation
- Most common JSON arrangement

    [
        {
            "age_adjusted_death_rate": "7.6",
            "death_rate": "6.2",
            "deaths": "32",
            "leading_cause": "Accidents Except Drug Posioning (V01-X39, X43, X45-X59, Y85-Y86)",
            "race_ethnicity": "Asian and Pacific Islander",
            "sex": "F",
            "year": "2007"
        },
        {
            "age_adjusted_death_rate": "8.1",
            "death_rate": "8.3",
            "deaths": "87",
            ...
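A short sketch of how record-oriented JSON maps to rows, assuming a two-record toy string (not the course file). Each object becomes one row, and an attribute missing from a record shows up as a missing value.

```python
from io import StringIO

import pandas as pd

# Toy record-oriented JSON; the second record lacks "race_ethnicity".
raw = '[{"sex": "F", "race_ethnicity": "Hispanic"}, {"sex": "M"}]'

# orient="records" is spelled out for clarity.
df = pd.read_json(StringIO(raw), orient="records")

print(df)
print(df["race_ethnicity"].isna().tolist())
```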

Column Orientation
- More space-efficient than record-oriented JSON

    {
        "age_adjusted_death_rate": {
            "0": "7.6",
            "1": "8.1",
            "2": "7.1",
            "3": ".",
            "4": ".",
            "5": "7.3",
            "6": "13",
            "7": "20.6",
            "8": "17.4",
            "9": ".",
            "10": ".",
            "11": "19.8",
            ...
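One way to see the space difference is to serialize the same frame both ways and compare lengths. This is a sketch on an invented 100-row frame; the exact sizes depend on the column names and index labels, so the comparison is illustrative rather than general.

```python
import pandas as pd

# Toy frame with long column names and short index labels (0..99).
df = pd.DataFrame(
    {
        "age_adjusted_death_rate": ["7.6"] * 100,
        "death_rate": ["6.2"] * 100,
    }
)

# Record orientation repeats every column name in every row.
records_json = df.to_json(orient="records")

# Column orientation writes each column name once, keyed by index label.
columns_json = df.to_json(orient="columns")

print(len(records_json), len(columns_json))
```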

Specifying Orientation
- Split oriented data - nyc_death_causes.json

    {
        "columns": [
            "age_adjusted_death_rate",
            "death_rate",
            "deaths",
            "leading_cause",
            "race_ethnicity",
            "sex",
            "year"
        ],
        "index": [...],
        "data": [
            [
                "7.6",

Specifying Orientation

    import pandas as pd

    death_causes = pd.read_json("nyc_death_causes.json", orient="split")
    print(death_causes.head())

       age_adjusted_death_rate  death_rate  deaths             leading_cause              race_ethnicity  sex  year
    0                      7.6         6.2      32  Accidents Except Drug...  Asian and Pacific Islander    F  2007
    1                      8.1         8.3      87  Accidents Except Drug...          Black Non-Hispanic    F  2007
    2                      7.1         6.1      71  Accidents Except Drug...                    Hispanic    F  2007
    3                        .           .       .  Accidents Except Drug...          Not Stated/Unknown    F  2007
    4                        .           .       .  Accidents Except Drug...       Other Race/ Ethnicity    F  2007

    [5 rows x 7 columns]
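The same call can be tried without the course file. The sketch below assumes a tiny split-oriented string with invented values, wrapped in io.StringIO so that read_json() treats it as file-like input.

```python
from io import StringIO

import pandas as pd

# Toy split-oriented JSON: column names, index labels, and rows are separate.
raw = (
    '{"columns": ["sex", "leading_cause"],'
    ' "index": [0, 1],'
    ' "data": [["F", "Accidents"], ["M", "Accidents"]]}'
)

# orient="split" tells pandas how the three keys fit together.
df = pd.read_json(StringIO(raw), orient="split")

print(df)
```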

Let's practice!

Introduction to APIs

Application Programming Interfaces
- Defines how an application communicates with other programs
- Way to get data from an application without knowing database details

Requests
- Send and get data from websites
- Not tied to a particular API
- requests.get() to get data from a URL

requests.get()
- requests.get(url_string) to get data from a URL
- Keyword arguments
  - params keyword: takes a dictionary of parameters and values to customize API request
  - headers keyword: takes a dictionary, can be used to provide user authentication to API
- Result: a response object, containing data and metadata
  - response.json() will return just the JSON data
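To see what the params and headers dictionaries turn into, a request can be built without being sent. The sketch below uses requests.Request(...).prepare(); the endpoint URL and the API key are placeholders invented for illustration, and no network call is made.

```python
import requests

# Hypothetical endpoint and key, used only for illustration.
api_url = "https://api.example.com/v1/search"
api_key = "PLACEHOLDER_KEY"

params = {"term": "bookstore", "location": "San Francisco"}
headers = {"Authorization": "Bearer {}".format(api_key)}

# Build the request without sending it.
prepared = requests.Request(
    "GET", api_url, params=params, headers=headers
).prepare()

# params are encoded into the query string of the URL.
print(prepared.url)

# headers travel separately from the URL.
print(prepared.headers["Authorization"])
```

Calling requests.get(api_url, params=params, headers=headers) would send a request with the same query string and Authorization header, plus the library's default headers.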

response.json() and pandas
- response.json() returns a dictionary
- read_json() expects strings, not dictionaries
- Load the response JSON to a data frame with pd.DataFrame()
  - read_json() will give an error!
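A minimal offline sketch of that step, assuming a dictionary shaped loosely like an API response. The "total" key, the second business, and the ratings are invented stand-ins for what response.json() might return.

```python
import pandas as pd

# Stand-in for response.json(): a plain Python dictionary.
data = {
    "businesses": [
        {"name": "City Lights", "rating": 4.5},
        {"name": "Corner Books", "rating": 4.0},
    ],
    "total": 2,
}

# The list of records under "businesses" is what becomes the table.
bookstores = pd.DataFrame(data["businesses"])

print(bookstores)
```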

Yelp Business Search API

Making Requests

    import requests
    import pandas as pd

    # api_key is assumed to be defined elsewhere and to hold a valid API key
    api_url = "https://api.yelp.com/v3/businesses/search"

    # Set up parameter dictionary according to documentation
    params = {"term": "bookstore",
              "location": "San Francisco"}

    # Set up header dictionary w/ API key according to documentation
    headers = {"Authorization": "Bearer {}".format(api_key)}

    # Call the API
    response = requests.get(api_url,
                            params=params,
                            headers=headers)

Parsing Responses

    # Isolate the JSON data from the response object
    data = response.json()
    print(data)

    {'businesses': [{'id': '_rbF2ooLcMRA7Kh8neIr4g', 'alias': 'city-lights-bookstore-san-francisco', 'name': 'City Lights

    # Load businesses data to a data frame
    bookstores = pd.DataFrame(data["businesses"])
    print(bookstores.head(2))

                                      alias  ...                                                url
    0   city-lights-bookstore-san-francisco  ...  https://www.yelp.com/biz/city-lights-bookstore...
    1  alexander-book-company-san-francisco  ...  https://www.yelp.com/biz/alexander-book-compan...

    [2 rows x 16 columns]

Let's practice!

Working with nested JSONs

Nested JSONs
- JSONs contain objects with attribute-value pairs
- A JSON is nested when the value itself is an object


    # Print columns containing nested data
    print(bookstores[["categories", "coordinates", "location"]].head(3))

                                              categories  \
    0   [{'alias': 'bookstores', 'title': 'Bookstores'}]
    1  [{'alias': 'bookstores', 'title': 'Bookstores'...
    2   [{'alias': 'bookstores', 'title': 'Bookstores'}]

                                             coordinates  \
    0  {'latitude': 37.7975997924805, 'longitude': -1...
    1  {'latitude': 37.7885846793652, 'longitude': -1...
    2  {'latitude': 37.7589836120605, 'longitude': -1...

                                                location
    0  {'address1': '261 Columbus Ave', 'address2': '...
    1  {'address1': '50 2nd St', 'address2': '', 'add...
    2  {'address1': '866 Valencia St', 'address2': ''...
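The output above shows whole dictionaries sitting inside single cells. A small sketch of what that means in practice, using invented records with rounded coordinates (the second business is hypothetical):

```python
import pandas as pd

# Toy records with one nested attribute each.
businesses = [
    {"name": "City Lights", "coordinates": {"latitude": 37.8, "longitude": -122.4}},
    {"name": "Corner Books", "coordinates": {"latitude": 37.7, "longitude": -122.4}},
]
bookstores = pd.DataFrame(businesses)

# pd.DataFrame() stores each nested object whole, as a dict in one cell.
first = bookstores.loc[0, "coordinates"]
print(type(first).__name__)

# One manual way to pull a nested attribute out into its own column.
bookstores["latitude"] = bookstores["coordinates"].apply(lambda c: c["latitude"])
print(bookstores[["name", "latitude"]])
```

Extracting attributes one by one like this gets tedious quickly, which is the motivation for a dedicated flattening tool.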

pandas.io.json
- pandas.io.json submodule has tools for reading and writing JSON
  - Needs its own import statement
- json_normalize()
  - Takes a dictionary/list of dictionaries (like pd.DataFrame() does)
  - Returns a flattened data frame
  - Default flattened column name pattern: attribute.nestedattribute
  - Choose a different separator with the sep argument
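A minimal sketch of the default naming pattern and the sep argument, on one invented record. It assumes pandas 1.0 or later, where the function is also available at the top level as pd.json_normalize(); the slides use the older import from pandas.io.json.

```python
import pandas as pd

# Toy record: one nested object and one list of objects.
businesses = [
    {
        "name": "City Lights",
        "categories": [{"alias": "bookstores", "title": "Bookstores"}],
        "coordinates": {"latitude": 37.8, "longitude": -122.4},
    }
]

# Default pattern: attribute.nestedattribute
dotted = pd.json_normalize(businesses)
print(list(dotted.columns))

# Same data, with "_" as the separator instead of "."
underscored = pd.json_normalize(businesses, sep="_")
print(list(underscored.columns))
```

Nested objects are flattened into separate columns, while the list under "categories" stays in a single column.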

Loading Nested JSON Data

    import pandas as pd
    import requests
    from pandas.io.json import json_normalize

    # Set up headers, parameters, and API endpoint
    # (api_key is assumed to be defined elsewhere and to hold a valid API key)
    api_url = "https://api.yelp.com/v3/businesses/search"
    headers = {"Authorization": "Bearer {}".format(api_key)}
    params = {"term": "bookstore",
              "location": "San Francisco"}

    # Make the API call and extract the JSON data
    response = requests.get(api_url,
                            headers=headers,
                            params=params)
    data = response.json()

    # Flatten data and load to data frame, with _ separators
    bookstores = json_normalize(data["businesses"], sep="_")
    print(list(bookstores))

    ['alias',
     'categories',
     'coordinates_latitude',
     'coordinates_longitude',
     ...
     'location_address1',
     'location_address2',
     'location_address3',
     'location_city',
     'location_country',
     'location_display_address',
     'location_state',
     'location_zip_code',
     ...
     'url']
