CS6 Practical System Skills Fall 2019 edition Leonhard - - PowerPoint PPT Presentation

cs6
SMART_READER_LITE
LIVE PREVIEW

CS6 Practical System Skills Fall 2019 edition Leonhard - - PowerPoint PPT Presentation

CS6 Practical System Skills Fall 2019 edition Leonhard Spiegelberg lspiegel@cs.brown.edu 2 / 34 3 / 34 docker-compose.yml export DATA_DIR=/db-data && docker-compose up -d


slide-1
SLIDE 1

CS6

Practical System Skills

Fall 2019 edition

Leonhard Spiegelberg lspiegel@cs.brown.edu

slide-2
SLIDE 2

⇒ ⇒ ⇒ ⇒ ⇒ ⇒ ⇒

2 / 34

slide-3
SLIDE 3

→ → → → →

3 / 34

slide-4
SLIDE 4

4 / 34

version: '3' services: login: image: "login:latest" ports:

  • "80:5000"

env_file: .env links:

  • postgres:dbserver

restart: always postgres: image: "postgres:12.1" env_file: .env-postgres restart: always volumes:

  • ${DATA_DIR}:/var/lib/postgresql/data

docker-compose.yml export DATA_DIR=/db-data && docker-compose up -d

slide-5
SLIDE 5

22

CS6 Practical System Skills

Fall 2019

Leonhard Spiegelberg lspiegel@cs.brown.edu

slide-6
SLIDE 6

⇒ ⇒ → →

6 / 34

slide-7
SLIDE 7

7 / 34

slide-8
SLIDE 8

⇒ → pip3 install pandas ⇒ ⇒

8 / 34

import numpy as np a = np.array([1, 2, 3, 4]) b = np.array([5, 4, 3, 2]) a + b

slide-9
SLIDE 9

import pandas as pd colA = pd.Series([10, 20, 30], index=[1, 3, 4]) colB = pd.Series([9, -3, -2.41], index=[0, 1, 2]) df = pd.DataFrame({'columnA' : colA, 'columnB' : colB})

⇒ → 0, …, #numelements - 1 → ⇒

9 / 34

slide-10
SLIDE 10
slide-11
SLIDE 11

⇒ ⇒

11 / 34

records = [{'A' : 20, 'B' : 'Tux', 'C' : 3.141}, {'A' : None, 'B' : 'Sealion'}, {'A' : 10, 'B': 'Crabby', 'C' : 6.0}] pd.DataFrame(records) records = [(20, 'Tux', 3.141), (None, 'Sealion', np.NaN), (10, 'Crabby', 6.0)] pd.DataFrame(records, columns=['A', 'B', 'C'])

slide-12
SLIDE 12

12 / 34

a,b,c 1,2,3 4,5,6 7,8,9 pd.read_csv('sample.csv') {"a":1,"b":2,"c":3} {"a":4,"b":5,"c":6} {"a":7,"b":8,"c":9} pd.read_json('sample.json',

  • rient='records',

lines=True) pd.read_excel('sample.xlsx', 'Sheet1')

slide-13
SLIDE 13

13 / 34

import sqlalchemy dburi = 'postgresql://postgres:docker@localhost/postgres' db = sqlalchemy.create_engine(dburi) pd.read_sql('SELECT * FROM sample', db)

docker run -p 5432:5432 -e POSTGRES_PASSWORD=docker -v \ $PWD/data:/var/lib/postgresql/data --rm postgres

slide-14
SLIDE 14

14 / 34

df.to_dict(orient='records') # list of dicts list(df.to_records(index=None)) # list of tuples df.to_csv('sample.csv', index=None) df.to_json('sample.json',

  • rient='records',

lines=True) df.to_excel('sample.xlsx', index=None) df.to_sql('sample', db, index=None, if_exists='replace')

slide-15
SLIDE 15
slide-16
SLIDE 16

16 / 34

⇒ ⇒ df[label] → df[df.columns[i]] ⇒ df[[label1, label2, …, labelN]]

df[['c', 'a']]

slide-17
SLIDE 17

17 / 34

# manipulating colummns via element-wise Numpy operations df['a^2 - b^2'] = df['a'] * df['a'] + df['b'] * df['b'] # apply over a single column df['fmt'] = df['a'].apply(lambda x: '{:04d}'.format(x)) # apply using multiple columns df['a+b'] = df[['a', 'b']].apply(lambda row: row['a'] + row['b'], axis=1)

slide-18
SLIDE 18

⇒ .head() .tail() ⇒ .loc .iloc → df[<idx>] df['a'] > 10 ⇒ df.iloc[<row sel>, <col sel>] df.loc[<row sel>, <col sel>]

18 / 34

slide-19
SLIDE 19

19 / 34

df[df['age'] < 10] df.iloc[[0, 1], 3:] df['Aaron':, ['food', 'state']]

slide-20
SLIDE 20

df[(df['age'] < 20) & (df['height'] > 100)]

⇒ → df[[True, False, …]] ⇒ df['column'] > 10 ⇒ & | → & I

20 / 34

df[(df['age'] < 10) | (df['age'] > 65)]

slide-21
SLIDE 21
slide-22
SLIDE 22

22 / 34

⇒ → df.index.rename('...', inplace=True) df.index

(1) (2) (1) df.index = df.index.rename('name') df.reset_index() (2) df.set_index('name')

slide-23
SLIDE 23

⇒ .groupby ⇒ .count/.describe/.mean/.std/.agg(...) ⇒

df.count()

23 / 34 df.groupby('A') \ .count() df.groupby(['A', 'B']) \ .count()

slide-24
SLIDE 24
slide-25
SLIDE 25

25 / 34

df.join(df_carrier.set_index('Code'), on='OP_UNIQUE_CARRIER')

slide-26
SLIDE 26
slide-27
SLIDE 27

27 / 34

⇒ → df_results.head().to_latex(index=False)

\begin{tabular}{lr} \toprule CarrierName & DEP\_DELAY \\ \midrule JetBlue Airways & 20.429078 \\ Frontier Airlines Inc. & 15.982878 \\ SkyWest Airlines Inc. & 15.123184 \\ PSA Airlines Inc. & 13.794702 \\ ExpressJet Airlines LLC & 13.642608 \\ \bottomrule \end{tabular}

slide-28
SLIDE 28

⇒ → ⇒

28 / 34

@app.route('/') def index(): # load dataset, perform analytics df_results = ... return df_results.head().to_html(index=None)

slide-29
SLIDE 29
slide-30
SLIDE 30

⇒ ⇒ ⇒

30 / 34

slide-31
SLIDE 31

⇒ → →

31 / 34

import matplotlib.pyplot as plt plt.figure(figsize=(5, 5)) x = [1, 2, 3, 4, 5, 6] y = [3.4, 2.0, -1, 0.5, .3, .2] plt.grid() plt.scatter(x, y, s=60) plt.plot(x, y, lw=2) plt.xlabel('x') plt.ylabel('y') plt.title('scatter plot example') plt.tight_layout() plt.savefig('img.png', transparent=True)

slide-32
SLIDE 32

⇒ ⇒

32 / 34 import seaborn as sns sns.barplot(x='CarrierName', y='DEP_DELAY', data=df.head(), palette=sns.color_palette('Blues')) sns.despine() df.head() df.head().set_index('CarrierName') \ .plot.bar()

slide-33
SLIDE 33

⇒ → ⇒ flask-cache ⇒

33 / 34

slide-34
SLIDE 34