CS6 Practical System Skills Fall 2019 edition Leonhard - PowerPoint PPT Presentation

CS6 Practical System Skills Fall 2019 edition Leonhard Spiegelberg lspiegel@cs.brown.edu

⇒ ⇒ ⇒ ⇒ ⇒ ⇒ ⇒ 2 / 34

→ → → → → 3 / 34

docker-compose.yml export DATA_DIR=/db-data && docker-compose up -d version: '3' services: login: image: "login:latest" ports: - "80:5000" env_file: .env links: - postgres:dbserver restart: always postgres: image: "postgres:12.1" env_file: .env-postgres restart: always volumes: - ${DATA_DIR}:/var/lib/postgresql/data 4 / 34

22 CS6 Practical System Skills Fall 2019 Leonhard Spiegelberg lspiegel@cs.brown.edu

⇒ ⇒ → → 6 / 34

⇒ 7 / 34

⇒ → pip3 install pandas ⇒ ⇒ import numpy as np a = np.array([1, 2, 3, 4]) b = np.array([5, 4, 3, 2]) a + b 8 / 34

⇒ → 0, …, #numelements - 1 → ⇒ import pandas as pd colA = pd.Series([10, 20, 30], index=[1, 3, 4]) colB = pd.Series([9, -3, -2.41], index=[0, 1, 2]) df = pd.DataFrame({'columnA' : colA, 'columnB' : colB}) 9 / 34

⇒ ⇒ records = [{'A' : 20, 'B' : 'Tux', 'C' : 3.141}, {'A' : None, 'B' : 'Sealion'}, {'A' : 10, 'B': 'Crabby', 'C' : 6.0}] pd.DataFrame(records) records = [(20, 'Tux', 3.141), (None, 'Sealion', np.NaN), (10, 'Crabby', 6.0)] pd.DataFrame(records, columns=['A', 'B', 'C']) 11 / 34

⇒ a,b,c pd.read_csv('sample.csv') 1,2,3 4,5,6 7,8,9 pd.read_json('sample.json', {"a":1,"b":2,"c":3} orient='records', {"a":4,"b":5,"c":6} {"a":7,"b":8,"c":9} lines=True) pd.read_excel('sample.xlsx', 'Sheet1') 12 / 34

⇒ import sqlalchemy dburi = 'postgresql://postgres:docker@localhost/postgres' db = sqlalchemy.create_engine(dburi) pd.read_sql('SELECT * FROM sample', db) docker run -p 5432:5432 -e POSTGRES_PASSWORD=docker -v \ $PWD/data:/var/lib/postgresql/data --rm postgres 13 / 34

df.to_dict(orient='records') # list of dicts list(df.to_records(index=None)) # list of tuples df.to_csv('sample.csv', index=None) df.to_json('sample.json', orient='records', lines=True) df.to_excel('sample.xlsx', index=None) df.to_sql('sample', db, index=None, if_exists='replace') 14 / 34

⇒ ⇒ df[label] → df[df.columns[i]] ⇒ df[[label1, label2, …, labelN]] df[['c', 'a']] 16 / 34

⇒ # manipulating colummns via element-wise Numpy operations df['a^2 - b^2'] = df['a'] * df['a'] + df['b'] * df['b'] # apply over a single column df['fmt'] = df['a'].apply(lambda x: '{:04d}'.format(x)) # apply using multiple columns df['a+b'] = df[['a', 'b']].apply(lambda row: row['a'] + row['b'], axis=1) 17 / 34

⇒ .head() .tail() ⇒ .loc .iloc → df[<idx>] df['a'] > 10 ⇒ df.iloc[<row sel>, <col sel>] df.loc[<row sel>, <col sel>] 18 / 34

df[df['age'] < 10] df.iloc[[0, 1], 3:] df['Aaron':, ['food', 'state']] 19 / 34

⇒ → df[[True, False, …]] ⇒ df['column'] > 10 ⇒ & | → & I df[(df['age'] < 10) | df[(df['age'] < 20) & (df['age'] > 65)] (df['height'] > 100)] 20 / 34

⇒ → df.index.rename('...', inplace=True) df.index (1) df.index = (2) df.set_index('name') df.index.rename('name') df.reset_index() (1) (2) 22 / 34

⇒ .groupby ⇒ .count/.describe/.mean/.std/.agg(...) ⇒ df.count() df.groupby(['A', 'B']) \ df.groupby('A') \ .count() .count() 23 / 34

⇒ df.join(df_carrier.set_index('Code'), on='OP_UNIQUE_CARRIER') 25 / 34

⇒ → df_results.head().to_latex(index=False) \begin{tabular}{lr} \toprule CarrierName & DEP\_DELAY \\ \midrule JetBlue Airways & 20.429078 \\ Frontier Airlines Inc. & 15.982878 \\ SkyWest Airlines Inc. & 15.123184 \\ PSA Airlines Inc. & 13.794702 \\ ExpressJet Airlines LLC & 13.642608 \\ \bottomrule \end{tabular} 27 / 34

⇒ → ⇒ @app.route('/') def index(): # load dataset, perform analytics df_results = ... return df_results.head().to_html(index=None) 28 / 34

⇒ ⇒ ⇒ 30 / 34

⇒ → → import matplotlib.pyplot as plt plt.figure(figsize=(5, 5)) x = [1, 2, 3, 4, 5, 6] y = [3.4, 2.0, -1, 0.5, .3, .2] plt.grid() plt.scatter(x, y, s=60) plt.plot(x, y, lw=2) plt.xlabel('x') plt.ylabel('y') plt.title('scatter plot example') plt.tight_layout() plt.savefig('img.png', transparent=True) 31 / 34

df.head() ⇒ ⇒ df.head().set_index('CarrierName') \ import seaborn as sns sns.barplot(x='CarrierName', y='DEP_DELAY', .plot.bar() data=df.head(), palette=sns.color_palette('Blues')) sns.despine() 32 / 34

⇒ → ⇒ flask-cache ⇒ 33 / 34

CS6 Practical System Skills Fall 2019 edition Leonhard - PowerPoint PPT Presentation

CS6 Practical System Skills Fall 2019 edition Leonhard Spiegelberg lspiegel@cs.brown.edu 2 / 34 3 / 34 docker-compose.yml export DATA_DIR=/db-data && docker-compose up -d

CS6 Practical System Skills Fall 2019 edition Leonhard Spiegelberg lspiegel@cs.brown.edu 14

CS6 Practical System Skills Fall 2019 edition Leonhard Spiegelberg lspiegel@cs.brown.edu 18

CS6 Practical System Skills Fall 2019 edition Leonhard Spiegelberg lspiegel@cs.brown.edu 19

CS6 Practical System Skills Fall 2019 edition Leonhard Spiegelberg lspiegel@cs.brown.edu

CS6 Practical System Skills Fall 2019 edition Leonhard Spiegelberg lspiegel@cs.brown.edu

CS6 Practical System Skills Fall 2019 edition Leonhard Spiegelberg lspiegel@cs.brown.edu

CS6 Practical System Skills Fall 9 edition Leonhard Spiegelerg

CS6 Practical System Skills Fall 2019 edition Leonhard Spiegelberg lspiegel@cs.brown.edu

CS6 Practical System Skills Fall 2019 edition Leonhard Spiegelberg lspiegel@cs.brown.edu

CS6 Practical System Skills Fall 2019 edition Leonhard Spiegelberg lspiegel@cs.brown.edu

CS6 Practical System Skills Fall 2019 edition Leonhard Spiegelberg lspiegel@cs.brown.edu

CS6 Practical System Skills Fall 9 edition Leonhard Spiegelerg

CS6 Practical System Skills Fall 9 edition Leonhard Spiegelerg

CS 683 - Security and Privacy Spring 2018 Instructor: Karim Eldefrawy University of San

Python - Data Analysis Essentials Day 2 Giuseppe Accaputo g@accaputo.ch 18.05.2019 Slide 1 IT

Digital Medicine I: Introduction to Programming Pandas Autumn 2019 December 19, 2019 So far.

Weld: A Common Runtime for Data Analytics Shoumik Palkar, James Thomas, Anil Shanbhag*, Deepak

Rela latio ional data pandas SQLite Two table les Table: city Table: country name

INTRODUCTION TO D ATA AN ALYSIS AND PLOTTING WITH PANDAS JSC TUTORIAL Andreas Herten,

The Panda Hunter Game Jie Gao Stony Brook University http://www.cs.sunysb.edu/ jgao IMA

Reminders: Code can be found on github.com/jackel119/python102 Slides on

March 3: Data, models, errors Questions for today How can we filter a pandas data frame?

GeoPandas Easy, fast and scalable geospatial analysis in Python Joris Van den Bossche, FOSDEM,

Troubleshooting what to do when things arent working JN Matthews Dont Worry! Everyone Has

Sambuz

Useful Links

Newsletter

Mail Us