cs6
play

CS6 Practical System Skills Fall 2019 edition Leonhard - PowerPoint PPT Presentation

CS6 Practical System Skills Fall 2019 edition Leonhard Spiegelberg lspiegel@cs.brown.edu 2 / 34 3 / 34 docker-compose.yml export DATA_DIR=/db-data && docker-compose up -d


  1. CS6 Practical System Skills Fall 2019 edition Leonhard Spiegelberg lspiegel@cs.brown.edu

  2. ⇒ ⇒ ⇒ ⇒ ⇒ ⇒ ⇒ 2 / 34

  3. → → → → → 3 / 34

  4. docker-compose.yml export DATA_DIR=/db-data && docker-compose up -d version: '3' services: login: image: "login:latest" ports: - "80:5000" env_file: .env links: - postgres:dbserver restart: always postgres: image: "postgres:12.1" env_file: .env-postgres restart: always volumes: - ${DATA_DIR}:/var/lib/postgresql/data 4 / 34

  5. 22 CS6 Practical System Skills Fall 2019 Leonhard Spiegelberg lspiegel@cs.brown.edu

  6. ⇒ ⇒ → → 6 / 34

  7. ⇒ 7 / 34

  8. ⇒ → pip3 install pandas ⇒ ⇒ import numpy as np a = np.array([1, 2, 3, 4]) b = np.array([5, 4, 3, 2]) a + b 8 / 34

  9. ⇒ → 0, …, #numelements - 1 → ⇒ import pandas as pd colA = pd.Series([10, 20, 30], index=[1, 3, 4]) colB = pd.Series([9, -3, -2.41], index=[0, 1, 2]) df = pd.DataFrame({'columnA' : colA, 'columnB' : colB}) 9 / 34

  10. ⇒ ⇒ records = [{'A' : 20, 'B' : 'Tux', 'C' : 3.141}, {'A' : None, 'B' : 'Sealion'}, {'A' : 10, 'B': 'Crabby', 'C' : 6.0}] pd.DataFrame(records) records = [(20, 'Tux', 3.141), (None, 'Sealion', np.NaN), (10, 'Crabby', 6.0)] pd.DataFrame(records, columns=['A', 'B', 'C']) 11 / 34

  11. ⇒ a,b,c pd.read_csv('sample.csv') 1,2,3 4,5,6 7,8,9 pd.read_json('sample.json', {"a":1,"b":2,"c":3} orient='records', {"a":4,"b":5,"c":6} {"a":7,"b":8,"c":9} lines=True) pd.read_excel('sample.xlsx', 'Sheet1') 12 / 34

  12. ⇒ import sqlalchemy dburi = 'postgresql://postgres:docker@localhost/postgres' db = sqlalchemy.create_engine(dburi) pd.read_sql('SELECT * FROM sample', db) docker run -p 5432:5432 -e POSTGRES_PASSWORD=docker -v \ $PWD/data:/var/lib/postgresql/data --rm postgres 13 / 34

  13. df.to_dict(orient='records') # list of dicts list(df.to_records(index=None)) # list of tuples df.to_csv('sample.csv', index=None) df.to_json('sample.json', orient='records', lines=True) df.to_excel('sample.xlsx', index=None) df.to_sql('sample', db, index=None, if_exists='replace') 14 / 34

  14. ⇒ ⇒ df[label] → df[df.columns[i]] ⇒ df[[label1, label2, …, labelN]] df[['c', 'a']] 16 / 34

  15. ⇒ # manipulating colummns via element-wise Numpy operations df['a^2 - b^2'] = df['a'] * df['a'] + df['b'] * df['b'] # apply over a single column df['fmt'] = df['a'].apply(lambda x: '{:04d}'.format(x)) # apply using multiple columns df['a+b'] = df[['a', 'b']].apply(lambda row: row['a'] + row['b'], axis=1) 17 / 34

  16. ⇒ .head() .tail() ⇒ .loc .iloc → df[<idx>] df['a'] > 10 ⇒ df.iloc[<row sel>, <col sel>] df.loc[<row sel>, <col sel>] 18 / 34

  17. df[df['age'] < 10] df.iloc[[0, 1], 3:] df['Aaron':, ['food', 'state']] 19 / 34

  18. ⇒ → df[[True, False, …]] ⇒ df['column'] > 10 ⇒ & | → & I df[(df['age'] < 10) | df[(df['age'] < 20) & (df['age'] > 65)] (df['height'] > 100)] 20 / 34

  19. ⇒ → df.index.rename('...', inplace=True) df.index (1) df.index = (2) df.set_index('name') df.index.rename('name') df.reset_index() (1) (2) 22 / 34

  20. ⇒ .groupby ⇒ .count/.describe/.mean/.std/.agg(...) ⇒ df.count() df.groupby(['A', 'B']) \ df.groupby('A') \ .count() .count() 23 / 34

  21. ⇒ df.join(df_carrier.set_index('Code'), on='OP_UNIQUE_CARRIER') 25 / 34

  22. ⇒ → df_results.head().to_latex(index=False) \begin{tabular}{lr} \toprule CarrierName & DEP\_DELAY \\ \midrule JetBlue Airways & 20.429078 \\ Frontier Airlines Inc. & 15.982878 \\ SkyWest Airlines Inc. & 15.123184 \\ PSA Airlines Inc. & 13.794702 \\ ExpressJet Airlines LLC & 13.642608 \\ \bottomrule \end{tabular} 27 / 34

  23. ⇒ → ⇒ @app.route('/') def index(): # load dataset, perform analytics df_results = ... return df_results.head().to_html(index=None) 28 / 34

  24. ⇒ ⇒ ⇒ 30 / 34

  25. ⇒ → → import matplotlib.pyplot as plt plt.figure(figsize=(5, 5)) x = [1, 2, 3, 4, 5, 6] y = [3.4, 2.0, -1, 0.5, .3, .2] plt.grid() plt.scatter(x, y, s=60) plt.plot(x, y, lw=2) plt.xlabel('x') plt.ylabel('y') plt.title('scatter plot example') plt.tight_layout() plt.savefig('img.png', transparent=True) 31 / 34

  26. df.head() ⇒ ⇒ df.head().set_index('CarrierName') \ import seaborn as sns sns.barplot(x='CarrierName', y='DEP_DELAY', .plot.bar() data=df.head(), palette=sns.color_palette('Blues')) sns.despine() 32 / 34

  27. ⇒ → ⇒ flask-cache ⇒ 33 / 34

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend