
Selecting data in pandas, Python for R Users, Daniel Chen - PowerPoint PPT Presentation



  1. Selecting data in pandas. Python for R Users. Daniel Chen, Instructor

  2. Manually create DataFrame

         import pandas as pd

         df = pd.DataFrame({
             'A': [1, 2, 3],
             'B': [4, 5, 6],
             'C': [7, 8, 9]},
             index = ['x', 'y', 'z'])
         print(df)

            A  B  C
         x  1  4  7
         y  2  5  8
         z  3  6  9

  3.

         df = pd.DataFrame({
             'A': [1, 2, 3],
             'B': [4, 5, 6],
             'C': [7, 8, 9]},
             index = ['x', 'y', 'z'])
         df

            A  B  C
         x  1  4  7
         y  2  5  8
         z  3  6  9

         df['A']

         x    1
         y    2
         z    3
         Name: A, dtype: int64

         df.A

         x    1
         y    2
         z    3
         Name: A, dtype: int64

         df[['A', 'B']]

            A  B
         x  1  4
         y  2  5
         z  3  6

  4. Subsetting rows: row-label (loc) vs row-index (iloc). Python starts counting from 0.

  5. Subsetting rows: .iloc

         df

            A  B  C
         x  1  4  7
         y  2  5  8
         z  3  6  9

         df.iloc[0]

         A    1
         B    4
         C    7
         Name: x, dtype: int64

         df.iloc[0, :]

         A    1
         B    4
         C    7
         Name: x, dtype: int64

         df.iloc[[0, 1], :]

            A  B  C
         x  1  4  7
         y  2  5  8

  6. Subsetting rows: .loc

         df

            A  B  C
         x  1  4  7
         y  2  5  8
         z  3  6  9

         df.loc['x']

         A    1
         B    4
         C    7
         Name: x, dtype: int64

         df.loc[['x', 'y']]

            A  B  C
         x  1  4  7
         y  2  5  8

  7.

         df

            A  B  C
         x  1  4  7
         y  2  5  8
         z  3  6  9

         df.loc['x', 'A']

         1

         df.loc[['x', 'y'], ['A', 'B']]

            A  B
         x  1  4
         y  2  5
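The label-based and position-based selections above can also be written as slices. The sketch below reuses the DataFrame from the slides; the variable names `by_label` and `by_position` are illustrative only. One detail the slides do not spell out: a `.loc` slice includes its end label, while an `.iloc` slice excludes its end position, as ordinary Python slices do.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
                  index=['x', 'y', 'z'])

# Label-based slice: both endpoints ('y' and 'B') are included.
by_label = df.loc['x':'y', 'A':'B']

# Position-based slice: the end positions (2) are excluded.
by_position = df.iloc[0:2, 0:2]

# Both selections cover rows x, y and columns A, B.
print(by_label.equals(by_position))
```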

  8. Conditional subsetting

         df[df.A == 3]

            A  B  C
         z  3  6  9

         df[(df.A == 3) | (df.B == 4)]

            A  B  C
         x  1  4  7
         z  3  6  9
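As a further sketch of conditional subsetting on the same DataFrame, conditions can also be combined with `&` (and) and negated with `~` (not). The variable names here are illustrative. Each comparison sits in its own parentheses because `&` and `|` bind more tightly than `==` in Python.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
                  index=['x', 'y', 'z'])

# Rows where both conditions hold.
both = df[(df.A >= 2) & (df.C < 9)]

# Rows where the condition does not hold.
not_three = df[~(df.A == 3)]

# The same boolean mask also works inside .loc.
either = df.loc[(df.A == 3) | (df.B == 4)]

print(both)
print(not_three)
print(either)
```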

  9. Attributes

         df.shape

         (3, 3)

         df.shape()

         ---------------------------------------------------------------------------
         TypeError                                 Traceback (most recent call last)
         <ipython-input-17-0e566b70f572> in <module>()
         ----> 1 df.shape()

         TypeError: 'tuple' object is not callable
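A small sketch of the attribute-versus-method distinction behind that error, assuming the three-column DataFrame from the earlier slides. Attributes such as `shape` are read without parentheses, while methods such as `head` must be called.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
                  index=['x', 'y', 'z'])

# shape is an attribute: a plain tuple, unpacked without calling it.
n_rows, n_cols = df.shape

# A tuple is not callable, which is why df.shape() raises TypeError.
shape_is_callable = callable(df.shape)

# head is a method: it must be called to get a result.
head_is_callable = callable(df.head)
first_two = df.head(2)

print(n_rows, n_cols, shape_is_callable, head_is_callable)
```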

  10. Let's practice!

  11. Data types

  12. R

         df <- data.frame(
           'A' = c(1, 2, 3),
           'B' = c(4, 5, 6)
         )
         df

           A B
         1 1 4
         2 2 5
         3 3 6

         class(df)

         "data.frame"

     Python

         import pandas as pd
         df = pd.DataFrame(
             {'A': [1, 2, 3],
              'B': [4, 5, 6]})
         df

            A  B
         0  1  4
         1  2  5
         2  3  6

         type(df)

         pandas.core.frame.DataFrame

  13. R

         str(df)

         'data.frame': 3 obs. of 2 variables:
          $ A: num 1 2 3
          $ B: num 4 5 6

     Python

         df.info()

         <class 'pandas.core.frame.DataFrame'>
         RangeIndex: 3 entries, 0 to 2
         Data columns (total 2 columns):
         A    3 non-null int64
         B    3 non-null int64
         dtypes: int64(2)
         memory usage: 128.0 bytes

  14. R

         df$A <- as.character(df$A)
         str(df)

         'data.frame': 3 obs. of 2 variables:
          $ A: chr "1" "2" "3"
          $ B: num 4 5 6

     Python

         df['A'] = df['A'].astype(str)
         df.info()

         <class 'pandas.core.frame.DataFrame'>
         RangeIndex: 3 entries, 0 to 2
         Data columns (total 2 columns):
         A    3 non-null object
         B    3 non-null int64
         dtypes: int64(1), object(1)
         memory usage: 128.0+ bytes

  15. String objects

         df.info()

         <class 'pandas.core.frame.DataFrame'>
         RangeIndex: 3 entries, 0 to 2
         Data columns (total 2 columns):
         A    3 non-null object
         B    3 non-null int64
         dtypes: int64(1), object(1)
         memory usage: 128.0+ bytes

     When you see "object", the column usually holds strings. Access built-in string methods with the str accessor.

  16. String accessor

         df = pd.DataFrame({'name': ['Daniel ', ' Eric', ' Julia ']})
         df

               name
         0  Daniel 
         1     Eric
         2   Julia 

         df['name_strip'] = df['name'].str.strip()
         df

               name name_strip
         0  Daniel      Daniel
         1     Eric       Eric
         2   Julia       Julia
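The `str` accessor exposes other string methods in the same way, and the calls can be chained. A minimal sketch using the names from the slide; `str.len` and `str.upper` are standard pandas string methods, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Daniel ', ' Eric', ' Julia ']})

# Strip surrounding whitespace, as on the slide.
clean = df['name'].str.strip()

# Element-wise string length of the cleaned values.
lengths = clean.str.len()

# Accessor calls can be chained: strip, then upper-case.
upper = df['name'].str.strip().str.upper()

print(clean.tolist())
print(lengths.tolist())
print(upper.tolist())
```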

  17. Category

         df = pd.DataFrame({'name': ['Daniel', 'Eric', 'Julia'],
                            'gender': ['Male', 'Male', 'Female']})
         df.dtypes

         gender    object
         name      object
         dtype: object

         df['gender_cat'] = df['gender'].astype('category')
         df.dtypes

         gender          object
         name            object
         gender_cat    category
         dtype: object

  18. Category accessor

         df['gender_cat'].cat.categories

         Index(['Female', 'Male'], dtype='object')

         df.gender_cat.cat.codes

         0    1
         1    1
         2    0
         dtype: int8
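The codes are positions into the categories, which pandas sorts by default when it infers them. The sketch below, with illustrative variable names, decodes the integer codes back into labels to show how the two outputs above relate.

```python
import pandas as pd

gender = pd.Series(['Male', 'Male', 'Female']).astype('category')

# Inferred categories are stored in sorted order.
categories = list(gender.cat.categories)

# Each code is the position of that row's value in the categories.
codes = gender.cat.codes.tolist()

# Looking the codes up in the categories recovers the original labels.
decoded = [categories[code] for code in codes]

print(categories, codes, decoded)
```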

  19. Datetime

         df = pd.DataFrame({'name': ['Rosaline Franklin', 'William Gosset'],
                            'born': ['1920-07-25', '1876-06-13']})
         df['born_dt'] = pd.to_datetime(df['born'])
         df

                  born               name    born_dt
         0  1920-07-25  Rosaline Franklin 1920-07-25
         1  1876-06-13     William Gosset 1876-06-13

         df.dtypes

         born               object
         name               object
         born_dt    datetime64[ns]
         dtype: object

  20. Datetime accessor

         df['born_dt'].dt.day

         0    25
         1    13
         Name: born_dt, dtype: int64

         df['born_dt'].dt.month

         0    7
         1    6
         Name: born_dt, dtype: int64

         df['born_dt'].dt.year

         0    1920
         1    1876
         Name: born_dt, dtype: int64
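Because the accessor returns an ordinary Series, its result can feed straight into conditional subsetting. A short sketch on the slide's data; the cutoff year 1900 and the variable names are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Rosaline Franklin', 'William Gosset'],
                   'born': ['1920-07-25', '1876-06-13']})
df['born_dt'] = pd.to_datetime(df['born'])

# Extract the year component of each date.
years = df['born_dt'].dt.year.tolist()

# Use the extracted component as a filter condition.
recent = df[df['born_dt'].dt.year > 1900]

print(years)
print(recent['name'].tolist())
```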

  21. Let's practice!

  22. More Pandas

  23. Missing data: NaN missing values come from numpy. np.NaN, np.NAN, and np.nan are all the same as the NA value in R (NumPy 2.0 removed the np.NaN and np.NAN aliases, leaving np.nan). Check for missing values with pd.isnull and for non-missing values with pd.notnull. pd.isnull is an alias for pd.isna.
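A minimal sketch of these checks, using only `np.nan` so that it runs on current NumPy releases. It also shows why the helper functions are needed: NaN never compares equal to itself, so `==` cannot detect it. The sample values and variable names are illustrative.

```python
import numpy as np
import pandas as pd

values = pd.Series([np.nan, 16.0, 3.0])

# NaN is not equal to anything, including itself.
nan_equals_itself = (np.nan == np.nan)

# Element-wise checks for missing and non-missing values.
missing = pd.isnull(values).tolist()
present = pd.notnull(values).tolist()

# pd.isna gives the same answer as pd.isnull.
same_result = pd.isna(values).equals(pd.isnull(values))

print(nan_equals_itself, missing, present, same_result)
```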

  24. Working with missing data

         df

                    name  treatment_a  treatment_b
         0    John Smith          NaN            2
         1      Jane Doe         16.0           11
         2  Mary Johnson          3.0            1

         a_mean = df['treatment_a'].mean()
         a_mean

         9.5

  25. Fillna

         df['a_fill'] = df['treatment_a'].fillna(a_mean)
         df

                    name  treatment_a  treatment_b  a_fill
         0    John Smith          NaN            2     9.5
         1      Jane Doe         16.0           11    16.0
         2  Mary Johnson          3.0            1     3.0
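The sketch below rebuilds the treatment table so the steps can be run end to end. It shows that `mean` skips NaN by default, which is why the result is 9.5, and adds `dropna` as an alternative to filling. `dropna` is not covered in the slides and is included here only for comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['John Smith', 'Jane Doe', 'Mary Johnson'],
                   'treatment_a': [np.nan, 16.0, 3.0],
                   'treatment_b': [2, 11, 1]})

# mean() ignores missing values by default: (16.0 + 3.0) / 2.
a_mean = df['treatment_a'].mean()

# Replace the missing value with the column mean.
filled = df['treatment_a'].fillna(a_mean)

# Alternative: drop the rows where treatment_a is missing.
dropped = df.dropna(subset=['treatment_a'])

print(a_mean)
print(filled.tolist())
print(dropped['name'].tolist())
```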

  26. More Pandas: applying custom functions, groupby operations, tidying data.

  27. Apply your own functions: built-in functions, custom functions, the apply method. Pass in an axis.

  28. R

         df = data.frame('a' = c(1, 2, 3),
                         'b' = c(4, 5, 6))
         apply(df, 2, mean)

         a b
         2 5

         apply(df, 1, mean)

         2.5 3.5 4.5

     Python

         import numpy as np
         import pandas as pd
         df = pd.DataFrame({'A': [1, 2, 3],
                            'B': [4, 5, 6]})
         df.apply(np.mean, axis=0)

         A    2.0
         B    5.0
         dtype: float64

         df.apply(np.mean, axis=1)

         0    2.5
         1    3.5
         2    4.5
         dtype: float64
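The same `apply` call accepts a function you write yourself. A minimal sketch with a hypothetical helper, `value_range`, which is not from the slides: with `axis=0` it receives each column as a Series, and with `axis=1` it receives each row.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6]})


def value_range(values):
    """Return the spread (max minus min) of one column or one row."""
    return values.max() - values.min()


# axis=0: the function is applied to each column.
per_column = df.apply(value_range, axis=0)

# axis=1: the function is applied to each row.
per_row = df.apply(value_range, axis=1)

print(per_column.tolist())
print(per_row.tolist())
```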

  29. Tidy: reshaping and tidying our data. Hadley Wickham, Tidy Data paper: each row is an observation, each column is a variable, and each type of observational unit forms a table. Tidy Data paper: http://vita.had.co.nz/papers/tidy-data.pdf

  30. Tidy: melt

         df

                    name  treatment_a  treatment_b
         0    John Smith          NaN            2
         1      Jane Doe         16.0           11
         2  Mary Johnson          3.0            1

         df_melt = pd.melt(df, id_vars='name')
         df_melt

                    name     variable  value
         0    John Smith  treatment_a    NaN
         1      Jane Doe  treatment_a   16.0
         2  Mary Johnson  treatment_a    3.0
         3    John Smith  treatment_b    2.0
         ...

  31. Tidy: pivot_table

         df_melt_pivot = pd.pivot_table(df_melt,
                                        index='name',
                                        columns='variable',
                                        values='value')
         df_melt_pivot

         variable      treatment_a  treatment_b
         name
         Jane Doe             16.0         11.0
         John Smith            NaN          2.0
         Mary Johnson          3.0          1.0

  32. Reset index

         df_melt_pivot.reset_index()

         variable          name  treatment_a  treatment_b
         0             Jane Doe         16.0         11.0
         1           John Smith          NaN          2.0
         2         Mary Johnson          3.0          1.0
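The melt, pivot_table, and reset_index steps can be run as one round trip. The sketch below rebuilds the treatment table from the slides. Clearing `columns.name` at the end is an optional extra step, not shown in the slides, that removes the leftover `variable` label from the column header.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['John Smith', 'Jane Doe', 'Mary Johnson'],
                   'treatment_a': [np.nan, 16.0, 3.0],
                   'treatment_b': [2, 11, 1]})

# Wide to long: one row per (name, variable) pair.
df_melt = pd.melt(df, id_vars='name')

# Long back to wide, then move 'name' from the index into a column.
df_wide = pd.pivot_table(df_melt,
                         index='name',
                         columns='variable',
                         values='value').reset_index()

# Optional: drop the 'variable' label that pivot_table leaves on the columns.
df_wide.columns.name = None

print(df_melt.shape)
print(df_wide)
```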

  33. Groupby. groupby: split-apply-combine. Split data into separate partitions, apply a function on each partition, combine the results.

  34. Performing a groupby

                    name     variable  value
         0    John Smith  treatment_a    NaN
         1      Jane Doe  treatment_a   16.0
         2  Mary Johnson  treatment_a    3.0
         3    John Smith  treatment_b    2.0
         4      Jane Doe  treatment_b   11.0
         5  Mary Johnson  treatment_b    1.0

         df_melt.groupby('name')['value'].mean()

         name
         Jane Doe        13.5
         John Smith       2.0
         Mary Johnson     2.0
         Name: value, dtype: float64
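To make the split-apply-combine idea concrete, the sketch below performs the three steps by hand and compares the result with the one-line groupby call. The manual version, including the names `pieces`, `means`, and `manual`, is for illustration only; in practice the groupby call is the idiomatic form.

```python
import numpy as np
import pandas as pd

df_melt = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe', 'Mary Johnson'] * 2,
    'variable': ['treatment_a'] * 3 + ['treatment_b'] * 3,
    'value': [np.nan, 16.0, 3.0, 2.0, 11.0, 1.0]})

# Split: one sub-DataFrame per name.
pieces = {name: part for name, part in df_melt.groupby('name')}

# Apply: compute the mean of 'value' in each piece (NaN is skipped).
means = {name: part['value'].mean() for name, part in pieces.items()}

# Combine: collect the per-group results into one Series.
manual = pd.Series(means, name='value')

# The same computation in a single groupby call.
direct = df_melt.groupby('name')['value'].mean()

print(manual.tolist())
print(direct.tolist())
```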

  35. Let's practice!
