
A Crawling Application with R



  1. A Crawling Application with R

  2. What about Real Estate?

  3. What about Real Estate? Real Estate Development
     - Know the supply and demand
     - Increasing investment volumes
     - Single-point vs. development-over-time
     - Required data:
       - Supply of real estate
       - Demand/absorption
       - Influencing factors (price, location, ...)
     - BFS: Leerwohnungszählung (LWZ, vacant dwelling census)

  4. What about Real Estate? New Real Estate Projects
     - Currently 2111 listings (September 2019)
     - From single housing units to big projects
     - Existing workflow at Wüest Partner

  5. A Crawling App with R

  6. Requirement: Measure the Absorption Rate of Real Estate

  7. Requirement Example: Absorption Rate of Flats
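
The slide gives no formula, so the following is only a minimal sketch. It assumes the absorption rate is the share of initially available flats that left the market (sold or let) between the first and the last crawl date. The `snapshots` data frame, its column names, and `absorption_rate()` are hypothetical.

```r
# Hypothetical daily snapshots of one project: number of flats still listed.
snapshots <- data.frame(
  date      = as.Date(c("2019-09-01", "2019-09-15", "2019-09-30")),
  available = c(40, 34, 28)
)

# Assumed definition: share of the initially available flats that were
# absorbed (sold or let) between the first and the last snapshot.
absorption_rate <- function(df) {
  df <- df[order(df$date), ]
  first <- df$available[1]
  last  <- df$available[nrow(df)]
  (first - last) / first
}

rate <- absorption_rate(snapshots)
print(rate)
```

Tracking the same project daily turns this single number into a time series, which is the "development-over-time" view the talk asks for.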

  8. App Demo

  9. Architecture Overview

  10. Architecture (diagram; surviving labels: DB, DBI)

  11. Architecture Details

  12. Frontend: Shiny Dashboard
      - ShinyProxy: deployment and user management
      - Shiny Dashboard with custom CSS
      - From custom Shiny-Proxy-Verse image
      - DataTable, Leaflet

          .box {
            border-radius: 0px;
            background: #EFECEA;
            box-shadow: none;
            border-top: none;
          }

  13. Frontend: Shiny Dashboard
      - Enter new projects
      - Register URLs with data
      - Test automatic extraction and parsing
      - Survey project statuses
      - Analyze data

  14. Backend: Postgres Database
      - Tables: projects, urls, dwellings
      - Connection from R with DBI
      - Load/append data frames

          library(DBI)    # dbReadTable(), dbWriteTable()
          library(pool)   # dbPool()
          library(odbc)   # odbc() driver
          library(dplyr)  # %>% and filter()

          # Connection settings are read from environment variables.
          db <- dbPool(
            odbc(),
            Driver     = "PostgreSQL Unicode",
            Database   = Sys.getenv("POSTGRES_DB"),
            UserName   = Sys.getenv("POSTGRES_USER"),
            Password   = Sys.getenv("POSTGRES_PASSWORD"),
            Servername = Sys.getenv("POSTGRES_SERVER"),
            Port       = Sys.getenv("POSTGRES_PORT")
          )

          # Load the URLs whose url_parse_status is 1.
          urls_to_crawl <- dbReadTable(db, "urls") %>%
            filter(url_parse_status == 1)

          # Append the parsed rows to the dwellings table.
          dbWriteTable(db, "dwellings", parsed_data, append = TRUE)

  15. Backend: Splash

  16. Backend: Splash
      - Headless browser
      - Renders URL (data often loaded via JS)
      - Out-of-the-box splash image
      - Accessed from R with splashr
      - Returns full HTML/screenshot

          library(splashr)

          # Connect to the Splash container.
          my_splash <- splash(host = "splash", port = 8050)

          # Rendered HTML of the page, after JavaScript has run.
          html <- render_html(splash_obj = my_splash,
                              url = input$url)

          # Screenshot of the same page.
          screenshot <- render_png(splash_obj = my_splash,
                                   url = input$url)

  17. Backend: Auto Extraction / Parsing Logic
      - Parse HTML with rvest
        - Extract tables to data frames
        - html_table()
        - html_nodes() %>% html_text()
        - Difficult to automate!
      - Parse data frames with dplyr
        - Search columns (keywords, numbers)
        - Mutate
        - Validate
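
The two parsing stages can be illustrated on a toy page. This is a sketch, not the talk's actual logic: the inline HTML, the column names, and the keyword `"status"` are all assumptions made for the example.

```r
library(rvest)
library(dplyr)

# Inline stand-in for the HTML that the rendering step would return.
page <- read_html('<table>
  <tr><th>Flat</th><th>Rooms</th><th>Status</th></tr>
  <tr><td>A1</td><td>3.5</td><td>available</td></tr>
  <tr><td>A2</td><td>4.5</td><td>sold</td></tr>
</table>')

# Stage 1: html_table() on a document returns one data frame per <table>.
tables <- html_table(page)
flats <- tables[[1]]

# Stage 2: search the columns by keyword, then validate and mutate.
status_col <- names(flats)[grepl("status", names(flats), ignore.case = TRUE)]
stopifnot(length(status_col) == 1)  # validate: exactly one status column

parsed <- flats %>%
  mutate(is_available = tolower(.data[[status_col]]) == "available")
```

Searching by keyword instead of by position is what lets the same code cope with pages that order or name their columns differently, which is also why this step is hard to automate fully.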

  18. Backend: Crawling Daemon
      - Separate R container
      - Scheduled script with cronjob
      - Get URLs from db, get HTML, parse data
      - One request per day → small server load

          db <- dbPool(odbc(), Driver = , Database = …, ...)
          my_splash <- splash(host = "splash", port = 8050)

          urls_to_crawl <- dbReadTable(db, "urls") %>%
            filter(url_parse_status == 1)

          html <- render_html(splash_obj = my_splash,
                              url = urls_to_crawl)

          parsed_data <- return_table(html, urls_to_crawl)

          dbWriteTable(db, "dwellings", parsed_data, append = TRUE)
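
The slide code is condensed, so how the daemon iterates over individual URLs is not shown. One possible shape, assuming each URL is fetched and parsed separately and a failing page should be skipped rather than abort the run, is sketched here. `crawl_urls()`, `fake_fetch()`, and `fake_parse()` are hypothetical; the last two only stand in for `render_html()` and `return_table()` so that the sketch runs without Splash or a database.

```r
# Crawl each URL separately so that one failing page does not stop the run.
# `fetch` and `parse` are placeholders for render_html() and return_table().
crawl_urls <- function(urls, fetch, parse) {
  results <- lapply(urls, function(u) {
    tryCatch(
      parse(fetch(u), u),
      error = function(e) {
        message("Skipping ", u, ": ", conditionMessage(e))
        NULL
      }
    )
  })
  # Drop failed URLs and stack the remaining data frames.
  do.call(rbind, Filter(Negate(is.null), results))
}

# Dummy stand-ins so the sketch runs offline.
fake_fetch <- function(u) {
  if (grepl("broken", u)) stop("render failed")
  "<html></html>"
}
fake_parse <- function(html, u) data.frame(url = u, n_chars = nchar(html))

parsed_data <- crawl_urls(
  c("https://example.com/a", "https://example.com/broken"),
  fake_fetch, fake_parse
)
```

The resulting `parsed_data` has the same role as on the slide and could be appended with `dbWriteTable(..., append = TRUE)`.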

  19. Architecture (diagram; surviving labels: DB, DBI)

  20. Current State

  21. Current State
      - Included in Wüest Partner weekly workflow
      - Significant portion of large projects tracked
      - Daily data tracking

      Next steps:
      - Model absorption rates
      - Productize the analysis dashboard
      - Automatic URL finder

  22. Challenges: Help me :-)

  23. Challenges: Auto-Extract Data Frames from HTML
      - <table/> vs. <div/>
      - Table geometry
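
To make the `<table/>` vs. `<div/>` problem concrete: `html_table()` only understands real `<table>` markup, so a grid built from `<div>` elements has to be reassembled by hand. The sketch below assumes a page whose rows and cells carry the classes `row` and `cell`; those class names and the column names are invented for the example.

```r
library(rvest)

# Inline stand-in for a listing that lays out its "table" with <div> elements.
page <- read_html('<div class="list">
  <div class="row"><div class="cell">A1</div><div class="cell">3.5</div></div>
  <div class="row"><div class="cell">A2</div><div class="cell">4.5</div></div>
</div>')

# One character vector of cell texts per row.
rows <- html_nodes(page, "div.row")
cells <- lapply(rows, function(row) html_text(html_nodes(row, "div.cell")))

# Table geometry must be checked by hand: every row needs the same cell count.
stopifnot(length(unique(lengths(cells))) == 1)

flats <- as.data.frame(do.call(rbind, cells), stringsAsFactors = FALSE)
names(flats) <- c("flat", "rooms")  # column names are assumed, not scraped
flats$rooms <- as.numeric(flats$rooms)
```

Because the selectors differ from site to site, this part resists a generic solution, which matches the challenge named on the slide.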

  24. Web Crawling in General

  25. Other crawling projects: Large Crawling Projects
      - Performance, automation
      - Easy form submission
      - Extended session management capabilities
      - Quick deployment, testing

      More software development, less data analysis

  26. Other crawling projects: Things to consider
      - Only collect as much as you really need
      - Do not overload sites
      - Respect data privacy
      - Consider social implications

      https://www.shieldsquare.com/good-bots-and-bad-bots/
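
"Do not overload sites" can be enforced in code by spacing out requests. The following base-R sketch wraps any fetch function so that consecutive calls are at least `min_delay` seconds apart. `throttle()` and `polite_fetch()` are hypothetical helpers, and the anonymous function only pretends to fetch a page.

```r
# Returns a wrapped version of `fetch` that waits until at least `min_delay`
# seconds have passed since its previous call.
throttle <- function(fetch, min_delay = 1) {
  last_call <- NULL
  function(...) {
    if (!is.null(last_call)) {
      elapsed <- as.numeric(difftime(Sys.time(), last_call, units = "secs"))
      if (elapsed < min_delay) Sys.sleep(min_delay - elapsed)
    }
    last_call <<- Sys.time()
    fetch(...)
  }
}

# Stand-in fetch function; a real crawler would call its renderer here.
polite_fetch <- throttle(function(u) paste("fetched", u), min_delay = 0.2)

started <- Sys.time()
out <- vapply(c("a", "b", "c"), polite_fetch, character(1))
elapsed <- as.numeric(difftime(Sys.time(), started, units = "secs"))
```

For a daemon that requests each page once per day, as described in the talk, the delay mostly matters when many URLs belong to the same site.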
