Course ratings

Course ratings (Introduction to Data Engineering)



  1. Course ratings. INTRODUCTION TO DATA ENGINEERING. Vincent Vankrunkelsven, Data Engineer @ DataCamp

  2. Ratings at DataCamp

  3. Recommend using ratings
     - Get rating data
     - Clean and calculate top-recommended courses
     - Recalculate daily
     - Example usage: user's dashboard

  4. As an ETL process
     It's an ETL process!

  5. The database
     Course: course_id, title, description, programming_language
     Rating: user_id, course_id, rating

  6. The database relationship
     Course: course_id, title, description, programming_language
     Rating: user_id, course_id, rating
     The two tables are related through course_id.
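The relationship between the two tables can be sketched with a pandas join. This is a minimal illustration, not the course's own code: the column names come from the slide, while the sample rows, titles, and the `rating_with_course` name are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the slide.
course = pd.DataFrame({
    "course_id": [1, 21, 74],
    "title": ["Course A", "Course B", "Course C"],
    "description": ["placeholder", "placeholder", "placeholder"],
    "programming_language": ["r", "sql", "sql"],
})
rating = pd.DataFrame({
    "user_id": [1, 1, 2],
    "course_id": [1, 74, 21],
    "rating": [5, 4, 4],
})

# Each rating row points at exactly one course through course_id,
# so the join is many-to-one; validate raises if that assumption breaks.
rating_with_course = rating.merge(
    course, on="course_id", how="left", validate="many_to_one"
)
print(rating_with_course[["user_id", "course_id", "programming_language", "rating"]])
```

A left join keeps every rating even if a course row were missing, which makes gaps in the Course table visible as NaN values instead of silently dropping ratings.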

  7. Let's practice!

  8. From ratings to recommendations

  9. The recommendations table
     user_id  course_id  rating
     1        1          4.8
     1        74         4.78
     1        21         4.5
     2        32         4.9
     rating: the estimated rating of a course the user hasn't taken yet.

  10. Recommendation techniques
      - Matrix factorization
      - Building Recommendation Engines with PySpark

  11. Common sense transformation
      Source tables:
      Course: course_id, title, description, programming_language
      Rating: user_id, course_id, rating
      Target table, Recommendations:
      user_id  course_id  rating
      1        1          4.8
      1        74         4.78
      1        21         4.5
      2        32         4.9

  12. Average course ratings
      Average course rating:
      course_id  avg_rating
      1          4.8
      74         4.78
      21         4.5
      32         4.9
      We want to recommend highly rated courses.
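One way to compute such a table is a pandas group-by over the Rating table. The sketch below assumes the ratings are held in a DataFrame with the slide's column names; the sample values and the `avg_course_rating` name are illustrative only.

```python
import pandas as pd

# Hypothetical individual ratings (user_id, course_id, rating).
rating = pd.DataFrame({
    "user_id": [1, 2, 3, 1, 2],
    "course_id": [1, 1, 1, 74, 74],
    "rating": [5, 4, 5, 4, 5],
})

# Average the ratings per course and order the courses from best to worst.
avg_course_rating = (
    rating.groupby("course_id", as_index=False)["rating"]
    .mean()
    .rename(columns={"rating": "avg_rating"})
    .sort_values("avg_rating", ascending=False)
)
print(avg_course_rating)
```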

  13. Use the right programming language
      Rating:
      user_id  course_id  programming_language  rating
      1        1          r                     4.8
      1        74         sql                   4.78
      1        21         sql                   4.5
      1        32         python                4.9
      Recommend a SQL course for the user with id 1.

  14. Recommend new courses
      Rating:
      user_id  course_id  programming_language  rating
      1        1          r                     4.8
      1        74         sql                   4.78
      1        21         sql                   4.5
      1        32         python                4.9
      Don't recommend the combinations already in the rating table.
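Excluding combinations that already exist in the rating table amounts to an anti-join. A minimal pandas sketch, assuming a hypothetical `candidates` frame of user and course pairs (course 99 is invented for illustration):

```python
import pandas as pd

# Pairs the user has already rated (ids taken from the slide).
rating = pd.DataFrame({
    "user_id": [1, 1, 1, 1],
    "course_id": [1, 74, 21, 32],
})
# Hypothetical candidate pairs; course 99 is made up for this example.
candidates = pd.DataFrame({
    "user_id": [1, 1, 1],
    "course_id": [74, 21, 99],
})

# Left join with an indicator column, then keep rows found only in candidates.
merged = candidates.merge(
    rating.drop_duplicates(), on=["user_id", "course_id"],
    how="left", indicator=True,
)
new_pairs = merged.loc[merged["_merge"] == "left_only", ["user_id", "course_id"]]
print(new_pairs)
```

Only the pair (1, 99) survives, because courses 74 and 21 are already in the rating table for user 1.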

  15. Our recommendation transformation
      - Use the technology that the user has rated most
      - Don't recommend courses that the user has already rated
      - Recommend the three highest-rated courses from the remaining combinations

  16. Rating
      user_id  course_id  programming_language  rating
      1        12         sql                   4.78
      1        52         sql                   4.5
      1        32         r                     4.9
      Recommend the three highest-rated SQL courses which are not 12 and 52.
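The three rules can be traced end to end on this example. The following is a sketch under stated assumptions, not the course's implementation: the `rating` rows come from the slide, while the `courses` catalog, its average ratings, and all variable names are hypothetical.

```python
import pandas as pd

# Ratings from the slide: user 1 rated two SQL courses and one R course.
rating = pd.DataFrame({
    "user_id": [1, 1, 1],
    "course_id": [12, 52, 32],
    "programming_language": ["sql", "sql", "r"],
    "rating": [4.78, 4.5, 4.9],
})
# Hypothetical catalog with precomputed average ratings.
courses = pd.DataFrame({
    "course_id": [12, 52, 32, 7, 8, 9, 10, 11],
    "programming_language": ["sql", "sql", "r", "sql", "sql", "sql", "sql", "r"],
    "avg_rating": [4.8, 4.5, 4.9, 4.7, 4.2, 4.6, 4.9, 5.0],
})

# 1. The technology each user has rated most often.
counts = (
    rating.groupby(["user_id", "programming_language"])
    .size()
    .reset_index(name="n")
)
top_language = (
    counts.sort_values("n", ascending=False, kind="stable")
    .drop_duplicates("user_id")[["user_id", "programming_language"]]
)

# 2. Every course in that technology is a candidate ...
candidates = top_language.merge(courses, on="programming_language")

# ... except the (user, course) pairs the user already rated.
rated_pairs = pd.MultiIndex.from_frame(rating[["user_id", "course_id"]])
candidate_pairs = pd.MultiIndex.from_frame(candidates[["user_id", "course_id"]])
candidates = candidates[~candidate_pairs.isin(rated_pairs)]

# 3. Keep the three highest-rated remaining courses per user.
recommendations = (
    candidates.sort_values(["user_id", "avg_rating"], ascending=[True, False])
    .groupby("user_id")
    .head(3)[["user_id", "course_id", "avg_rating"]]
    .rename(columns={"avg_rating": "rating"})
)
print(recommendations)
```

With this made-up catalog, user 1 gets SQL courses 10, 7, and 9; courses 12 and 52 are skipped because they were already rated, and course 8 falls outside the top three.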

  17. Let's practice!

  18. Scheduling daily jobs

  19. What you've done so far
      - Extract using extract_course_data() and extract_rating_data()
      - Clean up NA values using transform_fill_programming_language()
      - Average course ratings per course: transform_avg_rating()
      - Get eligible user and course id pairs: transform_courses_to_recommend()
      - Calculate the recommendations: transform_recommendations()

  20. Loading to Postgres
      - Use the calculations in data products
      - Update daily
      - Example use case: sending out e-mails with recommendations

  21. The loading phase

          recommendations.to_sql(
              "recommendations",
              db_engine,
              if_exists="append",
          )
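The effect of `if_exists="append"` can be tried locally. The sketch below swaps in an in-memory SQLite connection as a stand-in for the Postgres engine on the slide (an assumption made only so the example runs without a database server) and passes `index=False`, which the slide's call does not.

```python
import sqlite3

import pandas as pd

# Recommendation rows as shown in the recommendations table slide.
recommendations = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "course_id": [1, 74, 21, 32],
    "rating": [4.8, 4.78, 4.5, 4.9],
})

# Stand-in for the Postgres engine: pandas also accepts a sqlite3 connection.
conn = sqlite3.connect(":memory:")

# Simulate two daily runs; "append" adds rows instead of replacing the table.
recommendations.to_sql("recommendations", conn, if_exists="append", index=False)
recommendations.to_sql("recommendations", conn, if_exists="append", index=False)

row_count = conn.execute("SELECT COUNT(*) FROM recommendations").fetchone()[0]
print(row_count)  # 8: both runs are kept
```

Because every run appends, a daily job accumulates one batch of rows per day. If only the latest recommendations should remain, `if_exists="replace"` is the alternative pandas offers.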

  22. The complete etl() function

          def etl(db_engines):
              # Extract the data
              courses = extract_course_data(db_engines)
              rating = extract_rating_data(db_engines)
              # Clean up courses data
              courses = transform_fill_programming_language(courses)
              # Get the average course ratings
              avg_course_rating = transform_avg_rating(rating)
              # Get eligible user and course id pairs
              courses_to_recommend = transform_courses_to_recommend(
                  rating,
                  courses,
              )
              # Calculate the recommendations
              recommendations = transform_recommendations(
                  avg_course_rating,
                  courses_to_recommend,
              )
              # Load the recommendations into the database.
              # The slide passes an undefined db_engine here; db_engines is
              # assumed to be what load_to_dwh() expects.
              load_to_dwh(recommendations, db_engines)

  23. Creating the DAG

          from airflow.models import DAG
          from airflow.operators.python_operator import PythonOperator

          # Cron expression "0 0 * * *": run every day at midnight
          dag = DAG(dag_id="recommendations", schedule_interval="0 0 * * *")

          task_recommendations = PythonOperator(
              task_id="recommendations_task",
              python_callable=etl,
              # etl() needs db_engines; it is assumed to be defined in this file
              op_kwargs={"db_engines": db_engines},
              dag=dag,
          )

  24. Let's practice!

  25. Congratulations

  26. Introduction to data engineering
      - Identify the tasks of a data engineer
      - What kind of tools they use
      - Cloud service providers

  27. Data engineering toolbox
      - Databases
      - Parallel computing & frameworks (Spark)
      - Workflow scheduling with Airflow

  28. Extract, Transform and Load (ETL)
      - Extract: get data from several sources
      - Transform: perform transformations using parallel computing
      - Load: load data into target database

  29. Case study: DataCamp
      - Fetch data from multiple sources
      - Transform to form recommendations
      - Load into target database

  30. Good job!
