An empirical study on negative answers at Stack Overflow - PowerPoint Presentation

  1. An empirical study on negative answers at Stack Overflow
     Ethan Wang & Sherry Zhu
     Outline: Motivation, Research Questions, Methodology, Experiment Result, Threat to Validity, Future Work, Conclusion
     DEFINE RESEARCH QUESTIONS
     Motivation
     ▪ Over 11 million users on Stack Overflow
     ▪ Many people experience Stack Overflow as a hostile place, especially newer coders, women, and people of color.
     Research Questions
     ▪ What is the distribution of positive / neutral / negative replies?
     ▪ What reasons lead respondents to give negative answers, and how are negative answers distributed across those reasons?

  2. ▪ Stack Exchange Data Explorer (2019-01-01 to 2019-10-31)
     ▪ Post table schema
     ▪ Random sampling using NewID()
     ▪ Senti4SD requirements:
       1. The input contains only plain text.
       2. All text from a single answer is on one line.
     ▪ Text sanitization algorithm:
       o Replace all newlines and extra spaces.
       o Remove all characters outside the visible ASCII range.
       o Parse the HTML and remove all HTML tags.
       o Remove all code blocks and links while parsing the HTML tags.
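
The sanitization steps above can be sketched in Python using only the standard library. This is a minimal sketch, not the authors' code: it assumes each answer body is the HTML stored in the Posts table's `Body` column and that code blocks and links appear as `<pre>`/`<code>` and `<a>` elements. The names `sanitize_answer` and `_AnswerTextExtractor` are hypothetical. The NewID() random sampling happens in the SQL query on Stack Exchange Data Explorer and is not shown here.

```python
import re
from html.parser import HTMLParser


class _AnswerTextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping code blocks and links."""

    # Assumption: code blocks and links are marked up with these tags.
    SKIP_TAGS = {"pre", "code", "a"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text inside <pre>, <code>, or <a> is dropped; everything else is kept.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        return " ".join(self._chunks)


def sanitize_answer(body_html: str) -> str:
    """Turn an HTML answer body into one line of plain, visible-ASCII text."""
    parser = _AnswerTextExtractor()
    parser.feed(body_html)
    parser.close()
    text = parser.text()
    # Newlines, tabs, and other whitespace become plain spaces.
    text = re.sub(r"\s", " ", text)
    # Keep only visible ASCII (0x20-0x7E).
    text = re.sub(r"[^\x20-\x7E]", "", text)
    # Collapse runs of spaces so the answer sits on a single clean line.
    return re.sub(r" {2,}", " ", text).strip()
```

For example, `sanitize_answer("<p>Did you read the docs?</p><pre><code>x = 1</code></pre>")` returns `"Did you read the docs?"`. Text chunks are joined with a space so that adjacent paragraphs do not run together; the final collapse step removes the resulting extra spaces.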

  3. ▪ Regular Expression
       o Preparation: Divide the data into smaller segments (5000 each).
       o Execution: Extract common patterns for each group.
     ▪ Card sort
       o Preparation: Filter all negative answers and randomly sample 200 records.
       o Execution: Use an online card sort tool called "UsabiliTEST".
     ▪ K-means Clustering & Support Vector Machine
       o Preparation: 1. Data cleaning (lowercase, remove punctuation) 2. Stop-word removal (the, he, she…) 3. Text vectorization (TF-IDF)
       o Execution (SVM): Model_1 classifies 'neutral' vs. 'negative'; Model_2 classifies 'negative' answers into multiple groups.
     ▪ Interpretation
       o Five themes: Neutral, Vague, Undetermined, Cold, Irreproducible
       o K-means: Silhouette Coefficient = 0.002808
       o Regex: Precision = 76.86%, Recall = 45.49%
       o SVM: Precision = 86.89%, Recall = 48.38%
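
The three preparation steps for K-means and SVM (cleaning, stop-word removal, TF-IDF vectorization) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' pipeline: `STOP_WORDS` is a tiny hypothetical list, `preprocess` and `tf_idf` are hypothetical names, and the IDF is the unsmoothed textbook form log(N / df).

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "he", "she", "a", "an", "is", "it", "to", "and", "of", "you"}


def preprocess(text: str) -> list[str]:
    """Steps 1-2: lowercase, replace punctuation with spaces, drop stop words."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Step 3: one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid dividing by zero for empty answers
        vectors.append({
            # Term frequency times unsmoothed inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors
```

Under this unsmoothed IDF, a term that appears in every answer gets weight 0. scikit-learn's `TfidfVectorizer` uses a smoothed IDF by default, so its exact weights would differ from this sketch.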

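For reference, the metrics reported for the methodology above are assumed to follow their standard definitions; the slides do not spell them out. For a classifier, $TP$, $FP$, and $FN$ count true positives, false positives, and false negatives for the target class. For clustering, $a(i)$ is the mean distance from answer $i$ to the other members of its own cluster, and $b(i)$ is the smallest mean distance from $i$ to the members of any other cluster:

$$
\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}
$$

$$
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad S = \frac{1}{n}\sum_{i=1}^{n} s(i)
$$

Because $S$ ranges from $-1$ to $1$, a value close to 0, such as the reported 0.002808, indicates that the K-means clusters barely separate from one another.
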
  4. THREAT TO VALIDITY
     ▪ Inherent subjectivity of the manual process
     ▪ Accuracy of the Senti4SD tool
     STACK OVERFLOW INTERVIEWS AND SURVEYS
     POSITIVE COMMENTS FEEDBACK

  5. 1. Hostility accounts for only a tiny portion of overall replies.
     2. The environment on Stack Overflow is satisfactory in general.
