An experiment on the Reddit dataset
Shahbaz Ahmed (115594), Viorel Morari (115629)
The need for summarization
Goal: to capture the important information contained in large …
{"archived":true,"author":"jaquehamr","body":"Thanks for proving the point of the quote.\n\nTL;DR: WOOSH","controversiality":0,"created_utc":"1239192802","downs":0,"edited":"false","gilded":0,"id":"c08q8en","link_id":"t3_8auok","name":"t1_c08q8en","parent_id":"t1_c08q4sz","retrieved_on … _hidden":false,"subreddit":"atheism","subreddit_id":"t5_2qh2p","ups":3}
{"archived":true,"author":"[deleted]","created":1297290547,"created_utc":"1297290547","domain":"self.WeAreTheFilmMakers","downs":0,"edited":"false","gilded":0,"hide_score":false,"id":"fibse","is_self":true,"media_embed":{},"name":"t3_fibse","num_comments":2,"ov…r_18":false,"permalink":"/r/WeAreTheFilmMakers/comments/fibse/question_about_resumes/","quarantine":false,"retrieved_on":1442846972,"saved":false,"scor…ecure_media_embed":{},"selftext":"I'm currently a film student at the University of Cincinnati and I'm going to start applying for internships soon so I was wondering what I should put on my resume when applying.\n\ntl;dr I'm going to be sending out my resume soon and I'm looking for help on what I should include on it","stickied":false,"subreddit":"WeAreTheFilmMakers","subreddit_id":"t5_2qngr","thumbnail":"default","title":"Question about resumes","ups":2,"url":"http://www.reddit.com/r/WeAreTheFilmMakers/comments/fibse/question_about_resumes/"}
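The two sample records differ in where the text lives: the comment (name prefix "t1_") carries it in "body", while the submission (name prefix "t3_") carries it in "selftext". A minimal sketch of reading such a record could look like the following. The helper name post_text and the one-JSON-object-per-line input are assumptions for illustration, not something the slides specify.

```python
import json


def post_text(record: dict) -> str:
    """Return the free-text field of a Reddit record (hypothetical helper).

    Submissions (name prefix "t3_") keep their text in "selftext";
    anything else is treated as a comment with its text in "body".
    """
    if record.get("name", "").startswith("t3_"):
        return record.get("selftext", "")
    return record.get("body", "")


# Shortened version of the comment record shown above (assumed: one JSON object per line).
line = (
    '{"author":"jaquehamr",'
    '"body":"Thanks for proving the point of the quote.\\n\\nTL;DR: WOOSH",'
    '"name":"t1_c08q8en","subreddit":"atheism","ups":3}'
)
record = json.loads(line)
text = post_text(record)
```

Dispatching on the name prefix keeps a single code path for both record types, which is convenient if comments and submissions are processed together.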
Extracting a clean dataset
“The most important tasks with regard to understanding the information available in comments are filtering, ranking and summarizing the comments.” (Potthast et al. 2012)
Extracting a clean dataset: filtering the targets
…quote.\n\nTL;DR: WOOSH" - inv
…up ten years of your life with a tl;dr" - inv
Extracting a clean dataset
- Filtering the targets
- Processing & ranking the target content
- Extracting relevant information by ranks
- Presentation of the retrieved content
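The filtering step can be sketched as splitting each post around its TL;DR marker and discarding posts that do not yield a usable content/summary pair. This is only an illustration under stated assumptions: the marker pattern, the function name split_tldr, and the minimum summary length are hypothetical and are not taken from the slides.

```python
import re

# Assumed marker pattern: "tl;dr" or "tldr", any case, with an optional colon.
TLDR = re.compile(r"tl;?dr\s*:?\s*", re.IGNORECASE)


def split_tldr(text: str, min_summary_words: int = 2):
    """Split a post into (content, summary) around its last TL;DR marker.

    Returns None when there is no marker, when nothing precedes the marker,
    or when the summary has fewer than min_summary_words words
    (an illustrative threshold, not one from the slides).
    """
    matches = list(TLDR.finditer(text))
    if not matches:
        return None
    last = matches[-1]
    content = text[:last.start()].strip()
    summary = text[last.end():].strip()
    if not content or len(summary.split()) < min_summary_words:
        return None
    return content, summary


pair = split_tldr(
    "I am a film student applying for internships.\n\n"
    "tl;dr I'm looking for help on my resume"
)
rejected = split_tldr("Thanks for proving the point of the quote.\n\nTL;DR: WOOSH")
```

With this threshold the one-word "WOOSH" summary is rejected, while the resume question yields a content/summary pair. Ranking and presentation would operate on the pairs that survive this step.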
… channels related to technology and geek culture.
… channel of little note that wanted techtv's audience and cancelled all the decent reasons to ever tune into techtv.
… for free (and legally) on either hulu or the comedy channel's website. (12 sentences)
Original tl;dr: there was a decent one. comcast more or less bought it out and axed all it's programming to get viewers for its gaming channel but only succeeded in destroying the market and causing kevin rose to run off and create digg. (2 sentences)
Submissions distribution, comments distribution: 1,850,031 (~1.85 million), 749,376 (~0.75 million)
[Chart: average word length (number of words), comments vs. submissions]
[Chart: distribution of comments by length (number of comments)]
[Chart: distribution of submissions by length]
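Length statistics of this kind can be computed by counting words per post and grouping the counts into bins. The sketch below is an assumption-laden illustration: whitespace tokenization, the bin width of 50 words, and the function names are choices made here, not details given in the slides.

```python
from collections import Counter


def length_histogram(texts, bin_width=50):
    """Count texts per word-length bin; each key is the bin's lower bound.

    Words are split on whitespace (an assumed tokenization).
    """
    counts = Counter()
    for text in texts:
        n_words = len(text.split())
        counts[(n_words // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


def average_length(texts):
    """Mean number of whitespace-separated words per text (0.0 if empty)."""
    lengths = [len(t.split()) for t in texts]
    return sum(lengths) / len(lengths) if lengths else 0.0


# Toy inputs with 3, 60 and 120 words.
sample = ["a b c", "word " * 60, "x " * 120]
hist = length_histogram(sample, bin_width=50)
mean_len = average_length(["a b", "a b c d"])
```

Running the same two functions separately over comments and over submissions would give one histogram and one average per group.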