Mining Lectures, by Marcel Caraciolo (@marcelcaraciolo) - PowerPoint PPT Presentation

SLIDE 1

Mining Lectures

Marcel Caraciolo - @marcelcaraciolo

SLIDE 2

Who am I?

Marcel Pinheiro Caraciolo

Brazilian, lover of crabs

M.Sc. candidate in Data Mining and Recommender Systems

Current moderator of the Local Python User Group at Pernambuco

Interested in machine learning, recommender systems and mobile computing

Blogging about machine learning with Python since 2008: http://aimotion.blogspot.com

Young apprentice in Python programming since 2008

Director of R&D at the Brazilian startup Orygens

SLIDE 3

How did I start this analysis?

24 hours ago...

SLIDE 4

Question

How were the topics distributed across the SciPy Conference general sessions?

SLIDE 5

Scraping the SciPy Conference

A small web crawler for extracting the approved lectures

urllib2, re, BeautifulSoup...
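
The slide only names the libraries, so the following is a minimal sketch of the parsing step of such a crawler, not the code actually used. It swaps in Python 3's standard `html.parser` for BeautifulSoup and works on an inline string instead of fetching a page (`urllib2` is Python 2 only; `urllib.request` is its Python 3 counterpart). The markup, the CSS class names, the talk data and the names `LectureParser` and `parse_lectures` are all hypothetical, since the layout of the real conference page is not shown.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup: the real conference page layout is not shown in the slides.
SAMPLE_HTML = """
<div class="talk"><h3>Example Talk A</h3>
<span class="author">Speaker One</span><span class="length">20 min</span></div>
<div class="talk"><h3>Example Talk B</h3>
<span class="author">Speaker Two</span><span class="length">25 min</span></div>
"""


class LectureParser(HTMLParser):
    """Collect title, author and length for each <div class="talk">."""

    def __init__(self):
        super().__init__()
        self.lectures = []
        self._field = None  # which field the next text node belongs to

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "talk":
            self.lectures.append({})
        elif tag == "h3":
            self._field = "title"
        elif tag == "span" and css_class in ("author", "length"):
            self._field = css_class

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        text = data.strip()
        if self._field and text and self.lectures:
            self.lectures[-1][self._field] = text


def parse_lectures(html):
    parser = LectureParser()
    parser.feed(html)
    for lecture in parser.lectures:
        # Pull the integer number of minutes out of text such as "20 min".
        match = re.search(r"(\d+)\s*min", lecture.get("length", ""))
        lecture["minutes"] = int(match.group(1)) if match else None
    return parser.lectures


lectures = parse_lectures(SAMPLE_HTML)
```

With a real page, the HTML string would come from an HTTP request and the tag and class checks would have to match that page's actual structure.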

SLIDE 6

Summary

Lectures: 41

Total length: 820 minutes

SLIDE 7

It means...

≈ 4100 tweets posted.

SLIDE 8

Or watch...

Star Wars Trilogy

2x

SLIDE 9

Or finish the Super Mario game... 82x!

SLIDE 10

Or open Eclipse 2x!
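
The comparisons on the last few slides all derive from the same two totals. As a rough check, the sketch below works out the rates those figures imply. Only the five input numbers come from the slides; the derived rates are inferred, and since the slides give approximate values they should be read as approximate too.

```python
# Figures stated on the slides.
lectures = 41
total_minutes = 820
tweets = 4100
mario_runs = 82
trilogy_viewings = 2

# Rates implied by those figures (inferred, not stated on the slides).
avg_lecture_minutes = total_minutes / lectures          # 20.0
tweets_per_minute = tweets / total_minutes              # 5.0
minutes_per_mario_run = total_minutes / mario_runs      # 10.0
minutes_per_trilogy = total_minutes / trilogy_viewings  # 410.0
```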

SLIDE 11

Most popular authors

Dharhas Pothina - 3

Wes McKinney - 2

All the others - 1
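
A ranking like this one can be produced by counting how often each name appears in the list of lecture authors. The sketch below assumes one author entry per lecture. The list is rebuilt from the counts on the slide, and "Other Author A" and "Other Author B" are placeholders for the remaining speakers, whose names are not given.

```python
from collections import Counter

# One entry per lecture, rebuilt from the slide's counts.
# "Other Author A/B" are placeholders, not real speaker names.
authors = (
    ["Dharhas Pothina"] * 3
    + ["Wes McKinney"] * 2
    + ["Other Author A", "Other Author B"]
)

author_counts = Counter(authors)
top_authors = author_counts.most_common(2)
```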

SLIDE 12

Playing with the text...

The most frequent words at the conference

nltk, re
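
A minimal sketch of a word-frequency count, assuming the input is a list of abstract texts. It uses only `re` and `collections.Counter` from the standard library, with a small hand-written stop word list standing in for nltk's English stop words. The abstracts and the name `word_frequencies` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical abstracts; the real ones came from the conference site.
ABSTRACTS = [
    "We present a toolkit for data analysis built on NumPy arrays.",
    "This talk shows how NumPy and the GPU speed up data processing.",
    "A web service for statistical models of climate data.",
]

# Tiny hand-written stop word list, standing in for nltk's English stop words.
STOP_WORDS = {"a", "and", "for", "how", "of", "on", "the", "this", "up", "we"}


def word_frequencies(texts):
    """Count lowercase alphabetic tokens across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts


frequencies = word_frequencies(ABSTRACTS)
top_words = frequencies.most_common(2)
```

On this toy input the two most frequent words are "data" (3) and "numpy" (2).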

SLIDE 13

But let’s take a deeper look.

I used the clustering algorithm K-Means

Tool used for visualization: Ubigraph
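
To illustrate the idea, here is a small pure-Python sketch of K-Means over bag-of-words vectors. It is not the implementation used for the talk: the lecture titles are hypothetical, the feature representation (raw word counts) is an assumption, and the centroids are seeded deterministically from the first and last vectors so the example is reproducible. The Ubigraph visualization step is not covered.

```python
import re

# Hypothetical lecture titles; the real input was the scraped conference program.
TITLES = [
    "plotting data with matplotlib",
    "interactive plotting in ipython with matplotlib",
    "gpu performance for parallel code",
    "parallel performance on gpu clusters",
]


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def vectorize(texts):
    """Turn each text into a bag-of-words count vector over a shared vocabulary."""
    vocabulary = sorted({word for text in texts for word in tokenize(text)})
    return [[tokenize(text).count(word) for word in vocabulary] for text in texts]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(vectors, initial_centroids, max_iterations=100):
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


vectors = vectorize(TITLES)
# Deterministic seeding for the example: the first and last vectors.
labels, _ = k_means(vectors, [vectors[0], vectors[-1]])
```

On this toy input the two plotting titles end up in one cluster and the two GPU titles in the other, so `labels` is `[0, 0, 1, 1]`.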

SLIDE 14

Distribution of the Lectures

Basic Frameworks

matplotlib, ipython

Parallelism

performance, gpu, statistical

Building frameworks

performance, models, web services

Visualization

Numpy

toolkits using Numpy, data analysis, statistical

SLIDE 15

To sum up...

Mining English text is so much easier!!!

Submit your work too!

Spread scientific Python throughout the community.

I expect to be back at SciPy next year!

SLIDE 16

Mining Lectures

Marcel Caraciolo - @marcelcaraciolo

https://github.com/marcelcaraciolo/clustering_scipy
