Sorting for WordClouds (PowerPoint presentation)


slide-1
SLIDE 1

+Sorting for WordClouds

slide-2
SLIDE 2

+Text Processing

n Acquire - Obtain the data from

some source

n Parse - Give the data some

structure, clean up

n Filter - Remove all but the data

  • f interest

n Mine - Use the data to derive

interesting properties

n Represent - Chose a visual

representation

n Refine – Improve to make it

more visually engaging

n Interact - Make it interactive n Source = Document n Parse = Words n Filter = Word Set with counts n Mine = Get relevant words n Represent = Fonts/Placement n Refine/Interact

Data Visualization Process Text Visualization

slide-3
SLIDE 3

+Acquire data: Source = Document

n // Sketch 7-1: Parsing an input text file

String inputTextFile = "Obama.txt"; String [] fileContents; fileContents = loadStrings(inputTextFile);

n fileContents has the source! n What next?

slide-4
SLIDE 4

+Parse

n How do we turn fileContents into words? n join array into one long string

String rawText; rawText = join(fileContents, " ");

n make all same case

rawText = rawText.toLowerCase();

n remove symbols and split string into words

String delimiters = " ,./?<>;:'\"[{]}\\|=+-_()*&^%$#@!~";

tokens = splitTokens(rawText, delimiters);
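The parse steps above can also be sketched in plain Java, outside the Processing environment. This is a minimal sketch, assuming java.util.StringTokenizer as a stand-in for splitTokens(); the class name TokenizeSketch, the method tokenize, and the sample sentence are hypothetical and not part of the original slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizeSketch {
    // Same delimiter characters as on the slide.
    static final String DELIMITERS = " ,./?<>;:'\"[{]}\\|=+-_()*&^%$#@!~";

    // Lowercase the text, then split it on any single delimiter character.
    static List<String> tokenize(String rawText) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(rawText.toLowerCase(), DELIMITERS);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical sample text, not taken from Obama.txt.
        System.out.println(tokenize("Yes, we can! Yes -- WE can."));
    }
}
```

Because the apostrophe is in the delimiter set, a contraction such as "don't" is split into two tokens ("don" and "t"), which may or may not be what a word cloud wants.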

slide-5
SLIDE 5

+Display the words

n Let's start by displaying all of the words:

for (String t : tokens) { //textSize(15); if(random(100) > 40) { // more red than green fill(random(150,250),0, 0,190); // make red } else { fill(0,random(150,250), 0,190); // make green } text(t, random(0,width-50), random(20,height)); } // for

slide-6
SLIDE 6

+Count the words (second way)

n Use a HashMap ( a dictionary from words è counts)

n HashMap <String,Integer> wordCountSet =

new HashMap<String,Integer>();

n to add a new word:

n wordCountSet.put(word,1); // initial count is 1

n to get the frequency of a word:

n Integer frequency =

wordCountSet.get(word); // if null, then none

n to update the frequency of a word:

n wordCountSet.put(word, frequeny + 1);

slide-7
SLIDE 7

+Count the words (second way)

slide-8
SLIDE 8

+Display the UNIQUE words

n Instead of tokens, we want the keys of the HashMap:

n wordCountSet.keySet()

for (String t : wordCountSet.keySet()) { //Let's change the text size based on the frequency //textSize(<what goes here?>); if(random(100) > 40) { // more red than green fill(random(150,250),0, 0,190); // make red } else { fill(0,random(150,250), 0,190); // make green } text(t, random(0,width-50), random(20,height)); } // for

slide-9
SLIDE 9

+Display the UNIQUE words

n Instead of tokens, we want the keys of the HashMap:

n wordCountSet.keySet()

for (String t : wordCountSet.keySet()) { //Let's change the text size based on the frequency textSize(wordCountSet.get(t)); if(random(100) > 40) { // more red than green fill(random(150,250),0, 0,190); // make red } else { fill(0,random(150,250), 0,190); // make green } text(t, random(0,width-50), random(20,height)); } // for

slide-10
SLIDE 10

+Display the most frequent words

n Lazy way

n check all frequencies of words in set and only display words

above a threshold frequency.

n First find the threshold (loop once) n Next use the threshold (loop second time)

n Systematic way

n sort word set by frequency n only display top N words

slide-11
SLIDE 11

+Code from class (max frequency)
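The code shown in class is not reproduced in this transcript. As a stand-in, here is a minimal sketch of the two-loop "lazy way": one pass to find the maximum frequency, and a second pass to keep only words at or above a threshold. The class and method names are hypothetical, and deriving the threshold as half of the maximum is an assumption made purely for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MaxFrequencySketch {
    // First loop: find the largest count in the word set (0 if the set is empty).
    static int maxFrequency(Map<String, Integer> wordCountSet) {
        int max = 0;
        for (int frequency : wordCountSet.values()) {
            if (frequency > max) {
                max = frequency;
            }
        }
        return max;
    }

    // Second loop: keep only the words whose count reaches the threshold.
    static List<String> wordsAtLeast(Map<String, Integer> wordCountSet, int threshold) {
        List<String> kept = new ArrayList<>();
        for (String word : wordCountSet.keySet()) {
            if (wordCountSet.get(word) >= threshold) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical counts for illustration.
        Map<String, Integer> counts = new HashMap<>();
        counts.put("change", 8);
        counts.put("hope", 5);
        counts.put("cat", 1);
        int max = maxFrequency(counts);
        int threshold = max / 2; // assumed rule: half of the maximum
        System.out.println("max = " + max + ", kept = " + wordsAtLeast(counts, threshold));
    }
}
```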

slide-12
SLIDE 12

+Filter and size text using max frequency and map()
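The slide's code is not included in this transcript. The sketch below illustrates the sizing idea under stated assumptions: it re-implements the linear rescaling that Processing's map(value, start1, stop1, start2, stop2) performs, and uses it to scale a word's frequency from the range 1..max into a font-size range. The 10 to 60 point range and the names FontSizeSketch and fontSize are hypothetical.

```java
class FontSizeSketch {
    static final float MIN_SIZE = 10; // assumed smallest font size
    static final float MAX_SIZE = 60; // assumed largest font size

    // Linear rescaling of value from [start1, stop1] to [start2, stop2].
    static float map(float value, float start1, float stop1, float start2, float stop2) {
        return start2 + (stop2 - start2) * ((value - start1) / (stop1 - start1));
    }

    // Font size for a word, proportional to its frequency relative to the maximum.
    static float fontSize(int frequency, int maxFrequency) {
        if (maxFrequency <= 1) {
            return MIN_SIZE; // avoid dividing by zero when every word occurs once
        }
        return map(frequency, 1, maxFrequency, MIN_SIZE, MAX_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(fontSize(1, 11));  // prints 10.0
        System.out.println(fontSize(6, 11));  // prints 35.0
        System.out.println(fontSize(11, 11)); // prints 60.0
    }
}
```

In a Processing sketch the result would be passed to textSize() inside the display loop, in place of the raw count.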

slide-13
SLIDE 13

+Sorting

n Any process of arranging items in sequence n Build-in sort()

n Works on arrays of simple types, i.e. int, float and String

n float[] a = { 3.4, 3.6, 2, 0, 7.1 }; n a = sort(a); // sort all elements in place n String[] s = { "deer", "elephant", "bear", "aardvark",

"cat" };

n s = sort(s, 3); // sort the first three elements n Convenient, but not very flexible

slide-14
SLIDE 14

+Sorting (implement your own)

n Easy to code (but slow)

n Selection Sort n Bubble Sort n Insertion Sort

n Animations

n https://www.cs.usfca.edu/~galles/visualization/

ComparisonSort.html

n http://www.sorting-algorithms.com/

slide-15
SLIDE 15

+Selection sort

n Basic idea:

n step forward on each item of the array starting with the first item,

if there is a smallest item in front of the item being stepped on, then swap the two items. Repeat until you've stepped on every item.

n Implementation:

n nested loop n first loop marks the current item n inner loop finds the smallest item between the current item

and the last item inclusively, then swaps the items

n Time Complexity?

slide-16
SLIDE 16

+Bubble sort

n Basic idea:

n start with the first item in the array compare adjacent items if they

are not sorted, swap them, go to the next item and repeat until you get to the end.

n repeat the above process until sorted

n Implementation:

n nested loop n first loop checks if the array is sorted n inner compares and swaps

n Time Complexity?

slide-17
SLIDE 17

+Insertion Sort

n Basic idea:

n start with a sorted subarray, insert the next item from your

unsorted list into the right position of the sorted list.

n When you get to the end of the unsorted list, you are done

n Implementation:

n nested loop n first loop gets next item to insert n inner compares, copies and makes space n inserts into space

n Time Complexity?