+ Sorting for WordClouds + Text Processing Data Visualization

+ Sorting for WordClouds

+ Text Processing

Data Visualization Process
- Acquire - Obtain the data from some source
- Parse - Give the data some structure, clean up
- Filter - Remove all but the data of interest
- Mine - Use the data to derive interesting properties
- Represent - Choose a visual representation
- Refine - Improve to make it more visually engaging
- Interact - Make it interactive

Text Visualization
- Source = Document
- Parse = Words
- Filter = Word Set with counts
- Mine = Get relevant words
- Represent = Fonts/Placement
- Refine/Interact

+ Acquire data: Source = Document

    // Sketch 7-1: Parsing an input text file
    String inputTextFile = "Obama.txt";
    String[] fileContents;
    fileContents = loadStrings(inputTextFile);

- fileContents has the source!
- What next?

� + Parse n How do we turn fileContents into words? n join array into one long string String rawText; � rawText = join(fileContents, " "); � n make all same case rawText = rawText.toLowerCase(); � n remove symbols and split string into words String delimiters = " ,./?<>;:'\"[{]}\\|=+-_()*&^%$#@!~"; � tokens = splitTokens(rawText, delimiters); �

+ Display the words n Let's start by displaying all of the words: for (String t : tokens) { � //textSize(15); � if(random(100) > 40) { // more red than green � fill(random(150,250),0, 0,190); // make red � } else { � fill(0,random(150,250), 0,190); // make green � } � text(t, random(0,width-50), random(20,height)); � } // for �

+ Count the words (second way) n Use a HashMap ( a dictionary from words è counts) n HashMap <String,Integer> wordCountSet = � new HashMap<String,Integer>(); � n to add a new word: n wordCountSet.put(word,1); // initial count is 1 � n to get the frequency of a word: n Integer frequency = � wordCountSet.get(word); // if null, then none � n to update the frequency of a word: n wordCountSet.put(word, frequeny + 1); �

+ Display the UNIQUE words n Instead of tokens, we want the keys of the HashMap: n wordCountSet.keySet() for (String t : wordCountSet.keySet()) { � //Let's change the text size based on the frequency � //textSize(<what goes here?>); � if(random(100) > 40) { // more red than green � fill(random(150,250),0, 0,190); // make red � } else { � fill(0,random(150,250), 0,190); // make green � } � text(t, random(0,width-50), random(20,height)); � } // for �

+ Display the UNIQUE words n Instead of tokens, we want the keys of the HashMap: n wordCountSet.keySet() for (String t : wordCountSet.keySet()) { � //Let's change the text size based on the frequency � textSize(wordCountSet.get(t)); � if(random(100) > 40) { // more red than green � fill(random(150,250),0, 0,190); // make red � } else { � fill(0,random(150,250), 0,190); // make green � } � text(t, random(0,width-50), random(20,height)); � } // for �

+ Display the most frequent words
- Lazy way
  - Check the frequencies of all words in the set and only display words above a threshold frequency.
  - First find the threshold (loop once)
  - Next use the threshold (loop a second time)
- Systematic way
  - Sort the word set by frequency
  - Only display the top N words
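The systematic way can be sketched in plain Java as follows. This version leans on the library's `List.sort` with a comparator instead of a hand-written sort, and the names `TopWordsSketch` and `topN` are hypothetical. Breaking ties alphabetically is an added assumption so that the result is deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: sort the word set by frequency, keep the top N words.
class TopWordsSketch {
    static List<String> topN(Map<String, Integer> counts, int n) {
        // Copy the (word, count) pairs into a list so they can be ordered.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Highest count first; equal counts fall back to alphabetical order.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        // Keep at most n words.
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("hope", 5);
        counts.put("change", 3);
        counts.put("we", 9);
        System.out.println(topN(counts, 2)); // prints: [we, hope]
    }
}
```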

+ Code from class (max frequency)

+ Filter and size text using max frequency and map()
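The code for this slide is not in the transcript, so the following is only a guess at the idea: find the maximum frequency in one loop, then rescale each word's count into a font-size range. `mapRange` is a hypothetical helper that uses the same linear formula as Processing's `map()`; the class name, the method names, and the size range in the usage example are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: size words relative to the most frequent word.
class FrequencyScaleSketch {
    // Loop once over the counts to find the largest one (0 for an empty map).
    static int maxFrequency(Map<String, Integer> counts) {
        int max = 0;
        for (int frequency : counts.values()) {
            if (frequency > max) {
                max = frequency;
            }
        }
        return max;
    }

    // Linear rescale of value from [start1, stop1] to [start2, stop2].
    static float mapRange(float value, float start1, float stop1,
                          float start2, float stop2) {
        return start2 + (stop2 - start2) * ((value - start1) / (stop1 - start1));
    }

    // Text size for one word; counts run from 1 up to maxFreq.
    static float sizeFor(int frequency, int maxFreq, float minSize, float maxSize) {
        if (maxFreq <= 1) {
            return minSize; // every word appears once: avoid dividing by zero
        }
        return mapRange(frequency, 1, maxFreq, minSize, maxSize);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("we", 9);
        counts.put("hope", 5);
        counts.put("did", 1);
        int max = maxFrequency(counts);
        for (String word : counts.keySet()) {
            System.out.println(word + " -> " + sizeFor(counts.get(word), max, 10, 50));
        }
    }
}
```

In a Processing sketch, the value returned by `sizeFor` would be passed to `textSize()`, and a threshold test such as `frequency >= someThreshold` could skip rare words before drawing.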

+ Sorting n Any process of arranging items in sequence n Build-in sort() n Works on arrays of simple types, i.e. int , float and String n float[] a = { 3.4, 3.6, 2, 0, 7.1 }; n a = sort(a); // sort all elements in place n String[] s = { "deer", "elephant", "bear", "aardvark", "cat" }; n s = sort(s, 3); // sort the first three elements n Convenient, but not very flexible

+ Sorting (implement your own)
- Easy to code (but slow)
  - Selection Sort
  - Bubble Sort
  - Insertion Sort
- Animations
  - https://www.cs.usfca.edu/~galles/visualization/ComparisonSort.html
  - http://www.sorting-algorithms.com/

+ Selection sort
- Basic idea:
  - Step forward through the array, starting with the first item. At each position, find the smallest item among the remaining items (from the current position to the end); if it is not already at the current position, swap the two. Repeat until you've stepped on every item.
- Implementation:
  - nested loop
  - outer loop marks the current item
  - inner loop finds the smallest item between the current item and the last item, inclusive; the two items are then swapped
- Time Complexity?
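A minimal Java sketch of the selection sort idea on an int array follows; the class name `SelectionSortSketch` is hypothetical. Because the inner loop always scans the rest of the array, the number of comparisons grows quadratically with the array length.

```java
import java.util.Arrays;

// Hypothetical class name; sorts an int array in ascending order, in place.
class SelectionSortSketch {
    static void selectionSort(int[] a) {
        // Outer loop marks the current position.
        for (int i = 0; i < a.length - 1; i++) {
            // Inner loop finds the index of the smallest item in a[i..end].
            int min = i;
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[min]) {
                    min = j;
                }
            }
            // Swap the smallest remaining item into the current position.
            if (min != i) {
                int tmp = a[i];
                a[i] = a[min];
                a[min] = tmp;
            }
        }
    }

    public static void main(String[] args) {
        int[] a = { 5, 2, 9, 1, 5 };
        selectionSort(a);
        System.out.println(Arrays.toString(a)); // prints: [1, 2, 5, 5, 9]
    }
}
```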

+ Bubble sort
- Basic idea:
  - Start with the first item in the array and compare adjacent items; if they are out of order, swap them. Go to the next item and repeat until you get to the end.
  - Repeat the above process until the array is sorted
- Implementation:
  - nested loop
  - outer loop repeats passes until the array is sorted
  - inner loop compares adjacent items and swaps them
- Time Complexity?
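One way to write bubble sort in Java is sketched below; the class name is hypothetical. The `swapped` flag plays the role of the "is it sorted yet?" check: a full pass with no swaps means the array is in order. In the worst case this still takes a quadratic number of comparisons.

```java
import java.util.Arrays;

// Hypothetical class name; sorts an int array in ascending order, in place.
class BubbleSortSketch {
    static void bubbleSort(int[] a) {
        boolean swapped = true;
        // Outer loop: keep making passes until one pass needs no swaps.
        while (swapped) {
            swapped = false;
            // Inner loop: compare each adjacent pair and swap if out of order.
            for (int i = 0; i + 1 < a.length; i++) {
                if (a[i] > a[i + 1]) {
                    int tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                    swapped = true;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] a = { 4, 1, 3, 9, 2 };
        bubbleSort(a);
        System.out.println(Arrays.toString(a)); // prints: [1, 2, 3, 4, 9]
    }
}
```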

+ Insertion Sort
- Basic idea:
  - Start with a sorted subarray (initially just the first item) and insert the next item from the unsorted part into its correct position in the sorted part.
  - When you get to the end of the unsorted part, you are done
- Implementation:
  - nested loop
  - outer loop gets the next item to insert
  - inner loop compares and shifts larger items one place to the right to make space
  - the item is then inserted into the space
- Time Complexity?
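Insertion sort can be sketched in Java as follows; the class name is hypothetical. The worst case (a reversed array) needs a quadratic number of shifts, while an already sorted array needs only one comparison per item.

```java
import java.util.Arrays;

// Hypothetical class name; sorts an int array in ascending order, in place.
class InsertionSortSketch {
    static void insertionSort(int[] a) {
        // a[0..i-1] is already sorted; the outer loop picks the next item.
        for (int i = 1; i < a.length; i++) {
            int item = a[i];
            int j = i - 1;
            // Inner loop: shift larger items right to open a space.
            while (j >= 0 && a[j] > item) {
                a[j + 1] = a[j];
                j--;
            }
            // Drop the item into the space.
            a[j + 1] = item;
        }
    }

    public static void main(String[] args) {
        int[] a = { 7, 3, 8, 1, 3 };
        insertionSort(a);
        System.out.println(Arrays.toString(a)); // prints: [1, 3, 3, 7, 8]
    }
}
```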
