

Discovering Weblog Communities: A Content- and Topology-Based Approach

Jeroen Bulters
ISLA, University of Amsterdam
Kruislaan 403, 1098 SJ Amsterdam, The Netherlands
jbulters@science.uva.nl

Maarten de Rijke
ISLA, University of Amsterdam
Kruislaan 403, 1098 SJ Amsterdam, The Netherlands
mdr@science.uva.nl

ICWSM 2007 Boulder, CO USA

Abstract

Weblogs have become a leading form of self-publication on the web. Personal weblogs are often considered to represent a person, and the links between weblogs can naturally be interpreted as social interaction. Against this background, finding a community around a given weblog (i.e., identifying a set of weblogs that forms a natural group together with the starting point, because of content or social reasons) is a very natural task. Traditional community finding methods focus almost exclusively on topology analysis. In this paper we present a novel method for discovering weblog communities that incorporates both topology analysis and content analysis. We evaluate our method in a small-scale user study, analyze the contributions of the various components of our approach, and compare it against a state-of-the-art topology-based community finding algorithm.

1. Introduction

In recent years weblogs have become a dominant form of self-publication on the internet. The number of weblogs tracked by Technorati has been doubling every 5 months and it is often claimed that a new weblog is created every second. The vast and evolving nature of the blogosphere offers interesting challenges from the point of view of information access.

In this paper, we focus on the following access task: given a weblog (or blogger), return a set of other weblogs that form a community together with the starting blog. Traditional community extraction methods rely almost exclusively on an analysis of link topology around a given starting point, thereby effectively ignoring the immense amount of information given by the weblogger in his posts. For example, in the experimental evaluation in this paper one of the weblogs, appelejan, was assessed as having 18 members in its community; however, a state-of-the-art topology-based algorithm yielded only three members of the community, because members of the community did not always link back to each other or to other members of the community.

We present a novel community finding method that incorporates both topology and content analysis. In addition to a detailed description of the core algorithm, we provide the outcomes of a small-scale user study aimed at understanding the algorithm’s effectiveness and at comparing it with an existing state-of-the-art solution.

We believe that our work is of interest to two types of end users: (1) the algorithm we propose lays the groundwork for a tool that can be used by individual bloggers as an exploratory search tool, and (2) our algorithm can be extended to a tool for advertisers and marketeers, for whom a global view of likes, dislikes, and interests of groups of bloggers matters.

The remainder of this paper is organized as follows. We start with a brief description of related work in Section 2. Then, in Section 3, we present our algorithm for discovering weblog communities. We follow with a description of an experimental evaluation of the algorithm in Section 4. We report on the results in Section 5 and conclude in Section 6.

2. Related work

The fact that a weblog is a web-based publication gives us the opportunity to apply traditional web-mining techniques to weblogs. A lot of work has been done on the identification of clustered websites; see e.g., [2]. Although weblogs are just websites, weblogs are often considered to “represent” a person while a website represents a subject [5]. Websites can be characterized in terms of the strong distinction between authority-type and hub-type pages [4]; authority-type pages are considered to have substantially more outgoing links than incoming links, while hub-type pages have a more or less equal number of incoming and outgoing links. The analogy between authorities and subjects, and hubs and people, is easily made. While websites can be related to two types of pages, weblogs are considered to “identify” a person (who can have many different interests, or subjects) and can thus only be related in an intuitive way with the hub-type pages of Kleinberg’s HITS algorithm.

Kumar et al. [5] present a topology-based algorithm for community extraction which they later use in so-called Burst-Analysis. This algorithm is our baseline.

Lin et al. [7] focus on extracting communities based on two key insights: (a) communities form due to individual blogger actions that are mutually observable; (b) the semantics of the hyperlink structure are different from traditional web analysis problems. Their topology-based approach involves developing computational models for mutual awareness that incorporate the specific action type, frequency and time of occurrence.

Merelo-Guervos et al. [8] map a weblog hosting site using Kohonen’s self-organizing map and discover interesting community features; they provide a comparison between their methods and other community-discovering algorithms. Like us, they use a mixture of topology and content analysis.
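
The Introduction notes that a topology-based method can miss community members who do not link back to each other. The following Python sketch illustrates that effect on a toy link graph. It is only an illustration under stated assumptions: the blog names, the `links` list, and the `mutual_link_community` function are hypothetical, and the "follow reciprocated links only" rule is a deliberately simple stand-in, not the baseline of Kumar et al. [5] nor the algorithm proposed in this paper.

```python
from collections import defaultdict


def mutual_link_community(links, seed):
    """Collect blogs reachable from `seed` through reciprocated links only.

    `links` is an iterable of (source, target) pairs. This is a simplistic,
    hypothetical topology-only criterion used purely for illustration.
    """
    outgoing = defaultdict(set)
    for src, dst in links:
        outgoing[src].add(dst)

    community = {seed}
    frontier = [seed]
    while frontier:
        blog = frontier.pop()
        for other in outgoing.get(blog, set()):
            # Accept `other` only if it links back to `blog`.
            links_back = blog in outgoing.get(other, set())
            if links_back and other not in community:
                community.add(other)
                frontier.append(other)
    return community


# Hypothetical toy graph: "b" and "c" may well belong to the community
# by content, but neither has a reciprocated link with "seed".
links = [
    ("seed", "a"), ("a", "seed"),  # reciprocated
    ("seed", "b"),                 # "b" never links back
    ("c", "seed"),                 # "seed" never links to "c"
    ("a", "d"), ("d", "a"),        # reciprocated, reached via "a"
]

found = mutual_link_community(links, "seed")
print(sorted(found))
```

In this toy graph the blogs "b" and "c" are left out even though each is connected to the seed in one direction, which is the kind of gap that motivates also looking at post content.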
