 
1/23/2010
Data Mining in Business Intelligence
Professor Hui Xiong
I/UCRC Center for Dynamic Data Analytics
Rutgers University

Data Mining Tasks

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
11   No      Married         60K             No
12   Yes     Divorced        220K            No
13   No      Single          85K             Yes
14   No      Married         75K             No
15   No      Single          90K             Yes

Financial Fraud Detection
• Insider Trading, Market Manipulation, Fraud
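The table above is a typical training set for predictive modeling, with Cheat as the class label. As a hedged illustration of one common way to choose a decision-tree split (the slide does not say which learner it has in mind), the sketch below computes the weighted Gini impurity of multi-way splits on the categorical attributes. The attribute keys "refund" and "marital" are hypothetical names introduced for this example.

```python
from collections import Counter, defaultdict

# The 15 training records from the table:
# (Tid, Refund, Marital Status, Taxable Income in K, Cheat)
RECORDS = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
    (11, "No", "Married", 60, "No"),
    (12, "Yes", "Divorced", 220, "No"),
    (13, "No", "Single", 85, "Yes"),
    (14, "No", "Married", 75, "No"),
    (15, "No", "Single", 90, "Yes"),
]

# Hypothetical attribute names mapped to their column positions.
COLUMNS = {"refund": 1, "marital": 2}


def gini(labels):
    """Gini impurity 1 - sum_c p(c)^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(records, attribute):
    """Weighted Gini impurity of a multi-way split on a categorical attribute."""
    idx = COLUMNS[attribute]
    groups = defaultdict(list)
    for rec in records:
        groups[rec[idx]].append(rec[-1])  # the class label (Cheat) is the last field
    n = len(records)
    return sum(len(labels) / n * gini(labels) for labels in groups.values())


if __name__ == "__main__":
    print("parent:", round(gini([r[-1] for r in RECORDS]), 4))
    for attr in COLUMNS:
        print(attr, round(weighted_gini(RECORDS, attr), 4))
```

On these 15 records the parent node has a Gini impurity of about 0.444; splitting on Marital Status lowers it to about 0.267, versus about 0.364 for Refund, so a Gini-based tree learner would prefer Marital Status at the root among these two attributes.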
Spoiled by One Very Rotten Apple: Rogue Trader's $7.14 Billion Loss
• Biggest Bank Fraud in History
  – 2008: Bank Societe Generale
  – $7.14 Billion Loss
• A single futures trader, Jerome Kerviel, and his scheme of fictitious transactions
• China Aviation Oil (CAO): Chen Jiulin led the company to a loss of $550 million

Business & Economic Networks
• Example: eBay bidding
  – vertices: eBay users
  – links: bidder-seller or buyer-seller relationships
  – fraud detection: bidding rings
• Example: corporate boards
  – vertices: corporations
  – links: between companies that share a board member
• Example: corporate partnerships
  – vertices: corporations
  – links: formal joint ventures
• Example: goods exchange networks
  – vertices: buyers and sellers of commodities
  – links: “permissible” transactions

A Sample Network of Boards of Directors
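The corporate-board example can be made concrete: two corporations are linked whenever they share at least one board member. The minimal sketch below builds that company-company network from (director, company) membership pairs; the function name and the sample data are hypothetical and only illustrate the construction, not a specific dataset used in the talk.

```python
from collections import defaultdict
from itertools import combinations


def board_interlock_network(memberships):
    """Project director-company memberships onto a company-company network.

    memberships: iterable of (director, company) pairs.
    Returns a dict mapping frozenset({company_a, company_b}) to the set of
    directors the two companies share.
    """
    boards = defaultdict(set)  # director -> companies whose board they sit on
    for director, company in memberships:
        boards[director].add(company)

    links = defaultdict(set)
    for director, companies in boards.items():
        # Every pair of companies on the same director's list becomes linked.
        for a, b in combinations(sorted(companies), 2):
            links[frozenset((a, b))].add(director)
    return dict(links)


if __name__ == "__main__":
    # Hypothetical membership data for illustration only.
    sample = [
        ("Director 1", "Company A"),
        ("Director 1", "Company B"),
        ("Director 2", "Company B"),
        ("Director 2", "Company C"),
        ("Director 3", "Company A"),
    ]
    network = board_interlock_network(sample)
    for pair, shared in sorted(network.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(pair), sorted(shared))
```

Keeping the shared directors on each link, rather than only a yes/no edge, makes it possible to trace along which board seats a risk signal could propagate between companies.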
Financial Fraud Detection
• Cross-account/Channel Fraud Detection
  – Money transfer (rings of traders, multiple accounts)
  – Price manipulation (incoming or outgoing star patterns, potentially with losses)
• Fraud Risk Propagation in Corporation Networks

Deliverables
• First 6 months
  – Build a database of bankrupt companies with related information, such as their boards of directors
• 12 months and associated knowledge transfer
  – A demo system for detecting fraud/short signals

Cab Location Traces
• 500 taxi drivers
• About 30 days of data in San Francisco
• Spatial-temporal sequence
  – Latitude
  – Longitude
  – Identifier of business
    • 1 indicates with passenger
    • 0 indicates no passenger
  – Time stamp
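A money-transfer ring shows up as a directed cycle in the graph whose vertices are accounts and whose edges are transfers. The sketch below enumerates simple cycles up to a chosen length by depth-first search, reporting each ring once, starting from its smallest account. The function name, the `max_len` cap, and the string account IDs are assumptions for illustration; the slide does not specify how rings are detected.

```python
from collections import defaultdict


def find_transfer_rings(transfers, max_len=4):
    """Enumerate simple directed cycles (rings) of length 2..max_len.

    transfers: iterable of (sender, receiver) account pairs.
    Each ring is reported once, rotated so that its smallest account comes first.
    """
    graph = defaultdict(set)
    for src, dst in transfers:
        if src != dst:  # ignore self-transfers
            graph[src].add(dst)

    rings = []

    def extend(start, path, on_path):
        node = path[-1]
        for nxt in sorted(graph[node]):
            if nxt == start and len(path) >= 2:
                rings.append(tuple(path))  # closed a ring back to the start
            elif nxt > start and nxt not in on_path and len(path) < max_len:
                # Only visit accounts larger than the start so each ring
                # is found exactly once, from its smallest member.
                on_path.add(nxt)
                path.append(nxt)
                extend(start, path, on_path)
                path.pop()
                on_path.remove(nxt)

    for start in sorted(graph):
        extend(start, [start], {start})
    return rings


if __name__ == "__main__":
    # Hypothetical transfers: A -> B -> C -> A is a ring; C -> D -> E is not.
    demo = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")]
    print(find_transfer_rings(demo))
```

Exhaustive cycle enumeration grows quickly with graph size, which is why the sketch caps ring length; a production system would likely also weight edges by amount and time window.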
Profiling Driver Behaviors
• Profiling driver behaviors to identify transportation-related green knowledge
  – e.g., highly effective use of energy; safe driving; driving patterns that affect gasoline consumption
• Method
  – Driver Segmentation
  – Trajectory Clustering
Ref: Transecurity

Understanding Cab Driver Behaviors
Energy-related Knowledge Discovery
• Driver segmentation based on effective driving time
  – Ratio between driving time with customers and driving time without customers:
    effective driving ratio = (driving time with customers) / (driving time without customers)
• Clustering of effective pick-up points
• Frequent trajectories with customers
• Frequent trajectories without customers
• Moving patterns of the most profitable drivers
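The driver-segmentation ratio can be computed directly from the cab traces described earlier (latitude, longitude, a 1/0 passenger flag, and a time stamp). The sketch below assumes that the interval between two consecutive records belongs to the occupancy state of the earlier record, and, as a hypothetical criterion, splits drivers into "high" and "low" groups at the median ratio; the slides do not state the actual segmentation rule. `TracePoint` and the function names are illustrative.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TracePoint:
    """One GPS record; occupied is 1 with a passenger, 0 without.
    timestamp is assumed to be in seconds."""
    lat: float
    lon: float
    occupied: int
    timestamp: int


def effective_driving_ratio(points):
    """Return (time with customers) / (time without customers) for one driver.

    Assumption: the interval between consecutive records is credited to the
    occupancy state of the earlier record.
    """
    pts = sorted(points, key=lambda p: p.timestamp)
    with_customer = without_customer = 0
    for prev, cur in zip(pts, pts[1:]):
        dt = cur.timestamp - prev.timestamp
        if prev.occupied:
            with_customer += dt
        else:
            without_customer += dt
    if without_customer == 0:
        return float("inf")  # never drove empty during the trace
    return with_customer / without_customer


def segment_drivers(traces):
    """Hypothetical two-way segmentation at the median effective driving ratio.

    traces: dict mapping driver id -> list of TracePoint.
    """
    ratios = {d: effective_driving_ratio(p) for d, p in traces.items()}
    cut = median(ratios.values())
    return {d: ("high" if r >= cut else "low") for d, r in ratios.items()}
```

For example, a driver with a passenger for 10 minutes and empty for the following 5 minutes has a ratio of 2.0. Real traces would also need handling for long gaps (shift breaks) that this sketch counts as driving time.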
Energy-Efficient Mobile Recommender Systems
• Recommend routes
  – Suggest a sequence of pick-up points for cab drivers in real time, based on knowledge learned from historical data
  – Suggest avoiding areas that may lead to less effective use of gasoline
• Knowledge for Safe Driving Training
• Patterns for Cab Driver Coaching and Feedback

Context-Aware Customer Service Support
• Customer service support: an integral part of most companies

Customer Service Problem Log
• Structured attributes: limited information
• Unstructured attributes
• A Sample Problem Log Entry
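One way to read "suggest a sequence of pick-up points" is as a search over candidate routes. The brute-force sketch below assumes that each pick-up point has a probability of finding a customer estimated from historical data, that points succeed independently, and that a route must stay within a travel-distance budget; under those assumptions the chance of at least one pick-up along a sequence is 1 - ∏(1 - p_i). All names, the Euclidean distance, and the budget rule are hypothetical, not the system described in the slides. It needs Python 3.8+ for `math.dist` and `math.prod`.

```python
from itertools import permutations
from math import dist, prod


def success_probability(sequence, pickup_prob):
    """P(at least one pick-up along the sequence), assuming independent points."""
    return 1.0 - prod(1.0 - pickup_prob[p] for p in sequence)


def route_length(start, sequence, coords):
    """Euclidean length of the path start -> sequence[0] -> ... -> sequence[-1]."""
    stops = [start] + [coords[p] for p in sequence]
    return sum(dist(a, b) for a, b in zip(stops, stops[1:]))


def recommend_sequence(start, coords, pickup_prob, k, max_length):
    """Brute-force the k-point sequence with the highest success probability
    whose route length stays within max_length; ties go to the shorter route.
    Returns None if no sequence fits the budget."""
    best, best_key = None, None
    for seq in permutations(coords, k):
        length = route_length(start, seq, coords)
        if length > max_length:
            continue
        key = (success_probability(seq, pickup_prob), -length)
        if best_key is None or key > best_key:
            best, best_key = seq, key
    return best


if __name__ == "__main__":
    # Hypothetical pick-up points (x, y) and learned pick-up probabilities.
    coords = {"P1": (1, 0), "P2": (0, 1), "P3": (5, 5)}
    prob = {"P1": 0.5, "P2": 0.4, "P3": 0.9}
    print(recommend_sequence((0, 0), coords, prob, k=2, max_length=3))
```

Enumerating permutations is only practical for a handful of candidate points; it is shown here because it makes the objective explicit, not because it scales.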
Context-Aware Customer Service Support
• User behaviors identified from problem logs
• Demographic information of customers
• Multi-focal learning

Multi-focal Learning: An Illustration
• Multi-focal learning: partition the training data into several different focal groups and build a prediction model within each focal group

Deliverables
• First 6 months
  – Context-aware feature selection
  – Multi-source demographic customer data collection
• 12 months and associated knowledge transfer
  – A software package for context-aware multi-focal learning for customer service support
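The multi-focal idea (partition the training data into focal groups, then build a model within each group) can be sketched as a thin wrapper that routes every record to its group's model. In this minimal sketch the grouping function is supplied by the caller, and each per-group "model" is just a majority-class predictor standing in for whatever learner is actually used; unseen groups fall back to a global model. The class name, the grouping key, and the example fields are hypothetical.

```python
from collections import Counter, defaultdict


class MultiFocalClassifier:
    """Partition training data into focal groups and fit one model per group.

    The per-group model here is a majority-class predictor, a deliberately
    simple stand-in for a real learner.
    """

    def __init__(self, focal_key):
        self.focal_key = focal_key  # function: record -> focal group id
        self.group_models = {}
        self.global_model = None

    @staticmethod
    def _fit_majority(labels):
        # Ties resolve to the label encountered first.
        return Counter(labels).most_common(1)[0][0]

    def fit(self, records, labels):
        groups = defaultdict(list)
        for rec, y in zip(records, labels):
            groups[self.focal_key(rec)].append(y)
        self.group_models = {g: self._fit_majority(ys) for g, ys in groups.items()}
        self.global_model = self._fit_majority(labels)
        return self

    def predict(self, record):
        # Route the record to its focal group's model; fall back to the
        # global model for groups not seen during training.
        return self.group_models.get(self.focal_key(record), self.global_model)


if __name__ == "__main__":
    # Hypothetical problem-log records grouped by customer region.
    records = [{"region": "east"}, {"region": "east"}, {"region": "west"},
               {"region": "west"}, {"region": "west"}]
    labels = ["hardware", "hardware", "software", "software", "hardware"]
    model = MultiFocalClassifier(lambda r: r["region"]).fit(records, labels)
    print(model.predict({"region": "west"}))
```

The design choice worth noting is the fallback: a focal group with little or no training data still gets a prediction from the model trained on all data.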
NSF Industry/University Center for Dynamic Data Analytics (CDDA) Project Summary
Project Name: An Energy-Efficient Mobile Recommender System
Project Investigators: Hui Xiong
Description: The increasing availability of large-scale location traces creates unprecedented opportunities to change the paradigm for knowledge discovery in transportation systems. A particularly promising area is the extraction of energy-efficient transportation patterns (green knowledge), which can serve as guidance for reducing inefficiencies in the energy consumption of the transportation sector. However, extracting green knowledge from location traces is not a trivial task. Conventional data analysis tools are usually not designed to handle the massive volume and the complex, dynamic, and distributed nature of location traces. To that end, this project will provide a focused study of extracting energy-efficient transportation patterns from location traces. Specifically, our initial focus is on recommending sequences of locations to mobile users. As a case study, we will develop a mobile recommender system that can recommend a sequence of pick-up points for taxi drivers. The goal of this mobile recommender system is to maximize the probability of business success.
Experimental Plan:
- Sept. 10: Data Preprocessing
- Dec. 10: Algorithm Design
- Spring 11: Testing of algorithms
- Fall 11: Performance Evaluation
Related Work Elsewhere:
- Classic recommender systems focus on traditional application domains, such as commercial item recommendation
How Ours Is Different:
- Mobile recommender systems are under-explored
- Recommendation is based on business success instead of user ratings
Related Work in Center:
- Vision and data analysis applications
- DHS work on camera networks
Milestones:
- 2010-2011: Focus on algorithm development
- 2011: Implementation of a demo system and evaluation of the performance of energy-efficient mobile recommendation
Budget: $50,000
Deliverables:
- Technical demonstration along with a technical report resulting in a publication
Potential Benefits to Member Companies:
- Ideas for developing energy-efficient location-based services
NSF Industry/University Center for Dynamic Data Analytics (CDDA) Project Summary
Project Name: Mobile Web Usage Profiling for System Performance Tuning
Project Investigators: Hui Xiong
Description: The objective of this proposed research is to profile the behaviors of mobile web users. Due to differences in age, profession, gender, and cultural background, mobile users may exhibit a large degree of diversity in how they access the mobile Internet. Understanding this diversity, as well as extracting similarities in user patterns, is thus critical to designing and developing future mobile applications centered on mobile search. To address this need, we have obtained web usage logs from a mobile service provider and propose to perform a detailed analysis of these logs. Specifically, we propose to analyze the logs through user segmentation, which clusters users with similar behaviors based on their demographic data, search keywords, and click histories. This research both poses challenges for, and advances the development of, data mining and mobile computing. By the end of the project, we expect to develop a set of techniques that can effectively characterize users' usage patterns, along with a list of observations that can be leveraged to improve the performance of mobile Web sites.
Experimental Plan:
- Sept. 10: Data Preprocessing
- Dec. 10: Algorithm Design
- Spring 11: Testing of algorithms
- Fall 11: Performance Evaluation
Related Work Elsewhere:
- Customer Segmentation
- Customer Profiling
How Ours Is Different:
- Cross-information-source collaborative customer analysis
Related Work in Center:
- Vision and data analysis applications
Milestones:
- 2010-2011: Focus on algorithm development
- 2011: Testing of a demo system for customer analysis; evaluation of its performance
Budget: $50,000
Deliverables:
- Technical demonstration along with a technical report resulting in a publication
Potential Benefits to Member Companies:
- Techniques for multi-source and context-aware customer analysis
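User segmentation clusters users whose behaviors are similar. The summary does not name a clustering algorithm, so the sketch below swaps in plain k-means (Lloyd's algorithm) over numeric per-user feature vectors, for instance hypothetical counts of searches in two keyword categories. Initialization from the first k users is an assumption chosen to keep the example deterministic; demographic or click-history features would first have to be encoded numerically.

```python
from math import dist


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means on lists of numeric features.

    Initialization uses the first k points (deterministic, for illustration).
    Returns (assignments, centroids).
    """
    centroids = [list(p) for p in points[:k]]
    assignments = [None] * len(points)
    for _ in range(iters):
        # Assignment step: each user goes to the nearest centroid (Euclidean).
        new_assignments = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no user changed segment
        assignments = new_assignments
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


if __name__ == "__main__":
    # Hypothetical users: [searches about news, searches about games]
    users = [[10, 1], [1, 8], [9, 2], [0, 9]]
    segments, centers = kmeans(users, k=2)
    print(segments, centers)
```

With these four hypothetical users, the two news-heavy users and the two game-heavy users end up in separate segments; on real logs, k and the feature scaling would need to be chosen with care.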