Topic Discovery and Future Trend Prediction In Scholarly Networks - PowerPoint PPT Presentation



SLIDE 1

Topic Discovery and Future Trend Prediction In Scholarly Networks

Interim Report 515030910600

SLIDE 2

Introduction & Existing works

Proper resource allocation on research requires accurate forecasting for the future research activities.

SLIDE 3

Forecasting

– Judgmental Analysis: relies on the subjective judgments of experts
– Numerical Analysis: extrapolates historical data through a specific function

SLIDE 4

Numerical Analysis

– Co-citation clustering: counts the elements shared between documents to group the documents into categories
– Co-word analysis: counts the frequency of words to determine their trends
– Bibliometrics: obtains keywords and abstracts, uses NGD to arrange them into taxonomies, and calculates the Technology Growth Score (TGS):

TGS = Σ_{y∈Y} (y · f_y) / Σ_{y∈Y} f_y

where Y is the whole observation period, y a year within it, and f_y the term frequency in year y.
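The TGS formula above is a frequency-weighted mean year, so the closer it lies to the end of the observation period, the more recent the term's activity. A minimal sketch (function name and sample frequencies are illustrative, not from the slides):

```python
# Hypothetical sketch: Technology Growth Score (TGS) for a term,
# computed from its yearly frequencies over the observation period Y.

def technology_growth_score(freq_by_year):
    """freq_by_year: dict mapping year y -> term frequency f_y."""
    total_freq = sum(freq_by_year.values())
    if total_freq == 0:
        return None  # term never observed in the period
    # Frequency-weighted mean year: a higher score means the term's
    # activity is skewed toward the recent end of the period.
    return sum(y * f for y, f in freq_by_year.items()) / total_freq

freqs = {2013: 2, 2014: 5, 2015: 9, 2016: 20}  # made-up frequencies
print(round(technology_growth_score(freqs), 2))  # close to 2016: a growing term
```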

SLIDE 5

Theory & Researching Plan

Analyze the time series and find similar patterns among them

SLIDE 6

Analysis of Time Series with Machine Learning

– Principal Component Analysis
– Factor Analysis
– Partial Least Squares
– Sliced Inverse Regression
– Blind Source Separation

– A single best method is not always applicable to different kinds of data.
– The best prediction results should be selected and combined into an ensemble system to produce the final forecast.

SLIDE 7

Time Series Similarity

– Euclidean Distance: measures the difference between each pair of corresponding points of the two series
– Dynamic Time Warping: allows acceleration/deceleration of signals along the time dimension
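The two measures above can be sketched in a few lines; the DTW version below is the classic dynamic-programming formulation with absolute-difference cost (a common choice, though the slides do not specify the local cost function):

```python
# Minimal sketch of the two time-series similarity measures.
import math

def euclidean(a, b):
    # Point-by-point comparison; requires equal-length series.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    # Dynamic Time Warping: allows local stretching/compression
    # of the series along the time axis.
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

a = [1, 2, 3, 4]
b = [1, 1, 2, 3, 4]            # same shape, delayed by one step
print(euclidean(a, a))          # 0.0
print(dtw(a, b))                # 0.0: DTW tolerates the time shift
```

The example shows why DTW suits trend series: a topic whose curve merely starts a year later still matches perfectly under DTW, while Euclidean distance would penalize the shift.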

SLIDE 8

Research Plan

– Select research topics from the dataset and categorize them
– Construct time series based on each topic's frequency per year
– Construct training and testing matrices
– Predict the time series using ANNs and SVMs with various parameters, as well as ARIMA and logistic models
– Select the best models on the training data that is most similar to the testing data
– Combine the prediction results using the average, the median, and performance-based ranking
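The matrix-construction step of the plan amounts to a sliding window over each topic's yearly frequencies. A minimal sketch (the window size and names are illustrative assumptions; the slides do not state them):

```python
# Hypothetical sketch: turn a topic's yearly frequency series into a
# supervised training matrix with a sliding window.

def sliding_window(series, window=3):
    """Return (X, y): each row of X holds `window` past values,
    y holds the value to predict one step ahead."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return X, y

freq = [3, 5, 8, 13, 21, 34]          # made-up topic frequency per year
X, y = sliding_window(freq, window=3)
print(X)  # [[3, 5, 8], [5, 8, 13], [8, 13, 21]]
print(y)  # [13, 21, 34]
```

Each (row, target) pair is one training example for the ANN/SVM predictors; held-out years form the testing matrix.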

SLIDE 9

Experiments

SLIDE 10

Datasets

Via link.springer.com

SLIDE 11

Time Series

SLIDE 12

Comparison among Individual predictors

– The first experiment in this study compares the performance of each predictor.
– Varying the parameters of those predictors gives 14 models in total:

Neural Network: hidden node set = 1 / 2 / 3 / 5 / 10
SVR: RBF kernel with sigma's width = 0.01 / 0.1 / 0.5 / 1 / 2 / 5; polynomial kernel with degree = 1 / 2 / 3
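The 5 + 6 + 3 parameter settings above account for the 14 models. A sketch enumerating them as plain configuration dicts (no particular ML library assumed; each dict would later be mapped to an actual NN or SVR implementation):

```python
# Enumerate the 14 predictor configurations from the slide.

def model_grid():
    grid = []
    for h in (1, 2, 3, 5, 10):                  # 5 neural networks
        grid.append({"type": "nn", "hidden_nodes": h})
    for sigma in (0.01, 0.1, 0.5, 1, 2, 5):     # 6 RBF-kernel SVRs
        grid.append({"type": "svr", "kernel": "rbf", "sigma": sigma})
    for degree in (1, 2, 3):                    # 3 polynomial-kernel SVRs
        grid.append({"type": "svr", "kernel": "poly", "degree": degree})
    return grid

print(len(model_grid()))  # 14
```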

SLIDE 13

Comparison among Individual predictors

– A single best method is not always applicable to different kinds of data: most predictors predict well for some time series but not for the others.

Forecasting Performance using MSE among Individual Models on 14 Time Series

SLIDE 14

Combination of Models using Similarity Measure

– The second experiment in this study selects the predictors that perform best on the training time series most similar to the testing time series to be predicted.
– The best model in validation does not necessarily imply the best model in testing.
– As the number of models increases, the MSE decreases up until about half of the total number of models.

MSE on Combination of Methods using Euclidean, DTW and without Similarity
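The selection-and-combination step described above can be sketched as follows: rank the models by their error on the most similar training series, keep roughly the top half (matching the observation that MSE bottoms out near 50% of the models), and combine the survivors' forecasts. All names and sample values are illustrative assumptions:

```python
# Hypothetical sketch of similarity-based model selection and combination.
import statistics

def combine_forecasts(forecasts, errors_on_similar, keep_ratio=0.5):
    """forecasts: dict model_name -> predicted value.
    errors_on_similar: dict model_name -> MSE on the training series
    most similar to the testing series."""
    ranked = sorted(forecasts, key=lambda m: errors_on_similar[m])
    k = max(1, int(len(ranked) * keep_ratio))   # keep ~50% of the models
    chosen = [forecasts[m] for m in ranked[:k]]
    return {"average": statistics.mean(chosen),
            "median": statistics.median(chosen)}

forecasts = {"nn_h3": 10.0, "nn_h5": 12.0, "svr_rbf": 11.0, "svr_poly": 30.0}
errors = {"nn_h3": 0.1, "nn_h5": 0.3, "svr_rbf": 0.2, "svr_poly": 5.0}
print(combine_forecasts(forecasts, errors))  # the outlier svr_poly is dropped
```

Dropping the poorly performing half keeps the outlier forecast (30.0) out of both the average and the median, which is the diversification-versus-noise trade-off the slides describe.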

SLIDE 15

Combination of Models using Similarity Measure

– Using a combination of methods selected based on the similarity between training and testing data may lead to a better prediction result than the combination of all methods.
– Among those three model selections, Euclidean similarity is the one that may yield the lowest MSE.

Average Performance of Forecast Combination using Models Selected by Euclidean and DTW Similarity Compared to the one using Best and All Models without Similarity Measure

SLIDE 16

The Most Often Used Models

– Neural Networks are chosen as best models more often than the SVRs.
– Among the NNs, moderate numbers of hidden nodes, such as 3 and 5, are preferred.
– Among the SVRs, the polynomial kernel of degree 3 and the RBF kernel of width 1, which are more suitable for fluctuating patterns, closely follow the NNs.

The most often selected models for the first eight models of all series

SLIDE 17

Conclusion

SLIDE 18

Conclusion

– The combination of methods selected based on the similarity between training and testing data may perform better than the combination of all methods.
– The optimum number of models to combine is about fifty percent of the total number of models.
– Combining too few models may not provide enough diversification of the methods' capabilities, whereas combining too many may select poorly performing models.

SLIDE 19

Thank You