MICROSCOPIC AND MACROSCOPIC SPATIO-TEMPORAL TOPIC MODELS FOR CHECK-IN DATA

 

ABSTRACT

Twitter, together with other online social networks such as Facebook and Gowalla, has begun to collect hundreds of millions of check-ins. Check-in data captures the spatial and temporal information of user movements and interests. To model and analyze the spatio-temporal aspect of check-in data and discover temporal topics and regions, we first propose a spatio-temporal topic model, the Upstream Spatio-Temporal Topic Model (USTTM). USTTM can discover temporal topics and regions, i.e., a user’s choice of region and topic is affected by time in this model. We use continuous time to model check-in data, rather than discretized time, avoiding the loss of information through discretization. In addition, USTTM captures the property that a user’s interests and activity space change over time: users have different region and topic distributions at different times. However, both USTTM and other related models capture “microscopic” patterns within a single city, where users share POIs, and cannot discover “macroscopic” patterns in a global area, where users check in to different POIs. Therefore, we also propose a macroscopic spatio-temporal topic model, MSTTM, which employs words of tweets that are shared between cities to learn the topics of user interests. We perform an experimental evaluation on Twitter and Gowalla data sets from New York City and on a Twitter US data set. In our qualitative analysis, we perform experiments with USTTM to discover temporal topics, e.g., how the topic “tourist destinations” changes over time, and to demonstrate that MSTTM indeed discovers macroscopic, generic topics. In our quantitative analysis, we evaluate the effectiveness of USTTM in terms of perplexity, accuracy of POI recommendation, and accuracy of user and time prediction. Our results show that the proposed USTTM achieves better performance than the state-of-the-art models, confirming that it is more natural to model time as an upstream variable affecting the other variables. Finally, the performance of the macroscopic model MSTTM is evaluated on a Twitter US data set, demonstrating a substantial improvement of POI recommendation accuracy compared to the microscopic models.

 

EXISTING SYSTEM:

We briefly review four lines of related work: spatial topic models, temporal topic models, spatio-temporal topic models, and spatial-temporal analysis not based on topic models.

Spatial Topic Models. Much work has been done on spatial topic models. Yin et al. [1] proposed LGTA, which is based on Probabilistic Latent Semantic Indexing (PLSI) and combines geographical clustering and topic modeling. This model assumes that coordinates and words in each document are generated by a region, and topics are also drawn by region. Hong et al. [2] proposed a fully Bayesian generative model that extends LGTA. This work is the combination of three models: a background language model, a per-region language model, and a topical language model. Recently, Hu and Ester [5] proposed the Spatial Topic (ST) model for location recommendation. This model can capture the correlation between users’ movements and between user interests and the function of locations. Ahmed et al. [6] presented a hierarchical topic model which models regional variations of topics. Relations between the Gaussian distributed geographical regions are modeled by assuming a strict hierarchical relation. Kling et al. [7] proposed the MDP-based geographical topic model (MGTM), based on a multi-Dirichlet process (MDP), to detect non-Gaussian geographical topics. This model extends the GeoFolk model proposed by Sizov [8]. In addition, MGTM is a non-personalized model.
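The region-based assumption described above (a region generates a document's coordinates, and topics are drawn per region) can be illustrated with a toy generative sketch. This is not the LGTA implementation: the region names, Gaussian parameters, topic mixtures, vocabulary, and the uniform choice of region are all hypothetical values invented for illustration.

```python
import random

# Hypothetical toy parameters (not taken from any of the cited papers):
# each region has a 2-D Gaussian center, a spread, and its own topic mixture.
REGIONS = {
    "downtown": {"center": (40.71, -74.00), "sigma": 0.01,
                 "topic_probs": {"food": 0.7, "museums": 0.3}},
    "uptown": {"center": (40.80, -73.95), "sigma": 0.02,
               "topic_probs": {"food": 0.2, "museums": 0.8}},
}
TOPIC_WORDS = {
    "food": ["pizza", "coffee", "brunch"],
    "museums": ["exhibit", "gallery", "tour"],
}


def generate_document(rng, n_words=5):
    """Draw one geotagged document: region -> coordinates, region -> topics -> words."""
    # Assumption: regions are picked uniformly; a real model would learn this.
    region_name = rng.choice(sorted(REGIONS))
    region = REGIONS[region_name]
    # Coordinates are drawn from the region's Gaussian.
    lat = rng.gauss(region["center"][0], region["sigma"])
    lon = rng.gauss(region["center"][1], region["sigma"])
    # Each word's topic is drawn from the region-specific topic distribution.
    topics = list(region["topic_probs"])
    weights = [region["topic_probs"][t] for t in topics]
    words = []
    for _ in range(n_words):
        topic = rng.choices(topics, weights=weights)[0]
        words.append(rng.choice(TOPIC_WORDS[topic]))
    return region_name, (lat, lon), words


rng = random.Random(0)
region_name, coords, words = generate_document(rng)
```

Fitting such a model means inferring the regions and topic mixtures from observed coordinates and words; the sketch only shows the direction of the assumed dependencies.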

PROPOSED SYSTEM:

We propose two novel spatio-temporal topic models for check-in data, which can discover temporal topics and regions. Our contributions are as follows:

- Both models use continuous time to model check-in data, rather than discretized time, avoiding the loss of information through discretization.
- In order to capture the property that a user’s interests and activity space change over time, we propose USTTM, where users have different region and topic distributions at different times.
- In order to discover “macroscopic” patterns in a global area, we propose MSTTM, employing words of tweets that are shared between cities to learn the topics of user interests.
- We perform an experimental evaluation of the microscopic model on Twitter and Gowalla data sets from New York City in terms of perplexity, accuracy of POI recommendation, and accuracy of user and time prediction. Our results show that the USTTM model achieves better performance than the state-of-the-art models. The performance of the macroscopic model MSTTM is evaluated on a Twitter US data set, demonstrating a substantial improvement of POI recommendation accuracy compared to the microscopic models.
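The information loss caused by discretizing time can be seen in a small example. The sketch below assumes hourly bins and uses invented timestamps; it does not show how USTTM or MSTTM represent time, only why a binned representation can distort how close two check-ins are.

```python
from datetime import datetime


def hour_bin(ts):
    """Discretized time: keep only the hour of day (0-23)."""
    return ts.hour


def minutes_apart(a, b):
    """Continuous time: exact gap between two timestamps, in minutes."""
    return abs((a - b).total_seconds()) / 60.0


# Hypothetical check-in timestamps, invented for illustration.
t1 = datetime(2014, 5, 1, 8, 59)
t2 = datetime(2014, 5, 1, 9, 1)
t3 = datetime(2014, 5, 1, 9, 59)

# Hourly bins separate t1 and t2, although they are 2 minutes apart ...
same_bin_12 = hour_bin(t1) == hour_bin(t2)
# ... and merge t2 and t3, although they are 58 minutes apart.
same_bin_23 = hour_bin(t2) == hour_bin(t3)

gap_12 = minutes_apart(t1, t2)
gap_23 = minutes_apart(t2, t3)
```

With hourly bins, the two check-ins that are closest in time end up in different bins, while two check-ins almost an hour apart become indistinguishable. A continuous representation keeps the exact gaps.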

CONCLUSION

The rapidly increasing availability of check-in data opens up a wide variety of novel applications. In this paper, we first propose a novel microscopic spatio-temporal topic model, USTTM, to capture the spatial and temporal patterns of user movements and interests. We use continuous time to model check-in data, rather than discretized time, avoiding the loss of information through discretization. USTTM can discover temporal topics and regions. It captures the property that a user’s interests and activity space change over time: users have different region and topic distributions at different times in this model. However, USTTM captures “microscopic” patterns within a single city, where users share POIs, and cannot discover “macroscopic” patterns in a global area, where users check in to different POIs. Therefore, we also propose a macroscopic spatio-temporal topic model, MSTTM, which employs words of tweets that are shared between cities to learn the topics of user interests. We perform an experimental evaluation on Twitter and Gowalla data sets from New York City and on a Twitter US data set. In our qualitative analysis, we perform experiments with USTTM to discover temporal topics, e.g., how the topic “tourist destinations” changes over time, and to demonstrate that MSTTM indeed discovers macroscopic, generic topics. In our quantitative analysis, we evaluate the effectiveness of USTTM in terms of perplexity, accuracy of POI recommendation, and accuracy of user and time prediction. Our results show that USTTM achieves better performance than the state-of-the-art models, confirming that it is more natural to model time as an upstream variable affecting the other variables. Finally, the performance of the macroscopic model MSTTM is evaluated on a Twitter US data set, demonstrating a substantial improvement of POI recommendation accuracy compared to the microscopic models.
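Two of the evaluation measures named above, perplexity and POI recommendation accuracy, can be sketched as follows. These are common textbook definitions and are an assumption here; the exact formulas used in the evaluation may differ. The function names, probabilities, and POI lists are hypothetical.

```python
import math


def perplexity(log_probs):
    """exp of the negative mean log-likelihood of held-out observations."""
    return math.exp(-sum(log_probs) / len(log_probs))


def accuracy_at_k(ranked_lists, true_pois, k):
    """Fraction of test check-ins whose true POI is among the top-k recommendations."""
    hits = sum(
        1 for ranked, truth in zip(ranked_lists, true_pois) if truth in ranked[:k]
    )
    return hits / len(true_pois)


# Hypothetical model probabilities for four held-out check-ins.
held_out_probs = [0.25, 0.25, 0.25, 0.25]
ppl = perplexity([math.log(p) for p in held_out_probs])

# Hypothetical ranked recommendations and the POIs actually visited.
ranked = [
    ["cafe", "park", "museum"],
    ["gym", "cafe", "bar"],
    ["bar", "park", "gym"],
]
truth = ["park", "museum", "bar"]
acc_at_1 = accuracy_at_k(ranked, truth, 1)
acc_at_2 = accuracy_at_k(ranked, truth, 2)
```

Lower perplexity means the model assigns higher probability to held-out data. In this toy case a model that gives every check-in probability 0.25 has a perplexity of about 4, and the true POI is found for one of three test check-ins at k = 1 and for two of three at k = 2.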

REFERENCES

[1] Z. Yin, L. Cao, J. Han, C. Zhai, and T. Huang, “Geographical topic discovery and comparison,” in Proceedings of the 20th international conference on World wide web. ACM, 2011, pp. 247–256.

[2] L. Hong, A. Ahmed, S. Gurumurthy, A. J. Smola, and K. Tsioutsiouliklis, “Discovering geographical topics in the twitter stream,” in Proceedings of the 21st international conference on World Wide Web. ACM, 2012, pp. 769–778.

[3] Q. Yuan, G. Cong, Z. Ma, A. Sun, and N. M. Thalmann, “Who, where, when and what: discover spatio-temporal topics for twitter users,” in Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2013, pp. 605–613.

[4] B. Hu, M. Jamali, and M. Ester, “Spatio-temporal topic modeling in mobile social media for location recommendation,” in Data Mining (ICDM), 2013 IEEE 13th International Conference on. IEEE, 2013, pp. 1073–1078.

[5] B. Hu and M. Ester, “Spatial topic modeling in online social media for location recommendation,” in Proceedings of the 7th ACM conference on Recommender systems. ACM, 2013, pp. 25–32.

[6] A. Ahmed, L. Hong, and A. J. Smola, “Hierarchical geographical modeling of user locations from social media posts,” in Proceedings of the 22nd international conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2013, pp. 25–36.

[7] C. C. Kling, J. Kunegis, S. Sizov, and S. Staab, “Detecting non-gaussian geographical topics in tagged photo collections,” in Proceedings of the 7th ACM international conference on Web search and data mining. ACM, 2014, pp. 603–612.

[8] S. Sizov, “Geofolk: latent spatial semantics in web 2.0 social media,” in Proceedings of the third ACM international conference on Web search and data mining. ACM, 2010, pp. 281–290.

[9] Y. Liu, B. Zhou, F. Chen, and D. W. Cheung, “Graph topic scan statistic for spatial event detection,” in Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. ACM, 2016, pp. 489–498.

[10] D. M. Blei and J. D. Lafferty, “Dynamic topic models,” in Proceedings of the 23rd international conference on Machine learning. ACM, 2006, pp. 113–120.