Discovering community structure is of great importance in complex networks such as social networks, biological networks, and online networks. Community detection in social media networks usually draws on two sources of information: the topological structure of the network and the attributes of its nodes. Rich node content attributes add both flexibility and challenges to community detection. Traditional methods either mine only one of the two sources, or linearly combine the community results obtained from each, and so cannot fuse the sources effectively. This paper treats the multidimensional attributes of nodes as an effective co-learning term for community partitioning, fuses the two sources, and proposes CDJMF, a community detection algorithm for networks with multi-attribute nodes based on joint matrix factorization, which improves the effectiveness and robustness of community detection. Experiments show that the proposed method can effectively use node attribute information to guide community detection and achieves higher-quality community partitions.

Community detection is an important problem in social network analysis. The goal is to partition the network into dense regions of the graph. Such dense regions typically correspond to entities that are closely related to each other and can hence be said to belong to a community. Detecting communities is of great importance in biological and social networks, and many detection methods have been proposed. When detecting communities in social media networks, there are two possible sources of information: the network link structure, and the features and attributes of the nodes. Nodes in social media networks carry abundant attribute information, which presents unprecedented opportunities and flexibility for the community detection process. Some community detection algorithms use only the links between nodes to determine the dense regions in the graph; such methods are based purely on the linkage structure of the underlying social media network. Other algorithms use the attributes of the nodes to cluster them, i.e., nodes with the same attributes are put into the same cluster. Because traditional methods use only one of the two sources, or simply combine linearly the community detection results obtained from the different sources, they cannot exploit node attributes effectively. In recent years, matrix factorization (MF) has received considerable interest from the data mining and information retrieval fields, and it has been successfully applied in document clustering, image representation, and other domains.
In this paper, we use node attributes as additional supervision for the community detection process and propose an algorithm based on joint matrix factorization (CDJMF). Our method rests on the assumption that the two information sources, linkage and node attributes, yield an identical node affiliation matrix. This assumption is reasonable and captures the inner relationship between the two sources, which allows the performance of community detection to be greatly improved. We also conduct experiments on three real social networks. Theoretical analysis and numerical simulation results show that our approach outperforms several classical algorithms, so it is an effective way to explore the community structure of social networks.
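The shared-affiliation assumption can be illustrated with a minimal sketch. The abstract does not give the CDJMF objective or update rules, so everything below is an assumption made for illustration: the objective `||A - U H^T||_F^2 + lam * ||X - U W^T||_F^2`, the standard multiplicative updates for nonnegative factorization, the weight `lam`, the function name `joint_nmf`, and the toy data are all hypothetical and should not be read as the authors' algorithm. Here `A` is the adjacency matrix, `X` is the node-attribute matrix, and `U` is the single affiliation matrix shared by both factorizations.

```python
import numpy as np

def joint_nmf(A, X, k, lam=1.0, n_iter=200, eps=1e-9, seed=0):
    """Illustrative joint factorization (not the paper's exact CDJMF):
    A ~ U @ H.T (links) and X ~ U @ W.T (attributes), sharing one
    nonnegative affiliation matrix U of shape (n_nodes, k)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, k))
    H = rng.random((n, k))
    W = rng.random((m, k))

    def loss():
        # Weighted sum of the two squared reconstruction errors.
        return (np.linalg.norm(A - U @ H.T) ** 2
                + lam * np.linalg.norm(X - U @ W.T) ** 2)

    losses = [loss()]
    for _ in range(n_iter):
        # Multiplicative updates keep every factor nonnegative.
        U *= (A @ H + lam * X @ W) / (U @ (H.T @ H + lam * W.T @ W) + eps)
        H *= (A.T @ U) / (H @ (U.T @ U) + eps)
        W *= (X.T @ U) / (W @ (U.T @ U) + eps)
        losses.append(loss())
    return U, H, W, losses

# Hypothetical toy network: two 4-node cliques with distinct attributes.
block = np.ones((4, 4)) - np.eye(4)
A = np.block([[block, np.zeros((4, 4))],
              [np.zeros((4, 4)), block]])
X = np.vstack([np.tile([1.0, 1.0, 0.0, 0.0], (4, 1)),
               np.tile([0.0, 0.0, 1.0, 1.0], (4, 1))])

U, H, W, losses = joint_nmf(A, X, k=2)
labels = U.argmax(axis=1)  # assign each node to its strongest community
```

Because `U` appears in both reconstruction terms, the link structure and the attributes jointly constrain each node's affiliation, rather than being clustered separately and merged afterwards. In this sketch, `lam` controls how strongly the attributes influence the result.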
Keywords:
- matrix factorization
- node attributes
- community detection