Stochastic generative models for citation networks of scientific and technical articles
Conference
64th ISI World Statistics Congress
Format: IPS Abstract
Session: IPS 389 - Big Data Analysis of Scientific Networks: Methods and Insights
Tuesday 18 July 2 p.m. - 3:40 p.m. (Canada/Eastern)
Abstract
Citations among scientific and technical articles can be represented as a citation network, in which nodes represent articles, each with a discrete publication time, and directed edges represent citations.
We first propose a stochastic generative model in which a citation between two articles occurs with a probability that depends on the type of the citing article, the importance of the cited article, and the difference between their publication times. We take the out-degree of an article as its type and the in-degree as its importance. The model assumes three structures: a logistic function for the expected number of articles published at each discrete time, an inverse Gaussian distribution to approximate the aging effect, and an exponential distribution to approximate the out-degree distribution. Edges are generated by two mechanisms: preferential attachment and triad formation. We show that the model can generate networks that approximate the in-degree distribution, the out-degree distribution, and the distribution of the number of triangles of several scientific citation networks.
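The ingredients of the first model can be illustrated with a minimal Python sketch. It is not the authors' implementation: every parameter value (cap, rate, midpoint, mu, lam, mean_out, p_triad), the rounding of the out-degree, and the way in-degree and age are combined into a single attachment weight are assumptions made only for illustration.

```python
import math
import random


def logistic_count(t, cap=40.0, rate=0.5, midpoint=10.0):
    """Expected number of articles published at discrete time t (assumed parameters)."""
    return cap / (1.0 + math.exp(-rate * (t - midpoint)))


def inverse_gaussian_pdf(x, mu=5.0, lam=10.0):
    """Inverse Gaussian density, used here as the aging weight for an age x."""
    if x <= 0:
        return 0.0
    return math.sqrt(lam / (2.0 * math.pi * x**3)) * math.exp(
        -lam * (x - mu) ** 2 / (2.0 * mu**2 * x)
    )


def generate(steps=20, mean_out=4.0, p_triad=0.3, seed=0):
    """Grow a citation network; returns (pub_time, cites, in_deg) lists indexed by article."""
    rng = random.Random(seed)
    pub_time, cites, in_deg = [], [], []
    for t in range(1, steps + 1):
        n_new = max(1, round(logistic_count(t)))
        n_old = len(pub_time)  # only articles from earlier time steps can be cited
        batch = []
        for _ in range(n_new):
            targets = set()
            if n_old > 0:
                # Out-degree: floor of an exponential draw, capped by available articles.
                k = min(n_old, int(rng.expovariate(1.0 / mean_out)))
                # Assumed weight: (in-degree + 1) times the aging density of the age.
                weights = [
                    (in_deg[j] + 1) * inverse_gaussian_pdf(t - pub_time[j])
                    for j in range(n_old)
                ]
                while len(targets) < k:
                    candidates = []
                    if targets and rng.random() < p_triad:
                        # Triad formation: cite something an already chosen target cites.
                        candidates = [
                            c for tgt in targets for c in cites[tgt] if c not in targets
                        ]
                    if candidates:
                        targets.add(rng.choice(candidates))
                    else:
                        # Preferential attachment with aging (duplicates are redrawn).
                        targets.add(rng.choices(range(n_old), weights=weights)[0])
            batch.append(targets)
        # Add the new articles only after the whole time step has been generated.
        for targets in batch:
            pub_time.append(t)
            cites.append(targets)
            in_deg.append(0)
            for j in targets:
                in_deg[j] += 1
    return pub_time, cites, in_deg
```

Calling generate() returns the publication times, the citation sets, and the in-degrees, from which empirical in-degree, out-degree, and triangle-count distributions could be tabulated and compared with an observed network.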
We then find that this model does not fit patent citation networks well. We therefore propose a modified model that treats the ratio of triad formation as a random variable rather than as a constant parameter, as it is in the first model. We find that the modified model provides a better fit to the patent citation network.
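The difference between a constant and a random triad formation ratio can be sketched as follows. The abstract does not state which distribution the ratio follows or whether it varies per article; the Beta distribution, its parameters alpha and beta, the per-article draw, and the function name are all assumptions chosen for illustration.

```python
import random


def mechanism_choices(n_articles, n_citations, alpha=2.0, beta=5.0, seed=0):
    """For each article, draw its own triad formation ratio and then decide,
    citation by citation, whether triad formation (True) or preferential
    attachment (False) would generate the edge."""
    rng = random.Random(seed)
    result = []
    for _ in range(n_articles):
        # Assumed: one Beta-distributed ratio per citing article.
        ratio = rng.betavariate(alpha, beta)
        flags = [rng.random() < ratio for _ in range(n_citations)]
        result.append((ratio, flags))
    return result
```

With a constant ratio every article mixes the two mechanisms in the same proportion; with a random ratio some articles rely mostly on triad formation and others mostly on preferential attachment, which allows more variation in the number of triangles per article.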