Random walk on node cliques for high-quality samples to estimate large graphs with high accuracies and low costs

Citation Data: Knowledge and Information Systems, ISSN: 0219-3116, Vol: 64, Issue: 7, Page: 1909-1935

Publication Year: 2022

Citations: 3
Usage: 0
Captures: 2
Mentions: 0
Social Media: 0

Article Description

Random-walk-based sampling is an efficient way to extract and analyze the properties of large, complex graphs that represent social networks. In practice, however, existing random-walk-based sampling schemes rarely reach the desired node distribution, because the sampling budget (i.e., the number of samples or sampling steps) required to do so on large graphs is indeterminate. Under a small sampling budget, on the other hand, these methods produce low-quality samples with many repeats and high correlations (i.e., many common attributes), which leads to a large deviation from the desired node distribution and to large estimation errors. In this paper, we propose a new random-walk sampling scheme based on node cliques (a subset of cliques), called node-clique random walk (NCRW), which strikes a good balance between estimation error and sampling budget by producing unique samples with low correlations. We show, both theoretically and experimentally, that NCRW reduces the deviation from the desired node distribution and the estimation errors under a constrained sampling budget. As a result, the sampling costs, which are closely tied to the sampling budget, are reduced as well. Our extensive experimental evaluation on real-world datasets further confirms that NCRW significantly increases sample quality and estimation accuracy at much lower cost than existing random-walk-based sampling schemes, especially when estimating higher-order node attributes.
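
For orientation, here is a minimal sketch of a plain simple random walk, the kind of baseline scheme the abstract contrasts with. It is not NCRW and not the authors' code. The toy graph, the function names, and the step count are hypothetical choices made only for illustration. The sketch shows how a small sampling budget yields many repeated samples, and how sampled degrees are commonly re-weighted to correct the walk's bias toward high-degree nodes (on a connected, non-bipartite undirected graph, a simple random walk visits nodes in proportion to their degree).

```python
import random


def build_toy_graph(n=30):
    """Hypothetical toy graph: a ring of n nodes plus chords from node 0
    to every third node. Returns an adjacency dict {node: [neighbors]}."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        j = (i + 1) % n
        adj[i].add(j)
        adj[j].add(i)
    for i in range(3, n, 3):
        adj[0].add(i)
        adj[i].add(0)
    return {v: sorted(nbrs) for v, nbrs in adj.items()}


def simple_random_walk(adj, start, steps, rng):
    """Baseline sampler: at each step move to a uniformly chosen neighbor
    and record the node reached. Returns a list of `steps` samples."""
    node = start
    samples = []
    for _ in range(steps):
        node = rng.choice(adj[node])
        samples.append(node)
    return samples


def repeat_fraction(samples):
    """Fraction of samples that duplicate an earlier sample."""
    return 1.0 - len(set(samples)) / len(samples)


def estimate_mean_degree(adj, samples):
    """Re-weighted estimate of the average degree: because the walk favors
    high-degree nodes, each sample is weighted by 1/degree (this reduces
    to the harmonic mean of the sampled degrees)."""
    return len(samples) / sum(1.0 / len(adj[v]) for v in samples)


adj = build_toy_graph(30)
walk = simple_random_walk(adj, start=0, steps=200, rng=random.Random(0))
true_mean = sum(len(nbrs) for nbrs in adj.values()) / len(adj)

print("repeat fraction:", round(repeat_fraction(walk), 3))
print("estimated mean degree:", round(estimate_mean_degree(adj, walk), 3))
print("true mean degree:", round(true_mean, 3))
```

On a graph this small the repeat fraction is necessarily high, since 200 steps can visit at most 30 distinct nodes. Repeats and correlated samples of this kind are the sample-quality issues the abstract says NCRW is designed to reduce.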

Bibliographic Details

DOI: 10.1007/s10115-022-01691-8

URL ID: http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=85132545961&origin=inward; http://dx.doi.org/10.1007/s10115-022-01691-8; https://link.springer.com/10.1007/s10115-022-01691-8; https://dx.doi.org/10.1007/s10115-022-01691-8; https://link.springer.com/article/10.1007/s10115-022-01691-8

AUTHOR(S)

Lingling Zhang; Zhiwei Zhang; Guoren Wang; Fang Wang; Dan Feng; Yanwen Xie; Hong Jiang

PUBLISHER(S)

Springer Science and Business Media LLC

TAG(S)

Computer Science
