Database page: ‘A dynamic programming algorithm for a maximum \(s\)-clique set on trees’

Home Page
Artifact Page
Reproducibility of results

Authors: José Alberto Fernández-Zepeda, Alejandro Flores-Lamas, Matthew Hague, Joel Antonio Trejo-Sánchez.

Description

This webpage contains the dataset for the experiments in the article “A dynamic programming algorithm for a maximum \(s\)-clique set on trees”. This presentation-only paper is currently under consideration for acceptance for The 16th International Conference on Reachability Problems (RP’22).


Abstract

Given an undirected graph \(G = (V_G, E_G)\), a clique \(C\) is a complete subgraph of \(G\). In social networks analysis, the unit distance of the clique makes it challenging to model certain social concepts. For those applications, a relaxed variant of the clique is more appropriate. A \(s\)-clique \(Q\) is a maximal subgraph of \(G\), such that the distance in \(G\) between any pair of vertices of \(Q\) is less than or equal to some positive integer \(s\). In this sense, a clique is also a \(1\)-clique. The maximum \(s\)-clique problem consists of finding a \(s\)-clique with the greatest amount of vertices in \(G\). Such a problem is NP-hard for arbitrary graphs and any \(s\). In this work, we propose a dynamic programming algorithm that solves this problem on a tree of order \(n\) in \(O(s \cdot n)\) time, for \(s \geq 2\). Our algorithm improves, theoretically and experimentally, the performance of previous algorithms that compute a maximum \(s\)-clique on trees.

Keywords: \(s\)-clique, Graph algorithms, Trees and Dynamic programming.


Data documentation

The data set consists of six classes of trees.



Table 1. Source files (in GML format) for the cases of studies; files are zipped.

Case of study File Case of study File
\({\cal T}_{22\_16}\) T_22_16.zip \(T_L\) T_L.zip
\({\cal T}_{Ph}\) T_Ph.zip \(T_B\) T_B.zip
\(T_D\) T_D.zip \(T_{\eta\mathrm{e}{6}}\) T_eta.zip
All files DB.zip

Metrics

We used the wall-clock running time required by Schäfer’s1 and Max-DkT2 implementations. We reported and compared the time required for computing an MDkC3 size on each input graph; i.e., we did not account for operations such as reading/writing files, data structure initialisation, and MDkC vertices identification. Since both algorithms are correct, we did not consider the size of the MDkC.


Computing environment

We implemented both algorithms with Python4 3.10 and the Python package NetworkX5 version 2.8. Experiments ran on a 2012 MacBook Pro @2.3 GHz Quad-Core Intel Core i7, running a 64 bit macOS Catalina version 10.15.7 with a total memory of 8 GB (1600 MHz DDR3). This computer executed each algorithm implementation sequentially without any resource allocation. The graph format is GML6.


Experimental results

Table 2. Wall-clock running time of Schäfer’s implementation and MaxDkT on the six cases of studies. We denote ‘timeouts’ and ‘not applicable’ by ‘—’ and ‘n/a’, respectively.

Running time in seconds
Row Graph \(t_{\text{Sch{\"a}fer}}\) \(t_{\text{Max-D}k\text{T}}\) \(\frac{t_{\text{Sch{\"a}fer}}}{t_{\text{Max-D}k\text{T}}}\)
1 \({\cal T}_{22\_16}\) 132.4492 60.8545 2.1
2 \(T_D,\; k = 4\) 109.9832 1.8577 59.2
3 \(T_D,\; k = 5\) 116.5945 1.5734 74.1
4 \(T_L, \; k = 10\) 1.5289 0.5584 2.7
5 \(T_L, \; k = 100\) 12.6688 4.8969 2.5
6 \(T_L, \; k = 1\,000\) 602.9626 45.2492 13.3
7 \(T_L, \; k = 10\,000\) 19,859.2017 241.7339 82.1
8 \(T_B, \; k = 5\) 117.9299 2.0445 57.6
9 \(T_B, \; k = 10\) 121.7708 1.9870 61.2
10 \(T_B, \; k = 15\) 125.8261 1.9811 63.5
11 \(T_B, \; k = 20\) 125.4565 1.9592 64.0
12 \(T_B, \; k = 25\) 126.7537 1.9809 63.9
13 \(T_B, \; k = 30\) 148.4265 1.9831 74.8
14 \({\cal T}_{Ph}\) 719.4773 185.4435 3.8
15 \(T_{0.3\mathrm{e}{6}},\; k = 10\) 1,022.6843 9.4978 107.7
16 \(T_{0.3\mathrm{e}{6}},\; k = 100\) 1,439.42 17.3046 83.1
17 \(T_{0.3\mathrm{e}{6}},\; k = 500\) 5,989.8246 74.7689 80.1
18 \(T_{0.5\mathrm{e}{6}},\; k = 10\) 2,945.4381 15.6059 188.7
19 \(T_{0.5\mathrm{e}{6}},\; k = 100\) 3,654.6010 44.1117 82.8
20 \(T_{0.5\mathrm{e}{6}},\; k = 500\) 2,661.9666 n/a
21 \(T_{0.75\mathrm{e}{6}},\; k = 10\) 6,436.4398 22.7611 282.7
22 \(T_{0.75\mathrm{e}{6}},\; k = 100\) 7,807.3684 94.7391 82.4
23 \(T_{0.75\mathrm{e}{6}},\; k = 500\) 5,657.6981 n/a
24 \(T_{1\mathrm{e}{6}},\; k = 10\) 11,529.7250 31.8786 361.6
25 \(T_{1\mathrm{e}{6}},\; k = 100\) 14,167.638 198.9311 71.2
26 \(T_{1\mathrm{e}{6}},\; k = 500\) 11,422.2738 n/a



Table 3. Additional information and raw data.

Case of study File Case of study File
\({\cal T}_{22\_16}\) T_22_16.tsv \(T_L\) T_L.tsv
\({\cal T}_{Ph}\) T_Ph.tsv \(T_B\) T_L.tsv
\(T_D\) T_D.tsv \(T_{\eta\mathrm{e}{6}}\) Random.tsv

Note: In the .tsv files, the column labelled as ‘diameter’ represents the value for the integer \(k\).


License


  1. Schäfer, A.: Exact algorithms for s-club finding and related problems. Ph.D. thesis, Friedrich-Schiller-University Jena (2009).↩︎

  2. Our dynamic programming algorithm for a maximum distance \(s\)-clique set on trees.↩︎

  3. A maximum distance \(s\)-clique on trees.↩︎

  4. Van Rossum, G., Drake, F.L.: Python 3 Reference Manual. CreateSpace, Scotts Valley, CA (2009)↩︎

  5. Hagberg, A., Swart, P., S Chult, D.: Exploring network structure, dynamics, and function using networkx. Tech. rep., Los Alamos National Lab.(LANL), Los Alamos, NM (United States) (2008)↩︎

  6. Himsolt, M., 1997. GML: A portable graph file format. Technical report, Universitat Passau.↩︎