Community analysis software

About

We are glad that several people sent us positive comments on our WWW07 poster paper, "Finding community structure in Mega-scale social networks [extended abstract]". We also received requests for the Java implementation of our idea. We apologize for not releasing the software publicly earlier; I have had heavy teaching duties and limited time to maintain it.

The software is finally available! Please download the zip archive, which contains a command-line tool and sample files. Comments and bug reports are welcome.

Terms of use

  • Free of charge for academic purposes. For any other use, please contact me. Contact information is available in the Contact information section of this page.
  • We would be delighted if you cited the following paper in your research publications. Please also let us know about your work and send us your paper.

    Ken Wakita and Toshiyuki Tsurumi, "Finding community structure in a mega-scale social networking service", in proceedings of IADIS international conference on WWW/Internet 2007, pp. 153-162, October 2007.

Requirements

Installation

  1. Download and unpack our software. You will find our command-line tool (tool.pl) in the folder/directory named "wt2007www".
  2. Download and unpack the CSV tokenizer. You will find a file named "csv.jar". Put it in the wt2007www folder/directory.
  3. (Optional) Use your favorite text editor to modify the Java runtime settings (line 15 of tool.pl). By default, a 1 GB heap is used. If your computer has less memory, lower this value. If your dataset does not fit, use a larger value.

Usage

Our software takes a network dataset and performs bottom-up hierarchical clustering in a manner similar to the idea presented in the famous paper by Clauset, Newman, and Moore. The input dataset and the output are both in binary form.

Please prepare your dataset in text form. You can use our tool to convert your text-based dataset into binary form, perform hierarchical clustering, and view the dendrogram and clusters.
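To make "bottom-up hierarchical clustering" concrete, here is a minimal Python sketch of naive greedy modularity merging in the spirit of Clauset et al. It is not the Java code shipped in the archive and does not implement the HN or HE variants; the function names `modularity` and `greedy_merge_history` are illustrative only, and the sketch recomputes modularity from scratch at every step, so it is suitable only for tiny graphs.

```python
def modularity(edges, community_of):
    """Newman's modularity Q for an undirected, unweighted edge list."""
    m = len(edges)
    if m == 0:
        return 0.0
    inside = {}   # community -> number of edges with both ends inside it
    degree = {}   # community -> sum of the degrees of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


def greedy_merge_history(nodes, edges):
    """Start with one community per node and repeatedly merge the pair of
    connected communities whose union gives the highest modularity.

    Returns the merge history as a list of (kept, absorbed, Q) tuples,
    which is the information a dendrogram is drawn from.
    """
    community_of = {n: n for n in nodes}
    history = []
    while True:
        # Only communities joined by at least one edge are merge candidates.
        pairs = {tuple(sorted((community_of[u], community_of[v])))
                 for u, v in edges if community_of[u] != community_of[v]}
        if not pairs:
            break
        best = None
        for a, b in sorted(pairs):
            # Tentatively relabel every node of community b as community a.
            trial = {n: (a if c == b else c) for n, c in community_of.items()}
            q = modularity(edges, trial)
            if best is None or q > best[0]:
                best = (q, a, b, trial)
        q, a, b, community_of = best
        history.append((a, b, q))
    return history


if __name__ == "__main__":
    # Two triangles joined by a single edge.
    nodes = [1, 2, 3, 4, 5, 6]
    edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
    for kept, absorbed, q in greedy_merge_history(nodes, edges):
        print(f"merge {absorbed} into {kept}: Q = {q:.4f}")
```

The clustering is read off the history by cutting it at the step where Q peaks; later merges only lower the modularity.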

Preparation of your dataset

Currently, our tool supports two data formats: adjacency matrix (AM) and CSV. Please consult "karate.am" and "karate.csv" for the structure of these file formats.

Note 1
In AM form, every line contains exactly as many 0s and 1s as there are nodes in the graph.
Note 2
In CSV form, each line starts with a node number. Node numbers start at 1 and increase incrementally.
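Before converting a large adjacency matrix, it can help to check the rule in Note 1 yourself. The following Python sketch is an illustration, not part of our tool. It assumes that any separators between entries are characters other than 0 and 1 (please consult "karate.am" for the actual layout), and it additionally assumes that the matrix should be symmetric because directed graphs are not supported. The name `check_adjacency_matrix` is hypothetical.

```python
def check_adjacency_matrix(lines):
    """Return a list of human-readable problems; an empty list means OK.

    Only the characters '0' and '1' are counted as entries, so the check
    does not depend on which separator (if any) the file uses.
    """
    rows = [[ch for ch in line if ch in "01"]
            for line in lines if line.strip()]
    n = len(rows)  # one line per node
    problems = []
    for i, row in enumerate(rows, start=1):
        if len(row) != n:
            problems.append(f"line {i}: {len(row)} entries, expected {n}")
    if problems:
        return problems
    # Assumed extra check: an undirected graph has a symmetric matrix.
    for i in range(n):
        for j in range(i + 1, n):
            if rows[i][j] != rows[j][i]:
                problems.append(
                    f"entries ({i + 1},{j + 1}) and ({j + 1},{i + 1}) differ")
    return problems


if __name__ == "__main__":
    with open("karate.am") as f:
        for problem in check_adjacency_matrix(f):
            print(problem)
```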

Converting your dataset into a binary form

If you have prepared your dataset as an adjacency matrix, run our tool in the following manner.

./tool.pl --adjacency-matrix dataset.am --socnet dataset

If your dataset is in CSV form, then:

./tool.pl --csv dataset.csv --socnet dataset

Use the --adjacency-matrix or --csv option to specify the type and name of your text dataset, and the --socnet option to specify the name of the binary dataset. The binary form comprises two files. In the above cases, dataset.idx and dataset.db are produced.

Running the clustering algorithm

Execute the following command line:

./tool.pl --analyze dataset

We offer two analysis algorithms, HN and HE. HN is used by default. To choose the algorithm explicitly, use the --algorithm option:

./tool.pl --analyze dataset --algorithm HE

You can specify the output file names with the --history-path and --cluster-path options.
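If you run the conversion and analysis steps repeatedly, a small wrapper script may be convenient. The Python sketch below only assembles the command lines described above and, optionally, runs them. It is an illustration rather than part of the distribution: `build_commands` and `run_all` are hypothetical names, and passing --history-path and --cluster-path together with --analyze is an assumption about how these options combine.

```python
import subprocess


def build_commands(text_file, name, fmt="adjacency-matrix", algorithm=None,
                   history_path=None, cluster_path=None, tool="./tool.pl"):
    """Return the argument lists for the conversion and analysis steps."""
    if fmt not in ("adjacency-matrix", "csv"):
        raise ValueError("fmt must be 'adjacency-matrix' or 'csv'")
    if algorithm not in (None, "HN", "HE"):
        raise ValueError("algorithm must be 'HN' or 'HE'")

    # Step 1: text dataset -> binary dataset (name.idx and name.db).
    convert = [tool, f"--{fmt}", text_file, "--socnet", name]

    # Step 2: clustering; optional flags are added only when requested.
    analyze = [tool, "--analyze", name]
    if algorithm is not None:
        analyze += ["--algorithm", algorithm]
    if history_path is not None:
        analyze += ["--history-path", history_path]   # assumed combination
    if cluster_path is not None:
        analyze += ["--cluster-path", cluster_path]   # assumed combination
    return [convert, analyze]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(build_commands("karate.am", "karate", algorithm="HE"))
```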

Viewing the result

The result of the clustering is stored in binary form in a history database and a cluster database. Use the following command line to see the clusters:

./tool.pl --view-clusters dataset-history

The following command line shows the dendrogram:

./tool.pl --view-history dataset-history

Contact information

Please email me (Ken Wakita) at the following address (replace "at" with @ and "dot" with ., and remove the spaces).

wakita at is dot titech dot ac dot jp

Limitations

  1. The software supports neither directed graphs nor weighted graphs.

Issues

tool.pl does not work on Windows (fixed)

Guillaume Roelly pointed out a problem running the software on the Windows platform. The issue is related to the Java interpreter accepting the classpath differently on Windows than on other platforms. This issue is fixed in the latest distribution. If you have an older one, please download the latest.

The following is a sample session:

run --command am2sdb --input karate
run --command csv2sdb --input karate
run --command sym2sdb --input karate
run --command analyze --input karate
run --command analyze --algorithm HN --input karate
run --command analyze --algorithm HE --input karate
run --command view-history --input karate
run --command view-clusters --input karate

This new revision offers a simpler command-line syntax and is recommended for both Linux and Windows users.
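For readers who write their own launcher, the classpath difference behind this issue comes down to the separator between entries: the Java launcher expects ";" on Windows and ":" on UNIX-like systems. The Python sketch below shows one way to build the string portably. It illustrates the general problem, not the actual fix in our scripts; `join_classpath` is a hypothetical name, and the entries "." and "csv.jar" are only an example.

```python
import os


def join_classpath(entries, platform_is_windows=None):
    """Join classpath entries with the separator the Java launcher expects.

    When platform_is_windows is None, the current platform is detected.
    """
    if platform_is_windows is None:
        platform_is_windows = (os.name == "nt")
    separator = ";" if platform_is_windows else ":"
    return separator.join(entries)


if __name__ == "__main__":
    # Example entries only; adjust to the files in your wt2007www folder.
    print(join_classpath([".", "csv.jar"]))
```

On the platform where the script runs, the result is the same as `os.pathsep.join(entries)`; the explicit flag is there only so that both variants can be inspected on one machine.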

Issue 2:

Gang Su sent me a bug report: when the --algorithm option is omitted with the original revision of the software, the clustering process fails with the following error message:

Exception in thread "main" java.lang.NoClassDefFoundError:
jp/ac/titech/is/socialnet/clustering//Analyze

This is my fault, as I did not specify a safe default for the --algorithm option. With the original revision, always specify the --algorithm option when you run "tool.pl". You can omit --algorithm with revision 159 and later.

Publications

Please refer to the Reference section of this page.


  1. I tried to run the algorithm. I unpacked wt2007www-160.zip, but I can’t find tool.pl in the wt2007www folder.

    Thank you in anticipation
    Albert Soos

    1. Hi Albert,

      We are sorry that the documentation you found here is out of date. For the moment, please run the run.sh shell script if you are on UNIX or run.bat if you are on Windows. I am going to update the documentation soon.

      Sorry for the inconvenience.

      Ken

  2. Dear Ken!

    I’m on the Windows platform. I have installed the ActivePerl interpreter and JRE 6. I put csv.jar in the wt2007www folder. I tried:

    run --command am2sdb --input karate
    run --command csv2sdb --input karate
    run --command analyze --algorithm HN --input karate
    run --command analyze --algorithm HE --input karate

    I got the error message: Java.lang.NoClassDefFoundError =java -Xmx1G -classpath $CLASSPATH $MAIN, could not find the main class.
    The export and MAIN commands are unknown to the system. I don’t know what I’m doing wrong.

    I’m sorry for the disturbance.

    Albert
