Community analysis software

About

We are glad that several people sent us positive comments on our WWW07 poster paper, "Finding community structure in Mega-scale social networks [extended abstract]". We also received requests for the Java implementation of our idea. We apologize for not having released the software publicly earlier; I have had heavy teaching duties and limited time to maintain the software.

It is finally available! Please download the zip archive, which contains a command-line tool and sample files. Comments and bug reports are welcome.

Terms of use

  • Free of charge for academic purposes. For any other use, contact me. Contact information is available further down this page.
  • We would be delighted if you cited the following paper in your research publications. Please also let us know about your work and send us your paper.

    Ken Wakita and Toshiyuki Tsurumi, "Finding community structure in a mega-scale social networking service", in Proceedings of the IADIS International Conference on WWW/Internet 2007, pp. 153-162, October 2007.

Requirements

Installation

  1. Download and unpack our software. The command-line tool (tool.pl) is in the folder named "wt2007www".
  2. Download and unpack the CSV tokenizer. It contains a file named "csv.jar". Put this file in the wt2007www folder.
  3. (Optional) Use your favorite text editor to modify the Java runtime settings (line 15 of tool.pl). By default, a 1 GB heap is used. If your computer has less memory, lower this value. If your dataset does not fit, use a larger value.

Usage

Our software takes a network dataset and performs bottom-up hierarchical clustering in a manner similar to the one presented in the well-known paper by Clauset, Newman, and Moore. The input dataset and the output are both in binary form.

Please prepare your dataset in text form. You can use our tool to convert your text-based dataset into binary form, perform hierarchical clustering, and view the dendrogram and clusters.
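
To illustrate what bottom-up hierarchical clustering by modularity means, here is a minimal Python sketch. It is not our Java implementation and does not reproduce the HN or HE algorithms: it is a naive greedy merge that recomputes the modularity Q from scratch at every step, so it is only suitable for tiny graphs. The function names (modularity, greedy_merge) are made up for this example. The list of merges it returns plays the role of the dendrogram.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = {}  # community -> number of edges with both ends inside it
    degree = {}    # community -> sum of the degrees of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


def greedy_merge(edges):
    """Repeatedly merge the pair of connected communities that gives the highest Q.

    Returns (history, best_partition, best_q), where history is a list of
    (kept_label, absorbed_label, q_after_merge) tuples in merge order.
    """
    nodes = sorted({n for edge in edges for n in edge})
    community_of = {n: n for n in nodes}  # every node starts alone
    best_q = modularity(edges, community_of)
    best_partition = dict(community_of)
    history = []
    while True:
        # Only communities joined by at least one edge are merge candidates.
        pairs = {tuple(sorted((community_of[u], community_of[v])))
                 for u, v in edges if community_of[u] != community_of[v]}
        if not pairs:
            break
        candidates = []
        for a, b in sorted(pairs):
            trial = {n: (a if c == b else c) for n, c in community_of.items()}
            candidates.append((modularity(edges, trial), a, b, trial))
        q, a, b, community_of = max(candidates, key=lambda t: t[0])
        history.append((a, b, q))
        if q > best_q:
            best_q, best_partition = q, dict(community_of)
    return history, best_partition, best_q


if __name__ == "__main__":
    # Two triangles joined by a single edge.
    example = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
    merges, partition, q = greedy_merge(example)
    for kept, absorbed, q_after in merges:
        print(f"merge {absorbed} into {kept}: Q = {q_after:.4f}")
    print("best partition:", partition, "Q =", round(q, 4))
```

The partition with the highest Q along the way is the one usually reported as the set of clusters. Efficient implementations avoid recomputing Q and instead maintain the change in Q for each pair of connected communities.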

Preparation of your dataset

Currently, our tool supports two data formats: adjacency matrix (AM) and CSV. Please consult "karate.am" and "karate.csv" to see the structure of these file formats.

Note 1
In AM form, every line has exactly as many 0s and 1s as there are nodes in the graph.
Note 2
In CSV form, each line starts with a node number. Node numbers start at 1 and increase sequentially.
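
The two notes above can be checked before conversion with a small script. The following Python sketch is not part of our distribution, and its function names are invented for this example. It assumes that entries in an AM file are separated by whitespace, commas, or nothing at all, and that CSV fields are comma-separated. Please compare with "karate.am" and "karate.csv" to confirm the actual layout. The symmetry check is also an assumption, based on the fact that the software handles undirected graphs only.

```python
def check_adjacency_matrix(text):
    """Return a list of problems found in adjacency-matrix text (empty if none)."""
    # Assumption: separators are whitespace or commas (or absent).
    rows = [[ch for ch in line if not ch.isspace() and ch != ","]
            for line in text.splitlines() if line.strip()]
    n = len(rows)
    problems = []
    for i, row in enumerate(rows, start=1):
        if any(ch not in "01" for ch in row):
            problems.append(f"line {i}: contains characters other than 0 and 1")
        elif len(row) != n:
            problems.append(f"line {i}: has {len(row)} entries, expected {n}")
    if not problems:
        # Assumption: an undirected graph needs a symmetric matrix.
        for i in range(n):
            for j in range(i + 1, n):
                if rows[i][j] != rows[j][i]:
                    problems.append(f"entries ({i + 1},{j + 1}) and ({j + 1},{i + 1}) differ")
    return problems


def check_csv_node_numbers(text):
    """Check that line k of the CSV text starts with the node number k."""
    problems = []
    lines = [line for line in text.splitlines() if line.strip()]
    for expected, line in enumerate(lines, start=1):
        first = line.split(",")[0].strip()
        if first != str(expected):
            problems.append(f"line {expected}: starts with {first!r}, expected {expected}")
    return problems


if __name__ == "__main__":
    print(check_adjacency_matrix("0 1 1\n1 0 0\n1 0 0\n"))
    print(check_csv_node_numbers("1,2,3\n2,1\n3,1\n"))
```

Both functions return an empty list when no problem is found.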

Converting your dataset into binary form

If you have prepared your dataset as an adjacency matrix, run our tool as follows:

./tool.pl --adjacency-matrix dataset.am --socnet dataset

If your dataset is in CSV form, then:

./tool.pl --csv dataset.csv --socnet dataset

Use the --adjacency-matrix or --csv option to specify the type and name of your text dataset, and the --socnet option to specify the name of the binary dataset. The binary form comprises two files. In the above cases, dataset.idx and dataset.db are produced.

Running the clustering algorithm

Execute the following command line:

./tool.pl --analyze dataset

We offer two analysis algorithms, HN and HE. HN is used by default. To choose the algorithm explicitly, use the --algorithm option:

./tool.pl --analyze dataset --algorithm HE

You can specify the output file names with the --history-path and --cluster-path options.

Viewing the result

The result of the clustering is stored in binary form in a history database and a cluster database. The following command line shows the clusters:

./tool.pl --view-clusters dataset-history

The following command line shows the dendrogram:

./tool.pl --view-history dataset-history

Contact information

Send me (Ken Wakita) an email at the following address (replace "at" with @ and "dot" with a period, and remove the spaces).

wakita at is dot titech dot ac dot jp

Limitations

  1. The software supports neither directed graphs nor weighted graphs.
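
If your data is directed or weighted, one possible workaround is to flatten it into an undirected 0/1 matrix before conversion. The Python sketch below is only an illustration and is not part of our software. The function names, the threshold parameter, the removal of self-loops, and the space-separated output are all assumptions. Note that this step discards edge direction and weight, so the clustering result reflects only which nodes are connected.

```python
def to_undirected_unweighted(weights, threshold=0.0):
    """Turn a square weight matrix into a symmetric 0/1 adjacency matrix.

    Nodes i and j are linked if the weight in either direction exceeds
    the threshold. Self-loops are dropped (an assumption of this sketch).
    """
    n = len(weights)
    return [[1 if i != j and (weights[i][j] > threshold or weights[j][i] > threshold) else 0
             for j in range(n)]
            for i in range(n)]


def to_am_text(matrix):
    """Render the matrix one row per line, entries separated by spaces (assumed layout)."""
    return "\n".join(" ".join(str(x) for x in row) for row in matrix) + "\n"


if __name__ == "__main__":
    weighted = [[0, 2.5, 0],
                [0, 0, 0],
                [0.1, 0, 3]]
    print(to_am_text(to_undirected_unweighted(weighted)), end="")
```

Raising the threshold keeps only the stronger links, which is one simple way to retain a little of the weight information.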

Issues

tool.pl does not work on Windows (fixed)

Guillaume Roelly pointed out a problem with running the software on Windows. The issue is related to the way the Java interpreter accepts the classpath, which differs between Windows and other platforms. This issue is fixed in the latest distribution. If you have an older one, please get the latest.

The following is a sample session:

run --command am2sdb --input karate
run --command csv2sdb --input karate
run --command sym2sdb --input karate
run --command analyze --input karate
run --command analyze --algorithm HN --input karate
run --command analyze --algorithm HE --input karate
run --command view-history --input karate
run --command view-clusters --input karate

This new revision offers a simpler command-line syntax and is recommended for Linux and Windows users alike.

Issue 2:

Gang Su sent me a bug report saying that when he omits the --algorithm option with the original revision of the software, the clustering process generates the following error message:

Exception in thread "main" java.lang.NoClassDefFoundError:
jp/ac/titech/is/socialnet/clustering//Analyze

This is my fault, as I did not specify a safe default for the --algorithm option. When you run "tool.pl", please remember to always specify the --algorithm option. You can omit --algorithm with revision 159 and later.

Publications

Please refer to the Reference section of this page.


  1. I tried to run the algorithm. I unpacked wt2007www-160.zip, but I can't find tool.pl in the wt2007www folder.

    Thank you in anticipation
    Albert Soos

    1. Hi Albert,

      We are sorry that the documentation you found here is out of date. For the moment, please run the run.sh shell script if you are on UNIX or run.bat if you are on Windows. I am going to update the documentation soon.

      Sorry for the inconvenience.

      Ken

  2. Dear Ken!

    I'm on the Windows platform. I have installed the ActivePerl interpreter and JRE 6. I put csv.jar in the wt2007www folder. I tried:

    run --command am2sdb --input karate
    run --command csv2sdb --input karate
    run --command analyze --algorithm HN --input karate
    run --command analyze --algorithm HE --input karate

    I got error messages: Java.lang.NoClassDefFoundError =java -Xmx1G -classpath $CLASSPATH $MAIN, could not find the main class.
    The export and the MAIN commands are unknown to the system. I don't know what I'm doing wrong.

    I'm sorry for the disturbance.

    Albert

    1. Something simple like this should work.
      java -Xmx1g -Djava.ext.dirs=jar jp.ac.titech.is.socialnet.clustering.Main

  3. Hi Ken

    I am working on a research project and wish to use your community detection algorithm not only to find the final clusters but also to observe how they are formed. I want to see which two clusters merge at every step. Using your commands, this does not seem possible. Also, you have not included the source code in the zip archive, so I cannot change it accordingly.

    Thanks
    Sahil

    1. Mr. Sahil,
      Could you send me your email address to “wakita at is dot titech dot ac dot jp”?

  4. Thanks for making this available! I just wanted to notify you that the contents of run.bat are identical to the run.sh (and thus not working on windows). I think this is the reason for the problem that Albert has reported.

  5. Hi Ken

    I am working on a research project and wish to compare your community detection algorithm with my algorithm. I want to see which two clusters merge at every step. Using your commands, this does not seem possible. Also, you have not included the source code in the zip archive. I would be grateful if you could send me the source code.

    Thank you.
    Darine

  6. A newer revision of the software is coming soon.

    I packed all the necessary jar files into a single jar file. Now you can run the software simply with:

    java -jar wt2007www.jar …

    You no longer have to set up the class path or mess around with the run.sh/run.bat/run.pl scripts.

    The new functionality dumps the clustering process with the view-history command.

    The software is available from
    http://www.is.titech.ac.jp/~wakita/software/wt2007www-161.zip

  7. Mr Ken,

    thank you very very much Sir :)
    and I'm so sorry for the disturbance.

    Darine

  8. Hi Mr Ken, I obtain the following error when I run "java -jar wt2007www.jar --command am2sdb --input karate":

    Exception in thread "main" java.lang.UnsupportedClassVersionError: Bad version number in .class file
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:676)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:56)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:317)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:280)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:375)

    I'm running on Mac OS X, java version "1.5.0_30"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_30-b03-389-9M3425)

  9. Hi Mr Ken, I managed to run version 160. On my network I obtain 13 clusters, but when I run the 'view-clusters' command it shows only 3 clusters. Why is this?

    1. Great! I will look into your view-clusters problem. Please bear with me.

  10. Sorry to bother you again, Mr Ken. I have these two questions:

    How can I see the modularity of the network?
    Does AVG K size refer to the average size (number of elements) of the clusters?

  11. Greetings, I'm running the code on a 10,000-node network with the Java runtime memory set to 3 GB (-Xmx3g). When I run the analyze command I get an Out Of Memory error. How much memory do I need for such a network? How much for a network of one hundred thousand nodes?

    (I'm running on a dual-core machine with 4 GB of RAM.)

  12. Dear Mr Wakita,

    Thank you very much for sharing your code with the research community.
    I’m working on community detection methods at Université catholique de Louvain (Belgium) and I’m particularly interested in weighted networks.
    Is your method able to tackle the community detection problem with weighted adjacency matrix given as input?
    If not, do you think it would be possible for you to develop this additional option within your Java environment?
    Thank you very much,

    Best Regards,
    Arnaud Browet
