About †
We are glad that several people sent us positive comments on our WWW07 poster paper, "Finding community structure in Mega-scale social networks [extended abstract]". We also received requests for the Java implementation of our idea. We apologize for not releasing the software publicly earlier. I have had heavy teaching duties and limited time to maintain the software.
It is finally available! Please download the zip archive. It contains a command-line tool and sample files. Comments and bug reports are welcome.
Terms of use †
- Free of charge for academic purposes. For any other use, contact me. Contact information is available at the bottom of this page.
- We would be delighted if you included a reference to the following paper in your research publications. Please also let us know about your work and send us your paper.
Ken Wakita and Toshiyuki Tsurumi, "Finding community structure in a mega-scale social networking service", in proceedings of IADIS international conference on WWW/Internet 2007, pp. 153-162, October 2007.
Requirements †
- A Perl interpreter,
- Java 2 runtime environment,
- CSV tokenizer software, and
- our software.
Installation †
- Download and unpack our software. You can find our command-line tool (tool.pl) in the folder/directory named "wt2007www".
- Download and unpack the CSV tokenizer. You will find a file named "csv.jar". Put it in the wt2007www folder/directory.
- (Optional) Use your favorite text editor to modify the Java runtime settings (line 15 of tool.pl). By default, a 1GB heap is used. If your computer has less memory, lower this value. If your dataset does not fit, use a larger value.
Usage †
Our software takes a network dataset and performs bottom-up hierarchical clustering in a manner similar to the approach presented in the well-known paper by Clauset, Newman, and Moore. The input dataset and the output are both in binary form.
Please prepare your dataset in text form. You can use our tool to convert your text-based dataset into binary form, perform hierarchical clustering, and view the dendrogram and clusters.
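To make the idea of bottom-up clustering concrete, here is a minimal Python sketch of plain greedy modularity agglomeration in the spirit of Clauset, Newman, and Moore. It is not the HN or HE algorithm of this software, and the function names are made up for this illustration. It recomputes modularity from scratch for every candidate merge, which is only practical for tiny graphs; efficient implementations update the modularity gains incrementally.

```python
def modularity(edges, community_of):
    """Q = sum over communities c of (e_cc - a_c**2), undirected and unweighted."""
    m = len(edges)
    inside = {}  # community -> number of edges with both endpoints inside it
    degree = {}  # community -> total degree of its members
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


def greedy_merge(nodes, edges):
    """Start from singleton communities and repeatedly apply the best merge.

    Returns the final node -> community mapping and the merge history as a
    list of (kept_label, absorbed_label, modularity_after_merge) tuples.
    """
    community_of = {n: n for n in nodes}
    history = []
    q = modularity(edges, community_of)
    while True:
        # Only communities joined by at least one edge are merge candidates.
        candidates = {tuple(sorted((community_of[u], community_of[v])))
                      for u, v in edges if community_of[u] != community_of[v]}
        best = None
        for a, b in sorted(candidates):
            trial = {n: (a if c == b else c) for n, c in community_of.items()}
            gain = modularity(edges, trial) - q
            if best is None or gain > best[0]:
                best = (gain, a, b, trial)
        if best is None or best[0] <= 0:
            break  # no remaining merge improves modularity
        gain, a, b, community_of = best
        q += gain
        history.append((a, b, q))
    return community_of, history


if __name__ == "__main__":
    # Two triangles connected by a single edge.
    nodes = [1, 2, 3, 4, 5, 6]
    edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
    communities, history = greedy_merge(nodes, edges)
    print(communities)
    for kept, absorbed, q in history:
        print(f"merge {absorbed} into {kept}: Q = {q:.4f}")
```

The merge history plays the role of a dendrogram: replaying it from the start reproduces every intermediate clustering.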
Preparation of your dataset †
Currently, our tool supports two data formats: adjacency matrix (AM) and CSV. Please consult "karate.am" and "karate.csv" to see the structure of these file formats.
- Note 1
- In AM form, every line has exactly as many 0s and 1s as there are nodes in the graph.
- Note 2
- In CSV form, each line starts with a node number. Node numbers start at 1 and increase by one on each line.
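Note 1 can be checked before conversion. The following Python sketch is not part of the tool, and the function name is hypothetical. It assumes that the only 0 and 1 characters on a line are matrix entries and ignores any separator characters, so please compare against "karate.am" for the exact layout. The optional symmetry check is also an assumption, based on the fact that the software does not support directed graphs.

```python
def check_adjacency_matrix(lines, require_symmetric=True):
    """Return a list of problems found in adjacency-matrix text (empty if none)."""
    # Keep only the 0/1 characters of each non-blank line (assumed to be entries).
    rows = [[ch for ch in line if ch in "01"] for line in lines if line.strip()]
    n = len(rows)
    problems = []
    for i, row in enumerate(rows, start=1):
        if len(row) != n:
            problems.append(f"line {i}: {len(row)} entries, expected {n}")
    # Symmetry can only be tested once the matrix is known to be square.
    if require_symmetric and not problems:
        for i in range(n):
            for j in range(i + 1, n):
                if rows[i][j] != rows[j][i]:
                    problems.append(
                        f"entries ({i + 1},{j + 1}) and ({j + 1},{i + 1}) differ")
    return problems


if __name__ == "__main__":
    with open("karate.am") as f:
        for problem in check_adjacency_matrix(f):
            print(problem)
```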
Converting your dataset into a binary form †
If you have prepared your dataset as an adjacency matrix, run our tool in the following manner.
./tool.pl --adjacency-matrix dataset.am --socnet dataset
If your dataset is in CSV form, then:
./tool.pl --csv dataset.csv --socnet dataset
Use the --adjacency-matrix or --csv option to specify the type and name of your textual dataset, and the --socnet option to specify the name of the binary dataset. The binary form comprises two files. In the above cases, dataset.idx and dataset.db are produced.
Running the clustering algorithm †
Execute the following command line:
./tool.pl --analyze dataset
We offer two different analysis algorithms, HN and HE. By default, HN is used. If you want to state the algorithm explicitly, use the --algorithm option:
./tool.pl --analyze dataset --algorithm HE
You can specify the output file names with the --history-path and --cluster-path options.
Viewing the result †
The result of the clustering is stored in a history database and a cluster database, both in binary form. The following command line shows the clusters:
./tool.pl --view-clusters dataset-history
The following command line shows the dendrogram:
./tool.pl --view-history dataset-history
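The steps above (convert, analyze, view) can be chained from a script. The following Python sketch only assembles the command lines shown on this page and runs them in order. The helper names are made up for this illustration, and the name of the history database is passed in by the caller because it depends on your --history-path setting.

```python
import subprocess


def conversion_command(text_path, socnet, fmt="adjacency-matrix", tool="./tool.pl"):
    """Command line that converts a text dataset into the binary form."""
    if fmt not in ("adjacency-matrix", "csv"):
        raise ValueError(f"unsupported format: {fmt}")
    return [tool, f"--{fmt}", text_path, "--socnet", socnet]


def analysis_command(socnet, algorithm=None, tool="./tool.pl"):
    """Command line that runs the clustering, optionally naming HN or HE."""
    cmd = [tool, "--analyze", socnet]
    if algorithm is not None:
        if algorithm not in ("HN", "HE"):
            raise ValueError(f"unknown algorithm: {algorithm}")
        cmd += ["--algorithm", algorithm]
    return cmd


def view_command(kind, database, tool="./tool.pl"):
    """Command line that shows the clusters or the history (dendrogram)."""
    if kind not in ("clusters", "history"):
        raise ValueError(f"unknown view: {kind}")
    return [tool, f"--view-{kind}", database]


def run_pipeline(text_path, socnet, history_db, fmt="adjacency-matrix", algorithm="HN"):
    """Run conversion, analysis, and the history view, stopping on the first failure."""
    for cmd in (conversion_command(text_path, socnet, fmt),
                analysis_command(socnet, algorithm),
                view_command("history", history_db)):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline("dataset.am", "dataset", "dataset-history")
```

Passing the algorithm explicitly, as run_pipeline does, also avoids the error that older revisions report when the --algorithm option is omitted.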
Contact information †
Send an email to me (Ken Wakita) at the following address (replace "at" with @ and "dot" with ., and remove the spaces).
wakita at is dot titech dot ac dot jp
Limitations †
- The software supports neither directed graphs nor weighted graphs.
Issues †
tool.pl does not work on Windows (fixed) †
Guillaume Roelly pointed out a problem running the software on the Windows platform. The issue is related to the way the Java interpreter accepts the classpath, which differs between Windows and other platforms. This issue is fixed in the latest distribution. If you have an old one, please get the latest.
Following is a sample session:
run --command am2sdb --input karate
run --command csv2sdb --input karate
run --command sym2sdb --input karate
run --command analyze --input karate
run --command analyze --algorithm HN --input karate
run --command analyze --algorithm HE --input karate
run --command view-history --input karate
run --command view-clusters --input karate
This new revision offers a simpler command-line syntax and is recommended for both Linux and Windows users.
Omitting --algorithm causes NoClassDefFoundError †
Gang Su sent me a bug report saying that when he omits the --algorithm option on the original revision of the software, the clustering process generates the following error message:
Exception in thread "main" java.lang.NoClassDefFoundError: jp/ac/titech/is/socialnet/clustering//Analyze
This is my fault, as I did not specify a safe default for the --algorithm option. With revisions earlier than 159, please remember to always specify the --algorithm option when you run "tool.pl". With revision 159 and later, you can omit --algorithm.
Publications †
Please refer to the Reference section of this page.
-
I tried to run the algorithm. I unpacked wt2007www-160.zip, but I can’t find tool.pl in the wt2007www folder.
Thank you in anticipation
Albert Soos -
Dear Ken!
I’m on the Windows platform. I have installed the ActivePerl interpreter and JRE 6. I put csv.jar in the wt2007www folder. I tried:
run --command am2sdb --input karate
run --command csv2sdb --input karate
run --command analyze --algorithm HN --input karate
run --command analyze --algorithm HE --input karate
I got error messages: Java.lang.NoClassDefFoundError =java -Xmx1G -classpath $CLASSPATH $MAIN, could not find the main class.
The export and MAIN commands are unknown to the system. I don’t know what I’m doing wrong. I’m sorry for the disturbance.
Albert
-
Something simple like this should work.
java -Xmx1g -Djava.ext.dirs=jar jp.ac.titech.is.socialnet.clustering.Main
-
-
Hi Ken
I am working on a research project and wish to use your community detection algorithm not only to find the final clusters but also to observe how they are formed. I want to see which two clusters merge at every step. Using your commands, this does not seem possible. Also, you have not included the source code in the zip archive, so I cannot change it accordingly.
Thanks
Sahil -
Thanks for making this available! I just wanted to notify you that the contents of run.bat are identical to the run.sh (and thus not working on windows). I think this is the reason for the problem that Albert has reported.
-
Hi Ken
I am working on a research project and wish to compare your community detection algorithm with my algorithm. I want to see which two clusters merge at every step. Using your commands, this does not seem possible. Also, you have not included the source code in the zip archive. I hope that you can send me the source code.
Thank you.
Darine -
A newer revision of the software is coming soon.
I packed all the necessary jar files into a single jar file. Now you can run the software simply:
java -jar wt2007www.jar …
You do not have to set up the class path or mess around with the run.sh/run.bat/run.pl scripts.
The added functionality is dumping the clustering process with the view-history command.
The software is available from
http://www.is.titech.ac.jp/~wakita/software/wt2007www-161.zip -
Mr Ken,
thank you very, very much, Sir, and I’m so sorry for the disturbance.
Darine
-
Hi Mr Ken, I obtain the following error when I run "java -jar wt2007www.jar --command am2sdb --input karate":
Exception in thread "main" java.lang.UnsupportedClassVersionError: Bad version number in .class file
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:676)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
at java.net.URLClassLoader.access$100(URLClassLoader.java:56)
at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:317)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:280)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:375)
I’m running on Mac OS X, java version "1.5.0_30"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_30-b03-389-9M3425) -
Hi Mr Ken, I managed to run version 160. In my network I obtain 13 clusters, but when I run the ‘view-clusters’ command it only shows me 3 clusters. Why is this?
-
Sorry to bother you again, Mr Ken. I have these two questions:
How can I see the modularity of the network?
Does AVG K size refer to the average size (number of elements) of each cluster? -
Greetings, I’m running the code on a 10,000-node network with the Java runtime memory set to 3 GB (Xmx3g). When I run the analyze command I get an out-of-memory error. How much memory do I need for such a network? How much for a 100,000-node network?
(I’m running on a dual-core machine with 4 GB of RAM)
-
Dear Mr Wakita,
Thank you very much for sharing your code with the research community.
I’m working on community detection methods at Université catholique de Louvain (Belgium) and I’m particularly interested in weighted networks.
Is your method able to tackle the community detection problem with a weighted adjacency matrix given as input?
If not, do you think it would be possible for you to develop this additional option within your Java environment?
Thank you very much,
Best Regards,
Arnaud Browet

