Tuesday, April 9, 2013

Benchmarking Java Linear Algebra Libraries for BIGS

I think we all agree that we need a standard, efficient and easy way of performing linear algebra operations in BIGS. Not necessarily parallelized operations, but operations performed in each worker (even in one core of each worker), like multiplying a pair of matrices or computing the norm of a matrix.
It is true that we previously used JAMA, in the BIGS K-Means for example. But recent benchmarks run on K-Means by Raúl show that the linear algebra operations involved are a bottleneck:

In the tests, with about 1000 data items from MNIST, I compare Jama's norm2 and minus methods against "straight-forward" implementations
norm2 (straight-forward) 55.00ms
norm2 (jama)                48.99sec
minus (straight-forward) 9.00ms
minus (jama)                 87.00ms
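The straight-forward implementations themselves are not shown in the report above. As a minimal sketch of what they might look like, assuming matrices stored as double[][] and assuming the norm in question is the Frobenius norm (the class and method names below are hypothetical, not taken from BIGS):

```java
// Plain-Java baseline: nested for loops over double[][] and nothing else.
// Class and method names are hypothetical, chosen for this sketch.
final class StraightForwardOps {

    /** Element-wise difference a - b; both matrices must have the same shape. */
    static double[][] minus(double[][] a, double[][] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("row counts differ");
        }
        double[][] c = new double[a.length][];
        for (int i = 0; i < a.length; i++) {
            if (a[i].length != b[i].length) {
                throw new IllegalArgumentException("column counts differ in row " + i);
            }
            c[i] = new double[a[i].length];
            for (int j = 0; j < a[i].length; j++) {
                c[i][j] = a[i][j] - b[i][j];
            }
        }
        return c;
    }

    /** Frobenius norm: square root of the sum of squared entries. */
    static double frobeniusNorm(double[][] a) {
        double sum = 0.0;
        for (double[] row : a) {
            for (double v : row) {
                sum += v * v;
            }
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[][] a = {{4.0, 6.0}, {1.0, 2.0}};
        double[][] b = {{1.0, 2.0}, {1.0, 2.0}};
        double[][] d = minus(a, b);            // {{3, 4}, {0, 0}}
        System.out.println(frobeniusNorm(d));  // sqrt(9 + 16), prints 5.0
    }
}
```

Working directly on double[][] avoids any wrapper objects or intermediate copies, which is presumably why such a baseline is hard to beat for simple element-wise operations.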

With this in mind, we evaluated some Java linear algebra libraries, taking into account the following factors:

  • Active development
  • Ease of usage
  • Speed
  • Portability
  • License
Looking for something "good, pretty and cheap" ("bueno, bonito y barato"), on the instructions of El Comandante Raúl.


As far as I know they're all equally "portable" and they all have JavaDocs. Speed is measured relative to the straight-forward implementation, which uses plain for loops and native operations in Java and has a speed of 1.0 by definition. That is, the speed tells how many times slower (or faster, if less than 1) a library ran, on average over 10 runs, than the straight-forward implementation, so smaller is better. The speed shown is an average over runs with different matrix sizes (ranging from 1000x1000 to 10000x10000), so you would expect a linear increase in this measure as you go up by one order of magnitude in the size of the matrices used.
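The relative speed described above could be computed along these lines. This is only a sketch of the idea, not the harness we actually used: the class and method names are hypothetical, the matrix size is kept small, and the "candidate" is a stand-in (the same subtraction traversed column by column) that you would replace with the library call under test.

```java
// Sketch of the relative-speed measure: mean time of a candidate operation
// divided by mean time of the straight-forward baseline (smaller is better).
// All names here are hypothetical.
final class RelativeSpeed {

    /** Mean wall-clock time of `runs` executions of `op`, in nanoseconds. */
    static double meanNanos(Runnable op, int runs) {
        long total = 0L;
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            op.run();
            total += System.nanoTime() - start;
        }
        return (double) total / runs;
    }

    /** Ratio of two mean times; 1.0 means "as fast as the baseline". */
    static double ratio(double candidateNanos, double baselineNanos) {
        if (baselineNanos <= 0.0) {
            throw new IllegalArgumentException("baseline time must be positive");
        }
        return candidateNanos / baselineNanos;
    }

    public static void main(String[] args) {
        int n = 500;   // illustrative size, smaller than the sizes in the post
        int runs = 10; // the post averages over 10 runs
        double[][] a = new double[n][n];
        double[][] b = new double[n][n];
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                a[i][j] = i + j;
                b[i][j] = i - j;
            }
        }

        // Baseline: row-by-row subtraction with plain for loops.
        Runnable baseline = () -> {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    c[i][j] = a[i][j] - b[i][j];
                }
            }
        };

        // Stand-in candidate: same result, column-by-column traversal.
        // Replace this with the library operation being evaluated.
        Runnable candidate = () -> {
            for (int j = 0; j < n; j++) {
                for (int i = 0; i < n; i++) {
                    c[i][j] = a[i][j] - b[i][j];
                }
            }
        };

        double speed = ratio(meanNanos(candidate, runs), meanNanos(baseline, runs));
        System.out.println("relative speed: " + speed);
    }
}
```

Note that this sketch does no JIT warm-up before timing, so the first runs may be slower than the rest; a more careful measurement would discard some initial iterations.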

JAMA
  • Last release date : 09/11/2012
  • Ease of usage : Method names are pretty much what you would expect (plus, minus, times, etc.). The JavaDoc is rather short and not very informative. (Not very pretty)
  • License : "Released to the public domain" (Cheap)
  • Speed in subtraction : Approx. 11.0
  • Speed in Frobenius norm : Approx. 17.0

EJML
  • Last release date : 04/12/2012
  • Ease of usage : Method names are pretty much what you would expect (plus, minus, times, etc.). There's a mapping between some MATLAB commands and the EJML methods, and many useful linear algebra operations are already implemented. (Pretty)
  • License : LGPL (Cheap)
  • Speed in subtraction : Around 3.5 (More or less good)
  • Speed in Frobenius norm : Around 7.0

Colt
  • Last release date : 10/09/2004
  • Ease of usage : There's a bit of work involved in implementing simple things, as the developer must adapt to a special "operations framework" proposed by the library, i.e. it is not SO straightforward to do things like adding, subtracting or computing norms.
  • License : Copyrighted.
  • Speed in subtraction : Around 0.6, i.e. faster than the straight-forward implementation
  • Speed in Frobenius norm : Around 1.3

There's already a benchmark of all these (and more) libraries here; however, it was made by the author of EJML, so we did not consider it that trustworthy.