Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't use toBinary method on large matrices #282

Open
HowardLander opened this issue Sep 28, 2016 · 16 comments
Open

Can't use toBinary method on large matrices #282

HowardLander opened this issue Sep 28, 2016 · 16 comments

Comments

@HowardLander
Copy link

The following code fails test class fails on a fairly small (but not sparse) matrix:

Here's the invocation:

/usr/java/jdk1.7.0_51/bin/java -Xmx8000m -cp .:./la4j-0.6.0.jar MatrixTestBug
java.lang.IllegalArgumentException
at java.nio.ByteBuffer.allocate(ByteBuffer.java:330)
at org.la4j.matrix.sparse.CRSMatrix.toBinary(CRSMatrix.java:1007)
at MatrixTestBug.main(MatrixTestBug.java:13)

and here's the code (MatrixTestBug.java)

import java.io.;
import org.la4j.
;

public class MatrixTestBug {

public static void main(String[] args) {

 org.la4j.matrix.sparse.CRSMatrix theMatrix = new            org.la4j.matrix.sparse.CRSMatrix(15000,15000);
   theMatrix.setAll(.5);

   try {
       FileOutputStream fos = new FileOutputStream("test.tmp");
       byte [] theMatrixBytes = theMatrix.toBinary();
   } catch (Exception e) {
       e.printStackTrace();
   }
}

}

One possible work around (at least for me) for this would be to restore the writeExternal and readExternal methods to the matrix classes. I suspect I could also ignore this problem if the matrix classes were serializable, as the only reason I am doing this is to write the matrix (and some other data) to a file. Any ideas?

Thanks
Howard

@vkostyukov
Copy link
Owner

As a workaround, you can traverse the matrix manually and write it into the file.

@HowardLander
Copy link
Author

Thanks for the response.

I suppose I could do that, but then wouldn't I lose the "sparsity" of it. Well, I guess I could encode a value (say -1) for the "empty cells", but that seems a bit ugly. How difficult would it be for these classes to be serializable? Do you think the default writeObject() and readObject() methods would work?

Howard

@vkostyukov
Copy link
Owner

Why losing sparsity? You can convert sparse matrices toMatrixMarket (which is super compact) and write it into a file. As far as I understand the problem is when you convert dense matrices.

Also feel free to send a PR with the solution - I'm happy to merge that in.

@vkostyukov
Copy link
Owner

BTW, you're trying to allocate 2gb matrix on a 800mb heap.

@HowardLander
Copy link
Author

Are you sure? Here's my command line:

/usr/java/jdk1.7.0_51/bin/java -Xmx8000m -cp .:./la4j-0.6.0.jar
MatrixTestBug

Howard

On 9/28/16 4:22 PM, Vladimir Kostyukov wrote:

BTW, you're trying to allocate 2gb matrix on a 800mb heap.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#282 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABNGrbx1wN0jY0xprA0ckhAtPxXXsbvsks5qusyigaJpZM4KJPlZ.

Howard Lander mailto:[email protected]
Senior Research Software Developer
Renaissance Computing Institute (RENCI) http://www.renci.org
The University of North Carolina at Chapel Hill
Duke University
North Carolina State University
100 Europa Drive
Suite 540
Chapel Hill, NC 27517
919-445-9651

@vkostyukov
Copy link
Owner

Ahh I see.

Actually, this looks like an integer overflow. Your cardinality is already pretty close to Int.MaxValue (15000 * 15000) and when we're trying to allocatate the output array we multiply that by 8 (as per a single double value) and get the negative result.

One solution to this would be to use ByteBuffer instead of byte[] so we can concat them (and even possibly do off-heap).

@HowardLander
Copy link
Author

Hi Vladimir

I'm was unaware of Matrix Market. It looks interesting. I'll definitely
check it out. Thanks for mentioning it.

As far as the dense matrix goes, that was just my way of testing. We
have some matrices that, even sparse, are bigger than max int bytes, so
I was actually trying to break my code when I broke yours!

I was definitely considering trying to make these classes serializable
myself, but I didn't know if there was some reason you hadn't done
this. If you think it's a reasonable idea, I'd be happy to give it a try.

FYI: here's the code I was testing. I was wondering what would happen if
the binary matrix was larger than max int.

    try {
        FileOutputStream fos = new FileOutputStream("test.tmp");
        byte [] theMatrixBytes = theMatrix.toBinary();
        System.out.println("matrix size: " + theMatrixBytes.length);
        OutputStream theStream = new BufferedOutputStream(fos);
        ObjectOutputStream oos = new ObjectOutputStream(theStream);
        oos.writeInt(theMatrixBytes.length);
        oos.write(theMatrixBytes);
        oos.writeObject(nameSpace);
        oos.writeObject(theList);
        oos.close();
    } catch (Exception e) {
        e.printStackTrace();
    }

Howard

On 9/28/16 4:21 PM, Vladimir Kostyukov wrote:

Why losing sparsity? You can convert sparse matrices |toMatrixMarket|
(which is super compact) and write it into a file. As far as I
understand the problem is when you convert dense matrices.

Also feel free to send a PR with the solution - I'm happy to merge
that in.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#282 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABNGrXUJLmptYOJXbITZm_XyOQ51pRD7ks5qusw3gaJpZM4KJPlZ.

Howard Lander mailto:[email protected]
Senior Research Software Developer
Renaissance Computing Institute (RENCI) http://www.renci.org
The University of North Carolina at Chapel Hill
Duke University
North Carolina State University
100 Europa Drive
Suite 540
Chapel Hill, NC 27517
919-445-9651

@vkostyukov
Copy link
Owner

I think if you swap to a dense (from CRS) you should be fine in that test. It's cheaper to write 15000x15000 dense matrix than to write 15000x15000 sparse matrix (with .setAll) since the later involves an overhead for maintaining sparcity.

@HowardLander
Copy link
Author

See below.

On 9/28/16 4:30 PM, Vladimir Kostyukov wrote:

Ahh I see.

Actually, this looks like an integer overflow. Your cardinality is
already pretty close to Int.MaxValue (15000 * 15000) and when we're
trying to allocatate the output array
https://github.com/vkostyukov/la4j/blob/master/src/main/java/org/la4j/matrix/sparse/CRSMatrix.java#L1021
we multiply that by 8 (as per a single double value) and get the
negative result.

One solution to this would be to use |ByteBuffer| instead of |byte[]|
so we can concat them (and even possibly do off-heap).

Would we have to change the toBinary() code, or is there something I can
change in mine?

Howard


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#282 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABNGrYyogahgjNaKgMSO9pr7oMV0-B7iks5qus5_gaJpZM4KJPlZ.

Howard Lander mailto:[email protected]
Senior Research Software Developer
Renaissance Computing Institute (RENCI) http://www.renci.org
The University of North Carolina at Chapel Hill
Duke University
North Carolina State University
100 Europa Drive
Suite 540
Chapel Hill, NC 27517
919-445-9651

@vkostyukov
Copy link
Owner

Matrices/Vectors are used to be serializeable. I decided to not do that by the same reason no one else should do that. Java's serialization is completely broken (both performance-wise and security-wise). You should look at Jackson instead.

@vkostyukov
Copy link
Owner

I'm afraid not, there is no easy fix to that.

@HowardLander
Copy link
Author

I don't doubt what you are saying. What happens when I have a sparse
matrix 1 million by 1 million? What I I really need a way to write large
CRS matrices of unknown sparsity to file along with some other relevant
information.

Are the security issues you mentioned in your other message the reason
that the writeExternal and readExternal methods were removed? I'm not a
security expert at all.

Howard

On 9/28/16 4:33 PM, Vladimir Kostyukov wrote:

I think if you swap to a dense (from CRS) you should be fine in that
test. It's cheaper to write 15000x15000 dense matrix than to write
15000x15000 sparse matrix (with |.setAll|) since the later involves an
overhead for maintaining sparcity.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#282 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABNGrdejjv6xL8uO6F0QlQ6DRbz1C-7Xks5qus8RgaJpZM4KJPlZ.

Howard Lander mailto:[email protected]
Senior Research Software Developer
Renaissance Computing Institute (RENCI) http://www.renci.org
The University of North Carolina at Chapel Hill
Duke University
North Carolina State University
100 Europa Drive
Suite 540
Chapel Hill, NC 27517
919-445-9651

@HowardLander
Copy link
Author

Potentially uninformed question: Is Jackson better than Gson? I've used
Gson.

On 9/28/16 4:35 PM, Vladimir Kostyukov wrote:

Matrices/Vectors are used to be serializeable. I decided to not do
that by the same reason no one else should do that. Java's
serialization is completely broken (both performance-wise and
security-wise). You should look at Jackson instead.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#282 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABNGrVPFBwmgZ2OQ-bMwGBq4QQRab49aks5qus-YgaJpZM4KJPlZ.

Howard Lander mailto:[email protected]
Senior Research Software Developer
Renaissance Computing Institute (RENCI) http://www.renci.org
The University of North Carolina at Chapel Hill
Duke University
North Carolina State University
100 Europa Drive
Suite 540
Chapel Hill, NC 27517
919-445-9651

@vkostyukov
Copy link
Owner

Thre is a limit on a max sparse matrix size in la4j. It's MxN=Int.MaxValue. There is no way you can do 1m x 1m with la4j, just because there is a limit on array size on JVM (the maximum array you can allocate is Int.MaxValue, 2gb). If you need support for bigger matrices, I'd recommend either:

  1. Build you own thing with a sub-set of features
  2. Use something else (MTJ and Parallel Colt support CRS/CCS)

Looking at this comparison table (which I'm quite sceptical about): https://java-matrix.org/ you can try MTJ and Parallel Colt and see if their limits are true for sparse matrices (16gb and 64gb correspondingly).

@HowardLander
Copy link
Author

Thanks for the info. I probably should have said by now that I really
like la4j. It's easy to work with and has all of the features (except
for the serialization issues we have been discussing) that I need. I
don't know what motivates folks like you to provide these sorts of
libraries but I'm sure glad you do!

The largest matrix that we have currently is about 18600 by 18600, but
we do have plans for bigger. Square root of max int is about 46000, so I
guess you are right that we will probably have to look at something else
fairly soon. I had looked at colt before and there was something I
didn't like about it, but that was several years ago.

I may just live with the limitation for now. I do have a workaround

Thanks

Howard

On 9/28/16 4:52 PM, Vladimir Kostyukov wrote:

Thre is a limit on a max sparse matrix size in la4j. It's
|MxN=Int.MaxValue|. There is no way you can do 1m x 1m with la4j, just
because there is a limit on array size on JVM (the maximum array you
can allocate is Int.MaxValue, 2gb). If you need support for bigger
matrices, I'd recommend either:

  1. Build you own thing with a sub-set of features
  2. Use something else (MTJ and Parallel Colt support CRS/CCS)

Looking at this comparison table (which I'm quite sceptical about):
https://java-matrix.org/ you can try MTJ and Parallel Colt and see if
their limits are true for sparse matrices (16gb and 64gb correspondingly).


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#282 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABNGrcoTK9EBYmc_rJlDTad8rdkWGz2Bks5qutOIgaJpZM4KJPlZ.

Howard Lander mailto:[email protected]
Senior Research Software Developer
Renaissance Computing Institute (RENCI) http://www.renci.org
The University of North Carolina at Chapel Hill
Duke University
North Carolina State University
100 Europa Drive
Suite 540
Chapel Hill, NC 27517
919-445-9651

@vkostyukov
Copy link
Owner

Yeah, you should be fine with 18600x18600 matrices. I admit there is a bug in .toBinary (w.r.t to integer overflow) and if it's possible to allocate this matrix with la4j it should be possible to convert that into a binary form. Although, I don't have a good solution yet for this and you should feel free to suggest some.

In the meantime, I'd advice to shove your matrices into Jackson and see if it works (it should!). You don't need Java's standard serialization for that. Jackson (If you're ok with JSON) or Kryo (for binary format). These libraries will do the trick much better (and faster and safer).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants