Some errors for the CBU adaptive queuing system and their possible explanation

Posted by Dr Fawad Jamshed at January 13. 2014

java.lang.OutOfMemoryError: GC overhead limit exceeded

The maximum amount of data that can be transferred in a single chunk between the client and the workers while a parfor-loop executes is determined by the JVM memory allocation limit. Older versions of Parallel Computing Toolbox had restrictive limits on the size of data transferred to and from a parfor-loop. These limits were relaxed in R2013a, and the approximate limit now depends on your system architecture: about 2 GB on 64-bit machines and about 600 MB on 32-bit machines.
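As a rough check before starting a parfor-loop, you can compare the in-memory size of a variable with the limits quoted above. This is only a sketch: the variable `data` is a made-up example, the 2 GB figure assumes a 64-bit machine on R2013a or later, and the size reported by `whos` is only an approximation of what is actually transferred.

```matlab
% Hypothetical example data; replace with the variable your parfor-loop uses.
data = rand(1000, 1000);        % 1000 x 1000 doubles = 8,000,000 bytes

info = whos('data');            % whos reports the in-memory size in bytes
limitBytes = 2 * 1024^3;        % assumed limit: about 2 GB (64-bit, R2013a or later)

if info.bytes > limitBytes
    fprintf('data is %.2f GB: probably too large to send in one chunk.\n', ...
            info.bytes / 1024^3);
else
    fprintf('data is %.2f MB: below the assumed limit.\n', ...
            info.bytes / 1024^2);
end
```

If the variable turns out to be too large, splitting it into smaller pieces, or loading it from disk inside the loop, keeps each transfer below the limit.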

Re: Some errors for the CBU adaptive queuing system and their possible explanation

Posted by Dr Fawad Jamshed at January 15. 2014

Sometimes, after a successful run on the CBU Adaptive Queuing system, a script might generate an error when it is executed again.

I think the problem was that the RSA keys for some of the machines were missing or wrong. Whether the pool worked would depend on exactly which machines you were allocated: if you were assigned one of the machines with bad RSA keys, MATLAB would not be able to ssh onto that machine, and the cluster validation would fail.
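One way to find out which machines are affected is to try a non-interactive ssh login to each of them from within MATLAB. The sketch below assumes passwordless ssh is how the cluster is normally reached. The host names are placeholders, not the real CBU node names.

```matlab
% Placeholder host names; replace with the nodes you were allocated.
hosts = {'node01', 'node02'};
failed = {};

for k = 1:numel(hosts)
    % BatchMode=yes makes ssh fail instead of prompting for a password;
    % ConnectTimeout=5 stops the check from hanging on a dead host.
    cmd = sprintf('ssh -o BatchMode=yes -o ConnectTimeout=5 %s true', hosts{k});
    [status, output] = system(cmd);
    if status ~= 0
        failed{end+1} = hosts{k};
        fprintf('ssh to %s failed (exit status %d): %s\n', ...
                hosts{k}, status, strtrim(output));
    end
end

fprintf('%d of %d hosts could not be reached.\n', numel(failed), numel(hosts));
```

A host that fails here is a candidate for the bad-key problem, although a non-zero exit status can also mean the machine is down or unreachable.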

Re: Some errors for the CBU adaptive queuing system and their possible explanation

Posted by Dr Fawad Jamshed at January 15. 2014

Error using parallel.Cluster/matlabpool (line 64)
Failed to open matlabpool. (For information in addition to the causing error, validate the profile 'CBU_Cluster_Import3' in the Cluster Profile
Manager.)

Error in Recipe_MEG_searchlight_source_queue (line 34)
matlabpool(P)

Caused by:

Error using parallel.internal.pool.InteractiveClient/start (line 281)
Failed to start matlabpool.
Error using parallel.Job/submit (line 304)
Error executing the PBS script command 'qsub'. The reason given is
qsub: Job exceeds queue resource limits MSG=cannot locate feasible node


This error message appears when the job requests resources that no node can fulfil. Try reducing the amount of resources requested.
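The simplest resource to reduce is the number of workers in the pool. The following is a minimal sketch, using the profile name from the error above and an arbitrary pool size of 4. Whether this is enough depends on which limit the queue actually rejected (cores, memory or walltime), and that cannot be told from the message alone.

```matlab
profileName = 'CBU_Cluster_Import3';   % profile name taken from the error above
nWorkers = 4;                          % assumed smaller pool size; adjust to your job

P = parcluster(profileName);           % load the cluster profile
fprintf('Profile %s allows up to %d workers.\n', profileName, P.NumWorkers);

% Never ask for more workers than the profile allows.
nWorkers = min(nWorkers, P.NumWorkers);

matlabpool(P, 'open', nWorkers);       % request the smaller pool
```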

Re: Some errors for the CBU adaptive queuing system and their possible explanation

Posted by Dr Fawad Jamshed at January 22. 2014

Starting matlabpool using the 'CBU_Cluster' profile ... License checkout failed.
License Manager Error -4
Maximum number of users for Distrib_Computing_Toolbox reached.
Try again later.
To see a list of current users use the lmstat utility or contact your License Administrator.

Troubleshoot this issue by visiting:
http://www.mathworks.com/support/lme/R2012a/4

Diagnostic Information:
Feature: Distrib_Computing_Toolbox
License path:
/home/cw03/.matlab/R2012a_licenses:/hpc-software/matlab/r2012a/licenses/license.dat:/hpc-software/matlab/r2012a/licenses/network.lic

Licensing error: -4,132.
stopped.

Error using parallel.Cluster/matlabpool (line 64) Failed to open matlabpool. (For information in addition to the causing error, validate the profile 'CBU_Cluster' in the Cluster Profile Manager.)

Caused by:
Error using distcomp.interactiveclient/start (line 88)
Failed to start matlabpool.
This is caused by:
Unable to checkout a license for the Parallel Computing Toolbox

 

The CBU has a limited number of licenses for some toolboxes: 20 for the Distributed Computing Toolbox and 63 for the Statistics Toolbox. It's unusual for them all to be in use at the same time, but it could happen if someone is running a lot of workers that all need the same toolbox.

If you get an error similar to the one above, try reducing the number of nodes or workers. Otherwise, try launching your script from a different login node. As a last resort, wait a while before relaunching your scripts.
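The "wait and relaunch" option can be automated by checking for a free license before opening the pool. This is a sketch only: the feature name comes from the error message above, while the number of attempts and the waiting time are arbitrary values chosen for illustration.

```matlab
feature = 'Distrib_Computing_Toolbox';   % feature name from the error above
maxAttempts = 5;                         % assumed number of tries
waitSeconds = 60;                        % assumed pause between tries

gotLicense = false;
for attempt = 1:maxAttempts
    % license('checkout', ...) returns 1 if a license could be checked out.
    if license('checkout', feature)
        gotLicense = true;
        break;
    end
    fprintf('Attempt %d of %d: no %s license free, waiting %d s.\n', ...
            attempt, maxAttempts, feature, waitSeconds);
    pause(waitSeconds);
end

if gotLicense
    disp('License checked out; the pool can be opened now.');
else
    disp('Still no free license; try again later or use fewer workers.');
end
```

Note that a successful checkout keeps the license for the rest of the MATLAB session, so run this only in the session that will open the pool.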
