Though the DL4J setup doesn't seem complex, installation issues can still happen because of different OSes or applications installed on the system, and so on. CUDA installation issues are not within the scope of this book. Maven build issues that are due to unresolved dependencies can have multiple causes. If you are working for an organization with its own internal repositories and proxies, then you need to make relevant changes in the pom.xml file. These issues are also outside the scope of this book. In this recipe, we will walk through the steps to mitigate common installation issues with DL4J.
Troubleshooting installation issues
Getting ready
The following checks are mandatory before we proceed:
- Verify Java and Maven are installed and the PATH variables are configured.
- Verify the CUDA and cuDNN installations.
- Verify that the Maven build is successful and the dependencies are downloaded at ~/.m2/repository.
How to do it...
- Enable logging levels to yield more information on errors:
Logger log = LoggerFactory.getLogger("YourClassFile.class");
log.setLevel(Level.DEBUG);
- Verify the JDK/Maven installation and configuration.
- Check whether all the required dependencies are added in the pom.xml file.
- Remove the contents of the Maven local repository and rebuild Maven to mitigate NoClassDefFoundError in DL4J. For Linux, this is as follows:
rm -rf ~/.m2/repository/org/deeplearning4j
rm -rf ~/.m2/repository/org/datavec
mvn clean install
- Mitigate ClassNotFoundException in DL4J. You can try this if step 4 didn't help to resolve the issue. DL4J/ND4J/DataVec should have the same version. For CUDA-related error stacks, check the installation as well.
How it works...
To mitigate exceptions such as ClassNotFoundException, the primary task is to verify we installed the JDK properly (step 2) and whether the environment variables we set up point to the right place. Step 3 is also important as the missing dependencies result in the same error.
In step 4, we are removing redundant dependencies that are present in the local repository and are attempting a fresh Maven build. Here is a sample for NoClassDefFoundError while trying to run a DL4J application:
root@instance-1:/home/Deeplearning4J# java -jar target/dl4j-1.0-SNAPSHOT.jar
09:28:22.171 [main] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [JCublasBackend] backend
Exception in thread "main" java.lang.NoClassDefFoundError: org/nd4j/linalg/api/complex/IComplexDouble
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.nd4j.linalg.factory.Nd4j.initWithBackend(Nd4j.java:5529)
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5477)
at org.nd4j.linalg.factory.Nd4j.(Nd4j.java:210)
at org.datavec.image.transform.PipelineImageTransform.(PipelineImageTransform.java:93)
at org.datavec.image.transform.PipelineImageTransform.(PipelineImageTransform.java:85)
at org.datavec.image.transform.PipelineImageTransform.(PipelineImageTransform.java:73)
at examples.AnimalClassifier.main(AnimalClassifier.java:72)
Caused by: java.lang.ClassNotFoundException: org.nd4j.linalg.api.complex.IComplexDouble
One possible reason for NoClassDefFoundError could be the absence of required dependencies in the Maven local repository. So, we remove the repository contents and rebuild Maven to download the dependencies again. If any dependencies were not downloaded previously due to an interruption, it should happen now.
Here is an example of ClassNotFoundException during DL4J training:
Again, this suggests version issues or redundant dependencies.
There's more...
In addition to the common runtime issues that were discussed previously, Windows users may face cuDNN-specific errors while training a CNN. The actual root cause could be different and is tagged under UnsatisfiedLinkError:
o.d.n.l.c.ConvolutionLayer - Could not load CudnnConvolutionHelper
java.lang.UnsatisfiedLinkError: no jnicudnn in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867) ~[na:1.8.0_102]
at java.lang.Runtime.loadLibrary0(Runtime.java:870) ~[na:1.8.0_102]
at java.lang.System.loadLibrary(System.java:1122) ~[na:1.8.0_102]
at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:945) ~[javacpp-1.3.1.jar:1.3.1]
at org.bytedeco.javacpp.Loader.load(Loader.java:750) ~[javacpp-1.3.1.jar:1.3.1]
Caused by: java.lang.UnsatisfiedLinkError: C:\Users\Jürgen.javacpp\cache\cuda-7.5-1.3-windows-x86_64.jar\org\bytedeco\javacpp\windows-x86_64\jnicudnn.dll: Can't find dependent libraries
at java.lang.ClassLoader$NativeLibrary.load(Native Method) ~[na:1.8.0_102]
Perform the following steps to fix the issue:
- Download the latest dependency walker here: https://github.com/lucasg/Dependencies/.
- Add the following code to your DL4J main() method:
try {
Loader.load(<module>.class);
} catch (UnsatisfiedLinkError e) {
String path = Loader.cacheResource(<module>.class, "windows-x86_64/jni<module>.dll").getPath();
new ProcessBuilder("c:/path/to/DependenciesGui.exe", path).start().waitFor();
}
- Replace <module> with the name of the JavaCPP preset module that is experiencing the problem; for example, cudnn. For newer DL4J versions, the necessary CUDA libraries are bundled with DL4J. Hence, you should not face this issue.
If you feel like you might have found a bug or functional error with DL4J, then feel free to create an issue tracker at https://github.com/eclipse/deeplearning4j.
You're also welcome to initiate a discussion with the Deeplearning4j community here: https://gitter.im/deeplearning4j/deeplearning4j.