The ArrayFire library provides a high-level abstraction that makes massively parallel programs much simpler to write. The underlying library is written in C++, and the Julia wrapper provides an array abstraction that allows idiomatic Julia programs to be executed on the GPU.
To begin, download the ArrayFire library for your operating system from https://arrayfire.com/download/ and install it on your GPU machine. Once installed, ArrayFire provides a wrapper around an array that copies data from the CPU to the GPU and performs operations on that data on the GPU cores. It really is that simple.
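The Julia wrapper is a separate package from the C++ library. As a minimal sketch, assuming the ArrayFire library from the download page is already installed and visible to Julia, the wrapper can be added from the general package registry and loaded as follows:

```julia
using Pkg

# Install the Julia wrapper package. This assumes the native ArrayFire
# library has already been installed on the machine.
Pkg.add("ArrayFire")

# Loading the package fails if the native library cannot be found.
using ArrayFire
```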
In the following example, we create a random 2D matrix, copy it to the GPU, multiply it by itself, and then copy the result back to main memory:
julia> using ArrayFire
julia> a = rand(1000, 1000)
1000×1000 Array{Float64,2}:
...
julia>...
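The round trip described above can also be sketched as a standalone script. This is an illustration under stated assumptions rather than a reproduction of the session: it assumes a working ArrayFire installation and a device that supports double precision, and the variable names `b`, `c`, and `d` are chosen here for the example. It uses the wrapper's `AFArray` constructor to copy data to the device and `Array` to copy it back.

```julia
using ArrayFire
using LinearAlgebra

# Create an ordinary 1000×1000 matrix in main (CPU) memory.
a = rand(1000, 1000)

# Wrap it in an AFArray, which copies the data to the device.
b = AFArray(a)

# Multiply the matrix by itself; the product is computed on the device.
c = b * b

# Convert back to a regular Array, copying the result to main memory.
d = Array(c)

# Compare against the same product computed on the CPU.
println(maximum(abs.(d .- a * a)) < 1e-6)
```

Transfers between main memory and the device are not free, so in practice it pays to keep data in an `AFArray` across several operations and copy back only the final result.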