Since every chip is different, the techniques for speeding up inference vary from one manufacturer to another. The steps required to run a model on a given chip are generally well documented by its manufacturer.
A rule of thumb is to avoid exotic operations. If a layer relies on operations that involve conditions or branching, the chip will likely not support them. Those operations then have to run on the CPU, which slows down the whole process. It is therefore recommended to use only standard operations: convolution, pooling, and fully connected layers.
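To make this rule of thumb concrete, the following is a minimal sketch of a check that flags layers likely to fall back to the CPU. The operation names, the `SUPPORTED_OPS` allow-list, and the `find_cpu_fallbacks` helper are hypothetical and chosen for illustration only; in practice, the list of supported operations has to come from the chip manufacturer's documentation.

```python
# Hypothetical allow-list of operations an accelerator can run.
# Real chips publish their own supported-operation tables.
SUPPORTED_OPS = {"conv2d", "max_pool", "avg_pool", "fully_connected"}


def find_cpu_fallbacks(layer_ops, supported=SUPPORTED_OPS):
    """Return (index, op) pairs for layers that are not in the allow-list.

    Under the assumption made here, these layers could not run on the
    accelerator and would have to be executed on the CPU instead.
    """
    return [
        (index, op)
        for index, op in enumerate(layer_ops)
        if op.lower() not in supported
    ]


# A toy model described as a list of operation names (illustrative only).
# "where" stands in for a conditional, element-wise selection.
model = ["conv2d", "max_pool", "conv2d", "where", "avg_pool", "fully_connected"]

print(find_cpu_fallbacks(model))  # [(3, 'where')]
```

Running such a check before deployment gives an early hint about which layers to replace with standard operations, although only the manufacturer's own tooling can confirm what the chip actually supports.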