Half precision inference
Running inference in half precision could reduce inference time significantly. Experiments have shown that model performance doesn't suffer much from it.
We could rely on torch.autocast. Be careful not to break CPU inference.
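
A minimal sketch of how this might look, assuming half precision is only wanted on CUDA devices and that CPU inference should stay in full precision. The helper names `inference_context` and `predict` are hypothetical and not part of the existing code base; only `torch.autocast`, `torch.inference_mode` and `contextlib.nullcontext` are real APIs.

```python
import contextlib

import torch


def inference_context(device: torch.device, use_half: bool = True):
    """Return the context manager to wrap the forward pass in.

    Hypothetical helper: autocast to float16 on CUDA, and a no-op
    context on any other device so CPU inference is left unchanged.
    """
    if use_half and device.type == "cuda":
        return torch.autocast(device_type="cuda", dtype=torch.float16)
    return contextlib.nullcontext()


@torch.inference_mode()
def predict(model: torch.nn.Module, inputs: torch.Tensor) -> torch.Tensor:
    """Run a forward pass on whatever device the model already lives on."""
    # Assumes the model has at least one parameter to read the device from.
    device = next(model.parameters()).device
    with inference_context(device):
        return model(inputs.to(device))
```

Falling back to a no-op context on CPU is one possible way to avoid breaking CPU inference: the weights stay in float32 and only the CUDA forward pass runs in mixed precision. Whether a `use_half` switch should be exposed to users is a design choice left open here.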