As you're likely experienced with other binary classifiers, I thought it wise to take a few sentences to show how to compute some of the usual metrics used with more traditional binary classifiers.
One difference between the Keras functional API and what you might be used to in scikit-learn is the behavior of the .predict() method. In Keras, .predict() returns an n x k matrix: k class probabilities for each of the n observations. For a binary classifier, there will be only one column, the probability of class 1. This makes the Keras .predict() behave more like .predict_proba() in scikit-learn.
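To make the shape difference concrete, here is a small sketch that uses a hand-written NumPy array as a stand-in for a real Keras model's output. The values and the names keras_style_probs and sklearn_style_probs are made up for illustration. It expands the single class 1 column into the two-column layout that scikit-learn's .predict_proba() returns for a binary problem.

```python
import numpy as np

# Stand-in for model.predict(X) on a binary classifier: shape (n, 1),
# one probability of class 1 per observation. Values are illustrative only.
keras_style_probs = np.array([[0.10], [0.80], [0.55]])

# scikit-learn's predict_proba() returns one column per class:
# [P(class 0), P(class 1)]. Build that layout from the single column.
sklearn_style_probs = np.hstack([1.0 - keras_style_probs, keras_style_probs])

print(keras_style_probs.shape)    # (3, 1)
print(sklearn_style_probs.shape)  # (3, 2)
```

Each row of the two-column version sums to 1, which is a quick sanity check if you ever need to pass Keras output to code that expects the scikit-learn layout.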
When calculating precision, recall, or other class-based metrics, you'll need to convert the probabilities that .predict() returns into class labels by choosing some operating point (a probability threshold), as shown in the following code:
def class_from_prob(x, operating_point...
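As a separate, minimal sketch of the thresholding idea (it is not the listing above), the following turns class 1 probabilities into labels and then computes precision and recall by hand. The helper name probs_to_classes, the 0.5 default threshold, and the sample values are all assumptions chosen for illustration.

```python
import numpy as np

def probs_to_classes(probs, threshold=0.5):
    """Hypothetical helper: map class 1 probabilities to 0/1 labels."""
    probs = np.asarray(probs).ravel()        # flatten (n, 1) to (n,)
    return (probs >= threshold).astype(int)  # 1 at or above the threshold

# Illustrative labels and probabilities, shaped like Keras .predict() output.
y_true = np.array([0, 1, 1, 0, 1])
y_prob = np.array([[0.2], [0.9], [0.4], [0.6], [0.7]])

y_pred = probs_to_classes(y_prob, threshold=0.5)

# Count true positives, false positives, and false negatives.
tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that were found

print(y_pred)               # [0 1 0 1 1]
print(precision, recall)
```

Once you have hard labels like y_pred, you could also pass them to scikit-learn's precision_score and recall_score rather than counting by hand. Moving the threshold up or down trades precision against recall, so the right operating point depends on your problem.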