Keras Overview

TensorFlow's official Keras guide: https://www.tensorflow.org/guide/keras?hl=zh-cn
Keras's own documentation: https://keras.io/

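The two most important components in Keras are Model and Layer. Their core interfaces for building a model are similar; Model additionally provides the interfaces for training and evaluation.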

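Introduction to Model

Key members

updates: returns the update ops (layer.updates) of every layer on the path from the model's inputs to its outputs.
compile(...): configures the other core components the model needs, such as metrics, loss, and optimizer.
evaluate(...): computes the loss value and metric values in test mode.
fit(...): trains the model; built-in or custom Callback objects can be passed through the callbacks argument.
predict(...): generates predictions.
load_weights(filepath, by_name=False): loads the model weights.
save_weights(filepath, overwrite=True, save_format=None): saves the model weights.
summary(): prints a text description of the model.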
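As a rough illustration of how the members above fit together, here is a minimal sketch on random toy data. The data shapes, layer sizes, and the file name demo.weights.h5 are assumptions made up for this example, not anything prescribed by Keras.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Toy data (assumed for illustration): 64 samples, 8 features, 3 classes.
x = np.random.rand(64, 8).astype("float32")
y = np.random.randint(0, 3, size=(64,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# compile(): attach the optimizer, loss, and metrics.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# fit(): train, passing a built-in callback through `callbacks`.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="loss", patience=2)
model.fit(x, y, epochs=3, batch_size=16, callbacks=[early_stop], verbose=0)

# evaluate(): loss and metric values in test mode.
loss, acc = model.evaluate(x, y, verbose=0)

# predict(): one row of class probabilities per sample.
probs = model.predict(x, verbose=0)

# save_weights() / load_weights(): round-trip the weights through a file.
path = os.path.join(tempfile.mkdtemp(), "demo.weights.h5")
model.save_weights(path)
model.load_weights(path)

model.summary()
```

The .weights.h5 suffix is used because newer Keras releases require it for weight files, while older tf.keras versions treat any .h5 path as HDF5.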

Common usage patterns

Using a built-in Model directly

For example, the simplest one, tf.keras.Sequential().

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential()
# Adds a densely-connected layer with 64 units to the model:
model.add(layers.Dense(64, activation='relu'))
# Add another:
model.add(layers.Dense(64, activation='relu'))
# Add a softmax layer with 10 output units:
model.add(layers.Dense(10, activation='softmax'))

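Pros: simple and safe.
Cons: only suitable for simple models that are a plain stack of layers; it cannot express other topologies.

Custom models

Approach 1: the functional API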

inputs = tf.keras.Input(shape=(3,))
x = tf.keras.layers.Dense(4, activation=tf.nn.relu)(inputs)
outputs = tf.keras.layers.Dense(5, activation=tf.nn.softmax)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

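Pros: can build complex topologies (multiple inputs, multiple outputs, shared layers), and is still relatively safe.
Cons: less flexible than subclassing.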
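To make "multiple inputs, multiple outputs, shared layers" concrete, the following is a small sketch of a functional model with two inputs, one shared Dense layer, and two outputs. The layer names and sizes are hypothetical.

```python
import tensorflow as tf

# Two inputs that pass through the same (shared) Dense layer.
input_a = tf.keras.Input(shape=(3,), name="a")
input_b = tf.keras.Input(shape=(3,), name="b")
shared = tf.keras.layers.Dense(4, activation="relu", name="shared")
merged = tf.keras.layers.Concatenate()([shared(input_a), shared(input_b)])

# Two outputs computed from the merged features.
out_class = tf.keras.layers.Dense(5, activation="softmax", name="cls")(merged)
out_score = tf.keras.layers.Dense(1, name="score")(merged)

model = tf.keras.Model(inputs=[input_a, input_b],
                       outputs=[out_class, out_score])

cls, score = model([tf.zeros([2, 3]), tf.zeros([2, 3])])
print(cls.shape, score.shape)
```

Because the shared layer is a single object, its kernel and bias appear only once in model.trainable_weights even though it is applied to both inputs.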

Approach 2: model subclassing

class MyModel(tf.keras.Model):

    def __init__(self):
        super(MyModel, self).__init__()
        self.dense1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)
        self.dense2 = tf.keras.layers.Dense(5, activation=tf.nn.softmax)

    def call(self, inputs):
        x = self.dense1(inputs)
        return self.dense2(x)

model = MyModel()

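Pros: the most flexible option; with eager execution enabled, the forward pass can be written imperatively.
Cons: more complex and more error-prone. In the TensorFlow version these notes were written against, model.inputs, model.outputs, model.to_yaml(), model.to_json(), model.get_config(), and model.save() are not available for subclassed models.

Methods that must be overridden when subclassing Model:

__init__: creates the layers the model uses and stores them as attributes of the instance.
call: defines the forward pass, i.e. wires together the layers defined above (defines each layer's input and output).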
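As a sketch of what "imperative" buys you, the hypothetical model below uses an ordinary Python loop and branch inside call. The class name StackedModel, the depth parameter, and the layer sizes are assumptions for illustration only.

```python
import tensorflow as tf

class StackedModel(tf.keras.Model):
    """Hypothetical model with a configurable number of hidden layers."""

    def __init__(self, depth=3):
        super().__init__()
        # Layers are created in __init__ and stored as attributes.
        self.hidden = [tf.keras.layers.Dense(8, activation="relu")
                       for _ in range(depth)]
        self.dropout = tf.keras.layers.Dropout(0.5)
        self.head = tf.keras.layers.Dense(5, activation="softmax")

    def call(self, inputs, training=False):
        x = inputs
        for layer in self.hidden:   # ordinary Python loop
            x = layer(x)
        if training:                # ordinary Python branch
            x = self.dropout(x, training=True)
        return self.head(x)

model = StackedModel(depth=2)
out = model(tf.zeros([4, 3]))
print(out.shape)
```

Control flow like this is awkward to express with Sequential or the functional API, which is the flexibility the subclassing approach trades safety for.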

Custom Layer

See the Keras documentation: Writing your own Keras layers

The best way to define a custom layer is to subclass tf.keras.layers.Layer.

import tensorflow as tf

class MyDenseLayer(tf.keras.layers.Layer):
    def __init__(self, num_outputs):
        super(MyDenseLayer, self).__init__()
        self.num_outputs = num_outputs

    def build(self, input_shape):
        # Create the kernel once the width of the input is known.
        self.kernel = self.add_weight(name="kernel",
                                      shape=[int(input_shape[-1]),
                                             self.num_outputs])

    def call(self, inputs):
        return tf.matmul(inputs, self.kernel)

layer = MyDenseLayer(10)
print(layer(tf.zeros([10, 5])))
print(layer.trainable_variables)

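Methods that must be overridden:

__init__: all initialization that does not depend on the input.
call: defines the forward pass.

The Keras documentation also lists compute_output_shape as a method to override when the layer changes the shape of its input: given the input shape, it returns the output shape.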
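A minimal sketch of such an override for a dense-style layer follows. The class name ShapeAwareDense is hypothetical, and the rule "only the last dimension changes" is specific to this example.

```python
import tensorflow as tf

class ShapeAwareDense(tf.keras.layers.Layer):
    """Hypothetical dense-style layer that reports its output shape."""

    def __init__(self, num_outputs):
        super().__init__()
        self.num_outputs = num_outputs

    def build(self, input_shape):
        self.kernel = self.add_weight(
            name="kernel", shape=(int(input_shape[-1]), self.num_outputs))

    def call(self, inputs):
        return tf.matmul(inputs, self.kernel)

    def compute_output_shape(self, input_shape):
        # Only the last dimension changes: (..., in_dim) -> (..., num_outputs).
        shape = tf.TensorShape(input_shape).as_list()
        shape[-1] = self.num_outputs
        return tf.TensorShape(shape)

layer = ShapeAwareDense(10)
out_shape = layer.compute_output_shape((None, 5))
print(out_shape)
```

Note that compute_output_shape works purely on shapes, so it can be queried before the layer has seen any data or created any weights.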

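Optional but important:

build:
- Receives the shape of the input and performs all remaining initialization (typically creating the weights).
- Alternatively, all initialization can be done in __init__, but then the shapes of the parameters must be fixed up front, which in turn fixes the shape of the input.
- Should end by setting self.built = True, for example by calling super().build(input_shape).
- build() is called before the layer's first call().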
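The trade-off between the two initialization styles can be sketched as follows. FixedDense and LazyDense are hypothetical names: the first creates its kernel in __init__ and therefore needs input_dim up front, the second defers that to build().

```python
import tensorflow as tf

class FixedDense(tf.keras.layers.Layer):
    """All weights created in __init__, so input_dim must be known up front."""

    def __init__(self, input_dim, num_outputs):
        super().__init__()
        self.kernel = self.add_weight(
            name="kernel", shape=(input_dim, num_outputs))

    def call(self, inputs):
        return tf.matmul(inputs, self.kernel)

class LazyDense(tf.keras.layers.Layer):
    """Weights created in build(), once the input shape is known."""

    def __init__(self, num_outputs):
        super().__init__()
        self.num_outputs = num_outputs

    def build(self, input_shape):
        self.kernel = self.add_weight(
            name="kernel", shape=(int(input_shape[-1]), self.num_outputs))
        super().build(input_shape)  # marks the layer as built

    def call(self, inputs):
        return tf.matmul(inputs, self.kernel)

fixed = FixedDense(input_dim=5, num_outputs=10)
lazy = LazyDense(10)
print(len(fixed.weights), len(lazy.weights))  # weights exist vs. not yet
lazy(tf.zeros([2, 7]))                        # input width is inferred here
print(lazy.kernel.shape)
```

The lazy variant can be reused for inputs of any width, as long as the width is the same on every call after the first.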

import tensorflow as tf

class MyDenseLayer(tf.keras.layers.Layer):
    def __init__(self, num_outputs):
        super(MyDenseLayer, self).__init__()
        self.num_outputs = num_outputs

    def build(self, input_shape):
        print(f"{self.name} build() is called.")
        # Create the kernel once the width of the input is known.
        self.kernel = self.add_weight(name="kernel",
                                      shape=[int(input_shape[-1]),
                                             self.num_outputs])
        self.built = True

    def call(self, inputs):
        print(f"{self.name} call() is called.")
        return tf.matmul(inputs, self.kernel)

layer = MyDenseLayer(10)

print(layer(tf.zeros([10, 5])))

my_dense_layer build() is called.
my_dense_layer call() is called.
Tensor("my_dense_layer/MatMul:0", shape=(10, 10), dtype=float32)

Calling the same layer a second time:

print(layer(tf.zeros([10, 5])))

my_dense_layer call() is called.
Tensor("my_dense_layer_1/MatMul:0", shape=(10, 10), dtype=float32)

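As the output shows, build() is called before the layer's first call() and is not called again on later calls.

From the official documentation: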

build(): Called once from __call__, when we know the shapes of inputs and dtype. Should have the calls to add_weight(), and then call the super’s build() (which sets self.built = True, which is nice in case the user wants to call build() manually before the first __call__).

Source code:

def _maybe_build(self, inputs):
  # Check input assumptions set before layer building, e.g. input rank.
  if self.built:
    return

  input_spec.assert_input_compatibility(
      self.input_spec, inputs, self.name)
  input_list = nest.flatten(inputs)
  if input_list and self._dtype is None:
    try:
      self._dtype = input_list[0].dtype.base_dtype.name
    except AttributeError:
      pass
  input_shapes = None
  if all(hasattr(x, 'shape') for x in input_list):
    input_shapes = nest.map_structure(lambda x: x.shape, inputs)
  # Only call `build` if the user has manually overridden the build method.
  if not hasattr(self.build, '_is_default'):
    self.build(input_shapes)
  # We must set self.built since user defined build functions are not
  # constrained to set self.built.
  self.built = True

_maybe_build is invoked by __call__(); the layer uses self.built to record whether build() has already been called.
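The effect of the self.built flag can be observed from the outside. The sketch below uses a hypothetical CountingDense layer that counts how many times its build() runs; note that its build() does not set self.built itself and relies on the framework to do so.

```python
import tensorflow as tf

class CountingDense(tf.keras.layers.Layer):
    """Hypothetical layer that counts how often build() runs."""

    def __init__(self, num_outputs):
        super().__init__()
        self.num_outputs = num_outputs
        self.build_calls = 0

    def build(self, input_shape):
        self.build_calls += 1
        self.kernel = self.add_weight(
            name="kernel", shape=(int(input_shape[-1]), self.num_outputs))

    def call(self, inputs):
        return tf.matmul(inputs, self.kernel)

layer = CountingDense(10)
print(layer.built)          # not built yet
layer(tf.zeros([10, 5]))    # first call triggers build()
layer(tf.zeros([10, 5]))    # second call skips build()
print(layer.built, layer.build_calls)
```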

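Complex models

See the TensorFlow documentation: Writing layers and models with TensorFlow Keras

Using other layers inside a custom layer

Some very common layers, such as Dense, are likely to be needed as a building block of another layer. To avoid rewriting Dense's code, the custom Layer has to use other Layers internally.

The official recommendation is to instantiate all sub-layers in __init__. The sub-layers are then built as part of the outer layer's first invocation; in the output below, each sub-layer's build() runs the first time that sub-layer is called from the outer layer's call().

Reference: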
https://www.tensorflow.org/alpha/guide/keras/custom_layers_and_models#layers_are_recursively_composable

class MyDenseBlock(tf.keras.layers.Layer):
    def __init__(self):
        super().__init__()
        self.dense_1 = MyDenseLayer(32)
        self.dense_2 = MyDenseLayer(1)
    
    def build(self, input_shape):
        print(f"{self.name} build() is called.")
        self.built = True
    
    def call(self, inputs):
        print(f"{self.name} call() is called.")
        x = self.dense_1(inputs)
        x = tf.nn.relu(x)
        x = self.dense_2(x)
        return x

layer = MyDenseBlock()

print(layer(tf.zeros([10, 5])))

my_dense_block build() is called.
my_dense_block call() is called.
my_dense_layer_1 build() is called.
my_dense_layer_1 call() is called.
my_dense_layer_2 build() is called.
my_dense_layer_2 call() is called.
Tensor("my_dense_block/my_dense_layer_2/MatMul:0", shape=(10, 1), dtype=float32)
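The same pattern also works with the built-in Dense layer mentioned at the start of this section. The sketch below (DenseBlock is a hypothetical name) additionally checks that the outer layer tracks the weights of its sub-layers once they have been built.

```python
import tensorflow as tf

class DenseBlock(tf.keras.layers.Layer):
    """Hypothetical block that composes two built-in Dense layers."""

    def __init__(self):
        super().__init__()
        # Sub-layers are instantiated in __init__.
        self.dense_1 = tf.keras.layers.Dense(32, activation="relu")
        self.dense_2 = tf.keras.layers.Dense(1)

    def call(self, inputs):
        return self.dense_2(self.dense_1(inputs))

block = DenseBlock()
print(block.dense_1.built)    # sub-layer is not built before the first call
y = block(tf.zeros([10, 5]))
print(block.dense_1.built, block.dense_2.built)
print(len(block.weights))     # kernel + bias of each sub-layer
```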

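Using other models inside a custom model

This also works, but there are some pitfalls.
If the sub-model was defined with the functional API, it can generally be used as-is. If it was defined by subclassing, it must override build and compute_output_shape as follows: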

def build(self, input_shape):
    self.built = True

def compute_output_shape(self, input_shape):
    return tf.TensorShape([input_shape[0], 200])

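The reason is that the sub-model is treated as a layer, and these two methods are often called by layer wrappers, so they have to be defined (a lesson learned the hard way).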
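For context, here is a sketch of where those two overrides sit in a complete subclassed sub-model. SubModel and OuterModel are hypothetical names, and the output width 200 simply matches the snippet above; behavior of layer wrappers varies between TensorFlow versions, so this only demonstrates the overrides themselves and plain nesting.

```python
import tensorflow as tf

class SubModel(tf.keras.Model):
    """Hypothetical subclassed model mapping (batch, d) -> (batch, 200)."""

    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(200)

    def build(self, input_shape):
        self.built = True

    def compute_output_shape(self, input_shape):
        return tf.TensorShape([input_shape[0], 200])

    def call(self, inputs):
        return self.dense(inputs)

class OuterModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.sub = SubModel()                 # sub-model used like a layer
        self.head = tf.keras.layers.Dense(1)

    def call(self, inputs):
        return self.head(self.sub(inputs))

sub = SubModel()
sub_shape = sub.compute_output_shape((None, 16))
out = OuterModel()(tf.zeros([4, 16]))
print(sub_shape, out.shape)
```

The hard-coded 200 has to be kept in sync with the sub-model's real output width; deriving it from an attribute such as the Dense layer's units would be less fragile.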