Eager Execution in TensorFlow
Introduction
Eager Execution (EE) enables you to run operations immediately. In graph mode, TensorFlow requires you to build a graph and run it within a session in order to execute its operations. With EE, on the other hand, you can run operations directly and inspect the output as they execute. This is very useful, especially for debugging. Moreover, EE is pythonic and integrates well with NumPy, which makes programming easy and flexible. The next version of TensorFlow, 2.0, will enable EE by default.
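To make the contrast concrete, here is a minimal sketch (using the TF 1.x API assumed throughout this notebook) of the same operation in graph mode. The eager variant is shown as comments, because eager execution must be enabled at program startup and the two modes cannot be mixed in one process, so this cell is meant for a separate, non-eager session.
In [0]:
#graph mode: build the graph first, then run it inside a session
#note: run this snippet in a separate (non-eager) Python session
import tensorflow as tf

x = tf.constant([[2.]])
m = tf.square(x)            #m is a symbolic tensor, it has no value yet
with tf.Session() as sess:
    print(sess.run(m))      #[[4.]]

#eager mode (in a fresh process): the value is available immediately
#  tf.enable_eager_execution()
#  print(tf.square([[2.]]))   #tf.Tensor([[4.]], shape=(1, 1), dtype=float32)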
According to the Google AI blog, these are some of the benefits of using EE:
- Fast debugging with immediate run-time errors and integration with python tools
- Support for dynamic models using easy-to-use Python control flow
- Strong support for custom and higher-order gradients
- Almost all of the available TensorFlow operations
In [2]:
!python --version
Installing dependencies (Python 3.5+)
In [3]:
# installing numpy
!pip install numpy
# installing tensorflow
!pip install tensorflow
# installing matplotlib
!pip install matplotlib
Enabling Eager Execution
In current (1.x) versions of TensorFlow, eager execution is not enabled by default, so you have to enable it.
In [0]:
import tensorflow as tf
Check if eager execution is enabled
In [44]:
if not tf.executing_eagerly():
    tf.enable_eager_execution()
    print("Enabled Eager Execution")
else:
    print("Eager Execution already enabled")
Executing Ops Eagerly
By performing operations you can see the output directly, without creating a session.
In [45]:
x = [[2.]]
m = tf.square(x)
print(m)
You can call .numpy() to retrieve the result of the tensor as a NumPy array.
In [46]:
m.numpy()
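Because eager tensors interoperate with NumPy, you can also mix the two freely; a quick sketch (not part of the original flow):
In [0]:
import numpy as np

t = tf.constant([[1., 2.], [3., 4.]])
#NumPy functions accept eager tensors and return ndarrays
print(np.mean(t))
#TensorFlow operations accept ndarrays and return tensors
print(tf.square(np.array([1., 2., 3.])))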
You can also compute operations involving two tensors.
In [47]:
a = tf.constant([[1, 2],
                 [3, 4]])
b = tf.constant([[2, 1],
                 [3, 4]])
ab = tf.matmul(a, b)
print('a * b = \n', ab.numpy())
Constants and Variables
- tf.constant creates a constant tensor populated with the values passed as an argument. The values are immutable.
- tf.Variable encapsulates a mutable tensor whose value can be changed later using assign.
Creating a constant tensor
In [48]:
a = tf.constant([[2,3]])
print(a)
A constant tensor is immutable so you cannot assign a new value to it.
In [50]:
try:
    a.assign([[3, 4]])
except:
    print('Exception raised: Constant tensor is immutable')
On the other hand variables are mutable and can be assigned a new value
In [51]:
v = tf.Variable(5.)
print('Old value for v =', v.numpy())
v.assign(2.)
print('New value for v =', v.numpy())
You can also increment/decrement the value of a tensor
In [52]:
v.assign(2.)
print('value : ', v.numpy())
print('increment : ', tf.assign_add(v, 1).numpy())
print('decrement : ', tf.assign_sub(v, 1).numpy())
You can retrieve a lot of information from a variable, such as its name, type, shape, and the device it is placed on.
In [53]:
print('name : ', v.name)
print('type : ', v.dtype)
print('shape : ', v.shape)
print('device: ', v.device)
Gradient Evaluation
Gradient evaluation is very important in machine learning because training is based on optimizing a loss function. You can use tf.GradientTape() to record operations and compute the gradient of an arbitrary function.
In [54]:
w = tf.Variable(2.0)
#watch the gradient of the loss operation
with tf.GradientTape() as tape:
    loss = w * w
grad = tape.gradient(loss, w)
print(f'The gradient of w^2 at {w.numpy()} is {grad.numpy()}')
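By default the tape is released after a single gradient call. If you need several gradients from the same recording, you can pass persistent=True; a small sketch:
In [0]:
x = tf.Variable(3.0)
with tf.GradientTape(persistent=True) as tape:
    y = x * x
    z = y * y                #z = x^4
dy_dx = tape.gradient(y, x)  #2x   -> 6.0
dz_dx = tape.gradient(z, x)  #4x^3 -> 108.0
del tape                     #release the tape's resources
print(dy_dx.numpy(), dz_dx.numpy())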
You can also compute the gradient directly using gradients_function. In this example we evaluate the gradient of the sigmoid function sigmoid(x) = 1 / (1 + exp(-x)). Note that its derivative is sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)), which is about 0.105 at x = 2.
In [55]:
import tensorflow.contrib.eager as tfe

def sigmoid(x):
    return 1 / (1 + tf.exp(-x))

grad_sigmoid = tfe.gradients_function(sigmoid)
print('The gradient of the sigmoid function at 2.0 is ', grad_sigmoid(2.0)[0].numpy())
You can also compute higher-order derivatives by nesting gradient functions. For instance,
In [56]:
dx = tfe.gradients_function

def log(x):
    return tf.log(x)

dx_log = dx(log)
dx2_log = dx(dx(log))
dx3_log = dx(dx(dx(log)))
print('The first derivative of log at x = 1 is ', dx_log(1.)[0].numpy())
print('The second derivative of log at x = 1 is ', dx2_log(1.)[0].numpy())
print('The third derivative of log at x = 1 is ', dx3_log(1.)[0].numpy())
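The same second derivative can also be computed by nesting GradientTape contexts; a sketch equivalent to dx(dx(log)) above (tape.watch is needed because x is a constant, not a variable):
In [0]:
x = tf.constant(1.0)
with tf.GradientTape() as t2:
    t2.watch(x)
    with tf.GradientTape() as t1:
        t1.watch(x)
        y = tf.log(x)
    dy_dx = t1.gradient(y, x)        #1/x    -> 1.0
d2y_dx2 = t2.gradient(dy_dx, x)      #-1/x^2 -> -1.0
print(dy_dx.numpy(), d2y_dx2.numpy())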
Custom Gradients
Sometimes the gradient is not what we want, especially when there is a numerical instability problem. Consider the function f(x) = log(1 + exp(x)). Its gradient is f'(x) = exp(x) / (1 + exp(x)). Note that at large values of x the intermediate exp(x) overflows, so the naively computed gradient blows up to nan.
In [57]:
def logexp(x):
    return tf.log(1 + tf.exp(x))

grad_logexp = tfe.gradients_function(logexp)
print('The gradient at x = 0 is ', grad_logexp(0.)[0].numpy())
print('The gradient at x = 100 is ', grad_logexp(100.)[0].numpy())
We can fix this by overriding the gradient of the function. The gradient can be rewritten in the numerically stable form f'(x) = 1 - 1 / (1 + exp(x)).
In [58]:
@tf.custom_gradient
def logexp_stable(x):
    e = tf.exp(x)
    #dy is optional; it allows computing vector-Jacobian products for vectors other than the vector of ones
    def grad(dy):
        return dy * (1 - 1 / (1 + e))
    return tf.log(1 + e), grad

grad_logexp_stable = tfe.gradients_function(logexp_stable)
print('The gradient at x = 100 is ', grad_logexp_stable(100.)[0].numpy())
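Another common use of tf.custom_gradient is to change only the backward pass, for example clipping the gradient that flows through a tensor. The clip_gradient helper below is a sketch for illustration and not part of the original notebook:
In [0]:
@tf.custom_gradient
def clip_gradient(x):
    #identity in the forward pass
    def grad(dy):
        #clip the incoming gradient to [-1, 1] in the backward pass
        return tf.clip_by_value(dy, -1., 1.)
    return x, grad

def big_slope(x):
    return 10. * clip_gradient(x)   #the true gradient is 10

grad_big_slope = tfe.gradients_function(big_slope)
print('The clipped gradient at x = 1 is ', grad_big_slope(1.)[0].numpy())   #1.0 after clipping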
Execution Callbacks
add_execution_callback can be used to monitor the execution of operations. The registered callback is called every time an operation is executed eagerly. In this example we print the operation names.
In [59]:
#create a callback that records the operation name
def print_op(op_type, op_name, attrs, inputs, outputs):
    print(op_type)

#clear previous callbacks
tfe.clear_execution_callbacks()
#add the callback
tfe.add_execution_callback(print_op)
#try running an operation
x = tf.pow(2.0, 3.0) - 3.0
#clear the callback
tfe.clear_execution_callbacks()
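The callback can do more than print; for instance, the sketch below counts how many times each op type runs (the exact op names, e.g. 'MatMul' and 'Add', may vary between TensorFlow versions):
In [0]:
from collections import Counter

op_counts = Counter()

#callback that counts executed op types
def count_op(op_type, op_name, attrs, inputs, outputs):
    op_counts[op_type] += 1

tfe.clear_execution_callbacks()
tfe.add_execution_callback(count_op)
x = tf.matmul([[2.]], [[3.]]) + 1.
tfe.clear_execution_callbacks()
print(op_counts)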
Object Oriented Metrics
You can use metrics to record tensors/values and aggregate them later. This is useful for recording the training history, for example the average loss over an epoch. Use .result() to evaluate the metric.
In [60]:
m = tfe.metrics.Mean("loss")
#record the loss
m(2)
m(4)
print('The mean loss is ', m.result().numpy())
If you want to discard the recorded values, you can reinitialize the metric's variables.
In [61]:
m.init_variables()
print('The mean loss is ', m.result().numpy())
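Besides Mean, tf.contrib.eager also provides an Accuracy metric that accumulates how often predictions match labels across calls; a small sketch (assuming integer class labels):
In [0]:
acc = tfe.metrics.Accuracy()
acc([0, 1, 1], [0, 1, 0])   #2 out of 3 correct
acc([1], [1])               #1 more correct
print('The accuracy is ', acc.result().numpy())   #0.75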
Linear Regression
This example is refactored from https://www.tensorflow.org/guide/eager. We create a complete example of using linear regression to recover the parameters of the function y = 3x + 2 + noise.
Given a point x we want to predict the value of y. We train the model on 1000 data pairs (x, y).
The model to learn is the linear model yhat = W * x + b.
Note that we use tf.GradientTape to record the gradient with respect to our trainable parameters W and b.
We use the mean squared error (MSE) to calculate the loss: loss = mean((yhat - y)^2).
We use gradient descent to update the parameters: W <- W - lr * dW and b <- b - lr * db, where lr is the learning rate.
In [22]:
#1000 data points
NUM_EXAMPLES = 1000
#define inputs and outputs with some noise
X = tf.random_normal([NUM_EXAMPLES])      #inputs
noise = tf.random_normal([NUM_EXAMPLES])  #noise
y = X * 3 + 2 + noise                     #true output
#create model parameters with initial values
W = tf.Variable(0.)
b = tf.Variable(0.)
#training info
train_steps = 200
learning_rate = 0.01
for i in range(train_steps):
    #watch the gradient flow
    with tf.GradientTape() as tape:
        #forward pass
        yhat = X * W + b
        #calculate the loss (mean squared error)
        error = yhat - y
        loss = tf.reduce_mean(tf.square(error))
    #evaluate the gradient with respect to the parameters
    dW, db = tape.gradient(loss, [W, b])
    #update the parameters using gradient descent
    W.assign_sub(dW * learning_rate)
    b.assign_sub(db * learning_rate)
    #print the loss every 20 iterations
    if i % 20 == 0:
        print("Loss at step {:03d}: {:.3f}".format(i, loss.numpy()))
print(f'W : {W.numpy()} , b = {b.numpy()} ')
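Instead of applying the updates by hand, you could pass the gradients to one of TensorFlow's built-in optimizers. Here is a sketch of a single step with tf.train.GradientDescentOptimizer, which performs the same W <- W - lr * dW update:
In [0]:
sgd = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
with tf.GradientTape() as tape:
    yhat = X * W + b
    loss = tf.reduce_mean(tf.square(yhat - y))
grads = tape.gradient(loss, [W, b])
#apply_gradients pairs each gradient with its variable and applies the update
sgd.apply_gradients(zip(grads, [W, b]))
print('Loss after one extra step: {:.3f}'.format(loss.numpy()))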
A Simple CNN
Here we create a simple convolutional neural network (CNN) to recognize hand-written digits (MNIST). We start by creating a small AlexNet-style CNN model.
In [0]:
from tensorflow.keras.layers import Dense, Convolution2D, MaxPooling2D, Flatten, BatchNormalization, Dropout
from tensorflow.keras.models import Sequential
In [0]:
def create_model():
    model = Sequential()
    model.add(Convolution2D(filters = 16, kernel_size = 3, padding = 'same', input_shape = [28, 28, 1], activation = 'relu'))
    model.add(MaxPooling2D(pool_size = (2,2)))
    model.add(BatchNormalization())
    model.add(Convolution2D(filters = 32, kernel_size = 3, padding = 'same', activation = 'relu'))
    model.add(MaxPooling2D(pool_size = (2,2)))
    model.add(BatchNormalization())
    model.add(Flatten())
    model.add(Dense(units = 100, activation = 'relu'))
    model.add(Dropout(0.5))
    model.add(Dense(units = 10, activation = 'softmax'))
    return model
Create the model
In [25]:
model = create_model()
model.summary()
Look at the output by forwarding a batch of zero images.
In [26]:
import numpy as np
model(np.zeros((10, 28, 28, 1), np.float32))
Load MNIST dataset
In [27]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
Look at the data
In [28]:
import matplotlib.pyplot as plt
print('The label is ',y_train[0])
plt.imshow(x_train[0])
plt.show()
Preprocessing the dataset
In [0]:
import numpy as np
N = x_train.shape[0]
#normalization and convert to batch input
x_train = tf.expand_dims(np.float32(x_train)/ 255., 3)
x_test = tf.expand_dims(np.float32(x_test )/ 255., 3)
#one hot encoding
y_train = tf.one_hot(y_train, 10)
y_test = tf.one_hot(y_test , 10)
Get Random Batch
In [0]:
import numpy as np

def get_batch(batch_size = 32):
    r = np.random.randint(0, N - batch_size)
    return x_train[r: r + batch_size], y_train[r: r + batch_size]
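Since eager execution makes tf.data.Dataset objects iterable with a plain Python for loop, an alternative to this manual batching is a dataset pipeline; a small sketch:
In [0]:
#build a shuffled, batched dataset from the training tensors
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
dataset = dataset.shuffle(1000).batch(32)
#take a couple of batches to check the shapes
for batch_x, batch_y in dataset.take(2):
    print(batch_x.shape, batch_y.shape)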
Design the loss function, gradient, and accuracy metric
In [0]:
#evaluate the loss
def loss(model, x, y):
    prediction = model(x)
    return tf.losses.softmax_cross_entropy(y, logits=prediction)

#record the gradient with respect to the model variables
def grad(model, x, y):
    with tf.GradientTape() as tape:
        loss_value = loss(model, x, y)
    return tape.gradient(loss_value, model.variables)

#calculate the accuracy of the model
def accuracy(model, x, y):
    #prediction
    yhat = model(x)
    #get the labels of the predicted values
    yhat = tf.argmax(yhat, 1).numpy()
    #get the labels of the true values
    y = tf.argmax(y, 1).numpy()
    return np.sum(y == yhat) / len(y)
Initialize the training variables
In [0]:
i = 1
batch_size = 64
epoch_length = N // batch_size
epoch = 0
epochs = 5
#use Adam optimizer
optimizer = tf.train.AdamOptimizer()
#record epoch loss and accuracy
loss_history = tfe.metrics.Mean("loss")
accuracy_history = tfe.metrics.Mean("accuracy")
Training
In [33]:
while epoch < epochs:
    #get the next batch
    x, y = get_batch(batch_size = batch_size)
    #calculate the derivatives of the loss with respect to the model parameters
    grads = grad(model, x, y)
    #apply the gradients to the model
    optimizer.apply_gradients(zip(grads, model.variables),
                              global_step=tf.train.get_or_create_global_step())
    #record the current loss and accuracy
    loss_history(loss(model, x, y))
    accuracy_history(accuracy(model, x, y))
    if i % epoch_length == 0:
        print("epoch: {:d} Loss: {:.3f}, Acc: {:.3f}".format(
            epoch, loss_history.result().numpy(), accuracy_history.result().numpy()))
        #clear the history
        loss_history.init_variables()
        accuracy_history.init_variables()
        epoch += 1
    i += 1
Testing
In [34]:
accuracy(model, x_test, y_test)
Save and Restore a Model
You can save the trained model and optimizer state as a checkpoint and restore it later.
In [35]:
#create a directory for saving the model
import os
checkpoint_dir = 'model'
os.makedirs(checkpoint_dir, exist_ok=True)   #avoid failing if the directory already exists
#create a root for the checkpoint
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
root = tf.train.Checkpoint(optimizer=optimizer,
                           model=model,
                           optimizer_step=tf.train.get_or_create_global_step())
#save the model
root.save(file_prefix=checkpoint_prefix)
Restore the model
In [36]:
#create an empty model
model = create_model()
#accuracy of the empty model
print('accuracy before retrieving the model ', accuracy(model, x_test, y_test))
#create a checkpoint variable
root = tf.train.Checkpoint(optimizer=optimizer,
                           model=create_model(),
                           optimizer_step=tf.train.get_or_create_global_step())
#restore the model
root.restore(tf.train.latest_checkpoint(checkpoint_dir))
#retrieve the trained model
model = root.model
print('accuracy after retrieving the model ', accuracy(model, x_test, y_test))
Compiling Functions into a Callable Graph
defun trace-compiles a Python function composed of TensorFlow operations into a callable that executes a tf.Graph containing those operations.
In [0]:
# A simple example.
def f(x):
    return tf.square(x)

#callable graph function
g = tfe.defun(f)
In [38]:
x = tf.constant(3.)
g(x).numpy()
Alternatively, you can use defun as a decorator:
In [0]:
@tf.contrib.eager.defun
def s(x):
    return 1 / (1 + tf.exp(-x))
In [40]:
s(x).numpy()
Moreover, there can be a gain in execution time when you wrap functions in defun. Let us use a simple function that contains multiple layers and compare the timings.
In [0]:
def encoder(x):
    filters = 128
    x = Convolution2D(filters = filters, kernel_size = 3, padding = 'same', input_shape = [224, 224, 3], activation = 'relu')(x)
    x = MaxPooling2D(pool_size = (2,2))(x)
    for i in range(0, 5):
        filters = filters // 2
        x = Convolution2D(filters = filters, kernel_size = 3, padding = 'same', activation = 'relu')(x)
        x = MaxPooling2D(pool_size = (2,2))(x)
    x = Flatten()(x)
    x = Dense(units = 512)(x)
    x = Dense(units = 1)(x)
    return x
In [63]:
import time
#use float32 inputs to match the default dtype of the Keras layers
x = tf.Variable(np.zeros((10, 224, 224, 3), np.float32))
#calculate the time in eager execution
start = time.time()
encoder(x)
print('Time in eager execution ', time.time() - start)
#calculate the time with the compiled graph
encoder_g = tfe.defun(encoder)
start = time.time()
encoder_g(x)
print('Time using defun ', time.time() - start)
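Keep in mind that the first call to the defun-compiled function includes the one-time cost of tracing the Python function and building the graph; timing a second call (a small sketch) shows the amortized speed:
In [0]:
#the graph has already been traced, so this call skips the tracing cost
start = time.time()
encoder_g(x)
print('Time using defun (second call) ', time.time() - start)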
Use Eager and Graphs
As you can see, eager execution is great, but can it replace graphs? Graph execution is useful for distributed training, performance optimizations, and production deployment, and eager execution on its own does not cover these use cases. So the two modes complement each other: use eager execution to debug the code and build models with tf.keras, then use graph execution to accelerate functions and deploy them on other platforms.
References
Take a look at the following resources to read more about eager.
- Eager guide: https://www.tensorflow.org/guide/eager
- Getting started with TensorFlow: https://www.tensorflow.org/tutorials/
- Real examples with Seedbank: https://tools.google.com/seedbank/seeds?q=eager