Deep Learning with TensorFlow: From Basics to Transfer Learning

Source: YouTube video by Zero To Mastery

Deep learning is a type of machine learning that uses artificial neural networks with multiple processing layers. It sits inside the broader AI hierarchy: AI → Machine Learning → Deep Learning. TensorFlow is described as “an end‑to‑end machine learning platform” that lets you write fast deep‑learning code in Python (or JavaScript) and run it on GPUs or TPUs. It provides the whole stack—from data preprocessing to model deployment—and is open‑source, originally built as Google’s internal tool.

Machine Learning vs. Deep Learning

Traditional programming starts with inputs and explicit rules that produce an output. In contrast, machine learning begins with inputs and the ideal output, letting the algorithm discover the rules. Deep learning excels on unstructured data such as text, images, and audio, while classic ML algorithms (Random Forest, Naive Bayes, SVM, etc.) usually perform best on structured data like spreadsheets. The brief warns not to use ML/DL when a simple rule‑based system would suffice.

Neural Networks: Structure and Learning Types

A neural network is a network of artificial neurons (nodes) organized into an input layer, one or more hidden layers, and an output layer. Data is first converted into numbers—tensors—before entering the network. The network learns representations (weights) through supervised, semi‑supervised, unsupervised, or transfer learning. Common deep‑learning use cases include recommendation systems, translation, speech recognition, computer vision, and natural‑language processing (e.g., YouTube recommendations, Google Translate, Siri, AlphaFold).

Tensors: The Core Numerical Representation

In TensorFlow, a tensor is “some way or some numerical way to represent information.” Tensors can be scalars (0‑D), vectors (1‑D), matrices (2‑D), or n‑dimensional arrays. They are created with tf.constant (unchangeable) or tf.Variable (changeable), and can be initialized randomly (tf.random.uniform, tf.random.normal) or with fixed values (tf.ones, tf.zeros). Important tensor attributes include shape, rank (or ndim), and size. Operations such as addition, subtraction, multiplication, division, matrix multiplication (tf.matmul or the @ operator), reshaping (tf.reshape), transposing (tf.transpose), and aggregation (tf.reduce_mean, tf.reduce_sum) are all supported.
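A short sketch of these tensor basics (the specific values are illustrative, not from the video):

```python
import tensorflow as tf

# Immutable tensors of increasing rank
scalar = tf.constant(7)                 # 0-D tensor (rank 0)
vector = tf.constant([1.0, 2.0])        # 1-D tensor (rank 1)
matrix = tf.constant([[1, 2], [3, 4]])  # 2-D tensor (rank 2)

# Mutable tensor, initialized randomly -- note the capital V in tf.Variable
weights = tf.Variable(tf.random.normal(shape=(2, 2)))

# Key attributes: shape, rank (ndim), and size
print(matrix.shape)     # (2, 2)
print(matrix.ndim)      # 2
print(int(tf.size(matrix)))  # 4

# Common operations
added = matrix + 10                        # element-wise addition
product = matrix @ tf.transpose(matrix)   # matrix multiplication
flat = tf.reshape(matrix, shape=(4,))     # reshape to a vector
mean = tf.reduce_mean(tf.cast(matrix, tf.float32))  # aggregation
```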

TensorFlow Workflow: From Data to Deployment

The typical TensorFlow workflow follows these steps:

  1. Prepare data – convert raw inputs (images, text, sound) into tensors, optionally normalizing, standardizing, or one‑hot encoding them.
  2. Build a model – use the Keras Sequential API to stack layers (Dense, Conv2D, etc.).
  3. Compile the model – specify a loss function (e.g., MSE for regression, binary cross‑entropy for binary classification), an optimizer (Adam, SGD), and metrics (accuracy, MAE).
  4. Fit the model – train on the data for a number of epochs, optionally using callbacks such as EarlyStopping or TensorBoard.
  5. Evaluate – assess performance on a validation or test set using metrics like MAE, MSE, accuracy, precision, recall, F1, and confusion matrices.
  6. Improve – adjust hyperparameters (learning rate, number of layers, units), preprocess data differently, or modify the architecture.
  7. Save and load – persist models in SavedModel or HDF5 format for later inference or further training.
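The steps above can be sketched end to end on a toy regression problem (the data, layer sizes, and epoch count here are illustrative assumptions, not values from the course):

```python
import numpy as np
import tensorflow as tf

# 1. Prepare data -- a toy linear relationship, y = 2x + 1
X = np.arange(-10, 10, 0.5, dtype=np.float32).reshape(-1, 1)
y = 2 * X + 1

# 2. Build a model with the Keras Sequential API
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),  # single output neuron for regression
])

# 3. Compile: loss, optimizer, metrics
model.compile(loss="mse",
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              metrics=["mae"])

# 4. Fit the model for a number of epochs
model.fit(X, y, epochs=50, verbose=0)

# 5. Evaluate on held-out data (here, for brevity, the same set)
loss, mae = model.evaluate(X, y, verbose=0)

# 7. Save and load for later inference or further training
model.save("toy_regression_model.keras")
reloaded = tf.keras.models.load_model("toy_regression_model.keras")
```

Step 6 (improve) would loop back through these stages, adjusting the learning rate, layer count, or preprocessing between runs.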

Regression and Classification

Regression models predict numerical values (e.g., house prices, insurance costs) and typically have a single output neuron with a linear activation. Classification models predict categories: binary (sigmoid activation, one output), multiclass (softmax activation, one output per class), or multilabel (multiple binary outputs). The brief emphasizes visualizing data, splitting it into training/validation/test sets, and using appropriate loss functions (sparse_categorical_crossentropy for integer‑encoded labels, categorical_crossentropy for one‑hot encoded labels).
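The difference between the two classification losses is only the label format, which a minimal example makes concrete (the probabilities below are made up for illustration):

```python
import numpy as np
import tensorflow as tf

# Three classes; the true class of this one sample is class 2
y_int = np.array([2])                   # integer-encoded label
y_onehot = np.array([[0.0, 0.0, 1.0]])  # one-hot encoded label
y_pred = np.array([[0.1, 0.2, 0.7]])    # model's softmax output

# sparse_categorical_crossentropy takes integer labels...
sparse = tf.keras.losses.SparseCategoricalCrossentropy()(y_int, y_pred)

# ...categorical_crossentropy takes one-hot labels
categ = tf.keras.losses.CategoricalCrossentropy()(y_onehot, y_pred)

# Both compute -log(0.7); only the label encoding differs
```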

Data Preprocessing Essentials

Normalization (Min‑Max scaling) and standardization (zero‑mean, unit‑variance) are highlighted as techniques that dramatically improve model performance—“Just by normalizing our data, we've gone from 5,000 MAE to 3,120 MAE.” One‑hot encoding (pd.get_dummies, OneHotEncoder) handles categorical features, while ColumnTransformer can apply different preprocessing pipelines to different columns. For image data, scaling pixel values to 0‑1 and using ImageDataGenerator for augmentation are standard practices.
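The two scaling techniques reduce to a couple of lines of arithmetic (the sample values are arbitrary):

```python
import numpy as np

data = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Normalization (min-max scaling): squash values into [0, 1]
normalized = (data - data.min()) / (data.max() - data.min())

# Standardization: shift to zero mean, scale to unit variance
standardized = (data - data.mean()) / data.std()
```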

Model Evaluation and Overfitting

Metrics such as MAE, MSE, accuracy, precision, recall, and F1 score quantify performance. Visualization (“visualize, visualize, visualize”) of loss curves, accuracy plots, and confusion matrices helps detect overfitting—when training loss decreases while validation loss rises. Regularization techniques include data augmentation, max pooling, early stopping, and shuffling training data each epoch (“The power of shuffling training data”).
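Precision, recall, and F1 all derive from confusion-matrix counts, which a small helper makes explicit (the counts below are invented for illustration):

```python
def classification_metrics(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)   # of everything predicted positive, how much was right
    recall = tp / (tp + fn)      # of all actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Example: 80 true positives, 20 false positives, 10 false negatives
p, r, f1 = classification_metrics(tp=80, fp=20, fn=10)
```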

Convolutional Neural Networks (CNNs) for Computer Vision

CNNs are introduced as the go‑to architecture for image tasks, outperforming dense networks. A baseline CNN (Model 4) uses Conv2D, MaxPooling2D, Flatten, and Dense layers. Adding more layers, increasing filter counts, and applying max pooling (Model 5) improve validation accuracy from ~68 % to ~86 %. Data augmentation (rotation, shear, zoom, flip, shift) further reduces overfitting.
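A baseline CNN in this spirit can be sketched as follows; the input shape, filter counts, and binary output here are assumptions for illustration rather than the exact Model 4 architecture:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),          # RGB images, assumed size
    tf.keras.layers.Conv2D(10, 3, activation="relu"),  # learn local image features
    tf.keras.layers.MaxPooling2D(2),              # downsample, keep strongest features
    tf.keras.layers.Conv2D(10, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),                    # 2-D feature maps -> 1-D vector
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary classification output
])
model.compile(loss="binary_crossentropy", optimizer="adam",
              metrics=["accuracy"])
```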

Transfer Learning and TensorFlow Hub

Transfer learning leverages pre‑trained models (e.g., ResNet50V2, EfficientNetB0) from TensorFlow Hub. By freezing the pre‑trained layers (trainable=False) and adding a custom output layer, you can achieve high accuracy with far less data and training time. In the brief, EfficientNetB0 reached ~86 % validation accuracy after only five epochs, compared to ~40 % for a model trained from scratch.
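The frozen-base pattern looks roughly like this; the course pulls models from TensorFlow Hub, while this sketch uses tf.keras.applications for the same EfficientNetB0 weights, and the 10-class output is an assumption:

```python
import tensorflow as tf

# Pre-trained base; weights="imagenet" downloads the weights on first use
base = tf.keras.applications.EfficientNetB0(include_top=False,
                                            weights="imagenet",
                                            input_shape=(224, 224, 3),
                                            pooling="avg")
base.trainable = False  # freeze the pre-trained layers

# Add a custom output layer on top of the frozen base
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(10, activation="softmax"),  # assumed 10 classes
])
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])
```

Only the final Dense layer's weights are updated during training, which is why good accuracy arrives in so few epochs.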

Experimentation, Callbacks, and Resources

The course encourages an experimental mindset: “The machine learning practitioner’s motto is experiment, experiment, experiment.” Callbacks such as TensorBoard (for logging), ModelCheckpoint (for saving best weights), and EarlyStopping (to prevent overfitting) are integral to a robust workflow. Learners are urged to run the code when in doubt, visualize results constantly, ask questions in community Discord, and share their work.
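Wiring these three callbacks into training takes only a few lines; the log directory, checkpoint path, and patience value below are illustrative choices:

```python
import tensorflow as tf

callbacks = [
    # Log metrics for visualization in TensorBoard
    tf.keras.callbacks.TensorBoard(log_dir="logs/experiment_1"),
    # Save only the best-performing weights seen so far
    tf.keras.callbacks.ModelCheckpoint("best_model.keras",
                                       monitor="val_loss",
                                       save_best_only=True),
    # Stop training once validation loss stops improving
    tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                     patience=3,
                                     restore_best_weights=True),
]
# Then pass them to training, e.g.:
# model.fit(X, y, validation_split=0.2, epochs=100, callbacks=callbacks)
```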

  Takeaways

  • Deep learning uses multi‑layer neural networks and sits within the AI → ML → DL hierarchy, while TensorFlow provides an end‑to‑end platform for building such models.
  • Traditional programming writes explicit rules, whereas machine learning starts with inputs and desired outputs, making it suitable for complex problems that cannot be rule‑based.
  • Tensors are the fundamental numerical representation in TensorFlow, and operations like reshaping, aggregation, and matrix multiplication enable data flow through neural networks.
  • Effective model building follows a workflow of data preparation, model construction, compilation, training, evaluation, and saving, with hyperparameter tuning and regularization to combat overfitting.
  • Transfer learning with pre‑trained models from TensorFlow Hub dramatically reduces data and training requirements, often delivering higher accuracy than training from scratch.
