I. Ozkan
Spring 2025
Book Chapter
An Introduction to Statistical Learning with Applications in R, Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, Chapter 10
Others
STAT 365/665: Data Mining and Machine Learning course at Yale University
Hands-On Machine Learning with R, Bradley Boehmke & Brandon Greenwell, Chapter 13
Deep Neural Networks (DNN), also called Deep Learning Networks, are powerful machine learning (ML) algorithms
A very active area of research
Used in both supervised and unsupervised learning with multi-layered models
Requires lots of data (hence, cheaper computation and the availability of larger data sets have made these algorithms feasible)
The cornerstone of the deep neural network is the [Artificial] Neural Network
It is inspired by the structure and functions of biological neural networks
For the history, read the related Wikipedia section
Feedforward Networks (FNN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), …, are network structures composed of artificial neurons that have been suggested and successfully applied to specific problems
Successfully applied in:
We will loosely discuss Feedforward Networks:
Perceptron (a binary classifier)
Artificial Neurons (generalized version of perceptron)
[Deep] Neural Networks (Network of Artificial Neurons, also called Artificial Neural Networks, ANN)
Layers and Nodes
Activation Function
Back-propagation
Batching, mini-batching
Regularization
Dropout
Learning Rate
Example (1)
A decision to attend an outdoor activity based on these three factors (let's assume binary factors)
Based on your utility function \(U(\cdot)\), you decide whether to attend (for the sake of simplicity once more, it is an additive function)
If the utility level exceeds some threshold, then the decision is yes (attend); otherwise no
\(\text{outcome, } y = \begin{cases} 1 & \text{if } w_1 x_1 + w_2 x_2 + w_3 x_3 \geq \text{threshold} \\ 0 & \text{otherwise} \end{cases}\)
Or
\(y = \begin{cases} 1 & \text{if } \boldsymbol{x} \cdot \boldsymbol{w} \geq \text{threshold} \\ 0 & \text{otherwise} \end{cases}\)
 - \(y\) is a step function: it takes the value \(1\) if a linear combination of the \(x\)'s exceeds the threshold value
*: Image is from https://stats.stackexchange.com/questions/419716/whats-the-difference-between-artificial-neuron-and-perceptron
The perceptron is at the center of artificial neural networks and acts similarly to a biological neuron: it activates other neurons based on the values it receives from its input terminals
It is a mathematical node and the basic processing element
In this example, the cell body simply takes the weighted sum of the inputs, and the activation function converts this value into zero or one (a minimal sketch of such a perceptron follows). The step function is just one example; several other activation functions have been suggested
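A minimal R sketch of this perceptron for the outdoor-activity decision; the weights and threshold are made-up values for illustration, not taken from the example above:

```r
# Perceptron with a step activation: returns 1 (attend) if the
# weighted sum of the binary inputs reaches the threshold, else 0.
perceptron <- function(x, w, threshold) {
  as.integer(sum(w * x) >= threshold)
}

w <- c(0.6, 0.3, 0.1)                 # assumed weights for the three factors
threshold <- 0.5                      # assumed threshold
perceptron(c(1, 0, 1), w, threshold)  # 0.7 >= 0.5, so returns 1 (attend)
```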
In this example there are an input layer, \(L_1\), a hidden layer, \(L_2\), and an output layer, \(L_3\)
In more complex settings there may be more than one hidden layer; such a network is sometimes called a deep neural network
An example of a deep Feedforward Network:
*: Source http://euler.stat.yale.edu/~tba3/stat665/lectures/lec12/lecture12.pdf
Activation functions in hidden layers are typically nonlinear
In most cases the same activation function is used throughout the network
The commonly used activation functions \(f(\cdot)\) are the following (as notation, \(z = \boldsymbol{w \cdot x + b}\)):
The activations are nonlinear transformations of linear combinations of the features (inputs); a sketch of common choices follows
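The slide's list of activation functions is not reproduced here; as a hedged sketch, three commonly used choices written out in R (the names sigmoid, tanh_act, and relu are mine):

```r
# Three commonly used activation functions, applied to z = w.x + b
sigmoid  <- function(z) 1 / (1 + exp(-z))  # logistic: output in (0, 1)
tanh_act <- function(z) tanh(z)            # hyperbolic tangent: output in (-1, 1)
relu     <- function(z) pmax(0, z)         # rectified linear unit: max(0, z)

# Compare the logistic and tanh curves on a common grid
z <- seq(-4, 4, by = 0.1)
plot(z, sigmoid(z), type = "l", ylim = c(-1, 1), ylab = "f(z)")
lines(z, tanh_act(z), lty = 2)
```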
The network is trained (the weights are found) iteratively
The backward propagation (backpropagation) algorithm was the first popular one:
The Gradient Descent algorithm applied for optimization (for example, minimizing a loss function) requires an initialization, a stopping condition, a step size (learning rate), and the gradient of the function
A simple linear regression is an example:
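A minimal sketch of gradient descent for simple linear regression; the data, learning rate, and iteration count are illustrative assumptions:

```r
# Gradient descent for y = b0 + b1*x, minimizing mean squared error
set.seed(1)
x <- runif(100)
y <- 2 + 3 * x + rnorm(100, sd = 0.2)  # true intercept 2, slope 3

b0 <- 0; b1 <- 0          # initialization
eta <- 0.1                # step size (learning rate)
for (i in 1:5000) {       # stopping condition: fixed number of steps
  e  <- y - (b0 + b1 * x) # residuals
  b0 <- b0 + eta * mean(e)      # move along the negative gradient
  b1 <- b1 + eta * mean(e * x)
}
c(b0, b1)  # should be close to (2, 3)
```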
Neural networks can be used to approximate a function
One hidden layer is enough to model any piecewise continuous function (Hornik et al., 1989)
This is an example of the \(y = x^2\) function modeled with two hidden layers, each with three neurons, and the logistic activation function (using the neuralnet package of R)
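A hedged sketch of how this fit might be reproduced with neuralnet; the sample size, input range, and seed are my assumptions, as the author's exact code is not shown:

```r
library(neuralnet)

set.seed(1)
x  <- runif(200, -2, 2)
df <- data.frame(x = x, y = x^2)

# Two hidden layers with three neurons each, logistic activation
nn <- neuralnet(y ~ x, data = df, hidden = c(3, 3),
                act.fct = "logistic", linear.output = TRUE)

# Compare the fitted curve with the true function on a grid
grid <- data.frame(x = seq(-2, 2, length.out = 100))
pred <- compute(nn, grid)$net.result
plot(grid$x, pred, type = "l", xlab = "x", ylab = "y")
curve(x^2, add = TRUE, lty = 2)
```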
 
The number of hidden neurons is a task-specific problem
Using too many neurons increases the risk of overfitting
It is a model selection problem
There is no standard, accepted way of choosing a better network structure
Neural networks have an input layer, an output layer, and a number of hidden layers
The structure depends on the number of inputs, the number of outputs, and the complexity of the problem
Using more hidden layers may create optimization problems
For many problems a single hidden layer with enough nodes is sufficient
The example below shows a single hidden layer with six neurons (a sketch comparing candidate sizes follows)
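One hedged way to treat this as model selection is a simple train/test comparison across candidate hidden-layer sizes; the sizes and data here are illustrative, not the author's:

```r
library(neuralnet)

set.seed(1)
x  <- runif(300, -2, 2)
df <- data.frame(x = x, y = x^2)
train <- df[1:200, ]
test  <- df[201:300, ]

# Fit one single-hidden-layer network per candidate size
# and report the held-out mean squared error
for (h in c(2, 4, 6, 8)) {
  nn   <- neuralnet(y ~ x, data = train, hidden = h)
  pred <- compute(nn, test[, "x", drop = FALSE])$net.result
  cat("hidden =", h, " test MSE =", mean((test$y - pred)^2), "\n")
}
```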
Neural network power increases if neurons operate independently
In practice, some neurons begin to detect the same features of the data (co-adaptation)
Dropout is a solution to this co-adaptation
When hidden neurons are randomly dropped, a weaker learning model is obtained during each training epoch; the combination of these weak learners results in stronger predictive power (see the sketch below)
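A minimal sketch of (inverted) dropout applied to one hidden layer's activations; the drop probability and activation values are made up for illustration:

```r
set.seed(1)
h <- runif(6)    # activations of six hidden neurons
p <- 0.5         # probability of dropping a neuron

mask   <- rbinom(length(h), 1, 1 - p)  # keep each neuron with prob 1 - p
h_drop <- h * mask / (1 - p)           # rescale so the expected value is unchanged
rbind(h, h_drop)                       # dropped neurons contribute zero this epoch
```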
As a summary:
*Image Source: http://neuralnetworksanddeeplearning.com/chap3.html
*Image Source: https://www.geeksforgeeks.org/machine-learning/backpropagation-in-neural-network/
*Epoch: One complete forward pass and one backward pass of the error for all training instances
 
 
 
 Next Week: R Examples