Deep learning train/dev/test split source

If you have 10,000,000 examples, how would you split the train/dev/test set?

The dev and test set should:

Data inputted into deep learning algorithms should be split into a distribution in which the majority of data (well over 70%+) is fed into the training set. This differs from traditional machine learning algorithms which had ~70% of the data given to the training set. This difference comes from the fact that there is now much more data available and deep learning algorithms improve with more data, whereas traditional machine learning algorithms improvement levels out after a certain amount of data is fed in.

For example, in a dataset of 1 million examples, you mighht decide that 10k examples are enough to evaluate which algorithm is better - 98% train, 1% dev, 1% test.

University of Michigan - Ann Arbor

Deep learning algorithms can be implemented to solve real-world problems today.

Deep feedforward networks, also called multilayer perceptrons (MLPs), is the quintessential deep learning model that is behind nearly all modern practical applications of deep learning.

Scaling these models to large inputs such as high-resolution images or long temporal sequences requires specialization. 
 - Convolutional neural networks (CNN) can be used for processing large images.
 - Recurrent neural networks (RNN) can be used for processing temporal sequences. 


Deep Learning Algorithms

This function randomly shuffles the dataset and splits off a certain percentage of the input samples for use as a training set, and then puts the remaining samples into a different variable for use as a test set. So in this example, we're using a 75-25% split of training versus test data. And that's a pretty standard relative split that's used. It's a good rule of thumb to use in deciding what proportion of training versus test might be helpful.
```
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
```
- Unless optional parameters test_size or train_size are passed in, the method will subset the training and test data randomly into 75% train:25% test
- random_state is an optional parameter used for reproducible output across multiple function calls

Train Test Split Function

https://www.simplilearn.com/deep-learning-algorithms-article

More about deep learning algorithms 

A helpful website that introduces neural networks:
https://missinglink.ai/guides/neural-network-concepts/

Neural Network Reference

Convolutional networks, also known as convolutional neural networks, or CNNs, are a specialized kind of neural network for processing data that has a known grid-like topology. Examples include time-series data, which can be thought of as a 1-D grid taking samples at regular time intervals, and image data, which can be thought of as a 2-D grid of pixels. 

The network employs a mathematical operation called convolution. Convolution is a specialized kind of linear operation. Convolutional networks are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers.

The CNN gets its power from the following concepts:
- Sparse connectivity
- Parameter sharing

Convolutional Neural Network (CNN)

Recurrent neural networks, or RNNs, are a family of neural networks for processing sequential data. A recurrent neural network is specialized for processing a sequence of inputs $x^{(1)}, . . . , x^{(τ)}$ and each time add additional layers of comprehension on top of the previous inputs. 


Recurrent Neural Network (RNN)

**Generative Models**  As might be clear from their name , generative models are able to generate new contents by learning features of certain datasets. As for more formal definition, generative models specify in terms of a probabilistic model describing how it thinks the dataset was produced.  Using sampling from this probability distribution, we can create new data in the style of the old.
 
The joint probability distribution P(X,Y) is learned from the data, and then divided by P(X) to obtain the conditional probability distribution P(Y|X), which is used as a predictive model. It is called a generative model because the model represents the relationship between a given input X and an output Y. Common generative models are: Naive Bayes, Hidden Markov Model, Gaussian Mixture Model, Document Topic Generation Model (LDA), and restricted Boltzmann machine.

Generative Models

There are functions you can compute with a small layer that shallower networks would need a lot more hidden units in order to compute. 

Circuit Theory

Deep learning train/dev/test split 

Deep feedforward networks, also called feedforward neural networks, or multilayer perceptrons (MLPs), are the quintessential deep learning models.

It was the first and simplest type of artificial neural network devised. In this network, the information moves in only one direction, forward, from the input nodes, through the hidden nodes (if any) and to the output nodes. There are no cycles or loops in the network.

Deep Feedforward Networks (MLP = Multi-Layer Perceptrons)

Instead of manually building up deep network architectures, many python libraries are available online to help users to easily implement deep learning models.

- Trax (built on TensorFlow)
- Tensor2Tensor (built on TensorFlow)
- Pytorch
- Fast.ai ( built on top of PyTorch)
- Keras (built to be a higher level framework to interface with other frameworks like TensorFlow)
- Vowpal Wabbit 
- Theano
- Caffee 

Deep Learning Python libraries (frameworks)

What can convolutional neural networks be used for?

A generative adversarial network (GAN) is a class of machine learning framework where two neural networks, a generative model and a discriminant model, contest against each other in a "game" (in the form of a zero-sum game, where one agent's gain is another agent's loss). Given a training set, this technique learns to generate new data with the same statistics as the training set. 

The core idea of a GAN is based on the "indirect" training through the discriminator, which itself is also being updated dynamically. This basically means that the generator is not trained to minimize the distance to a specific image, but rather to fool the discriminator and the discriminator is rewarded every time it correctly deduces if a sample is from the model distribution or the data distribution. This enables the model to learn in an unsupervised manner.

Generative adversarial network(GAN)

Deep Learning models require incredible computational resources to successfully train a model. As a result, optimizing the training process is a key step in developing an algorithm. One approach involves finding the parameters of a neural network that minimize its loss function. This is accomplished via gradient descent, a popular technique for minimizing loss function error. 

Optimization for Training Deep Models

Whereas deep learning algorithms were once always implemented on a single computer's central processing unit (CPU), research has shown that families of CPUs or even a single graphics processing unit (GPU) is much more effective. Some systems employ even greater parallelization to achieve the fastest training times with distributed architectures.

Implementations of Deep Learning

Monte Carlo algorithms return results with variability. The variability can typically be reduced by allocating more resources. The benefit of Monte Carlo algorithms is that they can provide an approximated estimation with limited source while maintaining a certain degree of accuracy. In Machine Learning, many problems are too complicated to have precise answers. Monte Carlo approximations are therefore a good approach to these problems.

Monte Carlo Methods

Deep learning frameworks provide specialized software tools that streamline the construction, training, and deployment of neural networks. The availability of these efficient frameworks has made the design and implementation of whole system optimization significantly easier, which is a critical component in achieving high performance for complex deep learning pipelines.

Deep Learning Frameworks

An adversarial example is an input to a machine learning model that has been intentionally modified with small, often imperceptible perturbations designed to cause the model to output an incorrect classification. For instance, adding specific, calculated noise to an image of a panda can cause a deep neural network to misclassify it as a gibbon, even though the image remains visually identical to the original for a human observer.

Adversarial Example

`from sklearn.model_selection import train_test_split`

What does the following code do?

```
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
```

The key purpose of splitting the dataset into training and test sets is:

The purpose of setting the random_state parameter in train_test_split is:

Given a dataset with 10,000 observations and 50 features plus one label. Assume a train/test split of 75%/25%.

Learn Before

Related

Learn After