Concept

Train Test Split Function

This function randomly shuffles the dataset, sets aside a percentage of the samples as a training set, and places the remaining samples in a separate test set. The example here uses a 75%/25% split of training to test data, which is the function's default. That is a common split and a reasonable rule of thumb when deciding how much data to hold out for testing.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
  • Unless the optional parameters test_size or train_size are passed in, the function splits the data randomly into 75% training and 25% test
  • random_state is an optional parameter that seeds the shuffle; passing a fixed integer (0 here) makes the split reproducible across multiple function calls
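
To make the two bullets above concrete, here is a minimal sketch on toy data. The arrays X and y (100 samples, 2 features) are made up for illustration and are not from the original example; the 80/20 split is just one possible choice of test_size.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 100 samples with 2 features, and binary labels.
X = np.arange(200).reshape(100, 2)
y = np.arange(100) % 2

# Default behaviour: with no test_size or train_size, 25% goes to the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
print(X_train.shape, X_test.shape)  # (75, 2) (25, 2)

# Explicit 80/20 split by passing test_size.
X_train80, X_test20, y_train80, y_test20 = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train80.shape, X_test20.shape)  # (80, 2) (20, 2)

# Repeating the call with the same random_state reproduces the same split.
X_train_again, _, _, _ = train_test_split(X, y, random_state=0)
print(np.array_equal(X_train, X_train_again))  # True
```

Leaving random_state unset would give a different shuffle on each call, which is usually fine for a quick experiment but makes results harder to compare between runs.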

Updated 2021-02-27

Tags

Data Science