If I am using all features of my dataset and I achieve 100% accuracy on my training set but only ~70% on the validation set, what should I look out for?

A. underfitting
B. nothing, the model is perfect
C. overfitting
D. None of these

The correct answer is C. overfitting.

Overfitting is a common problem in machine learning: the model fits the training data so closely that it captures noise and idiosyncrasies rather than the underlying pattern, and so fails to generalize to new data. It typically happens when the model is too complex relative to the size of the training dataset.

In this question, the model achieves 100% accuracy on the training set but only ~70% on the validation set. A gap that large is the classic signature of overfitting: the model has effectively memorized the training examples rather than learned patterns that carry over to unseen data.
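
A quick way to confirm this in your own code is to score the model on both splits and compare. Below is a minimal sketch using scikit-learn; the synthetic dataset and the unconstrained decision tree are illustrative assumptions, not part of the original question:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for "all features of my dataset".
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# An unconstrained decision tree can memorize the training set outright.
model = DecisionTreeClassifier(random_state=0)  # no depth limit
model.fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
val_acc = accuracy_score(y_val, model.predict(X_val))
print(f"train accuracy: {train_acc:.2f}")  # typically 1.00
print(f"val accuracy:   {val_acc:.2f}")    # noticeably lower -> overfitting
```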

There are a number of ways to address overfitting, such as:

  • Regularization: penalizes model complexity (for example, large weights), discouraging the model from fitting noise in the training data; a sketch follows this list.
  • Data augmentation: artificially enlarges the training dataset by generating modified copies of existing examples, helping the model learn the underlying patterns instead of memorizing individual samples.
  • Early stopping: halts training once performance on a held-out validation set stops improving, before the model has a chance to overfit; see the second sketch after this list.
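
To make regularization concrete, here is a minimal sketch using scikit-learn's LogisticRegression, where the C parameter is the inverse of the L2 penalty strength; the dataset and the choice of C values are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# In scikit-learn, C is the INVERSE of the L2 penalty strength:
# a smaller C means stronger regularization.
for C in (100.0, 1.0, 0.01):
    clf = LogisticRegression(C=C, max_iter=1000)
    clf.fit(X_train, y_train)
    print(f"C={C:>6}: train={clf.score(X_train, y_train):.2f}  "
          f"val={clf.score(X_val, y_val):.2f}")
```

As regularization strengthens, training accuracy usually drops a little while the gap to validation accuracy narrows; the sweet spot is the value of C that maximizes the validation score.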
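
Early stopping can likewise be sketched with scikit-learn's MLPClassifier, which supports it natively; again, the dataset and parameter values here are illustrative assumptions, not a prescribed recipe:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# early_stopping=True holds out validation_fraction of the training data
# and stops once the validation score fails to improve for
# n_iter_no_change consecutive epochs.
clf = MLPClassifier(early_stopping=True, validation_fraction=0.2,
                    n_iter_no_change=10, max_iter=500, random_state=0)
clf.fit(X_train, y_train)

print(f"stopped after {clf.n_iter_} epochs")
print(f"train={clf.score(X_train, y_train):.2f}  "
      f"val={clf.score(X_val, y_val):.2f}")
```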

If you are experiencing overfitting, it is worth trying several of these techniques; which one works best depends on your model, your data, and the specific problem you are trying to solve.