The correct answer is C. Data Splitting.
Data splitting is the process of dividing a dataset into separate, non-overlapping subsets, typically training, validation, and test sets. The training set is used to fit the model. The validation set is used to evaluate the model during development, for example to tune hyperparameters or to decide when to stop training. The test set is held back until the end and used to estimate how well the final model performs on data it has never seen.
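A minimal sketch of a three-way split in plain Python. It assumes a shuffled random split with 15% of the data for validation and 15% for testing, which are common but arbitrary choices. The function `train_val_test_split`, the toy k-nearest-neighbors model, and the synthetic data are hypothetical illustrations, not any particular library's API. The comments mark which set each step uses.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_test_split(
    data: Sequence[T], val_frac: float = 0.15, test_frac: float = 0.15, seed: int = 0
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle `data` and partition it into disjoint train/validation/test lists."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    items = list(data)
    random.Random(seed).shuffle(items)  # random order so each set is representative
    n_test = int(len(items) * test_frac)
    n_val = int(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]  # all remaining examples go to training
    return train, val, test


def knn_predict(train: List[Tuple[float, float]], x: float, k: int) -> float:
    """Toy model: average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(train: List[Tuple[float, float]], points: List[Tuple[float, float]], k: int) -> float:
    """Mean squared error of the toy model on `points`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in points) / len(points)


data = [(x / 10, (x / 10) ** 2) for x in range(100)]  # synthetic (x, x^2) pairs
train, val, test = train_val_test_split(data)

# Training set: the only data the model learns from (k-NN simply stores it).
# Validation set: used to choose the hyperparameter k.
best_k = min([1, 3, 5, 9], key=lambda k: mse(train, val, k))
# Test set: used once, at the end, to estimate performance on unseen data.
test_error = mse(train, test, best_k)
print(f"chosen k = {best_k}, test MSE = {test_error:.4f}")
```

Because the test set plays no part in choosing `k`, its error is not biased by that choice. In practice, a library helper such as scikit-learn's `train_test_split` can be applied twice to get the same three-way split.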
The other options describe different data preparation steps. Data sampling selects a subset of data from a larger dataset, data cleaning removes errors and inconsistencies from data, and data transformation converts data into a format that is more suitable for machine learning. None of these divides the data into training, validation, and test sets.
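The difference between sampling and splitting is easy to see in code. Sampling keeps one subset and discards the rest, whereas splitting assigns every example to exactly one set. The following is a minimal standard-library sketch that assumes records are plain dictionaries. The names `random_sample`, `criteria_sample`, and the `region` field are hypothetical and used only for illustration.

```python
import random
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def random_sample(data: Sequence[T], n: int, seed: int = 0) -> List[T]:
    """Simple random sampling without replacement: keep n items, drop the rest."""
    return random.Random(seed).sample(list(data), n)


def criteria_sample(data: Sequence[T], keep: Callable[[T], bool]) -> List[T]:
    """Criteria-based sampling: keep only the items that satisfy a predicate."""
    return [item for item in data if keep(item)]


# Hypothetical dataset of 1,000 records from two regions.
records: List[Dict[str, object]] = [
    {"id": i, "region": "north" if i % 2 else "south"} for i in range(1000)
]

subset = random_sample(records, 100)  # 100 randomly chosen records
north = criteria_sample(records, lambda r: r["region"] == "north")  # 500 records
# Unlike a split, the 900 records left out of `subset` are simply not used.
```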
Here is a more detailed explanation of each option:
- Data Sampling: Data sampling is the process of selecting a subset of data from a larger dataset. The selection can be random, where each record has an equal chance of being chosen, or based on specific criteria, such as keeping only records from one region. Sampling is often used to reduce the size of a dataset or to obtain a representative sample of a larger population.
- Data Cleaning: Data cleaning is the process of removing errors and inconsistencies from data. This can include correcting typos, removing duplicate records, and filling in missing values. Data cleaning is an important step in any data analysis project because errors in the input data carry through to the results, so cleaning helps keep those results accurate and reliable.
- Data Transformation: Data transformation is the process of converting data into a format that is more suitable for machine learning. This can include encoding categorical values as numbers (for example, with one-hot encoding), scaling numerical features to a common range, or reshaping data into the input format a particular algorithm expects. Transformation is often necessary because many algorithms accept only numerical input in a specific form.
- Data Splitting: Data splitting divides a dataset into non-overlapping training, validation, and test sets, each with the role described above. It is an important step in any machine learning project because it allows the model to be evaluated on data it has not seen during training, which reveals overfitting that the training error alone would hide.
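The cleaning and transformation steps from the list above can be sketched with the standard library alone. The sketch assumes hypothetical `(city, age)` records. Cleaning fixes stray whitespace and inconsistent capitalization, removes duplicates, and fills missing ages with the mean age. Transformation then one-hot encodes the city names and min-max scales the ages into the range [0, 1]. Mean imputation, one-hot encoding, and min-max scaling are common choices, not the only ones, and the function names are illustrative.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, Optional[float]]  # hypothetical (city, age) records


def clean(records: List[Record]) -> List[Tuple[str, float]]:
    # 1. Fix inconsistent spelling/case, e.g. " paris" -> "Paris".
    normalized = [(city.strip().title(), age) for city, age in records]
    # 2. Remove duplicate records, keeping the first occurrence in order.
    unique = list(dict.fromkeys(normalized))
    # 3. Fill missing ages with the mean of the known ages.
    known = [age for _, age in unique if age is not None]
    mean_age = sum(known) / len(known)
    return [(city, mean_age if age is None else age) for city, age in unique]


def one_hot(values: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Encode categorical strings as 0/1 vectors, one column per distinct category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        vectors.append(row)
    return categories, vectors


def min_max_scale(xs: List[float]) -> List[float]:
    """Rescale numbers linearly to the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]  # all values equal: avoid dividing by zero
    return [(x - lo) / (hi - lo) for x in xs]


raw = [(" paris", 30.0), ("Paris", 30.0), ("london", None), ("Berlin", 40.0)]
rows = clean(raw)  # [("Paris", 30.0), ("London", 35.0), ("Berlin", 40.0)]
cats, encoded = one_hot([city for city, _ in rows])
# cats == ["Berlin", "London", "Paris"]; encoded == [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
scaled = min_max_scale([age for _, age in rows])  # [0.0, 0.5, 1.0]
```

Note that the cleaned and transformed table is still a single dataset. Splitting it into training, validation, and test sets remains a separate step.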