In machine learning, what is the term for the process of transforming categorical data into numerical format for modeling?

A. Data Aggregation
B. Data Imputation
C. Data Encoding
D. Data Normalization

The correct answer is C. Data Encoding.

Data encoding is the process of converting categorical data into a numerical format that a model can use. The simplest approach, label encoding, assigns a unique integer to each category. For example, if a customer dataset has a column with the categories “Male”, “Female”, and “Other”, you could encode them as follows:

  • Male = 0
  • Female = 1
  • Other = 2

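Once the data is encoded, it can be passed to machine learning models that require numerical input. Note that integer codes imply an ordering (0 < 1 < 2) that these categories do not actually have, so one-hot encoding, which creates one binary column per category, is often preferred for unordered categories.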
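
The mapping above can be sketched in a few lines of plain Python. This is a minimal illustration, not a definitive implementation: the sample list `genders` is hypothetical, and the one-hot variant (one binary column per category) is included only to show a common alternative to integer codes.

```python
# Hypothetical sample column; the category names come from the example above.
genders = ["Male", "Female", "Other", "Female", "Male"]

# Label encoding: map each category to the integer used in the example.
label_map = {"Male": 0, "Female": 1, "Other": 2}
label_encoded = [label_map[g] for g in genders]

# One-hot encoding: one binary column per category, in label_map order.
categories = list(label_map)
one_hot_encoded = [[1 if g == c else 0 for c in categories] for g in genders]

print(label_encoded)
print(one_hot_encoded)
```

In practice, libraries such as pandas or scikit-learn provide encoders for this, but the underlying idea is the same lookup shown here.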

A. Data Aggregation is the process of gathering data, often from multiple sources, and combining or summarizing it into a single dataset. This gives a more complete picture than any one source provides alone. For example, you might aggregate data from sales, marketing, and customer service to get a better understanding of your overall business performance.
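
As a rough sketch of aggregation, the snippet below combines two hypothetical sources into one per-customer summary. The record layouts, customer IDs, and field names (`revenue`, `tickets`) are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical records from two sources, keyed by customer ID.
sales = [("c1", 120.0), ("c2", 80.0), ("c1", 30.0)]
support_tickets = [("c1", 1), ("c2", 2), ("c2", 1)]

# Accumulate one summary row per customer across both sources.
summary = defaultdict(lambda: {"revenue": 0.0, "tickets": 0})
for customer, amount in sales:
    summary[customer]["revenue"] += amount
for customer, count in support_tickets:
    summary[customer]["tickets"] += count

combined = dict(summary)
print(combined)
```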

B. Data Imputation is the process of filling in missing values in a dataset, usually before analysis or modeling. Common methods include mean imputation, median imputation, and multiple imputation.
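
Mean and median imputation can be sketched as follows. The `ages` list is hypothetical, and using `None` to mark a missing value is an assumption of this example. Multiple imputation is more involved and is not shown.

```python
from statistics import mean, median

# Hypothetical column of ages; None marks a missing value.
ages = [25, None, 31, 40, None, 28]

# Compute the fill values from the observed entries only.
observed = [a for a in ages if a is not None]
fill_mean = mean(observed)
fill_median = median(observed)

# Replace each missing entry with the chosen statistic.
mean_imputed = [a if a is not None else fill_mean for a in ages]
median_imputed = [a if a is not None else fill_median for a in ages]

print(mean_imputed)
print(median_imputed)
```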

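D. Data Normalization is the process of rescaling numerical features to a common scale. Two common forms are min-max scaling, which maps values to the range [0, 1], and standardization (z-score normalization), which transforms the data so that it has a mean of 0 and a standard deviation of 1. This is often done before data analysis or modeling, and it can improve the performance of models that are sensitive to feature scale, such as those based on distances or gradient descent.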
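
The snippet below is a small sketch of two common rescalings: min-max scaling to the range [0, 1], and z-score standardization to a mean of 0 and a standard deviation of 1. The `values` list is hypothetical, and the population standard deviation (`pstdev`) is used here as an assumption; some tools use the sample standard deviation instead.

```python
from statistics import mean, pstdev

# Hypothetical numerical feature.
values = [10.0, 20.0, 30.0, 40.0, 50.0]

# Min-max scaling: map the smallest value to 0 and the largest to 1.
lo, hi = min(values), max(values)
min_max = [(v - lo) / (hi - lo) for v in values]

# Z-score standardization: subtract the mean, divide by the standard deviation.
mu, sigma = mean(values), pstdev(values)
z_scores = [(v - mu) / sigma for v in values]

print(min_max)
print(z_scores)
```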