What is the term for the process of converting categorical data into numerical form for machine learning?

A. Data normalization
B. Text vectorization
C. Feature scaling
D. One-hot encoding

The correct answer is D. One-hot encoding.

One-hot encoding is a technique used to convert categorical data into numerical form for machine learning. It works by creating a new binary feature for each unique category in the data; for each record, the feature matching its category is set to 1 and the others to 0. For example, suppose you have a dataset of customers with three categorical columns: gender (male, female), age (18-24, 25-34, 35-44, 45-54, 55-64, 65+), and location (California, New York, Texas, Florida, Illinois). One-hot encoding creates 13 new features (2 + 6 + 5): gender_male, gender_female, age_18_24, age_25_34, age_35_44, age_45_54, age_55_64, age_65+, location_California, location_New York, location_Texas, location_Florida, location_Illinois.
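As a minimal sketch of the idea, the following Python snippet one-hot encodes a single categorical column without any external library. The function name `one_hot_encode` and the sample `locations` list are illustrative assumptions, not part of the original question.

```python
def one_hot_encode(values):
    """Return (categories, rows): one binary column per unique category."""
    # Sort the unique categories so the column order is deterministic.
    categories = sorted(set(values))
    # Each row has a 1 in the column matching its category and 0 elsewhere.
    rows = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, rows


# Hypothetical sample data for illustration only.
locations = ["Texas", "California", "Texas", "Florida"]
categories, encoded = one_hot_encode(locations)

for location, row in zip(locations, encoded):
    print(location, row)
```

Each row contains exactly one 1, which is where the name "one-hot" comes from. In practice, libraries such as pandas and scikit-learn provide their own one-hot encoding utilities.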

One-hot encoding is a common technique in machine learning because most algorithms operate on numbers and cannot use category labels directly. Unlike assigning each category an integer code, it also avoids implying an order among categories that have none. It is relatively easy to implement in most programming languages.

Here is a brief explanation of each of the other options:

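  • Data normalization is the process of rescaling numerical data to a common range, most often [0, 1] (min-max scaling). The term is also used loosely for standardization, which rescales data so that it has a mean of 0 and a standard deviation of 1. Both apply to data that is already numerical, so neither converts categories into numbers.
  • Text vectorization is the process of converting text data into a numerical form that can be used by machine learning algorithms. This can be done in a number of ways, such as bag-of-words or n-grams. It applies to free text rather than to categorical values.
  • Feature scaling is the general process of rescaling numerical features so that they have a similar scale; normalization and standardization are two common forms of it. It is often done to improve the performance of algorithms that are sensitive to feature magnitude, such as those based on distances or gradient descent.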
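To make the contrast with one-hot encoding concrete, here is a rough sketch of the other three options in plain Python. The function names (`min_max_normalize`, `standardize`, `bag_of_words`) and the sample data are assumptions chosen for illustration; the bag-of-words version uses naive whitespace tokenization, which real vectorizers usually improve on.

```python
from collections import Counter
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale numeric values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def bag_of_words(text):
    """Count word occurrences using lowercase, whitespace-split tokens."""
    return Counter(text.lower().split())


# Hypothetical sample data for illustration only.
ages = [20, 30, 40]
normalized = min_max_normalize(ages)
standardized = standardize(ages)
word_counts = bag_of_words("the cat saw the dog")

print(normalized)
print(standardized)
print(word_counts)
```

Note that the two scaling functions take numbers as input and the bag-of-words function takes free text, which is why none of them answers the question about categorical data.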