The correct answer is: D. All of the above
All three options are valid ways to handle missing values. Removing the whole line is the simplest, but it loses information. Creating sub-models to predict the missing features is more complex, but it can give more accurate results. Using an automatic strategy to impute (fill in) the missing values from the other known values is a middle ground between the two, and it can be a good option if you don’t have enough data to train a separate model for each feature.
Here is a more detailed explanation of each option:
- Removing the whole line is the simplest solution, but it can lead to loss of information. When you remove a line because one feature is missing, you also throw away the values in that line that were known. This can be a problem if many lines have missing values, or if the discarded values are important for your analysis.
- Creating sub-models to predict those features is a more complex solution, but it can provide more accurate results. If you create a separate model for each feature, each model is trained on the lines where that feature is known, using the feature as the target and the other features as inputs. The trained model then predicts the feature wherever it is missing. However, creating multiple models is more time-consuming and computationally expensive.
- Using an automatic strategy to impute them according to the other known values is a middle ground between the two. This approach fills in the missing values with a simple rule (for example the mean, median, or most frequent value of the feature) or with a general-purpose machine learning algorithm. This can be a good option if you don’t have enough data to train a separate model for each feature, but you still want to keep more information than you would by simply removing the incomplete lines.
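As a rough illustration of the three options above, here is a minimal sketch in plain Python. The toy dataset, the column names `x` and `y`, and the helper functions `drop_incomplete`, `impute_mean`, and `impute_with_submodel` are all hypothetical and made up for this example. The "sub-model" is assumed to be a simple least-squares line on one other feature, and the "automatic strategy" is assumed to be a mean fill. Real projects would more likely use a library for this.

```python
from statistics import mean

# Toy dataset (hypothetical values); None marks a missing entry.
rows = [
    {"x": 1.0, "y": 2.0},
    {"x": 2.0, "y": 4.0},
    {"x": 3.0, "y": None},
    {"x": 4.0, "y": 8.0},
]


def drop_incomplete(rows):
    """Removing the whole line: keep only rows with no missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def impute_mean(rows, col):
    """Automatic strategy (assumed here: mean fill) for one column."""
    col_mean = mean(r[col] for r in rows if r[col] is not None)
    return [{**r, col: col_mean if r[col] is None else r[col]} for r in rows]


def impute_with_submodel(rows, target, feature):
    """Sub-model: fit target ~ feature by least squares on complete rows,
    then predict the target where it is missing.
    Assumes `feature` is known in every row where `target` is missing."""
    known = [r for r in rows if r[target] is not None and r[feature] is not None]
    xs = [r[feature] for r in known]
    ys = [r[target] for r in known]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar
    return [
        {**r, target: intercept + slope * r[feature] if r[target] is None else r[target]}
        for r in rows
    ]


dropped = drop_incomplete(rows)                       # 3 rows remain
mean_filled = impute_mean(rows, "y")                  # missing y becomes the mean of 2, 4, 8
model_filled = impute_with_submodel(rows, "y", "x")   # missing y predicted from x
```

On this toy data, where `y` is exactly twice `x`, the mean fill gives roughly 4.67 for the missing value while the sub-model gives roughly 6.0. This shows why a sub-model can be more accurate when the features are related, at the cost of extra work.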
Ultimately, the best approach for dealing with missing data depends on the specific situation. If you have a lot of data and you are willing to invest the time and resources, creating separate models for each feature can be the best option. If you have less data or you are on a tight deadline, using an automatic strategy to impute the missing values can be a good alternative. Removing lines is reasonable mainly when only a few lines are affected.