In Pandas, what method is used to remove duplicate rows from a DataFrame based on specified columns?

A. dropna()
B. fillna()
C. drop_duplicates()
D. groupby()

The correct answer is C. drop_duplicates().

The drop_duplicates() method removes duplicate rows from a DataFrame. Its two main parameters are subset, the column label or list of labels to check for duplicates (all columns by default), and keep, which controls which row in each group of duplicates is retained. It also accepts inplace and ignore_index. The keep parameter can be one of the following values:

  • 'first' (default): Keep the first row in each group of duplicate rows and drop the rest.
  • 'last': Keep the last row in each group of duplicate rows and drop the rest.
  • False: Drop every row that has a duplicate, keeping none of them.

For example, the following code uses the drop_duplicates() method to remove rows that repeat an earlier combination of values in the 'A' and 'B' columns, keeping the first occurrence of each:

```
import pandas as pd

# Rows 0 and 1 are duplicates, as are rows 3 and 4
df = pd.DataFrame({'A': [1, 1, 2, 3, 3], 'B': [6, 6, 7, 8, 8]})
df = df.drop_duplicates(subset=['A', 'B'])
print(df)

   A  B
0  1  6
2  2  7
3  3  8
```
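
The effect of the keep parameter can be sketched as follows. This is a minimal illustration, not the only way to use the method; the sample data and the variable names (first, last, none_kept) are made up for the example.

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 1 match on both 'A' and 'B'.
df = pd.DataFrame({"A": [1, 1, 2], "B": [6, 6, 7], "C": ["x", "y", "z"]})

# keep="first" (the default) retains the first row of each duplicate group.
first = df.drop_duplicates(subset=["A", "B"], keep="first")

# keep="last" retains the last row of each duplicate group.
last = df.drop_duplicates(subset=["A", "B"], keep="last")

# keep=False drops every row that has a duplicate.
none_kept = df.drop_duplicates(subset=["A", "B"], keep=False)

print(first.index.tolist())      # [0, 2]
print(last.index.tolist())       # [1, 2]
print(none_kept.index.tolist())  # [2]
```

Note that the original index labels are preserved; passing ignore_index=True renumbers the result from 0.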

The dropna() method is used to remove rows that have missing values. The fillna() method is used to fill in missing values with a specified value. The groupby() method is used to group rows together based on a common value.
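
To see why the other three options do not answer the question, here is a small sketch of what each one does. The data and the column names (key, value) are assumptions chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data with one missing value in row 1.
df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1.0, float("nan"), 3.0]})

# dropna() removes rows containing missing values; duplicates are untouched.
dropped = df.dropna()

# fillna() replaces missing values with the given value; no rows are removed.
filled = df.fillna(0)

# groupby() groups rows by a key for aggregation; sum() skips NaN by default.
totals = df.groupby("key")["value"].sum()

print(dropped.index.tolist())    # [0, 2]
print(filled["value"].tolist())  # [1.0, 0.0, 3.0]
print(totals.to_dict())          # {'a': 1.0, 'b': 3.0}
```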
