pytrnsys_process.process.process_sim.handle_duplicate_columns

pytrnsys_process.process.process_sim.handle_duplicate_columns(df: DataFrame) -> DataFrame

Process duplicate columns in a DataFrame, ensuring they contain consistent data.

This function checks for duplicate column names and verifies that:

1. If one duplicate column has NaN values, the other(s) must also have NaN at the same indices.
2. All non-NaN values must be identical across duplicate columns.
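The two rules above amount to a pairwise check between duplicate columns. The following is a minimal sketch that assumes NumPy and pandas. `columns_consistent` is a hypothetical helper, not part of pytrnsys_process, and the library's actual implementation may differ. The sketch compares values by position, so differing index labels do not matter.

```python
import numpy as np
import pandas as pd


def columns_consistent(a: pd.Series, b: pd.Series) -> bool:
    """Hypothetical helper: True if two duplicate columns satisfy both rules."""
    # Rule 1: NaN must appear at exactly the same positions in both columns.
    if not (a.isna().to_numpy() == b.isna().to_numpy()).all():
        return False
    # Rule 2: every non-NaN value must be identical (compared position by position).
    mask = a.notna().to_numpy()
    return bool((a.to_numpy()[mask] == b.to_numpy()[mask]).all())


a = pd.Series([1.0, np.nan, 3.0])
b = pd.Series([1.0, np.nan, 3.0])
c = pd.Series([1.0, 2.0, 3.0])
print(columns_consistent(a, b))  # True
print(columns_consistent(a, c))  # False: NaN in a, 2.0 in c at position 1
```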

Parameters:

df (pandas.DataFrame) – Input DataFrame to process

Returns:

df – DataFrame with duplicate columns removed, keeping only the first occurrence of each duplicated column name

Return type:

pandas.DataFrame
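One common pandas idiom for keeping only the first occurrence of each column name is `Index.duplicated(keep="first")`. The example below illustrates that idea. It has not been confirmed that `handle_duplicate_columns` uses this idiom internally, and the column names `T` and `P` are made up for illustration.

```python
import pandas as pd

# Two columns share the name "T"; "P" is unique.
df = pd.DataFrame([[1, 1, 2], [3, 3, 4]], columns=["T", "T", "P"])

# columns.duplicated(keep="first") flags every repeat after the first as True,
# so negating it selects the first "T" and the single "P".
deduped = df.loc[:, ~df.columns.duplicated(keep="first")]
print(list(deduped.columns))  # ['T', 'P']
```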

Raises:

ValueError – If duplicate columns have:

1. NaN values in one column but actual values in another at the same index, or
2. Different non-NaN values at the same index.
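To show when each `ValueError` case is triggered, here is a hypothetical `handle_duplicate_columns_sketch`. It is a sketch of the documented contract that assumes duplicate columns are compared row by row against the first occurrence. This page does not show the real function's source or error messages, so both may differ.

```python
import numpy as np
import pandas as pd


def handle_duplicate_columns_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical re-implementation of the documented contract."""
    for name in df.columns[df.columns.duplicated()].unique():
        # Positional access avoids the ambiguity of df[name] with repeated labels.
        positions = np.flatnonzero(df.columns == name)
        first = df.iloc[:, positions[0]].to_numpy()
        for pos in positions[1:]:
            other = df.iloc[:, pos].to_numpy()
            first_nan, other_nan = pd.isna(first), pd.isna(other)
            # Case 1: NaN in one column, an actual value in the other.
            if (first_nan != other_nan).any():
                raise ValueError(
                    f"Column {name!r}: NaN in one duplicate but a value in another"
                )
            # Case 2: both non-NaN but not identical.
            both = ~first_nan
            if (first[both] != other[both]).any():
                raise ValueError(f"Column {name!r}: duplicates hold different values")
    # All duplicates are consistent: keep the first occurrence of each name.
    return df.loc[:, ~df.columns.duplicated(keep="first")]


ok = pd.DataFrame([[1.0, 1.0], [np.nan, np.nan]], columns=["T", "T"])
print(list(handle_duplicate_columns_sketch(ok).columns))  # ['T']

nan_mismatch = pd.DataFrame([[1.0, 1.0], [np.nan, 2.0]], columns=["T", "T"])
value_mismatch = pd.DataFrame([[1.0, 5.0]], columns=["T", "T"])
for bad in (nan_mismatch, value_mismatch):
    try:
        handle_duplicate_columns_sketch(bad)
    except ValueError as err:
        print(err)
```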