I have a DataFrame. I would like to group by `col1`, order by `col3`, and detect changes in `col2` from row to row.
Here is my example:
```python
import pandas as pd
import datetime

my_df = pd.DataFrame({'col1': ['a', 'a', 'a', 'b', 'b', 'b'],
                      'col2': [2, 2, 3, 5, 5, 5],
                      'col3': [datetime.date(2023, 2, 1), datetime.date(2023, 3, 1), datetime.date(2023, 4, 1),
                               datetime.date(2023, 2, 1), datetime.date(2023, 3, 1), datetime.date(2023, 4, 1)]})

my_df.sort_values(by=['col3'], inplace=True)

my_df_temp = my_df.groupby('col1')['col2'].apply(
    lambda x: x != x.shift(1)).reset_index(name='col2_change')
```
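To see what the lambda computes for a single group, here is a minimal sketch that applies the same comparison to group `a`'s `col2` values on their own (the standalone Series `s` is only for illustration). Because `shift(1)` puts `NaN` in the first position and `NaN` never compares equal to anything, the first element of every group comes out `True`.

```python
import pandas as pd

# col2 values of group 'a', already in col3 order (illustrative standalone Series)
s = pd.Series([2, 2, 3])

prev = s.shift(1)        # previous value: [NaN, 2.0, 2.0]
changed = s != prev      # NaN != 2 is True, so the first entry is flagged
print(changed.tolist())  # [True, False, True]
```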
Here is how my dataframe looks:
```
   col1  col2        col3
0     a     2  2023-02-01
1     a     2  2023-03-01
2     a     3  2023-04-01
3     b     5  2023-02-01
4     b     5  2023-03-01
5     b     5  2023-04-01
```
Here is what the result looks like:
```
   col1  level_1  col2_change
0     a        0         True
1     a        1        False
2     a        2         True
3     b        3         True
4     b        4        False
5     b        5        False
```
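The `level_1` column comes from the index of the `apply` result. In pandas 2.x, `groupby(...).apply` with the default `group_keys=True` prepends the group key to the index even when the function returns a like-indexed Series, so `reset_index` yields `col1` plus an unnamed level (the original row labels) that becomes `level_1`. A small sketch comparing that with `group_keys=False`, assuming pandas 2.x; the toy frame `df` is made up for illustration:

```python
import pandas as pd

# Toy frame for illustration only
df = pd.DataFrame({'col1': ['a', 'a', 'b'], 'col2': [1, 1, 2]})

# Default group_keys=True: index is (col1, original row label) in pandas 2.x
with_keys = df.groupby('col1')['col2'].apply(lambda x: x != x.shift(1))
print(with_keys.index.nlevels)  # 2

# group_keys=False: the result keeps only the original row labels
without_keys = df.groupby('col1', group_keys=False)['col2'].apply(lambda x: x != x.shift(1))
print(without_keys.index.tolist())  # [0, 1, 2]
print(without_keys.tolist())        # [True, False, True]
```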
This is clearly incorrect. What am I doing wrong?
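If the goal is a flag aligned with the original rows, one possible sketch sorts by `col1` and `col3` together and uses `groupby(...).shift()`, which returns a Series indexed like the original frame, so no extra index level appears. It assumes the first row of each group should count as "no change"; dropping the `prev.notna()` term would flag it as a change instead.

```python
import datetime
import pandas as pd

my_df = pd.DataFrame({
    'col1': ['a', 'a', 'a', 'b', 'b', 'b'],
    'col2': [2, 2, 3, 5, 5, 5],
    'col3': [datetime.date(2023, 2, 1), datetime.date(2023, 3, 1), datetime.date(2023, 4, 1),
             datetime.date(2023, 2, 1), datetime.date(2023, 3, 1), datetime.date(2023, 4, 1)],
})

# Sort by group first, then by date, so each group's rows are in time order
my_df = my_df.sort_values(['col1', 'col3'])

# Previous col2 value within the same col1 group (NaN for each group's first row)
prev = my_df.groupby('col1')['col2'].shift(1)

# Assumption: the first row of a group is not counted as a change
my_df['col2_change'] = prev.notna() & (my_df['col2'] != prev)
print(my_df)
```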