Channel: Active questions tagged row - Stack Overflow

Long format data: Calculate NA for year x as row mean of other years


I have a pretty large longitudinal data set ranging from 2014 to 2021. Most of the variables are available for every year. However, there are a few variables that are available for 2014 and 2016, but not for 2015. In these cases, I want to calculate the value for 2015 as the mean of the values from 2014 and 2016.

The data structure looks as follows. Note that this is extremely simplified: the data set has many more variables and observations. Also, for each respondent there are rows for the other years as well (obviously), which I didn't write down here.

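| PID | Year | Var1 | Var2 | Var3 |
|-----|------|------|------|------|
| 1   | 2014 | 10   | 2    | 2    |
| 1   | 2015 | 15   | 8    | NA   |
| 1   | 2016 | 12   | 6    | 4    |
| 2   | 2014 | 11   | 7    | 5    |
| 2   | 2015 | 16   | 3    | NA   |
| 2   | 2016 | 14   | 5    | 9    |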
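For anyone who wants to try this, here is the example data as an R data frame. It is a minimal sketch built from the table values only; the column names Var1 to Var3 are an assumption (written without the spaces so they are valid R names).

```r
# Example data from the table above; one row per PID and Year.
df <- data.frame(
  PID  = c(1, 1, 1, 2, 2, 2),
  Year = c(2014, 2015, 2016, 2014, 2015, 2016),
  Var1 = c(10, 15, 12, 11, 16, 14),
  Var2 = c(2, 8, 6, 7, 3, 5),
  Var3 = c(2, NA, 4, 5, NA, 9)  # missing in 2015 for every respondent
)

str(df)
```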

PID is the ID number that identifies each respondent. Var1 and Var2 are available for every year; Var3 is only available in 2014 and 2016.

What I want is this:

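| PID | Year | Var1 | Var2 | Var3 |
|-----|------|------|------|------|
| 1   | 2014 | 10   | 2    | 2    |
| 1   | 2015 | 15   | 8    | 3    |
| 1   | 2016 | 12   | 6    | 4    |
| 2   | 2014 | 11   | 7    | 5    |
| 2   | 2015 | 16   | 3    | 7    |
| 2   | 2016 | 14   | 5    | 9    |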

For Var3, instead of NA, the row for 2015 contains the mean of the values in 2014 and 2016 (for PID 1: (2 + 4) / 2 = 3; for PID 2: (5 + 9) / 2 = 7). How can I achieve this?
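One possible approach, sketched in base R under the assumption that there is exactly one row per PID and year: split the rows by PID, fill each respondent's missing 2015 value from that same respondent's 2014 and 2016 values, and put the pieces back in the original row order with `unsplit()`. The helper `fill_2015()` and the vector `vars_to_fill` are hypothetical names introduced for this example.

```r
# Example data from the question (column names assumed to be Var1..Var3).
df <- data.frame(
  PID  = c(1, 1, 1, 2, 2, 2),
  Year = c(2014, 2015, 2016, 2014, 2015, 2016),
  Var1 = c(10, 15, 12, 11, 16, 14),
  Var2 = c(2, 8, 6, 7, 3, 5),
  Var3 = c(2, NA, 4, 5, NA, 9)
)

# Hypothetical helper: within ONE respondent's rows, replace a missing 2015
# value by the mean of that respondent's 2014 and 2016 values.
# Assumes at most one row per year for the respondent.
fill_2015 <- function(x, year) {
  target <- year == 2015 & is.na(x)
  neighbours <- x[year %in% c(2014, 2016)]
  if (any(target) && length(neighbours) == 2) {
    x[target] <- mean(neighbours)  # stays NA if either neighbour is NA
  }
  x
}

# Variables that are missing in 2015 (here only Var3; extend as needed).
vars_to_fill <- c("Var3")

for (v in vars_to_fill) {
  # Apply the helper separately for each PID, then restore the original row order.
  pieces <- lapply(split(df, df$PID), function(d) fill_2015(d[[v]], d$Year))
  df[[v]] <- unsplit(pieces, df$PID)
}

df
```

Because the helper only ever sees one respondent's rows, values from different PIDs cannot be mixed, and columns not listed in `vars_to_fill` are left unchanged.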

My first idea was to address the missing values in 2015 with is.na(), but this would address all the NAs in the whole data set, not just the 2015 NAs in Var3. How can I address these NAs specifically, so that it a) only calculates the 2015 value of Var3 as the mean of its 2014 and 2016 values, and b) does so only within rows that share the same PID, so that values of different respondents do not get mixed up?
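To address the concern that is.na() on its own hits every NA in the data set, another possible sketch (again base R, assuming one row per PID and year) combines is.na() with a condition on Year, so the mask is TRUE only for 2015 rows where Var3 is missing. It then looks up the same respondent's 2014 and 2016 values with `match()` on PID. The object names `target`, `rows_2014`, `rows_2016`, `v2014` and `v2016` are introduced just for this example.

```r
# Example data from the question (column names assumed to be Var1..Var3).
df <- data.frame(
  PID  = c(1, 1, 1, 2, 2, 2),
  Year = c(2014, 2015, 2016, 2014, 2015, 2016),
  Var1 = c(10, 15, 12, 11, 16, 14),
  Var2 = c(2, 8, 6, 7, 3, 5),
  Var3 = c(2, NA, 4, 5, NA, 9)
)

# a) Only 2015 rows with a missing Var3 are selected; NAs elsewhere are untouched.
target <- df$Year == 2015 & is.na(df$Var3)

# b) For each selected row, find the SAME respondent's 2014 and 2016 values.
rows_2014 <- df[df$Year == 2014, ]
rows_2016 <- df[df$Year == 2016, ]
v2014 <- rows_2014$Var3[match(df$PID[target], rows_2014$PID)]
v2016 <- rows_2016$Var3[match(df$PID[target], rows_2016$PID)]

# If a respondent has no 2014 or 2016 row, match() gives NA and the value stays NA.
df$Var3[target] <- (v2014 + v2016) / 2

df
```

The same three lines can be repeated for any other variable that is missing only in 2015.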

