Channel: Active questions tagged row - Stack Overflow

What sample margins does rake choose here? Two samples in one data frame


Let's say I have one data frame that consists of two samples. Assume there are no clusters or strata. The data set looks like this:

      var1 var2 ... climate
    1    2    1 ...    Warm
    2    2    3 ...    Cold
    3    3    2 ...    Warm
    4    1    1 ...    Warm
    5    3    1 ...    Cold
    ...

There are several variables (var1, ...) and one indicator, "climate", that tells you which sample each row (sampled unit) belongs to. The variable climate has no missing values.

I created two survey design objects and I think this step is correct because the number of rows in the survey design objects differs.

    svy.unw.1 <- svydesign(ids = ~1, data = plant[plant$climate == "Warm",] )
    svy.unw.2 <- svydesign(ids = ~1, data = plant[plant$climate == "Cold",] )
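A quick way to confirm what each design object actually holds is to look at the data stored inside it. The sketch below assumes the survey package is installed and uses an invented toy `plant` data frame (60 Warm rows, 40 Cold rows), not my real data; it reads the `variables` component of each design object.

```r
library(survey)

# Hypothetical toy stand-in for `plant`; values are invented for illustration
set.seed(1)
plant <- data.frame(
  var1    = sample(1:3, 100, replace = TRUE),
  var2    = sample(1:3, 100, replace = TRUE),
  climate = rep(c("Warm", "Cold"), times = c(60, 40))
)

# ids = ~1 with no weights makes svydesign() assume equal probabilities
# and warn about it, so the warning is silenced here
svy.unw.1 <- suppressWarnings(
  svydesign(ids = ~1, data = plant[plant$climate == "Warm", ])
)
svy.unw.2 <- suppressWarnings(
  svydesign(ids = ~1, data = plant[plant$climate == "Cold", ])
)

# Rows and climate values stored inside each design object
n.warm <- nrow(svy.unw.1$variables)
n.cold <- nrow(svy.unw.2$variables)
climates.1 <- unique(as.character(svy.unw.1$variables$climate))
climates.2 <- unique(as.character(svy.unw.2$variables$climate))
print(c(n.warm = n.warm, n.cold = n.cold))
```

If each design object reports only its own climate value, the subsetting happened before the design was built, and the object no longer carries the other sample's rows.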

Next, I created individual population margins looking like this:

    var1.P1 <- data.frame(var1 = c(1,2,3),
                          Freq = nrow(plant[plant$climate == "Warm",]) * c(0.2, 0.2, 0.6))
    var1.P2 <- data.frame(var1 = c(1,2,3),
                          Freq = nrow(plant[plant$climate == "Cold",]) * c(0.1, 0.1, 0.8))
    ...
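Because each margin is scaled by the size of its own sample, a simple sanity check is that the Freq column adds up to that sample's row count and that every observed category has an entry in the margin table. The sketch below uses base R only and an invented toy `plant` data frame; the target shares are the ones given above.

```r
# Hypothetical toy stand-in for `plant`; values are invented for illustration
set.seed(1)
plant <- data.frame(
  var1    = sample(1:3, 100, replace = TRUE),
  climate = rep(c("Warm", "Cold"), times = c(60, 40))
)

n.warm <- sum(plant$climate == "Warm")
n.cold <- sum(plant$climate == "Cold")

# Target shares from the question, scaled to each sample's own size
var1.P1 <- data.frame(var1 = c(1, 2, 3), Freq = n.warm * c(0.2, 0.2, 0.6))
var1.P2 <- data.frame(var1 = c(1, 2, 3), Freq = n.cold * c(0.1, 0.1, 0.8))

# Each margin should add up to the size of the sample it belongs to
totals.ok <- c(
  warm = isTRUE(all.equal(sum(var1.P1$Freq), n.warm)),
  cold = isTRUE(all.equal(sum(var1.P2$Freq), n.cold))
)

# Every category observed in a sample should appear in its margin table
levels.ok <- c(
  warm = all(unique(plant$var1[plant$climate == "Warm"]) %in% var1.P1$var1),
  cold = all(unique(plant$var1[plant$climate == "Cold"]) %in% var1.P2$var1)
)
print(totals.ok)
print(levels.ok)
```

The same check would apply to the var2 margins, which should share the total of the var1 margin for the same sample.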

Then I raked the data.

    svy.rake.1 <- rake(design = svy.unw.1,
                       sample.margins = list(~var1, ~var2),
                       population.margins = list(var1.P1, var2.P1))
    svy.rake.2 <- rake(design = svy.unw.2,
                       sample.margins = list(~var1, ~var2),
                       population.margins = list(var1.P2, var2.P2))

Now, here is my question:

Since there are two samples in the data frame "plant", will this part of the rake function: sample.margins = list(~var1, ~var2) choose the correct values of var1 and var2?

More specifically: will the sample margins for svy.rake.1 and svy.rake.2 be taken only from the rows of plant where plant$climate == "Warm" and plant$climate == "Cold", respectively?

I am unsure since I subsetted the data frame when creating the survey design objects, but in the rake command there is no further specification of which rows to work with. What I don't want is that the whole data frame is used as a basis for the sample margins of the two distinct samples.
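One way to test this empirically is to rake a toy design and inspect the result. The sketch below assumes the survey package and an invented `plant` data frame; the var2 shares are made up for illustration, since only the var1 shares are given above. It checks that the raked design carries one weight per Warm row and that the weighted var2 margin matches its target.

```r
library(survey)

# Hypothetical toy stand-in for `plant`; values are invented for illustration
set.seed(42)
n <- 400
plant <- data.frame(
  var1    = sample(1:3, n, replace = TRUE),
  var2    = sample(1:3, n, replace = TRUE),
  climate = sample(c("Warm", "Cold"), n, replace = TRUE)
)
warm <- plant[plant$climate == "Warm", ]

# Equal-probability design on the Warm rows only (warning silenced)
svy.unw.1 <- suppressWarnings(svydesign(ids = ~1, data = warm))

var1.P1 <- data.frame(var1 = c(1, 2, 3), Freq = nrow(warm) * c(0.2, 0.2, 0.6))
# The var2 shares are NOT from the question; they are made up for this sketch
var2.P1 <- data.frame(var2 = c(1, 2, 3), Freq = nrow(warm) * c(0.3, 0.3, 0.4))

svy.rake.1 <- rake(design = svy.unw.1,
                   sample.margins = list(~var1, ~var2),
                   population.margins = list(var1.P1, var2.P1))

# One weight per Warm row; the weights add up to the Warm sample size
n.weights   <- length(weights(svy.rake.1))
sum.weights <- sum(weights(svy.rake.1))

# Weighted var2 margin after raking, next to its target
var2.raked <- as.vector(svytable(~var2, svy.rake.1))
print(cbind(target = var2.P1$Freq, raked = var2.raked))
```

If the number of weights equals the number of Warm rows rather than the number of rows in the full data frame, the formulas in sample.margins were evaluated against the data inside the design object only. The var1 margin can be compared in the same way, though it may differ from its target by a small amount that depends on the convergence tolerance.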

Note: Apart from the two-sample issue, I followed this tutorial by Christoph Waldhauser: https://www.r-bloggers.com/2014/04/survey-computing-your-own-post-stratification-weights-in-r/

