I have a CSV file that is too large to be opened in Excel. The file is made up of about 22,000 columns and 210 rows, and has headers for both the rows and the columns. I am trying to see how much of each column's sum comes from rows 1, 47:56, and 156:158. This does NOT include the first two columns of the file, which hold the row headers.
I am trying to see which columns get more than 0.1% of their total sums from the mentioned rows, and then delete those columns.
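To make the goal concrete, here is a minimal sketch of the computation I have in mind. The matrix m, its size, and the variable names are just placeholders, not my real data:

    # Stand-in numeric matrix (210 rows like my data, 6 columns for brevity)
    set.seed(1)
    m <- matrix(sample(0:99, 210 * 6, replace = TRUE), nrow = 210)

    target_rows <- c(1, 47:56, 156:158)
    # percentage of each column's total that comes from those rows
    pct <- 100 * colSums(m[target_rows, ]) / colSums(m)
    cols_to_drop <- which(pct > 0.1)   # columns I would then delete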
I have been using vroom rather than readr when loading the file due to how large it is.
Example file setup:
    Header        A1  A2   A3  A4   A5  A6
    1  sample a    1   0    0  13    0   9
    2  sample b    4   0    0   8  312  24
    3  sample c    0  20    0  49    0  17
    4  sample d    2   0  213  18   56   3
    5  sample e    5   4    0  10   94  62
    6  sample f    9  87    0   2   33  90
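For instance, pretending the mentioned rows were just rows 1 and 3 (the toy table only has 6 rows), column A1 sums to 1 + 4 + 0 + 2 + 5 + 9 = 21, of which rows 1 and 3 contribute 1 + 0 = 1, i.e. 100 * 1/21 ≈ 4.8%. That is well over 0.1%, so A1 would be deleted.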
Code:
    library(dplyr)
    library(vroom)

    myData <- vroom("File.csv")

    myData$newRow <- 100*(colSums(myData[-1, -2])/rowSums(myData[1, 47:56, 156:158]))
I was trying to create a new row holding the percentage, i.e. (each column's sum, excluding columns 1 and 2) / (the sum of the mentioned rows). Here is the latest error message I have received, and the one that I am having trouble understanding:
    > myData$newRow <- 100*(colSums(myData[-1, -2])/rowSums(myData[1, 47:56, 156:158]))
    Error:
    ! Assigned data `100 * ...` must be compatible with existing data.
    ✖ Existing data has 200 rows.
    ✖ Assigned data has 21941 rows.
    ℹ Only vectors of size 1 are recycled.
    Run `rlang::last_trace()` to see where the error occurred.
    Warning message:
    In drop && length(xo) == 1L :
      'length(x) = 3 > 1' in coercion to 'logical(1)'
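From staring at this, I think two things are going wrong. First, colSums() returns one value per column (21,941 of them), and myData$newRow <- ... tries to store that as a new column, which would need one value per row (200), hence the size error. Second, my indexing looks wrong: myData[-1, -2] drops row 1 and column 2 rather than columns 1 and 2, and in myData[1, 47:56, 156:158] the third index ends up being treated as the drop argument of [, hence the 'length(x) = 3 > 1' warning. Is something like the sketch below closer to what I need? It is untested on the full file, assumes the first two columns are the only non-numeric ones, and skips the new row in favour of a plain vector of percentages (the names num, target_rows, pct, and myData_kept are mine):

    library(vroom)

    myData <- vroom("File.csv")

    num <- myData[, -(1:2)]                 # drop the two row-header columns
    target_rows <- c(1, 47:56, 156:158)
    pct <- 100 * colSums(num[target_rows, ]) / colSums(num)

    # keep the two header columns plus every column at or under the 0.1% cutoff
    myData_kept <- myData[, c(TRUE, TRUE, pct <= 0.1)]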
Any advice would be appreciated. Thank you.