
#setwd("projects/Excel Aggregate Loop/excel files")
wd_path = "C:/Users/Sam/Desktop/projects/Excel Aggregate Loop/excel files"
list.files(wd_path)
## [1] "claims listing 2011.xlsx" "claims listing 2012.xlsx"
## [3] "claims listing 2013.xlsx" "claims listing 2014.xlsx"
## [5] "claims listing 2015.xlsx" "claims listing 2016.xlsx"
R can then loop through each of these files and perform any action. If there were 100 files, or 1000 files in
this directory, this would still work.
The loss files have dummy data for a claims listing. Each row is an individual claim which would have
a member name, memberID, loss date, report date, paid loss amount, incurred loss amount, case reserve
amount, etc. To make this as realistic as possible, the have artibtrary columns in addition to the ones which
we need, file year, loss date, and paid.
file_names = list.files(wd_path)
file_paths = paste(wd_path, "/", file_names, sep = "")
head(read_excel(file_paths[4]) %>% select(-3,-4,-5))
## # A tibble: 4 x 6
## name `license number` age `file year` `loss date` paid
## <chr> <dbl> <dbl> <dbl> <dttm> <dbl>
## 1 jeff 3 23 2014 2011-01-01 400
## 2 sue 3 43 2014 2012-01-01 400
## 3 mark 2 55 2014 2013-01-01 200
## 4 sarah 1 100 2014 2014-01-01 500
In order to evaluate the age of the losses, we need to take into account when each loss was evaluated. This is
accomplished by going into Excel and adding in a column for “file year”, which specifies the year of evaluation
of the file. For instance, for the “claim listing 2013” file, all of the claims have a “2013” in the “file year”
column.
Step 2: Load the Data into R
Initialize a data frame which will store the aggregated loss run data from each of the excel files.
The names
of this data frame need to be the names of excel file columns which need to be aggregated.
For instance, these could be “reported”, “Paid Loss”, “Case Reserves”, or “Incurred Loss”. If the excel files
have different names for the same quantities (ie, “Paid Loss” vs. “paid loss”), then they should be renamed
within excel first.
merged_data = as_data_frame(matrix(nrow = 0, ncol = 3))
names(merged_data) = c("file year", "loss date", "paid")
Someone once said “if you need to use a ‘for’ loop in R, then you are doing something wrong”. Vectorized
functions are faster and easier to implement. The function
my_extract
below takes in the file name of the
excel file and returns a data frame with only the columns which are selected.
excel_file_extractor = function(cur_file_name){
read_excel(cur_file_name[1], sheet = "Sheet1", col_names = T) %>%
select(`file year`, `file year`, `loss date`, paid) %>%
rbind(merged_data)
}
Apply the function to all of the files in the folder that you created. Obviously, if you had 100 excel files this
would still work just as effectively.
2