Use the 'FFdownload' package to download Fama-French datasets in R

2024-12-20 · Sebastian Stöckl · 4 min read
Project Status
CRAN status
CRAN downloads
Lifecycle: stable
Website pkgdown

Literally tens of thousands of papers use and cite data from Kenneth French’s famous data library, which provides academia with US and international asset pricing factors and portfolios. However, because of the way they are laid out, the CSV files on the website are tedious to import and usually require a lot of manual work. This prevents researchers all over the world from updating and using these files automatically.

For this purpose, I commissioned the initial files of a package many years ago. With much additional work on my part, it has since become FFdownload and is available on CRAN as well as in my GitHub repository.

We install either the official release or the development version using

install.packages("FFdownload")
# or development version
devtools::install_github("sstoeckl/ffdownload")

There are many different files: monthly files that additionally contain annual data, daily files, and sometimes even weekly files. The algorithm therefore needs very clear specifications, which I detail in the next subsection.

Downloading one or more specific datasets

In this case, we download the 3-factor dataset of Fama and French (1992, 1993), process it (automatically) and plot the resulting factors. To do this, we use the optional argument inputlist to specify ‘F-F_Research_Data_Factors’, so that only this specific dataset is downloaded and processed. The FFdownload() function takes the following arguments:

  • output_file name of the .RData file to be saved (include path if necessary)
  • tempd directory in which to save the downloaded files. Specify it if you want to keep them at a specific location; this is necessary for reproducible research, as the files on the website change from time to time
  • exclude_daily set to TRUE to exclude the daily datasets (they are not downloaded), which speeds up the process considerably
  • download set to TRUE if you actually want to download again (e.g. to update the data). Set to FALSE and specify the download directory to keep processing the already downloaded files
  • download_only set to TRUE to download without processing; set to FALSE if you want to process all your downloaded files at once
  • listsave if not NULL, the list of unzipped files is saved here (useful for processing only a limited number of files through inputlist). It is written before inputlist is processed
  • inputlist if not NULL, FFdownload tries to match the names from this list with the list of downloadable files (zipped CSV) on the website

library(FFdownload)
# Save the processed data to a temporary .RData file
tempf <- tempfile(fileext = ".RData")
inputlist <- c("F-F_Research_Data_Factors")
FFdownload(output_file = tempf, inputlist=inputlist, exclude_daily = TRUE, download = TRUE, download_only=FALSE)
load(tempf) # creates the list FFdata
# Cumulate the monthly percentage returns of the three factors from 1960 onwards
fig <- exp(cumsum(FFdata$`x_F-F_Research_Data_Factors`$monthly$Temp2["1960-01-01/",c("Mkt.RF","SMB","HML")]/100))
# Plot the market factor first, then add SMB and HML as separate panels (on=NA)
plotFF <- plot(fig[,"Mkt.RF"],main="FF 3 Factors",major.ticks = "years",format.labels="%Y",col="black",lwd=2,lty=1,cex=0.8)
plotFF <- lines(fig[,"SMB"],on=NA,main="Size",col="darkgreen",lwd=2,lty=1,ylim=c(0,5),cex=0.8)
plotFF <- lines(fig[,"HML"],on=NA,main="Value",col="darkred",lwd=2,lty=1,ylim=c(0,15),cex=0.8)
plotFF
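
The figure cumulates the percentage returns with exp(cumsum(.)), which treats them as log returns. A minimal sketch of how this compares with compounding the same numbers as simple returns follows. The values are made-up toy numbers (not Fama-French data), and the names ret_pct, wealth_log and wealth_simple are only illustrative:

```r
# Toy monthly returns in percent (illustrative values, not Fama-French data)
ret_pct <- c(1.5, -2.0, 0.7, 3.1)

# As in the plot code: divide by 100, sum up, exponentiate
wealth_log <- exp(cumsum(ret_pct / 100))

# Alternative: compound the same numbers as simple returns
wealth_simple <- cumprod(1 + ret_pct / 100)

# Side-by-side comparison of the two wealth paths
print(round(cbind(wealth_log, wealth_simple), 4))
```

Because exp(x) >= 1 + x, the first path is never below the second. For monthly returns of a few percent the gap is small, but it accumulates over long samples.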

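We could also add momentum (Carhart 1997) as well as the short-term and long-term reversal factors by additionally specifying ‘F-F_Momentum_Factor’, ‘F-F_ST_Reversal_Factor’ and ‘F-F_LT_Reversal_Factor’. Note that these are not the two additional factors (profitability and investment) of the Fama and French (2014) 5-factor model, which the data library provides as a separate dataset. We do this and use the ggplot2 package to create another plot.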

library(tidyverse);library(timetk)
tempf <- tempfile(fileext = ".RData")
inputlist <- c('F-F_Research_Data_Factors','F-F_Momentum_Factor', 'F-F_ST_Reversal_Factor', 'F-F_LT_Reversal_Factor')
FFdownload(output_file = tempf, inputlist=inputlist, exclude_daily = TRUE, download = TRUE, download_only=FALSE)
load(tempf)
# Merge the monthly series of all four datasets into one long table (returns as decimals)
FFfive <- FFdata$`x_F-F_Research_Data_Factors`$monthly$Temp2 %>% timetk::tk_tbl(rename_index = "date") %>%
    left_join(FFdata$`x_F-F_Momentum_Factor`$monthly$Temp2 %>% timetk::tk_tbl(rename_index = "date"),by="date") %>%
    left_join(FFdata$`x_F-F_ST_Reversal_Factor`$monthly$Temp2 %>% timetk::tk_tbl(rename_index = "date"),by="date") %>%
    left_join(FFdata$`x_F-F_LT_Reversal_Factor`$monthly$Temp2 %>% timetk::tk_tbl(rename_index = "date"),by="date") %>%
    pivot_longer(Mkt.RF:LT_Rev,names_to="FFVar",values_to="FFret") %>% mutate(FFret=FFret/100,date=as.Date(date))
# Build a wealth index per factor that starts at 1 in January 1960 and plot it on a log scale
FFfive %>% filter(date>=as.Date("1960-01-01"),FFVar!="RF") %>% group_by(FFVar) %>% arrange(FFVar,date) %>%
  mutate(FFret=ifelse(date==as.Date("1960-01-01"),0,FFret),FFretv=cumprod(1+FFret)) %>% 
  ggplot(aes(x=date,y=FFretv,col=FFVar)) + geom_line(lwd=1.2) + scale_y_log10() +
  labs(title="FF 3 Factors plus Momentum and Reversal", subtitle="Cumulative wealth plots",y="cum. returns") + 
  scale_colour_viridis_d("FFvar") +
  theme_bw() + theme(legend.position="bottom")

Be aware that downloading all monthly files takes some time, because each file has to be processed by the file converter. If you additionally include the daily files, processing time can add up to more than one hour.

If you want a snapshot of all the files saved on your hard drive (before they change again), I recommend specifying a permanent directory for the downloaded files, where they will not be deleted on restart. Also, if you have already downloaded a snapshot of the data without processing it (download=TRUE and download_only=TRUE), you can post-process it without re-downloading by setting download=FALSE and download_only=FALSE.
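
This two-step snapshot workflow could be sketched as follows. The helper names download_snapshot() and process_snapshot() are hypothetical, and the sketch assumes that the directory argument is spelled tempd in the installed version of FFdownload (check ?FFdownload). Nothing is downloaded until the helpers are called:

```r
# Hypothetical helper: step 1, download the zipped CSV files without processing
download_snapshot <- function(dir, inputlist = NULL) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  FFdownload::FFdownload(tempd = dir,            # assumed argument name
                         exclude_daily = TRUE,
                         download = TRUE, download_only = TRUE,
                         inputlist = inputlist)
  invisible(dir)
}

# Hypothetical helper: step 2, process the files already stored in `dir`
process_snapshot <- function(dir, inputlist = NULL,
                             output_file = file.path(dir, "FFdata.RData")) {
  FFdownload::FFdownload(output_file = output_file,
                         tempd = dir,            # assumed argument name
                         exclude_daily = TRUE,
                         download = FALSE, download_only = FALSE,
                         inputlist = inputlist)
  invisible(output_file)
}
```

A call such as download_snapshot("FF_snapshot", "F-F_Research_Data_Factors"), followed later by load(process_snapshot("FF_snapshot", "F-F_Research_Data_Factors")), would then rebuild FFdata from the stored files without contacting the website again.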

If you just want a list of all available files on the website to select the ones you really need to download, I suggest setting listsave to a specific location and keeping download=FALSE as well as download_only=TRUE.
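
A minimal sketch of this, wrapped in a hypothetical helper list_ff_files() so that nothing runs until you call it. It assumes that listsave writes a file that read.csv() can read, with the file names in the last column; inspect the saved file first if your version behaves differently:

```r
# Hypothetical helper: fetch only the list of available files
list_ff_files <- function(listfile = tempfile(fileext = ".txt")) {
  # No download and no processing, just write the file list to `listfile`
  FFdownload::FFdownload(exclude_daily = TRUE, download = FALSE,
                         download_only = TRUE, listsave = listfile)
  files <- utils::read.csv(listfile, stringsAsFactors = FALSE)
  files[[ncol(files)]]  # assumption: file names are in the last column
}
```

Something like grep("Momentum", list_ff_files(), value = TRUE) would then show candidates to pass on through inputlist.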

References

Carhart, Mark M. 1997. “On Persistence in Mutual Fund Performance.” The Journal of Finance 52 (1): 57–82. https://doi.org/10.2307/2329556.

Fama, Eugene F., and Kenneth R. French. 1992. “The Cross-Section of Expected Stock Returns.” The Journal of Finance 47 (2): 427–65. https://doi.org/10.1111/j.1540-6261.1992.tb04398.x.

———. 1993. “Common Risk Factors in the Returns on Stocks and Bonds.” Journal of Financial Economics 33 (1): 3–56. https://doi.org/10.1016/0304-405X(93)90023-5.

———. 2014. “A Five-Factor Asset Pricing Model.” Journal of Financial Economics. https://doi.org/10.1016/j.jfineco.2014.10.010.