Use the 'FFdownload' package to download Fama-French datasets in R
Literally tens of thousands of papers use and cite data from Kenneth French’s famous data library, which provides academia with US and international asset pricing factors and portfolios. However, because of their structure (several tables stacked in one file and separated by lines of text), the CSV files on the website are tedious to import and usually require a lot of manual work. This prevents researchers all over the world from automatically updating and using these files.
For this purpose, many years ago I commissioned the initial files of a package that has now (with much additional work on my part) become FFdownload, which is available on CRAN as well as on my GitHub repository.
Install either the official release from CRAN or the development version from GitHub:
install.packages("FFdownload")
# or the development version
devtools::install_github("sstoeckl/ffdownload")
As there are many different kinds of files (monthly files that additionally contain annual data, daily files and sometimes even weekly files), the download and processing routine needs very clear specifications, which I detail in the next subsection.
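To illustrate why such clear rules are needed, here is a minimal base-R sketch (not FFdownload's actual implementation) that splits the text of a Kenneth French-style CSV into its separate tables. It assumes that each table starts with a header line beginning with a comma and that data rows start with a numeric date code whose length reveals the frequency (four digits for annual, six for monthly, eight for daily data). The function name split_ff_blocks and the toy numbers are made up for illustration.

```r
# Hypothetical helper (not part of FFdownload): split the lines of a
# Kenneth French-style CSV into one data frame per table.
split_ff_blocks <- function(lines) {
  blocks <- list()
  header <- NULL
  rows <- character(0)

  # Parse the rows collected so far (if any) and store them under a name
  # derived from the length of the date code: 4 digits = annual,
  # 6 digits = monthly, 8 digits = daily (or weekly).
  store_block <- function() {
    if (!is.null(header) && length(rows) > 0) {
      df <- read.csv(text = c(header, rows), check.names = FALSE,
                     strip.white = TRUE)
      names(df)[1] <- "date"
      freq <- switch(as.character(nchar(df$date[1])),
                     "4" = "annual", "6" = "monthly", "8" = "daily", "other")
      blocks[[freq]] <<- df
    }
  }

  for (ln in lines) {
    if (startsWith(ln, ",")) {
      store_block()          # a new header row closes the previous table
      header <- ln
      rows <- character(0)
    } else if (grepl("^\\s*[0-9]{4,8}\\s*,", ln)) {
      rows <- c(rows, ln)    # data row starting with a numeric date code
    }
    # any other line (notes, copyright, blank lines) is ignored
  }
  store_block()              # store the last table
  blocks
}

# Toy input with made-up numbers, mimicking a monthly file with annual data
toy <- c(
  "Toy example with made-up numbers (not real factor returns)",
  "",
  ",Mkt-RF,SMB,HML,RF",
  "192607,   1.00,  -0.50,   0.25,   0.10",
  "192608,   2.00,   0.50,  -0.25,   0.10",
  "",
  " Annual Factors: January-December ",
  ",Mkt-RF,SMB,HML,RF",
  "1927,  10.00,  -1.00,   2.00,   1.20"
)
res <- split_ff_blocks(toy)
str(res)
```

Real files contain further quirks (missing-value codes, several portfolio tables per file), which is exactly why a dedicated package is more robust than such a quick parser.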
Downloading one or more specific datasets
In this example, we download the Fama and French (1992), Fama and French (1993) three-factor dataset, process it automatically and plot the resulting factors. To do this, we set the optional argument inputlist to ‘F-F_Research_Data_Factors’, so that only this specific dataset is downloaded and processed.
The FFdownload() function takes the following arguments:
output_file: name of the .RData file to be saved (include the path if necessary)
tempdir: location where the downloaded files are saved, if you want to keep them. This is necessary for reproducible research, as the files on the website change from time to time
exclude_daily: excludes the daily datasets (they are not downloaded), which speeds up the process considerably
download: set to TRUE if you actually want to download the files (again), e.g. to update the data. Set to FALSE and specify tempdir to keep processing the files you have already downloaded
download_only: set to FALSE if you want to process all your downloaded files at once
listsave: if not NULL, the list of unzipped files is saved here (useful for processing only a limited number of files through inputlist). It is written before inputlist is processed
inputlist: if not NULL, FFdownload tries to match the names in the list with the downloadable files (zipped CSV) on the website
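How FFdownload matches inputlist internally is not shown in this post. As a purely hypothetical sketch, one simple approach is to strip the "_CSV.zip" suffix from the zipped file names offered on the website and compare the result exactly with the requested names. The example file names follow the naming pattern used on Kenneth French's website; the helper match_inputlist is made up for illustration.

```r
# Hypothetical helper (not FFdownload's internal code): keep only those
# zipped CSV files whose base name exactly matches an entry of inputlist.
match_inputlist <- function(available, inputlist) {
  base <- sub("_CSV\\.zip$", "", available)   # strip the "_CSV.zip" suffix
  hits <- available[base %in% inputlist]
  missing <- setdiff(inputlist, base)
  if (length(missing) > 0) {
    warning("No file found for: ", paste(missing, collapse = ", "))
  }
  hits
}

available <- c("F-F_Research_Data_Factors_CSV.zip",
               "F-F_Research_Data_Factors_daily_CSV.zip",
               "F-F_Momentum_Factor_CSV.zip")
sel <- match_inputlist(available,
                       c("F-F_Research_Data_Factors", "F-F_Momentum_Factor"))
print(sel)
```

Exact matching is a deliberate choice in this sketch: a partial match on "F-F_Research_Data_Factors" would also pick up the daily file, which is usually not what you want.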
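library(FFdownload)
library(xts)  # plot() and lines() methods for xts objects
tempf <- tempfile(fileext = ".RData")
inputlist <- c("F-F_Research_Data_Factors")
# Download and process only the 3-factor dataset, skipping the daily files
FFdownload(output_file = tempf, inputlist = inputlist, exclude_daily = TRUE, download = TRUE, download_only = FALSE)
load(tempf)  # restores the list FFdata
# Cumulative wealth indices from 1960 onwards (returns are given in percent)
fig <- exp(cumsum(FFdata$`x_F-F_Research_Data_Factors`$monthly$Temp2["1960-01-01/", c("Mkt.RF", "SMB", "HML")] / 100))
plotFF <- plot(fig[, "Mkt.RF"], main = "FF 3 Factors", major.ticks = "years", format.labels = "%Y", col = "black", lwd = 2, lty = 1, cex = 0.8)
# on = NA draws each additional series in a new panel
plotFF <- lines(fig[, "SMB"], on = NA, main = "Size", col = "darkgreen", lwd = 2, lty = 1, ylim = c(0, 5), cex = 0.8)
plotFF <- lines(fig[, "HML"], on = NA, main = "Value", col = "darkred", lwd = 2, lty = 1, ylim = c(0, 15), cex = 0.8)
plotFF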
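The wealth index in this plot is built as exp(cumsum(r)), which treats the monthly simple returns as if they were log returns. For small monthly returns this is a close approximation, while cumprod(1 + r) compounds simple returns exactly. A small base-R check with made-up returns:

```r
# Made-up monthly simple returns in percent (illustration only)
r_pct <- c(1.0, -2.0, 3.0, 0.5)
r <- r_pct / 100

wealth_exact  <- cumprod(1 + r)          # exact compounding of simple returns
wealth_approx <- exp(cumsum(r))          # treats r as log returns (approximation)
wealth_logret <- exp(cumsum(log1p(r)))   # converting to log returns first is exact

print(round(cbind(wealth_exact, wealth_approx), 6))
```

Over several decades the small per-month differences accumulate, so for long horizons the exact version (or converting to log returns with log1p() first) is the safer choice.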
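We could also add momentum (Carhart 1997) as well as the short-term and long-term reversal factors by additionally specifying ‘F-F_Momentum_Factor’, ‘F-F_ST_Reversal_Factor’ and ‘F-F_LT_Reversal_Factor’. Note that the reversal factors are not the two additional factors of the Fama and French (2014) five-factor model (profitability, RMW, and investment, CMA); those are provided in the separate dataset ‘F-F_Research_Data_5_Factors_2x3’. We download these four datasets and use the ggplot2 package to create another plot.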
library(FFdownload); library(tidyverse); library(timetk)
tempf <- tempfile(fileext = ".RData")
inputlist <- c("F-F_Research_Data_Factors", "F-F_Momentum_Factor", "F-F_ST_Reversal_Factor", "F-F_LT_Reversal_Factor")
FFdownload(output_file = tempf, inputlist = inputlist, exclude_daily = TRUE, download = TRUE, download_only = FALSE)
load(tempf)
# Convert each monthly xts object to a tibble and join them by date
FFfive <- FFdata$`x_F-F_Research_Data_Factors`$monthly$Temp2 %>% timetk::tk_tbl(rename_index = "date") %>%
  left_join(FFdata$`x_F-F_Momentum_Factor`$monthly$Temp2 %>% timetk::tk_tbl(rename_index = "date"), by = "date") %>%
  left_join(FFdata$`x_F-F_ST_Reversal_Factor`$monthly$Temp2 %>% timetk::tk_tbl(rename_index = "date"), by = "date") %>%
  left_join(FFdata$`x_F-F_LT_Reversal_Factor`$monthly$Temp2 %>% timetk::tk_tbl(rename_index = "date"), by = "date") %>%
  pivot_longer(Mkt.RF:LT_Rev, names_to = "FFVar", values_to = "FFret") %>%
  mutate(FFret = FFret / 100, date = as.Date(date))  # percent -> decimal, yearmon -> Date
# Wealth of 1 unit invested at the start of 1960 (first return set to 0)
FFfive %>% filter(date >= "1960-01-01", FFVar != "RF") %>% group_by(FFVar) %>% arrange(FFVar, date) %>%
  mutate(FFret = ifelse(date == min(date), 0, FFret), FFretv = cumprod(1 + FFret)) %>%
  ggplot(aes(x = date, y = FFretv, col = FFVar)) + geom_line(lwd = 1.2) + scale_y_log10() +
  labs(title = "FF 3 Factors plus Momentum and Reversal", subtitle = "Cumulative wealth plots", x = NULL, y = "Cumulative wealth (log scale)") +
  scale_colour_viridis_d("FFVar") +
  theme_bw() + theme(legend.position = "bottom")
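The chain of left_join() calls can also be written more compactly by building the list element names from inputlist (the datasets in FFdata carry an "x_" prefix, as in the code above) and folding the tables together with Reduce(). The sketch below uses base R's merge() with all.x = TRUE, which, like left_join(), keeps every date of the first table and fills in NA where a later table has no observation. FFdata_mock and its numbers are made-up stand-ins for the real downloaded object, whose monthly tables would first be converted with tk_tbl() as above.

```r
# Made-up stand-in for the downloaded FFdata object (illustration only):
# one data frame of monthly returns per dataset, keyed by "date".
FFdata_mock <- list(
  `x_F-F_Research_Data_Factors` = data.frame(
    date = as.Date(c("1926-07-01", "1926-08-01", "1926-09-01")),
    Mkt.RF = c(1, 2, 3)),
  `x_F-F_Momentum_Factor` = data.frame(
    date = as.Date(c("1926-08-01", "1926-09-01")),  # starts one month later
    Mom = c(0.5, -0.5)),
  `x_F-F_ST_Reversal_Factor` = data.frame(
    date = as.Date(c("1926-07-01", "1926-08-01", "1926-09-01")),
    ST_Rev = c(0.1, 0.2, 0.3))
)
inputlist <- c("F-F_Research_Data_Factors", "F-F_Momentum_Factor",
               "F-F_ST_Reversal_Factor")

# Build the element names from inputlist and left-join all tables by date
tables <- lapply(paste0("x_", inputlist), function(nm) FFdata_mock[[nm]])
joined <- Reduce(function(a, b) merge(a, b, by = "date", all.x = TRUE), tables)
print(joined)
```

With this pattern, adding another dataset only requires extending inputlist instead of writing another join line.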
If you want to keep a snapshot of all the files on your hard drive (before they change again), I recommend specifying a permanent tempdir where the downloaded files will not be deleted on restart. Also, if you have already downloaded a snapshot of the data without processing it (download=TRUE and download_only=TRUE), you can post-process it without re-downloading by setting download=FALSE and download_only=FALSE.
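To keep such snapshots apart, it can help to download into a dated folder in a permanent location. A minimal base-R sketch; the helper snapshot_dir and its naming scheme are assumptions, and the commented FFdownload() calls only reuse the argument names described in this post (check ?FFdownload for the exact names in your installed version):

```r
# Hypothetical helper: create (if needed) a dated snapshot folder such as
# "<base>/FFdata_2024-01-31" and return its path.
snapshot_dir <- function(base, date = Sys.Date()) {
  path <- file.path(base, paste0("FFdata_", format(date, "%Y-%m-%d")))
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  path
}

# Example with a throwaway base folder; use a permanent location in practice
snap <- snapshot_dir(tempdir(), as.Date("2024-01-31"))
print(snap)

# Step 1: download only, keeping the raw files in the snapshot folder
# FFdownload(output_file = file.path(snap, "FFdata.RData"), tempdir = snap,
#            download = TRUE, download_only = TRUE)
# Step 2 (later, without re-downloading): process the stored files
# FFdownload(output_file = file.path(snap, "FFdata.RData"), tempdir = snap,
#            download = FALSE, download_only = FALSE)
```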
listsave to a specific location and keep download=FALSE as well as download_only=TRUE.
References
Carhart, Mark M. 1997. “On Persistence in Mutual Fund Performance.” The Journal of Finance 52 (1): 57–82. https://doi.org/10.2307/2329556.
Fama, Eugene F., and Kenneth R. French. 1992. “The Cross-Section of Expected Stock Returns.” The Journal of Finance 47 (2): 427–65. https://doi.org/10.1111/j.1540-6261.1992.tb04398.x.
———. 1993. “Common Risk Factors in the Returns on Stocks and Bonds.” Journal of Financial Economics 33 (1): 3–56. https://doi.org/10.1016/0304-405X(93)90023-5.
———. 2014. “A Five-Factor Asset Pricing Model.” Journal of Financial Economics. https://doi.org/10.1016/j.jfineco.2014.10.010.