# Example of `make` for data analysis
This repository shows how to use `make` for a data analysis project.
The paper (PDF) is generated from a LaTeX source file, a table, and a figure.
`make` uses a set of instructions in `makefile` to generate the table and figure using R code and a datafile, and then generate the PDF from the manuscript.
`make` is clever because it breaks the analysis into separate steps and tracks which files each step depends on, so only the steps affected by a change need to be rerun. If the data change, everything is rerun. If the figure-generating code changes, only that code and the manuscript are rerun. If only the manuscript changes, only `pdflatex` is rerun. It's smart like that.
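The rerun logic described above comes down to comparing file modification times: a target is rebuilt when it is missing or older than any of its prerequisites. Here is a minimal sketch of that rule in R. The helper `needs_rebuild()` and the file names are hypothetical and used only for illustration; they are not part of `make` or of this repository, and the sketch assumes every prerequisite already exists.

```r
# Sketch of make's staleness rule: rebuild a target when it is missing
# or older than at least one of its prerequisites.
# `needs_rebuild` is a hypothetical helper, not part of make or this repository.
needs_rebuild <- function(target, prereqs) {
  if (!file.exists(target)) {
    return(TRUE)
  }
  any(file.mtime(prereqs) > file.mtime(target))
}

# Temporary stand-in files (hypothetical names) so the sketch runs anywhere
demo_dir <- tempfile("make-demo-")
dir.create(demo_dir)
data_file   <- file.path(demo_dir, "data.csv")
script_file <- file.path(demo_dir, "figure.R")
figure_file <- file.path(demo_dir, "figure.pdf")
file.create(data_file, script_file, figure_file)

# Pretend the figure was built after both of its inputs were last edited
now <- Sys.time()
Sys.setFileTime(data_file, now - 120)
Sys.setFileTime(script_file, now - 120)
Sys.setFileTime(figure_file, now - 60)
before_edit <- needs_rebuild(figure_file, c(data_file, script_file))  # FALSE

# "Editing" the script makes it newer than the figure, so the figure is stale
Sys.setFileTime(script_file, now)
after_edit <- needs_rebuild(figure_file, c(data_file, script_file))   # TRUE
```

Real `make` applies this check recursively, so a stale figure in turn makes the PDF that depends on it stale.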
Basically, `make` treats the project as a directed acyclic graph (DAG) of files and the rules that connect them. For this repository, that graph looks like this:
```{r setup, include=FALSE}
library("svglite")
knitr::opts_chunk$set(
  dev = "svglite",
  fig.path = "fig-",
  fig.ext = "svg"
)
```
```{r dag, echo = FALSE, fig.height=4, fig.width=6}
requireNamespace("igraph", quietly = TRUE)
requireNamespace("ggraph", quietly = TRUE)
parse_makefile <-
function(
  file = "makefile",
  ...
) {
  con <- file(file, open = "rb")
  on.exit(close(con))
  m <- character()
  # handle continuation characters
  while (length(con)) {
    this_line <- readLines(con, n = 1L)
    if (!length(this_line)) {
      break
    }
    m[length(m) + 1] <- this_line
    while (grepl("\\\\$", m[length(m)]) && length(con)) {
      m[length(m)] <- sub("\\\\$", " ", m[length(m)])
      m[length(m)] <- paste0(m[length(m)], readLines(con, n = 1L))
    }
  }
  return(m)
}
makefile_to_network <-
function(
  lines = parse_makefile(),
  exclude = NULL,
  ...
) {
  rules <- lines[grepl("[ A-Za-z0-9./]+ ?[:] ?[ A-Za-z0-9./]+", lines)]
  rules_list <- strsplit(rules, " ?: ?")
  edges <- lapply(rules_list, function(x) {
    # cleanup target
    target <- x[1L]
    if (target %in% exclude) {
      return(NULL)
    }
    ## handle targets with multiple outputs
    targets <- strsplit(target, " ")[[1L]]
    # cleanup prereqs
    prereqs <- strsplit(x[2L], "[ \t]+")[[1L]]
    prereqs <- prereqs[!is.na(prereqs)]
    # return
    if (!length(prereqs) || all(prereqs == "")) {
      return(NULL)
    }
    as.vector(t(as.matrix((expand.grid(prereqs, targets, stringsAsFactors = FALSE)))))
  })
  igraph::make_graph(unlist(edges))
}
net <- makefile_to_network(parse_makefile("makefile"), exclude = "all")
ggraph::ggraph(net, layout = "kk") +
  ggraph::geom_edge_link(
    ggplot2::aes(
      start_cap = ggraph::label_rect(node1.name),
      end_cap = ggraph::label_rect(node2.name)
    ),
    arrow = grid::arrow(length = grid::unit(2, 'mm'))
  ) +
  ggraph::geom_node_text(ggplot2::aes(label = name), size = 4) +
  ggplot2::theme_void()
```
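A DAG like this also fixes the order in which steps can run: a file can only be built once everything it depends on exists. Below is a rough base-R sketch of one way to derive such an order from an edge list (a simple topological sort). The file names in `edges` and the helper `build_order()` are assumptions made for illustration and may not match the actual `makefile`; `make` itself works backwards from the requested target rather than running this exact algorithm.

```r
# Hypothetical edge list: each row reads "from is a prerequisite of to"
edges <- data.frame(
  from = c("data.csv", "table.R", "data.csv", "figure.R",
           "table.tex", "figure.pdf", "paper.tex"),
  to   = c("table.tex", "table.tex", "figure.pdf", "figure.pdf",
           "paper.pdf", "paper.pdf", "paper.pdf"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the files in an order where every
# prerequisite appears before the targets that depend on it
build_order <- function(edges) {
  nodes <- unique(c(edges$from, edges$to))
  done <- character()
  remaining <- edges
  while (length(done) < length(nodes)) {
    pending <- setdiff(nodes, done)
    # a file is ready once none of its prerequisites are still pending
    ready <- pending[!pending %in% remaining$to]
    if (!length(ready)) {
      stop("cycle detected: the dependency graph is not a DAG")
    }
    done <- c(done, ready)
    # edges leaving finished files no longer block anything
    remaining <- remaining[!remaining$from %in% ready, , drop = FALSE]
  }
  done
}

ord <- build_order(edges)
```

With these made-up edges, source files such as `data.csv` come first and `paper.pdf` comes last. The "acyclic" part matters: if two files depended on each other, no valid order would exist and the sketch stops with an error.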
The R file `analysis.R` shows what is going on in `makefile` using possibly more familiar R syntax. The `README.Rmd` file contains the code to construct the above graph from an arbitrary makefile.
Zach Jones has [a good tutorial](http://zmjones.com/make/) about all of this.