Version 4.3.0 has been released.
Version 4.2.0 will be released today. There are several improvements to code style and performance. In addition, there are new features such as cache/hash externalization and runtime prediction. See the new storage and timing vignettes for details. This release has automated checks for back-compatibility with existing projects, and I also ran manual back-compatibility checks on serious projects.
Version 3.0.0 is coming out. It manages environments more intelligently so that the behavior of make() is more consistent with evaluating your code in an interactive session.
Version 1.0.1 is on CRAN! I'm already working on a massive update, though. 2.0.0 is cleaner and more powerful.
- Require igraph >= 2.1.2.
- Remove memo_expr() because it causes errors on R-devel.
- Remove is.R().
- Support clustermq 0.9.0 (@mschubert).
- Account for rm() and remove().
- Accommodate rlang PR 1255.
- Ensure the batchtools template file can be brewed (#1359, @pat-s).
- Recommend the targets package.
- Add NOTICE and inst/NOTICE to more explicitly credit code included from other open source projects. (Previously drake just had comments in the source with links to the various projects.)
- Use dsl_sym() instead of as.symbol() when constructing commands for combine() (#1340, @vkehayas).
- Add a level_separation argument to vis_drake_graph() and render_drake_graph() to control the aspect ratio of visNetwork graphs (#1303, @matthewstrasiotto, @matthiasgomolka, @robitalec).
- Deprecate caching = "master" in favor of caching = "main".
- Support .data in the DSL (#1323, @shirdekel).
- Use identical() to compare file hashes (#1324, @shirdekel).
- Set seed = TRUE in future::future().
- Fix parallelism = "clustermq" with caching = "worker" (@richardbayes).
- Handle cases where NROW() throws an error (#1300, julian-tagell on Stack Overflow).
- Use a lifecycle setup that does not require badges to be in man/figures.
- Expose the log_worker argument of clustermq::workers() in make() and drake_config() (#1305, @billdenney, @mschubert).
- Set as.is to TRUE in utils::type.convert() (#1309, @bbolker).
- cached_planned() and cached_unplanned() now work with non-standard cache locations (#1268, @Plebejer).
- Set use_cache to FALSE more often (#1257, @Plebejer).
- Replace the iris dataset with the airquality dataset in all documentation, examples, and tests (#1271).
- Assign functions created by code_to_function() to the proper environment (#1275, @robitalec).
- Update for tidyselect (#1274, @dernst).
- Address stray txtq lockfiles (#1232, #1239, #1280, @danwwilson, @pydupont, @mattwarkentin).
- Add a drake_script() function to write _drake.R files for r_make() (#1282); see the sketch after this list.
- Deprecate expose_imports() in favor of make(envir = getNamespace("yourPackage")) (#1286, @mvarewyck).
- Suppress the r_make() startup message if getOption("drake_r_make_message") is FALSE (#1238, @januz).
- Improve the layout of the visNetwork graph by using the hierarchical layout with visEdges(smooth = list(type = "cubicBezier", forceDirection = TRUE)) (#1289, @mstr3336).
- Keep splice_inner() from dropping formal arguments shared by c() (#1262, @bart1).
- Fix subtarget_hashes.cross() for crosses on a single grouping variable.
- Fix group() when used with specialized formats (#1236, @adamaltmejd).
- Require tidyselect >= 1.0.0.
- Add a .names argument (#1240, @maciejmotyka, @januz).
- drake_plan() (#1237, @januz).
- Fix the order of cross() sub-targets (#1204, @psadil). Expansion order is the same, but names are correctly matched now.
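A minimal sketch of drake_script() paired with r_make(); the plan and target below are invented for illustration, and the exact drake_script() signature is an assumption based on its stated purpose:

```r
library(drake)
# Write a _drake.R file for r_make(); the plan here is a placeholder.
drake_script({
  library(drake)
  plan <- drake_plan(result = sqrt(25))
  drake_config(plan) # _drake.R should end in a drake_config() call.
})
r_make() # Build the plan in a fresh, reproducible R session.
```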
- Correctly handle file_out() files in clean(), even when garbage_collection is TRUE (#521, @the-Hull).
- Fix keep_going = TRUE for formatted targets (#1206).
- Use progress_bar instead of progress so that drake works without the progress package (#1208, @mbaccou).
- Move more settings into config$settings (#965).
- Add drake_done() and drake_cancelled() (#1205).
- Fix drake_graph_info() (#1207).
- Adjust console feedback when verbose is 2 (#1203, @kendonB).
- Deprecate the jobs argument of clean().
- Fix drake_build() and drake_debug() (#1214, @kendonB).
- Deprecate hasty_build (#1222).
- Throw an informative error if file_in()/file_out()/knitr_in() files are not literal strings (#1229).
- Disallow file_out() and knitr_in() in imported functions (#1229).
- Disallow knitr_in() in dynamic branching (#1229).
- Rename the progress functions (progress() => drake_progress(), running() => drake_running(), failed() => drake_failed()) (#1205).
- Bump the required digest version to 0.6.21 (#1166, @boshek).
- Let the depend trigger toggle invalidation from dynamic-only dependencies, including the max_expand argument of make().
- Fix session_info argument parsing (and reduce calls to utils::sessionInfo() in tests).
- Update for tibble 3.0.0.
- Add target(format = "file") (#1168, #1127).
- Allow users to set max_expand on a target-by-target basis via target() (#1175, @kendonB); see the sketch after this list.
- Move work into make(), not drake_config() (#1156).
- In make(verbose = 2), remove the spinner and use a progress bar to track how many targets are done so far.
- Rely on cli (an optional package) for console output.
- Deprecate console_log_file in favor of log_make as an argument to make() and drake_config().
- Improve the "loop" and "future" parallel backends (#400).
- Configure the loadd() RStudio addin through the new rstudio_drake_cache global option (#1169, @joelnitta).
- Fix recoverable() edge cases, e.g. dynamic branching + dynamic files.
- Throw an informative error from drake_plan() if a grouping variable is undefined or invalid (#1182, @kendonB).
- Introduce drake_deps and drake_deps_ht (#1183).
- Use rlang::trace_back() to make diagnose()$error$calls nicer (#1198).

These changes invalidate some targets in some workflows, but they are necessary bug fixes.
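A minimal sketch of target-level max_expand in dynamic branching (#1175); the index and root targets are hypothetical examples, not from the release notes:

```r
library(drake)
plan <- drake_plan(
  index = seq_len(1000),
  # One dynamic sub-target per element of index, but capped at
  # 5 sub-targets so downsized test runs stay quick.
  root = target(sqrt(index), dynamic = map(index), max_expand = 5)
)
make(plan)
```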
- Account for $<-() and @<-() (#1144).
- Fix bind_plans() (#1136, @jennysjaarda).
- Fix analyze_assign() (#1119, @jennysjaarda).
- Track the "running" progress of dynamic targets.
- Add an "fst_tbl" format for large tibble targets (#1154, @kendonB).
- Add a format argument to make(), an optional custom storage format for targets without an explicit target(format = ...) in the plan (#1124).
- Add a lock_cache argument to make() to optionally suppress cache locking (#1129). (It can be annoying to interrupt make() repeatedly and unlock the cache manually every time.)
- Add cancel() and cancel_if() functions to cancel targets mid-build (#1131).
- Add a subtarget_list argument to loadd() and readd() to optionally load a dynamic target as a list of sub-targets (#1139, @MilesMcBain).
- Fix file_out() (#1141).
- Rework options at the drake_config() level (#1156, @MilesMcBain).
- Deprecate the config argument in all user-side functions (#1118, @vkehayas). Users can now supply the plan and other make() arguments directly, without bothering with drake_config() (see the sketch after this list). Now, you only need to call drake_config() in the _drake.R file for r_make() and friends. Old code with config objects should still work. Affected functions:
    - make()
    - outdated()
    - drake_build()
    - drake_debug()
    - recoverable()
    - missed()
    - deps_target()
    - deps_profile()
    - drake_graph_info()
    - vis_drake_graph()
    - sankey_drake_graph()
    - drake_ggraph()
    - text_drake_graph()
    - predict_runtime() (needed to rename the targets argument to targets_predict and jobs to jobs_predict)
    - predict_workers() (same argument name changes as predict_runtime())
- From now on, the only purpose of drake_config() is to serve functions r_make() and friends.
- Skip the @ operator in static code analysis. For example, in x@y, do not register y as a dependency (#1130, @famuvie).
- Fix deps_profile() (#1134, @kendonB).
- Improve deps_target() output (#1134, @kendonB).
- Refactor drake_meta_() objects.
- Fix drake_envir() and id_chr() (#1132).
- Allow drake_envir() to select the environment with imports (#882).
- Adopt the vctrs paradigm and its type stability for dynamic branching (#1105, #1106).
- Treat target as a symbol by default in read_trace(). Required for the trace to make sense in #1107.
- Fix the "future" backend (#1083, @jennysjaarda).
- Add a log_build_times argument to make() and drake_config(). Allows users to disable the recording of build times. Produces a speedup of up to 20% on Macs (#1078).
- Safely run make(), outdated(make_imports = TRUE), recoverable(make_imports = TRUE), vis_drake_graph(make_imports = TRUE), clean(), etc. on the same cache.
- Add a format trigger to invalidate targets when the specialized data format changes (#1104, @kendonB).
- Add cache_planned() and cache_unplanned() to help selectively clean workflows with dynamic targets (#1110, @kendonB).
- Refactor drake_config() objects and analyze_code() objects.
- Add a "qs" format (#1121, @kendonB).
- Avoid %||% (%|||% is faster) (#1089, @billdenney).
- Avoid %||NA% due to slowness (#1089, @billdenney).
- Speed up is_dynamic() and is_subtarget() (#1089, @billdenney).
- Use getVDigest() instead of digest() (#1089, #1092, https://github.com/eddelbuettel/digest/issues/139#issuecomment-561870289, @eddelbuettel, @billdenney).
- Tune backtick and .deparseOpts() settings to speed up deparse() (#1086, https://stackoverflow.com/users/516548/g-grothendieck, @adamkski).
- Speed up build_times() (#1098).
- Use mget_hash() in progress() (#1098).
- Speed up drake_graph_info() (#1098).
- Speed up outdated() (#1098).
- In make(), avoid checking for nonexistent metadata for missing targets.
- Speed up drake_config().
- Fix use_drake() (#1097, @lorenzwalthert, @tjmahr).
- Introduce the spec, drake's interpretation of the plan. In the plan, all the dependency relationships among targets and files are implicit. In the spec, they are all explicit. We get from the plan to the spec using static code analysis, e.g. analyze_code().
- Keep drake::drake_plan(x = target(...)) from throwing an error if drake is not loaded (#1039, @mstr3336).
- Move the transformations lifecycle badge to the proper location in the docstring (#1040, @jeroen).
- Prevent readd() / loadd() from turning an imported function into a target (#1067).
- Keep disk.frame targets in sync with their stored values (#1077, @brendanf).
- Add a subtargets() function to get the cached names of the sub-targets of a dynamic target.
- Add subtargets arguments to loadd() and readd() to retrieve specific sub-targets from a parent dynamic target.
- Add get_trace() and read_trace() functions to help track which values of grouping variables go into the making of dynamic sub-targets.
- Add an id_chr() function to get the name of the target while make() is running.
- Support plot(plan) (#1036).
- vis_drake_graph(), drake_graph_info(), and render_drake_graph() now take arguments that allow behavior to be defined upon selection of nodes (#1031, @mstr3336).
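In practice, the deprecation of the config argument means user-side calls simplify as sketched below; the plan itself is a hypothetical example:

```r
library(drake)
plan <- drake_plan(
  data = airquality,
  model = lm(Ozone ~ Wind + Temp, data = data)
)
make(plan)
outdated(plan)        # formerly outdated(drake_config(plan))
vis_drake_graph(plan) # formerly vis_drake_graph(drake_config(plan))
```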
- Add a max_expand argument to make() and drake_config() to scale down dynamic branching (#1050, @hansvancalster).
- Slim down drake_config() objects.
- Allow prework to be a language object, list of language objects, or character vector (#1 at pat-s/multicore-debugging on GitHub, @pat-s).
- Store config$layout in an environment. Supports internal modifications by reference. Required for #685.
- Make dynamic a formal argument of target().
- Formalize storrs and decorated storrs (#1071).
- Speed up internals with setdiff() and by avoiding names(config$envir_targets).
- Rework dir_size(). Incurs rehashing for some workflows, but should not invalidate any targets.
- Add a which_clean() function to preview which targets will be invalidated by clean() (#1014, @pat-s).
- Update storr usage (#1015, @billdenney, @noamross).
- Add a "diskframe" format for larger-than-memory data (#1004, @xiaodaigh).
- Add a drake_tempfile() function to help with the "diskframe" format. It makes sure we are not copying large datasets across different physical storage media (#1004, @xiaodaigh).
- Add code_to_function() to allow for parsing script-based workflows into functions so drake_plan() can begin to manage the workflow and track dependencies (#994, @thebioengineer).
- Fix seed_trigger() (#1013, @CreRecombinase).
- Encapsulate the txtq API inside the decorated storr API (#1020).
- Change the meaning of max_expand in drake_plan(). max_expand is now the maximum number of targets produced by map(), split(), and cross(). For cross(), this reduces the number of targets (less cumbersome) and makes the subsample of targets more representative of the complete grid. It also ensures consistent target naming when .id is FALSE (#1002). Note: max_expand is not for production workflows anyway, so this change does not break anything important. Unfortunately, we do lose the speed boost in drake_plan() originally due to max_expand, but drake_plan() is still fast, so that is not so bad.
- Handle NULL targets (#998).
- Fix cross() (#1009). The same fix should apply to map() and split() too.
- Fix map() (#1010).
- Support fst-powered saving of data.table objects.
- Make transform a formal argument of target() so that users do not have to type "transform =" all the time in drake_plan() (#993).
- Move the documentation website from ropensci.github.io/drake to docs.ropensci.org/drake.
- Add target(format = "keras") (#989).
- Deprecate the verbose argument in various caching functions. The location of the cache is now only printed in make(). This made the previous feature easier to implement.
- Fix combine() (#1008).
- Refactor storr usage (#968).
- Make drake_plan(transform = slice()) understand .id and grouping variables (#963).
- Fix clean(garbage_collection = TRUE, destroy = TRUE). Previously it destroyed the cache before trying to collect garbage.
- Ensure r_make() passes informative error messages back to the calling process (#969).
- Fix map() and cross() on topologically side-by-side targets (#983).
- Repair dsl_left_outer_join() so cross() selects the right combinations of existing targets (#986). This bug was probably introduced in the solution to #983.
- Make progress() more consistent, less dependent on whether tidyselect is installed.
- Add a format argument to target() (#971); see the sketch after this list. This allows users to leverage faster ways to save and load targets, such as write_fst() for data frames and save_model_hdf5() for Keras models. It also improves memory because it prevents storr from making a serialized in-memory copy of large data objects.
- Add tidyselect functionality for ... in progress(), analogous to loadd(), build_times(), and clean().
- If do_stuff() and the method stuff.your_class() are defined in envir, and if do_stuff() has a call to UseMethod("stuff"), then drake's code analysis will detect stuff.your_class() as a dependency of do_stuff().
- Support file_in() URLs. Requires the new curl_handles argument of make() and drake_config() (#981).
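A sketch of the specialized storage formats set through target(format = ...); get_big_table() and fit_keras_model() are hypothetical user-defined functions:

```r
library(drake)
plan <- drake_plan(
  big_table = target(
    get_big_table(), # hypothetical function returning a data frame
    format = "fst"   # stored via the fst package instead of plain RDS
  ),
  model = target(
    fit_keras_model(big_table), # hypothetical function returning a Keras model
    format = "keras"            # stored via keras::save_model_hdf5()
  )
)
```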
- target(), map(), split(), cross(), and combine() (#979).
- Do not remove file_out() files in clean() unless garbage_collection is TRUE. That way, make(recover = TRUE) is a true "undo button" for clean(). clean(garbage_collection = TRUE) still removes data in the cache, as well as any file_out() files from targets currently being cleaned.
- The interactive menu in clean() only appears if garbage_collection is TRUE. Also, this menu is added to rescue_cache(garbage_collection = TRUE).
- Move the history inside .drake/. The old .drake_history/ folder was awkward. Old histories are migrated during drake_config() and drake_history().
- Account for .drake_history in plan_to_code(), plan_to_notebook(), and the examples in the help files.
- Add make(recover = TRUE); see the sketch after this list.
- Add recoverable() and r_recoverable() to show targets that are outdated but recoverable via make(recover = TRUE).
- Add drake_history(). Powered by txtq (#918, #920).
- Add a no_deps() function, similar to ignore(). no_deps() suppresses dependency detection but still tracks changes to the literal code (#910).
- Add transform_plan().
- Support a seed column of drake plans to set custom seeds (#947).
- Add a seed trigger to optionally ignore changes to the target seed (#947).
- In drake_plan(), interpret custom columns as non-language objects (#942).
- Require clustermq >= 0.8.8.
- Deprecate ensure_workers in drake_config() and make().
- Streamline make() after config is already supplied.
- Prevent make() from running inside the cache (#927).
- Add a CITATION file with the JOSS paper.
- In deps_profile(), include the seed and change the names.
- Modify make() internals; all this does is invalidate old targets.
- Use set_hash() and get_hash() in storr to double the speed of progress tracking.
- Account for $ (#938).
- Use xxhash64 as the default hash algorithm for non-storr hashing if the driver does not have a hash algorithm.

These changes are technically breaking changes, but they should only affect advanced users.
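A sketch of the recovery and history features, assuming an existing project whose plan is already defined (the current plan-based interface is assumed):

```r
library(drake)
recoverable(plan)             # Outdated targets whose old values can be salvaged.
make(plan, recover = TRUE)    # Restore matching old values instead of rebuilding.
drake_history(analyze = TRUE) # One row per historical target build.
```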
- rescue_cache() no longer returns a value.
- Relax the clustermq requirement (#898). Suggest version >= 0.8.8 but allow 0.8.7 as well.
- Ensure drake recomputes config$layout when knitr reports change (#887).
- Fix make() (#878).
- Fix r_drake_build().
- Fix r_make() (#889).
- In expose_imports(), do not do the environment<- trick unless the object is a non-primitive function.
- assign() vs delayedAssign().
- Distinguish between file_in() files and other strings (#896).
- Make ignore() work inside loadd(), readd(), file_in(), file_out(), and knitr_in().
- Support URLs in file_in() and file_out(). drake now treats file_in()/file_out() files as URLs if they begin with "http://", "https://", or "ftp://". The fingerprint is a concatenation of the ETag and last-modified timestamp. If neither can be found or if there is no internet connection, drake throws an error.
- Add memory strategies "unload" and "none", which do not attempt to load a target's dependencies from memory (#897).
- Add drake_slice() to help split data across multiple targets (see the sketch after this list). Related: #77, #685, #833.
- Add a drake_cache() function, which is now recommended instead of get_cache() (#883).
- Add an r_deps_target() function.
- Add RStudio addins for r_make(), r_vis_drake_graph(), and r_outdated() (#892).
- Deprecate get_cache() in favor of drake_cache().
- Simplify the clean() menu prompt.
- Set use_cache to FALSE in storr function calls for saving and loading targets. Also, at the end of make(), call flush_cache() (and then gc() if garbage collection is enabled).
- Recommend callr::r() within commands as a safe alternative to lock_envir = FALSE in the self-invalidation section of the make() help file.
- Adjust the rehashing of file_in()/file_out()/knitr_in() files. We now rehash files if the file is less than 100 KB or the time stamp changed or the file size changed.
- Account for rlang's new interpolation operator {{, which was causing make() to fail when drake_plan() commands are enclosed in curly braces (#864).
- Move config$lock_envir <- FALSE from loop_build() to backend_loop(). This makes sure config$envir is correctly locked in make(parallelism = "clustermq").
- Fix the .data argument of map() and cross() in the DSL.
- In drake_plan(), repair cross(.data = !!args), where args is an optional data frame of grouping variables.
- Fix file_in()/file_out() directories for Windows (#855).
- Make .id_chr work with combine() in the DSL (#867).
- Do not call make_spinner() unless the version of cli is at least 1.1.0.
- Add text_drake_graph() (and r_text_drake_graph() and render_text_drake_graph()). Uses text art to print a dependency graph to the terminal window. Handy for when users SSH into remote machines without X Window support.
- Add a max_expand argument to drake_plan(), an optional upper bound on the lengths of grouping variables for map() and cross() in the DSL. Comes in handy when you have a massive number of targets and you want to test on a miniature version of your workflow before you scale up to production.
- Delay the initialization of clustermq workers for as long as possible. Before launching them, build/check targets locally until we reach an outdated target with hpc equal to TRUE. In other words, if no targets actually require clustermq workers, no workers get created.
- In make(parallelism = "future"), reset the config$sleep() backoff interval whenever a new target gets checked.
- Replace CodeDepends with a base R solution in code_to_plan(). Fixes a CRAN note.
- The transformation DSL (in drake_plan()) is no longer experimental.
- The callr API (r_make() and friends) is no longer experimental.
- Deprecate evaluate_plan(), expand_plan(), map_plan(), gather_plan(), gather_by(), reduce_plan(), reduce_by().
- Remove the long-deprecated deps(), max_useful_jobs(), and migrate_drake_project().
- Make drake_plan(x = target(..., transform = map(...))) avoid inserting extra dots in target names when the grouping variables are character vectors (#847). Target names come out much nicer this way, but those name changes will invalidate some targets (i.e. they need to be rebuilt with make()).
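A sketch of drake_slice(), which pairs naturally with the map() transformation; the analysis target below is invented for illustration:

```r
library(drake)
drake_slice(mtcars, slices = 4, index = 1) # First of 4 row-wise blocks.
plan <- drake_plan(
  analysis = target(
    summary(drake_slice(mtcars, slices = 4, index = i)),
    transform = map(i = !!seq_len(4)) # One target per slice index.
  )
)
```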
- Use config$jobs_preprocess (local jobs) in several places where drake was incorrectly using config$jobs (meant for targets).
- Allow loadd(x, deps = TRUE, config = your_config) to work even if x is not cached (#830). This required disabling tidyselect functionality when deps is TRUE. There is a new note in the help file about this, and an informative console message prints out on loadd(deps = TRUE, tidyselect = TRUE). The default value of tidyselect is now !deps.
- Require testthat >= 2.0.1.9000.
- In drake_plan() transformations, allow the user to refer to a target's own name using a special .id_chr symbol, which is treated like a character string.
- Add a transparency argument to drake_ggraph() and render_drake_ggraph() to disable transparency in the rendered graph. Useful for R installations without transparency support.
- Improve the vis_drake_graph() and drake_ggraph() displays. Only activated in vis_drake_graph() when there are at least 10 nodes distributed in both the vertical and horizontal directions.
- Fix vis_drake_graph() and render_drake_graph().
- Slim down drake plans (drake_plan()) inside drake_config() objects. When other bottlenecks are removed, this will reduce the burden on memory (re #800).
- Trim the targets argument inside drake_config() objects. This is to reduce memory consumption.
- Deprecate the layout and direction arguments of vis_drake_graph() and render_drake_graph(). Direction is now always left to right and the layout is always Sugiyama.
- Write the cache log as a CSV (drake_cache.csv by default) to avoid issues with spaces (e.g. entry names with spaces in them, such as "file report.Rmd").
- Fix a bug from drake 7.0.0: if you ran make() in interactive mode and responded to the menu prompt with an option other than 1 or 2, targets would still build.
- Fix a bug in drake_ggraph(). The bug came from append_output_file_nodes(), a utility function of drake_graph_info().
- Support r_make(r_fn = callr::r_bg()) (re #799).
- Allow drake_ggraph() and sankey_drake_graph() to work when the graph has no edges.
- Add a use_drake() function to write the make.R and _drake.R files from the "main example". Does not write other supporting scripts.
- With an optional hpc column in your drake_plan(), you can now select which targets to deploy to HPC and which to run locally (see the sketch after this list).
- Add a list argument to build_times(), just like loadd().
- file_in() and file_out() can now handle entire directories, e.g. file_in("your_folder_of_input_data_files") and file_out("directory_with_a_bunch_of_output_files").
- Send less of config to HPC workers.
- Throw an informative error message when the user supplies a drake plan to the config argument of a function.
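A sketch of the hpc column, set here through target(); the model-fitting functions are placeholders:

```r
library(drake)
plan <- drake_plan(
  fit = target(fit_big_model(), hpc = TRUE),                  # run on an HPC worker
  summary_table = target(summarize_results(fit), hpc = FALSE) # run on the main process
)
```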
- In the map() and cross() transformations of the DSL, prevent the accidental sorting of targets by name (#786). Needed merge(sort = FALSE) in dsl_left_outer_join().
- The verbose argument of make() now takes values 0, 1, and 2, and maximum verbosity in the console prints targets, retries, failures, and a spinner. The console log file, on the other hand, dumps maximally verbose runtime info regardless of the verbose argument.
- Functions created with f <- Rcpp::cppFunction(...) did not stay up to date from session to session because the addresses corresponding to anonymous pointers were showing up in deparse(f). Now, drake ignores those pointers, and Rcpp functions compiled inline appear to stay up to date. This problem was more of an edge case than a bug.
- In drake_plan(), deprecate the tidy_evaluation argument in favor of the new and more concise tidy_eval. To preserve back compatibility for now, if you supply a non-NULL value to tidy_evaluation, it overwrites tidy_eval.
- Shrink drake_config() objects by assigning the closure of config$sleep to baseenv().
- In drake plans, the command and trigger columns are now lists of language objects instead of character vectors. make() and friends still work if you have character columns, but the default output of drake_plan() has changed to this new format.
- All parallel backends (the parallelism argument of make()) except "clustermq" and "future" are removed. A new "loop" backend covers local serial execution.
- Remove long-deprecated functions (built(), find_project(), imported(), and parallel_stages(); full list at #564) and the single-quoted file API.
- Set lock_envir to TRUE by default in make() and drake_config(). So make() will automatically quit in error if the act of building a target tries to change upstream dependencies.
- make() no longer returns a value. Users will need to call drake_config() separately to get the old return value of make().
- Require the jobs argument to be of length 1 (make() and drake_config()). To parallelize the imports and other preprocessing steps, use jobs_preprocess, also of length 1.
- Stop saving imported functions to a dedicated storr namespace. As a result, drake is faster, but users will no longer be able to load imported functions using loadd() or readd().
- In target(), users must now explicitly name all the arguments except command, e.g. target(f(x), trigger = trigger(condition = TRUE)) instead of target(f(x), trigger(condition = TRUE)).
- Throw an error from bind_plans() when the result has duplicated target names. This makes drake's API more predictable and helps users catch malformed workflows earlier.
- loadd() only loads targets listed in the plan. It no longer loads imports or file hashes.
- The return values of progress(), deps_code(), deps_target(), and predict_workers() are now data frames.
- Default hover to FALSE in visualization functions. Improves speed.
- Fix bind_plans() to work with lists of plans (bind_plans(list(plan1, plan2)) was returning NULL in drake 6.2.0 and 6.2.1).
- Ensure get_cache(path = "non/default/path", search = FALSE) looks for the cache in "non/default/path" instead of getwd().
- Call ensure_loaded() in meta.R and triggers.R when ensuring the dependencies of the condition and change triggers are loaded.
- Add a config argument to drake_build() and loadd(deps = TRUE).
- Add a lock_envir argument to safeguard reproducibility. More discussion: #619, #620.
- The new from_plan() function allows users to reference custom plan columns from within commands. Changes to values in these columns do not invalidate targets.
- Warn about make() pitfalls in interactive mode (#761). Appears once per session. Disable with options(drake_make_menu = FALSE).
- Add r_make(), r_outdated(), etc. to run drake functions more reproducibly in a clean session (see the sketch below). See the help file of r_make() for details.
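The callr API in brief, as a sketch with a trivial, invented plan:

```r
# _drake.R: configuration script for r_make() and friends.
library(drake)
plan <- drake_plan(result = rnorm(10))
drake_config(plan) # The value r_make() uses to run the project.

# Then, in your interactive session:
# r_make()     # like make(plan), but in a clean background session
# r_outdated() # clean-session counterpart of outdated()
```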
- progress() gains a progress argument for filtering results. For example, progress(progress = "failed") will report targets that failed.
- Abandon storr's key mangling in favor of drake's own encoding of file paths and namespaced functions for storr keys.
- Prevent ., .., and .gitignore from being target names (a consequence of the above).
- Use a single hash algorithm per drake cache, which the user can set with the hash_algorithm argument of new_cache(), storr::storr_rds(), and various other cache functions. Thus, the concepts of a "short hash algorithm" and "long hash algorithm" are deprecated, and the functions long_hash(), short_hash(), default_long_hash_algo(), default_short_hash_algo(), and available_hash_algos() are deprecated. Caches are still back-compatible with drake > 5.4.0 and <= 6.2.1.
- Allow the magrittr dot symbol to appear in some commands sometimes.
- Deprecate the fetch_cache argument in all functions.
- Remove DBI and RSQLite from "Suggests".
- Use config$eval <- new.env(parent = config$envir) for storing built targets and evaluating commands in the plan. Now, make() no longer modifies the user's environment. This move is a long-overdue step toward purity.
- Rely on the codetools package.
- Deprecate the session argument of make() and drake_config() (details: #623).
- Deprecate the graph and layout arguments to make() and drake_config(). The change simplifies the internals, and memoization allows us to do this.
- Support calling make() in a subdirectory of the drake project root (determined by the location of the .drake folder relative to the working directory).
- Extend the verbose argument, including the option to print execution and total build times.
- mclapply() or parLapply(), depending on the operating system.
- Make build_times(), predict_runtime(), etc. focus on only the targets.
- Deprecate plan_analyses(), plan_summaries(), analysis_wildcard(), cache_namespaces(), cache_path(), check_plan(), dataset_wildcard(), drake_meta(), drake_palette(), drake_tip(), recover_cache(), cleaned_namespaces(), target_namespaces(), read_drake_config(), read_drake_graph(), and read_drake_plan().
- Deprecate target() as a user-side function. From now on, it should only be called from within drake_plan().
- drake_envir() now throws an error, not a warning, if called in the incorrect context. It should be called only inside commands in the user's drake plan.
- Replace *expr*() rlang functions with their *quo*() counterparts. We still keep rlang::expr() in the few places where we know the expressions need to be evaluated in config$eval.
- The prework argument to make() and drake_config() can now be an expression (language object) or list of expressions. Character vectors are still acceptable.
- In make(), print messages about triggers etc. only if verbose >= 2L.
- Rename in_progress() to running().
- Rename knitr_deps() to deps_knitr().
- Rename dependency_profile() to deps_profile().
- Rename predict_load_balancing() to predict_workers().
- Deprecate this_cache() and defer to get_cache() and storr::storr_rds() for simplicity.
- Default hover to FALSE in visualization functions. Improves speed. Also a breaking change.
- Deprecate drake_cache_log_file(). We recommend using make() with the cache_log_file argument to create the cache log. This ensures that the log is always up to date with make() results.

Version 6.2.1 is a hotfix to address the failing automated CRAN checks for 6.2.0. Chiefly, in CRAN's Debian R-devel (2018-12-10) check platform, errors of the form "length > 1 in coercion to logical" occurred when either argument to && or || was not of length 1 (e.g. nzchar(letters) && length(letters)). In addition to fixing these errors, version 6.2.1 also removes a problematic link from the vignette.
- Add a sep argument to gather_by(), reduce_by(), reduce_plan(), evaluate_plan(), expand_plan(), plan_analyses(), and plan_summaries(). Allows the user to set the delimiter for generating new target names.
- Add a hasty_build argument to make() and drake_config(). Here, the user can set the function that builds targets in "hasty mode" (make(parallelism = "hasty")).
- Add a drake_envir() function that returns the environment where drake builds targets. Can only be accessed from inside the commands in the workflow plan data frame. The primary use case is to allow users to remove individual targets from memory at predetermined build steps.
- Support tibble 2.0.0.
- Remove 0s from predict_runtime(targets_only = TRUE) when some targets are outdated and others are not.
- Remove sort(NULL) warnings from create_drake_layout(). (Affects R-3.3.x.)
- Drop dependencies: evaluate, formatR, fs, future, parallel, R.utils, stats, and stringi.
- Call parse() in code_dependencies().
- Default memory_strategy (previously pruning_strategy) to "speed" (previously "lookahead").
- Add a new data structure inside drake_config() (config$layout) just to store the code analysis results. This is an intermediate structure between the workflow plan data frame and the graph. It will help clean up the internals in future development.
- Pass a label argument to future() inside make(parallelism = "future"). That way, job names are target names by default if job.name is used correctly in the batchtools template file.
- Drop dependencies: dplyr, evaluate, fs, future, magrittr, parallel, R.utils, stats, stringi, tidyselect, and withr.
- Remove rprojroot from "Suggests".
- Deprecate the force argument in all functions except make() and drake_config().
- Rename prune_envir() to manage_memory().
- Rename the pruning_strategy argument to memory_strategy (make() and drake_config()).
- Write to console_log_file in real time (#588).
- Improve the vis_drake_graph() hover text to display commands in the drake plan more elegantly.
- Overhaul predict_load_balancing() and remove its reliance on internals that will go away in 2019 via #561.
- Use the worker column of config$plan in predict_runtime() and predict_load_balancing(). This functionality will go away in 2019 via #561.
- Restrict the output of predict_load_balancing() to time and workers.
- Bring predict_runtime() and predict_load_balancing() up to date.
- Deprecate drake_session() and rename it to drake_get_session_info().
- Deprecate the timeout argument in the API of make() and drake_config(). A value of timeout can still be passed to these functions without error, but only the elapsed and cpu arguments impose actual timeouts now.
- Add a map_plan() function to easily create a workflow plan data frame to execute a function call over a grid of arguments.
- Add a plan_to_code() function to turn drake plans into generic R scripts. New users can use this function to better understand the relationship between plans and code, and unsatisfied customers can use it to disentangle their projects from drake altogether. Similarly, plan_to_notebook() generates an R notebook from a drake plan.
- Add a drake_debug() function to run a target's command in debug mode. Analogous to drake_build().
- Add a mode argument to trigger() to control how the condition trigger factors into the decision to build or skip a target (see the sketch after this list). See ?trigger for details.
- Add a sleep argument to make() and drake_config() to help the main process consume fewer resources during parallel processing.
- Add a caching argument for the "clustermq" and "clustermq_staged" parallel backends. Now, make(parallelism = "clustermq", caching = "main") will do all the caching with the main process, and make(parallelism = "clustermq", caching = "worker") will do all the caching with the workers. The same is true for parallelism = "clustermq_staged".
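A sketch of the mode argument of trigger() mentioned above; download_data() and data_is_stale() are hypothetical user-defined functions:

```r
library(drake)
plan <- drake_plan(
  dataset = target(
    download_data(), # hypothetical
    trigger = trigger(condition = data_is_stale(), mode = "condition")
  )
)
# mode = "whitelist" (default): condition == TRUE forces a build;
#   otherwise the usual triggers decide.
# mode = "blacklist": condition == FALSE forces a skip;
#   otherwise the usual triggers decide.
# mode = "condition": the condition alone decides either way.
```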
- Add an append argument to gather_plan(), gather_by(), reduce_plan(), and reduce_by(). The append argument controls whether the output includes the original plan in addition to the newly generated rows.
- Add load_main_example(), clean_main_example(), and clean_mtcars_example().
- Add a filter argument to gather_by() and reduce_by() in order to restrict what we gather even when append is TRUE.
- Add hasty mode: make(parallelism = "hasty") skips all of drake's expensive caching and checking. All targets run every single time and you are responsible for saving results to custom output files, but almost all the by-target overhead is gone.
- Call path.expand() on the file argument to render_drake_graph() and render_sankey_drake_graph(). That way, tildes in file paths no longer interfere with the rendering of static image files.
- Allow evaluate_plan(trace = TRUE) to be followed by expand_plan(), gather_plan(), reduce_plan(), gather_by(), or reduce_by(). The more relaxed behavior also gives users more options about how to construct and maintain their workflow plan data frames.
- In "future" parallelism, make sure files travel over network file systems before proceeding to downstream targets.
- Handle the case where the visNetwork package is not installed.
- Skip make_targets() if all the targets are already up to date.
- Repair the seed argument in make() and drake_config().
- Default the caching argument of make() and drake_config() to "main" rather than "worker". The default option should be the lower-overhead option for small workflows. Users have the option to make a different set of tradeoffs for larger workflows.
- Allow the condition trigger to evaluate to non-logical values as long as those values can be coerced to logicals.
- Require that the condition trigger evaluate to a vector of length 1.
- Add drake_plan_source().
- make(verbose = 4) now prints to the console when a target is stored.
- gather_by() and reduce_by() now gather/reduce everything if no columns are specified.
- Previously, make(jobs = 4) was equivalent to make(jobs = c(imports = 4, targets = 4)). Now, make(jobs = 4) is equivalent to make(jobs = c(imports = 1, targets = 4)). See issue #553 for details.
- Print more console output when verbose is at least 2.
- Add load_mtcars_example().
- Deprecate the hook argument of make() and drake_config().
- In gather_by() and reduce_by(), do not exclude targets with all NA gathering variables.
- Streamline digest() calls wherever possible. This puts old drake projects out of date, but it improves speed.
- Drop support for R 3.2.0 (the stringi package no longer compiles there).
- In code_dependencies(), restrict the possible global variables to the ones mentioned in the new globals argument (turned off when NULL). In practical workflows, global dependencies are restricted to items in envir and proper targets in the plan. In deps_code(), the globals slot of the output list is now a list of candidate globals, not necessarily actual globals (some may not be targets or variables in envir).
- In the unlink() call in clean(), set recursive and force to FALSE. This should prevent the accidental deletion of whole directories.
- Previously, clean() deleted input-only files if no targets from the plan were cached. A patch and a unit test are included in this release.
- loadd(not_a_target) no longer loads every target in the cache.
- Repair an igraph vertex attribute (fixes #503).
- Fix dependency detection in knitr_in() file code chunks.
- Remove a sort(NULL) that caused warnings in R 3.3.3.
- Fix a bug where analyze_loadd() was sometimes quitting with "Error: attempt to set an attribute on NULL".
- Avoid digest::digest(file = TRUE) on directories. Instead, set hashes of directories to NA. Users should still not use directories as file dependencies.
- Show output files in vis_drake_graph(). Previously, these files were missing from the visualization, but actual workflows worked just fine.
- Work around codetools failures in R 3.3 (add a tryCatch() statement in find_globals()).
- Add a clustermq-based parallel backend: make(parallelism = "clustermq"); see the sketch after this list.
- evaluate_plan(trace = TRUE) now adds a *_from column to show the origins of the evaluated targets. Try evaluate_plan(drake_plan(x = rnorm(n__), y = rexp(n__)), wildcard = "n__", values = 1:2, trace = TRUE).
- Add gather_by() and reduce_by(), which gather on custom columns in the plan (or columns generated by evaluate_plan(trace = TRUE)) and append the new targets to the previous plan.
- Expose the template argument of clustermq functions (e.g. Q() and workers()) as an argument of make() and drake_config().
- Add a code_to_plan() function to turn R scripts and R Markdown reports into workflow plan data frames.
- Add a drake_plan_source() function, which generates lines of code for a drake_plan() call. This drake_plan() call produces the plan passed to drake_plan_source(). The main purpose is visual inspection (we even have syntax highlighting via prettycode), but users may also save the output to a script file for the sake of reproducibility or simple reference.
- Deprecate deps_targets() in favor of a new deps_target() function (singular) that behaves more like deps_code().
- Fix vis_drake_graph() and render_drake_graph().
- Add hover text to vis_drake_graph() using the "title" node column.
- Add vis_drake_graph(collapse = TRUE).
- Make dependency_profile() show major trigger hashes side-by-side to tell the user if the command, a dependency, an input file, or an output file changed since the last make().
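A sketch of the clustermq backend on a local machine; on a cluster you would point clustermq.scheduler at SLURM, SGE, etc., and the step functions below are placeholders:

```r
library(drake)
options(clustermq.scheduler = "multicore") # persistent local workers
plan <- drake_plan(
  a = slow_step_one(),  # placeholder functions
  b = slow_step_two(a)
)
make(plan, parallelism = "clustermq", jobs = 2)
```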
- Only enable certain features if the txtq package is installed.
- Improve the help files of loadd() and readd(), giving specific usage guidance in prose.
- Memoize the steps of build_drake_graph() and print to the console the ones that execute.
- Skip the relevant tests if txtq is not installed.
- Move drake's code examples to the drake-examples GitHub repository and make drake_example() and drake_examples() download examples from there.
- Add a show_output_files argument to vis_drake_graph() and friends.
- Deprecate the "clustermq_staged" and "future_lapply" parallel backends.
- Store igraph attributes of the dependency graph to allow for smarter dependency/memory management during make().
- Allow vis_drake_graph() and sankey_drake_graph() to save static image files via webshot.
- Deprecate static_drake_graph() and render_static_drake_graph() in favor of drake_ggraph() and render_drake_ggraph().
- Add a columns argument to evaluate_plan() so users can evaluate wildcards in columns other than the command column of the plan.
- Call target() internally so users do not have to (explicitly).
- Add sankey_drake_graph() and render_sankey_drake_graph().
- Add static_drake_graph() and render_static_drake_graph() for ggplot2/ggraph static graph visualizations.
- Add group and clusters arguments to vis_drake_graph(), static_drake_graph(), and drake_graph_info() to optionally condense nodes into clusters.
- Add a trace argument to evaluate_plan() to optionally add indicator columns to show which targets got expanded/evaluated with which wildcard values (see the sketch after this list).
- Rename the always_rename argument to rename in evaluate_plan().
- Add a rename argument to expand_plan().
- Add make(parallelism = "clustermq_staged"), a clustermq-based staged parallelism backend (see #452).
- Add make(parallelism = "future_lapply_staged"), a future-based staged parallelism backend (see #450).
- Use codetools rather than CodeDepends for finding global variables.
- Detect loadd() and readd() dependencies in knitr reports referenced with knitr_in() inside imported functions. Previously, this feature was only available in explicit knitr_in() calls in commands.
- Add inst/hpc_template_files.
- Deprecate drake_batchtools_tmpl_file() in favor of drake_hpc_template_file() and drake_hpc_template_files().
- Add a garbage_collection argument to make(). If TRUE, gc() is called after every new build of a target.
- Call sanitize_plan() in make().
- Change tracked() to accept only a drake_config() object as an argument. Yes, it is technically a breaking change, but it is only a small break, and it is the correct API choice.
- Update the DESCRIPTION file.
- Load knitr reports without warnings.
- In lapply-like backends, drake uses persistent workers and a main process. In the case of "future_lapply" parallelism, the main process is a separate background process called by Rscript.
- Quiet the beginning of make(). (Previously, there were "check" messages and a call to staged_parallelism().)
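The wildcard expansion with tracing described above, as a runnable sketch:

```r
library(drake)
plan <- drake_plan(x = rnorm(n__), y = rexp(n__))
evaluate_plan(plan, wildcard = "n__", values = 1:2, trace = TRUE)
# Yields targets x_1, x_2, y_1, y_2, plus indicator columns
# recording which wildcard value produced each target.
```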
- Support different parallel backends for imports and targets, e.g. make(parallelism = c(imports = "mclapply_staged", targets = "mclapply")).
- Improve memory management in make(jobs = 1). Now, targets are kept in memory until no downstream target needs them (for make(jobs = 1)).
- Overhaul predict_runtime(). It is a more sensible way to go about predicting runtimes with multiple jobs. Likely to be more accurate.
- Calls to make() no longer leave targets in the user's environment.
- Deprecate the imports_only argument to make() and drake_config() in favor of skip_targets.
- Deprecate migrate_drake_project().
- Deprecate max_useful_jobs().
- Add an upstream_only argument to failed() so users can list failed targets that do not have any failed dependencies. Naturally accompanies make(keep_going = TRUE).
- Remove plyr as a dependency.
- Fix drake_plan() and bind_plans().
- Add target() to help create drake plans with custom columns.
- In drake_gc(), clean out disruptive files in storrs with mangled keys (re: #198).
- Deprecate load_basic_example() in favor of load_mtcars_example().
- Base the README.md file on the main example rather than the mtcars example.
- Add a README.Rmd file to generate README.md.
- Add deps_targets().
- Deprecate deps() in favor of deps_code().
- Add a pruning_strategy argument to make() and drake_config() so the user can decide how drake keeps non-import dependencies in memory when it builds a target.
- Support a worker column in drake plans to help users customize scheduling.
- Add a makefile_path argument to make() and drake_config() to avoid potential conflicts between user-side custom Makefiles and the one written by make(parallelism = "Makefile").
- Add a console argument to make() and drake_config() so users can redirect console output to a file.
- Add show_source(), readd(show_source = TRUE), loadd(show_source = TRUE).
- In R 3.5.0, the !! operator from tidyeval and rlang is parsed differently than in R <= 3.4.4. This change broke one of the tests in tests/testthat/tidy-eval.R. The main purpose of drake's 5.1.2 release is to fix the broken test.
- Fix an R CMD check error from building the pdf manual with LaTeX.
- In drake_plan(), allow users to customize target-level columns using target() inside the commands.
- Add a bind_plans() function to concatenate the rows of drake plans and then sanitize the aggregate plan.
- Add a session argument to tell make() to build targets in a separate, isolated main R session. For example, make(session = callr::r_vanilla).
- Add a reduce_plan() function to do pairwise reductions on collections of targets.
- Prevent the dot symbol (.) from being a dependency of any target or import. This enforces more consistent behavior in the face of the current static code analysis functionality, which sometimes detects . and sometimes does not.
- Add ignore() to optionally ignore pieces of workflow plan commands and/or imported functions. Use ignore(some_code) to tell drake not to track dependencies in some_code, and to ignore some_code when it comes to deciding which targets are out of date.
- Force drake to only look for imports in environments inheriting from envir in make() (plus explicitly namespaced functions).
- Force loadd() to ignore foreign imports (imports not explicitly found in envir when make() last imported them).
- Change loadd() so that only targets (not imports) are loaded if the ... and list arguments are empty.
- Add a .gitignore file containing "*" to the default .drake/ cache folder every time new_cache() is called. This means the cache will not be automatically committed to git. Users need to remove the .gitignore file to allow unforced commits, and then subsequent make()s on the same cache will respect the user's wishes and not add another .gitignore. This only works for the default cache. Not supported for manual storrs.
- Implement a "future" backend with a manual scheduler.
- Add dplyr-style tidyselect functionality in loadd(), clean(), and build_times(). For build_times(), there is an API change: for tidyselect to work, we needed to insert a new ... argument as the first argument of build_times().
- Add file_in() for file inputs to commands or imported functions (for imported functions, the input file needs to be an imported file, not a target). See the sketch after this list.
- Add file_out() for output file targets (ignored if used in imported functions).
- Add knitr_in() for knitr/rmarkdown reports. This tells drake to look inside the source file for target dependencies in code chunks (explicitly referenced with loadd() and readd()). Treated as a file_in() if used in imported functions.
- Change drake_plan() so that it automatically fills in any target names that the user does not supply. Also, any file_out()s become the target names automatically (double-quoted internally).
- Make read_drake_plan() (rather than an empty drake_plan()) the default plan argument in all functions that accept a plan.
- Add loadd(..., lazy = "bind"). That way, when you have a target loaded in one R session and hit make() in another R session, the target in your first session will automatically update.
- Deprecate dataframes_graph().
- Deprecate the read_drake_meta() function in favor of diagnose(); diagnose() will take on the role of returning this metadata.
- Add an expose_imports() function to optionally force drake to detect deeply nested functions inside specific packages.
- Change drake_build() to be an exclusively user-side function.
- Add a replace argument to loadd() so that objects already in the user's environment need not be replaced.
- Add a seed argument to make(), drake_config(), and load_basic_example(). Also hard-code a default seed of 0. That way, the pseudo-randomness in projects should be reproducible across R sessions.
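A sketch of the file API introduced here, assuming data.csv and report.Rmd exist and clean_up() is a user-defined function; report.Rmd is assumed to contain loadd()/readd() calls referencing targets:

```r
library(drake)
plan <- drake_plan(
  raw = read.csv(file_in("data.csv")),    # tracked input file
  cleaned = clean_up(raw),                # user-defined function
  report = rmarkdown::render(
    knitr_in("report.Rmd"),               # scanned for loadd()/readd() calls
    output_file = file_out("report.html") # tracked output file
  )
)
```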
- Add a read_drake_seed() function to read the seed from the cache. Its examples illustrate what drake is doing to try to ensure reproducible random numbers.
- Support tidy evaluation with !! for the ... argument to drake_plan() (see the sketch after this list). Suppress this behavior using tidy_evaluation = FALSE or by passing commands through the list argument.
- Preprocess commands with rlang::expr() before evaluating them. That means you can use the quasiquotation operator !! in your commands, and make() will evaluate them according to the tidy evaluation paradigm.
- Restructure drake_example("basic"), drake_example("gsp"), and drake_example("packages") to demonstrate how to set up the files for serious drake projects. More guidance was needed in light of #193.
- Improve the documentation of drake_plan() in the help file (?drake_plan).
- Move drake to the rOpenSci GitHub URL.
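A sketch of tidy evaluation in drake_plan():

```r
library(drake)
n_samples <- 100
plan <- drake_plan(x = rnorm(!!n_samples))
plan # the command is rnorm(100): n_samples was spliced in up front
# Opt out with drake_plan(x = rnorm(!!n_samples), tidy_evaluation = FALSE).
```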
- Several functions now require an explicit config argument, which you can get from drake_config() or make(). Examples:
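A sketch of the config-based calls of this era, with a hypothetical plan:

```r
library(drake)
plan <- drake_plan(result = rnorm(10))
config <- drake_config(plan) # or the return value of make(plan)
outdated(config)             # which targets are out of date?
vis_drake_graph(config)      # interactive dependency graph
```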
- Use cache$exists() instead.
- Add triggers to customize when make() decides to build targets.
- Reorganize the storr cache in a way that is not back-compatible with projects from versions 4.4.0 and earlier. The main change is to make more intelligent use of storr namespaces, improving efficiency (both time and storage) and opening up possibilities for new features. If you attempt to run drake >= 5.0.0 on a project from drake <= 4.0.0, drake will stop you before any damage to the cache is done, and you will be instructed how to migrate your project to the new drake.
- Use formatR::tidy_source() instead of parse() in tidy_command() (originally tidy() in R/dependencies.R). Previously, drake was having problems with an edge case: as a command, the literal string "A" was interpreted as the symbol A after tidying. With tidy_source(), literal quoted strings stay literal quoted strings in commands. This may put some targets out of date in old projects, yet another loss of back compatibility in version 5.0.0.
- Add rescue_cache(), exposed to the user and used in clean(). This function removes dangling orphaned files in the cache so that a broken cache can be cleaned and used in the usual ways once more.
- Default the cpu and elapsed arguments of make() to NULL. This solves an elusive bug in how drake imposes timeouts.
- Add a graph argument to functions make(), outdated(), and missed().
- Add a prune_graph() function for igraph objects.
- Deprecate prune() and status().
- Deprecate and rename the following functions:
    - analyses() => plan_analyses()
    - as_file() => as_drake_filename()
    - backend() => future::plan()
    - build_graph() => build_drake_graph()
    - check() => check_plan()
    - config() => drake_config()
    - evaluate() => evaluate_plan()
    - example_drake() => drake_example()
    - examples_drake() => drake_examples()
    - expand() => expand_plan()
    - gather() => gather_plan()
    - plan(), workflow(), workplan() => drake_plan()
    - plot_graph() => vis_drake_graph()
    - read_config() => read_drake_config()
    - read_graph() => read_drake_graph()
    - read_plan() => read_drake_plan()
    - render_graph() => render_drake_graph()
    - session() => drake_session()
    - summaries() => plan_summaries()
- Disallow output and code as names in the workflow plan data frame. Use target and command instead. This naming switch has been formally deprecated for several months prior.
- Add drake_quotes(), drake_unquote(), and drake_strings() to remove the silly dependence on the eply package.
- Add a skip_safety_checks flag to make() and drake_config(). Increases speed.
- In sanitize_plan(), remove rows with blank targets "".
- Add a purge argument to clean() to optionally remove all target-level information.
- Add a namespace argument to cached() so users can inspect individual storr namespaces.
- Change verbose to numeric: 0 = print nothing, 1 = print progress on imports only, 2 = print everything.
- Add a next_stage() function to report the targets to be made in the next parallelizable stage.
- Add a session_info argument to make(). Apparently, sessionInfo() is a bottleneck for small make()s, so there is now an option to suppress it. This is mostly for the sake of speeding up unit tests.
- Add a log_progress argument to make() to suppress progress logging. This increases storage efficiency and speeds some projects up a tiny bit.
- Add a namespace argument to loadd() and readd(). You can now load and read from non-default storr namespaces.
- Add drake_cache_log(), drake_cache_log_file(), and make(..., cache_log_file = TRUE) as options to track changes to targets/imports in the drake cache.
- Detect knitr reports compiled with rmarkdown::render(), not just knit().
- Allow plot_graph() to display subcomponents. Check out arguments from, mode, order, and subset. The graph visualization vignette has demonstrations.
- Add "future_lapply" parallelism: parallel backends supported by the future and future.batchtools packages (see the sketch after this list). See ?backend for examples and the parallelism vignette for an introductory tutorial. More advanced instruction can be found in the future and future.batchtools packages themselves.
- Add diagnose().
- Add a hook argument to make() to wrap around build(). That way, users can more easily control the side effects of distributed jobs. For example, to redirect error messages to a file in make(..., parallelism = "Makefile", jobs = 2, hook = my_hook), my_hook should be something like function(code){withr::with_message_sink("messages.txt", code)}.
- Remove a hack: drake was previously using the outfile argument for PSOCK clusters to generate output that could not be caught by capture.output(). It was a hack that should have been removed before.
- Make make() and outdated() print "All targets are already up to date" to the console.
- Improve the "future_lapply" backends.
- Show build progress in plot_graph() and progress(). Also see the new failed() function, which is similar to in_progress().
- Speed up parLapply parallelism. The downside to this fix is that drake has to be properly installed. It should not be loaded with devtools::load_all(). The speedup comes from lightening the first clusterExport() call in run_parLapply(). Previously, we exported every single individual drake function to all the workers, which created a bottleneck. Now, we just load drake itself in each of the workers, which works because build() and do_prework() are exported.
- Default overwrite to FALSE in load_basic_example().
- Include report.Rmd in load_basic_example().
- Support get_cache(..., verbose = TRUE).
- Improve lightly_parallelize() and lightly_parallelize_atomic(). Now, processing happens faster, and only over the unique values of a vector.
- Add a make_with_config() function to do the work of make() on an existing internal configuration list from drake_config().
- Add drake_batchtools_tmpl_file() to write a batchtools template file from one of the examples (drake_example()), if one exists.
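A sketch of "future_lapply" parallelism from this era (the backend was removed in later releases), using a local multisession plan; load_basic_example() creates the my_plan object:

```r
library(drake)
library(future)
future::plan(multisession) # replaces the old backend() helper
load_basic_example()       # defines my_plan (deprecated in later releases)
make(my_plan, parallelism = "future_lapply")
```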