Package development
This article walks through adding a function to the package. It uses a git-branching workflow, but it is not a full git tutorial. It is also not a complete guide to R package development (for a comprehensive guide, see R Packages); instead, it is meant as a checklist of the general steps. Several references are included at the bottom for more information on R-package development and git workflows.
Initial Setup
This section will describe the steps necessary to get started. These steps will only need to be done once.
Required installations
There are several programs that are needed before any work can begin. With admin access to your computer, you can install all of these yourself; otherwise, create a ticket with your IT group with the following requests. The links provided assume a Windows computer. Adjustments might be needed for Mac or Linux OS:
Once those have been installed, the following R packages will be needed for R-package development work:
install.packages(c("devtools", "rmarkdown"))
GitHub versus GitLab
dataRetrieval is a publicly available package. The public is urged to report issues and ask questions on our GitHub page: GitHub Issues
The public is welcome to fork the GitHub repository and can submit pull requests. Since dataRetrieval is used by so many people, any proposed changes will be thoroughly reviewed before being considered.
dataRetrieval development by USGS employees takes place on code.usgs.gov/water/dataRetrieval, which is the USGS GitLab system. Throughout this document, it is referred to as both code.usgs.gov and GitLab. Once updates and bug fixes are tested on GitLab, they are pushed up to the GitHub repository.
code.usgs.gov (GitLab) setup
USGS or DOI employees can use code.usgs.gov for dataRetrieval development. This section shows the steps to set up development work through the code.usgs.gov system.
code.usgs.gov is the USGS enterprise GitLab home. People refer to it as either code.usgs.gov or simply “GitLab”. These instructions will generally refer to code.usgs.gov as GitLab unless something is specific to the USGS version of GitLab (for example, how to log in). If you have questions on how to use code.usgs.gov, general advice about GitLab will often suffice (so Googling for answers is a great first step!).
The first step to collaborating via code.usgs.gov is to log in. For USGS/DOI employees, that means using:
Personal Access Token
This step only needs to be done once per computer setup. So if you’ve done it before you can skip this step.
You’ll need to set up a Personal Access Token. Go to https://code.usgs.gov/-/profile/personal_access_tokens and create a new token selecting the api scope:
After clicking the green “Create personal access token”, you will see a screen like this:
Save your token in a safe place (KeePass for instance) so you don’t need to constantly regenerate tokens. You will need this token to authenticate when doing a git pull a few steps from now.
Local project setup
The easiest way to set up work on this package is to create a new Project within RStudio. These are the steps needed to get the files from GitLab (for USGS/DOI employees) or GitHub (external collaborators) to your computer.
- Create a new project -> Version Control -> Git
- Navigate to the landing page for the repository to find the Repository URL.
From code.usgs.gov, use the “Clone” button and copy the HTTPS URL:
From GitHub, use the “Code” button and copy the SSH URL:
- Paste the copied git address in Repository URL:
- Logging in
You may need to log in (if you’ve logged in before, there will not be a log in prompt and you can skip this step).
For code.usgs.gov, use your full email for your user name, and the personal access token for your password. NOT your AD password!
If you attempt to use HTTPS and are not prompted for a password but still receive a failed authentication error message, then at some point in the past you probably provided an incorrect username/password, and Windows (or macOS) stored it in its built-in credential manager. On Windows, this is called Credential Manager. On Mac, this is called Keychain. Clear that credential and try again.
For GitHub, see Adding a New SSH Key.
Setup Local Build
Initial Installation
Click “Install and Restart” from the Build tab in the upper right of your RStudio window. If there are packages that this package depends on and they are not installed on your computer, you will get a warning. Install those packages manually and try “Install and Restart” again. If successful, you will see something like:
The most common problem at this stage will be that you might not have all the packages that are listed in either the Depends or Imports of the DESCRIPTION file. If you see a message like this:
Read which package is missing from your computer and install it, or use the remotes package to install all missing dependencies at once:
remotes::install_deps()
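To see which dependencies are missing before installing anything, the DESCRIPTION file can be read directly. The following is a minimal sketch using only base R; the helper name missing_deps is hypothetical and is not part of dataRetrieval or remotes.

```r
# Report which packages listed in Depends/Imports are not installed.
# missing_deps() is a hypothetical helper written for this illustration.
missing_deps <- function(description_path = "DESCRIPTION") {
  fields <- read.dcf(description_path, fields = c("Depends", "Imports"))
  deps <- unlist(strsplit(fields[!is.na(fields)], ","))
  # Drop version requirements such as "(>= 3.5.0)" and surrounding whitespace
  deps <- trimws(gsub("\\([^)]*\\)", "", deps))
  # "R" itself is a version requirement, not a package
  deps <- setdiff(deps[nzchar(deps)], "R")
  # Keep only the packages whose namespace cannot be loaded
  deps[!vapply(deps, requireNamespace, logical(1), quietly = TRUE)]
}
```

Calling missing_deps() from the package root returns a character vector of package names, which can then be passed to install.packages().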
Set up roxygen options
Turn on the option to use Roxygen to build help pages. Click on the More button in the “Build” tab, then choose “Configure Build Tools”.
Click the radio button to “Generate documentation with Roxygen”. Then click the Configure button and check all the options:
If you do not have LaTeX software installed (on Windows that would be MiKTeX, for instance), you will want to include --no-manual in your check options.
Click “OK” to exit out of all of the configuration options. You are now set to start working on the package!
R-package development
We’re now set up to start contributing to the R-package! This next section will walk through adding a new function to the package.
Project management
Planned dataRetrieval work will be done on GitLab. Public inquiries, bug reports, and enhancement requests are done on GitHub. Both environments have similar features.
Each distinct set of work tasks should be described in a single issue. The issue should have enough detail to describe the required work and an example of how to test if the task is complete. Discussions of the issues can include screenshots, formatted code snippets, and general markdown formatting options. Each task should be assigned to a developer. Appropriate labels should be assigned to the task. Issue templates are provided. Developers are encouraged to reference issues in their git commit messages.
A few developers have the authority to merge updates into the main branch of dataRetrieval. Each merge/pull request must be reviewed by an authorized approver. Approvers must check that the updates are functional and efficient. Since dataRetrieval is on CRAN, examples and tests that hit web services should be flagged to not run on CRAN. Approvers should try to consider unique use-cases that may not be addressed in the original Issue. Additional topics that approvers should check include increased test coverage, minimal new package dependencies, consistent coding style, and a consistent user experience. Ideally authorized approvers get another approver to check their merge requests, but it is not required.
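One common way to keep tests that call web services from running on CRAN is testthat's skip_on_cran(). The sketch below assumes that convention; the package's actual flags may differ. fetch_site_info() is a hypothetical stand-in for a function that queries a web service, and the site identifier is only an example value.

```r
library(testthat)

# Hypothetical stand-in for a function that would query a web service.
# A real function would send a request; this stub returns a fixed shape.
fetch_site_info <- function(site) {
  data.frame(site_no = site, stringsAsFactors = FALSE)
}

test_that("site lookup returns one row per site", {
  # Skipped during CRAN checks; runs in interactive sessions or when the
  # NOT_CRAN environment variable is "true" (devtools sets this)
  skip_on_cran()
  result <- fetch_site_info("05114000")
  expect_equal(nrow(result), 1)
})
```

Placing skip_on_cran() as the first line of the test means no request is attempted when the test is skipped.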
Set up a new git branch
Before starting anything new, always pull any changes that have happened on the repo. In the “Git” tab, click the down arrow, which triggers a “git pull”:
Create a new branch. This will let you work on your idea without risking adding in-development work into the “main” branch of the repository. As a good rule-of-thumb, consider creating a new branch for each “Issue”.
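The same two steps can also be run from the R console instead of the Git tab. This is a sketch that shells out to the git command line with base R's system2(). It assumes git is on the PATH and that the working directory is the root of the cloned repository; the branch name is only an example.

```r
branch <- "my_fancy_function" # example branch name; consider one branch per issue

if (dir.exists(".git")) {
  # Same as the down arrow in the Git tab
  system2("git", "pull")
  # Create the new branch and switch to it
  system2("git", c("checkout", "-b", branch))
} else {
  message("Run this from the root of the cloned repository.")
}
```

The guard on the .git folder only prevents the commands from running in the wrong directory; it is not required by git itself.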
Building a new function
You have all the details for a new function, great! Here are some steps to get it fully incorporated into the package so others can use it. See R Packages for complete details. Follow the style and guidelines agreed upon by the package development team.
Here is an example of a new function that follows our package guidelines. Save this as an R file in the “R” folder of the package.
#' My fancy new function
#'
#' This is my fancy new function. It's really great.
#'
#' @param excited_flag logical. This sets our output
#' to excited or not
#' @export
#' @examples
#' my_fancy_function(excited_flag = TRUE)
#' my_fancy_function(excited_flag = FALSE)
my_fancy_function <- function(excited_flag){
  # Require a single, non-missing TRUE/FALSE so the check matches the message
  if(!is.logical(excited_flag) || length(excited_flag) != 1 || is.na(excited_flag)){
    stop("Input is not a length one logical vector")
  }
  if(excited_flag){
    say <- "HI!"
  } else {
    say <- "hi"
  }
  return(say)
}
Local Package Check
Check the package to make sure the function works properly and doesn’t add any new Errors, Warnings, or Notes.
Note sometimes this can take a long time. The package will be checked automatically on code.usgs.gov with the CI pipeline, but it is still a good idea to run the check locally especially when setting up a new function initially.
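The local check can also be started from the R console. The following is a minimal sketch, assuming the devtools package installed earlier and a working directory at the package root; the guard simply keeps the calls from running anywhere else.

```r
# TRUE only when the working directory contains a package DESCRIPTION file
at_package_root <- file.exists("DESCRIPTION")

if (at_package_root && requireNamespace("devtools", quietly = TRUE)) {
  # Regenerate the Rd help files and NAMESPACE from the roxygen comments
  devtools::document()
  # Run R CMD check; manual = FALSE skips the PDF manual, so LaTeX is not needed
  results <- devtools::check(manual = FALSE)
  # Summary of errors, warnings, and notes
  print(results)
}
```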
Creating tests
It’s also important to create tests to check that your functions do exactly what you expect them to do. Use the testthat package to create these tests. Navigate to the tests/testthat folder and create some new tests for your new function. As much as possible, the tests should check every part of your function.
context("Test my fancy function")
test_that("My fancy function", {
  excited <- my_fancy_function(excited_flag = TRUE)
  not_excited <- my_fancy_function(excited_flag = FALSE)
  expect_type(excited, type = "character")
  expect_equal(excited, "HI!")
  expect_equal(not_excited, "hi")
  expect_error(my_fancy_function(excited_flag = "what?"))
})
In the “Build” tab, under “More”, choose “Test Package” to test the full package.
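The tests can also be run from the R console, which makes it easy to rerun a single test file while iterating. This sketch assumes devtools is installed and the working directory is the package root; the filter value "fancy" assumes a hypothetical test file whose name contains that text.

```r
# TRUE only when the working directory contains a package DESCRIPTION file
in_package <- file.exists("DESCRIPTION")

if (in_package && requireNamespace("devtools", quietly = TRUE)) {
  # Run every test file in tests/testthat
  devtools::test()
  # Run only the test files whose names match "fancy"
  devtools::test(filter = "fancy")
}
```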
Adding the function to vignettes, README, and report template
If the function should be advertised in the general workflow, consider adding it to the README. This is the first place users will read when they come to the code repository.
The function could be added to a section of a vignette, or an entirely new vignette could be written.
In some packages, there is a report template in the “inst/templates” folder. Consider whether the function would make sense in a generalized report.
Push up to code repository
This can be done in stages. If you’ve done everything correctly, you should be ready to commit at least the R script, the generated Rd documentation file, the testthat changes, and the updates to the NAMESPACE.
In the Git tab, click the down arrow to “Pull”. If you haven’t made any changes on this branch yet, you should get a message “Already up to date”.
In the same Git tab, check the boxes next to the files you are either adding or modifying. This should include the R script for the function, the test file(s), the Rd documentation, and the updated NAMESPACE. It could also include changes to the README, vignettes, inst/template/report.Rmd, etc. Then click the “Commit” button:
- Write a message to help you remember what this commit was all about. Glance at the “Diffs” shown below. When everything looks good, click “Commit”.
- Click the up arrow to “Push” the changes up to the feature branch in GitLab (code.usgs.gov).
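The pull, stage, commit, and push steps above can be sketched from the R console as well. This again shells out to git with system2() and assumes git is on the PATH, the working directory is the repository root, and the remote is named origin. The file paths, commit message, and branch name are hypothetical examples.

```r
# Hypothetical file paths for the new function, its documentation, and tests
files <- c("R/my_fancy_function.R", "man/my_fancy_function.Rd",
           "tests/testthat/test-my_fancy_function.R", "NAMESPACE")
commit_message <- "Add my_fancy_function with tests"

if (dir.exists(".git")) {
  system2("git", "pull")
  # shQuote() protects paths and the message from being split by the shell
  system2("git", c("add", shQuote(files)))
  system2("git", c("commit", "-m", shQuote(commit_message)))
  # -u links the local branch to the remote branch on the first push
  system2("git", c("push", "-u", "origin", "my_fancy_function"))
}
```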
GitLab CI Pipeline
The repository should have a file already set up called .gitlab-ci.yml. This file organizes automatic checks on the package via a “pipeline”. Usually there is a setup phase that installs all the required packages, then a check phase to check the package. I like to add separate testing and test coverage checks. Before the merge request is merged, the pipeline should come back with no errors:
This helps ensure that the changes are not causing any failures on an independent platform.
Merge request
So far, you have pushed a specific branch up to code.usgs.gov. It’s not set to go directly to the “main” branch. If you open the repository, you may see a message like this:
Click the “Create merge request” button to initiate a merge request.
Fill in some more information on what is being included in this merge request. You could assign someone to do a peer review of your code. This is a great place for review. It is a good idea to discuss with your collaborators if you want guidelines set up to decide who has the responsibility to perform the merge (that is…click the Merge button).
Merge Reports
There are a few things to check when the pipeline has finished. The first main goal is that the package passes all the checks. The next thing to check is the test coverage. When you write tests in the “testthat” folder, you are checking that the code is doing what it’s supposed to be doing. We can also check how well your code is covered by tests. This is called “test coverage”. There are programs that can analyze each line of your code and determine which lines are being tested, and which lines are not. A high percentage of test coverage is one metric of good testing. However, it is also important to write good and meaningful tests. If you only test that a function didn’t fail, you still don’t know if the function produced meaningful output.
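The difference between a test that only runs the code and a test that checks the output can be shown with a small base R sketch. buggy_function() is a hypothetical variant of the example function with its two branches accidentally swapped.

```r
# Hypothetical buggy variant: the two branches are accidentally swapped
buggy_function <- function(excited_flag) {
  if (excited_flag) "hi" else "HI!"
}

# Weak check: only confirms that the call does not fail
ran_without_error <- !inherits(try(buggy_function(TRUE), silent = TRUE), "try-error")

# Meaningful check: compares the output with the expected value
gives_expected_output <- identical(buggy_function(TRUE), "HI!")

ran_without_error      # TRUE: the weak check passes and the code it ran counts as covered
gives_expected_output  # FALSE: the meaningful check exposes the bug
```

Both checks execute the same code, so they contribute the same coverage, but only the second one would catch the mistake.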
The test coverage should be as good as, if not better than, the coverage currently in the canonical “main” branch. In the following screenshot, the test coverage on the “main” branch is 92.11%. The merge request is set to increase that coverage by 0.11% to 92.22%. If you notice the test coverage decreasing, you will want to look at your test suite and see where you can add tests to make sure your package is working properly. This screenshot also shows where to get a full report on the tests in your package.
Another report that GitLab can produce is found in the “Changes” tab in the merge request. There are green and red vertical lines on the file diffs. These vertical lines indicate that the code is covered by a test (green line), or that there is no test (thicker red line).
This can be useful for finding parts of your code that are not covered by tests.
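Coverage can also be inspected locally before pushing. The sketch below uses the covr package, which is an assumption: this article does not state which tool the pipeline uses, and covr is not among the packages installed earlier. It also assumes the working directory is the package root.

```r
# TRUE only when the working directory contains a package DESCRIPTION file
has_description <- file.exists("DESCRIPTION")

if (has_description && requireNamespace("covr", quietly = TRUE)) {
  # Run the package tests while tracking which lines are executed
  cov <- covr::package_coverage()
  # Overall percentage of lines covered by tests
  print(covr::percent_coverage(cov))
  # Lines that no test executed
  print(covr::zero_coverage(cov))
}
```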
If the Pipeline has passed and the code review was satisfactory, clicking the Merge button will move all your changes into the “main” branch.
Clean up local environment
Congratulations, you contributed to the package! The next steps should always be done after the merge request is completed so you don’t continue to work on the specific feature branch.
Switch back to “main” branch
In the upper right corner of the Git window in RStudio, you can click a dropdown to get back to the “main” branch.
Delete local feature branch
The way code.usgs.gov is set up, the branch should be automatically deleted ON code.usgs.gov. You will still want to delete the feature branch locally. Go to the Terminal in the lower left corner in RStudio, and type git branch -d my_fancy_function (or whatever you named your branch).
Syncing up repositories
USGS software must be provided on code.usgs.gov to satisfy federal and USGS-specific source code policy. However, there is a rich history on the GitHub platform, and most external users prefer GitHub. To comply with policy, retain the history, and make public contributions and questions easier, we manually push updates to both repositories. Most development is done on GitHub. When that development is complete, an official dataRetrieval developer pushes those changes to code.usgs.gov. Development on features that are not yet available to the public is done on code.usgs.gov. When those features are available, a developer pushes those changes to GitHub. Someday we might automate a mirroring system, but currently the process is manual.
Only official dataRetrieval developers will have the access to do the following. Set up multiple git “remotes”. One strategy would be the “origin” (default) is a personal fork of DOI-USGS/dataRetrieval, “upstream” is the DOI-USGS/dataRetrieval, and “codeusgs” is the code.usgs.gov/water/dataRetrieval.
git remote add upstream https://github.com/DOI-USGS/dataRetrieval.git
git remote add codeusgs https://code.usgs.gov/water/dataRetrieval.git
To pull from the main branch of either repository:
git pull upstream main
git pull codeusgs main
In general, create a pull request to push to the DOI-USGS/dataRetrieval main branch or a merge request to push to the main branch of code.usgs.gov/water/dataRetrieval.