How Do You Organise Your R Project? This Is What We Do.

The Biometrics group at Telethon Kids Institute uses a standardised template project directory to manage our biostatistical consultation projects. This approach streamlines our workflow, makes project initiation quick, and lets us produce professional-looking reports directly from the statistical analysis platform, minimising the time spent on the non-analytical aspects of our projects. The same project structure has worked equally well for both simple and large-scale projects.

Although this workflow has been developed in R, the broader principles discussed here apply to any scripting language, and to non-coding projects alike. This blog is based on presentations I gave in July 2019 at useR!2019 in Toulouse, France (lightning talk) and at the 40th Annual Conference of the International Society for Clinical Biostatistics (ISCB40) in Leuven, Belgium (winner of the best poster award).

Project Directory Structure

We have developed and refined the skeleton of a template project that contains a series of sub-directories with distinct objectives, covering all steps of a project from the initial import of raw data through to reporting. Our template is pre-populated with directories and starter files that have enabled our group to save time in project initiation, preliminary analysis, analysis, and reporting.

Figure 1. Schematic of our template project. Sub-directories are indicated by squares and template files are shown by circles.

This structure is based on what is described by the “ProjectTemplate” R package (http://projecttemplate.net/index.html). We have found this structure attractive thanks to its automated project initiation, automatic execution of data-wrangling scripts, and loading of packages and data (a short sketch of a typical ProjectTemplate session follows the list below). Without going into excessive detail, our directories include:

  • Directories to house data at various stages of rawness, cleanliness, and wrangling
  • admin directories for project metadata
  • two scripting directories (ProjectTemplate can be configured to automatically run scripts in the munge directory; the R directory can be duplicated for other scripting languages or renamed to a generic “src” folder)
  • and the vignettes folder for our reports.
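As a brief sketch of what that automation looks like in practice (assuming the project was created from our template and uses the defaults in config/global.dcf), a typical analysis session starts with:

library(ProjectTemplate)

# load.project() reads config/global.dcf, attaches the packages listed there,
# loads the data sets found in data/ and cache/, and runs the scripts in
# munge/ in alphabetical order
load.project()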

This structure is also useful as it allows us to build our projects as R packages that are easily distributed to our collaborators once the analysis is complete. We chose to build our reports in the vignettes directory rather than locate them in a “reports” sub‑directory since we can share the final work as an installable package – enabling us to make a collection of project reports available to our collaborators by browsing the package vignettes. Packaging projects is also a useful way to share the cleaned data as it is available as an included dataset along with any documentation that was created specifically for the analysis.

options(ProjectTemplate.templatedir = "path/to/templates")
create.project(paste("path/to/projects", "00_project", sep = "/"), template = "biometrics_project")

Box 1. It only takes 2 lines of code to initiate a new project. A template directory can contain multiple project templates.

Reproducible Research

It is important that our analysis is traceable from raw data to final report and that all changes made to the analysis throughout the project life cycle are tracked. I suggest that no modifications are made to the raw data once it is received from the researcher (it often arrives as a .xlsx or .csv file). The first thing to do once the data is received is to prefix the file name with the current date (YYYYMMDD_) and then make the file read-only. Any further cleaning is performed in-script, where all changes are documented and can be verified and audited. Subsequent changes to the analysis are tracked via GitLab, which we have installed on our secure servers, or, if the data isn't sensitive, services such as GitHub or Bitbucket.
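As a minimal sketch of that intake step (the file name and path below are only illustrative), the date prefix and the read-only flag can both be applied from R:

# Illustrative path; in practice this is whatever file the researcher sent us
raw_file <- "data/raw/trial_results.xlsx"

# Prefix the file name with the current date (YYYYMMDD_)
dated_file <- file.path(
  dirname(raw_file),
  paste0(format(Sys.Date(), "%Y%m%d"), "_", basename(raw_file))
)
file.rename(raw_file, dated_file)

# Make the file read-only so the raw data cannot be edited in place
Sys.chmod(dated_file, mode = "0444")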

Reporting with R Markdown

R Markdown is an excellent tool that allows analysts to compose the project narrative and the data analysis output in a single document. Markdown also helps to maintain the quality of a report by keeping the analysis data frames and the report commentary together within the R environment. Inserting the analysis outputs directly into the report removes the possibility of transcription errors, and Markdown makes it easy to update a report when minor changes have been made to the underlying data.
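For example (a sketch with made-up object names), a summary statistic can be pulled straight from the analysis objects with an inline chunk rather than being typed into the text:

Of the `r nrow(dat)` participants, `r sum(dat$responded)` responded to the treatment.

If the underlying data change, the numbers in that sentence update the next time the report is knitted.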

To streamline our analysis plans and reporting we have developed a series of R Markdown templates that produce documents conforming to the Telethon Kids Institute's style guide and online branding. These templates produce attractive stand-alone HTML documents that we distribute to our collaborators; they are built on the Bootstrap CSS libraries, which allow for dynamic, responsive pages that can be viewed on a range of devices.

Figure 2. The default reporting template that we have packaged in https://github.com/telethonkids/biometrics; when the package is installed we can create new reports in RStudio via "File > New File > R Markdown… > From Template". More information about our templates can be found here.

Toolbox

The following table lists some packages that we use to simplify our reporting. They allow us to focus on the data without wasting time on the non-analytical parts of the project, such as package citations, caption numbering, and tabulating and visualising model output. A brief description of why we use each package is given below; visit the official documentation for further details. A short example combining two of them follows the table.

Table 1. Summary of useful packages that we use during our professional biostatistical consultations.

Package                        Description
tidyverse (collection)         Data wrangling, summaries, and visualisation
captioner                      Cross-reference tables, figures, and models
kableExtra                     Nicely format data frames for reporting
stargazer                      Create well-formatted regression tables
broom                          Extract a model's estimates and statistics
repmis                         Create a bibliography of loaded packages
devtools::build_vignettes()    Knit all vignette .Rmd files
jtools::plot_summs()           Visualise one or more models' effect estimates and CIs
GGally::ggpairs()              Explore your data with a plot pair matrix
roxygen2                       Document code by writing .Rd files in the man/ directory
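As a small illustration of how a couple of these fit together (a sketch only, using a throw-away model on the built-in mtcars data), broom turns a fitted model into a tidy data frame that kableExtra then formats for the report:

library(broom)
library(kableExtra)
library(magrittr)

# Fit a simple example model
fit <- lm(mpg ~ wt + hp, data = mtcars)

# tidy() extracts the estimates and statistics as a data frame,
# which kable()/kable_styling() format for the HTML report
tidy(fit, conf.int = TRUE) %>%
  kable("html", digits = 2) %>%
  kable_styling("hover", full_width = FALSE)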

The default project directory structure we have developed can be seen in action as part of the Telethon Kids rstudio GitHub repository; this repo is an implementation of RStudio within a Docker container (see here and here). You can browse the repo on GitHub or clone it to your local machine. The template project is found in the projects/00_next_project sub-directory, and each of the sub-directories in this template contains a README that briefly describes its purpose.

Disclaimer

I doubt there is anything in this article that can be called “new”, but unless someone has worked in a place with a clearly defined project structure, it is unlikely they have thought about an efficient way to organise their myriad of related files (documents, data, and scripts).

Conversations that I had with data scientists, statisticians, and analysts at both the UseR!2019 and ISCB40 conferences indicate that organisations are becoming increasingly aware of the importance of well-structured data projects. This workflow has come from reading many articles and trying out several packages – very few of which I recorded; thus, unfortunately, I am using other people’s ideas/concepts without proper acknowledgement. I don’t claim anything in this article as my original work and if you know of any authoritative sources on this content then please leave a comment.

There is a plethora of other sub-directories that could be included in a template project. For example, a figures directory for high-resolution, publication-ready images is a worthy inclusion. I am interested in how YOU structure your projects; leave a comment and let me know what tools you use and how you increase throughput to ease your workload.

Use RStudio Server in a Virtual Environment with Docker in Minutes!

A fundamental aspect of the reproducible research framework is that (statistical) analysis can be reproduced; that is, given a set of instructions (or a script file) the exact results can be achieved by another analyst with the same raw data. This idea may seem intuitive, but in practice it can be difficult to achieve in an analytical environment that is always evolving and changing.

For example, Tidyverse users will have recently seen the following warning in R after calling data_frame():

data_frame(a = 1:3)

Warning message:
'data_frame()' is deprecated, use 'tibble()'.

This is because dplyr has changed, hopefully for the better, and tibble() is now the preferred route. It is conceivable that in a future release the data_frame() function will no longer be a part of dplyr at all.
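The fix today is trivial – swap in the preferred constructor:

library(tibble)

# tibble() is the recommended replacement for data_frame()
tibble(a = 1:3)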

So, what does this have to do with reproducible research? Say you want to come back to your analysis in 5 years and re-run your old code. Your future installation of R, with the latest Tidyverse release, may not be backwards compatible with your now-ancient code, and the analysis will crash.

There are a couple of strategies we could use to deal with updated dependencies. The first, and possibly the easiest, is to use devtools to install older package versions (see here for further details). But what if the version of the package you used is not archived on CRAN? For example, the analysis could have used a package from GitHub that has since changed, and the maintainer hasn't used systematic releases for you to pull from. In that case this strategy is quite likely to fail.
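To make that first strategy concrete, installing an archived CRAN release with devtools looks something like this (the package and version number are only illustrative):

library(devtools)

# install_version() (re-exported from remotes) pulls a specific release
# from the CRAN archive
install_version("dplyr", version = "0.7.8", repos = "https://cloud.r-project.org")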

Another solution is to put your entire project inside a static virtual environment that is isolated from the rest of your machine. The benefit of project isolation is that any dependencies that are installed within the environment can be made persistent, but are also separated from any dependency upgrades that might be applied to your host machine or to other projects.

Docker isn't a new topic for regular R-Bloggers readers, but for those of you who are unfamiliar: Docker is a program that uses virtual containers to isolate and bundle applications. The containers are defined by a set of instructions detailed in a docker-compose.yml or Dockerfile and can effectively be stacked to call upon the capabilities of base images. Furthermore, one of the fundamental tenets of Docker is portability – if a container works on your local machine then it will work anywhere. And because base images are versioned, the same environment can be rebuilt at any time. This is also great for scalability onto servers and for distribution to colleagues and collaborators, who will see exactly what you have prepared regardless of their host operating system.

Our dockerised RStudio environment.

Our group at the Telethon Kids Institute has prepared source code to launch a dockerised RStudio Server instance behind an NGINX reverse proxy enabled for HTTPS connections. Our GitHub repository provides everything you need to quickly establish a new project (just clone the repository) with the features of RStudio Server plus the added benefit of SSL encryption (this will need some local configuration); even though RStudio Server is password protected, web encryption is still important for us as we routinely deal with individual-level health data.

(An important note: even with data security measures in place, our policy is for patient data to be de-identified prior to analysis, as per good clinical practice norms, and, except in exceptional circumstances, we use our intranet to further restrict access.)

The defining docker-compose.yml that we use can be found on our GitHub site via this link: https://github.com/TelethonKids/rstudio. We used an RStudio base image with the Tidyverse pre-installed on R 3.5.3, which is maintained by the Rocker project at https://hub.docker.com/r/rocker/tidyverse. As updates are made to the base image, the repository will be updated with new releases, providing an opportunity to re-install a specific version of the virtual environment. It has also been set up with persistent volumes for the projects, rstudio, and lib/R directories so that any changes made to the virtual environment are kept for back-up and further version control.

By combining tools like Docker and Git, we believe we can refine a culture of reproducible research and make it commonplace within our institute and among our collaborators, as we conduct world-class research to improve the lives of sick children.

Automating R-markdown Tables With Hooks

kableExtra was chosen as the primary way to format tabulated data in the HTML R Markdown templates used at Telethon Kids Institute (see the Biometrics R package and this article). I like the tables produced by kableExtra: they look tidy, and the package has a feature that highlights the table row the pointer is hovering over. Moreover, the package is very easy to use; it is invoked by adding the following 2 lines of code (using dplyr syntax with the R iris data set):


head(iris) %>%
kable("html") %>%
kable_styling("hover", full_width = F)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
           5.1         3.5          1.4         0.2  setosa
           4.9         3.0          1.4         0.2  setosa
           4.7         3.2          1.3         0.2  setosa
           4.6         3.1          1.5         0.2  setosa
           5.0         3.6          1.4         0.2  setosa
           5.4         3.9          1.7         0.4  setosa

This is all well and good, and I would be mostly happy to leave the template here and move on. BUT. The motivation behind these R Markdown templates was to streamline as much of the formatting code as possible, to help maintain a consistent theme across the reports and communications produced by Telethon Kids' researchers, staff, and students. I also like my code to be as DRY as possible.

The requirement of adding these 2 lines of code at the end of each table chunk must be fixed!

As a starting point, the default output of a data.frame() in an R markdown document is like this:

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

I approached the problem by using code suggested in this old Stack Overflow answer by Konrad Rudolph. Konrad used the knitr::knit_hooks() function to re-parse the chunk output and prepare the string for further manipulation.

The difficulty in formatting the chunk output is that only character data is passed to the hook. The workflow first needed to remove the ## prefixes from each line with gsub(), then re-parse the table with read.table(), before adding in the formatting code. This feels messy to me, and it fails for wide tables that have rows broken over multiple lines (which was acknowledged in the original answer from 2013).
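A condensed sketch of that output-hook approach (adapted from the idea in that answer, not a drop-in solution) shows why it feels messy:

library(knitr)
library(kableExtra)
library(magrittr)

default_output_hook <- knit_hooks$get("output")

knit_hooks$set(output = function(x, options) {
  if (is.null(options$table)) return(default_output_hook(x, options))

  # Strip the "## " prefix that knitr adds to each line of printed output
  lines <- gsub("^##\\s?", "", unlist(strsplit(x, "\n")))

  # Re-parse the printed data frame; this breaks for wide tables whose
  # rows wrap over multiple lines
  parsed <- read.table(text = lines, header = TRUE)

  parsed %>%
    kable("html") %>%
    kable_styling("hover", full_width = FALSE) %>%
    as.character()
})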

With a night to dwell on my problem, I realised/remembered that R markdown also sends the chunk source code to knitr for display, which can also be picked up and manipulated by a hook!

My solution, which is very similar to the one on Stack Overflow (but better), takes the source code and uses eval() to evaluate it, then appends the formatting lines from kableExtra. This is my knitr::knit_hooks() code:

default_source_hook <- knit_hooks$get("source")

knit_hooks$set(
  source = function(x, options) {
    if (is.null(options$table))
      default_source_hook(x, options)
    else {
      eval(parse(text = x)) %>%
        kable("html") %>%
        kable_styling("hover", full_width = F)
    }
  }
)

which is invoked by:


    ```{r table_format, results = "hide", table = T, eval = F}
      head(iris)

    ```

and produces a table identical to the first one illustrated in this article.

Enabled via the chunk option table = T, the hook applies formatting only to the chunks that I want formatted with kableExtra. Since the source code is re-evaluated in the knitr hook, the option eval = F should be included in the chunk header so the chunk isn’t evaluated twice; however, this double evaluation wouldn’t be a problem for small tables.

Finally, because the table is displayed using source, and output isn’t touched, the option results = "hide" should be included so that the default table is excluded from the final knit.

Mission accomplished!? No… Unfortunately there are still a couple of problems with this solution:

  1. You are unable to display the code used to generate the table in the knitted document (echo = T), because it is the source code string that was changed, not the output string.
  2. I'm replacing 2 lines of code with 3 chunk options – although not a deal breaker (you may have set these options anyway), it's not quite as auto-magical as I would have liked.

At the time of writing I still haven't got an accepted answer to this problem, which I posted on Stack Overflow. If you have a better solution – or want to up-vote my question – then you know what to do.


Full example .Rmd file


---
title: "Untitled"
author: "Paul"
date: "27 September 2018"
output: html_document
---

```{r setup, include = F}

library(dplyr)
library(ggplot2)
library(kableExtra)
library(knitr)

default_source_hook <- knit_hooks$get('source')

knit_hooks$set(
  source = function(x, options) {
    if(is.null(options$table))
      default_source_hook(x, options)
    else {
      eval(parse(text = x)) %>%
        kable("html") %>%
        kable_styling("hover", full_width = F)
    }
    
  }
)

data(iris)

```

## Normal
With no chunk options:

```{r normal}
head(iris)

```

## The desired output
With chunk options `results = "hide"` and `table = T`:

```{r table_format, results = "hide", table = T, eval = F}
head(iris)

```

## It still works as normal for other output types
With no chunk options:

```{r image}
iris %>%
  ggplot(aes(x = Sepal.Length, y = Sepal.Width, group = Species)) +
  geom_point()

```

Deploying an R Shiny App With Docker

If you haven't heard of Docker, it is a system that allows projects to be split into discrete units (i.e. containers) that each operate within their own virtual environment. Each container has a blueprint written in its Dockerfile that describes all of its operating parameters, including the operating system and package dependencies/requirements. Docker images are easily distributed and, because they are self-contained, will operate on any other system that has Docker installed, including servers.

When multiple users start a Shiny App at the same time, only a single R session is initiated on the serving machine. This is problematic: for example, if one user starts a process that takes 10 seconds to complete, all other users must wait until that process has finished before any of their tasks can be processed.

One of the benefits of deploying a containerised Shiny App is that each new instance will run in its own R session.

In this article, I will go through the steps that I took to deploy an app developed in Shiny onto a fully-functional web server. All the code here, plus some basic web-server configuration, can be found at this Telethon Kids GitHub repository.

Getting Started

Docker can be installed following the instructions here, making sure to also follow the post-installation instructions here. Docker Compose will also need to be installed by following these instructions. For the example presented here, Docker was installed on an Ubuntu 18.04 OS running in VirtualBox; the same approach will be used for other apps running on an AWS EC2 instance.

I will be using the words image and container throughout this article. A Docker image is a functioning snapshot of the blueprint. A running image is called a container, which operates by receiving, processing, and sending information to the client or to other containers.

Docker

The Dockerfile contains the schematics of a Docker container; that is, it is used to call a base image and to define any customisations that need to be made for the specific application to run correctly.

This is the Dockerfile that describes our Shiny App (i.e. the Default app when a new “Shiny Web App …” is created in RStudio):

 
FROM rocker/shiny:3.5.1

RUN apt-get update && apt-get install libcurl4-openssl-dev libv8-3.14-dev -y &&\
mkdir -p /var/lib/shiny-server/bookmarks/shiny

# Download and install library
RUN R -e "install.packages(c('shinydashboard', 'shinyjs', 'V8'))"

# Copy the app to the image
COPY shinyapps /srv/shiny-server/

# Make all app files readable (solves issue when dev in Windows, but building in Ubuntu)
RUN chmod -R 755 /srv/shiny-server/

EXPOSE 3838

CMD ["/usr/bin/shiny-server.sh"]

This blueprint is using the base image rocker/shiny built on R version 3.5.1. For some packages to run properly on an Ubuntu OS (the container’s base operating system) I needed to install libcurl4 and libv8 by adding the following lines to our Dockerfile.

RUN apt-get update &&\
apt-get install libcurl4-openssl-dev libv8-3.14-dev -y &&\
mkdir -p /var/lib/shiny-server/bookmarks/shiny

The packages that the app depends on are also installed via the Dockerfile with a RUN statement:

RUN R -e "install.packages(c('shinydashboard', 'shinyjs', 'V8'))"

Our app.R file (i.e. the Shiny App) is copied from the shinyapps directory on the host PC to a folder inside the container image with the COPY statement. See the GitHub repo for an example of the project directory structure.

I had some file permission issues when developing on Windows and building on Ubuntu in a VM. To overcome this, the permissions of all copied files were changed with chmod -R 755 to ensure they were readable and executable inside the container.

# Copy the app to the image
COPY shinyapps /srv/shiny-server/

# Make all app files readable
RUN chmod -R 755 /srv/shiny-server/

The final two lines expose port 3838 to receive incoming traffic and tell Docker how to start the app. With the Dockerfile complete, the image is easily built with the following one-liner:

docker build -t telethonkids/new_shiny_app ./path/to/Dockerfile/directory

and can be run as a standalone container with:

docker run --name=shiny_app --user shiny --rm -p 80:3838 telethonkids/new_shiny_app

and accessed in a browser at http://127.0.0.1.

ShinyProxy

The story doesn't end there. ShinyProxy can easily be set up in another container to manage the Shiny App containers, starting a new one for each user session. An image for a ShinyProxy container is defined by the following Dockerfile; this is an example from ShinyProxy's GitHub repository for a containerised deployment:

FROM openjdk:8-jre

RUN mkdir -p /opt/shinyproxy/
RUN wget https://www.shinyproxy.io/downloads/shinyproxy-2.1.0.jar -O /opt/shinyproxy/shinyproxy.jar
COPY application.yml /opt/shinyproxy/application.yml

WORKDIR /opt/shinyproxy/
CMD ["java", "-jar", "/opt/shinyproxy/shinyproxy.jar"]

The apps that are to be hosted are defined in the application.yml file. This contains the information for ShinyProxy to launch Shiny Apps and needs to be copied into the container from the host with COPY application.yml /opt/shinyproxy/application.yml.

The following code snippet has only one Shiny App, but multiple app configurations can be added to specs.

proxy:
  title: Telethon Kids Institute Shiny Apps
  hide-navbar: false
  landing-page: /
  heartbeat-rate: 10000
  heartbeat-timeout: 600000
  port: 8080
  docker:
    internal-networking: true
  specs:
    - id: shiny_app
      display-name: New Shiny App
      description: The default app when initiating a new shiny app via RStudio
      container-cmd: ["/usr/bin/shiny-server.sh"]
      container-image: telethonkids/new_shiny_app
      container-network: tki-net
      container-env:
        user: "shiny"
      environment:
        - APPLICATION_LOGS_TO_STDOUT=false

This ShinyProxy setup listens for traffic on port 8080 and will time out after a period of inactivity, which has been set to 10 minutes.

heartbeat-rate: 10000
heartbeat-timeout: 600000

internal-networking: true must be set because ShinyProxy is run in a container on the same host as the Shiny App. The configuration for each app is listed under specs, which gives instructions on how to launch the container, what Docker image to use, and what network it should be listening on (note that the container-network must be the same as the network listed in the docker-compose.yml file; more on this soon).

container-cmd: ["/usr/bin/shiny-server.sh"]
container-image: telethonkids/new_shiny_app
container-network: tki-net

We also set some environment variables so that the R session runs as the user shiny and logs are not recorded.

container-env:
  user: 'shiny'
environment:
  - APPLICATION_LOGS_TO_STDOUT=false

docker-compose.yml

The docker-compose.yml file contains a series of instructions that can be used to build and launch all of the containers needed for a single- or multi-container project. The benefit is that a complex arrangement of containers and networks can be version controlled and launched with one line of code.

version: "3.6"
services:
  shinyproxy:
    depends_on:
      - nginx
      - influxdb
    image: telethonkids/shinyproxy
    container_name: tki_shinyproxy
    restart: on-failure
    networks:
      - tki-net
    volumes:
      - ./shinyproxy/application.yml:/opt/shinyproxy/application.yml
      - /var/run/docker.sock:/var/run/docker.sock
    ports:
      - 8080:8080

networks:
  tki-net:
    name: tki-net

Docker uses networks to enable communication between containers. For ShinyProxy to communicate properly with the Shiny App, the network specified in docker-compose.yml must be the same as that listed in application.yml. (Tip: if you're using a docker-compose file to launch the app, don't set up the Docker network manually – see here for why.)

Each container listed in services: is named by container_name, with the Dockerfile pointed to via the context and dockerfile variables.
The restart line instructs Docker what action to take if the container terminates with an error code. The networks variable lists the Docker networks that the container has access to. Finally, the app is exposed on port 8080.

The network used by all applications in this docker-compose must match the network specified in the ShinyProxy application.yml.

networks:
  tki-net:
    name: tki-net

With everything in place, the ShinyProxy image is built and the container(s) started with docker-compose build and docker-compose up -d. (Important: the app image defined in application.yml must also be built.) Containers are stopped and removed with docker-compose down.

With the configuration listed here, a fully functioning Shiny App can be deployed to the internet with little extra configuration. Even simpler, this setup can be cloned from GitHub and the files in "webapp/shinyapp" replaced with your own Shiny App's app.R.

Bonus Tip

Another app that I made used CSS for some custom formatting (only 4 tags). For some reason the style sheet was ignored by Chrome when the app was run behind ShinyProxy, but not when it was run standalone. The problem did not occur in the Firefox and Internet Explorer browsers. I think this had something to do with the iframe used by ShinyProxy. It was resolved by putting the style in app.R within a head tag and removing the reference to the external style.css.
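A trimmed sketch of what that looks like (the CSS rules here are placeholders, not the actual styles from the app):

library(shiny)

ui <- fluidPage(
  # Inline the rules instead of linking to www/style.css so they survive
  # the iframe that ShinyProxy wraps around the app
  tags$head(
    tags$style(HTML("
      .title-panel { color: #004b8d; }
      .well        { background-color: #f5f5f5; }
    "))
  ),
  titlePanel("Example app")
)

server <- function(input, output, session) {}

shinyApp(ui, server)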

Reflections on the AIM WA Practical Decision Making Course

On Thursday and Friday of last week (1st and 2nd of November, 2018) I attended a training course at the Australian Institute of Management (AIM). This training was made possible by an AIM WA scholarship awarded through an internal application at Telethon Kids Institute. I went to a course called Practical Decision Making, which was attended by 8 other people from local government, training organisations, and research institutes/universities.

I would like to express my thanks to both AIM WA and the Telethon Kids Institute for this opportunity. The 2 day course was engaging, educational, and will benefit me and my work at Telethon Kids.

Biostatisticians take on responsibility for all statistical activities within a research project, supervising the work of other researchers and members of interdisciplinary teams. We also contribute to clinical discussions on study objectives and the choice/definition of primary endpoints, which may involve liaising with external experts and opinion leaders. The biostatistician is well placed within any collaborative research team to assess the situation and the data, make a critical decision, and communicate it to all members so they are on board and motivated to achieve the best possible outcome for the research project.

The Practical Decision Making course was facilitated by John Wilcock at AIM's training centre in Floreat. The training was delivered through group-based discussion and lecture-style sessions. A detailed 70-page course booklet was also provided, with templates and tools to guide problem definition, perform cause-and-effect analysis, build a decision analysis matrix, and facilitate risk analysis and management. John was very engaging and drew on many years of management experience at companies including IKEA, Target, and his own businesses.

The training also used a number of case studies that were solved in small groups. These provided opportunities to work through the decision-making process and encouraged me to use various communication styles when working collaboratively, catering for all learning types (i.e. visual vs. auditory vs. tactile).

One group exercise that I found particularly memorable was called the “Zin Obelisk”. This task required groups of three to share 30 discrete, and sometimes obscure, pieces of information and decide on which day of the week a gang of workers would complete the construction of a religious monument. The exercise highlighted skills I can improve to ensure all team members I work with are included in technical discussions and can participate in the decision-making process. My style of working through a problem is to scribble down notes until I can form a path to the answer; rather than jotting down notes for my own benefit, I can be more inclusive by scribbling on something visible to everyone, such as a flip chart on an A-frame.

The decision making and communication skills that were learnt from AIM Practical Decision Making course will help me:

  • Ask “why” at least 5 times so I can really understand the researchers’ questions
  • Frame a problem in a way that will help put the issue in the proper perspective
  • Effectively communicate my decisions to relevant stakeholders and encourage action
  • Use established tools and techniques to guide the strategic direction of a research project to a successful conclusion.

Introducing the Telethon Kids Institute R-Markdown Templates

The Telethon Kids Institute has a strong sense of brand awareness that is detailed in our internal style guide and is visible in television advertisements, reports, presentations, and seminar posters (and elsewhere!). Our branding is important as it contributes to a sense of recognition within the local community and helps support our position as a world leader in child health research. Instantaneous brand recognition is especially important for Telethon Kids, as we receive significant funding from the public, either directly from individuals and businesses or through Channel 7's Telethon.

The Biometrics Group are the internal consultants for the Telethon Kids Institute and Perth Children's Hospital for all things biostatistics and data management. As we have multiple competing priorities on the go at any one time, it is important for us to have a tool kit that enables us to rapidly provide support to our researchers, collaborators, and students while complying with the themes defined in the Institute's style guide. We achieve this by working in the R environment, as it enables us to leverage the latest developments in open-source statistics and data science.

The Biometrics Group has developed a series of R Markdown templates, accessed via the biometrics R package, that free us to focus on the task at hand rather than regularly tweaking the visual aspects of our reports and communications. These easy-to-use templates produce stand-alone HTML files for reports and presentations that can be distributed by email and opened in an internet browser (e.g. Chrome, Firefox, Safari, Edge, etc.), including on mobile devices. Perhaps most importantly, all members of our group can rapidly create visually appealing documents without needing to learn HTML or CSS.

The biometrics package also includes custom colour themes, based on those defined in the style guide, that can be applied to categorical and continuous data in ggplot2 graphics and used in any other application that accepts common image file formats.
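As an illustration only (the scale function below is a hypothetical name – check the package vignettes for the exact exported helpers), applying an institute palette to a ggplot2 figure would look something like:

library(ggplot2)
library(biometrics)

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
  geom_point() +
  # Hypothetical helper: a discrete scale built from the institute palette
  scale_colour_telethonkids()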

The Telethon Kids Institute biometrics package is now available on GitHub, and new R Markdown documents can be created from these templates in RStudio after installing the package:

devtools::install_github("TelethonKids/biometrics", build_vignettes = TRUE)

and are accessed through the menu: File > New File > R Markdown… > From Template. A brief introduction to using these templates is found in the package vignettes:

browseVignettes("biometrics")

The usual markdown syntax is used to prepare your documents; a cheat sheet explaining how to use R markdown can be found here.

Perth Young Statisticians Conference 2018

Yesterday, I had the opportunity to attend the Young Statisticians Conference 2018 hosted by the Statistics Society of Australia at Rigby’s Bar.

This conference featured 6 talks from practicing statisticians in industry and academia and 4 talks from honours and PhD students studying in WA (1 from ECU and 3 from UWA).

The Industry/Academic talks were presented by:

  • Halena Talbot – Australian Bureau of Statistics
  • Kim Carter – Data Analysis Australia
  • Karl Beidatch – BHP Billiton
  • Gemma Cadby – University of Western Australia
  • Nicola Armstrong – Murdoch University
  • Grant Coble-Neal – Western Power

The talks were followed by a speed dating style Q&A between groups of conference attendees and the invited speakers. The conference concluded with a networking sundowner where I had the opportunity to meet other Perth statisticians and promote the work that we do at Telethon Kids.

This event was a showcase of the statistical/data science work that is available for students in WA (most attendees were university students, with a few like me who have recently completed their studies).

Although there wasn't a defined theme for the talks, one point kept being raised: raw data is messy, regardless of what industry you are in, so don't be afraid to ask pointed questions so that, as the analyst, you have a good understanding of the research question.

Also relevant to my work was a slide from Kim Carter stating that big data does not need big solutions – which I felt was relevant in the context of linked data and some of the larger projects the Biometrics group may be asked to consult on (e.g. Origins).

Several of the speakers also encouraged the attendees to add coding as a core competency as it lives hand-in-hand with current statistics. This resonated with me as I believe that my ability to code helped me get my position at Telethon Kids as a graduate biostatistician with no real-world experience in the field.

On reflection, this event could be a great way to get young statisticians in Perth interested in the medical and health research being done at Telethon Kids. The conference happens every other year, and I recommend that next time it comes around a biostatistician from Telethon Kids be involved to show off some of the fun work we do that has a real impact on children's lives.