All the scripts and data for this series are available at

You’ll need some of the CSV files there to do the later tutorials.

Aside from this document (00_Introduction), each guide in this series corresponds to an R script of the same name, found in the “scripts” folder of the site linked above.

So, 01_basicHist.html describes in detail what you’ll see in 01_basicHist.R, and so on. This allows us to keep the annotations in the R scripts fairly light.

The idea is that while you’re working with an R script, you can have the matching html file open in a browser window. The html files can be printed too. Use whatever works best for you.

If you need some assistance with these, please let me know.

Use the Navigation page to begin.

Important note

Especially if you’re new to coding, you’re not going to understand everything in these files.

That’s OK. That’s normal. You don’t have to understand everything to use it.

Think of it like opening a door. All you need to know is how to turn the doorknob; you don’t have to know how to design and machine the parts or install a doorknob in a home.

At some point you might want to or need to, but you don’t have to right now. You’re not reading this because you want to become a computer scientist; you’re reading this to become a better journalist.

If at any time you want to learn more about something, copy the code snippet and paste it into a Google search. Chances are lots of people have written about it, and some of them may have even written a tutorial about it.

BTW, that’s a valuable tip: google what you’re curious about and add the word “tutorial.” It’s how most of the links listed below were found.

But generally, don’t feel bad because you don’t ‘get’ something right off. No one does.

If things go wrong

They will. You may have an extra comma, or have forgotten to close a parenthesis. It happens to everyone.

R is pretty good about telling you where the problem is. Or, try comparing your code to code that works and see if you can spot the difference.
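
To get a feel for what those messages look like, here is a minimal sketch. It uses only base R, and the names broken_code and message_text are made up for illustration. The broken line is stored as text and checked with parse(), so the example itself runs without stopping.

```r
# A line of code with a missing closing parenthesis, stored as text
broken_code <- "mean(c(1, 2, 3)"

# parse() checks the syntax without running the code;
# tryCatch() captures the error message instead of stopping the script
message_text <- tryCatch(
  {
    parse(text = broken_code)
    "No syntax problems found"
  },
  error = function(e) conditionMessage(e)
)

# Show the message R produced
cat(message_text, "\n")
```

The message mentions where in the text R gave up, which is usually at or just after the real mistake.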

Sometimes problems can get complicated. If so, you can try copying the error or warning message into a Google search. A lot of times you’ll come across the answer, because you’re not the only person to have had this problem. You can also contact me.

And sometimes, it’s really complicated. All these files were created using R version 3.3.3 and RStudio 1.0.143. If you’re running different versions, that shouldn’t be an issue until it is (welcome to coding).
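
If you suspect a version difference, a quick check like the one below may help. This is a sketch using base R only; 3.3.3 is the version named above, the name same_or_newer is made up for illustration, and the comparison is not a requirement of these guides.

```r
# Show the version of R you are running
print(R.version.string)

# TRUE if your R is at least the version these files were created with
same_or_newer <- getRversion() >= "3.3.3"
print(same_or_newer)

# List your R version, operating system and loaded packages;
# handy to include when asking someone for help
print(sessionInfo())
```

In RStudio, the version number appears under Help > About RStudio.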

How these guides are structured

These guides assume you’ve installed R and RStudio, and have a little experience working with R.

  • 00_Quickstart is just that - it’ll get you up and running with R and give you some familiarity working with it through RStudio.

  • From 01 to 06, we look at R’s basic plotting tools. For exploring data yourself, or just for practice, these are good to know. However, their design isn’t great for print or online, and it takes some work to get a decent graphic out of them.

  • At 07, we start working with ggplot2, the R graphics library that takes a sophisticated approach to graphics. 08_qplot covers qplot, which is part of ggplot2 - q stands for quick, and it’s meant to create quick visualizations that help you understand the data. qplot is an excellent way of quickly looking at your data, and it’s a lot easier to use than the basic plots. Along with ggplot2, qplot should replace all those basic plots.

  • From 08 through 14, we learn to create the most common graphic types - bar charts, stacked bar charts, grouped bar charts, line and area charts. We also learn how to create color and black-and-white PDFs suitable for print publication and color PNG files for online.

  • 15 demonstrates how to explore relationships between data using qplot scatter plots and, finally, how to create a publication-ready scatter plot that shows what we found.

  • Finally, M01 is a link to another guide I created on how we examined cyclist and pedestrian crashes in the six-county area. It’s in four parts, with the last two on how we used mapping in R to examine the data.

I may add more tutorials at a later date - for instance looking at what happens if something goes wrong.

Final thoughts

Open source and fair use

The open-source ecosystem is pretty unique. Millions of people (not an exaggeration) participate in building and maintaining software that’s free to use and share. In some cases, we’ve used some of their code in our scripts. That’s OK; it’s how open-source coding is supposed to work.

For instance, say you want to load in a comma-separated file. You search the internet and find a chunk of code someone posted:

library(readr)  # read_csv() comes from the readr package
df <- read_csv("filename.csv")

It’s fine to use a short, common line of code like this - no one owns it. And part of the open-source ethos is sharing knowledge so that others can learn. People do that because others did it for them.

However, taking someone’s data and their script, running it to produce their work and then using their result in your work - that could be a problem.

But if you’re just looking to learn and improve your own skills, don’t be afraid to look at other people’s work.