ggraph: ggplot for graphs

Seeing Theory: Learn Statistics through simulation

Give a talk about an application of R at EARL

Data scientist Giora Simchoni recently published a fantastic analysis of the history of pop songs on the Billboard Hot 100 using the R language. Giora used the rvest package in R to scrape data from the Ultimate Music Database site for the 350,000 chart entries (and 35,000 unique songs) since 1940, and used those data to create and visualize …

There are several tools available in R for creating animations (movies) from statistical graphics. The animation package by Yihui Xie will create an animated GIF or video file, using a series of R charts you generate as the frames. And the gganimate package by David Robinson is an extension to ggplot2 that will create a movie …

While Excel isn’t usually my tool of choice for manipulating or analyzing data (I prefer to use it as a data source for R), it has just learned a new trick that’s likely to prove useful from time to time. Extracting the useful information from complicated or inconsistent formats can often be a pain, but with …

The Consumer Data Research Centre, the UK-based organization that works with consumer-related organisations to open up their data resources, recently published a new course online: An Introduction to Spatial Data Analysis and Visualization in R. Created by James Cheshire (whose blog Spatial.ly regularly features interesting R-based data visualizations) and Guy Lansley, both of University College London Department of Geography, …

The new Visual Studio 2017 has built-in support for programming in R and Python. For older versions of Visual Studio, support for these languages has been available via the RTVS and PTVS add-ins, but the new Data Science Workloads in Visual Studio 2017 make them available without a separate add-in. Just choose the “Data Science and analytical applications” …

It’s well-known that the home team has an advantage in soccer (or football, as it’s called in England). But which teams have made the most of their home-field advantage over the years? Evolutionary biologist (and Liverpool fan) Joe Gallagher analyzed the percentage of points won in the UK Premier League (which awards 3 points for a win …

There’s a handy new function in R 3.4.0 for anyone interested in data about CRAN packages. It’s not documented, but it’s pretty simple: tools::CRAN_package_db() returns a data frame with one row for every package on CRAN and 65 columns of data on those packages, as shown below.

> names(tools::CRAN_package_db())
[1] "Package" "Version" "Priority"
[4] "Depends" …

Developer Q&A site Stack Overflow recently introduced Stack Overflow Trends, a useful tool for tracking the growth and decline in the rate of questions asked on various topics (by their Stack Overflow tag). For example, you can see that activity around both R and Python has been increasing over the past 8 years: As you’d …

Last week, my Microsoft colleagues Bharath Sankaranarayan and Carl Saroufim presented a live webinar showing how you can predict a patient’s length of stay at a hospital using SQL Server R Services. The recorded webinar is available for on-demand viewing now. (Registration is required to view.) The webinar is based on the Machine Learning Solution Template Predicting …

There’s a reason why data scientists spend so much time exploring data using graphics. Relying only on data summaries like means, variances, and correlations can be dangerous, because wildly different data sets can give similar results. This is a principle that has been demonstrated in statistics classes for decades with Anscombe’s Quartet: four scatterplots which despite being …
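The point about similar summaries can be checked directly in R. The following is a minimal sketch, not taken from the original post: it assumes the `anscombe` data frame that ships with base R's datasets package (columns `x1` to `x4` and `y1` to `y4`), and the names `stats` and `set1` to `set4` are chosen here purely for illustration.

```r
# Compare summary statistics across the four Anscombe data sets.
# Assumes the built-in `anscombe` data frame from the datasets package.
data(anscombe)

stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x),
    mean_y = mean(y),
    var_x  = var(x),
    var_y  = var(y),
    cor_xy = cor(x, y))
})
colnames(stats) <- paste0("set", 1:4)

# One column per data set; the rows should be nearly identical across columns
print(round(stats, 2))
```

Plotting each pair (for example with `plot(anscombe$x1, anscombe$y1)`) then shows how different the four data sets look despite these near-identical numbers.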
