R is a programming language focused on providing tools for statisticians working with data. Its use has grown in recent years, which is easily explained by the falling cost of generating, accessing and maintaining datasets.
Using R, data scientists can concentrate on applying statistical techniques to their data while other tools and packages do the “dirty work”, such as reading and interpreting information from data sources, or generating deliverables in the form of graphics, reports, webpages and presentations.
There are more than six reasons why you should start using R, but these are the most relevant and impressive ones for new users:
- The community – R is an open source programming language, and it supports packages developed by its users. Moreover, its community has grown in recent years, and it is extremely easy to find online resources such as forums, blog posts and online courses (Continue reading about the community of R…).
- The IDE – Some years ago, R had only a shell interface, which had to be used to run scripts, load libraries, debug functions, etc. Nowadays, it is possible to use RStudio: a free IDE with powerful tools to visualize, edit and debug your code.
- Data manipulation – Using either the base functions or packages that can be found online, it is possible to access external databases and read several types of files, such as CSV, JSON, XML, XLS, HTML, etc. Furthermore, there are functions to easily merge datasets, extract information and apply changes to groups of elements, according to user-defined classification rules.
- Data visualization – Apart from the traditional plots that can be made using other tools (such as MATLAB and Gnuplot), some packages in R produce professional-looking graphics. There are also functions that give an impressive overview of the data, and others that let you play with your plot before generating the final version.
- Predictions – Since R is focused on statistics, it has some built-in options for making predictions. However, there is a dedicated package (called caret) that makes this as simple as possible by providing a single interface to several prediction methods. The same package provides helpers to prepare the data, to apply well-known techniques (such as cross-validation) and to compare the results of the predictions.
- Dissemination of the results – R supports Markdown, which is also used on other platforms such as GitHub. With it, it is possible to generate reports (in PDF, HTML, DOC), papers and also presentation slides. Beyond that, it is also possible to create (and share) interactive data applications to display your results.
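
To make the data manipulation and prediction points a little more concrete, here is a minimal sketch that uses only base R. The two data frames and their column names are invented for illustration, and `lm` is used as a simple stand-in for a prediction method; it is not the caret interface mentioned above.

```r
# Two small hypothetical datasets (values invented for this example).
people <- data.frame(
  id    = c(1, 2, 3, 4),
  group = c("a", "a", "b", "b")
)
scores <- data.frame(
  id    = c(1, 2, 3, 4),
  hours = c(1, 2, 3, 4),
  score = c(52, 54, 56, 58)
)

# Merge the two datasets on their shared key column.
combined <- merge(people, scores, by = "id")

# Apply a summary (the mean score) to each group of rows.
group_means <- aggregate(score ~ group, data = combined, FUN = mean)

# Fit a simple linear model and predict the score for an unseen value.
fit <- lm(score ~ hours, data = combined)
predicted <- predict(fit, newdata = data.frame(hours = 5))

print(group_means)
print(predicted)
```

The same steps could be done with add-on packages, but base R is enough to show the idea: `merge` joins datasets, `aggregate` works on groups, and `predict` applies a fitted model to new data.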