How to count by group in R

Counting by multiple groups (sometimes called crosstab reports) can be a useful way to look at data ranging from public opinion surveys to medical tests. For example, how did men and women vote by age group? How many software developers who use both R and Python are men vs. women?

There are a lot of ways to do this kind of counting by categories in R. Below, I'd like to share some of my favorites.
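As a baseline before any packages, base R's `table()` already produces simple counts and two-way crosstabs. The sketch below uses a tiny made-up data frame (`survey` is a hypothetical name, not the Stack Overflow data) to show a one-way count, a two-way crosstab, and the same crosstab converted to a long data frame.

```r
# Toy survey data, made up for illustration only
survey <- data.frame(
  Gender = c("Man", "Woman", "Man", "Man", "Woman"),
  LanguageGroup = c("R", "Python", "Both", "R", "Both"),
  stringsAsFactors = FALSE
)

# One-way count: respondents in each language group
table(survey$LanguageGroup)

# Two-way crosstab: language group (rows) by gender (columns)
crosstab <- with(survey, table(LanguageGroup, Gender))
crosstab

# Same counts as a long data frame: one row per combination, plus a Freq column
as.data.frame(crosstab, stringsAsFactors = FALSE)
```

Combinations that never occur (here, Python users who are men) still appear in the crosstab with a count of 0.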

For the demos in this article, I'll use a subset of the Stack Overflow Developer Survey, which asks developers about dozens of topics ranging from salaries to technologies used. I'll whittle it down to columns for languages used, gender, and whether they code as a hobby. I also added my own LanguageGroup column showing whether a developer reported using R, Python, both, or neither.
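A column like LanguageGroup can be derived from the language list in several ways. The sketch below is one possibility, not necessarily how it was built for this article. It assumes the raw LanguageWorkedWith values are semicolon-separated (as in the Stack Overflow survey files), and the helper name `make_language_group()` and the label "Both" are hypothetical choices. Splitting on the separator matters: a plain pattern search for "R" would also match languages such as "Rust" or "Ruby".

```r
# Hypothetical helper: classify each semicolon-separated language list
# as "R", "Python", "Both", or "Neither"
make_language_group <- function(langs) {
  tokens <- strsplit(langs, ";", fixed = TRUE)
  # Exact token matches, so "Rust" or "Ruby" do not count as R
  uses_r  <- vapply(tokens, function(x) "R" %in% x, logical(1))
  uses_py <- vapply(tokens, function(x) "Python" %in% x, logical(1))
  ifelse(uses_r & uses_py, "Both",
         ifelse(uses_r, "R",
                ifelse(uses_py, "Python", "Neither")))
}

make_language_group(c("HTML/CSS;Java;JavaScript;Python", "R;Python", "HTML/CSS", "R;SQL"))
```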

If you'd like to follow along, the last page of this article has instructions on how to download and wrangle the data to get the same data set I'm using.

The data has one row for each survey response, and the four columns are all character strings.

str(mydata)
'data.frame':	83379 obs. of  4 variables:
 $ Gender            : chr  "Man" "Man" "Man" "Man" ...
 $ LanguageWorkedWith: chr  "HTML/CSS;Java;JavaScript;Python" "C++;HTML/CSS;Python" "HTML/CSS" "C;C++;C#;Python;SQL" ...
 $ Hobbyist          : chr  "Yes" "No" "Yes" "No" ...
 $ LanguageGroup     : chr  "Python" "Python" "Neither" "Python" ...

I filtered the raw data to make the crosstabs more manageable, including removing missing values and keeping only the two largest gender categories, Man and Woman.
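One way to do that kind of filtering in base R is sketched below; it is not necessarily the exact code used for this article. The data frame `rawdata` and its rows are made up for illustration, with "Other" standing in for any gender category besides the two kept.

```r
# Made-up stand-in for the raw survey data
rawdata <- data.frame(
  Gender = c("Man", "Woman", NA, "Other", "Man"),
  LanguageWorkedWith = c("R;Python", "Python", "R", "SQL", NA),
  Hobbyist = c("Yes", "No", "Yes", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Drop any row that has a missing value in at least one column
filtered <- rawdata[complete.cases(rawdata), ]

# Keep only the two largest gender categories
filtered <- filtered[filtered$Gender %in% c("Man", "Woman"), ]
filtered
```

Only the first two rows survive: row 3 lacks a gender, row 5 lacks a language list, and row 4 falls outside the two kept categories.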