Some other metrics
And, of course, we can use the standard data analysis tools as well after quantifying our package descriptions a bit. Let's see, for example, the length of the documents in the corpus:
> vnchar <- sapply(v, function(x) nchar(x$content))
> summary(vnchar)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   2.00   27.00   37.00   39.85   50.00  168.00
So, the average package description consists of around 40 characters, while there is a package with only two characters in the description. Well, two characters after removing numbers, punctuation, and the common words. To see which package has this very short description, we might simply call the which.min function:
> (vm <- which.min(vnchar))
[1] 221
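The same steps can be tried without the corpus at hand. The following is only a minimal sketch: the named vector descs and its four entries are made up for illustration and stand in for the cleaned package descriptions.

```r
# Hypothetical stand-in for the cleaned descriptions; not the real corpus
descs <- c(pkgA = "tools data analysis",
           pkgB = "fast",
           pkgC = "bayesian regression models",
           pkgD = "plot")

# Number of characters in each description (names are carried over)
lens <- sapply(descs, nchar)

# Minimum, quartiles, mean, and maximum of the lengths
summary(lens)

# Position of the shortest description
shortest <- which.min(lens)
shortest
descs[shortest]
```

Note that which.min returns only the first position at which the minimum occurs, so with ties (as between pkgB and pkgD here) the later ones are not reported.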
And this is what's strange about that document:
> v[[vm]]
<<PlainTextDocument (metadata: 7)>>
NA
> res[vm, ]
V1 V2
221 <NA>
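A length of two for a missing description may be explained by how nchar can treat NA: depending on the type of the NA and on the keepNA argument, the value is either counted as the two-character string "NA" or reported as a missing length. The sketch below assumes R 3.3.0 or later and uses made-up values (the contents vector is hypothetical) rather than the corpus itself.

```r
# Illustrative values only; these are not taken from the corpus

# A logical NA is coerced to the string "NA", which has two characters
len_logical_na <- nchar(NA)

# A missing string has a missing length by default
len_string_na <- nchar(NA_character_)

# With keepNA = FALSE the missing string is counted as "NA" again
len_counted_na <- nchar(NA_character_, keepNA = FALSE)

# Hypothetical stand-in for the extracted document contents
contents <- c("tools data analysis", NA, "bayesian models")

# Positions of the missing descriptions, found before measuring lengths
missing_idx <- which(is.na(contents))
missing_idx
```

Checking is.na on the content before computing lengths flags such entries directly, instead of letting them surface as suspiciously short documents.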
So, this is not a real package after all, but rather an empty row in the original table. Let's visually inspect the overall...