Here I present a function I wrote to count the lines and words in a text document and return them as a table. It uses the wc function from the “qdap” package in R, along with the base R functions sum, nrow, as.numeric, as.data.frame and cbind.
The Problem:
How can you find both the number of lines and the number of words in a potentially large document using R, and return them as a table?
The Solution:
First, install and load the qdap package:
[code lang="r"]install.packages("qdap"); library(qdap)[/code]
Load the text document:
[code lang="r"]doc = readLines("doc.txt", ok = TRUE)[/code]
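If you do not have a large document to hand, a small throwaway file is enough to try out the steps. The sketch below is only an illustration: the file name, its contents and the variable names sample_path and sample_doc are made up for this example and are not part of the original workflow.

[code lang="r"]
# Hypothetical sample file for experimenting; the path and contents are made up.
sample_path <- file.path(tempdir(), "sample_doc.txt")
writeLines(c("The quick brown fox", "jumps over", "", "the lazy dog"), sample_path)

# readLines returns one character string per line, blank lines included.
sample_doc <- readLines(sample_path)
length(sample_doc) # 4
[/code]

Each element of the resulting character vector is one line of the file, which is why the number of rows can later be used as the line count.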
Read in the “WordsLines” function:
[code lang="r"]
WordsLines = function(dataframe, names1, names2){
  Words = as.data.frame(dataframe, stringsAsFactors = FALSE) #the input is a character vector of lines, so put it into a one-column dataframe
  Wc = wc(Words[,1]) #get the word count of each row (line) of the first column
  Words1 = as.data.frame(Wc) #put those word counts into a dataframe
  Words1$Wc = as.numeric(Words1$Wc) #make sure the counts are numeric
  names(Words1)[1] = "Words" #change the column name to "Words"
  Words1 = sum(Words1, na.rm = TRUE) #sum the word counts of the entire column, ignoring any NA counts
  Lines = nrow(Words) #find the number of lines (rows) in the dataframe
  final = cbind(Lines, Words1) #combine the line count and word count into one table
  colnames(final) = c(names1, names2) #rename the columns to fit the particular dataset
  final #return the table
}
[/code]
Call the function:
[code lang="r"]WordsLines(doc, "Doc Lines", "Doc Words")[/code]
It should return something like this:
[code lang="r"]
     Doc Lines Doc Words
[1,]   1010242  33482314
[/code]
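As a rough sanity check that does not depend on qdap, the same two counts can be approximated with base R alone. This is a minimal sketch, not the method used above: the function name count_lines_words is hypothetical, and it treats any run of non-whitespace characters as a word, so its total may differ from what wc reports on text with punctuation or unusual tokens.

[code lang="r"]
# Base-R approximation of the line and word counts (not qdap's wc):
# a "word" here is any run of non-whitespace characters.
count_lines_words <- function(lines) {
  tokens <- strsplit(trimws(lines), "\\s+") # split each line on whitespace
  words_per_line <- vapply(tokens, function(x) sum(nzchar(x)), integer(1))
  cbind(Lines = length(lines), Words = sum(words_per_line))
}

# Four lines (one of them blank) containing nine words in total
count_lines_words(c("The quick brown fox", "jumps over", "", "the lazy dog"))
[/code]

If the two approaches give very different totals on the same document, that usually points to a difference in how words are defined rather than to an error in either count.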