Monday, 9 September 2013

Grouping words based on hierarchy in R

I would like to build a hierarchy from my vector of words, as in this example:
# Start (in reality these will not be right next to each other)
words <- c("hello-world", "hello", "string", "sub-string", "custom-fields",
           "custom", "hi-hat", "hat")
# Result
highlevel <- c("hello-world", "sub-string", "custom-fields", "hi-hat")
lowerlevel <- c("hello", "string", "custom", "hat")
In practice the vector will be very large, so I am looking for an efficient
way to group these words. If possible, I would also like each higher-level
word to be linked to its lower-level word. The goal is to search for the
higher-level words first and, when they are not found, to fall back to the
lower-level words.
Any ideas?
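
One possible starting point, sketched under assumptions that the question does not state: a higher-level word is taken to be any word containing a hyphen, and its lower-level words are the hyphen-separated parts that also occur on their own in the vector. The names links and match_with_fallback are hypothetical, introduced only for illustration. The grouping uses only vectorised base R calls (grepl, strsplit, %in%), which should scale reasonably well, though that has not been measured here.

```r
words <- c("hello-world", "hello", "string", "sub-string", "custom-fields",
           "custom", "hi-hat", "hat")

# Assumption: a higher-level word is any word that contains a hyphen.
is_high   <- grepl("-", words, fixed = TRUE)
highlevel <- words[is_high]

# Split each compound into its parts and record which compound each part
# came from. This data frame is the "link" between the two levels.
parts <- strsplit(highlevel, "-", fixed = TRUE)
links <- data.frame(
  high = rep(highlevel, lengths(parts)),
  low  = unlist(parts),
  stringsAsFactors = FALSE
)

# Keep only the parts that also appear as standalone words in the vector.
links      <- links[links$low %in% words[!is_high], ]
lowerlevel <- unique(links$low)

# Hypothetical helper: for a single string `text`, return for each link the
# higher-level word if it occurs in `text`, otherwise the lower-level word
# if that occurs, otherwise NA. Matching is plain substring matching.
match_with_fallback <- function(text, links) {
  has_high <- unname(vapply(links$high, grepl, logical(1), x = text, fixed = TRUE))
  has_low  <- unname(vapply(links$low,  grepl, logical(1), x = text, fixed = TRUE))
  ifelse(has_high, links$high, ifelse(has_low, links$low, NA_character_))
}

res <- match_with_fallback("a hi-hat and a string", links)
```

With the example vector, highlevel and lowerlevel come out as in the desired result above, and res holds "hi-hat" for the hi-hat/hat pair and "string" for the sub-string/string pair. Because the helper matches substrings, a lower-level word such as "hat" would also match inside unrelated longer words. Word-boundary matching would need a regular expression instead of fixed = TRUE.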
