 
[Information Technology Application Research] Building a DocumentTermMatrix with TF-IDF
Admin 16-04-11 15:28 590
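Hello, this is Seo Hansol (서한솔), a second-cohort master's student.

I am uploading the code for computing TF-IDF that the professor mentioned in the last lecture.

Copy the code as-is and run it first, then go through it line by line; you should be able to follow it without much difficulty.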
 
 
 
# Build the text vector (one string per document)
Example<-c("Neural Network emulates how the human brain works by having a network of neurons that are interconnected and sending stimulating signal to each other.",
     "Support Vector Machine provides a binary classification mechanism based on finding a dividing hyperplane between a set of samples with +ve and -ve outputs.",
     "From a probabilistic viewpoint, the predictive problem can be viewed as a conditional probability estimation; trying to find Y where P(Y | X) is maximized.",
     "K Nearest neighbor is also called instance-based learning, in contrast to model-based learning, because it is not learning any model at all."
     )
 
library(tm)
library(RTextTools)
 
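# Build a DocumentTermMatrix with create_matrix() from the RTextTools package,
# weighted by weightTfIdf from the tm package
dtmat <- create_matrix(Example, language = "english", removeNumbers = TRUE,
                       removePunctuation = TRUE, stemWords = TRUE,
                       weighting = tm::weightTfIdf)
# Convert to a dense matrix and print it (rows = documents, columns = stemmed terms)
dtmat2 <- as.matrix(dtmat)
dtmat2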
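
To make the weighting step less of a black box, here is a minimal base-R sketch of what a TF-IDF weighting computes on a small made-up count matrix. It assumes the scheme that tm::weightTfIdf applies by default as I understand it (term counts normalized by document length, and IDF taken as log2 of the number of documents over the document frequency). The toy counts and the names counts, tf, idf, and tfidf are only for illustration and are not part of the code above.

```r
# Toy document-term count matrix (rows = documents, columns = terms).
# The counts are invented for illustration only.
counts <- matrix(c(2, 1, 0,
                   0, 1, 1,
                   0, 1, 3),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2", "doc3"),
                                 c("network", "learn", "model")))

# Term frequency: each count divided by the total term count of its document.
# (Assumption: this mirrors the default normalization in tm::weightTfIdf.)
tf <- counts / rowSums(counts)

# Inverse document frequency: log2(number of documents / documents containing the term).
idf <- log2(nrow(counts) / colSums(counts > 0))

# Scale every column of tf by the idf of its term.
tfidf <- sweep(tf, 2, idf, `*`)
round(tfidf, 3)
```

Note that "learn" occurs in every toy document, so its IDF is log2(3/3) = 0 and its weight is zero everywhere. This is why very common words end up with little or no weight in a TF-IDF matrix such as dtmat2.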