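Hello, this is 서한솔, a second-term master's student.
I'm uploading the code for computing TF-IDF that the professor mentioned in the last lecture.
If you copy it as is, run it, and then go through the code line by line,
you should be able to understand it without much difficulty.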
# Build the text vector (one string per document)
Example <- c("Neural Network emulates how the human brain works by having a network of neurons that are interconnected and sending stimulating signal to each other.",
"Support Vector Machine provides a binary classification mechanism based on finding a dividing hyperplane between a set of samples with +ve and -ve outputs.",
"From a probabilistic viewpoint, the predictive problem can be viewed as a conditional probability estimation; trying to find Y where P(Y | X) is maximized.",
"K Nearest neighbor is also called instance-based learning, in contrast to model-based learning, because it is not learning any model at all.")
# Build a DocumentTermMatrix using the 'create_matrix' function from the RTextTools package
# and weightTfIdf from the 'tm' package
library(RTextTools)  # provides create_matrix
dtmat <- create_matrix(Example, language = "english", removeNumbers = TRUE, removePunctuation = TRUE, stemWords = TRUE, weighting = tm::weightTfIdf)
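
To see what the TF-IDF weighting is doing, here is a minimal base-R sketch that computes it by hand. The toy corpus and the names `docs`, `tokens`, `vocab`, `counts`, `tf`, `idf`, and `tfidf` are my own for illustration and are not part of the code above. I am assuming the usual definition, which as far as I understand is also what `tm::weightTfIdf` uses by default: term frequency normalized by document length, multiplied by log2(number of documents / number of documents containing the term). There is no stemming or stopword removal here, so the numbers will not match `dtmat`.

```r
# Toy corpus (hypothetical, only for illustration)
docs <- c("neural network neurons",
          "support vector machine",
          "neural machine learning model")

# Split each document into lowercase tokens on whitespace
tokens <- strsplit(tolower(docs), "\\s+")
vocab  <- sort(unique(unlist(tokens)))

# Raw counts: one row per document, one column per term
counts <- t(vapply(tokens,
                   function(tk) as.numeric(table(factor(tk, levels = vocab))),
                   numeric(length(vocab))))
colnames(counts) <- vocab

# Term frequency, normalized by the length of each document
tf <- counts / rowSums(counts)

# Inverse document frequency: log2(N / number of documents containing the term)
idf <- log2(nrow(counts) / colSums(counts > 0))

# TF-IDF: scale each column of tf by that term's idf
tfidf <- sweep(tf, 2, idf, `*`)
print(round(tfidf, 3))
```

A term that appears in every document gets an idf of 0, and a term that appears in only one document gets the highest weight. To look at the weights of the actual matrix above, `tm::inspect(dtmat)` should print a summary of it.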