Korean DTM Package
Administrator 18-06-14 22:42
Attachment: makeDTM.zip (9.6K) DATE : 2018-06-14 22:42:43
With support from BK21, we have built and are releasing a package that creates a Document Term Matrix (DTM).

The tm package has the following problems.
- It can only be used by going through the Corpus type.
- Its user interface changes frequently.
- For Korean text, splitting into columns by whitespace does not work well.

We therefore built a package called makeDTM that is easy to use and handles Korean well.
Its main features are as follows.

- It reads a data frame directly and builds the DTM from it.
- You can specify keywords so that only the columns you need are created. (If none are specified, all words are selected.)

The steps for installing and using the makeDTM package are as follows.

1. To work with Korean text in R, first set Tools --> Global Options --> Code --> Saving --> Default text encoding to UTF-8 (this menu path is in RStudio). This is general practice for processing unstructured data.

2. Download the package from GitHub using devtools.
library(devtools)
install_github("caitechKHU/makeDTM")

# If the installation fails, download the attached file and install it directly into the R package path.
# You can find the R package paths with .libPaths(); unzip the file into the first path listed.
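
The manual fallback described above can be sketched in base R roughly as follows. This is only an illustration: the helper name unpack_into_library and the assumption that makeDTM.zip has been saved to the working directory are ours, and the sketch simply unzips whatever the archive contains into the first library path.

```r
# Hypothetical helper (not part of makeDTM): unzip a downloaded package
# archive into the first library path reported by .libPaths().
unpack_into_library <- function(zip_file = "makeDTM.zip", lib = .libPaths()[1]) {
  if (!file.exists(zip_file)) {
    message("Archive not found: ", zip_file)
    return(invisible(FALSE))
  }
  utils::unzip(zip_file, exdir = lib)
  invisible(TRUE)
}

.libPaths()            # list the library paths; the first one is the target
unpack_into_library()  # does nothing if makeDTM.zip is not in the working directory
```

If the unzip succeeds, library(makeDTM) should then find the package in that path.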

3. Set up the data frame so that it has a "TEXT" column holding the body text and a "LABEL" column holding the class. Having additional columns is not a problem.
Word columns are split on whitespace.
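
As a rough illustration of the layout described in step 3, the sketch below builds a small data frame by hand. The sentences, labels, and the extra ID column are toy values invented for this example, and the strsplit() call only shows what a whitespace split looks like; it is not how makeDTM is implemented.

```r
# Toy documents, invented for illustration only.
docs <- data.frame(
  TEXT  = c("오늘은 엑셀을 다시 배웠다",   # "Today I learned Excel again"
            "컴퓨터가 다시 느려졌다",      # "The computer got slow again"
            "오늘은 일기를 썼다"),         # "Today I wrote a diary entry"
  LABEL = c("work", "work", "diary"),
  ID    = 1:3,                             # extra columns are allowed
  stringsAsFactors = FALSE
)

# Each whitespace-separated token would become one candidate DTM column.
tokens <- strsplit(docs$TEXT, " ", fixed = TRUE)
tokens[[1]]
```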

4. Load the data frame described above.
You do not need to set the stringsAsFactors = FALSE option.
docs <- read.csv("sample.csv")     
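
A slightly more defensive version of the loading step might look like the sketch below. It writes a tiny CSV to a temporary directory so that it runs on its own; the file contents and the helper check_docs() are hypothetical and not part of makeDTM. The fileEncoding argument is standard read.csv() and tells R how the file on disk is encoded.

```r
# Build a tiny UTF-8 CSV in a temporary directory (toy data for illustration).
csv_path <- file.path(tempdir(), "sample.csv")
write.csv(
  data.frame(TEXT  = c("오늘은 일기를 썼다", "컴퓨터가 다시 느려졌다"),
             LABEL = c("diary", "work")),
  csv_path, row.names = FALSE, fileEncoding = "UTF-8"
)

# Read it back, stating the file encoding explicitly.
docs <- read.csv(csv_path, fileEncoding = "UTF-8")

# Hypothetical helper: confirm the required columns exist before building a DTM.
check_docs <- function(docs) {
  missing_cols <- setdiff(c("TEXT", "LABEL"), names(docs))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  docs$TEXT <- as.character(docs$TEXT)  # harmless if TEXT is already character
  docs
}

docs <- check_docs(docs)
str(docs)
```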

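5. Set the keywords.
If you do not specify keywords, every word becomes a keyword.
# The keywords are whitespace tokens, particles included: "Excel" (object), "computer" (subject), "diary" (object), "again", "today" (topic)
keyword <- c("엑셀을", "컴퓨터가", "일기를", "다시", "오늘은")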

6. Build the DTM.
With the LABEL = TRUE option the result is a data.frame. Without this option the result is a matrix.
For weight you can use "tf" or "tfidf".
library(makeDTM)
mydtm <- makeDTM(docs, key=keyword, LABEL = TRUE, weight = "tfidf")

class(mydtm)     
mydtm
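
To make the "tf" and "tfidf" options more concrete, here is a minimal base R sketch of a whitespace-tokenized document-term matrix with an optional keyword filter. The function simple_dtm() is a hypothetical stand-in and not makeDTM's implementation; in particular, the tf-idf formula used here (tf * log(N / df)) is one common variant and may differ from what the package computes. The sample sentences are toy data.

```r
# Hypothetical stand-in for illustration; NOT the makeDTM implementation.
simple_dtm <- function(texts, key = NULL, weight = c("tf", "tfidf")) {
  weight <- match.arg(weight)
  # Split each document on single spaces and drop empty tokens.
  tokens <- strsplit(as.character(texts), " ", fixed = TRUE)
  tokens <- lapply(tokens, function(tk) tk[nzchar(tk)])
  # With no keywords given, use every distinct word as a column.
  if (is.null(key)) key <- sort(unique(unlist(tokens)))
  # Count how often each keyword occurs in each document (term frequency).
  counts <- lapply(tokens, function(tk) as.numeric(table(factor(tk, levels = key))))
  tf <- matrix(unlist(counts), nrow = length(texts), byrow = TRUE,
               dimnames = list(NULL, key))
  if (weight == "tf") return(tf)
  # Assumed tf-idf variant: tf * log(N / document frequency).
  df  <- colSums(tf > 0)
  idf <- ifelse(df > 0, log(nrow(tf) / df), 0)
  sweep(tf, 2, idf, `*`)
}

texts   <- c("오늘은 엑셀을 다시 배웠다", "컴퓨터가 다시 느려졌다", "오늘은 일기를 썼다")
keyword <- c("엑셀을", "컴퓨터가", "일기를", "다시", "오늘은")

tf    <- simple_dtm(texts, key = keyword)                    # raw counts
tfidf <- simple_dtm(texts, key = keyword, weight = "tfidf")  # weighted counts
tf
tfidf
```

In this sketch, words that appear in fewer documents get a larger weight under "tfidf", while "tf" keeps the plain counts.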

Because columns are split on whitespace, we recommend running morphological analysis on the TEXT column first and then calling makeDTM().
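
One way to follow this recommendation is to replace each TEXT entry with its morphemes joined by single spaces before building the DTM. The sketch below assumes the morphemes are already available as a list with one character vector per document; the values here are hand-written for illustration and do not come from any particular analyzer.

```r
# Toy documents, invented for illustration only.
docs <- data.frame(
  TEXT  = c("오늘은 엑셀을 다시 배웠다", "컴퓨터가 다시 느려졌다"),
  LABEL = c("work", "work"),
  stringsAsFactors = FALSE
)

# Hypothetical analyzer output: one character vector of morphemes per document.
morphemes <- list(
  c("오늘", "엑셀", "다시", "배우다"),
  c("컴퓨터", "다시", "느려지다")
)

# Join the morphemes with spaces so a whitespace split yields one column per morpheme.
docs$TEXT <- vapply(morphemes, paste, character(1), collapse = " ")
docs$TEXT
```

After this step the keywords would be given in their base forms (for example "엑셀" rather than "엑셀을").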

In summary, the core process consists of the following two steps.
1) keyword <- c("엑셀을", "컴퓨터가", "일기를", "다시", "오늘은")
2) mydtm <- makeDTM(docs, key=keyword, LABEL = TRUE, weight = "tfidf")


                               * * *
A few other usage examples are shown below.
# Pass the keywords directly in the argument
mydtm <- makeDTM(docs, key=c("엑셀을", "컴퓨터가", "일기를", "다시", "오늘은"), LABEL = TRUE, weight = "tfidf")

# Use the default weight (tf)
mydtm <- makeDTM(docs, key=keyword, LABEL = TRUE) 

# Use the defaults for everything (keyword: all words, no LABEL, weight=tf)
mydtm <- makeDTM(docs) 


*** This program was supported by the BK21 program of the School of Management, Kyung Hee University (team for training management research specialists based on data science) ***