How can we find support and confidence values for apriori that produce rules?

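I'm finding product associations in transaction data, using the arules package in R to create the rules. I have shared my sample data at this link: https://1drv.ms/u/s!Ak1rt2E1f2gFgV9t7hMVAn0P4gd0. How can we find support and confidence values for apriori that produce rules?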

library(arules) 
library(arulesViz) 
# one row per (bill number, item) pair
df = read.csv("trans.csv") 
# group the items of each bill into one transaction
trans = as(split(df[,"Item"], df[,"Billno"]), "transactions") 
inspect(trans[1:20]) 
summary(trans) 
rules1 = apriori(trans, parameter = list(support = 0.6, confidence = 0.6, 
                                         target = "rules")) 
summary(rules1) ## Output is "set of 0 rules"

I get "set of 0 rules" as the output of summary(rules1). I have shared my sample data above.

Before posting, I read https://stats.stackexchange.com/questions/56034/association-analysis-returns-0-useful-rules, and I tried various arbitrary values for support and confidence, but none of them produced any rules.


2017-04-24 mk11o5

You need to provide a reproducible example. See: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

Thanks, Michael. I have just shared my sample data. – mk11o5

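Finding the right minimum support and minimum confidence values, and ending up with 0 frequent itemsets or 0 association rules, is a very common problem. If you need to refresh what exactly support and confidence are, read this. First, let's look at the transaction data: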

summary(trans) 
transactions as itemMatrix in sparse format with 
2531 rows (elements/itemsets/transactions) and 
6632 columns (items) and a density of 0.0005951533 

most frequent items: 
AR845311 AR800369 AR828249 AR839869 AR831167  (Other) 
      84       35       31       29       24     9787 

element (itemset/transaction) length distribution: 
sizes 
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21 
767 509 306 238 160 112 100  52  69  50  31  27  18  12  13  15   9  10   7   5   4 
 23  24  25  27  28  32  34  36  48 
  3   4   2   3   1   1   1   1   1 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.000   2.000   3.947   5.000  48.000

The first problem to fix is the minimum support. According to the summary, the most frequent item (AR845311) occurs only 84 times in the data set. Your items generally have very low support:

summary(itemFrequency(trans)) 

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.0003951 0.0003951 0.0003951 0.0005952 0.0003951 0.0331900
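These values are relative supports: the fraction of transactions that contain each item. A minimal base-R sketch of that computation is below, run on hypothetical toy bills rather than the posted data, and the helper name `item_support` is only illustrative. It also shows why a minimum support of 0.6 cannot work here: an itemset can never be more frequent than any single item it contains, so if even the most frequent item falls below the threshold, apriori has nothing to build on.

```r
# Hypothetical toy bills (NOT the posted data): one character vector per bill.
trans_list <- list(
  b1 = c("A", "B"),
  b2 = c("A", "C"),
  b3 = c("B"),
  b4 = c("C", "D"),
  b5 = c("E")
)

# Relative support of each single item = (bills containing it) / (all bills).
# This is the quantity arules::itemFrequency() reports with type = "relative".
item_support <- function(tl) {
  items <- sort(unique(unlist(tl)))
  counts <- vapply(items,
                   function(i) sum(vapply(tl, function(t) i %in% t, logical(1))),
                   numeric(1))
  counts / length(tl)
}

supp <- item_support(trans_list)
print(supp)

# No itemset can have higher support than its most frequent member, so a
# minimum support above max(supp) cannot produce any frequent itemset or rule.
print(max(supp) >= 0.6)
```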

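You used a minimum support of 0.6, but the most frequent single item has a support of only 0.033! You need to lower the minimum support. To find itemsets/rules that occur at least 10 times in the data, set the minimum support to: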

10/length(trans) 

[1] 0.003951008
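The same conversion works in both directions, which makes it easy to sanity-check a threshold before calling apriori. This is a small sketch; `min_support_for_count` is a hypothetical helper, and n_trans = 2531 is the transaction count from summary(trans) above.

```r
# Relative minimum support that corresponds to "occurs in at least
# min_count transactions". The helper name is hypothetical.
min_support_for_count <- function(min_count, n_trans) {
  stopifnot(min_count >= 1, n_trans >= min_count)
  min_count / n_trans
}

n_trans <- 2531  # number of transactions reported by summary(trans)
s10 <- min_support_for_count(10, n_trans)
print(s10)

# Reverse direction: support = 0.6 would require an itemset to appear in at
# least this many transactions, far more than the 84 occurrences of the
# most frequent item.
needed <- ceiling(0.6 * n_trans)
print(needed)
```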

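The second problem is that the data are very sparse (the summary shows a density of about 0.0006), which means the transactions are rather short (i.e., they contain only a few items):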

table(size(trans)) 

  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21 
767 509 306 238 160 112 100  52  69  50  31  27  18  12  13  15   9  10   7   5   4 
 23  24  25  27  28  32  34  36  48 
  3   4   2   3   1   1   1   1   1
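The density reported by summary(trans) is the fraction of ones in the bill-by-item incidence matrix, i.e. the average transaction length divided by the number of distinct items. As a sanity check, this base-R sketch reproduces the posted numbers from the size distribution above; the counts are copied from the output, and the variable names are only illustrative.

```r
# Transaction-length distribution copied from the summary(trans) output above.
sizes  <- c(1:21, 23, 24, 25, 27, 28, 32, 34, 36, 48)
counts <- c(767, 509, 306, 238, 160, 112, 100, 52, 69, 50, 31, 27, 18, 12, 13,
            15, 9, 10, 7, 5, 4, 3, 4, 2, 3, 1, 1, 1, 1, 1)
n_items <- 6632  # number of distinct items (columns) reported by summary(trans)

n_trans     <- sum(counts)            # number of transactions (bills)
total_items <- sum(sizes * counts)    # total number of (bill, item) entries
mean_size   <- total_items / n_trans  # average number of items per bill
density     <- total_items / (n_trans * n_items)  # fraction of ones in the matrix

print(c(n_trans = n_trans, mean_size = mean_size, density = density))
```

With 6632 distinct items and fewer than four items per bill on average, any particular pair of items can only co-occur in a handful of bills.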

Short transactions mean that rules will have low confidence. Since we already know that your data are very sparse, we start with a minimum confidence of 0:
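To see why short transactions drag confidence down, recall that the confidence of lhs => rhs is supp(lhs and rhs together) / supp(lhs). The base-R sketch below computes both measures on hypothetical toy bills (not the posted data); `itemset_support` and `rule_confidence` are illustrative helpers, not arules functions.

```r
# Hypothetical toy bills (NOT the posted data); most of them are very short.
trans_list <- list(c("A", "B"), c("A"), c("A"), c("A", "C"), c("B"))

# Support of an itemset: fraction of bills that contain every item of it.
itemset_support <- function(tl, itemset) {
  mean(vapply(tl, function(t) all(itemset %in% t), logical(1)))
}

# Confidence of lhs => rhs: supp(lhs together with rhs) / supp(lhs).
rule_confidence <- function(tl, lhs, rhs) {
  itemset_support(tl, union(lhs, rhs)) / itemset_support(tl, lhs)
}

# "A" appears in 4 bills, but two of those bills contain nothing else,
# so only 1 of the 4 also contains "B".
print(itemset_support(trans_list, c("A", "B")))  # support of {A, B}
print(rule_confidence(trans_list, "A", "B"))     # confidence of {A} => {B}
print(rule_confidence(trans_list, "B", "A"))     # confidence of {B} => {A}
```

Note that confidence depends on the direction of the rule, while support does not.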

rules <- apriori(trans, 
    parameter = list(support = 0.004, confidence = 0, target = "rules")) 
Apriori 

Parameter specification: 
 confidence minval smax arem  aval originalSupport maxtime support minlen maxlen 
          0    0.1    1 none FALSE            TRUE       5   0.004      1     10 
 target   ext 
  rules FALSE 

Algorithmic control: 
 filter tree heap memopt load sort verbose 
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE 

Absolute minimum support count: 10 

set item appearances ...[0 item(s)] done [0.00s]. 
set transactions ...[6632 item(s), 2531 transaction(s)] done [0.00s]. 
sorting and recoding items ... [40 item(s)] done [0.00s]. 
creating transaction tree ... done [0.00s]. 
checking subsets of size 1 2 done [0.00s]. 
writing ... [46 rule(s)] done [0.00s]. 
creating S4 object ... done [0.00s]. 
summary(rules) 
set of 46 rules 

rule length distribution (lhs + rhs):sizes 
1 2 
40 6 

    Min. 1st Qu. Median Mean 3rd Qu. Max. 
    1.00 1.00 1.00 1.13 1.00 2.00 

summary of quality measures: 
    support           confidence            lift            count      
 Min.   :0.004346   Min.   :0.004346   Min.   : 1.000   Min.   :11.00  
 1st Qu.:0.004741   1st Qu.:0.004840   1st Qu.: 1.000   1st Qu.:12.00  
 Median :0.005531   Median :0.005729   Median : 1.000   Median :14.00  
 Mean   :0.006803   Mean   :0.057301   Mean   : 3.316   Mean   :17.22  
 3rd Qu.:0.007112   3rd Qu.:0.008890   3rd Qu.: 1.000   3rd Qu.:18.00  
 Max.   :0.033188   Max.   :0.705882   Max.   :21.269   Max.   :84.00  

mining info: 
  data ntransactions support confidence 
 trans          2531   0.004          0

The results show that at least one rule has a confidence of about 0.7, so you can rerun apriori with a higher minimum confidence. Here are the top rules by confidence:

inspect(head(rules, by = "confidence")) 
    lhs           rhs        support     confidence lift     count 
[1] {AR835501} => {AR845311} 0.004741209 0.7058824  21.26891 12 
[2] {AR743988} => {AR845311} 0.004346108 0.6470588  19.49650 11 
[3] {AR800369} => {AR845311} 0.007111814 0.5142857  15.49592 18 
[4] {AR845311} => {AR800369} 0.007111814 0.2142857  15.49592 18 
[5] {AR845311} => {AR835501} 0.004741209 0.1428571  21.26891 12 
[6] {AR845311} => {AR743988} 0.004346108 0.1309524  19.49650 11

A complete example of how to use association rule mining can be found here.
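As a check on the quality measures in the inspect() output, the numbers for rule [1], with X = {AR835501} and Y = {AR845311}, can be recomputed by hand. The pair count 12, the count 84 for AR845311 and N = 2531 come from the output; the count of 17 bills containing AR835501 is not printed but follows from the reported confidence (12 / 0.7058824 ≈ 17).

$$
\begin{aligned}
\operatorname{supp}(X \Rightarrow Y) &= \frac{\operatorname{count}(X \cup Y)}{N} = \frac{12}{2531} \approx 0.004741 \\
\operatorname{conf}(X \Rightarrow Y) &= \frac{\operatorname{count}(X \cup Y)}{\operatorname{count}(X)} = \frac{12}{17} \approx 0.7059 \\
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y)} = \frac{12/17}{84/2531} \approx 21.27
\end{aligned}
$$

The reverse rule [5], {AR845311} => {AR835501}, has the same support and lift, but its confidence is 12/84 ≈ 0.1429, because the denominator is now the 84 bills that contain AR845311.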

Hope this helps.


2017-04-26 15:05:53
