Select the most frequently occurring value per group

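I have RDF data on hospital patients, including their birth dates. There are often several birth-date triples per patient, and some of them may be wrong. My group has decided to use this rule: whichever date occurs most often is tentatively considered correct. It's clear how to do this in any programming language of our choosing, outside of SPARQL.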
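For comparison, the rule itself is simple to apply outside SPARQL. A minimal Python sketch, assuming the (participant, date) pairs have already been pulled out of the store; the names `observations` and `most_frequent_date` are hypothetical and not part of any library:

```python
from collections import Counter, defaultdict

# (participant, birth date) pairs taken from the example data in this question;
# the intermediate birth-date node IRIs are dropped because only the values matter.
observations = [
    ("turbo:b6be95364ec943af2ef4ab161c11c855", "1971-12-30"),
    ("turbo:b6be95364ec943af2ef4ab161c11c855", "1971-12-30"),
    ("turbo:b6be95364ec943af2ef4ab161c11c855", "1971-12-30"),
    ("turbo:6e200ca0d5150282787464a2bda55814", "2000-04-04"),
    ("turbo:6e200ca0d5150282787464a2bda55814", "2000-04-04"),
    ("turbo:6e200ca0d5150282787464a2bda55814", "2000-04-05"),
]


def most_frequent_date(pairs):
    """Return {participant: date}, keeping the most frequent date per participant.

    Counter.most_common orders equal counts by first occurrence, so a tie
    is resolved in favour of the date seen first (an arbitrary choice).
    """
    per_part = defaultdict(Counter)
    for part, date in pairs:
        per_part[part][date] += 1
    return {part: counts.most_common(1)[0][0] for part, counts in per_part.items()}


print(most_frequent_date(observations))
```

How ties should be resolved is not specified in the question; this sketch simply keeps the first date seen.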

Is an aggregation of aggregations possible in SPARQL?

I've read the similar question SPARQL selecting MAX value of a counter, but I'm not there yet.

Given these triples:

@prefix turbo: <http://example.org/ontologies/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/ontologies/b6be95364ec943af2ef4ab161c11c855>
    a <http://example.org/ontologies/StudyPartWithBBDonation> ;
    turbo:hasBirthDateO turbo:3950b2b6-f575-4074-b0e8-f9fa3378f3be ,
        turbo:4250aafa-4b0c-4f73-92b6-7639f427b61d ,
        turbo:a3e6676e-a214-4af4-b8ef-34a8e20170bf .
turbo:3950b2b6-f575-4074-b0e8-f9fa3378f3be turbo:hasDateValue "1971-12-30"^^xsd:date .
turbo:4250aafa-4b0c-4f73-92b6-7639f427b61d turbo:hasDateValue "1971-12-30"^^xsd:date .
turbo:a3e6676e-a214-4af4-b8ef-34a8e20170bf turbo:hasDateValue "1971-12-30"^^xsd:date .

turbo:6e200ca0d5150282787464a2bda55814
    a turbo:StudyPartWithBBDonation ;
    turbo:hasBirthDateO turbo:b09519f5-b123-40d5-bb4a-737ec9f8b9a8 ,
        turbo:06c56881-a6c7-4d1d-993b-add8862dffd7 ,
        turbo:12ef184d-c8d6-4d93-a558-a3ba47bb56ca .
turbo:b09519f5-b123-40d5-bb4a-737ec9f8b9a8 turbo:hasDateValue "2000-04-04"^^xsd:date .
turbo:06c56881-a6c7-4d1d-993b-add8862dffd7 turbo:hasDateValue "2000-04-04"^^xsd:date .
turbo:12ef184d-c8d6-4d93-a558-a3ba47bb56ca turbo:hasDateValue "2000-04-05"^^xsd:date .

this query

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX turbo: <http://example.org/ontologies/>

SELECT ?part ?xsddate (COUNT(?xsddate) AS ?datecount)
WHERE
    { ?part rdf:type turbo:StudyPartWithBBDonation ;
        turbo:hasBirthDateO ?dob .
      ?dob turbo:hasDateValue ?xsddate
    }
GROUP BY ?part ?xsddate

gives the following counts for each patient participating in the study:

+----------------------------------------+------------------------+------------------+
| part                                   | xsddate                | datecount        |
+----------------------------------------+------------------------+------------------+
| turbo:6e200ca0d5150282787464a2bda55814 | "2000-04-05"^^xsd:date | "1"^^xsd:integer |
| turbo:b6be95364ec943af2ef4ab161c11c855 | "1971-12-30"^^xsd:date | "3"^^xsd:integer |
| turbo:6e200ca0d5150282787464a2bda55814 | "2000-04-04"^^xsd:date | "2"^^xsd:integer |
+----------------------------------------+------------------------+------------------+

I'd like to see only the highest-count date for each patient:

+----------------------------------------+------------------------+------------------+
| part                                   | xsddate                | datecount        |
+----------------------------------------+------------------------+------------------+
| turbo:b6be95364ec943af2ef4ab161c11c855 | "1971-12-30"^^xsd:date | "3"^^xsd:integer |
| turbo:6e200ca0d5150282787464a2bda55814 | "2000-04-04"^^xsd:date | "2"^^xsd:integer |
+----------------------------------------+------------------------+------------------+
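The first level of the aggregation, GROUP BY ?part ?xsddate with COUNT(?xsddate), can be mimicked in plain Python to check the counts by hand. This is only a sketch over the example triples, hard-coded as Python data structures with hypothetical names; it is not a SPARQL evaluator:

```python
from collections import Counter

P1 = "turbo:b6be95364ec943af2ef4ab161c11c855"
P2 = "turbo:6e200ca0d5150282787464a2bda55814"

# ?part rdf:type turbo:StudyPartWithBBDonation
study_parts = {P1, P2}

# ?part turbo:hasBirthDateO ?dob  (participant -> birth-date node)
has_birth_date = [
    (P1, "turbo:3950b2b6-f575-4074-b0e8-f9fa3378f3be"),
    (P1, "turbo:4250aafa-4b0c-4f73-92b6-7639f427b61d"),
    (P1, "turbo:a3e6676e-a214-4af4-b8ef-34a8e20170bf"),
    (P2, "turbo:b09519f5-b123-40d5-bb4a-737ec9f8b9a8"),
    (P2, "turbo:06c56881-a6c7-4d1d-993b-add8862dffd7"),
    (P2, "turbo:12ef184d-c8d6-4d93-a558-a3ba47bb56ca"),
]

# ?dob turbo:hasDateValue ?xsddate  (each node has exactly one value in this example)
has_date_value = {
    "turbo:3950b2b6-f575-4074-b0e8-f9fa3378f3be": "1971-12-30",
    "turbo:4250aafa-4b0c-4f73-92b6-7639f427b61d": "1971-12-30",
    "turbo:a3e6676e-a214-4af4-b8ef-34a8e20170bf": "1971-12-30",
    "turbo:b09519f5-b123-40d5-bb4a-737ec9f8b9a8": "2000-04-04",
    "turbo:06c56881-a6c7-4d1d-993b-add8862dffd7": "2000-04-04",
    "turbo:12ef184d-c8d6-4d93-a558-a3ba47bb56ca": "2000-04-05",
}


def count_per_part_and_date():
    """Follow ?part -> ?dob -> ?xsddate, then GROUP BY ?part ?xsddate with COUNT."""
    counts = Counter(
        (part, has_date_value[dob])
        for part, dob in has_birth_date
        if part in study_parts and dob in has_date_value
    )
    return sorted((part, date, n) for (part, date), n in counts.items())


for row in count_per_part_and_date():
    print(row)
```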

I think I'm getting close here. Now I need to get the count and the max count on the same row!

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX turbo: <http://example.org/ontologies/> 

SELECT ?part ?xsddate ?datecount ?countmax 
WHERE 
    { { SELECT ?part ?xsddate (COUNT(?xsddate) AS ?datecount) 
     WHERE 
      { ?part rdf:type    turbo:StudyPartWithBBDonation ; 
        turbo:hasBirthDateO ?dob . 
      ?dob turbo:hasDateValue ?xsddate 
      } 
     GROUP BY ?part ?xsddate 
     } 
    UNION 
     { SELECT ?part (MAX(?datecount) AS ?countmax) 
     WHERE 
      { SELECT ?part ?xsddate (COUNT(?xsddate) AS ?datecount) 
      WHERE 
       { ?part rdf:type    turbo:StudyPartWithBBDonation ; 
         turbo:hasBirthDateO ?dob . 
       ?dob turbo:hasDateValue ?xsddate 
       } 
      GROUP BY ?part ?xsddate 
      } 
     GROUP BY ?part 
     } 
    }

which basically gives:

+----------------------------------------+------------------------+------------------+------------------+
| part                                   | xsddate                | datecount        | countmax         |
+----------------------------------------+------------------------+------------------+------------------+
| turbo:6e200ca0d5150282787464a2bda55814 | "2000-04-05"^^xsd:date | "1"^^xsd:integer |                  |
| turbo:b6be95364ec943af2ef4ab161c11c855 | "1971-12-30"^^xsd:date | "3"^^xsd:integer |                  |
| turbo:6e200ca0d5150282787464a2bda55814 | "2000-04-04"^^xsd:date | "2"^^xsd:integer |                  |
| turbo:6e200ca0d5150282787464a2bda55814 |                        |                  | "2"^^xsd:integer |
| turbo:b6be95364ec943af2ef4ab161c11c855 |                        |                  | "3"^^xsd:integer |
+----------------------------------------+------------------------+------------------+------------------+
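The empty cells come from UNION's semantics: it concatenates the solutions of its two sides without merging them, whereas two group patterns placed side by side (or separated by .) are joined, merging every pair of solutions that agree on their shared variables, here ?part. A simplified Python model of the two operators, where each solution is a dict and an absent key means "unbound" (the function names are hypothetical, not from any SPARQL library):

```python
P1 = "turbo:b6be95364ec943af2ef4ab161c11c855"
P2 = "turbo:6e200ca0d5150282787464a2bda55814"

# Solutions of the first subquery (GROUP BY ?part ?xsddate)
counts = [
    {"part": P2, "xsddate": "2000-04-05", "datecount": 1},
    {"part": P1, "xsddate": "1971-12-30", "datecount": 3},
    {"part": P2, "xsddate": "2000-04-04", "datecount": 2},
]
# Solutions of the second subquery (MAX per ?part)
maxima = [
    {"part": P2, "countmax": 2},
    {"part": P1, "countmax": 3},
]


def union(left, right):
    """UNION: every solution from both sides, nothing merged."""
    return left + right


def compatible(a, b):
    """Two solutions are compatible when every variable they share has the same value."""
    return all(a[var] == b[var] for var in a.keys() & b.keys())


def join(left, right):
    """Join: merge each compatible pair of solutions into one row."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


for row in union(counts, maxima):
    print(row)
print("---")
for row in join(counts, maxima):
    print(row)
```

In the joined rows, keeping only those where datecount equals countmax leaves exactly the highest-count date per participant.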


2017-09-07 Mark Miller

I've updated my answer. If you use Blazegraph, you can use named subqueries. –

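In GraphDB, you just need to replace UNION with . in your query (or simply drop the UNION, as @AKSW pointed out in a comment below). You must also rename ?datecount in one of the subqueries (here it becomes ?datecount_ in the first one); otherwise GraphDB reports this error:

Variable ?datecount is already used in a previous projection. Bindings are not propagated through projections since Sesame 2.8, so this may lead to logical errors in the query.

So change your query this way: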

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX turbo: <http://example.org/ontologies/> 

SELECT ?part ?xsddate ?datecount_ ?countmax 
WHERE 
    { { SELECT ?part ?xsddate (COUNT(?xsddate) AS ?datecount_) 
     WHERE 
      { ?part rdf:type    turbo:StudyPartWithBBDonation ; 
        turbo:hasBirthDateO ?dob . 
      ?dob turbo:hasDateValue ?xsddate 
      } 
     GROUP BY ?part ?xsddate 
     } 
     . 
     { SELECT ?part (MAX(?datecount) AS ?countmax) 
     WHERE 
      { SELECT ?part ?xsddate (COUNT(?xsddate) AS ?datecount) 
      WHERE 
       { ?part rdf:type    turbo:StudyPartWithBBDonation ; 
         turbo:hasBirthDateO ?dob . 
       ?dob turbo:hasDateValue ?xsddate 
       } 
      GROUP BY ?part ?xsddate 
      } 
     GROUP BY ?part 
     } 
    }

In Blazegraph, however, you can use named subqueries:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX turbo: <http://example.org/ontologies/> 

SELECT ?part ?xsddate ?datecount ?countmax 

WITH 
    { SELECT ?part ?xsddate (COUNT(?xsddate) AS ?datecount) 
     WHERE 
     { ?part rdf:type    turbo:StudyPartWithBBDonation ; 
       turbo:hasBirthDateO ?dob . 
      ?dob turbo:hasDateValue ?xsddate 
     } 
     GROUP BY ?part ?xsddate 
    } AS %sub 

WHERE 
    { { SELECT ?part (MAX(?datecount) AS ?countmax) 
     WHERE { INCLUDE %sub } GROUP BY ?part 
    } 
     INCLUDE %sub 
    }
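The point of the named subquery is that the grouped counts are defined once and used twice: once to compute the per-participant maximum and once to join against it. As we understand Blazegraph's named-subquery feature, %sub is evaluated a single time and its solutions are reused by each INCLUDE, while in the GraphDB version the same subquery text simply appears twice. A Python sketch of that data flow, with a hypothetical evaluation counter standing in for the engine:

```python
from collections import Counter

P1 = "turbo:b6be95364ec943af2ef4ab161c11c855"
P2 = "turbo:6e200ca0d5150282787464a2bda55814"

# (participant, birth date) pairs from the example data
PAIRS = [(P1, "1971-12-30")] * 3 + [(P2, "2000-04-04")] * 2 + [(P2, "2000-04-05")]

evaluations = 0  # how many times the grouped subquery body has run


def grouped_counts():
    """Body of the named subquery: GROUP BY ?part ?xsddate with COUNT(?xsddate)."""
    global evaluations
    evaluations += 1
    return [
        {"part": p, "xsddate": d, "datecount": n}
        for (p, d), n in Counter(PAIRS).items()
    ]


# WITH { ... } AS %sub: evaluate once and keep the solutions
sub = grouped_counts()

# { SELECT ?part (MAX(?datecount) AS ?countmax) WHERE { INCLUDE %sub } GROUP BY ?part }
countmax = {}
for row in sub:
    countmax[row["part"]] = max(countmax.get(row["part"], 0), row["datecount"])

# INCLUDE %sub again and join it with the maxima on the shared variable ?part
rows = [{**row, "countmax": countmax[row["part"]]} for row in sub]

print(evaluations)
for row in rows:
    print(row)
```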


2017-09-07 16:59:24

Thanks! I've written my own answer to show how the conclusions can be written back into a named graph within the triplestore. We may end up doing this outside of SPARQL anyway, but I'm really glad we now have this option. –

Why do you need the dot here? As far as I know it's unnecessary, since the default operation between two GroupGraphPatterns is a join. Cheers. – AKSW

@AKSW, you're right, the dot is unnecessary. But the informal "join" semantics of '.' seemed relevant here, and that is what caused my mistake. –

My elaboration on Stanislav's awesome answer: I renamed ?datecount in one of the {} patterns, added a FILTER, and inserted the consensus DOB into a named graph within the triplestore.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX turbo: <http://example.org/ontologies/>

INSERT
    { GRAPH turbo:DOB_conclusions
        { ?part turbo:hasBirthDateO ?DOBconc .
          ?DOBconc turbo:hasDateValue ?xsddate .
          ?DOBconc turbo:conclusionated true .
          ?DOBconc rdf:type <http://www.ebi.ac.uk/efo/EFO_0004950> .
        }
    }
WHERE
    { { SELECT ?part ?xsddate (COUNT(?xsddate) AS ?datecount)
        WHERE
          { ?part rdf:type turbo:StudyPartWithBBDonation ;
              turbo:hasBirthDateO ?dob .
            ?dob turbo:hasDateValue ?xsddate
          }
        GROUP BY ?part ?xsddate
      }
      .
      { SELECT ?part (MAX(?datecount2) AS ?countmax)
        WHERE
          { SELECT ?part ?xsddate (COUNT(?xsddate) AS ?datecount2)
            WHERE
              { ?part rdf:type turbo:StudyPartWithBBDonation ;
                  turbo:hasBirthDateO ?dob .
                ?dob turbo:hasDateValue ?xsddate
              }
            GROUP BY ?part ?xsddate
          }
        GROUP BY ?part
      }
      FILTER (?datecount = ?countmax)
      BIND(uri(concat("http://transformunify.org/ontologies/", struuid())) AS ?DOBconc)
    }
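What this INSERT computes can be traced with a stdlib-only Python sketch: count per (participant, date), take the per-participant maximum, keep the rows where the two are equal, and mint a fresh IRI per kept row. Assumptions: `uuid.uuid4()` stands in for STRUUID(), the resulting quads are returned as tuples instead of being written to a triplestore, and `consensus_quads` is a hypothetical name.

```python
import uuid
from collections import Counter

P1 = "turbo:b6be95364ec943af2ef4ab161c11c855"
P2 = "turbo:6e200ca0d5150282787464a2bda55814"
PAIRS = [(P1, "1971-12-30")] * 3 + [(P2, "2000-04-04")] * 2 + [(P2, "2000-04-05")]

GRAPH = "turbo:DOB_conclusions"
BASE = "http://transformunify.org/ontologies/"


def consensus_quads(pairs):
    counts = Counter(pairs)  # (?part, ?xsddate) -> ?datecount
    countmax = {}            # ?part -> ?countmax
    for (part, _), n in counts.items():
        countmax[part] = max(countmax.get(part, 0), n)

    quads = []
    for (part, date), n in counts.items():
        if n != countmax[part]:  # FILTER (?datecount = ?countmax)
            continue
        # BIND(uri(concat(BASE, struuid())) AS ?DOBconc)
        conc = f"<{BASE}{uuid.uuid4()}>"
        quads += [
            (GRAPH, part, "turbo:hasBirthDateO", conc),
            (GRAPH, conc, "turbo:hasDateValue", f'"{date}"^^xsd:date'),
            (GRAPH, conc, "turbo:conclusionated", "true"),
            (GRAPH, conc, "rdf:type", "<http://www.ebi.ac.uk/efo/EFO_0004950>"),
        ]
    return quads


for quad in consensus_quads(PAIRS):
    print(quad)
```

Because the FILTER only compares each count with the maximum, two dates that tie for one participant would both pass and produce two consensus nodes; the sketch behaves the same way.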


2017-09-07 17:05:19
