2017-01-14

Aggregating data while filtering on multiple customDimensions

The data coming from the Google Analytics raw-data implementation in BigQuery looks like this:

|- visitId
|- date
|- (....)
+- hits
    |- time
    |- page
    |   +- pagePath
    |- eventInfo
    |   +- eventAction
    +- customDimensions
        |- index
        +- value
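To make the nesting concrete, here is a minimal Python model of one session row. It is only a sketch: the class and field names (Session, Hit, CustomDimension, page_path, and so on) are hypothetical, only the fields in the tree above are kept, and the values are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CustomDimension:
    index: int  # hits.customDimensions.index
    value: str  # hits.customDimensions.value


@dataclass
class Hit:
    time: int
    page_path: Optional[str] = None     # hits.page.pagePath
    event_action: Optional[str] = None  # hits.eventInfo.eventAction
    # REPEATED: a single hit can carry several custom dimensions
    custom_dimensions: List[CustomDimension] = field(default_factory=list)


@dataclass
class Session:
    visit_id: int
    date: str
    # REPEATED: one session row holds all of its hits
    hits: List[Hit] = field(default_factory=list)


# One session row with two hits: a pageview carrying three dimensions, then a click
session = Session(
    visit_id=1,
    date="20170114",
    hits=[
        Hit(time=0, page_path="tshirt",
            custom_dimensions=[CustomDimension(1, "100"),
                               CustomDimension(4, "US"),
                               CustomDimension(7, "A")]),
        Hit(time=5000, page_path="tshirt", event_action="upsell"),
    ],
)
```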

I'm looking to grab 3 values from the repeated customDimensions to produce a result like this:

+---------+---------+-------+-----------+----------------+
| user_id | country | split | page hits | CTA event hits |
+---------+---------+-------+-----------+----------------+
| 100     | US      | A     | 25000     | 500            |
+---------+---------+-------+-----------+----------------+
| 100     | US      | B     | 8000      | 90             |
+---------+---------+-------+-----------+----------------+
| 200     | ES      | A     | 400       | 2              |
+---------+---------+-------+-----------+----------------+

The first three columns are defined by hits.customDimensions.index 1, 4 and 7.
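Pulling those three columns out of a hit amounts to looking each index up in that hit's repeated customDimensions and pivoting the matches into named columns. A plain-Python sketch of that pivot, where the function name pivot_dimensions is hypothetical and indexes 1, 4 and 7 map to user_id, country and split as in the table above:

```python
from typing import Dict, List, Optional, Tuple

# customDimensions index -> output column, as described in the question
DIMENSION_COLUMNS = {1: "user_id", 4: "country", 7: "split"}


def pivot_dimensions(custom_dimensions: List[Tuple[int, str]]) -> Dict[str, Optional[str]]:
    """Turn a hit's repeated (index, value) pairs into one dict of columns.

    Columns whose index is absent from the hit come back as None,
    mirroring a scalar subquery that finds no matching element.
    """
    row: Dict[str, Optional[str]] = {name: None for name in DIMENSION_COLUMNS.values()}
    for index, value in custom_dimensions:
        if index in DIMENSION_COLUMNS:  # other indexes are ignored
            row[DIMENSION_COLUMNS[index]] = value
    return row


print(pivot_dimensions([(1, "100"), (4, "US"), (7, "A"), (9, "ignored")]))
# {'user_id': '100', 'country': 'US', 'split': 'A'}
```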

The page hits SUM shows how many views were made, and CTA event hits is the sum of events fired when a button on the page itself is clicked. To keep the SQL simple, assume hits.page.pagePath='tshirt' and hits.eventInfo.eventAction='upsell'.
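With those simplifying filters, each hit contributes at most one unit to one of the two metrics. A sketch of that per-hit classification, assuming a hit is a dict with hypothetical keys page_path and event_action (standing in for hits.page.pagePath and hits.eventInfo.eventAction). Treating only hits without an eventAction as page views is an assumption, since event hits also carry a pagePath.

```python
from typing import Dict, Optional, Tuple

PAGE_PATH = "tshirt"   # simplified filter from the question
CTA_ACTION = "upsell"  # simplified filter from the question


def hit_metrics(hit: Dict[str, Optional[str]]) -> Tuple[int, int]:
    """Return the (page_hit, cta_event_hit) contribution of a single hit."""
    on_page = hit.get("page_path") == PAGE_PATH
    is_cta = on_page and hit.get("event_action") == CTA_ACTION
    # Assumption: event hits also carry a pagePath, so only hits
    # without an eventAction are counted as page views.
    is_view = on_page and hit.get("event_action") is None
    return int(is_view), int(is_cta)


hits = [
    {"page_path": "tshirt", "event_action": None},
    {"page_path": "tshirt", "event_action": "upsell"},
    {"page_path": "/home", "event_action": None},
]
# Sum the two contributions column-wise over all hits
totals = tuple(sum(col) for col in zip(*(hit_metrics(h) for h in hits)))
print(totals)  # (1, 1)
```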

I have trouble reading 3 customDimensions from the same repeated field, and then trouble finding the events that happened in the same session. For those not familiar with the BQ dataset, see the image below.

Update: each line is a hit, and multiple hits can be in the same row; in BigQuery this is called a REPEATED field. In the image you can see 3 big rows, and the first row has 8 hits. The image doesn't show multiple customDimensions, but the same hit can have several customDimensions. To access the sample dataset in BigQuery, read here; it's free.

[Image: sample rows from the BigQuery GA dataset]

Is this sample data or the expected result? –

Expected result – Pentium10

Can you show some sample data? –

Answers

Before answering, I'd like to show the mock data I used as a guide to work out the solution; hopefully it helps:

WITH mock_data AS(
select '0' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(0 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/home' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '0' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(1 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/randompage' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '0' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(2 as hitnumber, [STRUCT(1 as index, '000' as value), STRUCT(4 as index, 'US' as value), STRUCT(7 as index, 'A' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '0' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(3 as hitnumber, [STRUCT(1 as index, '000' as value), STRUCT(4 as index, 'US' as value), STRUCT(7 as index, 'A' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 

select '1' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(0 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/home' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '1' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(1 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/randompage' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '1' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(2 as hitnumber, [STRUCT(1 as index, '100' as value), STRUCT(4 as index, 'US' as value), STRUCT(7 as index, 'A' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '1' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(3 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('specific_category' as eventcategory, 'specific_label' as eventlabel, 'upsell' as eventaction) as eventinfo)] hits union all 
select '1' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(4 as hitnumber, [STRUCT(1 as index, '100' as value), STRUCT(4 as index, 'US' as value), STRUCT(7 as index, 'A' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 

select '2' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(0 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/home' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '2' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(1 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/randompage' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '2' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(2 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('specific_category' as eventcategory, 'specific_label' as eventlabel, 'upsell' as eventaction) as eventinfo)] hits union all 

select '3' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(0 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/home' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '3' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(1 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/randompage' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '3' fullvisitorid, 1 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(2 as hitnumber, [STRUCT(1 as index, '300' as value), STRUCT(4 as index, 'US' as value), STRUCT(7 as index, 'A' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '3' fullvisitorid, 1 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(3 as hitnumber, [STRUCT(1 as index, '300' as value), STRUCT(4 as index, 'US' as value), STRUCT(7 as index, 'A' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 

select '4' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(0 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/home' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '4' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(1 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/randompage' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '4' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(2 as hitnumber, [STRUCT(1 as index, '400' as value), STRUCT(4 as index, 'BR' as value), STRUCT(7 as index, 'B' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '4' fullvisitorid, 1 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(0 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('specific_category' as eventcategory, 'specific_label' as eventlabel, 'upsell' as eventaction) as eventinfo)] hits union all 
select '4' fullvisitorid, 1 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(1 as hitnumber, [STRUCT(1 as index, '400' as value), STRUCT(4 as index, 'BR' as value), STRUCT(7 as index, 'B' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 

select '4' fullvisitorid, 2 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(0 as hitnumber, [STRUCT(1 as index, '400' as value), STRUCT(4 as index, 'BR' as value), STRUCT(7 as index, 'B' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '4' fullvisitorid, 2 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(1 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('specific_category' as eventcategory, 'specific_label' as eventlabel, 'upsell' as eventaction) as eventinfo)] hits union all 
select '4' fullvisitorid, 2 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(2 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('specific_category' as eventcategory, 'specific_label' as eventlabel, 'upsell' as eventaction) as eventinfo)] hits union all 
select '4' fullvisitorid, 3 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(0 as hitnumber, [STRUCT(1 as index, '400' as value), STRUCT(4 as index, 'BR' as value), STRUCT(7 as index, 'A' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '4' fullvisitorid, 3 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(1 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('specific_category' as eventcategory, 'specific_label' as eventlabel, 'upsell' as eventaction) as eventinfo)] hits union all 
select '4' fullvisitorid, 3 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(2 as hitnumber, [STRUCT(1 as index, '400' as value), STRUCT(4 as index, 'BR' as value), STRUCT(7 as index, 'A' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits 
) 

I simulated five distinct visitors (fullvisitorid '0' to '4') browsing the website, using the same schema you find in the BigQuery ga_sessions tables.
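The mock rows above are very repetitive. One way to keep such fixtures maintainable (a workflow suggestion, not something the answer prescribes) is to generate the select ... union all lines from a compact spec. This Python sketch emits rows in the same shape as the mock_data CTE; mock_row is a hypothetical helper name.

```python
from typing import List, Tuple

# Type of the hits column, copied from the mock_data CTE
HITS_TYPE = (
    "ARRAY<STRUCT<hitnumber INT64, "
    "customdimensions ARRAY<STRUCT<index INT64, value STRING>>, "
    "page STRUCT<pagepath STRING>, "
    "eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >>"
)


def mock_row(fullvisitorid: str, visitid: int, hitnumber: int, pagepath: str,
             dims: List[Tuple[int, str]], click: bool = False) -> str:
    """Build one single-hit `select ...` line of the mock_data CTE.

    Values are not escaped, so a value containing a single quote would
    break the SQL; that is acceptable for fixed test fixtures only.
    """
    # Hits without dimensions get the STRUCT(0, '') placeholder used in the CTE
    dims = dims or [(0, "")]
    cds = ", ".join(f"STRUCT({i} as index, '{v}' as value)" for i, v in dims)
    cat, label, action = (("specific_category", "specific_label", "upsell")
                          if click else ("", "", ""))
    hit = (f"STRUCT({hitnumber} as hitnumber, [{cds}] as customdimensions, "
           f"STRUCT('{pagepath}' as pagepath) as page, "
           f"STRUCT('{cat}' as eventcategory, '{label}' as eventlabel, "
           f"'{action}' as eventaction) as eventinfo)")
    return f"select '{fullvisitorid}' fullvisitorid, {visitid} visitid, {HITS_TYPE} [{hit}] hits"


rows = [
    mock_row("1", 0, 2, "tshirt", [(1, "100"), (4, "US"), (7, "A")]),
    mock_row("1", 0, 3, "tshirt", [], click=True),
]
print("WITH mock_data AS(\n" + " union all\n".join(rows) + "\n)")
```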

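Some of my assumptions may differ slightly from your real data; if so, please let me know and I can adapt the mock data as a guide to a more accurate answer. (We actually use mocks like these to run integration tests in production, so the approach may help you as well.) The assumptions I made were (correct me if I'm wrong):

1. customDimensions are only fired when hits.page.pagepath = 'tshirt', i.e., every visit to the "tshirt" page fires the custom dimensions.

2. When an eventAction click happens, customDimensions are not fired at the same time, i.e., the event fires in one hitNumber and the custom dimensions in another.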
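Because the query relies on both assumptions, it may be worth checking them against real data before trusting the numbers. A plain-Python sketch over already-flattened hits, where each hit is a dict with hypothetical keys page_path, event_action and dims (a list of (index, value) pairs, empty when no customDimensions fired); it returns the hits that break either assumption.

```python
from typing import Any, Dict, List


def assumption_violations(hits: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the hits that break assumption 1 or assumption 2."""
    bad = []
    for hit in hits:
        if not hit["dims"]:
            continue  # no customDimensions on this hit: nothing to check
        if hit["page_path"] != "tshirt":
            bad.append(hit)  # assumption 1: dims only on the 'tshirt' page
        elif hit["event_action"] == "upsell":
            bad.append(hit)  # assumption 2: click hits carry no dims
    return bad


hits = [
    {"page_path": "tshirt", "event_action": None, "dims": [(1, "100"), (4, "US"), (7, "A")]},
    {"page_path": "tshirt", "event_action": "upsell", "dims": []},
    {"page_path": "/home", "event_action": None, "dims": [(4, "US")]},  # breaks 1
]
print(len(assumption_violations(hits)))  # 1
```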

Under the assumptions above, this query may give the expected result:

    -- Final level: totals per user_id, country and split
    select 
        user_id, 
        country, 
        _split, 
        sum(page_hits) page_hits, 
        sum(CTA_event_hit) CTA_event_hit 
    from(
    -- Session level: one row per (fv, v); MAX merges the click-only group
    -- (NULL dimensions) with the group that carries the dimensions
    select 
        max(user_id) user_id, 
        max(country) country, 
        max(_split) _split, 
        max(page_hits) page_hits, 
        max(CTA_event_hit) CTA_event_hit 
    from(
    -- Group level: per session and dimension values, count page hits and clicks
    select 
        fv, 
        v, 
        user_id, 
        country, 
        _split, 
        count(case when user_id is not null then 1 end) page_hits, 
        sum(click_flag) CTA_event_hit 
    from(
    -- Hit level: one row per 'tshirt' hit, with dimensions 1, 4 and 7 pivoted out
    select 
        fullvisitorid fv, 
        visitid v, 
        (select custd.value from unnest(hits.customdimensions) custd where custd.index = 1) user_id, 
        (select custd.value from unnest(hits.customdimensions) custd where custd.index = 4) country, 
        (select custd.value from unnest(hits.customdimensions) custd where custd.index = 7) _split, 
        case when hits.eventinfo.eventcategory = 'specific_category' and hits.eventinfo.eventlabel = 'specific_label' and hits.eventinfo.eventaction = 'upsell' then 1 end click_flag 
    from mock_data, 
    unnest(hits) hits 
    where 1 = 1 
        and hits.page.pagepath = 'tshirt' 
    ) 
    group by fv, v, user_id, country, _split 
    ) 
    group by fv, v 
    having max(user_id) is not null 
    ) 
    group by user_id, country, _split 
    

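Result:

[Image: result of the query on the mock data]

Basically, a few subselects retrieve user_id, country and split for each hit. Within each session (visitid) the rows are then aggregated with MAX, which merges the click hits (they carry no customDimensions, so their user_id is NULL) with the page hits that do carry them. Finally there is one last aggregation at the user_id, country and split level.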
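The same aggregation levels can be traced in plain Python, which may help when checking the SQL against the mock data. This is a sketch: HITS is a hand-copied subset of the mock rows (visitors '1', '2' and '4', already flattened), NULL is modelled as None, the click test only looks at eventaction, and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# Flattened hits: (fullvisitorid, visitid, pagepath, {index: value}, eventaction)
HITS = [
    ("1", 0, "/home", {}, ""),
    ("1", 0, "tshirt", {1: "100", 4: "US", 7: "A"}, ""),
    ("1", 0, "tshirt", {}, "upsell"),
    ("1", 0, "tshirt", {1: "100", 4: "US", 7: "A"}, ""),
    ("2", 0, "tshirt", {}, "upsell"),
    ("4", 0, "tshirt", {1: "400", 4: "BR", 7: "B"}, ""),
    ("4", 1, "tshirt", {}, "upsell"),
    ("4", 1, "tshirt", {1: "400", 4: "BR", 7: "B"}, ""),
    ("4", 2, "tshirt", {1: "400", 4: "BR", 7: "B"}, ""),
    ("4", 2, "tshirt", {}, "upsell"),
    ("4", 2, "tshirt", {}, "upsell"),
    ("4", 3, "tshirt", {1: "400", 4: "BR", 7: "A"}, ""),
    ("4", 3, "tshirt", {}, "upsell"),
    ("4", 3, "tshirt", {1: "400", 4: "BR", 7: "A"}, ""),
]


def sql_max(values):
    """MAX ignoring NULLs (None); None when every value is NULL."""
    present = [x for x in values if x is not None]
    return max(present) if present else None


def sql_sum(values):
    """SUM ignoring NULLs; None when every value is NULL, as in SQL."""
    present = [x for x in values if x is not None]
    return sum(present) if present else None


def group(rows, key) -> Dict[Tuple, List[Tuple]]:
    """GROUP BY key(row), preserving first-seen order."""
    out: Dict[Tuple, List[Tuple]] = {}
    for r in rows:
        out.setdefault(key(r), []).append(r)
    return out


def aggregate(hits):
    # Hit level: 'tshirt' hits only; pivot dims 1/4/7; click_flag is 1 or NULL
    flat = [(fv, v, d.get(1), d.get(4), d.get(7), 1 if action == "upsell" else None)
            for fv, v, path, d, action in hits if path == "tshirt"]
    # Group level: GROUP BY fv, v, user_id, country, _split
    level1 = [k + (sum(1 for r in rs if r[2] is not None), sql_sum(r[5] for r in rs))
              for k, rs in group(flat, lambda r: r[:5]).items()]
    # Session level: GROUP BY fv, v with MAX(...), HAVING MAX(user_id) IS NOT NULL
    level2 = []
    for rs in group(level1, lambda r: r[:2]).values():
        merged = tuple(sql_max(r[i] for r in rs) for i in range(2, 7))
        if merged[0] is not None:
            level2.append(merged)
    # Final level: GROUP BY user_id, country, _split with SUM(...)
    return {k: (sql_sum(r[3] for r in rs), sql_sum(r[4] for r in rs))
            for k, rs in group(level2, lambda r: r[:3]).items()}


for key, (page_hits, cta) in aggregate(HITS).items():
    print(key, page_hits, cta)
# ('100', 'US', 'A') 2 1
# ('400', 'BR', 'B') 3 3
# ('400', 'BR', 'A') 2 1
```

Visitor '2' only clicked and never carried the dimensions, so the HAVING step drops that session, while user '400' shows up under both splits.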

To run it against your own dataset, just replace mock_data with the ga_sessions table you want to query.

I'm not sure this fully solves your problem, but hopefully it helps.

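Finally, this data looks like a setup for an A/B test or for analysing the performance of different variations of the site. If so, it is advisable not to let users change their split value, since that can corrupt part of the data and skew the results.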
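If split is meant to be sticky per user, a quick check on the aggregated output is to list the user_ids that appear under more than one split. A sketch over (user_id, country, split) rows such as the expected-result table in the question, where user 100 shows up in both A and B; users_in_multiple_splits is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def users_in_multiple_splits(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Return {user_id: sorted splits} for users seen with more than one split."""
    splits: Dict[str, Set[str]] = defaultdict(set)
    for user_id, _country, split in rows:
        splits[user_id].add(split)
    return {user: sorted(s) for user, s in splits.items() if len(s) > 1}


# (user_id, country, split) rows of the expected-result table in the question
expected = [("100", "US", "A"), ("100", "US", "B"), ("200", "ES", "A")]
print(users_in_multiple_splits(expected))  # {'100': ['A', 'B']}
```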

Almost a perfect result; I just had to adjust the column names. Can you explain the logic behind 'mock_data, unnest(hits) hits'? – Pentium10

Glad it helped :)! Here the 'unnest' operation is used to flatten the results (https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#flattening-arrays). It is basically a way of "opening up" the values inside an array into rows. For example, the row '[{fullvisitorid: 1, visitid: 1, hits: [{hitNumber: 0, type: 'PAGE'}, {hitNumber: 1, type: 'PAGE'}]}]' becomes two rows: '{fullvisitorid: 1, visitid: 1, hits.hitNumber: 0, hits.type: 'PAGE'}' and '{fullvisitorid: 1, visitid: 1, hits.hitNumber: 1, hits.type: 'PAGE'}'. As you can see, the parent values are repeated and the array is opened up. –
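The flattening described in that comment can be mimicked in Python: each element of the repeated hits array becomes its own row, with the parent's fullvisitorid and visitid copied onto it. A sketch using the comment's example values (the flatten name is hypothetical):

```python
from typing import Any, Dict, List


def flatten(session: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Emulate UNNEST(hits): one output row per hit, parent fields repeated."""
    parent = {k: v for k, v in session.items() if k != "hits"}
    return [{**parent, **{f"hits.{k}": v for k, v in hit.items()}}
            for hit in session["hits"]]


row = {"fullvisitorid": 1, "visitid": 1,
       "hits": [{"hitNumber": 0, "type": "PAGE"}, {"hitNumber": 1, "type": "PAGE"}]}
for r in flatten(row):
    print(r)
# {'fullvisitorid': 1, 'visitid': 1, 'hits.hitNumber': 0, 'hits.type': 'PAGE'}
# {'fullvisitorid': 1, 'visitid': 1, 'hits.hitNumber': 1, 'hits.type': 'PAGE'}
```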

I know what unnest is, but is the comma operation a cross join? Why is it needed? – Pentium10
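On that follow-up: in BigQuery Standard SQL a comma in the FROM clause is a CROSS JOIN, but when the right side is UNNEST of an array column from the left side the join is correlated, so each row is paired only with the elements of its own array. That is what turns one session row into one row per hit, so WHERE and GROUP BY can work on individual hits. One consequence is that a row with an empty array produces no output (a LEFT JOIN on the UNNEST would keep it). A Python sketch of those semantics, with hypothetical names:

```python
from typing import Any, Dict, Iterator, List, Tuple


def comma_unnest(table: List[Dict[str, Any]], column: str) -> Iterator[Tuple[Dict[str, Any], Any]]:
    """Emulate `FROM table t, UNNEST(t.column) x`: a correlated cross join."""
    for row in table:
        # Each row is joined only with the elements of ITS OWN array
        # (an uncorrelated cross join would pair it with every element
        # of every row); an empty array yields nothing, so the row drops out.
        for element in row[column]:
            yield row, element


table = [
    {"fullvisitorid": "1", "hits": [{"hitnumber": 0}, {"hitnumber": 1}]},
    {"fullvisitorid": "2", "hits": [{"hitnumber": 0}]},
    {"fullvisitorid": "3", "hits": []},
]
pairs = [(row["fullvisitorid"], hit["hitnumber"]) for row, hit in comma_unnest(table, "hits")]
print(pairs)  # [('1', 0), ('1', 1), ('2', 0)]
```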

To make sure I understand the problem, I'm only providing a solution here for the custom columns and the page hits metric, not (yet) for CTA event hits. Using the sample GA table and Standard SQL, it could look something like this:

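    SELECT 
        ARRAY(
          SELECT AS STRUCT
            -- pivot: exactly one value per hit for each customDimensions index
            -- (an IF() per element would yield one partial row per element)
            (SELECT dim.value FROM hit.customDimensions dim WHERE dim.index = 1) product,
            (SELECT dim.value FROM hit.customDimensions dim WHERE dim.index = 2) color,
            -- every remaining hit counts as one page hit
            1 page_hits
          FROM t.hits hit
          -- keep only hits that carry at least one of the two dimensions
          WHERE EXISTS (SELECT 1 FROM hit.customDimensions dim WHERE dim.index IN (1, 2))
        ) hits
    FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910` t 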
    

Basically, the inner SELECT converts customDimensions.index into separate columns (product and color in this example), and the outer SELECT then sets page_hits to 1 on every hit so the hits can be counted.
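One subtlety when pivoting customDimensions.index into columns: the IF() has to be collapsed to one row per hit. Evaluated per customDimensions element, it yields one partial row per element (product set and color NULL, or the reverse), DISTINCT cannot merge them because they differ, and page_hits = 1 is then counted once per element. A Python sketch contrasting the two, using indexes 1 (product) and 2 (color) from the answer; the function names and the values 'helmet' and 'red' are made up.

```python
from typing import List, Optional, Tuple

Dims = List[Tuple[int, str]]               # one hit's customDimensions as (index, value)
Row = Tuple[Optional[str], Optional[str]]  # (product, color)


def per_element_rows(dims: Dims) -> List[Row]:
    """IF() evaluated per customDimensions element, followed by DISTINCT."""
    rows = [(v if i == 1 else None, v if i == 2 else None)
            for i, v in dims if i in (1, 2)]
    return list(dict.fromkeys(rows))  # order-preserving DISTINCT


def per_hit_row(dims: Dims) -> Row:
    """Pivot collapsed to exactly one row per hit."""
    lookup = dict(dims)
    return lookup.get(1), lookup.get(2)


hit = [(1, "helmet"), (2, "red"), (5, "other")]
print(per_element_rows(hit))  # [('helmet', None), (None, 'red')]
print(per_hit_row(hit))       # ('helmet', 'red')
```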

Thanks, this really helped. – Pentium10