2008-09-29 8 views
2

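Removing duplicated information

I have a table with rowID, longitude, latitude, businessName, url, and caption columns. It might look like this:

rowID | long | lat | businessName | url | caption
1 | 20 | -20 | Pizza Hut | yum.com | null

How do I delete all of the duplicates, keeping only the row that has a URL (first priority), or, if none of them has a URL, the row that has a caption (second priority), and delete the rest?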

+0

Are the duplicates determined by businessName? –

+0

I'm guessing a duplicate means the same long + lat + businessName? –

+0

Duplicates are based on long + lat + businessName. Ideally I would end up with exactly one row per long + lat + businessName, the one that best fits the scenario. – RyanKeeter
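Before deleting anything, it can help to list the groups that actually contain duplicates under that definition. A minimal sketch, assuming the table is called Locations (the question never names it; this name is borrowed from one of the answers) and that the columns are named as in the sample row:

```sql
-- Sketch: list each long + lat + businessName group that has more than one row.
-- Locations is an assumed table name; the question does not name the table.
SELECT long, lat, businessName,
       COUNT(*)       AS rowsInGroup,
       COUNT(url)     AS rowsWithUrl,      -- COUNT(column) skips NULLs
       COUNT(caption) AS rowsWithCaption
FROM Locations
GROUP BY long, lat, businessName
HAVING COUNT(*) > 1
ORDER BY rowsInGroup DESC;
```

The two extra counts show, per group, whether any row has a url or caption at all, which is what the priority rule depends on.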

Answers

3

Here is my looping technique. It will probably get voted down for not being mainstream, and I'm cool with that.

DECLARE @LoopVar int 

DECLARE 
    @long int, 
    @lat int, 
    @businessname varchar(30), 
    @winner int 

SET @LoopVar = (SELECT MIN(rowID) FROM Locations) 

WHILE @LoopVar is not null 
BEGIN 
    --initialize the variables. 
    SELECT 
    @long = null, 
    @lat = null, 
    @businessname = null, 
    @winner = null 

    -- load data from the known good row. 
    SELECT 
    @long = long, 
    @lat = lat, 
    @businessname = businessname 
    FROM Locations 
    WHERE rowID = @LoopVar 

    --find the winning row with that data 
    SELECT top 1 @Winner = rowID 
    FROM Locations 
    WHERE @long = long 
    AND @lat = lat 
    AND @businessname = businessname 
    ORDER BY 
    CASE WHEN URL is not null THEN 1 ELSE 2 END, 
    CASE WHEN Caption is not null THEN 1 ELSE 2 END, 
    RowId 

    --delete any losers. 
    DELETE FROM Locations 
    WHERE @long = long 
    AND @lat = lat 
    AND @businessname = businessname 
    AND @winner != rowID 

    -- prep the next loop value. 
    SET @LoopVar = (SELECT MIN(rowID) FROM Locations WHERE @LoopVar < rowID) 
END 
+0

I use a similar approach. This kind of loop is faster than a CURSOR, and it has the added benefit of not pegging the server's CPU. I put similar code in the other post linked from your question. –

+0

What if rowID is a char(11) column? It is the primary key, but can I select min(foo) on a string? – RyanKeeter

+0

For a type to serve as a real primary key, it has to define an ordering for the table. Ordering by char(11) is no problem. –
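To illustrate the point about string keys, here is a sketch of the same loop skeleton with a char(11) key. The column type is the one mentioned in the comment above; the loop body is left as a placeholder, and nothing here has been run against the asker's data.

```sql
-- Sketch: the loop skeleton with a char(11) key (assumed column type).
-- MIN() and > on character data follow the column's collation order.
DECLARE @LoopVar char(11)

SET @LoopVar = (SELECT MIN(rowID) FROM Locations)

WHILE @LoopVar IS NOT NULL
BEGIN
    -- ... find the winner and delete the losers for this row, as in the answer ...
    -- (@winner would also have to be declared as char(11))

    -- Move on to the next key in collation order; NULL ends the loop.
    SET @LoopVar = (SELECT MIN(rowID) FROM Locations WHERE rowID > @LoopVar)
END
```

Only the variable declarations change; the comparison and MIN() logic stays the same.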

0

If possible, could you homogenize the rows first and then remove the duplicates?

Step 1:

UPDATE BusinessLocations 
SET BusinessLocations.url = LocationsWithUrl.url 
FROM BusinessLocations 
INNER JOIN (
    SELECT long, lat, businessName, url, caption 
    FROM BusinessLocations 
    WHERE url IS NOT NULL) LocationsWithUrl 
    ON BusinessLocations.long = LocationsWithUrl.long 
    AND BusinessLocations.lat = LocationsWithUrl.lat 
    AND BusinessLocations.businessName = LocationsWithUrl.businessName 

UPDATE BusinessLocations 
SET BusinessLocations.caption = LocationsWithCaption.caption 
FROM BusinessLocations 
INNER JOIN (
    SELECT long, lat, businessName, url, caption 
    FROM BusinessLocations 
    WHERE caption IS NOT NULL) LocationsWithCaption 
    ON BusinessLocations.long = LocationsWithCaption.long 
    AND BusinessLocations.lat = LocationsWithCaption.lat 
    AND BusinessLocations.businessName = LocationsWithCaption.businessName 

Step 2: Remove the duplicates.
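The answer gives no code for step 2. One possible sketch, assuming step 1 has already run and that rowID is a non-null key, keeps the lowest rowID in each group:

```sql
-- Sketch of step 2, assuming step 1 has already copied url and caption
-- across each group, so any row in a group can stand in for the others.
-- Keeps the lowest rowID per long + lat + businessName and deletes the rest.
DELETE FROM BusinessLocations
WHERE rowID NOT IN (
    SELECT MIN(rowID)
    FROM BusinessLocations
    GROUP BY long, lat, businessName
);
```

One caveat: if a group contains several different non-null urls or captions, step 1 picks one of them arbitrarily for each row, so the surviving row has some url from the group, not necessarily a particular one.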

1

Set-based solution:

delete t1 
from T as t1 
where /* delete if there is a "better" row 
     with same long, lat and businessName */ 
    exists(
    select * from T as t2 where 
     t1.rowID <> t2.rowID 
     and t1.long = t2.long 
     and t1.lat = t2.lat 
     and t1.businessName = t2.businessName 
     and 
     case when t1.url is null then 0 else 4 end 
      /* 4 points for non-null url */ 
     + case when t1.caption is null then 0 else 2 end 
      /* 2 points for non-null caption */ 
     + case when t1.rowID > t2.rowID then 0 else 1 end 
      /* 1 point for having smaller rowID */ 
     <= /* on equal url/caption points the larger rowID loses */ 
     case when t2.url is null then 0 else 4 end 
     + case when t2.caption is null then 0 else 2 end 
     ) 
This solution is brought to you by "things I've learned on Stack Overflow" in the last week.

1

delete MyTable 
from MyTable 
left outer join (
     select min(rowID) as rowID, long, lat, businessName 
     from MyTable 
     where url is not null 
     group by long, lat, businessName 
    ) as HasUrl 
    on MyTable.long = HasUrl.long 
    and MyTable.lat = HasUrl.lat 
    and MyTable.businessName = HasUrl.businessName 
left outer join (
     select min(rowID) as rowID, long, lat, businessName 
     from MyTable 
     where caption is not null 
     group by long, lat, businessName 
    ) HasCaption 
    on MyTable.long = HasCaption.long 
    and MyTable.lat = HasCaption.lat 
    and MyTable.businessName = HasCaption.businessName 
left outer join (
     select min(rowID) as rowID, long, lat, businessName 
     from MyTable 
     where url is null 
      and caption is null 
     group by long, lat, businessName 
    ) HasNone 
    on MyTable.long = HasNone.long 
    and MyTable.lat = HasNone.lat 
    and MyTable.businessName = HasNone.businessName 
where MyTable.rowID <> 
     coalesce(HasUrl.rowID, HasCaption.rowID, HasNone.rowID) 
4

:

DELETE restaurant 
WHERE rowID in 
(SELECT rowID 
    FROM restaurant 
    EXCEPT 
    SELECT rowID 
    FROM (
     SELECT rowID, Rank() over (Partition BY BusinessName, lat, long ORDER BY url DESC, caption DESC, rowID) AS Rank 
     FROM restaurant 
     ) rs WHERE Rank = 1) 

Caveat: I have not tested this on a real database.

1

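Similar to another answer, but you want to delete based on row number rather than rank: rows that tie all share the same rank, while row numbers are unique within each group. Mix it with a common table expression as well.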


;WITH GroupedRows AS 
( SELECT rowID, Row_Number() OVER (Partition BY BusinessName, lat, long ORDER BY url DESC, caption DESC) rowNum 
    FROM restaurant 
) 
DELETE r 
FROM restaurant r 
JOIN GroupedRows gr ON r.rowID = gr.rowID 
WHERE gr.rowNum > 1
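
Since a DELETE is hard to undo, a dry run can be useful first. The following is a sketch only, reusing the restaurant table name from the answer above. The CASE expressions and the rowID tie-breaker are additions for illustration: they spell out the url-then-caption priority instead of relying on how NULLs sort, and they make the choice of surviving row repeatable. The column alias outcome is a made-up name.

```sql
-- Sketch: preview which rows would be kept or deleted, without changing data.
WITH RankedRows AS
(   SELECT rowID, businessName, lat, long, url, caption,
           ROW_NUMBER() OVER (
               PARTITION BY businessName, lat, long
               ORDER BY CASE WHEN url     IS NOT NULL THEN 0 ELSE 1 END, -- url first
                        CASE WHEN caption IS NOT NULL THEN 0 ELSE 1 END, -- then caption
                        rowID                                            -- tie-breaker
           ) AS rowNum
    FROM restaurant
)
SELECT rowID, businessName, lat, long, url, caption,
       CASE WHEN rowNum = 1 THEN 'keep' ELSE 'delete' END AS outcome
FROM RankedRows
ORDER BY businessName, lat, long, rowNum;
```

If the preview looks right, the same ORDER BY can be carried into the CTE used by the DELETE so that both statements agree on the winner.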