2017-02-09 2 views
3

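Aggregate calculation over groups using combineByKey on a Spark RDD in Python (pyspark)

I am new to RDDs and am trying to use Spark's shuffle operations to compute aggregates grouped by key. My first approach was rdd.groupBy(), but it took a long time to finish, used memory inefficiently, and is quite expensive in terms of shuffling. I then came across another operation, rdd.combineByKey(), but I am having trouble understanding and using it.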

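My data is stored in an RDD (call it customerrdd). On customerrdd I applied groupBy() on attribute key-6 (index 6) and then, for the aggregation (say, addition on attribute key-3, index 3), a series of operations ending in reduceByKey. The data is: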

[(u'1', u'Customer#000000001', u'IVhzIApeRb ot,c,E', u'15', u'25-989-741-2988', u'711.56', u'BUILDING', u'to the even, regular platelets. regular, ironic epitaphs nag e', u''), (u'2', u'Customer#000000002', u'XSTf4,NCwDVaWNe6tEgvwfmRchLXak', u'13', u'23-768-687-3665', u'121.65', u'AUTOMOBILE', u'l accounts. blithely ironic theodolites integrate boldly: caref', u''), (u'3', u'Customer#000000003', u'MG9kdTD2WBHm', u'1', u'11-719-748-3364', u'7498.12', u'AUTOMOBILE', u' deposits eat slyly ironic, even instructions. express foxes detect slyly. blithely even accounts abov', u''), (u'4', u'Customer#000000004', u'XxVSJsLAGtn', u'4', u'14-128-190-5944', u'2866.83', u'MACHINERY', u' requests. final, regular ideas sleep final accou', u''), (u'5', u'Customer#000000005', u'KvpyuHCplrB84WgAiGV6sYpZq7Tj', u'3', u'13-750-942-6364', u'794.47', u'HOUSEHOLD', u'n accounts will have to unwind. foxes cajole accor', u''), (u'6', u'Customer#000000006', u'sKZz0CsnMD7mp4Xd0YrBvx,LREYKUWAh yVn', u'20', u'30-114-968-4951', u'7638.57', u'AUTOMOBILE', u'tions. even deposits boost according to the slyly bold packages. final accounts cajole requests. furious', u''), (u'7', u'Customer#000000007', u'TcGe5gaZNgVePxU5kRrvXBfkasDTea', u'18', u'28-190-982-9759', u'9561.95', u'AUTOMOBILE', u'ainst the ironic, express theodolites. express, even pinto beans among the exp', u''), (u'8', u'Customer#000000008', u'I0B10bB0AymmC, 0PrRYBCP1yGJ8xcBPmWhl5', u'17', u'27-147-574-9335', u'6819.74', u'BUILDING', u'among the slyly regular theodolites kindle blithely courts. carefully even theodolites haggle slyly along the ide', u''), (u'9', u'Customer#000000009', u'xKiAFTjUsCuxfeleNqefumTrjS', u'8', u'18-338-906-3675', u'8324.07', u'FURNITURE', u'r theodolites according to the requests wake thinly excuses: pending requests haggle furiousl', u''), (u'10', u'Customer#000000010', u'6LrEaV6KR6PLVcgl2ArL Q3rqzLzcT1 v2', u'5', u'15-741-346-9870', u'2753.54', u'HOUSEHOLD', u'es regular deposits haggle. 
fur', u''), (u'11', u'Customer#000000011', u'PkWS 3HlXqwTuzrKg633BEi', u'23', u'33-464-151-3439', u'-272.60', u'BUILDING', u'ckages. requests sleep slyly. quickly even pinto beans promise above the slyly regular pinto beans. ', u''), (u'12', u'Customer#000000012', u'9PWKuhzT4Zr1Q', u'13', u'23-791-276-1263', u'3396.49', u'HOUSEHOLD', u' to the carefully final braids. blithely regular requests nag. ironic theodolites boost quickly along', u''), (u'13', u'Customer#000000013', u'nsXQu0oVjD7PM659uC3SRSp', u'3', u'13-761-547-5974', u'3857.34', u'BUILDING', u'ounts sleep carefully after the close frays. carefully bold notornis use ironic requests. blithely', u''), (u'14', u'Customer#000000014', u'KXkletMlL2JQEA ', u'1', u'11-845-129-3851', u'5266.30', u'FURNITURE', u', ironic packages across the unus', u''), (u'15', u'Customer#000000015', u'YtWggXoOLdwdo7b0y,BZaGUQMLJMX1Y,EC,6Dn', u'23', u'33-687-542-7601', u'2788.52', u'HOUSEHOLD', u' platelets. regular deposits detect asymptotes. blithely unusual packages nag slyly at the fluf', u''), (u'16', u'Customer#000000016', u'cYiaeMLZSMAOQ2 d0W,', u'10', u'20-781-609-3107', u'4681.03', u'FURNITURE', u'kly silent courts. thinly regular theodolites sleep fluffily after ', u''), (u'17', u'Customer#000000017', u'izrh 6jdqtp2eqdtbkswDD8SG4SzXruMfIXyR7', u'2', u'12-970-682-3487', u'6.34', u'AUTOMOBILE', u'packages wake! blithely even pint', u''), (u'18', u'Customer#000000018', u'3txGO AiuFux3zT0Z9NYaFRnZt', u'6', u'16-155-215-1315', u'5494.43', u'BUILDING', u's sleep. carefully even instructions nag furiously alongside of t', u''), (u'19', u'Customer#000000019', u'uc,3bHIx84H,wdrmLOjVsiqXCq2tr', u'18', u'28-396-526-5053', u'8914.71', u'HOUSEHOLD', u' nag. furiously careful packages are slyly at the accounts. 
furiously regular in', u''), (u'20', u'Customer#000000020', u'JrPk8Pqplj4Ne', u'22', u'32-957-234-8742', u'7603.40', u'FURNITURE', u'g alongside of the special excuses-- fluffily enticing packages wake ', u''), (u'21', u'Customer#000000021', u'XYmVpr9yAHDEn', u'8', u'18-902-614-8344', u'1428.25', u'MACHINERY', u' quickly final accounts integrate blithely furiously u', u''), (u'22', u'Customer#000000022', u'QI6p41,FNs5k7RZoCCVPUTkUdYpB', u'3', u'13-806-545-9701', u'591.98', u'MACHINERY', u's nod furiously above the furiously ironic ideas. ', u''), (u'23', u'Customer#000000023', u'OdY W13N7Be3OC5MpgfmcYss0Wn6TKT', u'3', u'13-312-472-8245', u'3332.02', u'HOUSEHOLD', u'deposits. special deposits cajole slyly. fluffily special deposits about the furiously ', u''), (u'24', u'Customer#000000024', u'HXAFgIAyjxtdqwimt13Y3OZO 4xeLe7U8PqG', u'13', u'23-127-851-8031', u'9255.67', u'MACHINERY', u'into beans. fluffily final ideas haggle fluffily', u''), (u'25', u'Customer#000000025', u'Hp8GyFQgGHFYSilH5tBfe', u'12', u'22-603-468-3533', u'7133.70', u'FURNITURE', u'y. accounts sleep ruthlessly according to the regular theodolites. unusual instructions sleep. ironic, final', u''), (u'26', u'Customer#000000026', u'8ljrc5ZeMl7UciP', u'22', u'32-363-455-4837', u'5182.05', u'AUTOMOBILE', u'c requests use furiously ironic requests. slyly ironic dependencies us', u''), (u'27', u'Customer#000000027', u'IS8GIyxpBrLpMT0u7', u'3', u'13-137-193-2709', u'5679.84', u'BUILDING', u' about the carefully ironic pinto beans. accoun', u''), (u'28', u'Customer#000000028', u'iVyg0daQ,Tha8x2WPWA9m2529m', u'8', u'18-774-241-1462', u'1007.18', u'FURNITURE', u' along the regular deposits. 
furiously final pac', u''), (u'29', u'Customer#000000029', u'sJ5adtfyAkCK63df2,vF25zyQMVYE34uh', u'0', u'10-773-203-7342', u'7618.27', u'FURNITURE', u'its after the carefully final platelets x-ray against ', u''), (u'30', u'Customer#000000030', u'nJDsELGAavU63Jl0c5NKsKfL8rIJQQkQnYL2QJY', u'1', u'11-764-165-5076', u'9321.01', u'BUILDING', u'lithely final requests. furiously unusual account', u''), (u'31', u'Customer#000000031', u'LUACbO0viaAv6eXOAebryDB xjVst', u'23', u'33-197-837-7094', u'5236.89', u'HOUSEHOLD', u's use among the blithely pending depo', u''), (u'32', u'Customer#000000032', u'jD2xZzi UmId,DCtNBLXKj9q0Tlp2iQ6ZcO3J', u'15', u'25-430-914-2194', u'3471.53', u'BUILDING', u'cial ideas. final, furious requests across the e', u''), (u'33', u'Customer#000000033', u'qFSlMuLucBmx9xnn5ib2csWUweg D', u'17', u'27-375-391-1280', u'-78.56', u'AUTOMOBILE', u's. slyly regular accounts are furiously. carefully pending requests', u''), (u'34', u'Customer#000000034', u'Q6G9wZ6dnczmtOx509xgE,M2KV', u'15', u'25-344-968-5422', u'8589.70', u'HOUSEHOLD', u'nder against the even, pending accounts. even', u''), (u'35', u'Customer#000000035', u'TEjWGE4nBzJL2', u'17', u'27-566-888-7431', u'1228.24', u'HOUSEHOLD', u'requests. special, express requests nag slyly furiousl', u''), (u'36', u'Customer#000000036', u'3TvCzjuPzpJ0,DdJ8kW5U', u'21', u'31-704-669-5769', u'4987.27', u'BUILDING', u'haggle. enticing, quiet platelets grow quickly bold sheaves. carefully regular acc', u''), (u'37', u'Customer#000000037', u'7EV4Pwh,3SboctTWt', u'8', u'18-385-235-7162', u'-917.75', u'FURNITURE', u'ilent packages are carefully among the deposits. furiousl', u''), (u'38', u'Customer#000000038', u'a5Ee5e9568R8RLP 2ap7', u'12', u'22-306-880-7212', u'6345.11', u'HOUSEHOLD', u'lar excuses. closely even asymptotes cajole blithely excuses. 
carefully silent pinto beans sleep carefully fin', u''), (u'39', u'Customer#000000039', u'nnbRg,Pvy33dfkorYE FdeZ60', u'2', u'12-387-467-6509', u'6264.31', u'AUTOMOBILE', u'tions. slyly silent excuses slee', u''), (u'40', u'Customer#000000040', u'gOnGWAyhSV1ofv', u'3', u'13-652-915-8939', u'1335.30', u'BUILDING', u'rges impress after the slyly ironic courts. foxes are. blithely ', u''), (u'41', u'Customer#000000041', u'IM9mzmyoxeBmvNw8lA7G3Ydska2nkZF', u'10', u'20-917-711-4011', u'270.95', u'HOUSEHOLD', u'ly regular accounts hang bold, silent packages. unusual foxes haggle slyly above the special, final depo', u''), (u'42', u'Customer#000000042', u'ziSrvyyBke', u'5', u'15-416-330-4175', u'8727.01', u'BUILDING', u'ssly according to the pinto beans: carefully special requests across the even, pending accounts wake special', u''), (u'43', u'Customer#000000043', u'ouSbjHk8lh5fKX3zGso3ZSIj9Aa3PoaFd', u'19', u'29-316-665-2897', u'9904.28', u'MACHINERY', u'ial requests: carefully pending foxes detect quickly. carefully final courts cajole quickly. carefully', u''), (u'44', u'Customer#000000044', u'Oi,dOSPwDu4jo4x,,P85E0dmhZGvNtBwi', u'16', u'26-190-260-5375', u'7315.94', u'AUTOMOBILE', u'r requests around the unusual, bold a', u''), (u'45', u'Customer#000000045', u'4v3OcpFgoOmMG,CbnF,4mdC', u'9', u'19-715-298-9917', u'9983.38', u'AUTOMOBILE', u'nto beans haggle slyly alongside of t', u''), (u'46', u'Customer#000000046', u'eaTXWWm10L9', u'6', u'16-357-681-2007', u'5744.59', u'AUTOMOBILE', u'ctions. accounts sleep furiously even requests. regular, regular accounts cajole blithely around the final pa', u''), (u'47', u'Customer#000000047', u'b0UgocSqEW5 gdVbhNT', u'2', u'12-427-271-9466', u'274.58', u'BUILDING', u'ions. express, ironic instructions sleep furiously ironic ideas. furi', u''), (u'48', u'Customer#000000048', u'0UU iPhBupFvemNB', u'0', u'10-508-348-5882', u'3792.50', u'BUILDING', u're fluffily pending foxes. pending, bold platelets sleep slyly. 
even platelets cajo', u''), (u'49', u'Customer#000000049', u'cNgAeX7Fqrdf7HQN9EwjUa4nxT,68L FKAxzl', u'10', u'20-908-631-4424', u'4573.94', u'FURNITURE', u'nusual foxes! fluffily pending packages maintain to the regular ', u''), (u'50', u'Customer#000000050', u'9SzDYlkzxByyJ1QeTI o', u'6', u'16-658-112-3221', u'4266.13', u'MACHINERY', u'ts. furiously ironic accounts cajole furiously slyly ironic dinos.', u''), (u'51', u'Customer#000000051', u'uR,wEaiTvo4', u'12', u'22-344-885-4251', u'855.87', u'FURNITURE', u'eposits. furiously regular requests integrate carefully packages. furious', u''), (u'52', u'Customer#000000052', u'7 QOqGqqSy9jfV51BC71jcHJSD0', u'11', u'21-186-284-5998', u'5630.28', u'HOUSEHOLD', u'ic platelets use evenly even accounts. stealthy theodolites cajole furiou', u''), (u'53', u'Customer#000000053', u'HnaxHzTfFTZs8MuCpJyTbZ47Cm4wFOOgib', u'15', u'25-168-852-5363', u'4113.64', u'HOUSEHOLD', u'ar accounts are. even foxes are blithely. fluffily pending deposits boost', u''), (u'54', u'Customer#000000054', u',k4vf 5vECGWFy,hosTE,', u'4', u'14-776-370-4745', u'868.90', u'AUTOMOBILE', u'sual, silent accounts. furiously express accounts cajole special deposits. final, final accounts use furi', u''), (u'55', u'Customer#000000055', u'zIRBR4KNEl HzaiV3a i9n6elrxzDEh8r8pDom', u'10', u'20-180-440-8525', u'4572.11', u'MACHINERY', u'ully unusual packages wake bravely bold packages. unusual requests boost deposits! blithely ironic packages ab', u''), (u'56', u'Customer#000000056', u'BJYZYJQk4yD5B', u'10', u'20-895-685-6920', u'6530.86', u'FURNITURE', u'. notornis wake carefully. carefully fluffy requests are furiously even accounts. slyly expre', u''), (u'57', u'Customer#000000057', u'97XYbsuOPRXPWU', u'21', u'31-835-306-1650', u'4151.93', u'AUTOMOBILE', u'ove the carefully special packages. 
even, unusual deposits sleep slyly pend', u''), (u'58', u'Customer#000000058', u'g9ap7Dk1Sv9fcXEWjpMYpBZIRUohi T', u'13', u'23-244-493-2508', u'6478.46', u'HOUSEHOLD', u'ideas. ironic ideas affix furiously express, final instructions. regular excuses use quickly e', u''), (u'59', u'Customer#000000059', u'zLOCP0wh92OtBihgspOGl4', u'1', u'11-355-584-3112', u'3458.60', u'MACHINERY', u'ously final packages haggle blithely after the express deposits. furiou', u''), (u'60', u'Customer#000000060', u'FyodhjwMChsZmUz7Jz0H', u'12', u'22-480-575-5866', u'2741.87', u'MACHINERY', u'latelets. blithely unusual courts boost furiously about the packages. blithely final instruct', u''), (u'61', u'Customer#000000061', u'9kndve4EAJxhg3veF BfXr7AqOsT39o gtqjaYE', u'17', u'27-626-559-8599', u'1536.24', u'FURNITURE', u'egular packages shall have to impress along the ', u''), (u'62', u'Customer#000000062', u'upJK2Dnw13,', u'7', u'17-361-978-7059', u'595.61', u'MACHINERY', u'kly special dolphins. pinto beans are slyly. quickly regular accounts are furiously a', u''), (u'63', u'Customer#000000063', u'IXRSpVWWZraKII', u'21', u'31-952-552-9584', u'9331.13', u'AUTOMOBILE', u'ithely even accounts detect slyly above the fluffily ir', u''), (u'64', u'Customer#000000064', u'MbCeGY20kaKK3oalJD,OT', u'3', u'13-558-731-7204', u'-646.64', u'BUILDING', u'structions after the quietly ironic theodolites cajole be', u''), (u'65', u'Customer#000000065', u'RGT yzQ0y4l0H90P783LG4U95bXQFDRXbWa1sl,X', u'23', u'33-733-623-5267', u'8795.16', u'AUTOMOBILE', u'y final foxes serve carefully. theodolites are carefully. pending i', u''), (u'66', u'Customer#000000066', u'XbsEqXH1ETbJYYtA1A', u'22', u'32-213-373-5094', u'242.77', u'HOUSEHOLD', u'le slyly accounts. 
carefully silent packages benea', u''), (u'67', u'Customer#000000067', u'rfG0cOgtr5W8 xILkwp9fpCS8', u'9', u'19-403-114-4356', u'8166.59', u'MACHINERY', u'indle furiously final, even theodo', u''), (u'68', u'Customer#000000068', u'o8AibcCRkXvQFh8hF,7o', u'12', u'22-918-832-2411', u'6853.37', u'HOUSEHOLD', u' pending pinto beans impress realms. final dependencies ', u''), (u'69', u'Customer#000000069', u'Ltx17nO9Wwhtdbe9QZVxNgP98V7xW97uvSH1prEw', u'9', u'19-225-978-5670', u'1709.28', u'HOUSEHOLD', u'thely final ideas around the quickly final dependencies affix carefully quickly final theodolites. final accounts c', u''), (u'70', u'Customer#000000070', u'mFowIuhnHjp2GjCiYYavkW kUwOjIaTCQ', u'22', u'32-828-107-2832', u'4867.52', u'FURNITURE', u'fter the special asymptotes. ideas after the unusual frets cajole quickly regular pinto be', u''), (u'71', u'Customer#000000071', u'TlGalgdXWBmMV,6agLyWYDyIz9MKzcY8gl,w6t1B', u'7', u'17-710-812-5403', u'-611.19', u'HOUSEHOLD', u'g courts across the regular, final pinto beans are blithely pending ac', u''), (u'72', u'Customer#000000072', u'putjlmskxE,zs,HqeIA9Wqu7dhgH5BVCwDwHHcf', u'2', u'12-759-144-9689', u'-362.86', u'FURNITURE', u'ithely final foxes sleep always quickly bold accounts. final wat', u''), (u'73', u'Customer#000000073', u'8IhIxreu4Ug6tt5mog4', u'0', u'10-473-439-3214', u'4288.50', u'BUILDING', u'usual, unusual packages sleep busily along the furiou', u''), (u'74', u'Customer#000000074', u'IkJHCA3ZThF7qL7VKcrU nRLl,kylf ', u'4', u'14-199-862-7209', u'2764.43', u'MACHINERY', u'onic accounts. blithely slow packages would haggle carefully. qui', u''), (u'75', u'Customer#000000075', u'Dh 6jZ,cwxWLKQfRKkiGrzv6pm', u'18', u'28-247-803-9025', u'6684.10', u'AUTOMOBILE', u' instructions cajole even, even deposits. finally bold deposits use above the even pains. slyl', u''), (u'76', u'Customer#000000076', u'm3sbCvjMOHyaOofH,e UkGPtqc4', u'0', u'10-349-718-3044', u'5745.33', u'FURNITURE', u'pecial deposits. 
ironic ideas boost blithely according to the closely ironic theodolites! furiously final deposits n', u''), (u'77', u'Customer#000000077', u'4tAE5KdMFGD4byHtXF92vx', u'17', u'27-269-357-4674', u'1738.87', u'BUILDING', u'uffily silent requests. carefully ironic asymptotes among the ironic hockey players are carefully bli', u''), (u'78', u'Customer#000000078', u'HBOta,ZNqpg3U2cSL0kbrftkPwzX', u'9', u'19-960-700-9191', u'7136.97', u'FURNITURE', u'ests. blithely bold pinto beans h', u''), (u'79', u'Customer#000000079', u'n5hH2ftkVRwW8idtD,BmM2', u'15', u'25-147-850-4166', u'5121.28', u'MACHINERY', u'es. packages haggle furiously. regular, special requests poach after the quickly express ideas. blithely pending re', u''), (u'80', u'Customer#000000080', u'K,vtXp8qYB ', u'0', u'10-267-172-7101', u'7383.53', u'FURNITURE', u'tect among the dependencies. bold accounts engage closely even pinto beans. ca', u''), (u'81', u'Customer#000000081', u'SH6lPA7JiiNC6dNTrR', u'20', u'30-165-277-3269', u'2023.71', u'BUILDING', u'r packages. fluffily ironic requests cajole fluffily. ironically regular theodolit', u''), (u'82', u'Customer#000000082', u'zhG3EZbap4c992Gj3bK,3Ne,Xn', u'18', u'28-159-442-5305', u'9468.34', u'AUTOMOBILE', u's wake. bravely regular accounts are furiously. regula', u''), (u'83', u'Customer#000000083', u'HnhTNB5xpnSF20JBH4Ycs6psVnkC3RDf', u'22', u'32-817-154-4122', u'6463.51', u'BUILDING', u'ccording to the quickly bold warhorses. final, regular foxes integrate carefully. bold packages nag blithely ev', u''), (u'84', u'Customer#000000084', u'lpXz6Fwr9945rnbtMc8PlueilS1WmASr CB', u'11', u'21-546-818-3802', u'5174.71', u'FURNITURE', u'ly blithe foxes. 
special asymptotes haggle blithely against the furiously regular depo', u''), (u'85', u'Customer#000000085', u'siRerlDwiolhYR 8FgksoezycLj', u'5', u'15-745-585-8219', u'3386.64', u'FURNITURE', u'ronic ideas use above the slowly pendin', u''), (u'86', u'Customer#000000086', u'US6EGGHXbTTXPL9SBsxQJsuvy', u'0', u'10-677-951-2353', u'3306.32', u'HOUSEHOLD', u'quests. pending dugouts are carefully aroun', u''), (u'87', u'Customer#000000087', u'hgGhHVSWQl 6jZ6Ev', u'23', u'33-869-884-7053', u'6327.54', u'FURNITURE', u'hely ironic requests integrate according to the ironic accounts. slyly regular pla', u''), (u'88', u'Customer#000000088', u'wtkjBN9eyrFuENSMmMFlJ3e7jE5KXcg', u'16', u'26-516-273-2566', u'8031.44', u'AUTOMOBILE', u's are quickly above the quickly ironic instructions; even requests about the carefully final deposi', u''), (u'89', u'Customer#000000089', u'dtR, y9JQWUO6FoJExyp8whOU', u'14', u'24-394-451-5404', u'1530.76', u'FURNITURE', u'counts are slyly beyond the slyly final accounts. quickly final ideas wake. r', u''), (u'90', u'Customer#000000090', u'QxCzH7VxxYUWwfL7', u'16', u'26-603-491-1238', u'7354.23', u'BUILDING', u'sly across the furiously even ', u''), (u'91', u'Customer#000000091', u'S8OMYFrpHwoNHaGBeuS6E 6zhHGZiprw1b7 q', u'8', u'18-239-400-3677', u'4643.14', u'AUTOMOBILE', u'onic accounts. fluffily silent pinto beans boost blithely according to the fluffily exp', u''), (u'92', u'Customer#000000092', u'obP PULk2LH LqNF,K9hcbNqnLAkJVsl5xqSrY,', u'2', u'12-446-416-8471', u'1182.91', u'MACHINERY', u'. pinto beans hang slyly final deposits. ac', u''), (u'93', u'Customer#000000093', u'EHXBr2QGdh', u'7', u'17-359-388-5266', u'2182.52', u'MACHINERY', u'press deposits. carefully regular platelets r', u''), (u'94', u'Customer#000000094', u'IfVNIN9KtkScJ9dUjK3Pg5gY1aFeaXewwf', u'9', u'19-953-499-8833', u'5500.11', u'HOUSEHOLD', u'latelets across the bold, final requests sleep according to the fluffily bold accounts. 
unusual deposits amon', u''), (u'95', u'Customer#000000095', u'EU0xvmWvOmUUn5J,2z85DQyG7QCJ9Xq7', u'15', u'25-923-255-2929', u'5327.38', u'MACHINERY', u'ithely. ruthlessly final requests wake slyly alongside of the furiously silent pinto beans. even the', u''), (u'96', u'Customer#000000096', u'vWLOrmXhRR', u'8', u'18-422-845-1202', u'6323.92', u'AUTOMOBILE', u'press requests believe furiously. carefully final instructions snooze carefully. ', u''), (u'97', u'Customer#000000097', u'OApyejbhJG,0Iw3j rd1M', u'17', u'27-588-919-5638', u'2164.48', u'AUTOMOBILE', u'haggle slyly. bold, special ideas are blithely above the thinly bold theo', u''), (u'98', u'Customer#000000098', u'7yiheXNSpuEAwbswDW', u'12', u'22-885-845-6889', u'-551.37', u'BUILDING', u'ages. furiously pending accounts are quickly carefully final foxes: busily pe', u''), (u'99', u'Customer#000000099', u'szsrOiPtCHVS97Lt', u'15', u'25-515-237-9232', u'4088.65', u'HOUSEHOLD', u'cajole slyly about the regular theodolites! furiously bold requests nag along the pending, regular packages. somas', u''), (u'100', u'Customer#000000100', u'fptUABXcmkC5Wx', u'20', u'30-749-445-4907', u'9889.89', u'FURNITURE', u'was furiously fluffily quiet deposits. silent, pending requests boost against ', u'')] 

That is the RDD I refer to as customerrdd. The code I applied to it is:

def func(x): 
    return x 


def stringconverfunc(z): 
    return str(z) 


def floatconverfunc(l): 
    return float(l) 

def aggonvalfunc(y): 
    return y[3] 


grouprdd=customerrdd.groupBy(lambda w:(w[6])) 


result=grouprdd.flatMapValues(lambda q: func(q)).mapValues(lambda p: aggonvalfunc(p)) \
     .mapValues(lambda line: stringconverfunc(line)).mapValues(lambda line: line.strip()) \
     .mapValues(lambda line: floatconverfunc(line)).reduceByKey(lambda x, y: x + y).collect()
print(result)

OUTPUT :

The code above is the series of flatMapValues and mapValues operations mentioned earlier.

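However, the approach above is quite expensive in terms of shuffling and does not work on larger datasets. I would therefore like to implement the same logic with rdd.combineByKey so that it computes faster and can be used on larger datasets. I tried to implement it by following the combineByKey documentation, but I am confused about how to supply the keys and values and how to perform the aggregation. Can anyone help? Any suggestions are welcome.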

Answers

0

Okay, as a beginner it is hard to know all of this, so I will try to give you a few pointers.

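You can assign keys without grouping. This is done with keyBy and requires no shuffle. After all, a key-value RDD is nothing more than an RDD of 2-tuples in which the first item is the key and the second item is the value.
Any performance gain you could get from reduceByKey or combineByKey is wasted if you do an avoidable grouping beforehand.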
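
To make the point about keyBy concrete, here is a minimal plain-Python sketch of what keying does to each record. The key_by helper and the two shortened rows are hypothetical stand-ins for rdd.keyBy and customerrdd; no Spark is involved. Each record becomes exactly one (key, record) pair on its own, which is why no data has to move between partitions at this step.

```python
from operator import itemgetter

# Two made-up, shortened records laid out like the rows of customerrdd
# (index 3 holds a numeric string, index 6 holds the market segment).
rows = [
    ("1", "Customer#000000001", "addr", "15", "25-989-741-2988", "711.56", "BUILDING"),
    ("2", "Customer#000000002", "addr", "13", "23-768-687-3665", "121.65", "AUTOMOBILE"),
]


def key_by(records, key_func):
    """Hypothetical local stand-in for rdd.keyBy(key_func).

    Every record is turned into one (key, record) pair independently
    of all other records.
    """
    return [(key_func(record), record) for record in records]


pairs = key_by(rows, itemgetter(6))
```

The grouping only has to happen later, inside reduceByKey or combineByKey, after values for the same key have already been merged within each partition.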

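Also, float can be called on strings with leading and trailing whitespace; it strips the whitespace automatically. There is no need to create lambdas of the form lambda x: f(x); passing f directly, without parentheses, has the same effect. For the same reason there is no need to wrap str or float in other functions.
The operator module provides functions for adding and for retrieving items, so you do not need to define those yourself. See the Python documentation for details.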
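
These points can be checked locally without Spark. The snippet below is only an illustration; the shortened row is made up and just mimics the layout of the question's records.

```python
from functools import reduce
from operator import add, itemgetter

# Hypothetical shortened row; index 3 deliberately carries extra whitespace.
row = ("1", "Customer#000000001", "some address", " 15 ",
       "25-989-741-2988", "711.56", "BUILDING")

# float() ignores leading and trailing whitespace, so no strip() is needed.
nation = float(itemgetter(3)(row))

# The wrapped version from the question gives the same value.
same_nation = (lambda l: float(l))((lambda y: y[3])(row))

# itemgetter(6) behaves like lambda x: x[6].
segment = itemgetter(6)(row)

# operator.add behaves like lambda x, y: x + y.
total = reduce(add, map(float, [" 15 ", "13", "1\n"]))
```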

My solution, without comments:

from operator import itemgetter, add

result = customerrdd.keyBy(itemgetter(6))\
    .mapValues(itemgetter(3))\
    .mapValues(float)\
    .reduceByKey(add).collect()

With comments:

from operator import itemgetter, add

# `itemgetter(6)` is equivalent to `lambda x: x[6]`. Therefore we'll use element at
# index 6 to key the rdd's entries.
# This operation is equivalent to `customerrdd.map(lambda x: (x[6], x))`
rdd = customerrdd.keyBy(itemgetter(6))

# Now extract element at index 3 from the values so we no longer have a tuple
rdd = rdd.mapValues(itemgetter(3))

# Convert those elements to floats
rdd = rdd.mapValues(float)

# We could've done the previous steps in one by doing
# rdd = customerrdd.map(lambda x: (x[6], float(x[3])))

# Sum them up and collect the result
result = rdd.reduceByKey(add).collect()

It returns

[(u'BUILDING', 204.0),
(u'AUTOMOBILE', 280.0),
(u'MACHINERY', 135.0),
(u'HOUSEHOLD', 255.0),
(u'FURNITURE', 224.0)]

As you can see, this is a different result from yours, but I ran your code and got the same. So I think you used a different RDD for your result.
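
Since the question asks how to supply keys and values to combineByKey, here is a rough sketch of the three functions it expects, using a (sum, count) accumulator as an example of an aggregation that goes beyond a plain sum. The function names and the simulate_combine_by_key helper are my own assumptions for illustration; only the combineByKey(createCombiner, mergeValue, mergeCombiners) call itself is the PySpark API. The helper imitates the per-partition merge in plain Python so the functions can be checked without a cluster.

```python
def create_combiner(value):
    """First value seen for a key in a partition: start a (sum, count) pair."""
    return (value, 1)


def merge_value(acc, value):
    """Fold another value for the same key into the partition-local pair."""
    return (acc[0] + value, acc[1] + 1)


def merge_combiners(left, right):
    """Merge two (sum, count) pairs built in different partitions."""
    return (left[0] + right[0], left[1] + right[1])


def sum_and_count_by_key(pair_rdd):
    """Assumes pair_rdd is an RDD of (key, float) pairs, for example
    customerrdd.map(lambda x: (x[6], float(x[3]))).
    """
    return pair_rdd.combineByKey(create_combiner, merge_value, merge_combiners)


def simulate_combine_by_key(partitions):
    """Plain-Python imitation of the same logic (illustration only).

    `partitions` is a list of lists of (key, value) pairs.
    """
    merged = {}
    for partition in partitions:
        local = {}
        for key, value in partition:
            if key in local:
                local[key] = merge_value(local[key], value)
            else:
                local[key] = create_combiner(value)
        for key, acc in local.items():
            merged[key] = merge_combiners(merged[key], acc) if key in merged else acc
    return merged


demo = simulate_combine_by_key([
    [("A", 1.0), ("B", 2.0), ("A", 3.0)],
    [("A", 5.0)],
])
means = {key: total / count for key, (total, count) in demo.items()}
```

On a real RDD the mean per key could then be obtained with something like sum_and_count_by_key(keyed).mapValues(lambda sc: sc[0] / sc[1]). For a plain sum, reduceByKey(add) already combines values within each partition before the shuffle, so this extra machinery buys nothing there.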

+0

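Hi @swenzel, first of all, thank you for your reply. However, I would like to implement this with 'combineByKey' or a similar operation in order to reduce the shuffle overhead, because 'reduceByKey' incurs a high shuffle overhead during the computation. The reason I avoid 'groupBy()', 'groupByKey()' and 'reduceByKey()' is that they hurt performance and sometimes fail on larger datasets because of the memory they consume. So I am looking for a solution in terms of an operation like 'combineByKey'. –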

+0

@ShafaatHussain I am pretty sure that 'reduceByKey' will not be slower than combineByKey, which is complete overkill for a simple summation. The bottleneck is the 'groupBy' operation. Have you tried the code I provided? – swenzel

+0

Thanks again. But I do not need only a sum; I will be extending this to other aggregation operations. Yes, I will give your solution a try. :) –