Problem
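- Retrieve from a graph database the IDs of every set of three nodes that form a triangle, and deduplicate them.
- Each row holds the IDs of three nodes; two rows that contain the same three IDs, in any order, count as the same triangle.
- For example, the first and second rows of the data below describe the same triangle.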
Raw data
Contents of test.csv:
```
0000068d96366d5e052ed65eaecb1112,74a995fb9b79e84232b7510644688cd8,dfb644784e6292b8d7f499fc53229dda
0000068d96366d5e052ed65eaecb1112,dfb644784e6292b8d7f499fc53229dda,74a995fb9b79e84232b7510644688cd8
0000068d96366d5e052ed65eaecb1112,2902e25dde191f5f596a3756bed1a3ce,96288b79adb133ba4b69b5f97716eee6
000007f62fb9ee58cec1391e55b9c200,836598c70ab46fb1c621fe548dd363cc,a5545ef32843ecfa7038692a1dbbe305
000008dc6580ce3d3313e35417c0aa65,8c658523dbcf012383fb12aec76f220f,b193cb6de781dbacc04ce2ccb96d43dc
00001d74b7b878b076ab2d84d5de4296,d2a5854ae3efa1ee4e7a4070e841f108,4c0fd736ef3845072bdd9a318a71a96f
000021d8344e102a857740ee319c40e9,c27c333b1586b239bb302b23c70ce274,9a29e26feb9d254a8f70c9ad94e8d1dc
000021d8344e102a857740ee319c40e9,9a29e26feb9d254a8f70c9ad94e8d1dc,c27c333b1586b239bb302b23c70ce274
000021d8344e102a857740ee319c40e9,2902e25dde191f5f596a3756bed1a3ce,96288b79adb133ba4b69b5f97716eee6
0000419d8eec64dd2582336713a51ad3,714215e7fdf0f3f32073df73caecd44f,3255e40d3de83c0ad9ebd35569e601b4
0000434e444cab7991535ab1dc7e0a43,2ab9986c1d02386ca74155e55fcdb64c,16de98e5c7f9751fc6b66526ce875736
0000434e444cab7991535ab1dc7e0a43,16de98e5c7f9751fc6b66526ce875736,2ab9986c1d02386ca74155e55fcdb64c
00004e274353b7bd829ad789930a799c,a76fdec4ace13644f6f50934c53fb484,18ee36edb4a3ed1c9d35cca2a01af37c
00004e274353b7bd829ad789930a799c,18ee36edb4a3ed1c9d35cca2a01af37c,a76fdec4ace13644f6f50934c53fb484
00009475ecac55a5816427acd81a2873,d1c17f8ff05ee3243fc084ae45042891,73b357fc3e0f5e538521c9589762e113
00009f5e45eb2f2c37ed8e349ba63e76,c8fbf4f7114038db015eb8debf0bfba8,1f111ee13d222ab4f4d3d128af201aea
00009f5e45eb2f2c37ed8e349ba63e76,1f111ee13d222ab4f4d3d128af201aea,c8fbf4f7114038db015eb8debf0bfba8
00009f5e45eb2f2c37ed8e349ba63e76,2902e25dde191f5f596a3756bed1a3ce,96288b79adb133ba4b69b5f97716eee6
0000b229f3216a8ca929c1a90292665e,5aaf57319e251c04fe8b3d4f1d687cad,2f1f7611449c9834dc8d1df8e745e7c1
0000f2546c76a8412c130228dca0ea8d,b24401256e3bd6d30c32bbedb9a90956,24338e2d8656a53303f3dced1d6af82f
```
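Approach: the same three values should always produce one and the same value, no matter in which order they are processed.
For example: a+b+c = c+b+a, and a^b^c = c^b^a (where ^ is bitwise XOR).
Note: glad I haven't forgotten everything my teachers taught me.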
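As a quick check of this idea, the sketch below reads the three IDs from the first row of test.csv as 128-bit integers and confirms that all six orderings give the same sum and the same XOR. Converting the IDs with int(x, 16) is an assumption made only for this illustration; the two methods below work on bytes instead.

```python
from itertools import permutations

# The three IDs from the first row of test.csv, read as 128-bit integers.
ids = ['0000068d96366d5e052ed65eaecb1112',
       '74a995fb9b79e84232b7510644688cd8',
       'dfb644784e6292b8d7f499fc53229dda']
nums = [int(x, 16) for x in ids]

# Compute the sum and the XOR for all 6 orderings; sets keep only distinct results.
sums = {a + b + c for a, b, c in permutations(nums)}
xors = {a ^ b ^ c for a, b, c in permutations(nums)}

print(len(sums), len(xors))        # 1 1
print(f'{next(iter(xors)):032x}')  # ab1fd70e432d17a4e06d1ea4b9810010
```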
Method 1
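Encode each of the three node IDs into bytes, XOR them together to get a single value, and deduplicate on that value.
The XORed bytes are awkward to print, so they are Base64-encoded into a readable string.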
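```python
from base64 import b64encode

vs = set()  # keys of the triangles already written
with open('test.csv', 'r', encoding='utf-8') as fr, open('out.csv', 'w', encoding='utf-8') as fw:
    for line in fr:
        v0, v1, v2 = line.strip().split(',')
        # XOR the UTF-8 bytes of the three ID strings position by position;
        # the result does not depend on the order of v0, v1, v2.
        xored = bytes(i ^ j ^ k for i, j, k in zip(v0.encode('utf-8'), v1.encode('utf-8'), v2.encode('utf-8')))
        # Base64-encode the XORed bytes into a readable string.
        v = b64encode(xored).decode('utf-8')
        if v not in vs:  # first time this triangle appears: keep it
            fw.write(f'{v0},{v1},{v2},{v}\n')
            vs.add(v)
```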
Final result
Contents of out.csv (the fourth column is the deduplication key):
```
0000068d96366d5e052ed65eaecb1112,74a995fb9b79e84232b7510644688cd8,dfb644784e6292b8d7f499fc53229dda,Y2IzPz03aT40MTI9am5jb2cwNmZoPmMwYGJnaDA2MWs=
0000068d96366d5e052ed65eaecb1112,2902e25dde191f5f596a3756bed1a3ce,96288b79adb133ba4b69b5f97716eee6,Oz8yOm1mOjk8N2A+NDFiYjFuMj01NGZqNDc2ZTVnN2E=
000007f62fb9ee58cec1391e55b9c200,836598c70ab46fb1c621fe548dd363cc,a5545ef32843ecfa7038692a1dbbe305,aTYzMTxqYzIwPzQ+NmAxaDdjYjhjZTYwPDVkaDAyY2Y=
000008dc6580ce3d3313e35417c0aa65,8c658523dbcf012383fb12aec76f220f,b193cb6de781dbacc04ce2ccb96d43dc,amI/NmtvYDQ3YGNnNzZgNGgwYzIxMzcyMDljMmdgYjA=
00001d74b7b878b076ab2d84d5de4296,d2a5854ae3efa1ee4e7a4070e841f108,4c0fd736ef3845072bdd9a318a71a96f,YGFhY21mMGNiYjRmYjw3YjExMmc/NTw1OWxnZTM6P2g=
000021d8344e102a857740ee319c40e9,c27c333b1586b239bb302b23c70ce274,9a29e26feb9d254a8f70c9ad94e8d1dc,amM1amQwYTxnYzU3YTc1OWIxMzdlazYyaTJsODUzNm4=
000021d8344e102a857740ee319c40e9,2902e25dde191f5f596a3756bed1a3ce,96288b79adb133ba4b69b5f97716eee6,Oz8yOm9hZmU2NWdtM2VlZjluN29lMjZqZmNsZDBmY2o=
0000419d8eec64dd2582336713a51ad3,714215e7fdf0f3f32073df73caecd44f,3255e40d3de83c0ad9ebd35569e601b4,NDMxN2AwbDdtZWZrY2QyNmQ8amMzZjQxZGthYGVkMmE=
0000434e444cab7991535ab1dc7e0a43,2ab9986c1d02386ca74155e55fcdb64c,16de98e5c7f9751fc6b66526ce875736,M2c2bDQzZzNmZ2JoZW8wPDswYzQ2YTUyMmBsNmdgM2Y=
00004e274353b7bd829ad789930a799c,a76fdec4ace13644f6f50934c53fb484,18ee36edb4a3ed1c9d35cca2a01af37c,YD9jM2M2NGc3ZDExNGVnM2dgbGE3bWo/OzYyZjM+NjQ=
00009475ecac55a5816427acd81a2873,d1c17f8ff05ee3243fc084ae45042891,73b357fc3e0f5e538521c9589762e113,YzIxMjtlaTAwNmRgZWNmMjNiZzVpOjU+aTo3Z2UxPzE=
00009f5e45eb2f2c37ed8e349ba63e76,c8fbf4f7114038db015eb8debf0bfba8,1f111ee13d222ab4f4d3d128af201aea,Ym5nY243NmM2YGNgMz80NWUyNDI+bGVpOmJjZGRmM28=
00009f5e45eb2f2c37ed8e349ba63e76,2902e25dde191f5f596a3756bed1a3ce,96288b79adb133ba4b69b5f97716eee6,Oz8yOmQ2NzgxNDZqMDNlZDJsZTxpZ2A7bDA0MTczMWU=
0000b229f3216a8ca929c1a90292665e,5aaf57319e251c04fe8b3d4f1d687cad,2f1f7611449c9834dc8d1df8e745e7c1,NzdgMGAzMDlrYjlnPjo7Y2M/Mj9hMTNnZGE7P2RiNzA=
0000f2546c76a8412c130228dca0ea8d,b24401256e3bd6d30c32bbedb9a90956,24338e2d8656a53303f3dced1d6af82f,YDY3N25mNWU4MDFiZDtjMTIzZDI2MzI4Nz42aDNgPzQ=
```
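After sleeping on it, method 1 still felt awkward: it XORs the ASCII codes of the hex characters rather than the hexadecimal numbers they spell out. The three IDs should instead be XORed into a single hexadecimal number.
Reading the official documentation for bytes more carefully, I found the bytes.fromhex() and bytes.hex() methods. The official docs really are the best guide~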
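A minimal sketch of the two methods on one ID from test.csv: bytes.fromhex() turns a hex string into raw bytes, and bytes.hex() converts them back into a lowercase hex string.

```python
node_id = '0000068d96366d5e052ed65eaecb1112'

raw = bytes.fromhex(node_id)  # 32 hex characters -> 16 raw bytes
print(len(raw))               # 16
print(raw[:4])                # b'\x00\x00\x06\x8d'
print(raw.hex() == node_id)   # True: .hex() gives back the lowercase hex string
```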
Method 2
Parse the three node IDs into bytes, XOR them together to get a single value, and deduplicate on that value.
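```python
vs = set()  # keys of the triangles already written
with open('test.csv', 'r', encoding='utf-8') as fr, open('out.csv', 'w', encoding='utf-8') as fw:
    for line in fr:
        v0, v1, v2 = line.strip().split(',')
        # bytes.fromhex parses each 32-character hex ID into 16 raw bytes;
        # XOR them byte by byte and format the result back as hex with .hex().
        v = bytes(i ^ j ^ k for i, j, k in zip(bytes.fromhex(v0), bytes.fromhex(v1), bytes.fromhex(v2))).hex()
        # print(f'{v0},{v1},{v2},{v}')
        if v not in vs:  # first time this triangle appears: keep it
            fw.write(f'{v0},{v1},{v2},{v}\n')
            vs.add(v)
```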
Final result
Contents of out.csv (the fourth column is the deduplication key):
```
0000068d96366d5e052ed65eaecb1112,74a995fb9b79e84232b7510644688cd8,dfb644784e6292b8d7f499fc53229dda,ab1fd70e432d17a4e06d1ea4b9810010
0000068d96366d5e052ed65eaecb1112,2902e25dde191f5f596a3756bed1a3ce,96288b79adb133ba4b69b5f97716eee6,bf2a6fa9e59e41bb172d54f1670c5c3a
000007f62fb9ee58cec1391e55b9c200,836598c70ab46fb1c621fe548dd363cc,a5545ef32843ecfa7038692a1dbbe305,2631c1c20d4e6d1378d8ae60c5d142c9
000008dc6580ce3d3313e35417c0aa65,8c658523dbcf012383fb12aec76f220f,b193cb6de781dbacc04ce2ccb96d43dc,3df6469259ce14b270a4133669c2cbb6
00001d74b7b878b076ab2d84d5de4296,d2a5854ae3efa1ee4e7a4070e841f108,4c0fd736ef3845072bdd9a318a71a96f,9eaa4f08bb6f9c59130cf7c5b7ee1af1
000021d8344e102a857740ee319c40e9,c27c333b1586b239bb302b23c70ce274,9a29e26feb9d254a8f70c9ad94e8d1dc,5855f08cca558759b137a26062787341
000021d8344e102a857740ee319c40e9,2902e25dde191f5f596a3756bed1a3ce,96288b79adb133ba4b69b5f97716eee6,bf2a48fc47e63ccf9774c241f85b0dc1
0000419d8eec64dd2582336713a51ad3,714215e7fdf0f3f32073df73caecd44f,3255e40d3de83c0ad9ebd35569e601b4,4317b0774ef4ab24dc1a3f41b0afcf28
0000434e444cab7991535ab1dc7e0a43,2ab9986c1d02386ca74155e55fcdb64c,16de98e5c7f9751fc6b66526ce875736,3c6743c79eb7e60af0a46a724d34eb39
00004e274353b7bd829ad789930a799c,a76fdec4ace13644f6f50934c53fb484,18ee36edb4a3ed1c9d35cca2a01af37c,bf81a60e5b116ce5e95a121ff62f3e64
00009475ecac55a5816427acd81a2873,d1c17f8ff05ee3243fc084ae45042891,73b357fc3e0f5e538521c9589762e113,a272bc0622fde8d23b856a5a0a7ce1f1
00009f5e45eb2f2c37ed8e349ba63e76,c8fbf4f7114038db015eb8debf0bfba8,1f111ee13d222ab4f4d3d128af201aea,d7ea754869893d43c260e7c28b8ddf34
00009f5e45eb2f2c37ed8e349ba63e76,2902e25dde191f5f596a3756bed1a3ce,96288b79adb133ba4b69b5f97716eee6,bf2af67a364303c925ee0c9b5261735e
0000b229f3216a8ca929c1a90292665e,5aaf57319e251c04fe8b3d4f1d687cad,2f1f7611449c9834dc8d1df8e745e7c1,75b093092998eebc8b2fe11ef8bffd32
0000f2546c76a8412c130228dca0ea8d,b24401256e3bd6d30c32bbedb9a90956,24338e2d8656a53303f3dced1d6af82f,96777d5c841bdba123d2652878631bf4
```
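As with method 1, the key can be checked directly. The sketch below wraps method 2's expression in a hypothetical method2_key function and feeds it all six orderings of the first row; every ordering should give the key shown on the first line of out.csv.

```python
from itertools import permutations

def method2_key(v0: str, v1: str, v2: str) -> str:
    # Same expression as method 2: parse each hex ID into bytes,
    # XOR them byte by byte, and format the result as hex.
    return bytes(i ^ j ^ k for i, j, k in
                 zip(bytes.fromhex(v0), bytes.fromhex(v1), bytes.fromhex(v2))).hex()

row1 = ('0000068d96366d5e052ed65eaecb1112',
        '74a995fb9b79e84232b7510644688cd8',
        'dfb644784e6292b8d7f499fc53229dda')

keys = {method2_key(*p) for p in permutations(row1)}
print(keys)  # {'ab1fd70e432d17a4e06d1ea4b9810010'}
```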
This looks much cleaner and makes a lot more sense.
Summary
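- Comparing unordered sets element by element is awkward even to think about. Instead, derive a single value that represents each unordered set, and compare those values.
- Read the official documentation more often.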
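One caveat about XOR as the representative value: it folds three 16-byte IDs into a single 16-byte value, so two different triangles can end up with the same key (for example {a, b, c} and {a, d, e} whenever b ^ c equals d ^ e), and the second one would then be silently dropped. A representative that cannot collide is the sorted tuple of the three IDs. The sketch below is an alternative to the methods above, not what this post used; triangle_key, xor_key and dedupe are hypothetical names, and the short IDs 01 to 05 are made-up values chosen only to force an XOR collision.

```python
from typing import Iterable, Iterator, Tuple

Row = Tuple[str, str, str]

def triangle_key(v0: str, v1: str, v2: str) -> Tuple[str, ...]:
    # Sorting puts the same three IDs into one canonical order, so equal
    # triangles get equal keys and different triangles never share a key.
    return tuple(sorted((v0, v1, v2)))

def xor_key(v0: str, v1: str, v2: str) -> str:
    # Method 2's key, kept here only for comparison.
    return bytes(i ^ j ^ k for i, j, k in
                 zip(bytes.fromhex(v0), bytes.fromhex(v1), bytes.fromhex(v2))).hex()

def dedupe(rows: Iterable[Row]) -> Iterator[Row]:
    # Yield each triangle the first time it is seen, like the original loop.
    seen = set()
    for row in rows:
        key = triangle_key(*row)
        if key not in seen:
            seen.add(key)
            yield row

# The first three rows of test.csv: rows 1 and 2 are the same triangle.
rows = [
    ('0000068d96366d5e052ed65eaecb1112', '74a995fb9b79e84232b7510644688cd8', 'dfb644784e6292b8d7f499fc53229dda'),
    ('0000068d96366d5e052ed65eaecb1112', 'dfb644784e6292b8d7f499fc53229dda', '74a995fb9b79e84232b7510644688cd8'),
    ('0000068d96366d5e052ed65eaecb1112', '2902e25dde191f5f596a3756bed1a3ce', '96288b79adb133ba4b69b5f97716eee6'),
]
unique = list(dedupe(rows))
print(len(unique))  # 2

# Hypothetical short IDs: two different triangles with the same XOR key.
print(xor_key('01', '02', '03'), xor_key('04', '05', '01'))  # 00 00
print(triangle_key('01', '02', '03') == triangle_key('04', '05', '01'))  # False
```

Since a triangle's three nodes are distinct, frozenset((v0, v1, v2)) would serve equally well as the key; the sorted tuple has the advantage of being easy to write back out as a CSV column.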