Overview
本節(jié)的目的是使用您在前幾節(jié)中學(xué)習(xí)的內(nèi)容創(chuàng)建協(xié)議敬锐。在本教程中传轰,我們將準(zhǔn)備抗體抗原結(jié)合結(jié)構(gòu)剩盒,在抗體上進(jìn)行點(diǎn)突變,記錄結(jié)合強(qiáng)度的變化慨蛙,并生成顯示能量差異的熱圖辽聊。該方法廣泛用于抗體界面設(shè)計,以提高結(jié)合親和力或擴(kuò)大結(jié)合范圍期贫。
Protocol
整個Protocol分為八個步驟:
- 使用FastRelax()準(zhǔn)備結(jié)構(gòu)
- 編寫函數(shù)以執(zhí)行突變PackMover()
- 編寫函數(shù)以解除綁定抗體抗原結(jié)合結(jié)構(gòu)unbind()
- 編寫獲取野生型氨基酸的函數(shù)
- 編寫函數(shù)以正確變異和pack特定的殘基并輸出能量情況
- 循環(huán)通過接口位置跟匆,用輸出文件將它們變?yōu)?0個氨基酸
- 匯總結(jié)合能分析的所有輸入文件
Loading structures
from pyrosetta import *
init()
pose = pose_from_pdb("inputs/1jhl.clean.pdb")
testPose = Pose()
testPose.assign(pose)
print(testPose)
PDB file name: inputs/1jhl.clean.pdb
Total residues: 353
Sequence: DIELTQSPSYLVASPGETITINCRASKSISKSLAWYQEKPGKTNNLLIYSGSTLQSGIPSRFSGSGSGTDFTLTISSLEPEDFAMYICQQHNEYPWTFGGGTKLEIKRQVQLQQSGAELVRPGASVKLSCKASGYTFISYWINWVKQRPGQGLEWIGNIYPSDSYTNYNQKFKDKATLTVDKSSSTAYMQLSSPTSEDSAVYYCTRDDNYGAMDYWGQGTTVTVKVYGRCELAAAMKRMGLDNYRGYSLGNWVCAAKFESNFNTGATNRNTDGSTDYGILQINSRWWCNDGRTPGSKNLCHIPCSALLSSDITASVNCAKKIVSDGDGMNAWVAWRKHCKGTDVNVWIRGCRL
Fold tree:
FOLD_TREE EDGE 1 108 -1 EDGE 1 109 1 EDGE 109 224 -1 EDGE 1 225 2 EDGE 225 353 -1
Step 1. Prepare the starting structure with FastRelax()
適當(dāng)relax結(jié)構(gòu)對于Rosetta的設(shè)計至關(guān)重要。非松弛結(jié)構(gòu)可能無法很好地克服不良的全局能量通砍,因此會影響以下關(guān)于結(jié)合能的分析贾铝。
FastRelax()用于relax結(jié)構(gòu)。雖然我們希望將結(jié)構(gòu)置于最低能量狀態(tài)埠帕,但我們希望盡可能保留晶體結(jié)構(gòu)中的骨架信息(最低RMSD)垢揩。因此,我們將constrain_relax_to_start_coords(True)應(yīng)用于FastRelax()敛瓷。
由于FastRelax()占用了大量資源叁巨,運(yùn)行它似乎會使notebook崩潰,因此我們刪除了“應(yīng)用”部分(執(zhí)行松弛的部分)呐籽,并打印出松弛Mover()锋勺。我們將松弛的結(jié)構(gòu)上傳到輸入文件夾,以便進(jìn)一步分析狡蝶。
坐標(biāo)約束relax是一種常規(guī)準(zhǔn)備。有關(guān)此準(zhǔn)備方法的更高級版本贪惹,請參閱:https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0059004
from pyrosetta.rosetta.protocols.relax import FastRelax
relax = FastRelax()
scorefxn = get_fa_scorefxn()
relax.set_scorefxn(scorefxn)
relax = rosetta.protocols.relax.FastRelax()
relax.constrain_relax_to_start_coords(True)
print(relax)
#relax.apply(testPose)
#testPose.dump_pdb('test.relax.pdb')
Writing Function in Python
函數(shù)是組織代碼的好方法苏章。從本節(jié)開始奏瞬,我將介紹一些功能來促進(jìn)協(xié)議枫绅。
要在python中定義函數(shù),需要使用“def”關(guān)鍵字硼端。函數(shù)既可以返回值并淋,也可以簡單地執(zhí)行代碼塊。定義的函數(shù)也可以在主函數(shù)或代碼的其他部分中調(diào)用珍昨。
Step 2. Write the function for mutation
此函數(shù)利用了先前教程中演示的ResidueSelectorMover()县耽。允許設(shè)計和repack突變位置,而附近的殘留物僅限于repack镣典。最后的突變將使用PackMover()執(zhí)行兔毙。
from pyrosetta.rosetta.core.pack.task import *
from pyrosetta.rosetta.protocols import *
from pyrosetta.rosetta.core.select import *
def pack(pose, posi, amino, scorefxn):
# Select Mutate Position
mut_posi = pyrosetta.rosetta.core.select.residue_selector.ResidueIndexSelector()
mut_posi.set_index(posi)
#print(pyrosetta.rosetta.core.select.get_residues_from_subset(mut_posi.apply(pose)))
# Select Neighbor Position
nbr_selector = pyrosetta.rosetta.core.select.residue_selector.NeighborhoodResidueSelector()
nbr_selector.set_focus_selector(mut_posi)
nbr_selector.set_include_focus_in_subset(True)
#print(pyrosetta.rosetta.core.select.get_residues_from_subset(nbr_selector.apply(pose)))
# Select No Design Area
not_design = pyrosetta.rosetta.core.select.residue_selector.NotResidueSelector(mut_posi)
#print(pyrosetta.rosetta.core.select.get_residues_from_subset(not_design.apply(pose)))
# The task factory accepts all the task operations
tf = pyrosetta.rosetta.core.pack.task.TaskFactory()
# These are pretty standard
tf.push_back(pyrosetta.rosetta.core.pack.task.operation.InitializeFromCommandline())
tf.push_back(pyrosetta.rosetta.core.pack.task.operation.IncludeCurrent())
tf.push_back(pyrosetta.rosetta.core.pack.task.operation.NoRepackDisulfides())
# Disable Packing
prevent_repacking_rlt = pyrosetta.rosetta.core.pack.task.operation.PreventRepackingRLT()
prevent_subset_repacking = pyrosetta.rosetta.core.pack.task.operation.OperateOnResidueSubset(prevent_repacking_rlt, nbr_selector, True )
tf.push_back(prevent_subset_repacking)
# Disable design
tf.push_back(pyrosetta.rosetta.core.pack.task.operation.OperateOnResidueSubset(
pyrosetta.rosetta.core.pack.task.operation.RestrictToRepackingRLT(),not_design))
# Enable design
aa_to_design = pyrosetta.rosetta.core.pack.task.operation.RestrictAbsentCanonicalAASRLT()
aa_to_design.aas_to_keep(amino)
tf.push_back(pyrosetta.rosetta.core.pack.task.operation.OperateOnResidueSubset(aa_to_design, mut_posi))
# Create Packer
packer = pyrosetta.rosetta.protocols.minimization_packing.PackRotamersMover()
packer.task_factory(tf)
#Perform The Move
if not os.getenv("DEBUG"):
packer.apply(pose)
#Load the previously-relaxed pose.
relaxPose = pose_from_pdb("inputs/1jhl.relax.pdb")
#Clone it
original = relaxPose.clone()
scorefxn = get_score_function()
print("\nOld Energy:", scorefxn(original),"\n")
pack(relaxPose, 130, 'A', scorefxn)
print("\nNew Energy:", scorefxn(relaxPose),"\n")
#Set relaxPose back to original
relaxPose = original.clone()
Old Energy: -1056.7705978308402
New Energy: -1057.2353638196532
Step 3. unbind()
該函數(shù)用于結(jié)合能分析。為了計算結(jié)合能骆撇,我們需要對結(jié)合態(tài)結(jié)構(gòu)的總能量進(jìn)行評分瞒御,分離(解除結(jié)合)抗原和抗體,然后對未結(jié)合態(tài)總能量進(jìn)行打分神郊。bound energy - unbound energy肴裙。您還可以使用位于rosetta.protocols.analysis中的InterfaceAnalyzerOver(IAM)。
from pyrosetta.rosetta.protocols import *
def unbind(pose, partners):
STEP_SIZE = 100
JUMP = 2
docking.setup_foldtree(pose, partners, Vector1([-1,-1,-1]))
trans_mover = rigid.RigidBodyTransMover(pose,JUMP)
trans_mover.step_size(STEP_SIZE)
trans_mover.apply(pose)
##Reset the original pose
relaxPose = original.clone()
scorefxn = get_score_function()
bound_score = scorefxn(relaxPose)
print("\nBound State Score",bound_score,"\n")
unbind(relaxPose, "HL_A")
unbound_score = scorefxn(relaxPose)
print("\nUnbound State Score", unbound_score,"\n")
print('dG', bound_score - unbound_score, 'Rosetta Energy Units (REU)')
Bound State Score -1056.7706161583792
Unbound State Score -998.7222028352029
dG -58.048413323176305 Rosetta Energy Units (REU)
Step 4. wildtype()
評估結(jié)合改善的一個重要指標(biāo)是突變體結(jié)合能與野生型結(jié)合能的比率涌乳。此函數(shù)返回給定位置的野生型氨基酸蜻懦。
def wildtype(aatype = 'AA.aa_gly'):
AA = ['G','A','L','M','F','W','K','Q','E','S','P'
,'V','I','C','Y','H','R','N','D','T']
AA_3 = ['AA.aa_gly','AA.aa_ala','AA.aa_leu','AA.aa_met','AA.aa_phe','AA.aa_trp'
,'AA.aa_lys','AA.aa_gln','AA.aa_glu', 'AA.aa_ser','AA.aa_pro','AA.aa_val'
,'AA.aa_ile','AA.aa_cys','AA.aa_tyr','AA.aa_his','AA.aa_arg','AA.aa_asn'
,'AA.aa_asp','AA.aa_thr']
for i in range(0, len(AA_3)):
if(aatype == AA_3[i]):
return AA[i]
print(wildtype(str(relaxPose.aa(130))))
Setp 4a. Exploration.
我們也可以使用一些實(shí)用函數(shù)來代替這個函數(shù)。
我們可以使用返回的序列STRING夕晓。注意宛乃,在Rosetta中,字符串的索引為0。
print(relaxPose.sequence())
print("\n",relaxPose.sequence()[2])
DIELTQSPSYLVASPGETITINCRASKSISKSLAWYQEKPGKTNNLLIYSGSTLQSGIPSRFSGSGSGTDFTLTISSLEPEDFAMYICQQHNEYPWTFGGGTKLEIKRQVQLQQSGAELVRPGASVKLSCKASGYTFISYWINWVKQRPGQGLEWIGNIYPSDSYTNYNQKFKDKATLTVDKSSSTAYMQLSSPTSEDSAVYYCTRDDNYGAMDYWGQGTTVTVKVYGRCELAAAMKRMGLDNYRGYSLGNWVCAAKFESNFNTGATNRNTDGSTDYGILQINSRWWCNDGRTPGSKNLCHIPCSALLSSDITASVNCAKKIVSDGDGMNAWVAWRKHCKGTDVNVWIRGCRL
E
現(xiàn)在讓我們從保存化學(xué)信息的ResidueType對象中獲取一些信息征炼。
print(relaxPose.residue_type(1))
ASP:NtermProteinFull (ASP, D):
Base: ASP
Properties: POLYMER PROTEIN CANONICAL_AA LOWER_TERMINUS SC_ORBITALS POLAR CHARGED NEGATIVE_CHARGE METALBINDING ALPHA_AA L_AA
Variant types: LOWER_TERMINUS_VARIANT
Main-chain atoms: N CA C
Backbone atoms: N CA C O 1H 2H 3H HA
Side-chain atoms: CB CG OD1 OD2 1HB 2HB
最后析既,讓我們從類型中獲取殘留物
resT = relaxPose.residue_type(1)
print(resT.base_name())
print(resT.name3())
print(resT.name1())
ASP
ASP
D
Step 5. Integrate functions for mutate and output
def mutate(pose, posi, amino, partners):
#main function for mutation
CSV_PREFIX = 'notec'
PDB_PREFIX = 'notep'
#Initiate test pose
testPose = Pose()
testPose.assign(pose)
#Initiate energy function
scorefxn = get_fa_scorefxn()
unbind(testPose, partners)
native_ub = scorefxn(testPose)
testPose.assign(pose)
#Variables initiation
content = ''
name = CSV_PREFIX + str(posi)+str(amino) + '.csv'
pdbname = PDB_PREFIX + str(posi)+str(amino) + '.pdb'
wt = wildtype(str(pose.aa(posi)))
# energy before mutaion
pack(testPose, posi, amino, scorefxn)
testPose.dump_pdb(pdbname)
bound = scorefxn(testPose)
unbind(testPose, partners)
unbound = scorefxn(testPose)
binding = unbound - bound
testPose.assign(pose)
# energy after mutaion
if (wt == amino):
wt_energy = binding
else:
pack(testPose, posi, wt, scorefxn)
wtbound = scorefxn(testPose)
unbind(testPose, partners)
wtunbound = scorefxn(testPose)
wt_energy = wtunbound - wtbound
testPose.assign(pose)
content=(content+str(pose.pdb_info().pose2pdb(posi))+','+str(amino)+','
+str(native_ub)+','+str(bound)+','+str(unbound)+','+str(binding)+','
+str(wt_energy)+','+str(wt)+','+str(binding/wt_energy)+'\n')
f = open(name,'w+')
f.write(content)
f.close()
mutate(relaxPose, 130, 'A', 'HL_A')
Step 6. Loop through interface positions
以下代碼循環(huán)通過所有重鏈和輕鏈位置,將它們突變?yōu)樗?0個氨基酸谆奥。實(shí)際運(yùn)行可能再次使筆記本崩潰眼坏。
您可以任意輸出數(shù)據(jù)。pose.scores()將具有數(shù)據(jù)酸些、pandas Dataframes數(shù)據(jù)幀宰译。在這里,我們將簡單地以csv文件的形式輸出每個突變的能量信息魄懂,并將它們分類到一個文件中以供將來分析沿侈。將輸出文件上載到輸入文件夾以進(jìn)行R分析市栗。
在實(shí)際工作環(huán)境中,可以實(shí)現(xiàn)并行化肃廓。請參見PyRosetta一章。對于此任務(wù)铣鹏,您不一定需要完成整個循環(huán)。我們在inputs文件夾中有一個完成的版本哀蘑。
#NOTE - This takes a very long time!!
# You may skip this block to continue the tutorial with pre-configured outputs.
if not os.getenv("DEBUG"):
for i in [92,85,68,53,5,45,44,42,32,31,22,108,100]:
print("\nMutating Position: ",str(i),"\n")
for j in ['G','A','L','M','F','W','K','Q','E','S','P','V','I','C','Y','H','R','N','D','T']:
mutate(relaxPose, i, j, 'HL_A')
Analysis of binding data
在收集匯總的結(jié)合能量信息后,我們使用panda進(jìn)行過濾和可視化合溺。我們過濾掉了較低的非束縛態(tài)能量結(jié)構(gòu),即那些非束縛態(tài)總能量高于原生態(tài)的結(jié)構(gòu)棠赛,并從過濾后的數(shù)據(jù)中生成熱圖睛约。如果您不想在第8步完成for循環(huán)哲身,我們將合并輸出csv的完成版本上載到名為“note_output.csv”的輸入文件夾。
#import modules need for analysis
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
csv_file_name = 'inputs/note_output.csv'
#Generating heatmaps
UNBOUND_CUTOFF = -995
RATIO_CUTOFF = 1.001
#load the data into a pandas dataframe
df = pd.read_csv(csv_file_name, names='Position Amino.Acid Native Bound Unbound Binding WT_Binding WT Ratio'.split(), index_col=False )
#Add wildtype AA to position (for display)
df['Position_WT_aa'] = df.Position + ' (' + df.WT + ')'
#filter values
df = df.query(f'Unbound<{UNBOUND_CUTOFF} and Ratio>{RATIO_CUTOFF}')
# convert from tidy format (https://en.wikipedia.org/wiki/Tidy_data) to a wider format
df = pd.pivot_table(df, values='Ratio',
index=['Position_WT_aa'],
columns='Amino.Acid')
#plot the heatmap
f, ax = plt.subplots(figsize=(10, 10))
sns.heatmap(df, cmap='coolwarm', linewidths= 1, linecolor='lightgray')
plt.xlabel('Amino acid mutation');
plt.ylabel('Position chain and wild type AA');
sns.set_context("talk") #make labels bigger