# Problem
# An array is a structure containing an ordered collection of objects (numbers, strings, other arrays, etc.). We let A[k] denote the k-th value in array A. You may like to think of an array as simply a matrix having only one row.
# A random string is constructed so that the probability of choosing each subsequent symbol is based on a fixed underlying symbol frequency.
# GC-content offers us natural symbol frequencies for constructing random DNA strings. If the GC-content is x, then we set the symbol frequencies of C and G each equal to x/2 and the symbol frequencies of A and T each equal to (1 - x)/2. For example, if the GC-content is 40%, then as we construct the string, the next symbol is 'G' or 'C' with probability 0.2 each, and the next symbol is 'A' or 'T' with probability 0.3 each.
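# A minimal sketch of the frequency rule, assuming a hypothetical helper name
# `symbol_probabilities` (it is not part of the problem statement): a GC-content x
# gives probability x/2 to each of 'G' and 'C', and (1 - x)/2 to each of 'A' and 'T'.
def symbol_probabilities(gc_content):
    gc = gc_content / 2        # probability of 'G', and also of 'C'
    at = (1 - gc_content) / 2  # probability of 'A', and also of 'T'
    return {'A': at, 'C': gc, 'G': gc, 'T': at}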
# In practice, many probabilities wind up being very small. In order to work with small probabilities, we may plug them into a function that "blows them up" for the sake of comparison. Specifically, the common logarithm of x (defined for x>0 and denoted log10(x)) is the exponent to which we must raise 10 to obtain x.
# See Figure 1 for a graph of the common logarithm function y=log10(x). In this graph, we can see that the logarithm of x-values between 0 and 1 always winds up mapping to y-values between -∞ and 0: x-values near 0 have logarithms close to -∞, and x-values close to 1 have logarithms close to 0. Thus, we will select the common logarithm as our function to "blow up" small probability values for comparison.
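# A small sketch of why the logarithm is convenient here, using only Python's
# standard `math` module (math.prod needs Python 3.8+). The values in `small` are
# illustrative, not taken from the problem. The logarithm of a product equals the
# sum of the logarithms of its factors, so tiny probabilities never need to be
# multiplied together directly.
import math
small = [0.3, 0.2, 0.3]                           # illustrative per-symbol probabilities
log_of_product = math.log10(math.prod(small))     # log10 of the product
sum_of_logs = sum(math.log10(p) for p in small)   # sum of the individual log10 values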
# Given: A DNA string s of length at most 100 bp and an array A containing at most 20 numbers between 0 and 1.
# Return: An array B having the same length as A in which B[k] represents the common logarithm of the probability that a random string constructed with the GC-content found in A[k] will match s exactly.
# Sample Dataset
# ACGATACAA
# 0.129 0.287 0.423 0.476 0.641 0.742 0.783
# Sample Output
# -5.737 -5.217 -5.263 -5.360 -5.958 -6.628 -7.009
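# Worked check of the first sample value, as a sketch: s = ACGATACAA contains six
# A/T symbols and three G/C symbols, so for GC-content 0.129 the log-probability is
# 6 * log10((1 - 0.129) / 2) + 3 * log10(0.129 / 2). The name `first_value` is
# introduced here for illustration only.
import math
first_value = 6 * math.log10((1 - 0.129) / 2) + 3 * math.log10(0.129 / 2)
print(f"{first_value:.3f}")  # prints -5.737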
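# In short: given a DNA string s of at most 100 bp and an array A of at most 20 numbers between 0 and 1, return an array B of the same length as A, where B[k] is the common logarithm of the probability that a random string built with the GC-content A[k] matches s exactly.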
import math
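def compute_log(s, A):
    # Count how many times each base occurs in s
    freq = {'A': 0, 'T': 0, 'G': 0, 'C': 0}
    for base in s:
        freq[base] += 1
    # Initialize the result array B
    B = []
    # Compute the log-probability for each GC-content value
    for gc_content in A:
        # Per-symbol probabilities (not counts, despite the names):
        # at_count for 'A' or 'T', gc_count for 'G' or 'C'
        at_count = (1 - gc_content) / 2
        gc_count = gc_content / 2
        # Log-probability of matching s exactly. Summing logarithms instead of
        # multiplying the raw probabilities avoids floating-point underflow,
        # which can occur for strings near the 100 bp limit.
        log_prob = 0.0
        for count, p in ((freq['A'] + freq['T'], at_count),
                         (freq['G'] + freq['C'], gc_count)):
            if count:  # bases absent from s contribute a factor of 1
                log_prob += count * math.log10(p)
        # Append the result to B
        B.append(log_prob)
    return B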
# Test with the sample dataset
s = "ACGATACAA"
A = [0.129, 0.287, 0.423, 0.476, 0.641, 0.742, 0.783]
result = compute_log(s, A)
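# Print in the format of the sample output: space-separated, three decimal places
print(" ".join(f"{x:.3f}" for x in result))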