python - Conditional probability - Python

I am working on this Python problem:

Given a sequence of the DNA bases {A, C, G, T}, stored as a string, returns a conditional probability table in a data structure such that one base (b1) can be looked up, and then a second (b2), to get the probability p(b2 | b1) of the second base occurring immediately after the first. (Assumes the length of seq is >= 3, and that the probability of any b1 and b2 which have never been seen together is 0. Ignores the probability that b1 will be followed by the end of the string.)

You may use the collections module, but no other libraries.



def dna_prob2(seq):
    tbl = dict()
    levels = set(word)
    freq = dict.fromkeys(levels, 0)
    for i in seq:
        freq[i] += 1
    for i in levels:
        tbl[i] = {x:0 for x in levels}
    lastlevel = ''
    for i in tbl:
        if lastlevel != '':
             tbl[lastlevel][i] += 1
        lastlevel = i
    for i in tbl:
        print(i,tbl[i][i] / freq[i])
    return tbl

tbl['T']['T'] / freq[i] 

Basically, the end result should be the value computed by the last line shown above. However, when I try to do this with print(i, tbl[i][i] / freq[i]) and run dna_prob2(word), I get 0.0 for everything.
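Reading the code above, the 0.0 values seem to come from the counting loop: `for i in tbl` walks over the distinct bases (the dictionary keys), not over the sequence, so each base is seen exactly once and `tbl[i][i]` is never incremented. A minimal sketch of a counting step that walks adjacent pairs of the sequence instead is shown here; the name `count_pairs` is hypothetical and not part of the original code.

```python
def count_pairs(seq):
    """Count how often each base b2 immediately follows each base b1."""
    levels = set(seq)
    # counts[b1][b2] starts at 0 for every pair of bases present in seq
    counts = {b1: {b2: 0 for b2 in levels} for b1 in levels}
    # zip(seq, seq[1:]) yields every adjacent pair (seq[i], seq[i+1])
    for b1, b2 in zip(seq, seq[1:]):
        counts[b1][b2] += 1
    return counts


counts = count_pairs("ATTGA")
print(counts["T"]["T"])  # 1, from the single "TT" in the string
```

Dividing each count by the number of times b1 occurs as the first base of a pair would then give p(b2 | b1).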





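def makeprobs(word):
  singles = {}   # how often each base occurs as the first of a pair
  thedict = {}   # how often each (first, second) pair occurs
  probs = {}
  ll = len(word)
  # stop at ll-1 so the last base is never counted as a "first" base
  for i in range(ll-1):
    x1 = word[i]
    x2 = word[i+1]
    singles[x1] = singles.get(x1, 0)+1.0
    thedict[(x1, x2)] = thedict.get((x1, x2), 0)+1.0
  # p(x2 | x1) = count(x1, x2) / count(x1 as first base)
  for i in thedict:
    probs[i] = thedict[i]/singles[i[0]]
  return probs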
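The function above returns a dictionary keyed by (b1, b2) tuples, while the problem statement asks for a structure where b1 is looked up first and then b2. One possible way to get that two-level lookup, using only the collections module, is sketched below. The names `BASES` and `dna_prob_table` are hypothetical, and the sketch assumes the sequence contains only the four bases A, C, G, T.

```python
from collections import Counter, defaultdict

BASES = "ACGT"  # assumption: the sequence uses only these four bases


def dna_prob_table(seq):
    """Return table[b1][b2] = p(b2 | b1) for adjacent bases in seq."""
    pair_counts = defaultdict(Counter)
    # Count every adjacent pair; the last base is never a "first" base,
    # so the end of the string is ignored as the problem requires.
    for b1, b2 in zip(seq, seq[1:]):
        pair_counts[b1][b2] += 1

    table = {}
    for b1 in BASES:
        total = sum(pair_counts[b1].values())
        # Pairs never seen together get probability 0.
        table[b1] = {
            b2: (pair_counts[b1][b2] / total if total else 0.0)
            for b2 in BASES
        }
    return table


table = dna_prob_table("ATTGA")
print(table["T"]["T"])  # 0.5: T is followed once by T and once by G
```

Filling in every base for both levels means a lookup such as table["C"]["A"] returns 0.0 instead of raising a KeyError when a base never appears.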


