Using statistics to analyze the launch time of an Android app
2016-01-31

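I. Data description

Launch time was measured with the same device and the same package within each group. The three sample groups (100 baseline and 100 comparison measurements per group) are as follows:

  • Device pro-5-1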
base_list_1 = [0.944, 0.901, 0.957, 0.911, 1.189, 0.93, 0.94, 0.932, 0.951, 0.911, 0.934, 0.903, 0.922, 0.917, 0.931, 0.962, 0.945, 1.254, 0.918, 0.913, 0.931, 0.935, 0.89, 0.948, 0.932, 0.931, 0.875, 0.96, 1.117, 0.905, 0.955, 0.914, 0.95, 0.933, 0.941, 0.905, 0.919, 1.124, 0.953, 0.918, 0.942, 0.918, 0.914, 0.907, 0.942, 0.907, 0.895, 0.917, 0.927, 0.908, 0.915, 0.914, 0.945, 0.933, 0.894, 0.958, 0.885, 0.971, 0.94, 1.261, 0.949, 0.922, 1.009, 0.941, 0.942, 0.907, 0.913, 0.874, 0.963, 0.951, 0.972, 0.94, 0.952, 0.941, 0.954, 0.914, 0.951, 0.899, 0.908, 0.945, 0.934, 0.922, 0.92, 0.959, 0.946, 0.892, 0.847, 0.96, 0.973, 0.928, 0.913, 0.935, 0.939, 0.967, 0.907, 0.94, 0.927, 0.88, 1.004, 0.986]

cmp_list_1 = [0.931, 0.947, 0.965, 0.912, 0.966, 0.974, 0.97, 0.971, 0.958, 0.938, 0.949, 0.972, 0.946, 0.915, 0.906, 0.926, 0.955, 0.93, 0.931, 0.979, 0.952, 1.062, 0.921, 1.002, 0.927, 0.942, 0.991, 0.898, 1.121, 1.006, 0.941, 0.953, 1.013, 0.979, 0.997, 0.961, 0.947, 0.96, 0.966, 0.917, 1.002, 0.955, 0.946, 0.99, 0.945, 0.911, 0.923, 0.94, 0.933, 0.954, 0.907, 0.961, 0.937, 0.941, 0.897, 0.954, 0.979, 0.927, 0.957, 0.944, 0.961, 0.924, 0.953, 0.954, 0.929, 0.926, 0.965, 0.95, 0.964, 0.895, 0.921, 0.945, 0.955, 0.96, 0.962, 0.907, 0.933, 0.955, 0.921, 0.959, 0.934, 0.973, 0.977, 0.938, 0.945, 0.949, 0.932, 0.976, 0.947, 0.941, 0.898, 0.942, 0.887, 0.963, 0.931, 0.999, 0.915, 0.947, 0.958, 0.988]
  • Device pro-5-2
base_list_2 = [0.887, 0.926, 0.931, 0.918, 0.905, 0.896, 0.889, 0.922, 0.923, 0.919, 0.927, 0.904, 0.927, 1.039, 0.933, 1.209, 0.935, 0.882, 0.947, 0.914, 0.871, 0.924, 0.922, 0.943, 0.902, 0.938, 0.896, 0.906, 0.939, 0.899, 0.934, 0.923, 0.927, 0.911, 0.943, 0.886, 0.844, 0.913, 0.907, 0.954, 0.934, 0.854, 0.953, 0.903, 0.931, 0.838, 0.936, 0.955, 0.943, 0.933, 0.901, 1.18, 0.907, 0.883, 0.885, 0.909, 0.94, 0.939, 0.889, 0.917, 0.933, 0.904, 0.888, 0.953, 0.936, 0.947, 0.927, 0.881, 0.914, 0.937, 0.898, 0.914, 0.929, 0.945, 0.935, 0.902, 0.939, 0.925, 0.909, 0.903, 0.92, 0.917, 0.987, 0.911, 0.889, 0.888, 0.91, 0.941, 0.904, 0.911, 0.908, 0.793, 1.113, 0.947, 0.876, 0.908, 0.91, 0.921, 0.941, 0.987]

cmp_list_2 = [0.929, 0.94, 0.931, 0.978, 0.965, 0.938, 0.941, 0.937, 0.91, 0.92, 0.934, 0.92, 0.981, 0.939, 0.928, 0.95, 0.94, 0.928, 0.925, 0.933, 0.963, 0.954, 0.987, 0.965, 0.96, 0.94, 0.966, 0.96, 0.942, 0.969, 0.978, 0.964, 0.921, 0.964, 0.939, 0.97, 0.961, 0.945, 1.004, 0.951, 0.916, 0.942, 0.955, 0.975, 0.947, 0.917, 0.944, 0.943, 0.905, 0.955, 0.96, 0.994, 0.925, 0.922, 0.958, 0.957, 0.958, 0.907, 0.981, 0.937, 0.959, 0.919, 0.959, 0.932, 0.951, 0.927, 0.949, 0.949, 0.944, 0.913, 0.967, 0.981, 0.942, 0.949, 0.932, 0.933, 0.97, 0.931, 0.918, 0.972, 0.95, 0.962, 0.988, 1.0, 1.003, 0.949, 0.933, 0.955, 0.934, 0.952, 0.937, 0.977, 0.936, 0.991, 0.986, 0.943, 0.997, 0.975, 0.991, 0.984]
  • Device mx4-pro
base_list_3 = [1.359, 1.415, 1.395, 1.318, 1.345, 1.417, 1.36, 1.373, 1.337, 1.332, 1.498, 1.318, 1.392, 1.364, 1.397, 1.793, 1.341, 1.364, 1.428, 1.345, 1.418, 1.364, 1.372, 1.541, 1.465, 1.373, 1.337, 1.52, 1.375, 1.367, 1.366, 1.347, 1.334, 1.422, 1.354, 1.369, 1.413, 1.345, 1.373, 1.363, 1.464, 1.344, 1.324, 1.331, 1.405, 1.355, 1.674, 1.38, 1.352, 1.339, 1.326, 1.362, 1.431, 1.774, 1.312, 1.292, 1.384, 1.473, 1.337, 1.406, 1.412, 1.385, 1.292, 1.384, 1.342, 1.333, 1.435, 1.372, 1.42, 1.315, 1.344, 1.414, 1.51, 1.334, 1.308, 1.468, 1.401, 1.316, 1.373, 1.407, 1.474, 1.382, 1.346, 1.373, 1.366, 1.378, 1.315, 1.417, 1.431, 1.379, 1.324, 1.383, 1.349, 1.4, 1.327, 1.734, 1.395, 1.412, 1.438, 1.384]

cmp_list_3 = [1.414, 1.326, 1.421, 1.371, 1.363, 1.36, 1.417, 1.34, 1.357, 1.429, 1.308, 1.324, 1.351, 1.323, 1.367, 1.412, 1.391, 1.661, 1.34, 1.38, 1.528, 1.417, 1.352, 1.569, 1.32, 1.473, 1.531, 1.445, 1.407, 1.529, 1.356, 1.349, 1.362, 1.358, 1.375, 1.365, 1.317, 1.302, 1.342, 1.351, 1.393, 1.473, 1.392, 1.299, 1.367, 1.381, 1.354, 1.374, 1.551, 1.448, 1.387, 1.361, 1.358, 1.362, 1.568, 1.343, 1.334, 1.378, 1.417, 1.382, 1.421, 1.345, 1.336, 1.302, 1.349, 1.381, 1.374, 1.359, 1.38, 1.553, 1.34, 1.269, 1.353, 1.329, 1.649, 1.392, 1.367, 1.377, 1.403, 1.361, 1.352, 1.466, 1.389, 1.346, 1.345, 1.35, 1.383, 1.446, 1.613, 1.395, 1.402, 1.394, 1.348, 1.353, 1.395, 1.345, 1.274, 1.425, 1.351, 1.586]

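II. Normality test (using SPSS)

The normality test verifies that our test samples are (approximately) normally distributed. The statistical methods used below assume normality, so this check is required. The accompanying charts show the frequency count in each interval.

Because each pair comes from the same device and the same scenario, the left and right distributions should be approximately the same. Judging from the Q-Q plots and the detrended Q-Q plots, launch time is approximately normally distributed. Note, however, that the variances of base_list_2 and cmp_list_2 differ markedly and the data are visibly more scattered (this already suggests that the second group cannot be used for comparison), whereas the other groups have almost the same shape.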
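The Q-Q inspection described above was done in SPSS. As a rough cross-check in Python, a minimal sketch could use the Shapiro-Wilk test from scipy. The helper name looks_normal, the 0.05 threshold and the synthetic samples (assumed to be launch times in seconds) are assumptions for illustration and are not part of the original analysis.

```python
import numpy as np
from scipy.stats import shapiro


def looks_normal(sample, alpha=0.05):
    """Shapiro-Wilk test: return (W, p, passed), where passed means
    normality is not rejected at significance level alpha."""
    w_stat, p_value = shapiro(sample)
    return float(w_stat), float(p_value), bool(p_value > alpha)


# Synthetic stand-ins for launch times, NOT the measured data.
rng = np.random.default_rng(0)
bell_shaped = rng.normal(loc=0.94, scale=0.03, size=100)
right_skewed = 0.9 + rng.exponential(scale=0.05, size=100)

for name, sample in [("bell_shaped", bell_shaped), ("right_skewed", right_skewed)]:
    w_stat, p_value, passed = looks_normal(sample)
    print("%s: W=%.4f p=%.4g normal=%s" % (name, w_stat, p_value, passed))
```

Launch-time samples with a few slow outliers can fail such a test even when the bulk of the data looks bell-shaped, so the p-value is best read together with the Q-Q plot.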

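III. Significance tests

1. Homogeneity-of-variance test

The homogeneity-of-variance test checks whether the two samples in each pair have equal variances, which indicates whether they come from consistent source distributions. Passing this test is the precondition for using a pair as comparison data.

The test script is as follows: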

# coding: utf-8
import json

import MySQLdb
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import levene, norm


class DBOperate(object):
    """Thin wrapper around a MySQLdb connection and cursor."""

    def __init__(self, host, user, db, passwd, port):
        self.host = host
        self.user = user
        self.db = db
        self.passwd = passwd
        self.port = port
        self.conn = MySQLdb.connect(
            host=self.host,
            user=self.user,
            passwd=self.passwd,
            db=self.db,
            port=self.port)
        self.cur = self.conn.cursor()

    def execute(self, sql, params=None):
        try:
            self.cur.execute(sql, params)
            self.conn.commit()
            print("====== SQL executed successfully:", sql)
        except Exception as e:
            print(e)

    def getData(self, sql, params=None):
        try:
            self.cur.execute(sql, params)
            return self.cur.fetchall()
        except Exception as e:
            print(e)
            return ()  # empty result so callers can still iterate

    def close(self):
        self.cur.close()
        self.conn.close()


class MathTools(object):
    def __init__(self, base_data, cmp_data):
        self.base_data = base_data
        self.cmp_data = cmp_data

    def testVar(self):
        '''Homogeneity-of-variance test (Levene's test).'''
        stat, p_value = levene(self.base_data, self.cmp_data)
        print((float(stat), float(p_value)))
        if p_value > 0.05:
            print("Homogeneity-of-variance test passed; the variances can be "
                  "considered equal (errors from hardware or run-time "
                  "differences can be ignored)!")

    def getMeanAndVar(self):
        '''Print and return (mean, variance, std) for both samples.'''
        stats = []
        for each in [self.base_data, self.cmp_data]:
            mean = np.mean(each)
            var = np.var(each)  # population variance (ddof=0)
            std = np.std(each)
            print("===================")
            print("Mean:", mean)
            print("Variance:", var)
            print("Standard deviation:", std)
            print("===================")
            stats.append((mean, var, std))
        return stats


def drawPlot(avg, std):
    # mlab.normpdf is gone from newer matplotlib releases;
    # scipy.stats.norm.pdf computes the same density.
    x = np.linspace(0.5, 1.5, 10000)
    plt.plot(x, norm.pdf(x, avg, std))
    plt.show()


def dataAnalysis(tuple_data):
    '''Extract the launch time from each JSON log row.'''
    avg_list = []
    for each_tuple in tuple_data:
        str_data = each_tuple[0]
        dic_data = json.loads(str_data)
        avg_time = float(dic_data['intervalStartTime'])
        avg_list.append(avg_time)
    return avg_list


def outputData(dboperate, task_id_1, task_id_2):
    # Parameterized query instead of string formatting.
    sql = 'SELECT start_time_log from uctc_qms_start_time WHERE task_id=%s'
    data_base = dboperate.getData(sql, (task_id_1,))
    data_cmp = dboperate.getData(sql, (task_id_2,))
    base_list = dataAnalysis(data_base)
    cmp_list = dataAnalysis(data_cmp)
    return base_list, cmp_list


def main():
    dboperate = DBOperate(
        host="xxxx",
        user="xxxx",
        passwd="xxxx",
        db="xxxx",
        port=3306)
    # (baseline task id, comparison task id) for each of the three groups
    task_pairs = [(216674, 216675), (216679, 216680), (216677, 216682)]
    for idx, (base_id, cmp_id) in enumerate(task_pairs, start=1):
        base_list, cmp_list = outputData(dboperate, base_id, cmp_id)
        print("base_list_%d:" % idx)
        print(base_list)
        print("cmp_list_%d:" % idx)
        print(cmp_list)
        mt = MathTools(base_list, cmp_list)
        mt.testVar()
        mt.getMeanAndVar()

    dboperate.close()


if __name__ == '__main__':
    main()

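Running the homogeneity-of-variance test on each of the three groups shows that the second group fails: the variances of base_list_2 and cmp_list_2 differ significantly. Since every test used the same package on the same device with the same procedure, we conclude that the second group must be filtered out.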


base_list_1:
[0.944, 0.901, 0.957, 0.911, 1.189, 0.93, 0.94, 0.932, 0.951, 0.911, 0.934, 0.903, 0.922, 0.917, 0.931, 0.962, 0.945, 1.254, 0.918, 0.913, 0.931, 0.935, 0.89, 0.948, 0.932, 0.931, 0.875, 0.96, 1.117, 0.905, 0.955, 0.914, 0.95, 0.933, 0.941, 0.905, 0.919, 1.124, 0.953, 0.918, 0.942, 0.918, 0.914, 0.907, 0.942, 0.907, 0.895, 0.917, 0.927, 0.908, 0.915, 0.914, 0.945, 0.933, 0.894, 0.958, 0.885, 0.971, 0.94, 1.261, 0.949, 0.922, 1.009, 0.941, 0.942, 0.907, 0.913, 0.874, 0.963, 0.951, 0.972, 0.94, 0.952, 0.941, 0.954, 0.914, 0.951, 0.899, 0.908, 0.945, 0.934, 0.922, 0.92, 0.959, 0.946, 0.892, 0.847, 0.96, 0.973, 0.928, 0.913, 0.935, 0.939, 0.967, 0.907, 0.94, 0.927, 0.88, 1.004, 0.986]
cmp_list_1:
[0.931, 0.947, 0.965, 0.912, 0.966, 0.974, 0.97, 0.971, 0.958, 0.938, 0.949, 0.972, 0.946, 0.915, 0.906, 0.926, 0.955, 0.93, 0.931, 0.979, 0.952, 1.062, 0.921, 1.002, 0.927, 0.942, 0.991, 0.898, 1.121, 1.006, 0.941, 0.953, 1.013, 0.979, 0.997, 0.961, 0.947, 0.96, 0.966, 0.917, 1.002, 0.955, 0.946, 0.99, 0.945, 0.911, 0.923, 0.94, 0.933, 0.954, 0.907, 0.961, 0.937, 0.941, 0.897, 0.954, 0.979, 0.927, 0.957, 0.944, 0.961, 0.924, 0.953, 0.954, 0.929, 0.926, 0.965, 0.95, 0.964, 0.895, 0.921, 0.945, 0.955, 0.96, 0.962, 0.907, 0.933, 0.955, 0.921, 0.959, 0.934, 0.973, 0.977, 0.938, 0.945, 0.949, 0.932, 0.976, 0.947, 0.941, 0.898, 0.942, 0.887, 0.963, 0.931, 0.999, 0.915, 0.947, 0.958, 0.988]
(2.585452271112739, 0.10944298973519527)
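Homogeneity-of-variance test passed; the variances can be considered equal (errors from hardware or run-time differences can be ignored)!
===================
Mean: 0.9432
Variance: 0.00405766
Standard deviation: 0.0636997645208
===================
===================
Mean: 0.95079
Variance: 0.0011006859
Standard deviation: 0.0331765866237
===================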
base_list_2:
[0.887, 0.926, 0.931, 0.918, 0.905, 0.896, 0.889, 0.922, 0.923, 0.919, 0.927, 0.904, 0.927, 1.039, 0.933, 1.209, 0.935, 0.882, 0.947, 0.914, 0.871, 0.924, 0.922, 0.943, 0.902, 0.938, 0.896, 0.906, 0.939, 0.899, 0.934, 0.923, 0.927, 0.911, 0.943, 0.886, 0.844, 0.913, 0.907, 0.954, 0.934, 0.854, 0.953, 0.903, 0.931, 0.838, 0.936, 0.955, 0.943, 0.933, 0.901, 1.18, 0.907, 0.883, 0.885, 0.909, 0.94, 0.939, 0.889, 0.917, 0.933, 0.904, 0.888, 0.953, 0.936, 0.947, 0.927, 0.881, 0.914, 0.937, 0.898, 0.914, 0.929, 0.945, 0.935, 0.902, 0.939, 0.925, 0.909, 0.903, 0.92, 0.917, 0.987, 0.911, 0.889, 0.888, 0.91, 0.941, 0.904, 0.911, 0.908, 0.793, 1.113, 0.947, 0.876, 0.908, 0.91, 0.921, 0.941, 0.987]
cmp_list_2:
[0.929, 0.94, 0.931, 0.978, 0.965, 0.938, 0.941, 0.937, 0.91, 0.92, 0.934, 0.92, 0.981, 0.939, 0.928, 0.95, 0.94, 0.928, 0.925, 0.933, 0.963, 0.954, 0.987, 0.965, 0.96, 0.94, 0.966, 0.96, 0.942, 0.969, 0.978, 0.964, 0.921, 0.964, 0.939, 0.97, 0.961, 0.945, 1.004, 0.951, 0.916, 0.942, 0.955, 0.975, 0.947, 0.917, 0.944, 0.943, 0.905, 0.955, 0.96, 0.994, 0.925, 0.922, 0.958, 0.957, 0.958, 0.907, 0.981, 0.937, 0.959, 0.919, 0.959, 0.932, 0.951, 0.927, 0.949, 0.949, 0.944, 0.913, 0.967, 0.981, 0.942, 0.949, 0.932, 0.933, 0.97, 0.931, 0.918, 0.972, 0.95, 0.962, 0.988, 1.0, 1.003, 0.949, 0.933, 0.955, 0.934, 0.952, 0.937, 0.977, 0.936, 0.991, 0.986, 0.943, 0.997, 0.975, 0.991, 0.984]
(4.5987224867656273, 0.0332145312054625)
===================
Mean: 0.92446
Variance: 0.0028034084
Standard deviation: 0.0529472227789
===================
===================
Mean: 0.95108
Variance: 0.0005381736
Standard deviation: 0.0231985689214
===================
base_list_3:
[1.359, 1.415, 1.395, 1.318, 1.345, 1.417, 1.36, 1.373, 1.337, 1.332, 1.498, 1.318, 1.392, 1.364, 1.397, 1.793, 1.341, 1.364, 1.428, 1.345, 1.418, 1.364, 1.372, 1.541, 1.465, 1.373, 1.337, 1.52, 1.375, 1.367, 1.366, 1.347, 1.334, 1.422, 1.354, 1.369, 1.413, 1.345, 1.373, 1.363, 1.464, 1.344, 1.324, 1.331, 1.405, 1.355, 1.674, 1.38, 1.352, 1.339, 1.326, 1.362, 1.431, 1.774, 1.312, 1.292, 1.384, 1.473, 1.337, 1.406, 1.412, 1.385, 1.292, 1.384, 1.342, 1.333, 1.435, 1.372, 1.42, 1.315, 1.344, 1.414, 1.51, 1.334, 1.308, 1.468, 1.401, 1.316, 1.373, 1.407, 1.474, 1.382, 1.346, 1.373, 1.366, 1.378, 1.315, 1.417, 1.431, 1.379, 1.324, 1.383, 1.349, 1.4, 1.327, 1.734, 1.395, 1.412, 1.438, 1.384]
cmp_list_3:
[1.414, 1.326, 1.421, 1.371, 1.363, 1.36, 1.417, 1.34, 1.357, 1.429, 1.308, 1.324, 1.351, 1.323, 1.367, 1.412, 1.391, 1.661, 1.34, 1.38, 1.528, 1.417, 1.352, 1.569, 1.32, 1.473, 1.531, 1.445, 1.407, 1.529, 1.356, 1.349, 1.362, 1.358, 1.375, 1.365, 1.317, 1.302, 1.342, 1.351, 1.393, 1.473, 1.392, 1.299, 1.367, 1.381, 1.354, 1.374, 1.551, 1.448, 1.387, 1.361, 1.358, 1.362, 1.568, 1.343, 1.334, 1.378, 1.417, 1.382, 1.421, 1.345, 1.336, 1.302, 1.349, 1.381, 1.374, 1.359, 1.38, 1.553, 1.34, 1.269, 1.353, 1.329, 1.649, 1.392, 1.367, 1.377, 1.403, 1.361, 1.352, 1.466, 1.389, 1.346, 1.345, 1.35, 1.383, 1.446, 1.613, 1.395, 1.402, 1.394, 1.348, 1.353, 1.395, 1.345, 1.274, 1.425, 1.351, 1.586]
(0.0077692351582683648, 0.92985189389348166)
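Homogeneity-of-variance test passed; the variances can be considered equal (errors from hardware or run-time differences can be ignored)!
===================
Mean: 1.39346
Variance: 0.0075982484
Standard deviation: 0.0871679321769
===================
===================
Mean: 1.39223
Variance: 0.0058431971
Standard deviation: 0.0764408078189
===================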
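
The pass/fail decision in the output above comes down to comparing the p-value of Levene's test with 0.05. The following is a minimal sketch of that rule without the database layer. The helper name variances_equal and the synthetic samples are assumptions for illustration; only scipy.stats.levene is taken from the script.

```python
import numpy as np
from scipy.stats import levene


def variances_equal(base_data, cmp_data, alpha=0.05):
    """Levene's test: return (statistic, p, passed), where passed means
    equal variances are not rejected at significance level alpha."""
    stat, p_value = levene(base_data, cmp_data)
    return float(stat), float(p_value), bool(p_value > alpha)


# Synthetic samples, NOT the measured launch times.
rng = np.random.default_rng(1)
base = rng.normal(loc=0.94, scale=0.03, size=100)
shifted = base + 0.01                                # same spread, different mean
noisy = rng.normal(loc=0.94, scale=0.10, size=100)   # much larger spread

for name, other in [("shifted", shifted), ("noisy", noisy)]:
    stat, p_value, passed = variances_equal(base, other)
    print("base vs %s: W=%.4f p=%.4g passed=%s" % (name, stat, p_value, passed))
```

The shifted copy illustrates that Levene's test looks only at spread: a change in the mean alone does not make it fail, which is why a separate test on the means is still needed.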

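2. t-test

If the error ranges of the two means overlap, we consider that the software iteration has no effect on performance. The significance test checks whether the two samples differ significantly, and its result tells us how far the two data sets can be trusted.

Strictly speaking, the t-test is best suited to small, normally distributed samples, and large samples are conventionally handled with a z-test. However, I ran the test on both small and large samples and reached the same conclusion (note: the specific t values differ), so for now the original large samples are used here.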
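
One way to see why the choice between the t-test and the z-test matters little at this sample size is to compare the two-sided critical values. This is a small sketch, assuming the 100-run samples and the 0.05 significance level used in this article:

```python
from scipy.stats import norm, t

alpha = 0.05
n = 100                                   # sample size used in this article
t_crit = t.ppf(1 - alpha / 2, n - 1)      # two-sided critical value, 99 degrees of freedom
z_crit = norm.ppf(1 - alpha / 2)          # two-sided critical value of the z-test
print("t critical: %.3f" % t_crit)        # 1.984
print("z critical: %.3f" % z_crit)        # 1.960
```

The two thresholds differ by about 0.02, so only a statistic falling in that narrow band would be judged differently by the two tests.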

Significance test script:

#!/usr/bin/python
import math

from scipy.stats import t
import matplotlib.pyplot as plt
import numpy as np

##############
# Parameters #
##############
ver = 1
verbose = 0
alpha = 0.05  # significance level of the two-sided test


def usage():
    print("""
    usage: ./program data_file(one sample in one line)
    """)


def main():
    # Two samples of launch times; replace them with the pair under test.
    sample1 = [1.15, 1.119, 1.098, 1.147, 1.092, 1.131, 1.17, 1.138, 1.115, 1.143, 1.126, 1.182, 1.124, 1.145, 1.093, 1.131, 1.102, 1.191, 1.093, 1.089, 1.115, 1.128, 1.119, 1.163, 1.143, 1.114, 1.098, 1.142, 1.126, 1.213, 1.279, 1.125, 1.174, 1.103, 1.13, 1.089, 1.164, 1.106, 1.155, 1.085, 1.186, 1.155, 1.207, 1.081, 1.122, 1.112, 1.137, 1.096, 1.078, 1.122, 1.11, 1.095, 1.132, 1.134, 1.118, 1.117, 1.116, 1.116, 1.108, 1.14, 1.099, 1.124, 1.113, 1.203, 1.135, 1.124, 1.098, 1.105, 1.082, 1.107, 1.155, 1.164, 1.096, 1.175, 1.17, 1.161, 1.093, 1.152, 1.085, 0.969, 1.068, 0.95, 1.077, 0.999, 1.147, 1.144, 1.097, 1.119, 1.126, 1.148, 1.083, 1.106, 1.107, 1.094, 1.121, 1.136, 1.086, 1.141, 1.119, 1.153]
    sample2 = [1.154, 1.094, 1.131, 1.087, 1.148, 1.046, 1.228, 1.142, 0.931, 1.063, 1.12, 1.08, 1.129, 1.073, 1.116, 1.081, 1.177, 1.081, 1.133, 1.093, 1.13, 1.085, 1.125, 1.062, 1.133, 1.062, 0.927, 1.055, 1.202, 1.162, 1.102, 1.098, 1.126, 1.144, 1.088, 1.131, 1.105, 1.094, 1.099, 1.112, 1.158, 1.181, 1.107, 0.937, 1.082, 1.1, 1.06, 1.114, 1.088, 1.141, 1.085, 1.232, 1.131, 1.155, 1.069, 1.149, 1.088, 1.125, 1.074, 1.13, 1.053, 1.102, 1.128, 1.166, 1.101, 1.192, 1.073, 1.131, 1.057, 1.098, 1.077, 1.119, 1.084, 1.164, 1.114, 1.148, 1.063, 1.113, 1.084, 1.063, 1.05, 1.078, 1.112, 1.181, 1.109, 1.087, 1.075, 1.078, 1.109, 1.081, 1.104, 1.059, 1.099, 1.142, 1.084, 1.084, 1.09, 1.089, 1.14, 1.105]
     
    sample_len = len(sample1)
    # Paired differences: run i of sample1 is compared with run i of sample2.
    sample_diff = [a - b for a, b in zip(sample1, sample2)]

    if verbose:
        print("sample_diff = %s" % sample_diff)

    ######################
    # Hypothesis testing #
    ######################
    # H0: the mean of the paired differences is zero (two-sided test).
    sample = sample_diff

    df = sample_len - 1  # degrees of freedom of Student's t distribution
    if verbose:
        print("df (degrees of freedom) = %d" % df)

    sample_mean = np.mean(sample)
    sample_std = np.std(sample, dtype=np.float64, ddof=1)
    if verbose:
        print("mean = %f, std = %f" % (sample_mean, sample_std))

    abs_t = math.fabs(sample_mean / (sample_std / math.sqrt(sample_len)))
    t_alpha_percentile = t.ppf(1 - alpha / 2, df)

    if verbose:
        print("abs_t = %f" % abs_t)
        print("t_alpha_percentile = %f" % t_alpha_percentile)

    if abs_t >= t_alpha_percentile:
        print("REJECT the null hypothesis")
    else:
        # Strictly this means "fail to reject"; the label is kept so that it
        # matches the output quoted in the text.
        print("ACCEPT the null hypothesis")

    ########
    # Plot #
    ########
    rv = t(df)
    limit = 5  # plot the density on [-5, 5]
    x = np.linspace(-1 * limit, limit)
    plt.plot(x, rv.pdf(x))
    plt.xlabel('x')
    plt.ylabel('t(x)')
    plt.title('Difference significance test')
    plt.grid(True)
    plt.axvline(x=t_alpha_percentile, ymin=0, ymax=0.095,
                linewidth=2, color='r')
    plt.axvline(x=abs_t, ymin=0, ymax=0.6,
                linewidth=2, color='g')

    plt.annotate(r'(1 - $\alpha$ / 2) percentile', xy=(t_alpha_percentile, 0.05),
                 xytext=(t_alpha_percentile + 0.5, 0.09),
                 arrowprops=dict(facecolor='black', shrink=0.05))
    plt.annotate('t value', xy=(abs_t, 0.26),
                 xytext=(abs_t + 0.5, 0.30),
                 arrowprops=dict(facecolor='black', shrink=0.05))

    # The legend location must be passed as a keyword argument.
    leg = plt.legend(("Student's t distribution", r'(1 - $\alpha$ / 2) percentile', 't value'),
                     loc='upper left', shadow=True)
    frame = leg.get_frame()
    frame.set_facecolor('0.80')
    for text in leg.get_texts():
        text.set_fontsize('small')

    for line in leg.get_lines():
        line.set_linewidth(1.5)

    # Mark each standardized difference on the x axis.
    scale = sample_std / math.sqrt(sample_len)
    normalized_sample = [(d - sample_mean) / scale for d in sample]
    plt.plot(normalized_sample, [0] * len(normalized_sample), 'ro')
    plt.show()


if __name__ == "__main__":
    main()

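Substitute each pair of samples into sample1 and sample2 in turn. To confirm that the procedure works, I first used numpy to generate two sets of test data drawn from the standard normal distribution.

The test result is as follows:

Output: ACCEPT the null hypothesis.

That is, the two data sets show no significant difference in their means.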
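
The sanity check described above can be sketched as follows. The helper name paired_t_test and the seed are assumptions for illustration. scipy.stats.ttest_rel is scipy's built-in paired t-test and is used here only to cross-check the hand-computed statistic.

```python
import math

import numpy as np
from scipy.stats import t, ttest_rel


def paired_t_test(sample1, sample2, alpha=0.05):
    """Two-sided paired t-test; returns (abs_t, critical_value, reject)."""
    diff = np.asarray(sample1, dtype=float) - np.asarray(sample2, dtype=float)
    n = diff.size
    abs_t = abs(diff.mean() / (diff.std(ddof=1) / math.sqrt(n)))
    critical = t.ppf(1 - alpha / 2, n - 1)
    return float(abs_t), float(critical), bool(abs_t >= critical)


rng = np.random.default_rng(42)           # fixed seed, chosen arbitrarily
sample1 = rng.standard_normal(100)
sample2 = rng.standard_normal(100)

abs_t, critical, reject = paired_t_test(sample1, sample2)
result = ttest_rel(sample1, sample2)      # same test, computed by scipy
print("manual |t| = %.4f, scipy |t| = %.4f" % (abs_t, abs(result.statistic)))
print("REJECT" if reject else "ACCEPT", "the null hypothesis")
```

Because both samples come from the same distribution, the null hypothesis is true here, and a test at the 0.05 level would still reject it in roughly 5% of random draws. Note also that this test pairs run i of one sample with run i of the other.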

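I then ran the test on the data from our cloud-testing devices.

  1. First group:

Output: REJECT the null hypothesis (the two samples differ significantly)

  2. Second group:

Output: REJECT the null hypothesis (the two samples differ significantly)

  3. Third group:

Output: ACCEPT the null hypothesis (the two samples show no significant difference)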

IV. Summary

1. After the normality test, the homogeneity-of-variance test and the t-test, only the third group remains usable.

Results for base_list_3 and cmp_list_3 (device mx4-pro, data as listed in the data description):
(0.0077692351582683648, 0.92985189389348166)
Homogeneity-of-variance test passed; the variances can be considered equal (errors from hardware or run-time differences can be ignored)!
===================
Mean: 1.39346
Variance: 0.0075982484
Standard deviation: 0.0871679321769
===================
===================
Mean: 1.39223
Variance: 0.0058431971
Standard deviation: 0.0764408078189
===================

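The means and variances of these two samples are close to each other, which also matches what we expect from experience.

2. Conclusions from the launch-time tests with the same package on the same device:

(1) Two of the three groups failed, which is enough to show that our testing is quite unstable. (We need to find the cause of the instability, chiefly the variables that the current setup introduces.)

(2) Two groups passed the homogeneity-of-variance test, which suggests that we do not need to introduce new test variables such as CPU load, memory changes or hardware effects on launch time.

(3) By adjusting the confidence level of the t distribution, the corresponding range for the mean can be tuned dynamically.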
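
Point (3) can be illustrated with a t-based confidence interval for the mean. This is a minimal sketch: the helper name mean_confidence_interval and the three confidence levels are assumptions, and the input is just the first ten values of the mx4-pro baseline sample.

```python
import math

import numpy as np
from scipy.stats import t


def mean_confidence_interval(sample, confidence=0.95):
    """t-based confidence interval for the mean: (lower, mean, upper)."""
    data = np.asarray(sample, dtype=float)
    n = data.size
    mean = data.mean()
    sem = data.std(ddof=1) / math.sqrt(n)           # standard error of the mean
    half_width = t.ppf(1 - (1 - confidence) / 2, n - 1) * sem
    return float(mean - half_width), float(mean), float(mean + half_width)


# First ten values of the mx4-pro baseline sample from the data description.
launch_times = [1.359, 1.415, 1.395, 1.318, 1.345, 1.417, 1.36, 1.373, 1.337, 1.332]

for confidence in (0.90, 0.95, 0.99):
    lower, mean, upper = mean_confidence_interval(launch_times, confidence)
    print("%.0f%%: [%.4f, %.4f] around mean %.4f" % (confidence * 100, lower, upper, mean))
```

Raising the confidence level widens the interval, so the accepted range for the mean can be made stricter or looser as needed.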
