R and Function Estimation: Study Notes (Spline Methods)

2017-07-20


Spline Estimation

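If a function has different degrees of nonlinearity in different regions, or has several extrema, fitting it with a single polynomial, especially a low-order one, is a poor choice. One remedy is the nearest-neighbor polynomial (also called local polynomial) approach mentioned earlier; another is the spline, which approximates the function with piecewise low-order polynomials.
Two kinds of splines are in common use: polynomial splines and smoothing splines.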

Polynomial Splines

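Polynomial splines admit many spline bases. The best known are the truncated power basis and the B-spline basis, both mentioned earlier in the discussion of function approximation. Here we give only a very brief introduction to B-splines. Under a B-spline basis of degree $k$ with knots $t_i$, the approximation can be written as

$$f(x) \approx \sum_{i} \beta_i B_{i,k}(x),$$

where the basis functions satisfy the recursion

$$B_{i,0}(x) = 1 \quad \text{if } t_i \le x < t_{i+1},$$

$$B_{i,k}(x) = \frac{x - t_i}{t_{i+k} - t_i} B_{i,k-1}(x) + \frac{t_{i+k+1} - x}{t_{i+k+1} - t_{i+1}} B_{i+1,k-1}(x).$$

In the first line of the recursion, $B_{i,0}(x)$ is 0 otherwise. In R, the function bs() in the splines package builds the B-spline basis used for estimation. It is called as: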

bs(x, df = NULL, knots = NULL, degree = 3, intercept = FALSE, Boundary.knots = range(x))

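A note on the df argument: df = degree + (number of interior knots), and attr(, "knots") shows the knot locations. For the commonly used cubic B-spline this gives df = k + 3 (not counting the constant term), where k is the number of interior knots.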
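The relation df = k + 3 can be checked by building the basis by hand. The following is a minimal sketch of the Cox-de Boor recursion, not the implementation used inside the splines package. The helper name `bspline_basis`, the knot values, and the evaluation grid are assumptions chosen for illustration.

```r
library(splines)

# Hypothetical helper: i-th B-spline basis function of degree k (Cox-de Boor recursion).
bspline_basis <- function(x, knots, i, k) {
  if (k == 0) {
    # Degree 0: indicator of the half-open interval [t_i, t_{i+1}), 0 otherwise
    return(as.numeric(knots[i] <= x & x < knots[i + 1]))
  }
  d1 <- knots[i + k] - knots[i]
  d2 <- knots[i + k + 1] - knots[i + 1]
  # Terms with a zero denominator (repeated knots) are taken to be 0
  left <- if (d1 > 0) (x - knots[i]) / d1 * bspline_basis(x, knots, i, k - 1) else 0
  right <- if (d2 > 0) (knots[i + k + 1] - x) / d2 * bspline_basis(x, knots, i + 1, k - 1) else 0
  left + right
}

interior <- c(-1.875, -0.25, 1.375)   # assumed interior knots
boundary <- c(-3.5, 3)                # assumed boundary knots
degree <- 3
# Boundary knots are repeated degree + 1 times
aug_knots <- c(rep(boundary[1], degree + 1), interior, rep(boundary[2], degree + 1))
x_grid <- seq(-3.4, 2.9, length.out = 50)   # points strictly inside the boundary

n_basis <- length(aug_knots) - (degree + 1)   # 7 basis functions including the constant
B_manual <- sapply(seq_len(n_basis),
                   function(i) bspline_basis(x_grid, aug_knots, i, degree))

# Reference basis from the splines package
B_ref <- splineDesign(aug_knots, x_grid, ord = degree + 1)
# bs() with intercept = FALSE drops one column, leaving k + 3 = 6
B_bs <- bs(x_grid, knots = interior, degree = degree, Boundary.knots = boundary)

max(abs(B_manual - B_ref))   # should be numerically zero
ncol(B_bs)
```

With three interior knots the full cubic basis has seven columns. bs() with intercept = FALSE keeps six of them, which is the df = k + 3 count above.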
We use the easy data mentioned earlier to illustrate B-spline estimation:

library(splines)  # provides bs()

easy <- read.table("D:/R/data/easysmooth.dat", header = TRUE)
x <- easy$X
y <- easy$Y
# Cubic B-spline regression with df = 6, i.e. 3 interior knots
m.bsp <- lm(y ~ bs(x, df = 6))

# True function, plotted for comparison
s <- function(x) {
    (x^3) * sin((x + 3.4)/2)
}
x.plot <- seq(min(x), max(x), length.out = 1000)
y.plot <- s(x.plot)
plot(x, y, xlab = "Predictor", ylab = "Response")
lines(x.plot, y.plot, lty = 1, col = 1)     # true function
lines(x, fitted(m.bsp), lty = 2, col = 2)   # B-spline fit

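attr(bs(x, df = 6), "knots")
# When knots are not specified, bs() places them at quantiles of x, which are evenly spaced for these data.
# We can of course choose specific knots ourselves based on the scatter plot.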

## 25% 50% 75%
## -1.875 -0.250 1.375

m.bsp1 <- lm(y ~ bs(x, df = 6, knots = c(-2.5, -1, 2)))
lines(x, fitted(m.bsp1), lty = 3, col = 3)

AIC(m.bsp)

## [1] 718.1

AIC(m.bsp1)

## [1] 727.4

summary(m.bsp)

##
## Call:
## lm(formula = y ~ bs(x, df = 6))
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -3.790 -0.911 -0.065 0.892 4.445
##
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)       1.816      0.622    2.92   0.0039 **
## bs(x, df = 6)1 -10.552      1.161   -9.09 < 2e-16 ***
## bs(x, df = 6)2   -7.127      0.755   -9.44 < 2e-16 ***
## bs(x, df = 6)3    0.813      0.926    0.88   0.3808
## bs(x, df = 6)4   -4.056      0.859   -4.72 4.5e-06 ***
## bs(x, df = 6)5    5.781      0.967    5.98 1.1e-08 ***
## bs(x, df = 6)6   -3.505      0.865   -4.05 7.4e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.42 on 193 degrees of freedom
## Multiple R-squared: 0.824, Adjusted R-squared: 0.819
## F-statistic: 151 on 6 and 193 DF, p-value: <2e-16

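As the plot shows, the B-spline fit is already quite close to the true function. summary(m.bsp) reports the estimated coefficients; substituting them into the B-spline basis expansion of f(x) gives an explicit expression for the fit.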
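To make the "explicit expression" concrete, the sketch below rebuilds the fitted values from the coefficients and the basis matrix. Because the data file is not available here, it uses simulated data as a stand-in. The names `x_sim`, `y_sim`, the noise level, and the seed are assumptions, not part of the original example.

```r
library(splines)

set.seed(1)
# Simulated stand-in for the easy data (an assumption of this sketch)
x_sim <- seq(-3.5, 3, length.out = 200)
y_sim <- x_sim^3 * sin((x_sim + 3.4) / 2) + rnorm(200, sd = 1.5)

B <- bs(x_sim, df = 6)       # 200 x 6 cubic B-spline basis
fit <- lm(y_sim ~ B)
beta <- coef(fit)            # intercept followed by 6 basis coefficients

# Explicit expression: f_hat(x) = beta_0 + sum_j beta_j * B_j(x)
f_hat_manual <- as.vector(beta[1] + B %*% beta[-1])
max(abs(f_hat_manual - unname(fitted(fit))))   # should be numerically zero

# The same expression evaluated at new points, reusing the knots stored in B
x_new <- c(-2, 0, 1.5)
B_new <- predict(B, x_new)
f_new <- as.vector(beta[1] + B_new %*% beta[-1])
f_new
```

The predict() call on the basis object reuses the knots and boundary knots stored in B, so the new basis columns line up with the estimated coefficients.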

Smoothing Splines

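Although B-splines already do well, both theory and practice show that solving for the coefficients directly by least squares does not work well and tends to overfit. One possible improvement is the smoothing spline, which adds a penalty on the estimated function f(x) to the least-squares criterion, somewhat like shrinkage estimation. Using the most common smoothness penalty, the estimate m(x) of f(x) minimizes the penalized least-squares criterion

$$\sum_{i=1}^{n} \{y_i - m(x_i)\}^2 + \lambda \int [m''(x)]^2 \, dx,$$

where $\lambda \ge 0$ is the penalty coefficient.

In R, the function smooth.spline (in the stats package, which is loaded by default) fits smoothing splines: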

easy <- read.table("D:/R/data/easysmooth.dat", header = TRUE)
x <- easy$X
y <- easy$Y
s.hat <- smooth.spline(x, y)

## OUTPUT
s.hat

## Call:
## smooth.spline(x = x, y = y)
## 
## Smoothing Parameter  spar= 0.7251  lambda= 0.0002543 (12 iterations)
## Equivalent Degrees of Freedom (Df): 11.56
## Penalized Criterion: 380.9
## GCV: 2.145

## OUTPUT PLOTS
s <- function(x) {
    (x^3) * sin((x + 3.4)/2)
}
x.plot = seq(min(x), max(x), length.out = 1000)
y.plot = s(x.plot)
plot(x, y, xlab = "Predictor", ylab = "Response")
lines(x.plot, y.plot, lty = 1, col = 1)
lines(s.hat, lty = 2, col = 2)
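The role of the penalty can be seen by fitting the same data with different amounts of smoothing. This is a small sketch on simulated data. The names `x_sim` and `y_sim`, the seed, and the spar values are assumptions chosen for illustration. When neither spar, lambda, nor df is given, smooth.spline chooses the smoothing parameter by generalized cross-validation (its default, cv = FALSE), which matches the GCV line in the output above.

```r
set.seed(1)
# Simulated stand-in for the easy data (an assumption of this sketch)
x_sim <- seq(-3.5, 3, length.out = 200)
y_sim <- x_sim^3 * sin((x_sim + 3.4) / 2) + rnorm(200, sd = 1.5)

fit_rough  <- smooth.spline(x_sim, y_sim, spar = 0.2)   # light penalty
fit_smooth <- smooth.spline(x_sim, y_sim, spar = 1.0)   # heavy penalty
fit_gcv    <- smooth.spline(x_sim, y_sim)               # penalty chosen by GCV

# A larger penalty gives fewer equivalent degrees of freedom
c(rough = fit_rough$df, gcv = fit_gcv$df, smooth = fit_smooth$df)
c(rough = fit_rough$lambda, gcv = fit_gcv$lambda, smooth = fit_smooth$lambda)

# Residual sum of squares grows as the penalty increases
rss <- function(fit) sum((y_sim - predict(fit, x_sim)$y)^2)
c(rough = rss(fit_rough), smooth = rss(fit_smooth))
```

A light penalty follows the data closely (many equivalent degrees of freedom, small residual sum of squares), while a heavy penalty pushes the fit toward a straight line.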

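Finally, let us look at how to compute m(x). Here we use the Reinsch algorithm.
Step 1: Compute the vector $Q'y$.
Step 2: Find the non-zero diagonals of the banded matrix $R + \lambda Q'Q$ and hence its Cholesky decomposition, with factors L and D.
Step 3: Solve the equation $(R + \lambda Q'Q)\gamma = Q'y$ for $\gamma$.
Step 4: Obtain the fitted values $m = y - \lambda Q\gamma$.
The matrices Q and R above can be expressed as follows. Let $h_i = t_{i+1} - t_i$ for $i = 1, \ldots, n-1$. Q is the $n \times (n-2)$ matrix with columns indexed by $j = 2, \ldots, n-1$, whose only non-zero entries are

$$q_{j-1,j} = \frac{1}{h_{j-1}}, \qquad q_{jj} = -\frac{1}{h_{j-1}} - \frac{1}{h_j}, \qquad q_{j+1,j} = \frac{1}{h_j},$$

and R is the symmetric $(n-2) \times (n-2)$ matrix with rows and columns indexed by $i = 2, \ldots, n-1$, whose only non-zero entries are

$$r_{ii} = \frac{h_{i-1} + h_i}{3}, \qquad r_{i,i+1} = r_{i+1,i} = \frac{h_i}{6}.$$

Here t denotes the knots. Let us work through the easy data example: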

easy <- read.table("D:/R/data/easysmooth.dat", header = TRUE)
x <- easy$X
y <- easy$Y

n <- length(y)
# The knots are the design points; this assumes x is sorted in increasing order with no ties
h <- diff(x)   # h[i] = x[i + 1] - x[i]
Q <- matrix(0, n, n - 2)
R <- matrix(0, n - 2, n - 2)
# Each column of Q holds the second-difference weights for one interior knot
for (i in 1:(n - 2)) {
    Q[i, i] <- 1/h[i]
    Q[i + 1, i] <- -1/h[i] - 1/h[i + 1]
    Q[i + 2, i] <- 1/h[i + 1]
}
# R is symmetric and tridiagonal
for (i in 2:(n - 2)) {
    R[i, i] <- (h[i] + h[i + 1])/3
    R[i - 1, i] <- h[i]/6
    R[i, i - 1] <- h[i]/6
}
R[1, 1] <- (h[1] + h[2])/3
lambda <- 0.2
# Step 3: solve (R + lambda * Q'Q) gamma = Q'y
A <- R + lambda * t(Q) %*% Q
gamma <- solve(A, t(Q) %*% as.matrix(y))

# Step 4: fitted values m = y - lambda * Q gamma
g <- as.matrix(y) - lambda * Q %*% gamma

s <- function(x) {
    (x^3) * sin((x + 3.4)/2)
}
x.plot <- seq(min(x), max(x), length.out = 1000)
y.plot <- s(x.plot)
plot(x, y, xlab = "Predictor", ylab = "Response")
lines(x.plot, y.plot, lty = 1, col = 1)
lines(x, g, lty = 2, col = 2)

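With a penalty coefficient of 0.2 the fit is not bad at all. As for why the computation works, note that for a natural cubic spline $\int [m''(x)]^2 \, dx = \mathbf{m}' Q R^{-1} Q' \mathbf{m}$, where $\mathbf{m} = (m(x_1), \ldots, m(x_n))'$. The estimation problem is then a least-squares problem with a quadratic penalty, which closely resembles the familiar ridge regression (and, more loosely, the lasso).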
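The ridge-like form can be checked numerically. Minimizing $\|y - \mathbf{m}\|^2 + \lambda \mathbf{m}' K \mathbf{m}$ with $K = Q R^{-1} Q'$ gives $\mathbf{m} = (I + \lambda K)^{-1} y$, and by the Woodbury identity this equals $y - \lambda Q (R + \lambda Q'Q)^{-1} Q' y$, which is what the Reinsch steps compute. The sketch below compares the two on simulated data. The data, the seed, and the variable names are assumptions, and R's chol() (an upper-triangular factor) stands in for the L, D factorization of Step 2.

```r
set.seed(1)
n <- 50
# Simulated, strictly increasing design points (an assumption of this sketch)
x_sim <- seq(-3.5, 3, length.out = n) + runif(n, -0.02, 0.02)
y_sim <- x_sim^3 * sin((x_sim + 3.4) / 2) + rnorm(n, sd = 1.5)

h <- diff(x_sim)
Q <- matrix(0, n, n - 2)
R <- matrix(0, n - 2, n - 2)
for (j in 1:(n - 2)) {
  Q[j, j]     <- 1 / h[j]
  Q[j + 1, j] <- -1 / h[j] - 1 / h[j + 1]
  Q[j + 2, j] <- 1 / h[j + 1]
  R[j, j]     <- (h[j] + h[j + 1]) / 3
  if (j > 1) R[j - 1, j] <- R[j, j - 1] <- h[j] / 6
}
lambda <- 0.2

# Reinsch steps, with a Cholesky factorization A = U'U
A <- R + lambda * crossprod(Q)
U <- chol(A)
gamma <- backsolve(U, forwardsolve(t(U), crossprod(Q, y_sim)))
g_reinsch <- as.vector(y_sim - lambda * Q %*% gamma)

# Ridge-like closed form with penalty matrix K = Q R^{-1} Q'
K <- Q %*% solve(R, t(Q))
g_ridge <- as.vector(solve(diag(n) + lambda * K, y_sim))

max(abs(g_reinsch - g_ridge))   # should be numerically zero
```

The Reinsch route only needs a banded $(n-2) \times (n-2)$ system, whereas the closed form inverts a dense $n \times n$ matrix, which is why the banded form is preferred in practice.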
