Learning R: Cleaning and Transforming Data
2018-06-20

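Working with Strings
In base R, grep, grepl, and regexpr all find strings that match a pattern: grep returns the indices of the matching elements, grepl returns a logical vector, and regexpr returns the position of the first match within each string. sub replaces the first match in each string, and gsub replaces every match.
The stringr package offers a more consistent interface. After loading it, wrap the pattern in fixed() to match it as a literal string rather than as a regular expression. str_detect then returns a logical vector marking the elements that contain the pattern, which can be used to select the matching rows.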
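As a minimal sketch of how these base R functions differ, the following uses a small made-up vector of names (the `rulers` vector is an assumption for illustration, not part of the article's dataset):

```r
# Hypothetical sample data, not taken from english_monarchs
rulers <- c("Offa", "Offa and Ecgfrith", "Ecgfrith", "Ceolwulf")

# grep returns the indices of the elements that match the pattern
idx <- grep("and", rulers)

# grepl returns one TRUE/FALSE per element
hit <- grepl("and", rulers)

# regexpr returns the position of the first match in each element (-1 if none)
pos <- as.integer(regexpr("f", rulers))

# sub replaces only the first match in each element; gsub replaces every match
first_only  <- sub("f", "F", rulers)
all_matches <- gsub("f", "F", rulers)

print(idx)
print(hit)
print(pos)
print(first_only)
print(all_matches)
```

The difference between sub and gsub only shows up in elements with more than one match, such as "Offa", where sub changes one "f" and gsub changes both.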

    > library(stringr)
    > # english_monarchs is assumed to be loaded already (for example from the learningr package)
    > # TRUE where the domain contains a literal comma, i.e. lists more than one kingdom
    > multiple <- str_detect(english_monarchs$domain, fixed(","))
    > english_monarchs[multiple, c("name", "domain")]
                              name                    domain
    17                        Offa       East Anglia, Mercia
    18                        Offa East Anglia, Kent, Mercia
    19           Offa and Ecgfrith East Anglia, Kent, Mercia
    20                    Ecgfrith East Anglia, Kent, Mercia
    22                     Cœnwulf East Anglia, Kent, Mercia
    23        Cœnwulf and Cynehelm East Anglia, Kent, Mercia
    24                     Cœnwulf East Anglia, Kent, Mercia
    25                    Ceolwulf East Anglia, Kent, Mercia
    26                   Beornwulf       East Anglia, Mercia
    82      Ecgbehrt and Æthelwulf              Kent, Wessex
    83      Ecgbehrt and Æthelwulf      Kent, Mercia, Wessex
    84      Ecgbehrt and Æthelwulf              Kent, Wessex
    85    Æthelwulf and Æeelstan I              Kent, Wessex
    86                   Æthelwulf              Kent, Wessex
    87 Æthelwulf and Æeelberht III              Kent, Wessex
    88               Æeelberht III              Kent, Wessex
    89                  Æthelred I              Kent, Wessex
    95                       Oswiu       Mercia, Northumbria
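To illustrate why fixed() matters, here is a small sketch comparing a literal match with a regular-expression match. The `domain` vector below is invented for the example; with a comma the two modes happen to agree, so the sketch uses a full stop, which has a special meaning in regular expressions.

```r
library(stringr)

# Hypothetical sample values, not taken from english_monarchs
domain <- c("Kent, Wessex", "Mercia", "East Anglia. Kent", NA)

# As a regular expression, "." matches any character,
# so every non-missing string is reported as a match
as_regex <- str_detect(domain, ".")

# fixed(".") matches only a literal full stop
as_fixed <- str_detect(domain, fixed("."))

# Missing values stay NA in both results
print(as_regex)
print(as_fixed)
```

Because missing values propagate as NA, a result like this usually needs an explicit NA check before it is used as a row index.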
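To match any of several alternatives, pass a regular expression instead. Here the pattern matches either a comma or "and", which picks out the entries naming more than one ruler. Because str_detect returns NA for missing names, the NA values are excluded before indexing.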

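    > # TRUE where the name contains a comma or "and"; NA where the name is missing
    > ruler <- str_detect(english_monarchs$name, ",|and")
    > # Index the name column directly; drop NA so missing names are not returned
    > english_monarchs$name[ruler & !is.na(ruler)]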
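The NA handling and the behaviour of the alternation pattern can be sketched on a small invented vector (the `name` values and the stricter `\\band\\b` variant are illustrative assumptions, not part of the original article):

```r
library(stringr)

# Hypothetical sample names, not taken from english_monarchs
name <- c("Offa", "Offa and Ecgfrith", "Hun, Beonna and Alberht", NA)

# "," or "and" anywhere in the string; NA input gives NA output
multi <- str_detect(name, ",|and")

# An NA in a logical index would produce an NA element, so filter it out
kept <- name[multi & !is.na(multi)]

# The plain pattern also matches "and" inside a longer word;
# word boundaries (\\b) restrict it to the standalone word
loose  <- str_detect("Ferdinand", ",|and")
strict <- str_detect("Ferdinand", ",|\\band\\b")

print(multi)
print(kept)
print(c(loose, strict))
```

Whether the stricter pattern is needed depends on the data; for names that never contain "and" as part of a word, the simpler pattern behaves the same.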

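To split the name column into individual rulers, use the str_split function. It returns a list with one character vector per input string.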

    > indival <- str_split(english_monarchs$name,",|and")  
    > head(indival[sapply(indival,length)>1])  
    [[1]]  
    [1] "Sigeberht " " Ecgric"     
      
    [[2]]  
    [1] "Hun"      " Beonna " " Alberht"  
      
    [[3]]  
    [1] "Offa "     " Ecgfrith"  
      
    [[4]]  
    [1] "Cœnwulf "  " Cynehelm"
      
    [[5]]  
    [1] "Sighere " " Sebbi"    
      
    [[6]]  
    [1] "Sigeheard " " Swaefred"   
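The pieces in the output above keep the spaces that surrounded the separator. One possible way to clean them up, sketched here on invented sample names, is to apply str_trim to each element of the list:

```r
library(stringr)

# Hypothetical sample names, not taken from english_monarchs
name <- c("Sigeberht and Ecgric", "Hun, Beonna and Alberht", "Offa")

# Split on a comma or "and"; the result is a list of character vectors
parts <- str_split(name, ",|and")

# Remove leading and trailing whitespace from every piece
trimmed <- lapply(parts, str_trim)

# Number of rulers named in each entry
n_rulers <- lengths(trimmed)

print(trimmed)
print(n_rulers)
```

Trimming after the split keeps the pattern simple; folding the whitespace into the pattern itself would be an alternative.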

str_count counts how many times a pattern occurs in each string. The pattern must be passed as a quoted string.

    > # Number of times "th" occurs in each name
    > str_count(english_monarchs$name, "th")
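A short sketch of what str_count returns, using invented names rather than the real dataset:

```r
library(stringr)

# Hypothetical sample names, not taken from english_monarchs
name <- c("Ethelwulf and Ethelred", "Ecgfrith", "Offa", NA)

# One integer per element: the number of non-overlapping matches of "th"
counts <- str_count(name, "th")

print(counts)
```

The result has the same length as the input, with 0 for strings that contain no match and NA for missing values.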

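str_replace replaces the first match of a pattern in each string; str_replace_all replaces every match.
ignore.case makes matching case-insensitive (it does not skip characters or strings). In current versions of stringr it has been superseded by regex(pattern, ignore_case = TRUE).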
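The following sketch shows replacement and case-insensitive matching on invented strings. It assumes a current stringr version, where case-insensitive matching is requested with regex(..., ignore_case = TRUE) rather than the older ignore.case() wrapper.

```r
library(stringr)

# Hypothetical sample values, not taken from english_monarchs
x <- c("Kent, Mercia, Wessex", "kent")

# str_replace changes only the first match in each string
first_only <- str_replace(x, ",", ";")

# str_replace_all changes every match
all_matches <- str_replace_all(x, ",", ";")

# Matching is case-sensitive by default
sensitive <- str_detect(x, "kent")

# regex(..., ignore_case = TRUE) matches regardless of letter case
insensitive <- str_detect(x, regex("kent", ignore_case = TRUE))

print(first_only)
print(all_matches)
print(sensitive)
print(insensitive)
```

The same regex() modifier can be passed to the other stringr functions that take a pattern, such as str_replace and str_count.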
