1. Html網頁結構介紹-網頁到底如何傳送資料?爬蟲必學
2. 資料傳遞:Get與Post差異,網路封包傳送的差異
3. Html爬蟲Get教學-抓下Yahoo股票資訊,程式交易的第一步
4. Html爬蟲Get實戰-全台最大美食平台FoodPanda爬蟲,把熊貓抓回家
5. 資料分析實戰,熊貓FoodPanda熱門美食系列|看出地區最火料理種類
6. Json爬蟲教學-Google趨勢搜尋|掌握最火關鍵字
7. Json爬蟲實戰-24小時電商PChome爬蟲|雖然我不是個數學家但這聽起來很不錯吧
8. Html爬蟲Post教學-台灣股市資訊|網韭菜們的救星
9. Html爬蟲Post實戰-全球美食平台UberEat爬蟲
10. Pandas爬蟲教學-Yahoo股市爬蟲|不想再盯盤
11. Pandas爬蟲實戰-爬下全台各地區氣象預報歷史資料
12. 資料分析實戰-天氣預報圖像化|一張圖巧妙躲過雨季
天ä¸æ¦åï¼çºå¿«ä¸ç ´ï¼è¡é·äº¦å¦æ¤
å¨å°ç£3ã40æ²çæååï¼å¯è½éæ¯æ¯è¼ç¿æ £ä½¿ç¨PTTï¼åªæå°æ¸ç8å¹´ç´çæ使ç¨PTTçç¿æ £èç¶é©ãå¨è¥¿å 2000å¹´å¾çä¸ä»£ï¼å¤§å¤é½ä½¿ç¨Dcardä¾å大PTTï¼å ¶åå ä¸å¤ä¹è¨è¨è¦ªæ°ã注éé±ç§ãæ ¡åè¨æ¯æ´æ°å°çã
æåå¨åç¯æç« ãDcardç¬è²ï¼çè¨èéãèãDcardç¬è²ï¼æç« è³æãç¶ä¸ï¼å·²ç¶åå¾Dcardçä¸çååæç« èçè¨ãæ¬ç¯æç« å°è§£æå¦ä½ä½¿ç¨æåå¨Dcardçä¸ç¬ä¸ä¾çæç« èçè¨ï¼å©ç¨Python è¦è¦ºåçæ¹å¼ï¼åç¾åºæ¯åæç« ä¸»é¡èæéçéä¿ï¼ä¸ç¼ä¾¿è½çåºçæéå §ççç«åçæç« ä¸»é¡æééµåã
é¦å ï¼éè¦å¯å ¥å åç¬ä¸ä¾çæç« è³æèçè¨è³æï¼ä¸¦ä¸æºåå¤åè²ç¢¼ï¼çºä¹å¾è¦è¦ºç«ä½æºåã
import datetime
import matplotlib.pyplot as plt
import pandas as pd
# è²ç¢¼è¡¨
colors = ['#f44336', '#E91E63', '#9C27B0', '#673AB7', '#3F51B5', '#2196F3',
'#03A9F4', '#00BCD4', '#009688', '#4CAF50', '#8BC34A', '#CDDC39',
'#FFEB3B', '#FFC107', '#FF9800', '#FF5722', '#795548', '#9E9E9E',
'#607D8B', '#212121']
# è³æä¾èªtalkç
dcard_article = pd.read_csv('Dcardæç« è³æ.csv')
dcard_comment = pd.read_csv('Dcardçè¨è³æ.csv')
dcard_article.columns
dcard_comment.columns
å¨é²è¡è³æåæåï¼é½éè¦å å°è³æä½å®æ´çæ¸ çï¼å ¶ä¸ç©ºå¼æ¯çå è¦åèççãæ¬æç« ä½¿ç¨Pandas å¥ä»¶ä¸çfillna()æ¹æ³ä¾å°ææç空å¼çµ¦å¡«æ»¿ã
#--- å
è¡å¡«æ»¿ç©ºå¼
dcard_article['æ¨é¡'] = dcard_article['æ¨é¡'].fillna('')
dcard_article['å
§æç°¡ä»'] = dcard_article['å
§æç°¡ä»'].fillna('')
dcard_article['主é¡æ¨ç±¤'] = dcard_article['主é¡æ¨ç±¤'].fillna('')
dcard_comment['çè¨å
§å®¹'] = dcard_comment['çè¨å
§å®¹'].fillna('')
æéçºæ¬ç 究çéè¦åæ¸ä¹ä¸ï¼å æ¤æééè¦è©³ç´°èçï¼å çºæåè¦æ¾çä¸å主é¡æééµåççé»ï¼å¯è½æ¯å¨å¹¾åå°æå §ç¼ççï¼å æ¤è¥å®ç¾©ä¸åééµåçæéé»ç¨æäºï¼å¯è½å°±æç¡æ³ç¼ç¾éåééµåãå æ¤æåå¨æç« èçè¨çééµåä¸ï¼é¸æææ©çé£åæéï¼ä½çºéåééµåçãèµ·é ãã
# æéè½æ
dcard_article['ç¼ææé'] = pd.to_datetime(dcard_article['ç¼ææé'])
dcard_comment['ç¼ææé'] = pd.to_datetime(dcard_comment['ç¼ææé'])
if dcard_article['ç¼ææé'].min() < dcard_comment['ç¼ææé'].min():
firsttime = dcard_article['ç¼ææé'].min()
else:
firsttime = dcard_comment['ç¼ææé'].min()
Dcard æç« ä¸æ許å¤Tagï¼ä½å¨ç¶éç¬è²ä¸ä¾å¾ï¼éäºListåæ çTagé½æè®æå串çåæ ï¼å æ¤æ¬æç« å©ç¨eval()èapply()æ¹æ³ï¼å°ææçå串åæ è³æï¼è½ææ該è³æãåæçãåæ ã
def evaluation(thestr):
return eval(thestr) # è½æsträ¸çå
§å®¹çæ£åæ
dcard_article['主é¡æ¨ç±¤'] = dcard_article['主é¡æ¨ç±¤'].apply(
evaluation) # å°dataframeçè³æå
§å®¹å¥å
¥æ¹æ³
alltag = dcard_article['主é¡æ¨ç±¤'].sum() # å°æætag串åä¸èµ·
alltag = pd.DataFrame(alltag)
alltag.dropna(inplace=True) # åªé¤ç©ºå¼
alltag.drop_duplicates(0, inplace=True)
å©ç¨while è¿´åï¼è¨ç®æ¯åééµåçç¼æéï¼èå ¶å°æçæéãå¨è¨ç®ä¸ï¼çºäºæ¹ä¾¿çµ±è¨ï¼æ¯ç¯æç« é½å¯ä»¥å°ææçæ¨é¡ãå §æãçè¨å ¨é¨ä¸²å¨ä¸èµ·ï¼ä»¥ä¾¿æ¼å©ç¨Pandas å¥ä»¶çcount() æ¹æ³ä¾é²è¡è¨ç®ã
thetime = []
remember = []
doit = True
while doit:
firsttime = firsttime + datetime.timedelta(hours=1)
print(firsttime)
getdata_article = dcard_article[dcard_article['ç¼ææé'] < firsttime]
getdata_comment = dcard_comment[dcard_comment['ç¼ææé'] < firsttime]
if len(getdata_article) == len(dcard_article) and len(getdata_comment) == len(getdata_comment):
doit = False
else:
thetime.append(firsttime)
allstr = getdata_article['æ¨é¡'].sum() + getdata_article['å
§æç°¡ä»'].sum(
) + getdata_article['主é¡æ¨ç±¤'].astype(str).sum() + getdata_comment['çè¨å
§å®¹'].sum()
temp = []
for i in alltag[0]:
temp.append(allstr.count(i))
remember.append(temp)
timeflow = pd.DataFrame(remember)
timeflow.columns = alltag[0]
æå¾å°ææçç±é主é¡é¡¯ç¤ºåºä¾ï¼å çºä¸»é¡éå¤æé æé£ä»¥å¤è®ï¼å æ¤æååªé¡¯ç¤ºå20åã
count = 0
for i in timeflow.columns:
plt.plot(thetime, timeflow[i],
color=colors[count % 20],
linewidth=5,
alpha=0.3)
if timeflow[i].iloc[-1] > 20:
plt.text(thetime[-1], timeflow[i].iloc[-1], i, fontsize=10) # å ä¸æå註解
count = count + 1
plt.title("çé»åæ", fontsize=30) # æ¨é¡
plt.ylabel('主é¡åºç¾æ¬¡æ¸', fontsize=20) # yçæ¨é¡
plt.xlabel('æé軸', fontsize=20) # xçæ¨é¡
plt.tight_layout()
plt.show()
作者:楊超霆 行銷搬進大程式 創辦人