Python web crawling practice
https://www.analyticsvidhya.com/blog/2015/10/beginner-guide-web-scraping-beautiful-soup-python
Python web crawling guide feat. Jupyter Notebook (0107)
from __future__ import print_function
import os.path
from collections import defaultdict
import string
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
1. Storing data with defaultdict
Source: https://docs.python.org/2/library/collections.html
8.3.3.1. defaultdict Examples
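A defaultdict calls its factory (list, int, ...) the first time a missing key is accessed, so you can append or add to a key without checking whether it exists yet. A minimal sketch of the two patterns the Python docs' examples cover, grouping into lists and counting; the sample pairs and sentence are made up for illustration.

```python
from collections import defaultdict

# Hypothetical (category, value) pairs, e.g. collected while scraping
pairs = [("fruit", "apple"), ("veg", "carrot"), ("fruit", "pear")]

groups = defaultdict(list)   # a missing key starts as an empty list
for key, value in pairs:
    groups[key].append(value)  # no KeyError on the first access to a key

counts = defaultdict(int)    # a missing key starts as 0
for word in "to be or not to be".split():
    counts[word] += 1

print(dict(groups))
print(dict(counts))
```

With a plain dict the same loops would need `groups.setdefault(key, [])` or an `if key not in groups` check first.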
2. The Python requests module
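result = requests.get(url)  # url: the address of the page to scrape
result.raise_for_status()  # stop early on a 4xx/5xx response
c = result.content  # raw response body as bytes
soup = BeautifulSoup(c, "html.parser")  # name the parser explicitly so bs4 does not warn and guess one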
The difference between r.content and r.text
r.text is the content of the response decoded to Unicode (a str) using r.encoding, and r.content is the raw content of the response in bytes.
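The relationship can be illustrated without a network call: the bytes play the role of r.content, and decoding them with the right encoding gives roughly what r.text returns. A minimal sketch; the sample string is made up, and UTF-8 is assumed as the page's encoding. If requests guesses the encoding wrong, you can set r.encoding before reading r.text, or pass r.content to BeautifulSoup and let it detect the encoding itself.

```python
# Hypothetical raw bytes standing in for r.content
raw = "<p>웹 크롤링</p>".encode("utf-8")

# Decoding with the correct encoding: roughly what r.text gives when r.encoding == "utf-8"
text = raw.decode("utf-8")

# Decoding with the wrong encoding produces garbled text (mojibake)
wrong = raw.decode("latin-1")

print(type(raw).__name__, len(raw))    # bytes: each Hangul syllable takes 3 bytes in UTF-8
print(type(text).__name__, len(text))  # str: each syllable is one character
print(wrong)
```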
3. Figure out the HTML structure that holds the data
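One way to see how a page is laid out before writing extraction code is to print its tag tree with nesting depth. A minimal sketch, using the standard library's html.parser module instead of BeautifulSoup so it runs without extra packages; the TagOutline class and the sample table are hypothetical. With BeautifulSoup, soup.prettify() gives a similar indented view.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not increase the depth
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TagOutline(HTMLParser):
    """Hypothetical helper: record (depth, tag, attrs) for every start tag."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag, dict(attrs)))
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tag such as <br/>: record it without changing the depth
        self.outline.append((self.depth, tag, dict(attrs)))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth -= 1


sample = """<table id="financials">
  <tr><th>Item</th><th>2015</th></tr>
  <tr><td>Revenue</td><td>1,234</td></tr>
</table>"""

parser = TagOutline()
parser.feed(sample)
for depth, tag, attrs in parser.outline:
    print("  " * depth + tag, attrs or "")
```

The printed tree shows which tag and attribute (here the table's id) to target when extracting the data.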
Clean-up function
def convert_num(val):
    """
    Convert the string number value to a float
     - Remove all extra whitespace
     - Remove commas
     - If wrapped in (), then it is a negative number
    """
    # str.strip() works in Python 2 and 3 (string.strip() no longer exists in Python 3)
    val = val.strip().replace(",", "").replace("(", "-").replace(")", "")
    return float(val)
4. Parse the HTML
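The steps above can be combined: collect each table row's cell text, clean the numbers with the convert_num rules, and store them in a defaultdict keyed by row label. A minimal sketch under stated assumptions: it uses the standard library's html.parser instead of BeautifulSoup so it runs with no third-party packages (with BeautifulSoup you would typically loop over soup.find_all("tr") and each row's find_all("td")), the TableRows class and the sample table are hypothetical, and convert_num is repeated in its Python 3 form so the block is self-contained.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Hypothetical helper: collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # ignore whitespace between tags
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def convert_num(val):
    """Same clean-up rules as above: strip, drop commas, (x) -> -x."""
    val = val.strip().replace(",", "").replace("(", "-").replace(")", "")
    return float(val)


sample_html = """<table>
  <tr><th>Item</th><th>2014</th><th>2015</th></tr>
  <tr><td>Revenue</td><td>1,200</td><td>1,350</td></tr>
  <tr><td>Net income</td><td>(40)</td><td>25</td></tr>
</table>"""

parser = TableRows()
parser.feed(sample_html)
header, *body = parser.rows  # first row holds the column labels

data = defaultdict(list)
for label, *values in body:
    data[label].extend(convert_num(v) for v in values)

print(header)
print(dict(data))
```

From here, a dict of equal-length lists like this can be passed to pandas (pd.DataFrame) for further analysis.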
<Sites I referred to while practicing the Python web crawler>
http://creativeworks.tistory.com/entry/PYTHON-3-Tutorials-24-%EC%9B%B9-%ED%81%AC%EB%A1%A4%EB%9F%AClike-Google-%EB%A7%8C%EB%93%A4%EA%B8%B0-1-How-to-build-a-web-crawler
<Structuring a Python project>
http://python-guide-kr.readthedocs.io/ko/latest/writing/structure.html
<Python study URL>