Posts in the 'Python_/Analysis' category


dict, iteritems() 2017.10.02
<Pandas Data Analysis> 1. Setting the directory 2017.01.16
Python web crawling practice 2017.01.07
Naver movie review crawling feat. BeautifulSoup 2016.12.06
[Installing Python modules] Installing and importing Beautiful Soup 4, running from the Windows cmd 2016.12.06

dict, iteritems()

Jr.Kelly 2017. 10. 2. 00:15


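While practicing the Chapter 1 code from Data Science from Scratch (밑바닥부터 시작하는 데이터과학):

Requirement: write a function that takes an employee's years of tenure, sorts it into a bucket, and returns the average salary for each bucket (the key).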

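1. A function that splits tenure into a few buckets

def tenure_bucket(tenure):
    # Return the bucket label instead of printing it, so the label can be used as a dict key.
    if tenure < 2:
        return "less than 2 years"
    elif tenure < 5:
        return "2 to 5 years"
    else:
        return "5 years or more"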

Next, I couldn't figure out the logic for sorting each tenure into the buckets defined above. But it turns out to be this simple!

2. Logic that maps each salary to its bucket

(1) Create a new dict with defaultdict

(key: bucket, values: salaries)

Understanding the collections module
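Putting steps 1 and 2 together, here is a minimal Python 3 sketch: group salaries by bucket with defaultdict(list), then average each bucket. The (salary, tenure) pairs are hypothetical sample data for illustration. The averaging loop uses .items(); in Python 2, .iteritems() was the lazy version of it, and iteritems() no longer exists in Python 3.

```python
from collections import defaultdict

def tenure_bucket(tenure):
    """Map years of tenure to a bucket label used as a dict key."""
    if tenure < 2:
        return "less than 2 years"
    elif tenure < 5:
        return "2 to 5 years"
    else:
        return "5 years or more"

# Hypothetical (salary, tenure) pairs, for illustration only.
salaries_and_tenures = [(83000, 8.7), (88000, 8.1), (48000, 0.7),
                        (76000, 6), (69000, 6.5), (76000, 7.5),
                        (60000, 2.5), (83000, 10), (48000, 1.9),
                        (63000, 4.2)]

# key: bucket label, value: list of salaries in that bucket.
# defaultdict(list) creates an empty list the first time a bucket is seen.
salary_by_tenure_bucket = defaultdict(list)
for salary, tenure in salaries_and_tenures:
    salary_by_tenure_bucket[tenure_bucket(tenure)].append(salary)

# Average salary per bucket. In Python 3, / is float division;
# in Python 2 you would iterate with .iteritems() and need float() to avoid integer division.
average_salary_by_bucket = {
    bucket: sum(salaries) / len(salaries)
    for bucket, salaries in salary_by_tenure_bucket.items()
}
print(average_salary_by_bucket)
```

The bucket function only has to return a hashable label; defaultdict takes care of creating each bucket's list on first use, so no "if key not in dict" check is needed.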


<Pandas Data Analysis> 1. Setting the directory

Jr.Kelly 2017. 1. 16. 15:24


How to change the working directory

Now CSV files will be looked up inside this directory!
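One common way to do this, sketched in Python 3 with the standard library's os.chdir (an assumption; the original steps are not shown here). The helper name use_data_directory is hypothetical.

```python
import glob
import os

def use_data_directory(path):
    """Hypothetical helper: switch the working directory to `path` and list its CSV files."""
    os.chdir(path)                     # relative file names now resolve against `path`
    print("current directory:", os.getcwd())
    return sorted(glob.glob("*.csv"))  # CSV files that a bare name like "data.csv" would find
```

After calling it with your data folder, pandas can open a file there by name alone, e.g. pd.read_csv("data.csv"), where "data.csv" is a placeholder file name.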


Python web crawling practice

Jr.Kelly 2017. 1. 7. 03:33


https://www.analyticsvidhya.com/blog/2015/10/beginner-guide-web-scraping-beautiful-soup-python

Python web crawling guide feat. Jupyter Notebook (0107)

from __future__ import print_function
import os.path
from collections import defaultdict
import string
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

1. Storing data with defaultdict

Source: https://docs.python.org/2/library/collections.html

8.3.3.1. `defaultdict` Examples

>>> from collections import defaultdict
>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list)
>>> for k, v in s:
...     d[k].append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

2. The Python requests module

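result = requests.get(url)              # url: the page to scrape
c = result.content                      # raw bytes of the response body
soup = BeautifulSoup(c, "html.parser")  # name the parser explicitly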

The difference between request.content and request.text

r.text is the content of the response in unicode, and r.content is the content of the response in bytes.
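To make the difference concrete without making a network request, here is a small Python 3 sketch that mimics what requests does: r.content holds the raw bytes, and r.text is (roughly) those bytes decoded with r.encoding. The sample string and the utf-8 encoding are assumptions for illustration.

```python
# Stand-ins for a response body; no network request is made here.
raw_body = u"네이버 영화".encode("utf-8")  # like r.content: bytes
encoding = "utf-8"                        # like r.encoding: the encoding requests settled on
text_body = raw_body.decode(encoding)     # like r.text: a decoded (unicode) string

print(type(raw_body), len(raw_body))    # bytes, 16 long: each Hangul syllable is 3 bytes in UTF-8
print(type(text_body), len(text_body))  # str, 6 characters
```

If r.encoding is guessed wrong, r.text comes out garbled even though r.content is intact, which is why passing r.content to BeautifulSoup is a common choice.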

3. Understand the structure of the HTML you want to pull data from

Clean-up function:

def convert_num(val):
    """
    Convert the string number value to a float
     - Remove all extra whitespace
     - Remove commas
     - If wrapped in (), then it is negative number
    """
    val = val.strip().replace(",", "").replace("(", "-").replace(")", "")  # str.strip() works in Python 2 and 3
    return float(val)

4. Parse the HTML
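A minimal Python 3 sketch of this parsing step with Beautiful Soup, feeding each table cell through the convert_num clean-up function above. The HTML table, its id and its numbers are hypothetical stand-ins for a real page.

```python
from bs4 import BeautifulSoup

def convert_num(val):
    """Convert a string like ' (1,234) ' to a float (-1234.0)."""
    val = val.strip().replace(",", "").replace("(", "-").replace(")", "")
    return float(val)

# Hypothetical HTML table, standing in for a real page.
html = """
<table id="financials">
  <tr><th>Item</th><th>2015</th><th>2016</th></tr>
  <tr><td>Revenue</td><td> 1,200 </td><td>1,350</td></tr>
  <tr><td>Net income</td><td>(300)</td><td>150</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="financials")

rows = {}
for tr in table.find_all("tr")[1:]:  # skip the header row
    cells = [td.get_text() for td in tr.find_all("td")]
    label, values = cells[0], cells[1:]
    rows[label] = [convert_num(v) for v in values]
print(rows)
```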

<Sites I referred to while practicing the Python web crawler>

http://creativeworks.tistory.com/entry/PYTHON-3-Tutorials-24-%EC%9B%B9-%ED%81%AC%EB%A1%A4%EB%9F%AClike-Google-%EB%A7%8C%EB%93%A4%EA%B8%B0-1-How-to-build-a-web-crawler

<Structuring a Python project>

http://python-guide-kr.readthedocs.io/ko/latest/writing/structure.html

<Python study URL>

http://blog.naver.com/dudwo567890/220914435973


Naver movie review crawling feat. BeautifulSoup

Jr.Kelly 2016. 12. 6. 16:33


pip install requests

To install Python modules you need pip installed first; how to do that is easy to find by googling!

I installed the requests library so I could run code that accesses the web from Python.

*****I couldn't get request to run in Python 2.x, and since I'm working in a Python 2.X environment, I used urllib2 instead:

import re
from bs4 import BeautifulSoup
from urllib2 import urlopen
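urllib2 exists only in Python 2; in Python 3 the same urlopen function lives in urllib.request. A small sketch of an import that works in both versions:

```python
try:
    # Python 3: urlopen lives in urllib.request
    from urllib.request import urlopen
except ImportError:
    # Python 2: the same function is in urllib2
    from urllib2 import urlopen
```

Either way, urlopen(url).read() returns the raw page bytes, which can be passed straight to BeautifulSoup.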

Reference site for the urllib2 module

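This code crawls the movie title, review, rating, and date from the Naver Movie pages.

I plan to post an analysis of the code later, referring to WikiDocs' Jump to Python (점프 투 파이썬).

1. Analyze the tags of the Naver Movie rating page with F12 (browser developer tools)

2. Refer to the Beautiful Soup documentation for the methods that navigate tags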
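A minimal Python 3 sketch of steps 1 and 2, using Beautiful Soup's find_all, find and get_text plus re to pull the number out of the rating text. The markup and its class names are hypothetical; the real tags on the Naver Movie page (found with F12) will differ.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup; replace the tags/classes with what F12 shows on the real page.
html = """
<ul class="reviews">
  <li class="review">
    <a class="title">Movie A</a>
    <span class="score">Rating 8</span>
    <p class="text">Great acting.</p>
    <span class="date">16.12.06</span>
  </li>
  <li class="review">
    <a class="title">Movie B</a>
    <span class="score">Rating 6</span>
    <p class="text">A bit long.</p>
    <span class="date">16.12.05</span>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
reviews = []
for li in soup.find_all("li", class_="review"):  # one <li> per review
    score_text = li.find("span", class_="score").get_text()
    reviews.append({
        "title": li.find("a", class_="title").get_text(strip=True),
        "score": int(re.search(r"\d+", score_text).group()),  # first run of digits
        "review": li.find("p", class_="text").get_text(strip=True),
        "date": li.find("span", class_="date").get_text(strip=True),
    })
print(reviews)
```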

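***When I ran navermovie.py from cmd,

I got a "no encoding declared" error.

Fix >>> put #-*- coding: utf-8 -*- at the very top of the file (I edit it in vi).

But I still got a Unicode error.

When I checked with print html,

the Korean text in the Python HTML parser output came out garbled:

UnicodeEncodeError: 'cp949'

Checking in the shell shows that the console encoding is cp949.

Fix: in the Windows environment variable settings, set PYTHONIOENCODING to utf-8 so cmd uses it.

Win + R > control (opens the Control Panel)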
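To see where the error comes from, here is a small Python 3 sketch: it prints the encoding print() will use and the current PYTHONIOENCODING value, and checks whether a string fits in a given encoding. The helper can_encode is hypothetical. Common Hangul fits in cp949, but characters outside cp949 (an emoji, for example) raise UnicodeEncodeError, while UTF-8 can encode everything.

```python
import os
import sys

# Which encoding will print() use? On a Korean Windows console this is typically cp949.
print("stdout encoding:", getattr(sys.stdout, "encoding", None))
print("PYTHONIOENCODING:", os.environ.get("PYTHONIOENCODING"))

def can_encode(text, encoding):
    """Hypothetical helper: return True if `text` can be encoded in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(can_encode(u"영화", "cp949"))          # Hangul is in cp949
print(can_encode(u"\U0001F600", "cp949"))    # an emoji is not
```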

Reference site for Python encoding/decoding

You need to install a parser library:

soup = BeautifulSoup(html,"lxml")

raised an error.

Reference

Before using the code above:

1. A compiler for Python 2.7 must be installed.

2. Microsoft Visual C++ must be installed.

https://www.microsoft.com/en-us/download/confirmation.aspx?id=44266

(installed to C:\Users\yeseul\VC)

$ pip install lxml

Reference
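If lxml isn't installed, BeautifulSoup(html, "lxml") raises bs4.FeatureNotFound. A minimal sketch that falls back to Python's built-in "html.parser", which needs no extra install; the helper name make_soup is hypothetical.

```python
from bs4 import BeautifulSoup, FeatureNotFound

def make_soup(html):
    """Hypothetical helper: parse with lxml if installed, else use the built-in html.parser."""
    try:
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        # Raised when the requested parser library (here lxml) is not installed.
        return BeautifulSoup(html, "html.parser")

soup = make_soup("<p>hi</p>")
print(soup.find("p").get_text())
```

lxml is usually faster, but html.parser is enough for practice scripts and avoids the compiler setup above.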


[Installing Python modules] Installing and importing Beautiful Soup 4, running from the Windows cmd

Jr.Kelly 2016. 12. 6. 16:16


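1. Download Beautiful Soup and extract the tar.gz with ALZip.

2. Run cmd and, inside the Python interpreter, check that the module is installed correctly:

from bs4 import BeautifulSoup

3. In Sublime, create a beautifulsoup.py file and save it in C:\Users\yeseul\pythonworkspace.

Then, in cmd (not inside the Python interpreter), change to the directory where the .py file was saved and run it.

I printed Hello World to check that the module was installed correctly.

I sometimes got lost because the working directory was set somewhere else when running the file, so I wrote this down!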
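A sketch of what such a check script could look like in Python 3 (not the original file): it prints Hello World, the installed bs4 version, and the current working directory, which is the detail that caused the confusion above.

```python
import os

import bs4
from bs4 import BeautifulSoup

print("Hello World")
print("bs4 version:", bs4.__version__)
# If a script "can't find" a file, check which directory it is actually running in.
print("current directory:", os.getcwd())

# Parse a tiny document to confirm the module really works.
soup = BeautifulSoup("<p>Hello World</p>", "html.parser")
print(soup.p.get_text())
```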

Reference site:

https://wikidocs.net/2573



매일이 쌓이는 이야기
