[Data Analysis] Data Analysis with Python (1)

Data Analysis with Python (1) - Package Overview

- Main packages for data analysis

1) numpy: numerical computing, statistics, linear algebra

2) matplotlib

3) scipy

4) pandas: multidimensional data structures built on arrays ..

1. NumPy

Extends Python's numerical data processing capabilities

array
asarray
arange
ones, ones_like
zeros, zeros_like, empty, empty_like
eye, identity
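
The array creation functions listed above can be tried out with a minimal sketch like the following. The shapes and values are arbitrary choices made for illustration only.

```python
import numpy as np

# array: build an ndarray from a Python sequence
a = np.array([[1, 2, 3], [4, 5, 6]])

# asarray: convert to an ndarray, reusing the input when it already is one
b = np.asarray(a)
same_object = b is a  # no copy was made

# arange: like the built-in range, but returns an ndarray
r = np.arange(0, 10, 2)

# ones / zeros take a shape; the *_like variants copy shape and dtype
ones = np.ones((2, 3))
zeros_like_a = np.zeros_like(a)

# empty allocates without initializing, so its contents are arbitrary
scratch = np.empty((2, 3))

# eye and identity both give a square identity matrix here
eye3 = np.eye(3)
id3 = np.identity(3)
```

Note that `empty` is only useful when every element will be overwritten afterwards, since its initial contents are not defined.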

universal functions

- A kind of vectorized wrapper: the function is applied element by element over the whole array

1) abs, fabs

2) sqrt

3) square

4) exp: natural exponential

5) log, log10, log2, log1p

6) sign

7) ceil

8) floor

9) rint

10) modf

... and so on
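
As a rough illustration of the universal functions above, each call below works on every element of the array at once, with no explicit Python loop. The input values are made up for the example.

```python
import numpy as np

x = np.array([-2.5, -0.5, 0.0, 1.5, 4.0])  # sample values, chosen arbitrarily

abs_x = np.abs(x)          # element-wise absolute value
sign_x = np.sign(x)        # -1, 0, or 1 per element
ceil_x = np.ceil(x)        # round up
floor_x = np.floor(x)      # round down
rint_x = np.rint(x)        # round to the nearest integer, keeping the float dtype
squared = np.square(x)
roots = np.sqrt(abs_x)     # sqrt of negative inputs would give nan, so use abs first
frac, whole = np.modf(x)   # fractional and integral parts as two arrays

log1p_zero = np.log1p(0.0)  # log(1 + x), accurate for small x
```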

2. Matplotlib

A plotting library for Python and its numerical extension NumPy.

Provides an object-oriented API for embedding plots into applications

- pylab

- State-machine based

- Similar to MATLAB
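
The two styles mentioned above can be compared side by side in a small sketch. It uses `matplotlib.pyplot` as the state-machine interface (pylab exposes the same functions together with NumPy in a single namespace), and it assumes the non-interactive `Agg` backend so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available, so render off-screen
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

# State-machine style: pyplot keeps track of the "current" figure and axes
plt.figure()
plt.plot(x, np.sin(x))
plt.title("sin(x)")
state_axes = plt.gca()  # fetch the axes that pyplot was drawing on

# Object-oriented style: hold the Figure and Axes objects explicitly
fig, ax = plt.subplots()
ax.plot(x, np.cos(x))
ax.set_title("cos(x)")

plt.close("all")  # release the figures once done
```

The object-oriented form is usually the easier one to embed in an application, because nothing depends on hidden global state.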

3. Scipy

An open-source Python library for scientific and analytical computing

NumPy and SciPy

- SciPy is built on top of the NumPy array object

- Part of the NumPy stack

: which also includes matplotlib, pandas, and SymPy
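
To show what "built on top of the NumPy array object" means in practice, here is a minimal sketch in which a SciPy routine takes NumPy arrays as input and returns one as output. The linear system is a made-up example.

```python
import numpy as np
from scipy import linalg

# A small linear system A @ x = b (values chosen only for illustration)
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = linalg.solve(A, b)  # the result is a plain NumPy ndarray
```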

4. Pandas

A software library for data analysis in Python

- Data cleaning: data munging/preparation/cleaning/integration

- Rich data manipulation tools (built on NumPy)

- Fast, intuitive data structures

- Sits between Python and a DSL (domain-specific language)

- Similar to R's data.frame: organized into columns and rows

- Easy to use, highly consistent API

Key features

1. DataFrame object: data analysis with integrated indexing

2. Support for multiple data formats

3. Data alignment and integrated handling of missing data

4. Reshaping and pivoting of datasets

5. Label-based slicing, indexing, and subsetting of large datasets

6. Aggregating and transforming data (group by engine)

7. Hierarchical axis indexing

8. Time series functionality
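
A few of the features above (pivoting, label-based selection, and group by aggregation) can be sketched as follows. The `sales` table and its column names are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical long-format data: one row per (city, year)
sales = pd.DataFrame({
    "city": ["Seoul", "Seoul", "Busan", "Busan"],
    "year": [2017, 2018, 2017, 2018],
    "amount": [100, 120, 80, 90],
})

# Reshaping/pivoting: one row per city, one column per year
wide = sales.pivot(index="city", columns="year", values="amount")

# Label-based selection with .loc (row label, column label)
seoul_2018 = wide.loc["Seoul", 2018]

# Aggregation through the group by engine
totals = sales.groupby("city")["amount"].sum()
```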

Pandas.core

1. Data structure: labeling, slicing, etc.

1) Series(1D)

- Backed by a NumPy array

- ndarray-like (a subclass of numpy.ndarray only in pandas versions before 0.13)

- Data: any type

- index labels need not be ordered

- duplicates are possible

2) DataFrame(2D)

- potentially heterogeneous columns

- ndarray-like, but not an ndarray

- Each column can have a different dtype.

- row and column index

- size mutable: insert and delete columns

3) Panel (3D): deprecated in pandas 0.20 and removed in 0.25

2. NA-friendly statistics: missing or omitted values are handled easily

3. Index implementations/label-indexing

4. GroupBy engine: similar to GROUP BY in SQL

5. Time Series tools

1) Date range generation

2) Extensible date offsets

6. Hierarchical indexing stuff
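
The Series and DataFrame properties, the NA-friendly statistics, and the time series tools listed above can be checked with a short sketch. All labels, values, and dates here are invented for illustration.

```python
import numpy as np
import pandas as pd

# Series: index labels need not be ordered, and duplicates are allowed
s = pd.Series([10, 20, 30], index=["b", "a", "b"])
both_b = s.loc["b"]  # selects both rows labeled "b"

# DataFrame: each column keeps its own dtype
df = pd.DataFrame({"name": ["x", "y", "z"], "score": [1.0, np.nan, 3.0]})
df["passed"] = df["score"] > 2  # size mutable: insert a column
del df["name"]                  # size mutable: delete a column

# NA-friendly statistics: NaN is skipped by default
mean_score = df["score"].mean()

# Time series tools: date range generation, then shifting by a date offset
days = pd.date_range("2018-07-08", periods=3, freq="D")
next_week = days + pd.DateOffset(weeks=1)
```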

What is an index?

Every axis has an index

- Fast lookups, plus data alignment and join operations

- Hierarchical indexes

1) Semantics: a tuple at each tick

2) Enables easy group selection

3) Terminology: "multiple levels", meaning the index is split into several levels

4) Natural part of GroupBy and reshape operations

- Data alignment (default: outer join)

1) binary operations are joins

2) DataFrame joins/aligns on both axes

3) Irregularly-indexed data
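
The points above about hierarchical indexes and data alignment can be seen in a minimal sketch. The level names and labels are hypothetical.

```python
import pandas as pd

# Hierarchical index: each tick is a tuple of (outer label, inner label)
idx = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["group", "item"]
)
h = pd.Series([1.0, 2.0, 3.0], index=idx)
group_a = h.loc["a"]  # easy group selection by the outer level

# Data alignment: a binary operation joins on the union of the labels
s1 = pd.Series([1, 2, 3], index=["x", "y", "z"])
s2 = pd.Series([10, 20], index=["y", "w"])
total = s1 + s2  # labels present on only one side become NaN

# DataFrames align on both axes (rows and columns)
d1 = pd.DataFrame({"p": [1, 2]}, index=["r1", "r2"])
d2 = pd.DataFrame({"p": [10, 20], "q": [3, 4]}, index=["r2", "r3"])
d_sum = d1 + d2
```

Only the `("r2", "p")` cell exists in both frames, so it is the only cell of `d_sum` that is not NaN.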

Attribution-NonCommercial-NoDerivatives


지-영
