Python Data Analysis Notes 2: pandas Basics

A fully revamped introduction to pandas. It is fairly long, so make good use of the table of contents in the lower-right corner!

import numpy as np
import pandas as pd

Data Structures

Series

Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

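data: iterable sequence data
index: the index labels; any hashable values will do. Defaults to a range starting at 0
dtype: a numpy.dtype data type
copy: whether to copy data. With the default False, the Series can be a view of data (a NumPy array, for instance), so changes made to it show up in data as well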
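
The view-versus-copy distinction behind the copy parameter can be sketched as follows. This is a minimal illustration, assuming the input is a NumPy array (lists are always copied) and that copy=False is passed explicitly; the variable names are made up for the example, and the exact sharing behavior can differ between pandas versions, especially with Copy-on-Write.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3])

# copy=False asks pandas to wrap the array without copying it.
s_view = pd.Series(arr, copy=False)
# copy=True forces an independent copy of the data.
s_copy = pd.Series(arr, copy=True)

# Mutate the original array in place.
arr[0] = 99

print(s_view.iloc[0])  # follows the array while the memory is shared
print(s_copy.iloc[0])  # unaffected by the change to arr
```

When the source array will be reused or modified later, passing copy=True is the safer choice.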

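# Initialize from a list, with explicit index labels
s = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
# Initialize from a dict (the keys become the index labels)
s = pd.Series({'d': 4, 'b': 7, 'a': -5, 'c': 3})
# The output below is sorted by label, which is how older pandas handled
# dict input; recent versions keep the dict's insertion order (d, b, a, c).
s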

a   -5
b    7
c    3
d    4
dtype: int64
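
As a small sketch of the two index behaviors mentioned above: without an index argument the labels default to a range starting at 0, and a dict-built Series can be put into label order explicitly with sort_index(). The names s_default, s_dict and s_sorted are only for illustration.

```python
import pandas as pd

# Without an explicit index, labels default to 0, 1, 2, ...
s_default = pd.Series([4, 7, -5, 3])
print(s_default.index)

# With a dict, the keys become the labels.
s_dict = pd.Series({'d': 4, 'b': 7, 'a': -5, 'c': 3})

# sort_index() returns a new Series ordered by label, regardless of
# how the constructor ordered the keys.
s_sorted = s_dict.sort_index()
print(s_sorted)
```

Calling sort_index() gives the label-sorted result regardless of how a given pandas version orders dict keys.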

# Convert to a NumPy array
s.values

array([-5,  7,  3,  4])
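
Besides the values attribute, the pandas documentation recommends Series.to_numpy() when a NumPy array is what you need. The sketch below rebuilds a Series with the same labels and values as the output above; arr and arr_float are illustrative names.

```python
import numpy as np
import pandas as pd

s = pd.Series([-5, 7, 3, 4], index=['a', 'b', 'c', 'd'])

# to_numpy() always returns a NumPy ndarray.
arr = s.to_numpy()
print(type(arr), arr.dtype)

# An optional dtype converts the data while exporting it.
arr_float = s.to_numpy(dtype='float64')
print(arr_float)
```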

# Get the index
s.index

Index(['a', 'b', 'c', 'd'], dtype='object')

# name attribute: a Series inherits its name from the DataFrame column label, and an Index inherits its name from the Series' index name
s.name = 'population'
s.index.name = 'state'
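
How these names propagate can be checked with a small sketch. The two-row frame and the population figures here are made up purely for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(4).reshape((2, 2)),
                  index=['ohio', 'utah'],
                  columns=['one', 'two'])
df.index.name = 'state'

# Selecting a column gives a Series named after that column,
# and its index carries the DataFrame's index name.
col = df['one']
print(col.name)
print(col.index.name)

# Both names become column labels when the index is moved into a column.
s = pd.Series({'ohio': 4, 'utah': 7}, name='population')
s.index.name = 'state'
flat = s.reset_index()
print(list(flat.columns))
```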

DataFrame

DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)

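data: initialized much like a Series; accepts a 2-D array or a dict (keys become the column labels, values are arrays of equal length)
columns: the column labels, as a list of strings

The other parameters work as they do for Series.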
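
The dict form of data described above might look like the sketch below. The column names and values are invented for the example; with dict input, columns= selects and orders the keys rather than renaming them.

```python
import pandas as pd

# Dict keys become column labels; every value must have the same length.
data = {
    'state': ['ohio', 'colorado', 'utah'],
    'pop': [1.5, 1.7, 3.6],
}

df_default = pd.DataFrame(data)

# columns= picks and orders the dict keys; index= sets the row labels.
df_labeled = pd.DataFrame(data, columns=['pop', 'state'],
                          index=['x', 'y', 'z'])
print(df_labeled)
```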

# Initialize from a 2-D array with row and column labels
df = pd.DataFrame(np.arange(16).reshape((4, 4)),
                  index=['ohio', 'colorado', 'utah', 'new york'],
                  columns=['one', 'two', 'three', 'four'])
df