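R's dplyr package is well known, mainly for exploratory data analysis (EDA), and its idea of finishing an analysis in a single piped line of code is excellent. Several teams have built dplyr-like packages for Python. One of them, pandas-ply, was developed by the Coursera team, so maintenance is less of a worry. The other (dplython, installed below) stays closer to the original, but its maintenance looks shaky, so using it means taking a risk.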
pip install dplython
import pandas as pd
from dplython import (DplyFrame, X, diamonds, select, sift,
                      sample_n, sample_frac, head, arrange, mutate, group_by,
                      summarize, DelayFunction)
diamonds
The diamonds dataset ships with the package, so it can be used directly.
type(diamonds)
It looks like the data already comes in the package's own format, so no conversion is needed.
diamonds >> head()
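To make the `>>` chaining less mysterious, here is a toy sketch of how such a pipe can be built in plain Python with the `__rrshift__` hook. It is not dplython's actual implementation; the `Verb` class, the simplified `head` and `select` helpers, and the sample rows are all hypothetical and work on a list of dicts rather than a data frame.

```python
class Verb:
    """A toy pipeable step: wraps a function that takes data and returns data."""

    def __init__(self, func):
        self.func = func

    def __rrshift__(self, data):
        # Python falls back to this for `data >> verb` when `data` itself
        # cannot handle `>>` with a Verb on the right-hand side.
        return self.func(data)


def head(n=5):
    # Hypothetical stand-in for a head() verb: keep the first n rows.
    return Verb(lambda rows: rows[:n])


def select(*keys):
    # Hypothetical stand-in for a select() verb: keep only the named keys.
    return Verb(lambda rows: [{k: row[k] for k in keys} for row in rows])


# Made-up rows, not the real diamonds data.
rows = [
    {"carat": 0.5, "cut": "Ideal"},
    {"carat": 1.0, "cut": "Good"},
    {"carat": 2.0, "cut": "Fair"},
]

# Each `>>` hands the result of the left side to the verb on the right.
result = rows >> select("carat") >> head(2)
print(result)
```

Because every step returns ordinary data, any number of verbs can be chained left to right, which is what gives the one-line pipeline feel.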
select
diamonds >> select(X.carat) >> head()
sift (filter)
diamonds >> sift(X.carat > X.carat.mean()) >> head()
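For comparison, the same kind of row filter can be sketched in plain pandas with a boolean mask. The small frame below is made-up toy data, not the real diamonds table, and the variable names are only for illustration.

```python
import pandas as pd

# Toy data with made-up values; this is not the real diamonds table.
toy = pd.DataFrame({"carat": [0.25, 0.5, 1.0, 2.25]})

# Build a boolean mask and keep only the rows where it is True.
# The comparison is strict, so a row equal to the mean is dropped.
above_mean = toy[toy["carat"] > toy["carat"].mean()]
print(above_mean)
```

The piped form reads left to right, while the pandas form has to mention the frame twice.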
sample_n
diamonds >> sample_n(5)
arrange
diamonds >> arrange(X.carat) >> head()
mutate: the killer function
diamonds >> mutate(caratsq=X.carat ** 2) >> select(X.carat, X.caratsq) >> head()
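As a point of reference, a rough plain-pandas counterpart of adding a derived column and then keeping two columns might look like the sketch below. It uses `DataFrame.assign` on made-up toy data and is only an approximation of what the piped call does.

```python
import pandas as pd

# Toy data with made-up values; this is not the real diamonds table.
toy = pd.DataFrame({"carat": [0.5, 1.0, 2.0], "color": ["E", "E", "J"]})

# assign() returns a new frame with the extra column, similar to mutate;
# the double-bracket indexing then keeps two columns, similar to select.
result = toy.assign(caratsq=lambda df: df["carat"] ** 2)[["carat", "caratsq"]]
print(result)
```

Like the piped version, this leaves the original frame unchanged and returns a new one.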
group_by and summarize
diamonds >> group_by(X.color) >> summarize(mean_carat=X.carat.mean())
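For readers more used to plain pandas, a grouped mean can be sketched with `groupby` and named aggregation (available in pandas 0.25 and later). The frame below is made-up toy data, so the numbers say nothing about the real diamonds table.

```python
import pandas as pd

# Toy data with made-up values; this is not the real diamonds table.
toy = pd.DataFrame({"carat": [0.5, 1.5, 2.0], "color": ["E", "E", "J"]})

# Group rows by color, then compute one summary value per group.
# Named aggregation sets the output column name, like summarize(mean_carat=...).
summary = toy.groupby("color", as_index=False).agg(mean_carat=("carat", "mean"))
print(summary)
```

Passing `as_index=False` keeps `color` as a regular column, so the result has one row per group.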