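Alphalens, the open-source alpha factor analysis tool from Quantopian, provides a fairly comprehensive analysis of alpha factors together with charts. We only need to prepare the factor data to get the corresponding charts and statistics.
The open-source library can be found here: Alphalens. The documentation is here: AlphalensDoc.
The latest version is Alphalens V0.1.2. Because Mindgo disables some libraries, it needs minor modifications before it can be used. For convenience, I packaged it into a single file, Alphalens_v12.py.
We use the circulating market cap factor (CMC) as an example: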
import pandas as pd
import numpy as np
import datetime
from alphalens_v12 import *
Import the required libraries.
start_date = '20130101'
end_date = '20171023'
trade_period = 'weekly'
# Look up the stock universe half a year before the factor start date.
market_start_date = datetime.datetime.strptime(start_date, '%Y%m%d') - datetime.timedelta(days=180)
stock_set_start = get_index_stocks('000300.SH',market_start_date.strftime("%Y%m%d"))
stock_set_end = get_index_stocks('000300.SH',end_date)
# stock_set_start = get_index_stocks('000001.SH',market_start_date.strftime("%Y%m%d"))
# stock_set_start += get_index_stocks('399106.SZ', start_date)
#
# stock_set_end = get_index_stocks('000001.SH',end_date)
# stock_set_end += get_index_stocks('399106.SZ', end_date)
stock_list1 = list(set(stock_set_start).intersection(set(stock_set_end)))
trade_days1 = get_trade_days(start_date, end_date, None)
if trade_period == 'weekly':
    # Keep Fridays only (weekday 4).
    trade_day_all = trade_days1[trade_days1.weekday == 4]
    trade_data_freq = 'week'
else:
    trade_day_all = trade_days1
    trade_data_freq = '1d'
print("Total assets: %d, time_period: %d" % (len(stock_list1), len(trade_day_all)))
Set the backtest period and the stock universe. Here only the CSI 300 is used as the universe, and the backtest runs on weekly data.
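The weekly sampling step can be illustrated in isolation. The sketch below is only an illustration under assumptions: it replaces the platform's trading calendar with plain pandas business days and removes one Friday by hand to mimic a market holiday. It compares the Friday filter used above with a possible alternative that keeps the last trading day of each calendar week.

```python
import pandas as pd

# Hypothetical stand-in for the platform's trading calendar: plain business
# days, with one Friday removed by hand to mimic a market holiday.
days = pd.bdate_range("2017-09-25", "2017-10-20")
days = days[days != pd.Timestamp("2017-10-06")]

# Approach used in the text: keep Fridays only (weekday 4).
fridays = days[days.weekday == 4]

# Alternative: keep the last trading day of every calendar week, so a week
# whose Friday is a holiday is still represented.
week_ends = pd.DatetimeIndex(days.to_series().resample("W").last().dropna())

print(len(fridays), len(week_ends))
```

With the Friday filter, a week whose Friday is not a trading day simply drops out of the sample. Whether that matters depends on how the weekly price bars are dated on the platform.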
count = len(trade_day_all)
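# Fetch `count` bars of close prices ending on the last sampled trading day.
price = get_candle_stick(stock_list1, trade_day_all[-1].date().strftime("%Y%m%d"), fre_step=trade_data_freq, fields=['close'], skip_paused=False, bar_count=count)
# Note: pd.Panel was removed in pandas 0.25, so this step needs an older pandas.
price = pd.Panel.from_dict(price)
# Reorder the axes so that price['close'] has dates as rows and assets as columns.
price = price.transpose(2, 1, 0)
price_close = price['close']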
k_time_idx = price_close.index
k_assets = price_close.columns
Fetch the price data.
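On pandas versions without pd.Panel, the same date-by-asset table of close prices can be built directly. This is a minimal sketch, assuming the data call returns a dict that maps each symbol to a DataFrame indexed by date with one column per field. The `raw` dict and its values below are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the dict returned by the data call:
# {symbol: DataFrame indexed by date, one column per field}.
dates = pd.to_datetime(["2017-10-13", "2017-10-20"])
raw = {
    "000001.SZ": pd.DataFrame({"close": [11.4, 11.6]}, index=dates),
    "600000.SH": pd.DataFrame({"close": [12.9, 12.7]}, index=dates),
}

# Pick the 'close' column of every asset and align them on the date index.
price_close = pd.DataFrame({sym: df["close"] for sym, df in raw.items()})

print(price_close.shape)  # (2, 2): dates as rows, assets as columns
```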
time_str = k_time_idx.strftime('%Y-%m-%d')
df_fac = pd.DataFrame(index = k_time_idx)
for stk in k_assets:
    # Query the circulating market cap of one stock on the sampled dates.
    q = query(
        factor.date,
        factor.current_market_cap
    ).filter(
        factor.symbol == stk,
        factor.date.in_(time_str)
    )
    df_tmp = get_factors(q)
    df_tmp.columns = ['factor_date', stk]
    df_tmp = df_tmp.set_index('factor_date')
    # Add this stock as a new column, aligned on the date index.
    df_fac = df_fac.join(df_tmp, how='left')
Fetch the circulating market cap data.
df_fac = df_fac.astype(float)  # Note: convert the dtype to float (the np.float alias was removed in NumPy 1.24)
ah_factor_data = get_clean_factor_and_forward_returns(df_fac.stack(dropna=False), price_close, quantiles=7, groupby=None, by_group=False, periods=[1, 3, 5])
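Extract and format the factor data, then bucket the factor into quantiles and apply grouping.
Next, generate the report with "one click":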
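To make the idea of this step concrete, here is a small pandas-only sketch of forward returns and per-date quantile bucketing. It is not Alphalens' implementation. The toy data, the two-bucket split, and the column names `fwd_1` and `factor_quantile` are choices made for this illustration.

```python
import pandas as pd

dates = pd.to_datetime(["2017-10-06", "2017-10-13", "2017-10-20"])
assets = ["A", "B", "C", "D"]
# Toy factor values and prices (made up), dates as rows and assets as columns.
factor_wide = pd.DataFrame([[1.0, 2.0, 3.0, 4.0],
                            [4.0, 3.0, 2.0, 1.0],
                            [2.0, 1.0, 4.0, 3.0]], index=dates, columns=assets)
prices = pd.DataFrame([[10.0, 10.0, 10.0, 10.0],
                       [11.0, 10.0, 9.0, 10.0],
                       [11.0, 11.0, 9.0, 12.0]], index=dates, columns=assets)

# 1-period forward return: the return from this bar to the next one,
# stored on this bar. The last bar has no forward return.
fwd_wide = prices.shift(-1) / prices - 1

# Long format: one row per (date, asset); rows without a forward return are dropped.
long = pd.DataFrame({"factor": factor_wide.stack(),
                     "fwd_1": fwd_wide.stack()}).dropna()
long.index.names = ["date", "asset"]

# Bucket the factor into 2 quantiles separately on each date (1 = lowest).
long["factor_quantile"] = (
    long.groupby(level="date")["factor"]
        .transform(lambda x: pd.qcut(x, 2, labels=False) + 1)
        .astype(int)
)

# Mean forward return per quantile, pooled over dates.
mean_ret = long.groupby("factor_quantile")["fwd_1"].mean()
print(mean_ret)
```

Bucketing within each date keeps the comparison cross-sectional: an asset is ranked against its peers on the same day, not against its own history.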
create_full_tear_sheet(ah_factor_data, by_group = False)
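For a linear factor, the factor value is linearly correlated with returns. For a non-linear factor, we need a sensible grouping (that is, a reasonable explanatory factor to group by) and then examine the linear correlation within each group. Alphalens provides grouping interfaces for this purpose.
Besides financial cross-sectional factors, technical indicators can also be analyzed with Alphalens, but they generally need to be converted into factors first.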
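The effect of grouping can be seen on a toy example. The sketch below uses plain pandas rather than the Alphalens grouping interface, and the data and group labels are made up. The rank correlation between factor and forward return (the rank IC) is zero when all assets are pooled, but it is +1 inside one group and -1 inside the other.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, asset) with a group label.
data = pd.DataFrame({
    "date":   ["d1"] * 6 + ["d2"] * 6,
    "group":  (["bank"] * 3 + ["tech"] * 3) * 2,
    "factor": [1, 2, 3, 1, 2, 3, 3, 1, 2, 2, 3, 1],
    "fwd":    [0.01, 0.02, 0.03, 0.03, 0.02, 0.01,
               0.02, 0.00, 0.01, 0.01, 0.00, 0.02],
})

def rank_ic(df):
    # Spearman correlation computed as the Pearson correlation of the ranks.
    return df["factor"].rank().corr(df["fwd"].rank())

cols = ["factor", "fwd"]
# Rank IC per date over the whole universe.
pooled_ic = data.groupby("date")[cols].apply(rank_ic)
# Rank IC per date inside each group, then averaged over dates.
group_ic = data.groupby(["date", "group"])[cols].apply(rank_ic)
mean_group_ic = group_ic.groupby(level="group").mean()

print(pooled_ic)
print(mean_group_ic)
```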
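One possible way to turn a technical indicator into a factor is sketched below. The indicator (a 2-bar momentum) and the cross-sectional z-score are assumptions chosen for illustration; the text does not prescribe a specific transformation. The result is a Series indexed by (date, asset), the same layout produced by stacking the factor table above.

```python
import pandas as pd

dates = pd.to_datetime(["2017-09-29", "2017-10-13", "2017-10-20", "2017-10-27"])
# Toy close prices (made up), dates as rows and assets as columns.
close = pd.DataFrame({"A": [10.0, 11.0, 12.0, 12.0],
                      "B": [10.0, 10.0, 9.0, 10.0],
                      "C": [10.0, 10.0, 10.5, 11.0]}, index=dates)

# Hypothetical technical indicator: 2-bar momentum (price change over two bars).
momentum = close / close.shift(2) - 1

# Cross-sectional z-score on each date, so values are comparable across dates.
zscore = momentum.sub(momentum.mean(axis=1), axis=0).div(momentum.std(axis=1), axis=0)

# Long Series indexed by (date, asset); warm-up rows without a value are dropped.
factor = zscore.stack().dropna()
factor.index.names = ["date", "asset"]

print(factor)
```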