Crie várias colunas no Pandas Dataframe a partir de uma função

Question

Oct 09, 2015, 07:50 PM

python jit pandas multiple-columns numba

Crie várias colunas no Pandas Dataframe a partir de uma função

Eu sou um novato em python, então espero que minhas duas perguntas sejam claras e completas. Postei o código real e um conjunto de dados de teste no formato csv abaixo.

Consegui construir o código a seguir (principalmente com a ajuda dos colaboradores do StackOverflow) para calcular a volatilidade implícita de um contrato de opção usando o método Newton-Raphson. O processo calcula o Vega ao determinar a volatilidade implícita. Embora eu consiga criar uma nova coluna DataFrame para Volatilidade Implícita usando o método de aplicação Pandas DataFrame, não consigo criar uma segunda coluna para o Vega. Existe uma maneira de criar duas colunas DataFrame separadas quando a função retorna IV e Vega juntos?

Eu tentei:

return iv, vega da funçãodf[['myIV', 'Vega']] = df.apply(newtonRap, axis=1)ObteveValueError: Shape of passed values is (56, 2), indices imply (56, 13)

Também tentei:

return iv, vega da funçãodf['myIV'], df['Vega'] = df.apply(newtonRap, axis=1)ObteveValueError: Shape of passed values is (56, 2), indices imply (56, 13)

Além disso, o processo de cálculo é lento. Importei o numba e implementei o decorador @jit (nogil = True), mas vejo apenas uma melhoria de desempenho de 25%. O conjunto de dados de teste é o teste de desempenho com quase 900.000 registros. O tempo de execução é de 2 horas e 9 minutos sem numba ou com numba, mas sem nogil = True. O tempo de execução ao usar numba e @jit (nogil = True) é de 1 hora e 32 minutos. Posso fazer melhor?

from datetime import datetime
from math import sqrt, pi, log, exp, isnan
from scipy.stats import norm
from numba import jit


# dff = Daily Fed Funds (Posted rate is usually one day behind)
dff = pd.read_csv('https://research.stlouisfed.org/fred2/data/DFF.csv', parse_dates=[0], index_col='DATE')
rf = float('%.4f' % (dff['VALUE'][-1:][0] / 100))
# rf = .0015                        # Get Fed Funds Rate https://research.stlouisfed.org/fred2/data/DFF.csv
tradingMinutesDay = 450             # 7.5 hours per day * 60 minutes per hour
tradingMinutesAnnum = 113400        # trading minutes per day * 252 trading days per year
cal = USFederalHolidayCalendar()    # Load US Federal holiday calendar


@jit(nogil=True)                                # nogil=True arg improves performance by 25%
def newtonRap(row):
    """Estimate Implied Volatility (IV) using Newton-Raphson method

    :param row (dataframe):  Options contract params for function
        TimeStamp (datetime): Close date
        Expiry (datetime): Option contract expiration date
        Strike (float): Option strike
        OptType (object): 'C' for call; 'P' for put
        RootPrice (float): Underlying close price
        Bid (float): Option contact closing bid
        Ask (float): Option contact closing ask

    :return:
        float: Estimated implied volatility
    """
    if row['Bid'] == 0.0 or row['Ask'] == 0.0 or row['RootPrice'] == 0.0 or row['Strike'] == 0.0 or \
       row['TimeStamp'] == row['Expiry']:
        iv, vega = 0.0, 0.0         # Set iv and vega to zero if option contract is invalid or expired
    else:
        # dte (Days to expiration) uses pandas bdate_range method to determine the number of business days to expiration
        #   minus USFederalHolidays minus constant of 1 for the TimeStamp date
        dte = float(len(pd.bdate_range(row['TimeStamp'], row['Expiry'])) -
                    len(cal.holidays(row['TimeStamp'], row['Expiry']).to_pydatetime()) - 1)
        mark = (row['Bid'] + row['Ask']) / 2
        cp = 1 if row['OptType'] == 'C' else -1
        S = row['RootPrice']
        K = row['Strike']
        # T = the number of trading minutes to expiration divided by the number of trading minutes in year
        T = (dte * tradingMinutesDay) / tradingMinutesAnnum
        # TODO get dividend value
        d = 0.00
        iv = sqrt(2 * pi / T) * mark / S        # Closed form estimate of IV Brenner and Subrahmanyam (1988)
        vega = 0.0
        for i in range(1, 100):
            d1 = (log(S / K) + T * (rf - d + iv ** 2 / 2)) / (iv * sqrt(T))
            d2 = d1 - iv * sqrt(T)
            vega = S * norm.pdf(d1) * sqrt(T)
            model = cp * S * norm.cdf(cp * d1) - cp * K * exp(-rf * T) * norm.cdf(cp * d2)
            iv -= (model - mark) / vega
            if abs(model - mark) < 1.0e-9:
                break
        if isnan(iv) or isnan(vega):
            iv, vega = 0.0, 0.0
    # TODO Return vega with iv if add'l pandas column possible
    # return iv, vega
    return iv


if __name__ == "__main__":
    # test function from baseline data
    get_csv = True

    if get_csv:
        csvHeaderList = ['TimeStamp', 'OpraSymbol', 'RootSymbol', 'Expiry', 'Strike', 'OptType', 'RootPrice', 'Last',
                         'Bid', 'Ask', 'Volume', 'OpenInt', 'IV']
        fileName = 'C:/tmp/test-20150930-56records.csv'
        df = pd.read_csv(fileName, parse_dates=[0, 3], names=csvHeaderList)
    else:
        pass

    start = datetime.now()
    # TODO Create add'l pandas dataframe column, if possible, for vega
    # df[['myIV', 'Vega']] = df.apply(newtonRap, axis=1)
    # df['myIV'], df['Vega'] = df.apply(newtonRap, axis=1)
    df['myIV'] = df.apply(newtonRap, axis=1)
    end = datetime.now()
    print end - start

Dados de teste: C: /tmp/test-20150930-56records.csv

2015-09-30 16: 00: 00, AAPL151016C00109000, AAPL, 2015-10-16 16: 00: 00,109, C, 109.95,3.46,3.6,3.7,1565,1290,0.3497 2015-09-30 16:00: 00, AAPL151016P00109000, AAPL, 2015-10-16 16: 00: 00,109, P, 109.95,2.4,2.34,2.42,3790,3087,0.3146 2015-09-30 16: 00: 00, AAPL151016C00110000, AAPL, 2015-10 -16 16: 00: 00.110, C, 109.95,3,2.86,3,10217.28850,0.3288 2015-09-30 16: 00: 00, AAPL151016P00110000, AAPL, 2015-10-16 16: 00: 00.110, P , 109.95,2.81,2.74,2.8,12113,44427,0.3029 2015-09-30 16: 00: 00, AAPL151016C00111000, AAPL, 16/10/2015 16: 00: 00.111, C, 109.95,2.35,2.44,2.45, 6674,2318,0.3187 2015-09-30 16: 00: 00, AAPL151016P00111000, AAPL, 2015-10-16 16: 00: 00,111, P, 109.95,3.2,3.1,3.25,2031,3773,0.2926 2015-09- 30 16: 00: 00, AAPL151120C00110000, AAPL, 2015-11-20 16: 00: 00,110, C, 109.95,5.9,5.7,5.95,5330,17112,0.3635 2015-09-30 16: 00: 00, AAPL151120P00110000, AAPL, 20/11/2015 16: 00: 00.110, P, 109.95,6.15,6.1,6.3,3724,15704,0.3842