Different results with formula and non-formula methods for training in caret
I noticed that the formula and non-formula interfaces of caret's train produce different results. In addition, the formula method takes roughly 10x as much CPU time as the non-formula method (376 s vs. 35 s user time; about 7x in elapsed time). Is this expected?
> library(caret)
> library(data.table)
> z <- data.table(c1=sample(1:1000, 1000, replace=TRUE), c2=as.factor(sample(LETTERS, 1000, replace=TRUE)))
# SYSTEM TIME WITH FORMULA METHOD
# -------------------------------
> system.time(r <- train(c1 ~ ., z, method="rf", importance=TRUE))
   user  system elapsed
376.233   9.241  18.190
> r
1000 samples
1 predictors
No pre-processing
Resampling: Bootstrap (25 reps)
Summary of sample sizes: 1000, 1000, 1000, 1000, 1000, 1000, ...
Resampling results across tuning parameters:
mtry  RMSE  Rsquared  RMSE SD  Rsquared SD
 2    295   0.00114   4.94     0.00154
13    300   0.00113   5.15     0.00151
25    300   0.00111   5.16     0.00146
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was mtry = 2.
# SYSTEM TIME WITH NON-FORMULA METHOD
# -------------------------------
> system.time(r <- train(z[,2,with=F], z$c1, method="rf", importance=TRUE))
   user  system elapsed
 34.984   2.977   2.708
Warning message:
In randomForest.default(trainX, trainY, mtry = tuneValue$.mtry, :
invalid mtry: reset to within valid range
> r
1000 samples
1 predictors
No pre-processing
Resampling: Bootstrap (25 reps)
Summary of sample sizes: 1000, 1000, 1000, 1000, 1000, 1000, ...
Resampling results
RMSE  Rsquared  RMSE SD  Rsquared SD
297   0.00152   6.67     0.00197
Tuning parameter 'mtry' was held constant at a value of 2
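One possible explanation, which I have not confirmed against caret's source: the formula interface expands the 26-level factor c2 into dummy columns with model.matrix (dropping the intercept), while the non-formula call passes c2 as a single factor column. The sketch below counts the columns each route would hand to randomForest. It uses a base data.frame instead of data.table and an arbitrary seed, both assumptions made only so the snippet runs on its own. Twenty-five dummy columns would be consistent with the formula run's mtry grid of 2, 13 and 25, and a single column would be consistent with the "invalid mtry" warning in the second run.

```r
# Rebuild data shaped like `z` from the question. A base data.frame is used
# instead of data.table (assumption: no extra packages needed); seed is arbitrary.
set.seed(1)
z <- data.frame(
  c1 = sample(1:1000, 1000, replace = TRUE),
  c2 = factor(sample(LETTERS, 1000, replace = TRUE), levels = LETTERS)
)

# Formula route (assumed to mirror what train.formula does): expand the factor
# into treatment-contrast dummies and drop the intercept column.
x_formula <- model.matrix(c1 ~ ., data = z)[, -1, drop = FALSE]

# Non-formula route as called in the question: one factor column, no expansion.
x_plain <- z[, 2, drop = FALSE]

ncol(x_formula)  # 25 dummy columns (26 levels minus the reference level)
ncol(x_plain)    # 1 factor column
```

If this is the cause, passing the expanded matrix to the non-formula interface, e.g. `train(x_formula, z$c1, method="rf")`, should make both runs see the same 25 predictors, which would make the results and timings directly comparable.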