aiz @Cube na FPU x87 usando o método Newton-Raphson

Estou tentando escrever um programa de montagem usando o processador 8086 que encontrará a raiz do cubo de um número. Obviamente, estou usando pontos flutuantes.

Algoritmo baseado em Método Newton-Raphson:

root := 1.0; 
repeat
     oldRoot := root;
     root := (2.0*root + x/(root*root)) / 3.0 
until ( |root – oldRoot| < 0.001;

Como faço para dividir (2 * raiz + x) por (raiz * raiz)?

.586
.MODEL FLAT
.STACK 4096

.DATA
root    REAL4   1.0
oldRoot REAL4   2.0
Two     REAL4   2.0
inttwo  DWORD   2
itThree DWORD   3
three   REAL4   3.0
x       DOWRD   27


.CODE
main    PROC
        finit           ; initialize FPU
        fld     root    ; root in ST
        fmul    two     ; root*two
        fadd    x       ; root*two+27

        fld     root    ; root in ST
        fimul    root    ; root*root

        mov     eax, 0  ; exit
        ret
main    ENDP 
END     

Acho que não entendo o que está na pilha e em que local. O produto da linha

aiz @fimul; raiz * raiz

vá para ST (1)? EDIT Não, ele entra no st (0) o que estava no st (0) foi empurrado para baixo na pilha para st (1)

Mas ainda não descobri a resposta para minha pergunta ...Como faço para dividir? Agora eu vejo Preciso dividir st (1) por st (0), mas não sei como. Eu tentei isso.

finit           ; initialize FPU
fld     root    ; root in ST
fmul    two     ; root*two
fadd    xx      ; root*two+27
; the answer to root*two+x is stored in ST(0) when we load root st(0) moves to ST1 and we will use ST0 for the next operation

fld     root    ; root in ST previous content is now in ST1
fimul   root    ; root*root
fidiv   st(1)

EDIT: Eu tinha a fórmula escrita errada. É isso que estou procurando.

(2.0*root) + x / (root*root)) / 3.0 That's what I need. 
STEP 1) (2 * root) 
STEP 2) x / (root * root) 
STEP 3) ADD step one and step 2 
STEP 4) divide step 3 by 3.0

root = (2,0 * 1,0) + 27 / (1,0 * 1,0) / 3,0; (2) + 27 / (1,0) / 3,0 = 11 ==> raiz = 11

EDIT2: NOVO CÓDIGO !!

.586
.MODEL FLAT
.STACK 4096

.DATA
root    REAL4   1.0
oldRoot REAL4   2.0
Two     REAL4   2.0
three   REAL4   3.0
xx      REAL4   27.0


.CODE
main    PROC
        finit           ; initialize FPU
                fld     root    ; root in ST    ; Pass 1 ST(0) has 1.0  
repreatAgain:
        ;fld    st(2)

        fmul    two     ; root*two      ; Pass 1 ST(0) has 2                                                                            Pass 2 ST(0) = 19.333333 st(1) = 3.0 st(2) = 29.0 st(3) = 1.0

        ; the answer to roor*two is stored in ST0 when we load rootSTO moves to ST1 and we will use ST0 for the next operation
        fld     root    ; root in ST(0) previous content is now in ST(1)      Pass 1 ST(0) has 1.0 ST(1) has 2.0                        Pass 2 st(
        fmul    st(0), st(0)    ; root*root                                 ; Pass 1 st(0) has 1.0 st(1) has 2.0
        fld     xx                                                          ; Pass 1 st(0) has 27.0 st(1) has 1.0 st(2) has 2.0
        fdiv    st(0), st(1) ; x / (root*root)  ; Pass 1: 27 / 1              Pass 1 st(0) has 27.0 st(1) has 2.0 st(2) has 2.0
        fadd    st(0), st(2) ; (2.0*root) + x / (root*root))                  Pass 1 st(0) has 29.0 st(1) has 1.0 st(2) has 2.0

        fld     three                                                       ; Pass 1 st(0) has 3.0 st(1) has 29.0 st(2) has 1.0 st(3) has 2.0

        fld     st(1)                                                       ; Pass 1 st(0) has 3.0 st(1) has 29.0 st(2) = 1.0 st(3) = 2.0
        fdiv    st(0), st(1) ; (2.0*root) + x / (root*root)) / 3.0            Pass 1 st(1) has 9.6666666666667



        jmp     repreatAgain
        mov     eax, 0  ; exit
        ret
main    ENDP 
END   

questionAnswers(2)

yourAnswerToTheQuestion