Throttle pandas apply when using an API call
I have a large DataFrame with an address column:
data addr
0 0.617964 IN,Krishnagiri,635115
1 0.635428 IN,Chennai,600005
2 0.630125 IN,Karnal,132001
3 0.981282 IN,Jaipur,302021
4 0.715813 IN,Chennai,600005
...
and I wrote the following function to replace the address with the longitude and latitude coordinates of that address:
import pandas as pd
from geopy.geocoders import Nominatim

geo_locator = Nominatim(user_agent="MY_APP_ID")

def get_coordinates(addr):
    # First try the full address string.
    location = geo_locator.geocode(addr)
    if location is not None:
        return pd.Series({'lat': location.latitude, 'lon': location.longitude})
    # Fall back to the first comma-separated field of the address.
    location = geo_locator.geocode(addr.split(',')[0])
    if location is not None:
        return pd.Series({'lat': location.latitude, 'lon': location.longitude})
    # Sentinel coordinates when nothing was found.
    return pd.Series({'lat': -1, 'lon': -1})
Then I call the pandas apply method on the address column and concatenate the result to the end of the DataFrame in place of the address column:
df = pd.concat([df, df.addr.apply(get_coordinates)], axis=1).drop(['addr'], axis=1)
However, because get_coordinates calls a third-party API, it fails with:
geopy.exc.GeocoderTimedOut: Service timed out
How do I throttle the requests so that I get a response before continuing with the next value?
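To make the question concrete, here is a minimal sketch of the kind of throttling I have in mind: wait a fixed interval before every call, and back off exponentially when a call raises. The name call_with_retry and all of its parameters are hypothetical; it uses only the standard library and is not part of geopy or pandas.

```python
import time


def call_with_retry(func, arg, retries=3, base_delay=1.0, min_interval=1.0,
                    exceptions=(Exception,), sleep=time.sleep):
    """Call func(arg) with a pause before each attempt and backoff after failures.

    Hypothetical helper: `sleep` is injectable so the waiting can be stubbed out.
    """
    for attempt in range(retries + 1):
        # Throttle: always wait before hitting the service.
        sleep(min_interval)
        try:
            return func(arg)
        except exceptions:
            # Out of attempts: let the last exception propagate.
            if attempt == retries:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
```

In my case func would be geo_locator.geocode and exceptions would be (GeocoderTimedOut,) from geopy.exc, so that only timeouts are retried.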
Update:
As a further improvement, I would like to call the API only on unique values, i.e. if the address IN,Krishnagiri,635115
appears 20 times in my DataFrame, I would like to call the API only once and apply the result to all 20 occurrences.
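Roughly, what I am after could look like the sketch below: look up each distinct address once, keep the results in a dict, and then expand them back to one row per original entry. geocode_unique is a hypothetical helper, and lookup is assumed to be any callable that takes an address string and returns a (lat, lon) tuple.

```python
import pandas as pd


def geocode_unique(addresses, lookup):
    """Call lookup once per distinct address and broadcast the result to all rows.

    addresses: a pandas Series of address strings.
    lookup: hypothetical callable returning a (lat, lon) tuple for one address.
    """
    # One call per distinct value, however many times it is repeated.
    cache = {addr: lookup(addr) for addr in addresses.unique()}
    # Rebuild one (lat, lon) row per original entry, keeping the original index.
    return pd.DataFrame([cache[addr] for addr in addresses],
                        index=addresses.index, columns=['lat', 'lon'])
```

The result has the same index as df.addr, so it could be concatenated to df with pd.concat(..., axis=1) in the same way as the output of apply.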
Update 2:
Log and stack trace for @Andrew Lavers's code:
...
Fetched Gandipet, Khanapur, Rangareddy District, Telangana, 500075, India
Fetched Jaipur Municipal Corporation, Jaipur, Rajasthan, 302015, India
Fetched Chennai, Chennai district, Tamil Nadu, India
Exception from geolocator: Fake exception for testing
Backing off for 1 seconds.
Exception from geolocator: Fake exception for testing
Backing off for 3 seconds.
Fetched None
Traceback (most recent call last):
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/geopy/geocoders/base.py", line 344, in _call_geocoder
page = requester(req, timeout=timeout, **kwargs)
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 526, in open
response = self._open(req, data)
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 544, in _open
'_open', req)
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1361, in https_open
context=self._context, check_hostname=self._check_hostname)
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1321, in do_open
r = h.getresponse()
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1331, in getresponse
response.begin()
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 297, in begin
version, status, reason = self._read_status()
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 258, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/socket.py", line 586, in readinto
return self._sock.recv_into(b)
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 1002, in recv_into
return self.read(nbytes, buffer)
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 865, in read
return self._sslobj.read(len, buffer)
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 625, in read
v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/...//tmp.py", line 89, in <module>
df.addr.apply(get_coordinates)
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/series.py", line 3194, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas/_libs/src/inference.pyx", line 1472, in pandas._libs.lib.map_infer
File "/Users/...//tmp.py", line 76, in get_coordinates
location = geo_locator.geocode(addr.split(',')[0])
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/geopy/geocoders/osm.py", line 307, in geocode
self._call_geocoder(url, timeout=timeout), exactly_one
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/geopy/geocoders/base.py", line 371, in _call_geocoder
raise GeocoderTimedOut('Service timed out')
geopy.exc.GeocoderTimedOut: Service timed out
Process finished with exit code 1