Last update: 20.08.2022
Contact : fdrs@ifrc.org or simon.weiss@ifrc.org
FDRS focal point: Simon Weiss, FDRS Data Analyst
The FDRS is ambitious and wide reaching. Although the data quality and reporting are improving each year, data is missing for some National Societies. As a result, some data fluctuations may be misleading: trend lines can drop for a given year when there is missing data, and some National Societies are excluded from the total and then appear again in another year.In order to better represent the network and better count everyone, FDRS implement every year data imputation techniques.
The purpose of this notebook is to apply the method selected by the FDRS team and to interact with the FDRS backoffice with a post method to publish the imputed values. The ingested data is replicated and displayed on the website https://data.ifrc.org/FDRS/ and used in FDRS research such as Everyone count report.
Therefore this is not a research notebook but a production notebook. It aims to simply expose the FDRS methodology of imputation step by step. A related python script has been created: imputing.py which allows to launch this methodology once.
The approach chosen was to replace the 2019, 2020 and 2021 missing data as well as to apply two different techniques according to the indicator categories, in the previous years all NSs reported their data then no input technique was employed.
The imputing applies only to main indicators and does not apply to disaggregated levels to maintain consistency across years. A detailed description of the methodology is available in the pdf Missing Data
This notebook contains four sections
We'll get started by loading the python modules in our analysis toolkit.
# data analysis and wrangling
import pandas as pd
import numpy as np
import requests, json
from pandas import json_normalize
import json
from datetime import datetime
import os
import sys
import warnings
warnings.filterwarnings('ignore')
#import api functions to get data
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
sys.path.append(module_path+"\\src\data")
from fdrsapi import api_function,baseline, api_function_imputed
#generate fdrs codebook and store it in references folder
KPI = requests.get(f"https://data-api.ifrc.org/api/indicator?apiKey=21e401ae-6b35-404b-a72a-b74cce66dee3").json()
KPI = json_normalize(data=KPI)
#export KPI to excel in references folder
KPI.to_excel(r"..\references\codebook.xlsx")
#import codebook
codebook = pd.read_excel(r"..\references\codebook.xlsx")
#select only NS Reach CPD KPIs
kpi=(codebook.query('KPI_Note== "NS Reach - CPD"')["KPI_Code"]).to_list()
#add manualy the other KPIS
kpi_code=kpi+["KPI_TrainFA_Tot","KPI_DonBlood_Tot"]+["KPI_GB_Tot","KPI_PeopleVol_Tot","KPI_PStaff_Tot","KPI_noLocalUnits","KPI_IncomeLC_CHF" ,"KPI_expenditureLC_CHF"]+["validated"]+["KPI_WasSubmitted"]
#counting nb of considered KPIS + removing metadata kpis
print("number of total KPI considered:", len(kpi_code)-2)
number of total KPI considered: 23
New indicators: remove
#remove the 2021 new indicators KPI collected from kpi_code
old_kpi=kpi_code.copy()
new_kpi=["KPI_Climate_CPD", "KPI_ClimateHeat_CPD", "KPI_ReachHI_CPD", "KPI_ReachHPM_CPD","KPI_ReachRCRCEd_CPD"]
for x in new_kpi:
old_kpi.remove(x)
#check final kpi list
old_kpi
print("number of old KPI considered:", len(old_kpi))
number of old KPI considered: 20
#read public api key from config file
f = open(module_path+"/src/config/public_api_key.txt")
api_key=f.readline()
#select years to download data from backoffice
years=["2018","2019","2020","2021"]
kpi_float=old_kpi.copy()
#run api function for each year
time_series=api_function(years,old_kpi,kpi_float,api_key)
time_series
2018 2019 2020 2021 number of KPI downloaded: 20
KPI ID | KPI_DON_code | NSO_DON_name | NSO_ZON_name | iso_3 | KPI_Year | KPI_DonBlood_Tot | KPI_GB_Tot | KPI_IncomeLC_CHF | KPI_PStaff_Tot | ... | KPI_ReachL_CPD | KPI_ReachM_CPD | KPI_ReachSI_CPD | KPI_ReachS_CPD | KPI_ReachWASH_CPD | KPI_TrainFA_Tot | KPI_WasSubmitted | KPI_expenditureLC_CHF | KPI_noLocalUnits | validated | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | DAF0012018 | DAF001 | Afghan Red Crescent | Asia Pacific | AFG | 2018 | 0.0 | 15.0 | 1.803370e+07 | 2195.0 | ... | NaN | NaN | NaN | NaN | NaN | 1020.0 | 1.0 | 1.847249e+07 | 122.0 | 1.0 |
1 | DAL0012018 | DAL001 | Albanian Red Cross | Europe and Central Asia | ALB | 2018 | 6531.0 | 14.0 | 1.555610e+06 | 53.0 | ... | 12800.0 | 0.0 | 34150.0 | 0.0 | 47000.0 | 62709.0 | 1.0 | 1.348786e+06 | 39.0 | 1.0 |
2 | DDZ0012018 | DDZ001 | Algerian Red Crescent | Middle East and North Africa | DZA | 2018 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | 248.0 | 1.0 |
3 | DUS0012018 | DUS001 | American Red Cross | Americas | USA | 2018 | 2662986.0 | 15.0 | 3.656908e+09 | 19649.0 | ... | 2308469.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2283480.0 | 1.0 | 3.193028e+09 | 263.0 | 1.0 |
4 | DAD0012018 | DAD001 | Andorran Red Cross | Europe and Central Asia | AND | 2018 | 1084.0 | 9.0 | 1.177623e+06 | 20.0 | ... | NaN | NaN | 1117.0 | NaN | NaN | 2901.0 | 1.0 | 1.379745e+06 | 1.0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
763 | DVE0012021 | DVE001 | Venezuelan Red Cross | Americas | VEN | 2021 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
764 | DVN0012021 | DVN001 | Vietnam Red Cross Society | Asia Pacific | VNM | 2021 | 1304191.0 | 5.0 | NaN | 16008.0 | ... | NaN | NaN | NaN | NaN | 141286.0 | 26811.0 | 1.0 | NaN | 12693.0 | NaN |
765 | DYE0012021 | DYE001 | Yemen Red Crescent Society | Middle East and North Africa | YEM | 2021 | NaN | 6.0 | 6.513051e+06 | 454.0 | ... | 538645.0 | 576219.0 | 21534.0 | 99003.0 | 419470.0 | 3404.0 | 1.0 | 5.273168e+06 | 32.0 | 1.0 |
766 | DZM0012021 | DZM001 | Zambia Red Cross Society | Africa | ZMB | 2021 | NaN | 10.0 | 2.930602e+06 | 66.0 | ... | 1249.0 | 4807.0 | NaN | 169.0 | 238278.0 | 1720.0 | 1.0 | 2.896418e+06 | 59.0 | 1.0 |
767 | DZW0012021 | DZW001 | Zimbabwe Red Cross Society | Africa | ZWE | 2021 | NaN | 17.0 | NaN | 147.0 | ... | 32163.0 | 1966.0 | 98386.0 | 62.0 | 77623.0 | 27376.0 | 1.0 | NaN | 132.0 | 1.0 |
768 rows × 26 columns
Deal with new indicators: <!> To update everytime we run the script
#read excel from data/raw folder
new_time_series=pd.read_excel(module_path+"/data/raw/FDRS2021NewKPIValues.xlsx")
#print new_time_series to check
new_time_series
iso_3 | KPI_Year | KPI_ReachRCRCEd_CPD | KPI_Climate_CPD | KPI_ClimateHeat_CPD | KPI_ReachHI_CPD | KPI_ReachHPM_CPD | |
---|---|---|---|---|---|---|---|
0 | AFG | 2018 | NaN | NaN | NaN | NaN | NaN |
1 | AFG | 2019 | NaN | NaN | NaN | NaN | NaN |
2 | AFG | 2020 | NaN | NaN | NaN | NaN | NaN |
3 | AFG | 2021 | 105.0 | NaN | NaN | 455730.0 | 4646.0 |
4 | AGO | 2018 | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... |
674 | ZMB | 2020 | NaN | NaN | NaN | NaN | NaN |
675 | ZMB | 2021 | 0.0 | 0.0 | NaN | 0.0 | 382.0 |
676 | ZWE | 2018 | NaN | NaN | NaN | NaN | NaN |
677 | ZWE | 2019 | NaN | NaN | NaN | NaN | NaN |
678 | ZWE | 2020 | NaN | NaN | NaN | NaN | NaN |
679 rows × 7 columns
#merge new time series with old time series
time_series=time_series.merge(new_time_series, how='left', on=['iso_3','KPI_Year'])
#print to check
time_series
KPI ID | KPI_DON_code | NSO_DON_name | NSO_ZON_name | iso_3 | KPI_Year | KPI_DonBlood_Tot | KPI_GB_Tot | KPI_IncomeLC_CHF | KPI_PStaff_Tot | ... | KPI_TrainFA_Tot | KPI_WasSubmitted | KPI_expenditureLC_CHF | KPI_noLocalUnits | validated | KPI_ReachRCRCEd_CPD | KPI_Climate_CPD | KPI_ClimateHeat_CPD | KPI_ReachHI_CPD | KPI_ReachHPM_CPD | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | DAF0012018 | DAF001 | Afghan Red Crescent | Asia Pacific | AFG | 2018 | 0.0 | 15.0 | 1.803370e+07 | 2195.0 | ... | 1020.0 | 1.0 | 1.847249e+07 | 122.0 | 1.0 | NaN | NaN | NaN | NaN | NaN |
1 | DAL0012018 | DAL001 | Albanian Red Cross | Europe and Central Asia | ALB | 2018 | 6531.0 | 14.0 | 1.555610e+06 | 53.0 | ... | 62709.0 | 1.0 | 1.348786e+06 | 39.0 | 1.0 | NaN | NaN | NaN | NaN | NaN |
2 | DDZ0012018 | DDZ001 | Algerian Red Crescent | Middle East and North Africa | DZA | 2018 | NaN | NaN | NaN | NaN | ... | NaN | 1.0 | NaN | 248.0 | 1.0 | NaN | NaN | NaN | NaN | NaN |
3 | DUS0012018 | DUS001 | American Red Cross | Americas | USA | 2018 | 2662986.0 | 15.0 | 3.656908e+09 | 19649.0 | ... | 2283480.0 | 1.0 | 3.193028e+09 | 263.0 | 1.0 | NaN | NaN | NaN | NaN | NaN |
4 | DAD0012018 | DAD001 | Andorran Red Cross | Europe and Central Asia | AND | 2018 | 1084.0 | 9.0 | 1.177623e+06 | 20.0 | ... | 2901.0 | 1.0 | 1.379745e+06 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
763 | DVE0012021 | DVE001 | Venezuelan Red Cross | Americas | VEN | 2021 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
764 | DVN0012021 | DVN001 | Vietnam Red Cross Society | Asia Pacific | VNM | 2021 | 1304191.0 | 5.0 | NaN | 16008.0 | ... | 26811.0 | 1.0 | NaN | 12693.0 | NaN | NaN | NaN | NaN | NaN | NaN |
765 | DYE0012021 | DYE001 | Yemen Red Crescent Society | Middle East and North Africa | YEM | 2021 | NaN | 6.0 | 6.513051e+06 | 454.0 | ... | 3404.0 | 1.0 | 5.273168e+06 | 32.0 | 1.0 | 18300.0 | NaN | NaN | 41879.0 | 1769102.0 |
766 | DZM0012021 | DZM001 | Zambia Red Cross Society | Africa | ZMB | 2021 | NaN | 10.0 | 2.930602e+06 | 66.0 | ... | 1720.0 | 1.0 | 2.896418e+06 | 59.0 | 1.0 | 0.0 | 0.0 | NaN | 0.0 | 382.0 |
767 | DZW0012021 | DZW001 | Zimbabwe Red Cross Society | Africa | ZWE | 2021 | NaN | 17.0 | NaN | 147.0 | ... | 27376.0 | 1.0 | NaN | 132.0 | 1.0 | NaN | NaN | NaN | NaN | NaN |
768 rows × 31 columns
We Consider that previous year data has been validated
#count nb of validated submission per years
time_series.query('KPI_Year == 2019')["validated"].sum(),time_series.query('KPI_Year == 2020')["validated"].sum(), time_series.query('KPI_Year == 2021')["validated"].sum()
(158.0, 167.0, 45.0)
##replace validated =0 for KPI year 2020,2019,2018 with validated =1
time_series.loc[(time_series["KPI_Year"]==2020) & (time_series["validated"]==0), "validated"]=1
time_series.loc[(time_series["KPI_Year"]==2019) & (time_series["validated"]==0), "validated"]=1
time_series.loc[(time_series["KPI_Year"]==2018) & (time_series["validated"]==0), "validated"]=1
#import imputing function
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
sys.path.append(module_path+"\\src\data")
from imp_function import imputing_mean,imputing_na
data_clean=time_series.copy()
# 1st subset:before 2018
data_clean1=data_clean.query('KPI_Year >= 2012 & KPI_Year < 2018').copy()
# 2nd subset: after 2018 and validated
data_clean2=data_clean.query('KPI_Year >= 2018 & KPI_Year < 2022 & validated == 1').copy()
# 3rd subset: after 2018 and all data; idea : imputing all then filter out the validated data
data_clean3=data_clean.query('KPI_Year >= 2018').copy()
data_clean3.loc[(data_clean3["KPI_Year"]==2021) & (data_clean3["validated"]==0.0), kpi_code]=np.nan
# Imputing : NS Reach CPD KPIs
data_clean3_na_columns=data_clean3.copy()
#import codebook
codebook = pd.read_excel(r"..\references\codebook.xlsx")
#select only NS Reach CPD KPIs
kpi=(codebook.query('KPI_Note== "NS Reach - CPD"')["KPI_Code"]).to_list()
#add manualy the other Reach bKPIS
kpi_reach_code=kpi+["KPI_TrainFA_Tot","KPI_DonBlood_Tot"]
selected_cols=kpi_reach_code
imputing_mean(data_clean3_na_columns,selected_cols,data_clean3)
# Imputing : NS Finance and Parnership & NS Governance and Structure Section
kpi_gov_fi_code=["KPI_GB_Tot","KPI_PeopleVol_Tot","KPI_PStaff_Tot","KPI_noLocalUnits","KPI_IncomeLC_CHF" ,"KPI_expenditureLC_CHF"]
selected_cols_2d=kpi_gov_fi_code.copy()
# 18.11.21Comment : After reviewing with the Team, "supported1" and "received_support1" have been imputed seperatly in the script FDRS Network Data Transformation
imputing_na(data_clean3_na_columns,selected_cols_2d)
data_clean3_na_columns=data_clean3_na_columns.query('KPI_Year >= 2018 & KPI_Year < 2022 & validated != 1').copy()
# concatenate the two sub_datasets
fdrs_na_columns = [data_clean1, data_clean2, data_clean3_na_columns]
fdrs_data_fdrs_na_columns = pd.concat(fdrs_na_columns)
KPI_Climate_CPD KPI_ClimateHeat_CPD KPI_ReachCTP_CPD KPI_ReachDRER_CPD KPI_ReachDRR_CPD KPI_ReachH_CPD KPI_ReachHI_CPD KPI_ReachHPM_CPD KPI_ReachL_CPD KPI_ReachLTSPD_CPD KPI_ReachM_CPD KPI_ReachRCRCEd_CPD KPI_ReachS_CPD KPI_ReachSI_CPD KPI_ReachWASH_CPD KPI_TrainFA_Tot KPI_DonBlood_Tot KPI_GB_Tot KPI_PeopleVol_Tot KPI_PStaff_Tot KPI_noLocalUnits KPI_IncomeLC_CHF KPI_expenditureLC_CHF
Quick check for sum in Vol after imputing
fdrs_data_fdrs_na_columns[fdrs_data_fdrs_na_columns["KPI_Year"]==2021][["NSO_DON_name","KPI_Year","KPI_PeopleVol_Tot"]].sum()
NSO_DON_name Austrian Red CrossBaphalali Eswatini Red Cross... KPI_Year 388032 KPI_PeopleVol_Tot 15061782.0 dtype: object
fdrs_data_fdrs_na_columns[fdrs_data_fdrs_na_columns["KPI_Year"]==2020][["NSO_DON_name","KPI_Year","KPI_PeopleVol_Tot"]].sum()
NSO_DON_name Afghan Red CrescentAlbanian Red CrossAlgerian ... KPI_Year 387840 KPI_PeopleVol_Tot 14525876.0 dtype: object
Quick process to ingest imputed values in an excel read by PBI
#generate kpi_code_ip names from kpi_code
kpi_code=kpi_reach_code+kpi_gov_fi_code
kpi_code_ip= [s + "_IP" for s in kpi_code]
#print original dataset
fdrs_data_fdrs_na_columns
KPI ID | KPI_DON_code | NSO_DON_name | NSO_ZON_name | iso_3 | KPI_Year | KPI_DonBlood_Tot | KPI_GB_Tot | KPI_IncomeLC_CHF | KPI_PStaff_Tot | ... | KPI_TrainFA_Tot | KPI_WasSubmitted | KPI_expenditureLC_CHF | KPI_noLocalUnits | validated | KPI_ReachRCRCEd_CPD | KPI_Climate_CPD | KPI_ClimateHeat_CPD | KPI_ReachHI_CPD | KPI_ReachHPM_CPD | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | DAF0012018 | DAF001 | Afghan Red Crescent | Asia Pacific | AFG | 2018 | 0.0 | 15.0 | 1.803370e+07 | 2195.0 | ... | 1.020000e+03 | 1.0 | 1.847249e+07 | 122.0 | 1.0 | NaN | NaN | NaN | NaN | NaN |
1 | DAL0012018 | DAL001 | Albanian Red Cross | Europe and Central Asia | ALB | 2018 | 6531.0 | 14.0 | 1.555610e+06 | 53.0 | ... | 6.270900e+04 | 1.0 | 1.348786e+06 | 39.0 | 1.0 | NaN | NaN | NaN | NaN | NaN |
2 | DDZ0012018 | DDZ001 | Algerian Red Crescent | Middle East and North Africa | DZA | 2018 | NaN | NaN | NaN | NaN | ... | NaN | 1.0 | NaN | 248.0 | 1.0 | NaN | NaN | NaN | NaN | NaN |
3 | DUS0012018 | DUS001 | American Red Cross | Americas | USA | 2018 | 2662986.0 | 15.0 | 3.656908e+09 | 19649.0 | ... | 2.283480e+06 | 1.0 | 3.193028e+09 | 263.0 | 1.0 | NaN | NaN | NaN | NaN | NaN |
4 | DAD0012018 | DAD001 | Andorran Red Cross | Europe and Central Asia | AND | 2018 | 1084.0 | 9.0 | 1.177623e+06 | 20.0 | ... | 2.901000e+03 | 1.0 | 1.379745e+06 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
760 | DUA0012021 | DUA001 | Ukrainian Red Cross Society | Europe and Central Asia | UKR | 2021 | 809.5 | 75.0 | 1.009907e+07 | 535.0 | ... | 3.889000e+04 | NaN | 1.050011e+07 | 224.0 | NaN | NaN | NaN | NaN | NaN | NaN |
761 | DUY0012021 | DUY001 | Uruguayan Red Cross | Americas | URY | 2021 | 12.0 | 5.0 | 1.000076e+06 | 52.0 | ... | 1.138667e+03 | NaN | 9.840340e+05 | 19.0 | NaN | NaN | NaN | NaN | NaN | NaN |
762 | DVU0012021 | DVU001 | Vanuatu Red Cross Society | Asia Pacific | VUT | 2021 | 0.0 | 10.0 | 7.895120e+05 | 37.0 | ... | 6.026667e+02 | NaN | 9.459050e+05 | 7.0 | NaN | NaN | NaN | NaN | NaN | NaN |
763 | DVE0012021 | DVE001 | Venezuelan Red Cross | Americas | VEN | 2021 | 317.5 | 11.0 | 0.000000e+00 | 742.0 | ... | 6.533000e+03 | NaN | 0.000000e+00 | 42.0 | NaN | NaN | NaN | NaN | NaN | NaN |
764 | DVN0012021 | DVN001 | Vietnam Red Cross Society | Asia Pacific | VNM | 2021 | 1304191.0 | 5.0 | 5.918624e+06 | 16008.0 | ... | 2.681100e+04 | 1.0 | 2.976457e+06 | 12693.0 | NaN | NaN | NaN | NaN | NaN | NaN |
768 rows × 31 columns
kpi_code_ip
['KPI_Climate_CPD_IP', 'KPI_ClimateHeat_CPD_IP', 'KPI_ReachCTP_CPD_IP', 'KPI_ReachDRER_CPD_IP', 'KPI_ReachDRR_CPD_IP', 'KPI_ReachH_CPD_IP', 'KPI_ReachHI_CPD_IP', 'KPI_ReachHPM_CPD_IP', 'KPI_ReachL_CPD_IP', 'KPI_ReachLTSPD_CPD_IP', 'KPI_ReachM_CPD_IP', 'KPI_ReachRCRCEd_CPD_IP', 'KPI_ReachS_CPD_IP', 'KPI_ReachSI_CPD_IP', 'KPI_ReachWASH_CPD_IP', 'KPI_TrainFA_Tot_IP', 'KPI_DonBlood_Tot_IP', 'KPI_GB_Tot_IP', 'KPI_PeopleVol_Tot_IP', 'KPI_PStaff_Tot_IP', 'KPI_noLocalUnits_IP', 'KPI_IncomeLC_CHF_IP', 'KPI_expenditureLC_CHF_IP']
#rename list of columns kpi_codes to kpi_codes_ip
fdrs_data_fdrs_na_columns.rename(columns=dict(zip(kpi_code, kpi_code_ip)), inplace=True)
#print dataset with new names
fdrs_data_fdrs_na_columns
KPI ID | KPI_DON_code | NSO_DON_name | NSO_ZON_name | iso_3 | KPI_Year | KPI_DonBlood_Tot_IP | KPI_GB_Tot_IP | KPI_IncomeLC_CHF_IP | KPI_PStaff_Tot_IP | ... | KPI_TrainFA_Tot_IP | KPI_WasSubmitted | KPI_expenditureLC_CHF_IP | KPI_noLocalUnits_IP | validated | KPI_ReachRCRCEd_CPD_IP | KPI_Climate_CPD_IP | KPI_ClimateHeat_CPD_IP | KPI_ReachHI_CPD_IP | KPI_ReachHPM_CPD_IP | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | DAF0012018 | DAF001 | Afghan Red Crescent | Asia Pacific | AFG | 2018 | 0.0 | 15.0 | 1.803370e+07 | 2195.0 | ... | 1.020000e+03 | 1.0 | 1.847249e+07 | 122.0 | 1.0 | NaN | NaN | NaN | NaN | NaN |
1 | DAL0012018 | DAL001 | Albanian Red Cross | Europe and Central Asia | ALB | 2018 | 6531.0 | 14.0 | 1.555610e+06 | 53.0 | ... | 6.270900e+04 | 1.0 | 1.348786e+06 | 39.0 | 1.0 | NaN | NaN | NaN | NaN | NaN |
2 | DDZ0012018 | DDZ001 | Algerian Red Crescent | Middle East and North Africa | DZA | 2018 | NaN | NaN | NaN | NaN | ... | NaN | 1.0 | NaN | 248.0 | 1.0 | NaN | NaN | NaN | NaN | NaN |
3 | DUS0012018 | DUS001 | American Red Cross | Americas | USA | 2018 | 2662986.0 | 15.0 | 3.656908e+09 | 19649.0 | ... | 2.283480e+06 | 1.0 | 3.193028e+09 | 263.0 | 1.0 | NaN | NaN | NaN | NaN | NaN |
4 | DAD0012018 | DAD001 | Andorran Red Cross | Europe and Central Asia | AND | 2018 | 1084.0 | 9.0 | 1.177623e+06 | 20.0 | ... | 2.901000e+03 | 1.0 | 1.379745e+06 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
760 | DUA0012021 | DUA001 | Ukrainian Red Cross Society | Europe and Central Asia | UKR | 2021 | 809.5 | 75.0 | 1.009907e+07 | 535.0 | ... | 3.889000e+04 | NaN | 1.050011e+07 | 224.0 | NaN | NaN | NaN | NaN | NaN | NaN |
761 | DUY0012021 | DUY001 | Uruguayan Red Cross | Americas | URY | 2021 | 12.0 | 5.0 | 1.000076e+06 | 52.0 | ... | 1.138667e+03 | NaN | 9.840340e+05 | 19.0 | NaN | NaN | NaN | NaN | NaN | NaN |
762 | DVU0012021 | DVU001 | Vanuatu Red Cross Society | Asia Pacific | VUT | 2021 | 0.0 | 10.0 | 7.895120e+05 | 37.0 | ... | 6.026667e+02 | NaN | 9.459050e+05 | 7.0 | NaN | NaN | NaN | NaN | NaN | NaN |
763 | DVE0012021 | DVE001 | Venezuelan Red Cross | Americas | VEN | 2021 | 317.5 | 11.0 | 0.000000e+00 | 742.0 | ... | 6.533000e+03 | NaN | 0.000000e+00 | 42.0 | NaN | NaN | NaN | NaN | NaN | NaN |
764 | DVN0012021 | DVN001 | Vietnam Red Cross Society | Asia Pacific | VNM | 2021 | 1304191.0 | 5.0 | 5.918624e+06 | 16008.0 | ... | 2.681100e+04 | 1.0 | 2.976457e+06 | 12693.0 | NaN | NaN | NaN | NaN | NaN | NaN |
768 rows × 31 columns
#export imputed values in an excel file.
#to change: staging in prod when final
#fdrs_data_fdrs_na_columns.to_excel("C:/Users/simon.weiss/OneDrive - IFRC/Tech talk and files/1_PowerBI/1_FDRS/0.Data_Sources/1_FDRS_Wide_Format_IP_ALL_Staging.xlsx",index=False)
#fdrs_data_fdrs_na_columns.to_excel("C:/Users/simon.weiss/OneDrive - IFRC/Tech talk and files/1_PowerBI/1_FDRS/0.Data_Sources/1_FDRS_Wide_Format_IP_ALL_Staging.xlsx",index=False)
KPI = requests.get(f"https://data-api.ifrc.org/api/indicator?apiKey=21e401ae-6b35-404b-a72a-b74cce66dee3").json()
KPI = json_normalize(data=KPI)
kpi=(KPI.query('KPI_Note== "NS Reach - CPD"')["KPI_Code"]).to_list()
kpi_code=kpi+["KPI_TrainFA_Tot","KPI_DonBlood_Tot"]+["KPI_GB_Tot","KPI_PeopleVol_Tot","KPI_PStaff_Tot","KPI_noLocalUnits","KPI_IncomeLC_CHF" ,"KPI_expenditureLC_CHF"]
new_kpi=["KPI_Climate_CPD", "KPI_ClimateHeat_CPD", "KPI_ReachHI_CPD", "KPI_ReachHPM_CPD","KPI_ReachRCRCEd_CPD"]
for x in new_kpi:
kpi_code.remove(x)
kpi_code= [s + "_IP" for s in kpi_code]
kpi_float=kpi_code.copy()
Get imputed values from backoffice
years=["2021"]
time_series_imputed=api_function_imputed(years,kpi_code,kpi_float,api_key)
time_series_imputed
2021 number of KPI downloaded: 18
KPI ID | KPI_DON_code | NSO_DON_name | NSO_ZON_name | iso_3 | KPI_Year | nan | KPI_DonBlood_Tot_IP | KPI_GB_Tot_IP | KPI_IncomeLC_CHF_IP | ... | KPI_ReachH_CPD_IP | KPI_ReachLTSPD_CPD_IP | KPI_ReachL_CPD_IP | KPI_ReachM_CPD_IP | KPI_ReachSI_CPD_IP | KPI_ReachS_CPD_IP | KPI_ReachWASH_CPD_IP | KPI_TrainFA_Tot_IP | KPI_expenditureLC_CHF_IP | KPI_noLocalUnits_IP | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | DAF0012021 | DAF001 | Afghan Red Crescent | Asia Pacific | AFG | 2021 | NaN | NaN | NaN | 1.114095e+07 | ... | 2145871.0 | 2145871.0 | 219872.0 | NaN | NaN | NaN | 0.0 | 3182.0 | 1.156653e+07 | 122.0 |
1 | DAL0012021 | DAL001 | Albanian Red Cross | Europe and Central Asia | ALB | 2021 | NaN | NaN | 14.0 | 2.270842e+06 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.913412e+06 | 39.0 |
2 | DDZ0012021 | DDZ001 | Algerian Red Crescent | Middle East and North Africa | DZA | 2021 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | DUS0012021 | DUS001 | American Red Cross | Americas | USA | 2021 | NaN | 2218346.0 | 14.0 | 2.787293e+09 | ... | 6339537.0 | 7919326.0 | 244095.0 | NaN | NaN | 1512380.0 | NaN | 6190964.0 | 2.521967e+09 | 234.0 |
4 | DAD0012021 | DAD001 | Andorran Red Cross | Europe and Central Asia | AND | 2021 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
187 | DVE0012021 | DVE001 | Venezuelan Red Cross | Americas | VEN | 2021 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
188 | DVN0012021 | DVN001 | Vietnam Red Cross Society | Asia Pacific | VNM | 2021 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
189 | DYE0012021 | DYE001 | Yemen Red Crescent Society | Middle East and North Africa | YEM | 2021 | NaN | NaN | 6.0 | 6.513051e+06 | ... | 2805849.0 | 2449937.0 | 538645.0 | 576219.0 | 21534.0 | 99003.0 | 419470.0 | 3404.0 | 5.273168e+06 | 32.0 |
190 | DZM0012021 | DZM001 | Zambia Red Cross Society | Africa | ZMB | 2021 | NaN | NaN | 10.0 | 2.930602e+06 | ... | 3414257.0 | 14776.0 | 1249.0 | 4807.0 | NaN | 169.0 | 238278.0 | 1720.0 | 2.896418e+06 | 2.0 |
191 | DZW0012021 | DZW001 | Zimbabwe Red Cross Society | Africa | ZWE | 2021 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
192 rows × 25 columns
time_series_imputed.drop("nan",axis=1,inplace=True)
new_time_series_ip=pd.read_excel("FDRS2021NewKPIValues.xlsx")
# renaming the column by index
new_time_series_ip.columns.values[2:7] =["KPI_ReachRCRCEd_CPD_IP", "KPI_Climate_CPD_IP","KPI_ClimateHeat_CPD_IP","KPI_ReachHI_CPD_IP","KPI_ReachHPM_CPD_IP"]
time_series_imputed=time_series_imputed.merge(new_time_series_ip, how='left', on=['iso_3','KPI_Year'])
time_series_imputed
KPI ID | KPI_DON_code | NSO_DON_name | NSO_ZON_name | iso_3 | KPI_Year | KPI_DonBlood_Tot_IP | KPI_GB_Tot_IP | KPI_IncomeLC_CHF_IP | KPI_PStaff_Tot_IP | ... | KPI_ReachS_CPD_IP | KPI_ReachWASH_CPD_IP | KPI_TrainFA_Tot_IP | KPI_expenditureLC_CHF_IP | KPI_noLocalUnits_IP | KPI_ReachRCRCEd_CPD_IP | KPI_Climate_CPD_IP | KPI_ClimateHeat_CPD_IP | KPI_ReachHI_CPD_IP | KPI_ReachHPM_CPD_IP | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | DAF0012021 | DAF001 | Afghan Red Crescent | Asia Pacific | AFG | 2021 | NaN | NaN | 1.114095e+07 | 2309.0 | ... | NaN | 0.0 | 3182.0 | 1.156653e+07 | 122.0 | 105.0 | NaN | NaN | 455730.0 | 4646.0 |
1 | DAL0012021 | DAL001 | Albanian Red Cross | Europe and Central Asia | ALB | 2021 | NaN | 14.0 | 2.270842e+06 | 58.0 | ... | NaN | NaN | NaN | 1.913412e+06 | 39.0 | NaN | NaN | NaN | NaN | NaN |
2 | DDZ0012021 | DDZ001 | Algerian Red Crescent | Middle East and North Africa | DZA | 2021 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | DUS0012021 | DUS001 | American Red Cross | Americas | USA | 2021 | 2218346.0 | 14.0 | 2.787293e+09 | 17495.0 | ... | 1512380.0 | NaN | 6190964.0 | 2.521967e+09 | 234.0 | NaN | NaN | NaN | NaN | 38110.0 |
4 | DAD0012021 | DAD001 | Andorran Red Cross | Europe and Central Asia | AND | 2021 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
187 | DVE0012021 | DVE001 | Venezuelan Red Cross | Americas | VEN | 2021 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
188 | DVN0012021 | DVN001 | Vietnam Red Cross Society | Asia Pacific | VNM | 2021 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
189 | DYE0012021 | DYE001 | Yemen Red Crescent Society | Middle East and North Africa | YEM | 2021 | NaN | 6.0 | 6.513051e+06 | 454.0 | ... | 99003.0 | 419470.0 | 3404.0 | 5.273168e+06 | 32.0 | 18300.0 | NaN | NaN | 41879.0 | 1769102.0 |
190 | DZM0012021 | DZM001 | Zambia Red Cross Society | Africa | ZMB | 2021 | NaN | 10.0 | 2.930602e+06 | 66.0 | ... | 169.0 | 238278.0 | 1720.0 | 2.896418e+06 | 2.0 | 0.0 | 0.0 | NaN | 0.0 | 382.0 |
191 | DZW0012021 | DZW001 | Zimbabwe Red Cross Society | Africa | ZWE | 2021 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
192 rows × 29 columns
Code the values that are different.
0 for values that do not need to be updated,
1 for values that need to be updated.
KPI = requests.get(f"https://data-api.ifrc.org/api/indicator?apiKey=21e401ae-6b35-404b-a72a-b74cce66dee3").json()
KPI = json_normalize(data=KPI)
kpi=(KPI.query('KPI_Note== "NS Reach - CPD"')["KPI_Code"]).to_list()
kpi_code=kpi+["KPI_TrainFA_Tot","KPI_DonBlood_Tot"]+["KPI_GB_Tot","KPI_PeopleVol_Tot","KPI_PStaff_Tot","KPI_noLocalUnits","KPI_IncomeLC_CHF" ,"KPI_expenditureLC_CHF"]
#kpi_code=["KPI_PeopleVol_Tot","KPI_PStaff_Tot","KPI_DonBlood_Tot","KPI_TrainFA_Tot","KPI_noLocalUnits","KPI_ReachDRER_CPD","KPI_ReachLTSPD_CPD","KPI_ReachDRR_CPD","KPI_ReachS_CPD","KPI_ReachL_CPD","KPI_ReachH_CPD","KPI_ReachWASH_CPD","KPI_ReachM_CPD","KPI_ReachCTP_CPD","KPI_ReachSI_CPD","KPI_IncomeLC_CHF","KPI_expenditureLC_CHF"]
#kpi_code=["KPI_PeopleVol_Tot"]
# Convert to type int to remove decimals from calculations
#fdrs_data_fdrs_na_columns[kpi_code]=fdrs_data_fdrs_na_columns[kpi_code].fillna(0).astype('int')
#fdrs_data_fdrs_na_columns[kpi_code]=fdrs_data_fdrs_na_columns[kpi_code].replace(0,np.nan)
time_series_imputed
KPI ID | KPI_DON_code | NSO_DON_name | NSO_ZON_name | iso_3 | KPI_Year | KPI_DonBlood_Tot_IP | KPI_GB_Tot_IP | KPI_IncomeLC_CHF_IP | KPI_PStaff_Tot_IP | ... | KPI_ReachS_CPD_IP | KPI_ReachWASH_CPD_IP | KPI_TrainFA_Tot_IP | KPI_expenditureLC_CHF_IP | KPI_noLocalUnits_IP | KPI_ReachRCRCEd_CPD_IP | KPI_Climate_CPD_IP | KPI_ClimateHeat_CPD_IP | KPI_ReachHI_CPD_IP | KPI_ReachHPM_CPD_IP | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | DAF0012021 | DAF001 | Afghan Red Crescent | Asia Pacific | AFG | 2021 | NaN | NaN | 1.114095e+07 | 2309.0 | ... | NaN | 0.0 | 3182.0 | 1.156653e+07 | 122.0 | 105.0 | NaN | NaN | 455730.0 | 4646.0 |
1 | DAL0012021 | DAL001 | Albanian Red Cross | Europe and Central Asia | ALB | 2021 | NaN | 14.0 | 2.270842e+06 | 58.0 | ... | NaN | NaN | NaN | 1.913412e+06 | 39.0 | NaN | NaN | NaN | NaN | NaN |
2 | DDZ0012021 | DDZ001 | Algerian Red Crescent | Middle East and North Africa | DZA | 2021 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | DUS0012021 | DUS001 | American Red Cross | Americas | USA | 2021 | 2218346.0 | 14.0 | 2.787293e+09 | 17495.0 | ... | 1512380.0 | NaN | 6190964.0 | 2.521967e+09 | 234.0 | NaN | NaN | NaN | NaN | 38110.0 |
4 | DAD0012021 | DAD001 | Andorran Red Cross | Europe and Central Asia | AND | 2021 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
187 | DVE0012021 | DVE001 | Venezuelan Red Cross | Americas | VEN | 2021 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
188 | DVN0012021 | DVN001 | Vietnam Red Cross Society | Asia Pacific | VNM | 2021 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
189 | DYE0012021 | DYE001 | Yemen Red Crescent Society | Middle East and North Africa | YEM | 2021 | NaN | 6.0 | 6.513051e+06 | 454.0 | ... | 99003.0 | 419470.0 | 3404.0 | 5.273168e+06 | 32.0 | 18300.0 | NaN | NaN | 41879.0 | 1769102.0 |
190 | DZM0012021 | DZM001 | Zambia Red Cross Society | Africa | ZMB | 2021 | NaN | 10.0 | 2.930602e+06 | 66.0 | ... | 169.0 | 238278.0 | 1720.0 | 2.896418e+06 | 2.0 | 0.0 | 0.0 | NaN | 0.0 | 382.0 |
191 | DZW0012021 | DZW001 | Zimbabwe Red Cross Society | Africa | ZWE | 2021 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
192 rows × 29 columns
fdrs_data_fdrs_na_columns
KPI ID | KPI_DON_code | NSO_DON_name | NSO_ZON_name | iso_3 | KPI_Year | KPI_DonBlood_Tot | KPI_GB_Tot | KPI_IncomeLC_CHF | KPI_PStaff_Tot | ... | KPI_ReachS_CPD | KPI_ReachWASH_CPD | KPI_TrainFA_Tot | KPI_expenditureLC_CHF | KPI_noLocalUnits | KPI_ReachRCRCEd_CPD | KPI_Climate_CPD | KPI_ClimateHeat_CPD | KPI_ReachHI_CPD | KPI_ReachHPM_CPD | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | DAF0012018 | DAF001 | Afghan Red Crescent | Asia Pacific | AFG | 2018 | 0.000000e+00 | 15.0 | 1.803370e+07 | 2195.0 | ... | NaN | 60000.0 | 1.020000e+03 | 1.847249e+07 | 122.0 | 105.0 | NaN | NaN | 455730.0 | 4646.0 |
1 | DAL0012018 | DAL001 | Albanian Red Cross | Europe and Central Asia | ALB | 2018 | 6.531000e+03 | 14.0 | 1.555610e+06 | 53.0 | ... | 0.000000 | 47000.0 | 6.270900e+04 | 1.348786e+06 | 39.0 | NaN | NaN | NaN | NaN | NaN |
2 | DDZ0012018 | DDZ001 | Algerian Red Crescent | Middle East and North Africa | DZA | 2018 | NaN | NaN | NaN | NaN | ... | 10000.000000 | NaN | NaN | NaN | 248.0 | NaN | NaN | NaN | NaN | NaN |
3 | DUS0012018 | DUS001 | American Red Cross | Americas | USA | 2018 | 2.662986e+06 | 15.0 | 3.656908e+09 | 19649.0 | ... | 0.000000 | 0.0 | 2.283480e+06 | 3.193028e+09 | 263.0 | NaN | NaN | NaN | NaN | 38110.0 |
4 | DAD0012018 | DAD001 | Andorran Red Cross | Europe and Central Asia | AND | 2018 | 1.084000e+03 | 9.0 | 1.177623e+06 | 20.0 | ... | 0.000000 | 0.0 | 2.901000e+03 | 1.379745e+06 | 1.0 | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
763 | DVE0012021 | DVE001 | Venezuelan Red Cross | Americas | VEN | 2021 | 3.175000e+02 | 11.0 | 0.000000e+00 | 742.0 | ... | 170.500000 | 126319.0 | 6.533000e+03 | 0.000000e+00 | 42.0 | NaN | NaN | NaN | NaN | NaN |
764 | DVN0012021 | DVN001 | Vietnam Red Cross Society | Asia Pacific | VNM | 2021 | 1.383143e+06 | 5.0 | 5.918624e+06 | 16806.0 | ... | 19931.333333 | 164013.0 | 1.301043e+05 | 2.976457e+06 | 17127.0 | NaN | NaN | NaN | NaN | NaN |
765 | DYE0012021 | DYE001 | Yemen Red Crescent Society | Middle East and North Africa | YEM | 2021 | 0.000000e+00 | 6.0 | 6.513051e+06 | 454.0 | ... | 99003.000000 | 419470.0 | 3.404000e+03 | 5.273168e+06 | 32.0 | 18300.0 | NaN | NaN | 41879.0 | 1769102.0 |
766 | DZM0012021 | DZM001 | Zambia Red Cross Society | Africa | ZMB | 2021 | 0.000000e+00 | 10.0 | 2.930602e+06 | 66.0 | ... | 169.000000 | 238278.0 | 1.720000e+03 | 2.896418e+06 | 59.0 | 0.0 | 0.0 | NaN | 0.0 | 382.0 |
767 | DZW0012021 | DZW001 | Zimbabwe Red Cross Society | Africa | ZWE | 2021 | 0.000000e+00 | 16.0 | 3.674074e+06 | 150.0 | ... | 5383.000000 | 287572.0 | 1.072267e+04 | 3.380148e+06 | 157.0 | NaN | NaN | NaN | NaN | NaN |
768 rows × 29 columns
kpi_code_imp= [s + "_IP" for s in kpi_code]
kpi_code_binary = [s + "_binary" for s in kpi_code]
#kpi_code=["KPI_PeopleVol_Tot","KPI_PStaff_Tot","KPI_DonBlood_Tot","KPI_TrainFA_Tot","KPI_noLocalUnits","KPI_ReachDRER_CPD","KPI_ReachLTSPD_CPD","KPI_ReachDRR_CPD","KPI_ReachS_CPD","KPI_ReachL_CPD","KPI_ReachH_CPD","KPI_ReachWASH_CPD","KPI_ReachM_CPD","KPI_ReachCTP_CPD","KPI_ReachSI_CPD","KPI_IncomeLC_CHF","KPI_expenditureLC_CHF"]
for kpi in list(kpi_code):
kpi_code_imp=kpi+"_IP"
kpi_code_binary=kpi+"_binary"
fdrs_data_fdrs_na_columns.loc[time_series_imputed[kpi_code_imp] == fdrs_data_fdrs_na_columns[kpi],kpi_code_binary]=0 #0 if same
fdrs_data_fdrs_na_columns.loc[time_series_imputed[kpi_code_imp] != fdrs_data_fdrs_na_columns[kpi],kpi_code_binary]=1 #1 if different
fdrs_data_fdrs_na_columns.loc[fdrs_data_fdrs_na_columns[kpi].isna(),kpi_code_binary]=2
#To be uncommented when the data is ready
fdrs_data_fdrs_na_columns.loc[fdrs_data_fdrs_na_columns[kpi_code_binary]==2,kpi_code_binary]=0
kpi_code_imp
'KPI_Climate_CPD_IP'
print(time_series_imputed[kpi_code_imp].dtypes)
KPI_Climate_CPD_IP float64 KPI_ClimateHeat_CPD_IP float64 KPI_ReachCTP_CPD_IP float64 KPI_ReachDRER_CPD_IP float64 KPI_ReachDRR_CPD_IP float64 KPI_ReachH_CPD_IP float64 KPI_ReachHI_CPD_IP float64 KPI_ReachHPM_CPD_IP float64 KPI_ReachL_CPD_IP float64 KPI_ReachLTSPD_CPD_IP float64 KPI_ReachM_CPD_IP float64 KPI_ReachRCRCEd_CPD_IP float64 KPI_ReachS_CPD_IP float64 KPI_ReachSI_CPD_IP float64 KPI_ReachWASH_CPD_IP float64 KPI_TrainFA_Tot_IP float64 KPI_DonBlood_Tot_IP float64 KPI_GB_Tot_IP float64 KPI_PeopleVol_Tot_IP float64 KPI_PStaff_Tot_IP float64 KPI_noLocalUnits_IP float64 KPI_IncomeLC_CHF_IP float64 KPI_expenditureLC_CHF_IP float64 dtype: object
print(fdrs_data_fdrs_na_columns[kpi_code].dtypes)
KPI_Climate_CPD float64 KPI_ClimateHeat_CPD float64 KPI_ReachCTP_CPD float64 KPI_ReachDRER_CPD float64 KPI_ReachDRR_CPD float64 KPI_ReachH_CPD float64 KPI_ReachHI_CPD float64 KPI_ReachHPM_CPD float64 KPI_ReachL_CPD float64 KPI_ReachLTSPD_CPD float64 KPI_ReachM_CPD float64 KPI_ReachRCRCEd_CPD float64 KPI_ReachS_CPD float64 KPI_ReachSI_CPD float64 KPI_ReachWASH_CPD float64 KPI_TrainFA_Tot float64 KPI_DonBlood_Tot float64 KPI_GB_Tot float64 KPI_PeopleVol_Tot float64 KPI_PStaff_Tot float64 KPI_noLocalUnits float64 KPI_IncomeLC_CHF float64 KPI_expenditureLC_CHF float64 dtype: object
time_series_imputed.to_excel("C:/Users/simon.weiss/OneDrive - IFRC/Tech talk and files/1_PowerBI/1_FDRS/0.Data_Sources/1_FDRS_Wide_Format_IP_ALL_Staging.xlsx",index=False)
2 methods:
payload={}
files={}
headers = {}
#read private fdrs api key from config files
f = open(module_path+"/src/config/api_key.txt", "r")
api_key=f.readline()
[User Input] Select Years to imputed below
years=[2021]
[User Input] Variables to impute
#kpi_code=["KPI_PeopleVol_Tot","KPI_PStaff_Tot","KPI_DonBlood_Tot","KPI_TrainFA_Tot","KPI_noLocalUnits","KPI_ReachDRER_CPD","KPI_ReachLTSPD_CPD","KPI_ReachDRR_CPD","KPI_ReachS_CPD","KPI_ReachL_CPD","KPI_ReachH_CPD","KPI_ReachWASH_CPD","KPI_ReachM_CPD","KPI_ReachCTP_CPD","KPI_ReachSI_CPD","KPI_IncomeLC_CHF","KPI_expenditureLC_CHF"]
kpi_code=["KPI_PeopleVol_Tot"]
for kpi in list(kpi_code):
for year in years:
for kpi_don_code in list(set(fdrs_data_fdrs_na_columns["KPI_DON_code"])):
for label, content in (fdrs_data_fdrs_na_columns[kpi][(fdrs_data_fdrs_na_columns["KPI_Year"]==year)&(fdrs_data_fdrs_na_columns["KPI_DON_code"]==kpi_don_code)]).items():
value = content
if type(value)==float:
if not np.isnan(value):
value=np.int64(value)
kpi_post=kpi+"_IP"
url = (f"https://data-api-staging.ifrc.org/api/ImputedKPI?apiKey={api_key}&kpicode={kpi_post}&year={year}&don_code={kpi_don_code}&value={value}&user=simon.weiss@ifrc.org")
r = requests.post(url)
print(url,r.text)
#fdrs_data_fdrs_na_columns_test.fillna(0,inplace=True)
for kpi in list(kpi_code):
kpi_binary = kpi + "_binary"
for year in years:
for kpi_don_code in list(set(fdrs_data_fdrs_na_columns["KPI_DON_code"])):
for bi in (fdrs_data_fdrs_na_columns[kpi_binary][(fdrs_data_fdrs_na_columns["KPI_Year"]==year)&(fdrs_data_fdrs_na_columns["KPI_DON_code"]==kpi_don_code)]):
binary = bi
if binary == 1.0:
for label, content in (fdrs_data_fdrs_na_columns[kpi][(fdrs_data_fdrs_na_columns["KPI_Year"]==year)&(fdrs_data_fdrs_na_columns["KPI_DON_code"]==kpi_don_code)]).items():
value = content
if type(value)==float:
if not np.isnan(value):
value=np.int64(value)
value = value
#if value == "nan":
if np.isnan(value):
value = ""
kpi_post=kpi+"_IP"
url = (f"https://data-api-staging.ifrc.org/api/ImputedKPI?apiKey={api_key}&kpicode={kpi_post}&year={year}&don_code={kpi_don_code}&value={value}&user=simon.weiss@ifrc.org")
r = requests.post(url,headers=headers, data=payload, files=files)
#print(url,r.text)
#print(r.request.body)
#print(r.request.headers)
df1 = pd.DataFrame([datetime.now()], columns=['Lattest Update Date'])
df1.to_excel(module_path+"data/processed/lattest_update.xlsx", index=False)
Time to execute all : before optimization : 111min
After optimization : 2min