Data Scientist’s Guide to Efficient Coding in Python

We give real-life coding scenarios in which we have actually used each of these tips!

1. Use tqdm when working with for loops.

Imagine looping over a large iterable (list, dictionary, tuple, set) with no way of knowing whether the code has finished running! Bummer, right? In such scenarios, wrap the iterable with tqdm to display a progress bar alongside the loop.

For instance, to display the progress as I read through all the files present in 44 different directories (whose paths I have already stored in a list called fpaths):

import os
from tqdm import tqdm
files = list()
fpaths = ["dir1/subdir1", "dir2/subdir3", ...]

for fpath in tqdm(fpaths, desc="Looping over fpaths"):
    files.extend(os.listdir(fpath))  # e.g. collect the file names in each directory

Using tqdm with a “for” loop

Note: Use the desc argument to give the loop a short description; tqdm prints it as a prefix to the progress bar.
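The loop above iterates over a list, whose length tqdm reads with len() to draw a complete bar. Generators have no length, so for them tqdm can only show a running count unless you pass total=. The sketch below uses a hypothetical helper, sum_of_squares(), to show the desc, total, and disable arguments together; disable is handy for switching the bar off in tests or logs.

```python
from tqdm import tqdm


def sum_of_squares(n, show_progress=True):
    """Hypothetical helper: sum i*i for i in range(n) with a tqdm progress bar."""
    squares = (i * i for i in range(n))  # a generator, so it has no len()
    total = 0
    # total= tells tqdm how many items to expect, so it can show a percentage
    # and an ETA; desc= labels the bar; disable=True hides the bar entirely.
    for value in tqdm(squares, total=n, desc="Summing squares",
                      disable=not show_progress):
        total += value
    return total


print(sum_of_squares(1_000))  # prints 332833500 (the bar goes to stderr)
```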

2. Use type hinting when writing functions.

In simple terms, it means explicitly stating the types of the arguments, and of the return value, in your Python function definition.
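As a small illustration (mean_accuracy() below is a hypothetical function, not from the original scenarios), here is a signature annotated with argument and return types. The annotations are stored on the function object, where typing.get_type_hints() can read them, but Python does not check them when the function is called.

```python
from typing import List, Optional, get_type_hints


def mean_accuracy(scores: List[float],
                  weights: Optional[List[float]] = None) -> float:
    """Hypothetical helper: (optionally weighted) mean of accuracy scores."""
    if weights is None:
        weights = [1.0] * len(scores)  # equal weights by default
    if len(weights) != len(scores):
        raise ValueError("scores and weights must have the same length")
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# The annotations can be inspected at runtime.
print(get_type_hints(mean_accuracy))

# Hints are not enforced at runtime: a tuple is accepted here even though
# the signature says List[float].
print(mean_accuracy((0.9, 0.8)))
```

Running a static checker such as mypy over this file would flag the tuple argument, even though the call succeeds when the script runs.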

I wish there were specific use cases I could point to that show when I use type hinting in my work, but the truth is, I use it more often than not.

Here’s a hypothetical example: the function update_df() updates a given data frame by appending a row with useful information from a simulation run, such as the classifier used, the accuracy score, the train-test split size, and any additional remarks for that particular run.

import pandas as pd
from typing import List

def update_df(df: pd.DataFrame,
              clf: str,
              acc: float,
              remarks: List[str] = [],  # shared default list: never mutate it in the body
              split: float = 0.5) -> pd.DataFrame:
 new_row = {'Classifier'<