How to Use Static Analysis Tools on Python Code

The Tech Platform
Dec 13, 2022
Updated: Dec 14, 2022

Static code analyzers have become a significant aid to application development over the past few years. Instead of discovering code issues or vulnerabilities in a production system or deployment, static analysis tells us where the code is likely to fail, based on type annotations and other hints in the code.

What is Static Code Analysis?

Static code analysis is simply the examination of a program’s source code without actually running it. This allows you to find errors in your code before they become a problem. Static analysis can also find vulnerabilities in your code, helping you make it more secure.
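To make “without actually running it” concrete, here is a minimal sketch that uses Python’s standard ast module to parse a source string and report function parameters that lack type annotations. The sample source and the find_unannotated helper are made up for illustration; real analyzers such as MyPy go much further than this.

```python
import ast

# Hypothetical source to inspect; it is parsed, never executed.
SAMPLE = '''
def safe_get(items: list, i: int):
    return items[i]

def find_node(node, path, search):
    return node
'''


def find_unannotated(source: str) -> dict[str, list[str]]:
    """Map each function name to its parameters that lack a type annotation."""
    tree = ast.parse(source)  # builds a syntax tree only
    report: dict[str, list[str]] = {}
    for item in ast.walk(tree):
        if isinstance(item, ast.FunctionDef):
            missing = [arg.arg for arg in item.args.args if arg.annotation is None]
            if missing:
                report[item.name] = missing
    return report


print(find_unannotated(SAMPLE))  # {'find_node': ['node', 'path', 'search']}
```

Nothing in SAMPLE is ever called, which is the point: the report comes purely from the structure of the code.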

Python Code Analysis

To show just how robust static analysis can be, I’m going to write this application entirely in theory, on a computer that doesn’t even have Python installed.

Then I’ll set up an analyzer and follow some of the prompts it produces. Modern scanners go beyond code style and formatting to address SAST vulnerabilities, null checks, and adherence to coding standards, which we’ll try to demonstrate with this example project.

Let’s start with an application that checks whether a node in a binary tree contains certain information and prints out the path to it. It takes in a file that is broken up into node and relationship definitions. It parses those definitions into a tree, then searches for a specific node within that tree.

import os

class node:
    def __init__(self, id, ele, root) -> None:
        self.id = id
        self.root = root
        self.ele = ele
        self.left = None
        self.right = None

def filter(dict, callback):
    newDict = {}
    for key, value in dict.items():
        if callback(key, value):
            newDict[key] = value
    return newDict

def safe_get(list: list, i: int):
    try:
        return list[i]
    except IndexError:
        return None

def relationship_observer(dict, idx, line):
    id, *rest = line.split(' ')
    left_id = safe_get(rest, 0)
    right_id = safe_get(rest, 1)
    if left_id:
        dict[id].left = dict[left_id]
    if right_id:
        dict[id].right = dict[right_id]
        
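def build_trees():
    dict = {}
    for path, subdirs, files in os.walk('/tmp'):
        for name in files:
            filePath = os.path.join(path, name)
            # Read the whole file, closing the handle when done
            with open(filePath) as file:
                lines = file.read().splitlines()
            observer = None
            for idx, line in enumerate(lines):
                if line == 'nodes':
                    # node_observer is referenced here but not shown in this listing
                    observer = node_observer
                elif line == 'relationships':
                    observer = relationship_observer
                else:
                    observer(dict, idx, line)
    # filter() calls the callback with (key, value), so the lambda takes both
    return [filter(dict, lambda key, value: value.root), dict]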
    
[roots, nodes] = build_trees()

print(f'Root Count: {len(roots)}')
print(f'Node Count: {len(nodes)}')

def findNode(node, path, search):
    if(node.data == search):
        return node, path
    else:
        left = findNode(node.left, path.copy().append(node.left.id))
        if(left.data == search):
            return left, path
        right = findNode(node.right, path.copy().append(node.right.id))
        if(right.data == search):
            return right, path
    return None, []

search = 'FindMe!'

for root in roots:
    rootNode = roots[root]
    target_node, path = findNode(rootNode, [], search)
    if(target_node != None):
        print(f'Root node {rootNode.id}{rootNode.data} contains {search} under node {target_node.id} ({" => ".join(path)})')
        break
    else:
        print("Not Found")
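The listing above refers to a node_observer function without showing it. A minimal sketch of what it might look like follows, assuming (this format is not given in the article) that each line of the nodes section has the form “id ele root”, separated by single spaces, with root written as 1 or 0. The node class is repeated so the snippet runs on its own, and the sample lines are invented.

```python
class node:
    """Same shape as the node class in the listing above."""

    def __init__(self, id, ele, root) -> None:
        self.id = id
        self.root = root
        self.ele = ele
        self.left = None
        self.right = None


def node_observer(nodes, idx, line):
    # Assumed (hypothetical) line format: "<id> <ele> <root>",
    # where <root> is "1" for a root node and "0" otherwise.
    node_id, ele, root_flag = line.split(' ')
    nodes[node_id] = node(node_id, ele, root_flag == '1')


# Hypothetical contents of the "nodes" section of an input file.
sample_lines = ['a Start 1', 'b Middle 0', 'c FindMe! 0']

nodes = {}
for idx, line in enumerate(sample_lines):
    node_observer(nodes, idx, line)

print(sorted(nodes))       # ['a', 'b', 'c']
print(nodes['a'].root)     # True
```

The signature mirrors relationship_observer so that build_trees can swap between the two observers without caring which section it is reading.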

In addition to the core application code, we’ll have a Dockerfile:

FROM python:3
WORKDIR /app

COPY requirements.txt ./

RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["python", "./main.py"]

And a docker-compose file to run the application:

services:
  python:
    build: .
    volumes:
      - ./tmp:/tmp

Let’s get some basics down first: we’re going to install the MyPy extension for VS Code, which gives us static type checking. After installation we get some immediate feedback: the findNode call on L74 is underlined in red.

This is some good info! MyPy immediately detects that we are missing a positional argument in the findNode call: the path. We can add it quickly by changing the line to:

target_node, path = findNode(root, '', search)

MyPy doesn’t complain about passing a string here, though. This is because we’re missing type annotations that say what we expect this value to be. Let’s update the function to have some type annotations:

def findNode(node: node, path: list[str], search: str):

Almost immediately we get feedback that we’re passing the wrong type if we keep this argument as a string.

Switch it over to a list literal, updating the call to be:

target_node, path = findNode(root, [], search)

But now we have a new problem: several of the parameters are flagged as having incorrect types, and the findNode body is covered in red underlines.

This is because the node class defined in the file has no annotations, so MyPy has worked out the types of its attributes as best it can from the code as written. It needs some more help to clear up these issues.

Let’s fix that by giving the node class the annotations it needs. We’ll import Optional from the typing module, then use it to correct some of the static typing failures:

from typing import Optional

class node:
    def __init__(self, id: str, data: str, root: int,
                 left: Optional['node'], right: Optional['node']) -> None:
        self.id = id
        self.root = root
        self.data = data
        self.left = left
        self.right = right
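As a variation on the class above, and only as a sketch, left and right could default to None so that a node can be created before its children exist. The left_id helper is hypothetical; it shows the kind of None check a type checker expects before an Optional attribute is used.

```python
from typing import Optional


class node:
    def __init__(self, id: str, data: str, root: int,
                 left: Optional['node'] = None,
                 right: Optional['node'] = None) -> None:
        self.id = id
        self.root = root
        self.data = data
        self.left = left
        self.right = right


def left_id(n: node) -> Optional[str]:
    # n.left may be None, so check before touching its attributes
    if n.left is None:
        return None
    return n.left.id


parent = node('a', 'Start', 1)
print(left_id(parent))   # None

parent.left = node('b', 'FindMe!', 0)
print(left_id(parent))   # b
```

With the defaults in place, the tree can be loaded from root to leaves and the children attached afterwards.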

Now we’re seeing some more interesting output from MyPy in the findNode method — both the internal findNode call and the path.copy().append are underlined in red.

When we hover over the red underlines for the call to findNode, we see the error: Argument 1 to "findNode" has incompatible type "Optional[node]"; expected "node" [arg-type] mypy(error)

Note that we’re making the left and right node attributes optional as we want to load in the tree definitions from root to leaves so we won’t have the left and right nodes until they are created and the relationships enforced.

Because of this, MyPy is warning us that we’re trying to access an attribute on a value that might be None, which would raise an error at runtime. Let’s add some quick guard code to clear up the analysis error:

def findNode(node: node, path: list[str], search: str):
    if(node.data == search):
        return node, path
    else:
        if(node.left):
            left = findNode(node.left, path.copy()
                            .append(node.left.id), search)
            if(left.data == search):
                return left, path
        if(node.right):
            right = findNode(node.right, path.copy()
                            .append(node.right.id), search)
            if(right.data==search):
                return right, path
    return None, []

This clears up the first error. However, when we hover over the red underlines for the call to path.copy().append, we see another error: "append" of "list" does not return a value [func-returns-value] mypy(error)

So the code we put down looks like it does what we want, but list.append mutates the list in place and returns None. The recursive call would have received None instead of the extended path, and the program would have failed at runtime.
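The behaviour is easy to confirm in isolation. This short sketch, with made-up values, shows what the chained call evaluates to and two ways of building an extended copy in a single expression:

```python
path = ['a', 'b']

# append() mutates the copy in place and returns None,
# so the whole expression evaluates to None.
broken = path.copy().append('c')

# Two expressions that do produce a new, extended list.
extended = path + ['c']
also_extended = [*path, 'c']

print(broken)          # None
print(extended)        # ['a', 'b', 'c']
print(also_extended)   # ['a', 'b', 'c']
print(path)            # ['a', 'b'] (the original is untouched)
```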

Luckily the static analysis has given us a heads up so we didn’t waste precious time trying to debug why the app was crashing or operating in unexpected ways. Let’s quickly write up a corrected set of commands:

def findNode(node: node, path: list[str], search: str):
    if node.data == search:
        return node, path
    if node.left:
        left_path = path.copy()
        left_path.append(node.left.id)
        # findNode returns a (node, path) pair, so unpack it before testing
        left, found_path = findNode(node.left, left_path, search)
        if left is not None:
            return left, found_path
    if node.right:
        right_path = path.copy()
        right_path.append(node.right.id)
        right, found_path = findNode(node.right, right_path, search)
        if right is not None:
            return right, found_path
    return None, []
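To check the search end to end, here is a self-contained sketch on a hypothetical three-node tree. The names find_node, current, and tree are chosen for this example, and the node class is trimmed to the fields the search needs. The explicit return annotation is an addition: with it, a type checker can tell that each call yields a (node, path) pair rather than a bare node.

```python
from typing import Optional


class node:
    def __init__(self, id: str, data: str,
                 left: Optional['node'] = None,
                 right: Optional['node'] = None) -> None:
        self.id = id
        self.data = data
        self.left = left
        self.right = right


def find_node(current: node, path: list[str],
              search: str) -> tuple[Optional[node], list[str]]:
    """Depth-first search returning the matching node and the ids leading to it."""
    if current.data == search:
        return current, path
    for child in (current.left, current.right):
        if child is not None:
            # path + [...] builds a new list, so sibling branches never share state
            found, found_path = find_node(child, path + [child.id], search)
            if found is not None:
                return found, found_path
    return None, []


# Hypothetical tree: a -> (b, c), with the target stored in c.
tree = node('a', 'Start', node('b', 'Middle'), node('c', 'FindMe!'))
target, route = find_node(tree, [tree.id], 'FindMe!')
print(target.id if target else None, ' => '.join(route))   # c a => c
```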

These are major issues with our application code that we discovered before ever actually running the code. This saved us quite a lot of time in trying to figure out why the output of the code is incorrect.

Bandit

Let’s also take a look at another tool for scrubbing our code for security issues: Bandit, a great tool for SAST vulnerability scanning. Following its documentation, Bandit is really easy to set up. Once it’s installed on the local system, we can just switch our linter to Bandit and run it against the current main.py file.

Bandit flags our use of /tmp, and it is correct. Our app code currently reads from /tmp, a top-level directory on Linux systems and a probable attack vector.

We should reference data from a local folder related to the application code, or allow for parameterized entries at the code entry point.

As this is our only warning, we’ll just switch to a local folder under ./data and modify the docker-compose to drop the data folder contents into the new local folder.
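One way to sketch the “parameterized entries” option is to resolve the input folder at startup from a command-line flag, falling back to an environment variable and then to ./data. The --data-dir flag and the APP_DATA_DIR variable are hypothetical names chosen for this example.

```python
import argparse
import os
from pathlib import Path


def resolve_data_dir(argv: list[str]) -> Path:
    """Pick the input folder from --data-dir, then APP_DATA_DIR, then ./data."""
    parser = argparse.ArgumentParser(description='Tree search input location')
    parser.add_argument('--data-dir',
                        default=os.environ.get('APP_DATA_DIR', './data'))
    args = parser.parse_args(argv)
    return Path(args.data_dir)


# No flag given: falls back to the environment variable or ./data
print(resolve_data_dir([]))

# Explicit flag wins over both fallbacks
print(resolve_data_dir(['--data-dir', 'fixtures']))
```

build_trees could then take the resolved path as an argument and pass it to os.walk instead of a hardcoded string.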

If you check out Bandit’s open-source repository, you’ll also find a number of example files that you can import into the project to see other vulnerabilities Bandit can expose in your own code. Next, let’s gather some metrics about our code.

Radon

We can use Radon for this. It’s an open-source package for generating all sorts of interesting code metrics. After a quick and easy installation, we can point radon at our code file with the cc (cyclomatic complexity) command to get some cool metrics:

It doesn’t look like we’re getting any particularly scary results for cyclomatic complexity (which can be a leading indicator for maintainability) so that’s good.

An average complexity this low is a success, but in a larger codebase we might not want to see the low-complexity results at all. Radon’s cc command accepts -nc (short for --min C), which shows only results ranked “C” or worse.
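For intuition about what the number measures, here is a simplified approximation built on the standard ast module: one plus the number of decision points in each function. It is a sketch, not Radon’s implementation, and Radon counts more constructs than this does. The sample functions are adapted from the listing earlier in the article.

```python
import ast

# Node types treated as decision points in this simplified count.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approx_complexity(source: str) -> dict[str, int]:
    """Return 1 + the number of decision points for each top-level function."""
    scores: dict[str, int] = {}
    for item in ast.parse(source).body:
        if isinstance(item, ast.FunctionDef):
            score = 1
            for child in ast.walk(item):
                if isinstance(child, BRANCH_NODES):
                    score += 1
                elif isinstance(child, ast.BoolOp):
                    # `a and b and c` adds two extra paths
                    score += len(child.values) - 1
            scores[item.name] = score
    return scores


SAMPLE = '''
def safe_get(items, i):
    try:
        return items[i]
    except IndexError:
        return None

def pick(entries, callback):
    out = {}
    for key, value in entries.items():
        if callback(key, value):
            out[key] = value
    return out
'''

print(approx_complexity(SAMPLE))   # {'safe_get': 2, 'pick': 3}
```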

Let’s also check out some core metrics about the code with the raw command:

This prints out some great information about lines of code, differentiating logical lines of code (LLOC) from source lines of code (SLOC), comments, and what is essentially comment coverage. Let’s add two comments and see what the difference is:
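As a rough illustration of how comments shift these numbers, the sketch below counts total lines, blank lines, and comments for a snippet before and after two comments are added. These are simplified counts computed with the standard tokenize module, not Radon’s definitions of LLOC and SLOC, and the snippets are invented.

```python
import io
import tokenize


def raw_counts(source: str) -> dict[str, int]:
    """Count total lines, blank lines, and comments in a source string."""
    lines = source.splitlines()
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    comments = sum(1 for tok in tokens if tok.type == tokenize.COMMENT)
    return {
        'loc': len(lines),
        'blank': sum(1 for line in lines if not line.strip()),
        'comments': comments,
    }


BEFORE = "import os\n\nprint(os.getcwd())\n"
AFTER = ("# Show the working directory\n"
         "import os\n\n"
         "print(os.getcwd())  # current dir\n")

print(raw_counts(BEFORE))   # {'loc': 3, 'blank': 1, 'comments': 0}
print(raw_counts(AFTER))    # {'loc': 4, 'blank': 1, 'comments': 2}
```

Note that the trailing comment raises the comment count without adding a line, which is why line totals and comment totals are reported separately.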

The Halstead complexity metrics, available through Radon’s hal command, are another really interesting set of metrics:

This command can also report its metrics for every function in the code files, rather than per file, using the -f flag.
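For reference, the core Halstead measures are derived from four counts: the number of distinct operators and operands, and the total occurrences of each. The sketch below computes them from counts made by hand for a one-line example. How Radon tokenizes operators and operands may differ, so its numbers for the same line need not match.

```python
import math


def halstead(n1: int, n2: int, N1: int, N2: int) -> dict[str, float]:
    """Derive Halstead measures from operator and operand counts.

    n1, n2: distinct operators and operands; N1, N2: total occurrences.
    """
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)
    difficulty = (n1 / 2) * (N2 / n2)
    return {
        'vocabulary': vocabulary,
        'length': length,
        'volume': volume,
        'difficulty': difficulty,
        'effort': difficulty * volume,
    }


# Hand count for `x = a + b`: operators {=, +}, operands {x, a, b}.
result = halstead(n1=2, n2=3, N1=2, N2=3)
print(result)
```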

Another interesting command is mi. It prints out a maintainability index, which collects information from several of the other commands to give your code an overall rating for how easy it would be to maintain:
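The widely cited three-factor form of the maintainability index combines Halstead volume, cyclomatic complexity, and source lines of code. The sketch below implements that form, rescaled to a 0 to 100 range. Radon’s documented formula adds a comment-based term, so treat this as an illustration of the idea rather than a reimplementation. The input values are invented.

```python
import math


def maintainability_index(volume: float, complexity: float, sloc: int) -> float:
    """Three-factor maintainability index, rescaled to 0-100 and clamped at 0."""
    raw = (171
           - 5.2 * math.log(volume)      # Halstead volume
           - 0.23 * complexity           # cyclomatic complexity
           - 16.2 * math.log(sloc))      # source lines of code
    return max(0.0, raw * 100 / 171)


# Hypothetical inputs for a small module.
score = maintainability_index(volume=200.0, complexity=3, sloc=40)
print(round(score, 1))
```

Higher is better: the score drops as any of the three inputs grows, and the clamp keeps very large files from producing negative values.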