Instance vs Static: A tale of memory leaks and OOP in Python

2019-04-27
3 minutes

Object-Oriented Programming (OOP) teaches that classes can have two kinds of attributes: Instance and Static. Instance variables are attached to a specific instance of the class, and each has separate memory locations. Static variables are tied to the class itself, and are shared between instances.

The difference between the two can be seen clearly in OOP-purist languages like Java. static denotes that the variable is static and attached to the class directly.

Java
public class A {
    public static int staticIndex = 0;
    public int instanceIndex = 0;
}


A a = new A();
A b = new B();

a.staticIndex += 1;
a.instanceIndex += 1;
b.staticIndex += 1;
b.instanceIndex += 1;

a.staticIndex; // 2
a.instanceIndex; // 1
b.staticIndex; // 2
b.instanceIndex; // 1
A.staticIndex; // 2

As programming goes, this working exactly how it should. The static variable is shared between each instantiation.

Python, in its infinite wisdom, doesn’t have any way of denoting whether a variable is static or not when defined at the class level, it just has variables. Taking the above code and mapping it as-is to Python yields some interesting results:

Python
class A():
    static_index = 0
    instance_index = 0


a = A()
b = B()

a.static_index += 1
a.instance_index += 1
b.static_index += 1
b.instance_index += 1


a.static_index  # 2
a.instance_index  # 2
b.static_index  # 2
b.instance_index  # 2

This is definitely not what’s wanted. Because Python doesn’t have any modifier for whether a variable is static, all variables defined at the class level are considered static. To create instance-level variables, these need to be defined in the constructor:

Python
class A():
    static_index = 0

    def __init__(self):
        self.instance_index = 0

This then works as expected.

#Memory leaks

Using a class to handle upstream API interaction is fairly common practice. However, if variables were defined on the class for convenience in defining their initial value or type, this could lead to unnecessary allocation and persistence.

Let’s see an example:

Python
class APIWrapper():
    data = {'page': 0, 'items': []}

    def fetch_data():
        response = requests.get('https://api.com/items.json').json()
        self.data['items'].append(response['items'])

With the above very simple example, we define a way of fetching data from an API, and access it using the .data attribute. In testing, this will almost certainly work fine. However, if the class gets re-instantiated, data persists between each instance, as it’s technically static, and so the new instance already has the previous data, before fetching data. After the data has been fetched, .data now contains duplicate information.

In use cases such as web application (e.g. Django), where threads are reused often to increase performance, this can lead to data persisting between responses, and hanging around in memory until the thread is disposed of (not technically true, but Python’s GC is a whole separate discussion). There’s no telling what madness could happen if this were mixed with asyncio.

If it were overriding the 'items' list rather than appending to it, the issue wouldn’t exist. If it were modifying specific keys of a dictionary, and only reusing those same keys, it would be missed just as easily, manifesting as simply higher than normal memory usage.

#The solution

There’s two key ways of fixing the above code: either do OOP properly (in Python), or stop storing things at the object level entirely. I personally lean much more towards the latter, although it may involve a larger refactor.

Python
class APIWrapper():
    def __init__(self):
        self.data = {'page': 0, 'items': []}

    def fetch_data():
        response = requests.get('https://api.com/items.json').json()
        self.data['items'].append(response['items'])

# or...

class APIWrapper():
    def fetch_data():
        data = {'page': 0, 'items': []}
        response = requests.get('https://api.com/items.json').json()
        data['items'].append(response['items'])

Both of these solve the issue by recreating the 'items' list for each instance of the variable, rather than once when the class is defined. This means when all references to the class are dropped, so too is the data related to it.

#The takeaway

If there’s one thing to take from this, it’s not to store things at the object level unless you really have to. Fetching from an API likely doesn’t need to store the responses themselves on the instance, and definitely doesn’t statically. Defining variables on the class and in the constructor do different things in Python vs other languages.

Share this page

Similar content

View all →

None

COUNTing is hard: A tale of types in SQL and Django

2023-06-03
5 minutes

Using an ORM adds a handy layer between you and the database tables. Sure, you can write SQL, but most developers find it easier to think of a list of complex objects than they do a collection of rows. However, it's important to remember what's going on under the hood…

Busy freeway traffic at night

Django ORM Performance

2020-06-07
10 minutes

Django already does some pretty incredible things when it comes to performance, especially in the ORM layer. The lazy loading, fluent interface for querying means it’ll only fetch the data you need, when you need it. But it can’t handle everything for you, and often needs some help to work…

" Identity layers " 
By penetrating any lower layer of a person, you can see a different face of that character and his appearance gradually disappears .

Changing the user-agent of urllib

2023-11-28
3 minutes

If you need to make HTTP requests in Python, you're probably using the fantastically simple and ergonomic requests library (or httpx in the async world). However, it's another dependency to manage and keep up-to-date. If you want to make HTTP requests without an additional dependency, or another library author has…