Multi processing bug fix #1026

Ernaldis · 2022-05-18T14:57:07Z

This PR proposes a fix for the bug described in #1013. In brief, when a user calls populate on a table that has already been populated while setting processes to a value greater than 1, the population should be skipped, just as it is when processes equals 1. Instead, a ValueError is raised.

This PR also changes the return value to include the number of keys processed, which can be useful when automating workflows.

Closes #1013

Standardize return

Ernaldis · 2022-05-18T15:00:54Z

Note to self: add unit tests which check return value of populate runs

dimitri-yatsenko · 2022-05-18T17:44:56Z

datajoint/autopopulate.py

@@ -252,8 +256,7 @@ def handler(signum, frame):
        if reserve_jobs:
            signal.signal(signal.SIGTERM, old_handler)

-        if suppress_errors:
-            return error_list
+        return error_list, nkeys


why does populate need to return nkeys now?

This can be useful in automating workflows, since it allows us to determine how many keys were found to be processed.
For example, this feature is used in the neurophotonics project in check_db. It allows the EC2 instance to keep working until the user stops inserting keys into the database, which covers keys inserted after the instance has started but before the initial workload is finished.
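
A minimal sketch of the automation pattern described above, assuming populate returns the (error_list, nkeys) pair shown in the diff. The names drain and fake_populate are hypothetical stand-ins for illustration only and are not part of datajoint or check_db:

```python
from typing import Callable, List, Tuple


def drain(populate: Callable[[], Tuple[List, int]], max_rounds: int = 100) -> int:
    """Call populate() until it reports no keys left; return the total keys seen."""
    total = 0
    for _ in range(max_rounds):
        errors, nkeys = populate()
        if nkeys == 0:
            # Nothing was found to process, so the worker can stop.
            break
        total += nkeys
    return total


# Hypothetical stand-in for a table's populate call:
# 3 keys at first, 2 keys inserted later, then nothing left.
_batches = iter([3, 2, 0])


def fake_populate():
    return [], next(_batches)


print(drain(fake_populate))  # 5
```

The loop only needs the key count; the error list is still available to callers that want it.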

Current users expect populate to return the error list. This change may break existing code. Changing the API should be done carefully.

That is a valid concern, but I think it is also useful to be able to tell how many keys were processed.

I notice that nothing is returned if suppress_errors == False, the default value. What if the return were left as just error_list when suppress_errors == True, and nkeys were returned only when suppress_errors == False?

dimitri-yatsenko · 2022-05-18T17:49:04Z

datajoint/autopopulate.py

+        if processes < 0:
+            raise Exception("processes must not be negative")
+        elif processes == 0:
+            return error_list, nkeys


Here error_list is empty. Why does it need to be returned? What's the use case for processes == 0?

In this case, error_list will indeed always be an empty list, but I think it is best to return it anyway. If we do it like this, then autopopulate will always return a list and an integer, regardless of which case this happens to be. Since we can always expect the same return type, we will never have to later worry about which case we will encounter.

As for the use case of processes == 0, this is the case which caused the error the PR is meant to solve. Consider the case in which you are using multiprocessing (that is, when you call this function, you have set processes to be greater than one), but the table you are populating has no keys remaining to process. This case occurred when automating the workflow of the neurophotonics project, since populate was being called on every table.

In this case, we would like for the function to do nothing and move on to the next task. Instead, it throws a value error. This is because of lines (215-216) in the autopopulate function.

    if processes > 1:
        processes = min(processes, nkeys, mp.cpu_count())

Since processes has been set to a value greater than 1, the second line here runs. Since the table has already been populated in this case, nkeys = 0. Since there is no reason for any of these three values to be negative, the minimum is 0. Thus processes=0 at this point.

Previously, if processes was not equal to 1, we would try to spawn a pool with that many processes. Since 0 != 1, we tried to create a multiprocessing pool of 0 processes, which raised a ValueError.

This line prevents this exception from being thrown by handling the case in which processes == 0. Now, the function simply reports that there were no errors and no keys to process, and moves on.
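
The reasoning above can be traced with a small sketch. It reproduces only the clamp quoted from autopopulate.py and the guard from this PR's diff; plan is a hypothetical helper that reports which branch would be taken, the CPU count is passed in as a plain number, and ValueError is an assumed choice of exception type:

```python
def effective_processes(processes, nkeys, cpu_count):
    """The clamp quoted above: applied only when more than one process is requested."""
    if processes > 1:
        processes = min(processes, nkeys, cpu_count)
    return processes


def plan(processes, nkeys, cpu_count=8):
    """Report which branch a populate-like function would take (illustration only)."""
    processes = effective_processes(processes, nkeys, cpu_count)
    if processes < 0:
        raise ValueError("processes must not be negative")
    if processes == 0:
        # The guard added in this PR: no keys remain, so skip quietly.
        return "skip"
    if processes == 1:
        return "serial"
    return "pool of %d" % processes


print(plan(4, nkeys=0))   # skip
print(plan(1, nkeys=0))   # serial
print(plan(4, nkeys=10))  # pool of 4
```

Without the processes == 0 branch, the first call would fall through to the pool branch with a size of 0, which is the case that raised the error.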

I think if nkeys <= 0, it should just quietly exit. That's consistent with previous behavior.

In the case nkeys == 0, it does quietly exit. It returns ([], 0), but it doesn't mutate any values, initialize new variables or print anything to the screen. Users don't have to use the return value, but they can if they want to.

The case nkeys < 0 should never occur. My understanding is that this value represents the number of keys which need to be processed. If this value is negative, an error has occurred.

datajoint/autopopulate.py

Make negative processes error more specific.

datajoint/autopopulate.py

Co-authored-by: Dimitri Yatsenko <dimitri@datajoint.com>

dimitri-yatsenko · 2022-05-18T22:02:52Z

@Ernaldis would you set the release date and actually make the release?

guzman-raphael

@Ernaldis Thanks for patching this! 🤝

I only have a small issue with the updated documentation, since it doesn't seem correct. It should be straightforward to address, though.

guzman-raphael · 2022-05-19T15:28:50Z

datajoint/autopopulate.py

@@ -173,8 +173,7 @@ def populate(
        :param limit: if not None, check at most this many keys
        :param max_calls: if not None, populate at most this many keys
        :param display_progress: if True, report progress_bar
-        :param processes: number of processes to use. When set to a large number, then
-            uses as many as CPU cores
+        :param processes: number of processes to use. Set to None to use all cores


This is a great idea to improve how this can be specified but doesn't seem to be true based on what is here.

If you do in fact set processes=None, the following would happen:

[216]: Should throw an error like: TypeError: '>' not supported between instances of 'NoneType' and 'int'

You could probably just replace that particular if-block with something like this:

min(nkeys, *([processes, mp.cpu_count()] if processes else [mp.cpu_count()]))

OR

min(*(_ for _ in (processes, nkeys, mp.cpu_count()) if _))

These approaches would leverage argument expansion to our benefit to expand only the relevant values into the min function.
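
A short sketch of the two points in this comment: comparing None with an integer fails, and the first suggested expression avoids that by leaving processes out of min() when it is falsy. The function name pick is hypothetical, and the CPU count is passed in as a plain number rather than read from mp.cpu_count():

```python
# In Python 3, ordering comparisons between None and int raise TypeError.
try:
    None > 1
except TypeError as exc:
    print(exc)


def pick(processes, nkeys, cpu_count):
    """First suggestion from the review: expand processes into min() only if it is truthy."""
    return min(nkeys, *([processes, cpu_count] if processes else [cpu_count]))


print(pick(None, 10, 8))  # 8
print(pick(4, 10, 8))     # 4
print(pick(None, 0, 8))   # 0
```

Note that nkeys always takes part in the comparison here, so an already populated table still yields 0.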

oops good point.

guzman-raphael

@Ernaldis This will still raise a TypeError. If you update this test with processes=None you will see what I am talking about. That test is meant for testing multiprocessing anyway. Probably you can just get rid of the if since it seems unnecessary. The default is already 1 which should always be the min.

dimitri-yatsenko · 2022-05-19T17:41:29Z

datajoint/autopopulate.py


-        if processes > 1:
-            processes = min(processes, nkeys, mp.cpu_count())
+        processes = min(*(_ for _ in (processes, nkeys, mp.cpu_count()) if _))


With this, processes=0 will have the same effect as processes=None.
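
The behavior noted here can be checked with a small sketch of the one-liner. The names pick_processes and pick_processes_safe are hypothetical, and the CPU count is passed in as a plain number. The sketch also shows an edge case: if only one value survives the filter, min() receives a single integer and raises TypeError. The second function is one possible variant, not the change that was merged:

```python
def pick_processes(processes, nkeys, cpu_count):
    """Smallest truthy value among the three, as in the one-liner in the diff."""
    return min(*(v for v in (processes, nkeys, cpu_count) if v))


print(pick_processes(None, 10, 8))  # 8
print(pick_processes(0, 10, 8))     # 8, same result as None
print(pick_processes(4, 10, 8))     # 4

# Edge case: processes=None and nkeys=0 leave only cpu_count after filtering,
# so the call becomes min(8), and min() rejects a single non-iterable argument.
try:
    pick_processes(None, 0, 8)
except TypeError as exc:
    print("TypeError:", exc)


def pick_processes_safe(processes, nkeys, cpu_count):
    """Hypothetical variant: pass one iterable so a single surviving value is fine."""
    return min(v for v in (processes, nkeys, cpu_count) if v)


print(pick_processes_safe(None, 0, 8))  # 8
```

Because falsy values are filtered out, nkeys == 0 no longer drives the result to 0 in either version, so the empty-table case would need to be handled separately.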

Ernaldis and others added 4 commits April 27, 2022 17:39

attempt bug fix

0e010cc

Merge branch 'datajoint:master' into master

186421c

Update autopopulate.py

b34a02a

Standardize return

Merge branch 'datajoint:master' into multi-processing-bug-fix

e8652f7

Ernaldis added the bug label May 18, 2022

Ernaldis requested review from drewyangdev, guzman-raphael and jverswijver May 18, 2022 14:57

Ernaldis self-assigned this May 18, 2022

dimitri-yatsenko reviewed May 18, 2022

Ernaldis added 4 commits May 18, 2022 13:23

update error

5961f56

Make negative processes error more specific.

upadte change logs

7934794

fix whitespace

4324a0f

alter return type

faf117c

dimitri-yatsenko approved these changes May 18, 2022

dimitri-yatsenko reviewed May 18, 2022

Update datajoint/autopopulate.py

09c8336

Co-authored-by: Dimitri Yatsenko <dimitri@datajoint.com>

dimitri-yatsenko self-requested a review May 18, 2022 22:02

dimitri-yatsenko approved these changes May 18, 2022

jverswijver approved these changes May 18, 2022

set release date

ef0d217

guzman-raphael requested changes May 19, 2022

allow None processes

91c8db4

Ernaldis force-pushed the multi-processing-bug-fix branch from 146f07e to 91c8db4 Compare May 19, 2022 16:06

Ernaldis requested a review from guzman-raphael May 19, 2022 16:09

guzman-raphael requested changes May 19, 2022

fix value error and add unit test

cd46161

guzman-raphael approved these changes May 19, 2022

guzman-raphael merged commit 0ff34f2 into datajoint:master May 19, 2022

dimitri-yatsenko reviewed May 19, 2022

Ernaldis deleted the multi-processing-bug-fix branch May 19, 2022 18:19
