-
Notifications
You must be signed in to change notification settings - Fork 89
Multi processing bug fix #1026
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multi processing bug fix #1026
Conversation
Note to self: add unit tests which check return value of populate runs |
datajoint/autopopulate.py
Outdated
@@ -252,8 +256,7 @@ def handler(signum, frame): | |||
if reserve_jobs: | |||
signal.signal(signal.SIGTERM, old_handler) | |||
|
|||
if suppress_errors: | |||
return error_list | |||
return error_list, nkeys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does populate
need to return nkeys
now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be useful in automating workflows, since it allows us to determine how many keys were found to be processed.
For example, this features is used in the neurophotonics project in check_db. This allows the EC2 instance to continue working until the user stops inserting keys into the database, for those times that keys are inserted after the instance started, but before the initial workload is finished.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current users expect populate
to return the error list. This change may break existing code. Changing the API should be done carefully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a valid concern, but I think it is also useful to be able to tell how many keys were processed.
I notice that there is nothing returned if suppress_errors == False
, the default value. What if the return was left as just error_list
in the case suppress_errors == True
, and returned nkeys only if suppress_erros == False
?
datajoint/autopopulate.py
Outdated
if processes < 0: | ||
raise Exception("processes must not be negative") | ||
elif processes == 0: | ||
return error_list, nkeys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here error_list
is empty. Why does this need to be return? What's the use case for processes == 0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, error_list will indeed always be an empty list, but I think it is best to return it anyway. If we do it like this, then autopopulate will always return a list and an integer, regardless of which case this happens to be. Since we can always expect the same return type, we will never have to later worry about which case we will encounter.
As for the use case of processes == 0
, this is the case which caused the error the PR is meant to solve. Consider the case in which you are using multiprocessing (that is, when you call this function, you have set processes to be greater than one), but the table you are populating has no keys remaining to process. This case occurred when automating the workflow of the neurophotonics project, since populate was being called on every table.
In this case, we would like for the function to do nothing and move on to the next task. Instead, it throws a value error. This is because of lines (215-216) in the autopopulate function.
if processes > 1:
processes = min(processes, nkeys, mp.cpu_count())
Since processes has been set to a value greater than 1, the second line here runs. Since the table has already been populated in this case, nkeys = 0. Since there is no reason for any of these three values to be negative, the minimum is 0. Thus processes=0
at this point.
Previously, if processes was not equal to 1, we would try to spawn multiple processes equal to processes
. Since 0 != 1, we try to call multiprocessing with a pool of 0 processes, which created a value error.
This line prevents this exception from being thrown by handling the case in which processes == 0
. Now, the function simply reports that there were no errors and no keys to process, and moves on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if nkeys <= 0
, it should just quietly exit. That's consistent with previous behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case nkeys == 0
, it does quietly exit. It returns ([], 0)
, but it doesn't mutate any values, initialize new variables or print anything to the screen. Users don't have to use the return value, but they can if they want to.
The case nkeys < 0
should never occur. My understanding is that this value represents the number of keys which need to be processed. If this value is negative, an error has occurred.
Make negative processes error more specific.
Co-authored-by: Dimitri Yatsenko <dimitri@datajoint.com>
@Ernaldis would you set the release date and actually make the release? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Ernaldis Thanks for patching this! 🤝
Only have a small issue with the updated documentation since it doesn't seem correct. Although, should be straightforward to include.
@@ -173,8 +173,7 @@ def populate( | |||
:param limit: if not None, check at most this many keys | |||
:param max_calls: if not None, populate at most this many keys | |||
:param display_progress: if True, report progress_bar | |||
:param processes: number of processes to use. When set to a large number, then | |||
uses as many as CPU cores | |||
:param processes: number of processes to use. Set to None to use all cores |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great idea to improve how this can be specified but doesn't seem to be true based on what is here.
If you do in fact set process=None
the following would happen:
- [216]: Should throw an error like:
TypeError: '>' not supported between instances of 'NoneType' and 'int'
You could probably just replace that particular if-block with something like this:
min(nkeys, *([processes, mp.cpu_count()] if processes else [mp.cpu_count()]))
OR
min(*(_ for _ in (processes, nkeys, mp.cpu_count()) if _))
These approaches would leverage argument expansion to our benefit to expand only the relevant values into the min
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops good point.
146f07e
to
91c8db4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Ernaldis This will still raise a TypeError
. If you update this test with processes=None
you will see what I am talking about. That test is meant for testing multiprocessing anyway. Probably you can just get rid of the if since it seems unnecessary. The default is already 1 which should always be the min.
|
||
if processes > 1: | ||
processes = min(processes, nkeys, mp.cpu_count()) | ||
processes = min(*(_ for _ in (processes, nkeys, mp.cpu_count()) if _)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this, processes=0
will have the same effect as processes=None
.
This PR proposes a fix to the bug described in #1013. In brief, when a user tries to populate a table which has already been populated while also setting processes to be greater than 1, the population should be skipped, just like it is when processes is equal to 1. Instead, a value error is thrown.
This PR also alters the return value to also return the number of keys processed. This can be useful in automating workflows.
Closes #1013