
Allow running one-off scripts manually
Open, Needs Triage, Public

Description

Our developers/deployers are used to being able to launch MediaWiki maintenance scripts using mwscript as follows:

  • ssh to the maintenance server
  • run mwscript <script-name> --wiki <wiki> or mwscript <script-name> <wiki>

How can we get developers to run one-off scripts on Kubernetes?

I would imagine it would go as follows:

  • Add a Job definition to the mediawiki chart. Make it possible to apply either the Deployment or a Job. The Job should allow values to inject the arguments to "mwscript".
  • Create a dedicated namespace to run these one-offs.
  • Each Job should be a separate helm release; if we want multiple Jobs to be launched in parallel, that's the only way these resources can share the same namespace. We need to check whether it's possible to adapt our helmfile to accept arbitrary release names.
  • A small wrapper called something like mwscript-k8s should check the user name, generate a random release name, and run helm(file), passing the arguments from the CLI as a value we'll inject as args for the container.

Details

Repo                          Branch      Lines +/-
operations/puppet             production  +45 -16
operations/puppet             production  +20 -5
operations/puppet             production  +13 -6
operations/puppet             production  +1 -1
operations/puppet             production  +8 -0
operations/puppet             production  +2 -2
operations/puppet             production  +4 -2
operations/puppet             production  +4 -3
operations/puppet             production  +17 -4
operations/puppet             production  +13 -0
operations/deployment-charts  master      +28 -0
operations/puppet             production  +120 -0
operations/puppet             production  +12 -11
operations/puppet             production  +6 -3
operations/deployment-charts  master      +10 -1
operations/deployment-charts  master      +6 -1
operations/puppet             production  +1 -1
operations/puppet             production  +2 -0
operations/puppet             production  +190 -0
operations/deployment-charts  master      +84 -0
operations/deployment-charts  master      +78 -2
operations/deployment-charts  master      +24 -0
operations/puppet             production  +4 -0

Related Objects

Status     Subtype     Assigned
Resolved               dancy
Open                   None
Open                   RLazarus
Resolved               RLazarus
Resolved               RLazarus
Resolved               RLazarus
Resolved               RLazarus
Resolved               RLazarus
Declined               None
Resolved               RLazarus
Resolved   BUG REPORT  RLazarus
Resolved               RLazarus
Resolved               RLazarus
Resolved               Lucas_Werkmeister_WMDE
Resolved               Arian_Bozorg
Open                   None
Resolved               RLazarus
Open                   None
Resolved               RLazarus
Resolved               RLazarus
Resolved               RLazarus
Open                   None
Open                   None
Resolved               RLazarus
Resolved               tstarling
Duplicate              None
Invalid                None
Duplicate              RLazarus
Resolved               Joe
Resolved               hashar
Open                   None
Open                   RLazarus
Resolved               Clement_Goubert
Resolved               Scott_French
Open                   None

Event Timeline

There are a very large number of changes, so older changes are hidden.

On another note, how do we think about one-off maintenance scripts? mwscript allows me to run a script from my home directory, which I have used before to debug issues that were difficult to debug otherwise. It seems that in the new system, I'd need to either copy and paste the whole script into shell.php or backport every single change one needs to try out. Any thoughts on how to proceed with those issues as we move ahead to the new system?

On another note, how do we think about one-off maintenance scripts? mwscript allows me to run a script from my home directory, which I have used before to debug issues that were difficult to debug otherwise. It seems that in the new system, I'd need to either copy and paste the whole script into shell.php or backport every single change one needs to try out. Any thoughts on how to proceed with those issues as we move ahead to the new system?

I think this is somewhere at the intersection between this task and T276994: Provide an mwdebug functionality on kubernetes (mw-experimental)?

On another note, how do we think about one-off maintenance scripts? mwscript allows me to run a script from my home directory, which I have used before to debug issues that were difficult to debug otherwise. It seems that in the new system, I'd need to either copy and paste the whole script into shell.php or backport every single change one needs to try out. Any thoughts on how to proceed with those issues as we move ahead to the new system?

I think this is somewhere at the intersection between this task and T276994: Provide an mwdebug functionality on kubernetes (mw-experimental)?

It depends on the point of view. If the whole mwmaint machine goes away, then I can't do this sort of debugging anywhere. Granted, if mwdebug* stays as a VM (rather than a pod), I can get a shell there and either use the mwscript wrapper or run MWScript.php myself. I'd say it's the same use case, but present when running scripts, rather than when working with requests coming from the internet.

[...]
The mediafiles can be very large – I've certainly uploaded files totalling dozens of GB. As long as mwmaint had enough space and sufficient sleep was allowed for the videoscalers to catch up, things worked well.

The size is going to be the biggest challenge here, not just the number of files. Copying all that data in at startup time, and then uploading it from there, probably isn't going to work -- and it wouldn't be all that efficient even if it did. We'll come up with another solution for this; I'll open a separate task.

@RLazarus Not sure whether this falls under T376230: Support bringing text files into the container for one-off maintenance scripts, or whether a new task should be filed. Reporting here as recommended by the deprecation warning – let me know if I should create a new task for this.

(In general, comments here are fine and subtasks of this one are fine too. Thanks for thinking about it.)

On another note, how do we think about one-off maintenance scripts? mwscript allows me to run a script from my home directory, which I have used before to debug issues that were difficult to debug otherwise. [...]

The solution for this will probably look like T376230 -- we'll provide a one-step way to bring your local script into the pod as a ConfigMap Volume and launch it. There are drawbacks to allowing that -- right now, because all the code is in the image, every maintenance script run is completely reproducible, with no potential for "if the problem comes back, just run foo.php out of so-and-so's homedir -- oops, I mean run the previous version of it." But in practice that's probably outweighed by the benefits of the kind of flexible debugging you describe.

If the whole mwmaint machine goes away, then I can't do this sort of debugging anywhere. Granted, if mwdebug* stays as a VM (rather than a pod), [...]

The intent is for all MediaWiki in production to be on Kubernetes, so MediaWiki installations on any bare metal machine or VM, including both mwmaint* and mwdebug*, will eventually go away. (We don't intend to upgrade PHP to 8.1 in production outside of Kubernetes, for example.)

Hard-learned feedback: once a mwscript-k8s job has been started, it is non-obvious how to terminate it. From today's example:

[urbanecm@deploy2002 ~]$ mwscript-k8s -f extensions/Flow/maintenance/FlowMoveBoardsToSubpages.php -- --wiki=nowiki --limit=10
⏳ Starting extensions/Flow/maintenance/FlowMoveBoardsToSubpages.php on Kubernetes as job mw-script.codfw.5aoy4dbk ...
🚀 Job is running.
📜 Streaming logs:
Skipped 'Wikipedia:Flow/sandkasse' as it is already a subpage 
Moved 'Brukerdiskusjon:Roan Kattouw (WMF)' to 'Brukerdiskusjon:Roan Kattouw (WMF)/Flow'
Created stub at 'Brukerdiskusjon:Roan Kattouw (WMF)'
^C🔁 To resume streaming logs, run:
K8S_CLUSTER=codfw KUBECONFIG=/etc/kubernetes/mw-script-codfw.config kubectl logs -f job/mw-script.codfw.5aoy4dbk mediawiki-5aoy4dbk-app
[urbanecm@deploy2002 ~]$

In this case, I quickly Ctrl+C'ed once I realised the script was running with the wrong parameters, but the job continued running, with no info on how to terminate it.

Sorry this happened. Unfortunately it's kind of working as intended -- not because it's supposed to be hard to kill a job when you want to kill it, but because the job is supposed to keep working after the mwscript-k8s launcher terminates. (Thus preventing the "oops, I forgot to start it in a tmux and now I'm stuck" scenario.)

Instructions for terminating the job are here: https://wikitech.wikimedia.org/wiki/Maintenance_scripts#Terminating_a_job

In general the intent is to use standard kubectl commands to interact with the Kubernetes job once it's created. We know not everyone is familiar with those yet, so some common cases are detailed there. (That's rather than creating a bunch of additional commands like mwscript-k8s-delete or something -- which would be equally unfamiliar at first, but less useful once you learned them.)

That's not to say "your fault, should have gone and read the wiki!" but rather -- how can I do better at putting that information where you'll find it, without printing a million extra lines of assorted information from mwscript-k8s on each run?

That's not to say "your fault, should have gone and read the wiki!" but rather -- how can I do better at putting that information where you'll find it, without printing a million extra lines of assorted information from mwscript-k8s on each run?

This may be a goofy suggestion, but I feel that needing to remember to set up the correct credentials with kube_env mw-script-deploy codfw before being able to use kubectl ... commands is a cognitive burden that could be replaced by mwscript-k8s making that environmental change as part of its normal operation. There probably will be a few power users that this somehow annoys, but it seems likely to me that people using kubectl on the deploy hosts will most often want to look at the namespace they most recently touched with mwscript-k8s, helmfile, or any other launcher scripts we provide. If the Kubernetes authn was set up in their environment already, I think it might be enough to output just a simple "Use kubectl to manage the new job" reminder when mwscript-k8s starts a new job. Bonus points if the reminder includes the job name so that the user can skip doing a kubectl get job -l username=$USER lookup (e.g. "Use kubectl to manage job.batch/mw-script.codfw.xnd1s33p as needed.").

That's not to say "your fault, should have gone and read the wiki!" but rather -- how can I do better at putting that information where you'll find it, without printing a million extra lines of assorted information from mwscript-k8s on each run?

Suggestion: Catch ctrl-c and print out something like

⚠️ It looks like you may have wanted to stop your script's execution; use:
kube_env mw-script-deploy codfw; kubectl delete job mw-script.codfw.foo

then proceed to exit mwscript-k8s?

[...] needing to remember to set up the correct credentials with kube_env mw-script-deploy codfw before being able to use kubectl ... commands is a cognitive burden that could be replaced by mwscript-k8s making that environmental change as part of its normal operation.

There are a couple of moving parts there.

One is the idea of mwscript-k8s modifying your shell environment to effectively run kube_env for you. (It can't do that, because it's a child process, but we can imagine wrapping the Python wrapper in a shell script for that purpose. It's an extra layer of stuff, but we could do it.)

We've discussed this but leaned away from it, because of the effect on both experienced users -- who, yeah, might be reasonably annoyed that we mutated their environment without being asked -- and on inexperienced users, who will be confused by the resulting magical behavior if they're not familiar with kube_env. ("This kubectl command worked fine before, how come now the same command doesn't work in a different window?")

We could mitigate the effect on experienced users by touching those variables only if they're unset and printing a reminder otherwise -- i.e., we'll kube_env for you if you haven't already, but we won't mess with it if you're in the middle of something. But I think the other problem is a real issue.

(That's not quite true, we haven't previously talked about mwscript-k8s modifying the environment directly -- but when we print out a kubectl command for your convenience, we prefix it with the environment variables like

K8S_CLUSTER=[...] KUBECONFIG=[...] kubectl logs [...]

instead of just saying

kube_env mw-script codfw; kubectl logs [...]

because it's there for people to copy-and-paste without paying much attention, and we don't want that to have confusing environmental side effects. Actually running it on your behalf would be a step further.)

That's thing one. Thing two is auto-upgrading to the mw-script-deploy user (full deployment privileges) rather than mw-script (read-only).

That would mean you can do anything the deploy user can do, like delete jobs, without having to think about credentials. It would be convenient in the same way that always logging into a root shell is convenient: switching to the deploy user is like typing sudo, or flipping up the little plastic cover on the big red button, in that it's an extra step you probably should have to take before a powerful (and therefore potentially dangerous) action.

If we set your kube config to anything (having found solutions to the above) we would probably want it to be the mw-script user. You could easily use commands like kubectl get and describe and logs to get job information, but we'll still want you to flip up the molly-guard with kube_env mw-script-deploy codfw before you start deleting stuff.

Bonus points if the reminder includes the job name so that the user can skip doing a kubectl get job -l username=$USER lookup (e.g. "Use kubectl to manage job.batch/mw-script.codfw.xnd1s33p as needed.").

This is a good idea -- we do always print it in the first line of the output already, so that you can skip the get, but this would be a smart place to print it again.

Suggestion: Catch ctrl-c and print out something like

⚠️ It looks like you may have wanted to stop your script's execution; use:
kube_env mw-script-deploy codfw; kubectl delete job mw-script.codfw.foo

then proceed to exit mwscript-k8s?

We could, yeah -- we do catch the ctrl-C in order to print the

🔁 To resume streaming logs, run: [...]

message that @Urbanecm_WMF mentioned in T341553#10259819, so we could add another line about how to stop the script. (I worry about it getting long and spammy, at which point it doesn't matter how helpful it is because nobody'll read the whole thing, but this tradeoff might be a good one.)

Note that this only helps people who use -f; otherwise you never get a chance to ctrl-C it.

Sorry this happened. Unfortunately it's kind of working as intended -- not because it's supposed to be hard to kill a job when you want to kill it, but because the job is supposed to keep working after the mwscript-k8s launcher terminates. (Thus preventing the "oops, I forgot to start it in a tmux and now I'm stuck" scenario.)

Yeah, I understand that. I'm wondering how often Ctrl+C would be a "this is not what I wanted" scenario (my case) versus an "I need to close my session" one.

Instructions for terminating the job are here: https://wikitech.wikimedia.org/wiki/Maintenance_scripts#Terminating_a_job

Good to know, thanks! FWIW, this (the docs page in general, not this particular section) should definitely be linked from somewhere in k8s. I did not realise this page existed; if I had, it would've answered the "how to terminate" question pretty quickly. Not sure if it makes sense in a Ctrl+C context (termination and log-following are the only scenarios I see for that), but at the very least, --help should have that link.

In general the intent is to use standard kubectl commands to interact with the Kubernetes job once it's created. We know not everyone is familiar with those yet, so some common cases are detailed there. (That's rather than creating a bunch of additional commands like mwscript-k8s-delete or something -- which would be equally unfamiliar at first, but less useful once you learned them.)

I'm in full support of this, and I did proceed to search for the appropriate k8s deletion command, but I didn't find it before the job finished up.


Suggestion: Catch ctrl-c and print out something like

⚠️ It looks like you may have wanted to stop your script's execution; use:
kube_env mw-script-deploy codfw; kubectl delete job mw-script.codfw.foo

then proceed to exit mwscript-k8s?

I like this. An alternative could be to explicitly ask the user whether they want the job to stop via a [y/N] prompt. That way, intentional termination can be done with Ctrl+C; Y, while preserving the container when the wrapper terminates by accident.


Slightly different thing I just realised: this is an instance of a more general problem of "the user needs to be aware the job is running independently of the wrapper". There are other situations that might happen – for example, what if someone suspends the wrapper via SIGSTOP or something, expecting the job to pause itself? Is that something the new system takes into account somehow?

at the very least, --help should have that link.

Good suggestion, will do.

An alternative could be to explicitly ask the user whether they want the job to stop via a [y/N] prompt. That way, intentional termination can be done with Ctrl+C; Y, while preserving the container when the wrapper terminates by accident.

Interesting! I'm trying to balance the explicitness of that (which is good) against the fact that it sort of violates expectations -- if I hit ctrl-C, I really expect the program to die, not ask me more questions. Let's try the middle-ground approach of printing the delete command along with the logs command, and see if that works well enough.

This is an instance of a more general problem of "the user needs to be aware the job is running independently of the wrapper". There are other situations that might happen – for example, what if someone suspends the wrapper via SIGSTOP or something, expecting the job to pause itself? Is that something the new system takes into account somehow?

I think this is the right big-picture statement of the problem. The fact is, we've moved to Kubernetes. Your script is now running in a container on a whole other computer than the one you're logged into, maybe in another data center -- things really are different now! That has upsides, like being more reliable, and downsides, like being more complicated.

We mitigated some of those downsides by writing the mwscript-k8s launcher. It wraps up some of that complexity and takes care of it for you; you don't have to figure out how to write an entire Kubernetes Job config. It has convenience features, like -f to automatically run kubectl logs after launching the job, because that's a super common thing to want -- but it doesn't attempt to make it look exactly like the job is running locally.

It can't and probably shouldn't look like that! There are just too many ways that sets you up with wrong intuitions about what's really going on. Your SIGSTOP example is spot-on -- we could handle SIGINT by pretending you actually SIGINTed a different process running on a different computer... but then you'll just be surprised when other signals don't work that way. The experience of using mwscript-k8s is supposed to be transparent, but not invisible -- make it easy to see what's going on with your Kubernetes job running elsewhere, but not so that you can forget that's what it is. And in particular, if a user interacts with the launcher thinking it's their script (trying to pause it with SIGSTOP, for example) then we've done them a disservice by giving them the wrong impression of what they're actually doing.

Some of the other upsides of being on Kubernetes come later, when we introduce features that take more advantage of the platform -- retry policies, sharding for parallelism, and so on -- but in order to be able to take advantage of them later, we need to be careful about letting users establish the right mental model now.

Change #1083284 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/puppet@production] deployment_server: mwscript-k8s usability improvements

https://gerrit.wikimedia.org/r/1083284

Change #1083284 merged by RLazarus:

[operations/puppet@production] deployment_server: mwscript-k8s usability improvements

https://gerrit.wikimedia.org/r/1083284

Recording here that I'm noticing myself still running one-off scripts on the maint-hosts because, as I understand it, for the new way of running them, I would need deployer-rights, and I do not have (and do not really want) those.

So going forward, I have to either ask for more privileges, which allow me to do many more things than I actually want to do, or ask someone else to run the maintenance scripts for me in the future. Both are not great, but maybe these tradeoffs are worth it.

Recording here that I'm noticing myself still running one-off scripts on the maint-hosts because, as I understand it, for the new way of running them, I would need deployer-rights, and I do not have (and do not really want) those.

So going forward, I have to either ask for more privileges, which allow me to do many more things than I actually want to do, or ask someone else to run the maintenance scripts for me in the future. Both are not great, but maybe these tradeoffs are worth it.

Thanks for this, filed T378429.

Is there a way to pass an env variable? The need came up in T380575: Make SUL3 authentication domain mode available from CLI; old mwscript doesn't do it smoothly either, but it can be made to work, which I suspect won't be the case with mwscript-k8s.

Circling back to the topic of job termination (previously discussed at T341553#10259819 et seq). I needed to run userOptions.php today, and I discovered that script's guidance is inappropriate/misleading in a mwscript-k8s context. Here is the output:

[urbanecm@deploy2002 ~]$ mwscript-k8s -f userOptions.php -- --wiki=enwiki --old=placeholder --delete 'placeholder-for-Txxxxx'
⏳ Starting userOptions.php on Kubernetes as job mw-script.codfw.m0fqt9ft ...
⏳ Waiting for the container to start...
🚀 Job is running.
📜 Streaming logs:
The script is about to delete 'placeholder-for-Txxxxx' option for ALL USERS from user_properties table.
This action is IRREVERSIBLE.

Abort with control-c in the next five seconds....0
Done! Deleted 0 rows.
[urbanecm@deploy2002 ~]$

This kind of guidance is inappropriate for two reasons:

  1. Hitting control-c no longer terminates the job (so blindly following the script's guidance will execute the job)
  2. Under mwscript-k8s, the script waited five seconds first and then displayed the message (rendering it useless at that point). Under mwscript, the "Abort with" message displays immediately (and has a countdown), giving the user a chance to realise their mistake and terminate the command. I recorded the difference as a screencast below.

Since mwscript-k8s now prints guidance on ctrl+c, the first issue is probably a low-priority thing (we likely do not want to hardcode stuff specific to mwscript-k8s into userOptions.php itself, and hitting ctrl+c will print the right guidance; the only thing that could be done is explicitly asking the user about their intention, as I suggested at T341553#10263470, but I understand the concerns previously raised by @RLazarus).

However, the second problem seems much more serious. The script attempts to give users the opportunity to terminate it, but mwscript-k8s does not inform about that possibility until after the "termination window" is over. That probably should not happen – I'd expect the countdown to be visible in mwscript-k8s as well (if changing output is too much of a challenge, we can print the line five times or something, but at least it would be there).

Can we do something about this?

Screencast of the second problem

Is there a way to pass an env variable? The need came up in T380575: Make SUL3 authentication domain mode available from CLI; old mwscript doesn't do it smoothly either, but it can be made to work, which I suspect won't be the case with mwscript-k8s.

Not currently, but it could be done. I'm not sure offhand if there should be limits on that (e.g. do we want to make it easy to set $SERVER_NAME to an arbitrary value, or would that have Disastrous Consequences™)? I'll open a subtask to discuss further. (One of the component questions is, "is an env variable actually the best way to do what you're trying to do?" but I'm certainly willing to believe the answer is yes if you say it is.)

  1. Hitting control-c no longer terminates the job (so blindly following the script's guidance will execute the job)

[...]
Since mwscript-k8s now prints guidance on ctrl+c, the first issue is probably a low-priority thing (we likely do not want to hardcode stuff specific to mwscript-k8s into userOptions.php itself, and hitting ctrl+c will print the right guidance; the only thing that could be done is explicitly asking the user about their intention, as I suggested at T341553#10263470, but I understand the concerns previously raised by @RLazarus).

Yeah agreed -- "abort with control-c" is now misleading advice and should be updated one way or another.

I'm hesitant about "we likely do not want to hardcode stuff specific to mwscript-k8s": definitely broadly true, in that other MediaWiki installations might not be using Kubernetes at all. But we will be, exclusively, for many years -- our maintenance scripts should be at home there, even if it means adjusting them. The conclusion is they need to be at home in both contexts.

One way to do that (probably overkill in this case, gesturing at the broader scenario) is to detect whether the script is running locally or on Kubernetes (e.g. via the presence of the $KUBERNETES_PORT env variable) and print a message appropriate to the situation.
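A minimal sketch of that detection in PHP, assuming only that the standard KUBERNETES_PORT variable is present inside pods (the message texts are illustrative, not the real userOptions.php wording):

// Sketch: pick abort guidance appropriate to where the script is running.
// Kubernetes injects KUBERNETES_PORT into every container, so its presence
// is a reasonable (if heuristic) signal that we're in a pod.
function runningOnKubernetes(): bool {
    return getenv( 'KUBERNETES_PORT' ) !== false;
}

if ( runningOnKubernetes() ) {
    print "To abort, delete the Kubernetes job (see the Maintenance_scripts docs).\n";
} else {
    print "Abort with control-c in the next five seconds...\n";
}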

  1. Under mwscript-k8s, the script waited five seconds first and then displayed the message (rendering it useless at that point). Under mwscript, the "Abort with" message displays immediately (and has a countdown), giving the user a chance to realise their mistake and terminate the command. I recorded the difference as a screencast below.

[...]
The script attempts to give users the opportunity to terminate it, but mwscript-k8s does not inform about that possibility until after the "termination window" is over. That probably should not happen – I'd expect the countdown to be visible in mwscript-k8s as well (if changing output is too much of a challenge, we can print the line five times or something, but at least it would be there).

The current implementation of countDown uses ASCII backspace characters to rewrite the line. I think kubectl logs -f only receives a flushed buffer after a newline, which is why you didn't see the message until after the countdown. I wouldn't be surprised if it works under mwscript-k8s with --attach rather than --follow, but offhand I don't know for sure.

For a real countdown, printing the line five times is probably the way to go -- or just printing a single line continuing in five seconds... and then sleeping, if you don't need second-by-second updates.
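For instance, a log-friendly countdown could print whole, flushed lines instead of rewriting one line with backspaces -- a sketch in plain PHP, not the actual countDown implementation:

// Sketch: each tick is a complete line ending in a newline, so
// kubectl logs -f receives it immediately instead of buffering.
function logFriendlyCountDown( int $seconds ): void {
    for ( $i = $seconds; $i > 0; $i-- ) {
        print "Continuing in $i second(s)...\n";
        flush();
        sleep( 1 );
    }
}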

But separately, from a safety/UI standpoint I'd be careful with this; you don't know if the user is even looking. (As ever, their wifi might have hung, or they might have looked away to take a sip of their tea -- but now additionally they might not be looking at the logs until after the fact. Or if they're streaming with kubectl logs -f but it gets behind for whatever reason, it will asynchronously catch up, maybe too late for the countdown to be meaningful.)

If you really need confirmation, safer to say are you sure? y/n and wait for confirmation. (You can skip this step if stdout isn't a TTY, so the script can be run without --attach. Or you can refuse to run at all if stdout isn't a TTY, to make sure the user is watching and able to confirm.) If you don't need confirmation, a five-second pause isn't really adding much safety. Nothing wrong with keeping it as an extra measure, but it only gets you so much.
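A sketch of that confirmation pattern in plain PHP (the prompt text and the skip-when-not-a-TTY policy are illustrative; posix_isatty() needs the posix extension, which CLI PHP normally has):

// Sketch: ask for explicit confirmation, but only when a human can answer.
function confirmOrAbort( string $warning ): void {
    if ( !posix_isatty( STDOUT ) ) {
        // Non-interactive run (e.g. launched without --attach): skip the prompt.
        return;
    }
    print "$warning\nAre you sure? (y/n) ";
    $answer = strtolower( trim( (string)fgets( STDIN ) ) );
    if ( $answer !== 'y' ) {
        print "Aborting.\n";
        exit( 1 );
    }
}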

I wouldn't be surprised if it works under mwscript-k8s with --attach rather than --follow

It works... kind of. --attach has a tendency of eating the first few lines of output. For scripts like shell.php, this is likely fine (the most important part is the prompt, which you get eventually). Generally, though, the first few lines may contain important information. As of now, I would be afraid to use --attach unless I knew for a fact that the script is prompt-based.

Practically speaking, --attach currently eats the guidance, leaving only the number:

ℹ️ Expecting a prompt but don't see it? Due to a race condition, the beginning of the output might be missing. Try passing your input.
📜 Attached to stdin/stdout:
0
Done! Deleted 0 rows.

I'd say that right now, it's even more confusing than --follow, unfortunately. Related question: Do we plan to fix the race conditions related to --attach?

But separately, from a safety/UI standpoint I'd be careful with this; you don't know if the user is even looking. (As ever, their wifi might have hung, or they might have looked away to take a sip of their tea -- but now additionally they might not be looking at the logs until after the fact. Or if they're streaming with kubectl logs -f but it gets behind for whatever reason, it will asynchronously catch up, maybe too late for the countdown to be meaningful.)

Good point, this kind of safety was acceptable when userOptions.php was running locally (and the guidance appeared pretty much immediately). In the k8s model, we do not have those guarantees anymore.

If you really need confirmation, safer to say are you sure? y/n and wait for confirmation. (You can skip this step if stdout isn't a TTY, so the script can be run without --attach.

This would be great to have. We would need to think about how it would work with e.g. foreachwiki and the like, and it's certainly not appropriate for all scripts, but giving the script an opportunity to say "I am being asked to do something dangerous, is the user certain they want to proceed?" seems nice. That would make the "--attach eats output" problem from above more important. It would also mean the user would need to "know" which scripts need --attach and which don't. I'm not sure those problems have a nice solution though.

Or you can refuse to run at all if stdout isn't a TTY, to make sure the user is watching and able to confirm.) If you don't need confirmation, a five-second pause isn't really adding much safety.

The five-second pause has the advantage of (not) working equally well regardless of which wikis the script is executed for. The script doesn't have a way to know the user already executed the same dangerous command for dozens of other wikis; this information is kept in foreachwiki (or user-written bash loops). When using foreachwiki, the pause makes the execution slower, but it otherwise doesn't harm anything. Requiring the user to individually confirm execution for every single wiki would be less than nice (as opposed to confirming the execution for all wikis at once, shortly before it starts executing).

Maybe we need to add a sense of "dangerousness" to mwscript-k8s itself, which would allow it to ask for confirmation (rather than userOptions.php)? That would remove a bit of flexibility (developers wouldn't be able to add such confirmation to new scripts), but maybe that is an acceptable tradeoff?

Do we plan to fix the race conditions related to --attach?

Unfortunately the bug is in the underlying Kubernetes tooling (https://github.com/kubernetes/kubernetes/issues/27264) and we can't do much about it until it's fixed upstream.

Our only workarounds in the meantime are to cheat the race condition: either a long sleep at the beginning of the script before printing anything (annoying, for obvious reasons), or a ready? y/n prompt at the beginning of the script before printing anything else (if it gets a blank line, repeat the prompt so that you can safely mash enter, just like shell.php -- also annoying, also for obvious reasons).
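A sketch of that second workaround in plain PHP (prompt text illustrative; blank lines just repeat the prompt, so mashing enter is safe):

// Sketch: print nothing important until the user confirms the --attach
// stream is actually connected; --attach may have eaten earlier output.
do {
    print "ready? y/n ";
    $line = strtolower( trim( (string)fgets( STDIN ) ) );
} while ( $line === '' );
if ( $line !== 'y' ) {
    exit( 1 );
}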

Maybe we need to add a sense of "dangerousness" to mwscript-k8s itself, which would allow it to ask for confirmation (rather than userOptions.php)? That would remove a bit of flexibility (developers wouldn't be able to add such confirmation to new scripts), but maybe that is an acceptable tradeoff?

IMO the real downside here is that mwscript-k8s doesn't know what each script does, which means it can't give a meaningful warning. For example, it can't say

The script is about to delete 'placeholder-for-Txxxxx' option for ALL USERS from user_properties table. This action is IRREVERSIBLE.

It can only say something like

You typed "userOptions.php --wiki=enwiki --old=placeholder --delete 'placeholder-for-Txxxxx'", is that what you meant?

and that doesn't add much value. Within the script is probably a better scope for this -- and you're right that it needs to come with an --i-solemnly-swear-that-i-know-what-i-am-doing option to disable it for use in loops.
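As a sketch of what that could look like inside a maintenance script -- using the real Maintenance::addOption()/hasOption() API, but with an illustrative class, option name, and prompt:

<?php
// Sketch: a dangerous script with an interactive confirmation that loops
// (e.g. foreachwiki) can bypass with an explicit flag.
require_once __DIR__ . '/Maintenance.php';

class DeleteSomethingDangerous extends Maintenance {
    public function __construct() {
        parent::__construct();
        $this->addOption( 'i-know-what-i-am-doing',
            'Skip the interactive confirmation (for use in loops)' );
    }

    public function execute() {
        if ( !$this->hasOption( 'i-know-what-i-am-doing' ) ) {
            print "This action is IRREVERSIBLE. Are you sure? (y/n) ";
            $answer = strtolower( trim( (string)fgets( STDIN ) ) );
            if ( $answer !== 'y' ) {
                $this->fatalError( 'Aborted.' );
            }
        }
        // ... the irreversible work goes here ...
    }
}

$maintClass = DeleteSomethingDangerous::class;
require_once RUN_MAINTENANCE_IF_MAIN;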

Usually, I find the output of kubectl get job unhelpful. For example, for T379146, I just executed userOptions.php with various arguments. When I request the list of jobs I executed, I get this:

[urbanecm@deploy2002 ~]$ kubectl get job -l username=urbanecm -L script
NAME                       COMPLETIONS   DURATION   AGE     SCRIPT
mw-script.codfw.0gisvqdt   1/1           10s        96s     userOptions.php
mw-script.codfw.22qey0zx   1/1           40m        6d15h   revalidateLinkRecommendations.php
mw-script.codfw.25l24055   1/1           11s        2m40s   userOptions.php
mw-script.codfw.326ye86f   0/1           23h        23h     userOptions.php
mw-script.codfw.388v9v09   1/1           10s        23h     userOptions.php
mw-script.codfw.4hzzmsrm   0/1           32d        32d     shell.php
mw-script.codfw.4mbi5rkn   1/1           10s        2m51s   userOptions.php
mw-script.codfw.5i5vkw76   1/1           10s        5m15s   userOptions.php
mw-script.codfw.5kwfahs8   1/1           10s        5m41s   userOptions.php
mw-script.codfw.5yr4wp5d   1/1           11s        23h     userOptions.php
mw-script.codfw.89twb9q6   1/1           11s        2m48s   userOptions.php
mw-script.codfw.9kypqqfp   0/1           21h        21h     userOptions.php
mw-script.codfw.9zn48ycb   0/1           23h        23h     userOptions.php
mw-script.codfw.c1p1p0b0   1/1           11s        2m44s   userOptions.php
mw-script.codfw.fl6tgju4   0/1           6d15h      6d15h   revalidateLinkRecommendations.php
mw-script.codfw.gz0ud0fr   1/1           11s        4m43s   userOptions.php
mw-script.codfw.icoz9fry   1/1           11s        20h     userOptions.php
mw-script.codfw.koxdhto9   1/1           9s         23h     userOptions.php
mw-script.codfw.kpafeq64   1/1           10s        5m3s    userOptions.php
mw-script.codfw.l34m31n7   1/1           19m        23h     userOptions.php
mw-script.codfw.m0fqt9ft   1/1           10s        23h     userOptions.php
mw-script.codfw.m6cy2k6v   1/1           10s        70s     userOptions.php
mw-script.codfw.pz1apewl   0/1           21h        21h     userOptions.php
mw-script.codfw.tme6p6a1   0/1           32d        32d     shell.php
mw-script.codfw.vcdsr8hf   1/1           10s        21h     userOptions.php
mw-script.codfw.we1yja7e   1/1           10s        5m22s   userOptions.php
mw-script.codfw.wgqvieqm   1/1           10s        4m31s   userOptions.php
mw-script.codfw.wrfw7ig3   1/1           10s        23h     userOptions.php
[urbanecm@deploy2002 ~]$

If I wanted to find details about the last frwiktionary run, I would be lost. In theory, I could examine all 24 userOptions.php executions. However, even if I did, I couldn't answer my question, because kubectl logs only prints the output, not the arguments:

[urbanecm@deploy2002 ~]$ kubectl logs job/mw-script.codfw.0gisvqdt mediawiki-0gisvqdt-app
The script is about to delete 'growthexperiments-homepage-variant' option for ALL USERS from user_properties table.
This action is IRREVERSIBLE.

Abort with control-c in the next five seconds....0
Done! Deleted 1 rows.
[urbanecm@deploy2002 ~]$

The only place where I found that information is in kubectl describe (which provides a ton of other information).

@RLazarus Do you think it would be possible to add more details to the list of jobs?

(On a slightly unrelated note, in what order are the jobs displayed? It would be useful to order them by their age, but that doesn't seem to be the case)

Also...

mw-script.codfw.4hzzmsrm 0/1 32d 32d shell.php

I apparently managed to create a hanging shell.php job that has been waiting for someone's input for over a month? That feels like something that shouldn't happen.

Usually, I find the output of kubectl get job unhelpful. For example, for T379146, I just executed userOptions.php with various arguments. When I request the list of jobs I executed, I get this:
...
@RLazarus Do you think it would be possible to add more details to the list of jobs?

You can use custom-columns output to gather more info, for instance:

kubectl get job -l username=urbanecm \
-o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp,COMPLETED:.status.completionTime,SCRIPT:.metadata.labels.script,ARGS:.spec.template.spec.containers[0].args \
--sort-by=.status.completionTime
NAME                       CREATED                COMPLETED              SCRIPT                              ARGS
mw-script.codfw.pz1apewl   2024-11-26T18:28:30Z   <none>                 userOptions.php                     [/srv/mediawiki/multiversion/MWScript.php userOptions.php --wiki=enwiki --old=placeholder placeholder-for-Txxxxx]
mw-script.codfw.4hzzmsrm   2024-10-25T17:50:20Z   <none>                 shell.php                           [/srv/mediawiki/multiversion/MWScript.php shell.php --wiki=enwiki]
mw-script.codfw.326ye86f   2024-11-26T16:40:16Z   <none>                 userOptions.php                     [/srv/mediawiki/multiversion/MWScript.php userOptions.php --wiki=enwiki --old=control --delete growthexperiments-homepage-variant --quick]
mw-script.codfw.9zn48ycb   2024-11-26T16:21:03Z   <none>                 userOptions.php                     [/srv/mediawiki/multiversion/MWScript.php userOptions.php --wiki=enwiki --help]
mw-script.codfw.fl6tgju4   2024-11-21T00:14:42Z   <none>                 revalidateLinkRecommendations.php   [/srv/mediawiki/multiversion/MWScript.php GrowthExperiments/maintenance/revalidateLinkRecommendations.php --wiki=azwiki --all --verbose]
mw-script.codfw.9kypqqfp   2024-11-26T18:28:45Z   <none>                 userOptions.php                     [/srv/mediawiki/multiversion/MWScript.php userOptions.php --wiki=enwiki --old=placeholder placeholder-for-Txxxxx]
mw-script.codfw.22qey0zx   2024-11-21T00:15:03Z   2024-11-21T00:55:07Z   revalidateLinkRecommendations.php   [/srv/mediawiki/multiversion/MWScript.php extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php --wiki=azwiki --all --verbose]
mw-script.codfw.koxdhto9   2024-11-26T16:21:39Z   2024-11-26T16:21:48Z   userOptions.php                     [/srv/mediawiki/multiversion/MWScript.php userOptions.php --wiki=enwiki --old=control --delete]

Unfortunately some of the output from the default can't be replicated that way, or at least I don't know how (duration, age and completion are constructed fields from the default printer).

(On a slightly unrelated note, in what order are the jobs displayed? It would be useful to order them by their age, but that doesn't seem to be the case)

In whatever order the API decides to send them, IIRC. You can use --sort-by to sort by creation time (--sort-by=.metadata.creationTimestamp), by completion time (--sort-by=.status.completionTime)...

You can get the field names by running kubectl get job myjob -o yaml, and construct your output and sort options from there.

Thanks @Clement_Goubert! Yeah, --sort-by=.metadata.creationTimestamp is my go-to for ordering.

For pulling out specific details like args, there are two approaches; one is -o custom-columns as above; the other is -o json piped into a tool like jq. Now, instead of a dizzying array of powerful-but-inscrutable options to kubectl, you have a dizzying array of powerful-but-inscrutable filters for jq. Example:

rzl@deploy2002:~$ kubectl get job -l username=urbanecm -o json |
> jq '.items |
> sort_by(.metadata.creationTimestamp)[] |
> {
>   name: .metadata.name,
>   args: (.spec.template.spec.containers[0].args[1:] | join(" "))
> }'

{
  "name": "mw-script.codfw.4hzzmsrm",
  "args": "shell.php --wiki=enwiki"
}
{
  "name": "mw-script.codfw.tme6p6a1",
  "args": "shell.php --wiki=enwiki"
}
{
  "name": "mw-script.codfw.fl6tgju4",
  "args": "GrowthExperiments/maintenance/revalidateLinkRecommendations.php --wiki=azwiki --all --verbose"
}
{
  "name": "mw-script.codfw.22qey0zx",
  "args": "extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php --wiki=azwiki --all --verbose"
}
[...]

That command looks like a lot, but you can start with kubectl get job -l username=urbanecm -o json | jq . | less to see the structure of the JSON, and iterate your jq filters from there. I find that's often easier than figuring out what goes into -o custom-columns when I don't have the API spec in front of me. ("Where's the name field again? Oh, there it is under metadata.")

(You can also see I only asked for args[1:] -- argument 0 is always /srv/mediawiki/multiversion/MWScript.php so we might as well drop it.)

This variant (with -r, @tsv, and arrays instead of objects) prints tabulated columns:

rzl@deploy2002:~$ kubectl get job -l username=urbanecm -o json |
> jq -r '.items |
> sort_by(.metadata.creationTimestamp)[] |
> [
>   .metadata.name,
>   (.spec.template.spec.containers[0].args[1:] | join(" "))
> ] | 
> @tsv'

mw-script.codfw.4hzzmsrm	shell.php --wiki=enwiki
mw-script.codfw.tme6p6a1	shell.php --wiki=enwiki
mw-script.codfw.fl6tgju4	GrowthExperiments/maintenance/revalidateLinkRecommendations.php --wiki=azwiki --all --verbose
mw-script.codfw.22qey0zx	extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php --wiki=azwiki --all --verbose
[...]

Without using jq, there's also a lot of useful stuff in the reference docs for kubectl get.

(On a slightly unrelated note, in what order are the jobs displayed? It would be useful to order them by their age, but that doesn't seem to be the case)

The default ordering is by name -- which in our case is randomly generated. Not very helpful, but that's what it is.

I apparently managed to create a hanging shell.php job that has been waiting for someone's input for over a month? That feels like something that shouldn't happen.

There's a delicate balance there -- despite the accepted wisdom, some maintenance jobs are intentionally long-running and I don't want to kill them. It's also difficult to tell from the outside when a maintenance script is still making progress and when it isn't. The initial (very conservative) strategy is just to not do any cleanup at all on running scripts, but eventually we'll probably add some kind of comically long deadline so they don't live forever.

Thanks both! This is helpful, but it will take some time to get accustomed to.

I apparently managed to create a hanging shell.php job that has been waiting for someone's input for over a month? That feels like something that shouldn't happen.

There's a delicate balance there -- despite the accepted wisdom, some maintenance jobs are intentionally long-running and I don't want to kill them. It's also difficult to tell from the outside when a maintenance script is still making progress and when it isn't. The initial (very conservative) strategy is just to not do any cleanup at all on running scripts, but eventually we'll probably add some kind of comically long deadline so they don't live forever.

I understand those challenges. I think the problem I was hinting at is that I was completely unaware a script of mine had been running for that long. What about periodically notifying the owner via email that their script is still running, maybe daily or weekly? For intentionally long-running scripts, we might have an option to silence those emails (ideally, that'd also be an option one could apply to an already-running script). If we had such notifications, I would at least have found out about this.

Can't do HTTP request.

$ mwscript-k8s -f -- extensions/WikimediaMaintenance/addWiki.php --wiki=idwikivoyage --allow-existing
...
Got no data from https://meta.wikimedia.org/w/api.php
...

$ mwscript-k8s --attach -- eval.php --wiki=idwikivoyage
...
> $url = 'https://meta.wikimedia.org/w/api.php?action=sitematrix&format=json';

> $json = MediaWiki\MediaWikiServices::getInstance()->getHttpRequestFactory()->get( $url, [], 'foo' );

> print strlen($json);
0
> ^D

$ mwscript eval.php --wiki=idwikivoyage
...
> $url = 'https://meta.wikimedia.org/w/api.php?action=sitematrix&format=json';

> $json = MediaWiki\MediaWikiServices::getInstance()->getHttpRequestFactory()->get( $url, [], 'foo' );

> print strlen($json);
157378

Logstash says:

GET https://meta.wikimedia.org/w/api.php?action=sitematrix&format=json HTTP/1.1 - NULL cURL error 7: Failed to connect to meta.wikimedia.org port 443: Connection timed out (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for https://meta.wikimedia.org/w/api.php?action=sitematrix&format=json

Can't do HTTP request.

$ mwscript-k8s -f -- extensions/WikimediaMaintenance/addWiki.php --wiki=idwikivoyage --allow-existing
...
Got no data from https://meta.wikimedia.org/w/api.php
...

$ mwscript-k8s --attach -- eval.php --wiki=idwikivoyage
...
> $url = 'https://meta.wikimedia.org/w/api.php?action=sitematrix&format=json';

> $json = MediaWiki\MediaWikiServices::getInstance()->getHttpRequestFactory()->get( $url, [], 'foo' );

> print strlen($json);
0
> ^D

$ mwscript eval.php --wiki=idwikivoyage
...
> $url = 'https://meta.wikimedia.org/w/api.php?action=sitematrix&format=json';

> $json = MediaWiki\MediaWikiServices::getInstance()->getHttpRequestFactory()->get( $url, [], 'foo' );

> print strlen($json);
157378

Logstash says:

GET https://meta.wikimedia.org/w/api.php?action=sitematrix&format=json HTTP/1.1 - NULL cURL error 7: Failed to connect to meta.wikimedia.org port 443: Connection timed out (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for https://meta.wikimedia.org/w/api.php?action=sitematrix&format=json

I guess you'd need to use https://wikitech.wikimedia.org/wiki/Url-downloader as a proxy config with HttpRequestFactory:

> $url = 'https://meta.wikimedia.org/w/api.php?action=sitematrix&format=json'
> $request =  MediaWiki\MediaWikiServices::getInstance()->getHttpRequestFactory()->create( $url, [ 'proxy' => $wmgLocalServices['urldownloader'] ] )
> $request->execute();
> $request->getContent()

I'd somehow expect https://www.mediawiki.org/wiki/Manual:$wgLocalHTTPProxy (set to http://localhost:6501) to kick in, which would avoid the need for a proxy. Also see changes made in T288848 previously.
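i.e., roughly this in the Kubernetes-side configuration (a one-line sketch; where exactly it would live in our config is left open):

// Sketch: route requests for our own public URLs through the on-pod
// service mesh listener instead of going out via the CDN.
$wgLocalHTTPProxy = 'http://localhost:6501';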

Can't do HTTP request.

$ mwscript-k8s -f -- extensions/WikimediaMaintenance/addWiki.php --wiki=idwikivoyage --allow-existing
...
Got no data from https://meta.wikimedia.org/w/api.php
...

$ mwscript-k8s --attach -- eval.php --wiki=idwikivoyage
...
> $url = 'https://meta.wikimedia.org/w/api.php?action=sitematrix&format=json';

> $json = MediaWiki\MediaWikiServices::getInstance()->getHttpRequestFactory()->get( $url, [], 'foo' );

> print strlen($json);
0
> ^D

$ mwscript eval.php --wiki=idwikivoyage
...
> $url = 'https://meta.wikimedia.org/w/api.php?action=sitematrix&format=json';

> $json = MediaWiki\MediaWikiServices::getInstance()->getHttpRequestFactory()->get( $url, [], 'foo' );

> print strlen($json);
157378

Logstash says:

GET https://meta.wikimedia.org/w/api.php?action=sitematrix&format=json HTTP/1.1 - NULL cURL error 7: Failed to connect to meta.wikimedia.org port 443: Connection timed out (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for https://meta.wikimedia.org/w/api.php?action=sitematrix&format=json

I guess you'd need to use https://wikitech.wikimedia.org/wiki/Url-downloader as a proxy config with HttpRequestFactory:

> $url = 'https://meta.wikimedia.org/w/api.php?action=sitematrix&format=json'
> $request =  MediaWiki\MediaWikiServices::getInstance()->getHttpRequestFactory()->create( $url, [ 'proxy' => $wmgLocalServices['urldownloader'] ] )
> $request->execute();
> $request->getContent()

On k8s, we've generally set $wgLocalHTTPProxy to the api cluster, which should turn any request to our public URLs into an internal request to mwapi, running on localhost:6501. As far as I can tell, this was only wired up in HttpRequestFactory::createMultiClient when that method was added.

So for instance:

$ mwscript-k8s --attach -- eval.php --wiki=idwikivoyage

> $cl =  MediaWiki\MediaWikiServices::getInstance()->getHttpRequestFactory()->createMultiClient( [] );
> $url = 'https://meta.wikimedia.org/w/api.php?action=sitematrix&format=json';
>  $cl->run(['url' => $url, 'method' => 'GET']);
> $resp = $cl->run(['url' => $url, 'method' => 'GET']);
> print_r(strlen($resp["body"]));
157378
>

So @tstarling, please use MWHttpRequest or MultiHttpClient for the time being; I've opened T381251 to fix the underlying issue.

The actual problem is MWHttpRequest not using $wgLocalHTTPProxy (but rather $wgHTTPProxy) when it detects being called from a maintenance script.

The actual problem is MWHttpRequest not using $wgLocalHTTPProxy (but rather $wgHTTPProxy) when it detects being called from a maintenance script.

Indeed, I got fooled by an error message when doing my tests; I assumed ::get was using ::createGuzzleClient, which indeed doesn't support the local proxying.

As for why the local proxy is disabled from the CLI, I find it surprising, to be honest. I'd expect that to be switchable on/off in configuration rather than being hardcoded here.

As for why the local proxy is disabled from the CLI, I find it surprising, to be honest.

It was added in 2005, so it's presumably one of those things that made sense originally but never got reevaluated as the infrastructure grew.

The original feature was for Apache servers to make requests to localhost, avoiding the overhead of making a request over the network via the CDN. Maintenance servers didn't have Apache on localhost so the feature didn't work for them. It was always possible to contact Wikimedia public IP addresses from the private network, so it was just for performance.

The actual problem is MWHttpRequest not using $wgLocalHTTPProxy (but rather $wgHTTPProxy) when it detects being called from a maintenance script.

You can see it in the doc comment of the method linked there: "Check if the URL can be served by localhost". If you're running via CLI, it can't be served via localhost, so the answer is no.

You can see it in the doc comment of the method linked there: "Check if the URL can be served by localhost". If you're running via CLI, it can't be served via localhost, so the answer is no.

Except this hasn't been true, at least in production, for at least the last 5 years; every server, including the maintenance hosts, has a local instance of the service mesh allowing it to talk to the wikis via localhost. We should at the very least add a switch (or let people set $wgLocalHTTPProxy to their liking).

Question: What is the recommended way to run a script in a given datacenter? Is mw-debug-repl the only way to do that, or is there a way to make mwscript-k8s do that?

(Context: Debugging an issue that is only present in one of the DCs)

Question: What is the recommended way to run a script in a given datacenter?

We considered and rejected this option for mwscript-k8s, both because of how rarely it's desired to run a maintenance script in the read-only DC, and because of how bad it could be to do so inadvertently. If mw-debug-repl can do what you need, it's probably the best choice.

The extensions/Translate/scripts/moveTranslatableBundle.php script asks one to confirm a move by typing some text when prompted. This doesn't seem to work with mwscript-k8s as far as I can tell.

The extensions/Translate/scripts/moveTranslatableBundle.php script asks one to confirm a move by typing some text when prompted. This doesn't seem to work with mwscript-k8s as far as I can tell.

Not even with --attach? I would expect that to work just fine, but I don't think I've tried that in practice yet.

Question: How would I run a maintenance script within a debug pod (within k8s-mwdebug)? Occasionally, I need to do that, for example, when deploying T374348 (to run the migration script that allows me to fully verify the patch works as expected). I can run a repl shell there, but that doesn't allow me to run a maintenance script. Would running mwscript-k8s while scap is waiting on me to confirm the deployment have the desired effect (of running with the not-yet-synced patch)? In the non-k8s world, I used to run the script on the debug server directly, but that will no longer be an option.

Would running mwscript-k8s while scap is waiting on me to confirm the deployment have the desired effect (of running with the not-yet-synced patch)?

Answering my own question: no, it wouldn't; it does not use the latest image if it is only deployed to testservers. As far as I can see, this means I'm actually unable to run a maintenance script at the mwdebug stage. Am I missing something? Can we add that feature?

Hello! Can we please please find something better to view running jobs? I raised this before via T341553#10362280, and the jq solution is not really appropriate. I happen to run the same script for a subset of wikis quite often. Having to come up with a nearly-200-character jq query _each time_ I need to know which wikis it runs on is far from convenient. Even if I type all of it, it's still not clear which of the scripts are running, which finished successfully, and which errored out.

An example from right now:

[urbanecm@deploy2002 ~]$ kubectl get job -l username=urbanecm -o json | jq '.items | sort_by(.metadata.creationTimestamp)[] | {name: .metadata.name,args: (.spec.template.spec.containers[0].args[1:] | join(" "))}'
{
  "name": "mw-script.codfw.mtpyohr2",
  "args": "GrowthExperiments:revalidateLinkRecommendations.php --help"
}
{
  "name": "mw-script.codfw.kcjkgp6h",
  "args": "GrowthExperiments:revalidateLinkRecommendations.php --wiki=frwiki --help"
}
{
  "name": "mw-script.codfw.fs06mfc3",
  "args": "GrowthExperiments:revalidateLinkRecommendations.php --wiki=frwiki --exceptDatasetChecksums=frwiki-checksum.txt --deleteNullRecommendations"
}
{
  "name": "mw-script.codfw.znnzgj3h",
  "args": "GrowthExperiments:revalidateLinkRecommendations.php --wiki=frwiki --exceptDatasetChecksums=frwiki-checksum.txt --deleteNullRecommendations --verbose"
}
{
  "name": "mw-script.codfw.l9sw45ke",
  "args": "GrowthExperiments:revalidateLinkRecommendations.php --wiki=eswiki --exceptDatasetChecksums=eswiki-checksum.txt --deleteNullRecommendations --verbose"
}
{
  "name": "mw-script.codfw.21emjkol",
  "args": "GrowthExperiments:revalidateLinkRecommendations.php --wiki=eswiki --exceptDatasetChecksums=eswiki-checksum.txt --deleteNullRecommendations --verbose"
}
{
  "name": "mw-script.codfw.tg2xm4dn",
  "args": "GrowthExperiments:revalidateLinkRecommendations.php --wiki=ptwiki --exceptDatasetChecksums=ptwiki-checksum.txt --deleteNullRecommendations --verbose"
}
{
  "name": "mw-script.codfw.4t86vec5",
  "args": "GrowthExperiments:revalidateLinkRecommendations.php --wiki=idwiki --exceptDatasetChecksums=idwiki-checksum.txt --deleteNullRecommendations --verbose"
}
[urbanecm@deploy2002 ~]$

Because I happened to start this a few minutes ago, I happen to know znnzgj3h, 21emjkol, tg2xm4dn and 4t86vec5 are running (for frwiki, eswiki, ptwiki and idwiki respectively). However, this is not something I can get from any of the standard outputs. I could probably write my own jq for this, but typing it out every time I need this information is something I'd very much like to avoid.

Can we extend the kubectl get job output with at least error status, the wiki name and the other arguments? Similar to how -L script already displays the base script name.

Hello! Can we please please find something better to view running jobs?

Moved this to T387268.

Change #1144668 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/puppet@production] scap: Loud deprecation warning for mwscript, now officially unsupported

https://gerrit.wikimedia.org/r/1144668

Change #1144668 merged by RLazarus:

[operations/puppet@production] scap: Loud deprecation warning for mwscript, now officially unsupported

https://gerrit.wikimedia.org/r/1144668

I would like to point out that, as long as there is no solution for T379675, I at least am dependent on mwmaint, and I ask that there be a solution for persistent disk writing before mwmaint is finally shut down.

@Zabe - There will be an additional announcement soon, but similar to the guidance around other not-yet-supported use cases like sql.php in this wikitech-l thread, the interim solution is likely to involve moving your mwscript usage to the active deployment host (i.e., deployment.eqiad.wmnet) instead of the soon-to-be-decommissioned mwmaint* hosts.

Change #1152820 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] scap: block interactive maintenance scripts on mwmaint

https://gerrit.wikimedia.org/r/1152820

Change #1152820 merged by Scott French:

[operations/puppet@production] scap: block interactive maintenance scripts on mwmaint

https://gerrit.wikimedia.org/r/1152820

I'm trying to migrate the following idiom from mwmaint (using mwscript or foreachwikiindblist) to the deployment host (using mwscript-k8s).

$ foreachwikiindblist s1 mysql.php -- -e 'SELECT page_id FROM page WHERE page_namespace=8 AND page_title="Editsource" LIMIT 1';

Noting that mysql.php requires use of -- to separate two sets of CLI arguments.

What I've tried:

$ mwscript-k8s --comment="Testing new foreachwiki sql T341553" --dblist="s1" --follow -- mysql.php -- -e 'SELECT page_id FROM page WHERE page_namespace=8 AND page_title="Editsource" LIMIT 1';

Which fails as follows:

⏳ Starting mysql.php on Kubernetes as job mw-script.eqiad.bjq93qlm ...
🚀 Job is running.
📜 Streaming logs:
mysql.php: Running on s1
-----------------------------------------------------------------
enwiki
-----------------------------------------------------------------
enwiki
enwiki ERROR: Unexpected option --e!
enwiki

It looks like somewhere along the way, -- -e x turned into --e.

EDIT: As a workaround, one can use sql.php --query instead of mysql.php -- -e. This has the downside of quite awkward output formatting, but at least it works.

$ mwscript-k8s --comment="Testing new foreachwiki sql T341553" --dblist="open" --follow -- sql.php --query 'SELECT page_id FROM page WHERE page_namespace=8 AND page_title="Sitenotice" AND page_len < 10 LIMIT 1';

bnwikivoyage stdClass Object
bnwikivoyage (
bnwikivoyage     [page_id] => 1287
bnwikivoyage )
bnwikivoyage