extremetech[.]com/extreme/338403-what-video-ai-upscalers-can-and-cant-do
First off: this is a long read with some info that should be useful if you're thinking about trying AI video enlarging, but treat it as one individual reporting his experiences, not as a how-to or a guide. I don't mean to shortchange the amount of work the author has done, but judging from his workflow, his background in image & video editing and manipulation may be somewhat limited...
AI video enlarging offers a range of models that determine what the results look like, and one of its more tedious aspects is finding the best model for the video you're working with. That best model may also vary from one scene to the next. The obvious solution is to split the video into separate clips, enlarge each scene with the model that suits it, then put the clips back together in your video editor. The article covers picking different models for different scenes, but not the mechanics of processing the clips individually -- a rough sketch of that split-and-rejoin workflow follows below.
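To make the split-and-rejoin idea concrete, here's a minimal sketch driving ffmpeg from Python. The cut times and filenames are made-up placeholders, and with stream copy the cuts will snap to the nearest keyframe, so treat it as a starting point, not a recipe:

import subprocess

# Hypothetical scene-cut times in seconds; find yours by eye or with a
# scene-detection tool. With "-c copy" ffmpeg can only cut on keyframes,
# so the actual split points will land on the nearest keyframe.
scene_cuts = "12.500,47.250,93.000"

# 1) Split the source into per-scene clips without re-encoding.
subprocess.run(
    ["ffmpeg", "-i", "source.mkv", "-c", "copy",
     "-f", "segment", "-segment_times", scene_cuts,
     "-reset_timestamps", "1", "scene%03d.mkv"],
    check=True,
)

# 2) ...run your upscaler on each clip with whichever model suits it...

# 3) Rejoin the upscaled clips with ffmpeg's concat demuxer. Assumes the
#    upscaler wrote scene000_up.mkv, scene001_up.mkv, etc., all encoded
#    with the same codec and parameters.
with open("clips.txt", "w") as f:
    for i in range(4):  # 3 cut times -> 4 segments
        f.write(f"file 'scene{i:03d}_up.mkv'\n")
subprocess.run(
    ["ffmpeg", "-f", "concat", "-safe", "0", "-i", "clips.txt",
     "-c", "copy", "joined.mkv"],
    check=True,
)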
There's also not much discussion of video formats. The author describes saving the enlarged video as individual images, and while that can work, IMHO it's definitely not the way to do it. It's much more efficient to use a lossless codec like Ut Video -- there are several -- though in fairness he does mention Apple's ProRes.

Another aspect of codecs deserves mention: to work with video, software must decode the source, and for that the source format really does matter. Unless the source is already in an editing-friendly format, like the aforementioned Ut Video, some software may add &/or drop frames [both are possible in the same video (!)], making audio sync almost impossible. Test your complete workflow for this sort of problem before you start a project, so you can, for example, convert the video to an editing-friendly format first if you need to -- one way to do that is sketched below. This part of things can get pretty involved, so be prepared to research.
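As a concrete example of that convert-first step, here's a minimal sketch assuming ffmpeg (with its built-in Ut Video encoder) is on your PATH; the filenames are placeholders. The ffprobe call at the end is one quick way to sanity-check for added or dropped frames -- run the same probe on the source and compare the counts:

import subprocess

# Transcode to a lossless, edit-friendly Ut Video intermediate with
# uncompressed PCM audio. Filenames are placeholders.
subprocess.run(
    ["ffmpeg", "-i", "source.mp4",
     "-c:v", "utvideo",    # lossless video, decodes cleanly frame by frame
     "-c:a", "pcm_s16le",  # uncompressed audio keeps sync checks simple
     "intermediate.avi"],  # Ut Video is usually wrapped in AVI or MKV
    check=True,
)

# Count the decoded frames; compare against the same probe of the source
# to catch added/dropped frames before you commit to a project.
probe = subprocess.run(
    ["ffprobe", "-v", "error", "-count_frames", "-select_streams", "v:0",
     "-show_entries", "stream=nb_read_frames",
     "-of", "default=nokey=1:noprint_wrappers=1", "intermediate.avi"],
    capture_output=True, text=True, check=True,
)
print("decoded frames:", probe.stdout.strip())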
One other omission I think should be addressed concerns the author's use & mention of AviSynth & VapourSynth... They're both *frame servers*. If an app can't open a certain video, you can use a compatible frame server instead -- the app requests the video from the frame server, which feeds it frame by frame to the app. AviSynth & VapourSynth do more than just feed the video, however -- both can be scripted to perform all sorts of manipulations (a minimal VapourSynth example follows below). There's a good comparison of the two here: video.stackexchange[.]com/questions/28548/avisynth-vs-avisynth-vs-vapoursynth-which-one-should-i-choose
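For a taste of the scripting side, here's a minimal VapourSynth script sketch -- VapourSynth scripts are just Python. It assumes you have the FFMS2 source plugin installed; the filename and crop values are placeholders:

# upscale.vpy -- a VapourSynth script is an ordinary Python file.
import vapoursynth as vs

core = vs.core

# Open the source via the FFMS2 plugin (installed separately).
clip = core.ffms2.Source(source="input.mkv")

# Any number of manipulations can go here; a trivial crop as an example.
clip = core.std.Crop(clip, left=8, right=8)

# Whatever is set as output is what gets served, frame by frame,
# to the requesting application.
clip.set_output()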
You'll have to do some research on these frame servers too, and optionally on which GUI to use with these command-line apps. Google AviSynth, AviSynth+, &/or VapourSynth along with the video editor you want to use for directions on how to get the video output into your editing app -- one common route is sketched below. If this all seems rather daunting, search the software section at videohelp[.]com for VirtualDub... Like AviSynth, VirtualDub was born a Loooong time ago, when people like me were recording broadcast TV and saving it to Real & Windows Media formats, and sometimes VCDs. Like AviSynth, it's been kept alive by other coders, with a version of VirtualDub as recent as 2020. Because VirtualDub & AviSynth were designed for much the same tasks -- working with & cleaning up broadcast video recordings -- their filters overlap a lot, and you may well get the results you need from VirtualDub without the hassle of learning AviSynth or VapourSynth.
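As one example of that get-the-output-into-your-editor step, here's a sketch that pipes the frames a VapourSynth script serves through vspipe into ffmpeg, saving a lossless Ut Video intermediate most editors will open. Script and file names are placeholders, and note that older VapourSynth builds spell the format flag "--y4m" instead of "-c y4m":

import subprocess

# Serve frames from the script via vspipe, feed them to ffmpeg's stdin
# as Y4M, and write a lossless Ut Video intermediate for editing.
vspipe = subprocess.Popen(
    ["vspipe", "-c", "y4m", "upscale.vpy", "-"],
    stdout=subprocess.PIPE,
)
subprocess.run(
    ["ffmpeg", "-y", "-i", "pipe:0", "-c:v", "utvideo", "served.avi"],
    stdin=vspipe.stdout,
    check=True,
)
vspipe.stdout.close()
vspipe.wait()

# Y4M carries video only; mux the audio back in separately if needed.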