# Metric depth video toolbox - Usage examples

This guide contains a walkthrough of how to use the tools in the metric depth video toolbox.

## Part one: generating metric depth video

Select a video to work with. This should be a clip, preferably less than 6-7 minutes long (due to GPU memory usage), and there should not be any cuts in the video. The video should preferably have the same zoom level over the whole clip. Due to GPU memory constraints in Video-Depth-Anything, the aspect ratio is best kept under 16:9.

If you want to convert an entire movie, split it up and do it scene by scene. There are tools that can cut down a movie to its scenes automatically.

I will use [in_office_720p.mp4](https://github.com/calledit/metric_depth_video_toolbox/releases/download/ExampleFiles/in_office_720p.mp4) with two individuals walking in a hallway, obtained from pexels.com.

[Image: video_in]

### Step 0 On your machine

Install metric depth video toolbox, see main README.

### Step 1 Generate a metric depth video from the source video

```
python video_metric_convert.py --color_video ~/in_office_720p.mp4

# The result is a metric 3D video file called ~/in_office_720p.mp4_depth.mkv
```

[Image: depth]

### Step 1.5 View result in 3D:

```
python3.11 3d_view_depthfile.py --color_video ~/in_office_720p.mp4 --depth_video ~/in_office_720p.mp4_depth.mkv --yfov 40
```

## Part two: generating rescaled metric depth video, camera tracking data and point clouds

_You can skip steps 2 - 6 if you just want a basic 3D stereo video._

### Step 2 Generate a mask video from the source video

```
./create_video_mask.sh -install
./create_video_mask.sh ~/in_office_720p.mp4

# The result is a black and white mask video ~/in_office_720p.mp4_mask.mkv
```

[Image: mask]

### Step 3 Generate tracking points from the source video

More iterations = more points. steps_bewtwen_track_init is the number of frames between initiation of new tracking points.
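To make steps_bewtwen_track_init concrete, here is a minimal Python sketch of which frames would start a new batch of tracking points. It assumes the first batch starts at frame 0 and that a new batch starts every N frames after that; the function name `track_init_frames` is made up for this illustration and the real script may schedule things differently.

```python
from typing import List


def track_init_frames(total_frames: int, steps_between_track_init: int) -> List[int]:
    """List the frame indices where new tracking points would be initiated.

    Assumption (not taken from the toolbox source): initiation starts at
    frame 0 and repeats every `steps_between_track_init` frames.
    """
    if steps_between_track_init <= 0:
        raise ValueError("steps_between_track_init must be a positive integer")
    return list(range(0, total_frames, steps_between_track_init))


# A 100-frame clip with a spacing of 30 frames
print(track_init_frames(100, 30))  # [0, 30, 60, 90]
```

Under that assumption a smaller spacing gives more initiation frames and so more points to track. The command for this step uses a spacing of 30 frames.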
```
python track_points_in_video.py --color_video ~/in_office_720p.mp4 --nr_iterations 4 --steps_bewtwen_track_init 30

# The result is a tracking file called ~/in_office_720p.mp4_tracking.json
```

Visualised here as tiny dots in the images:

[Image: tracking_on_video]

### Step 4 Generate camera transformations from the depth and the source video

We guess that the field of view (FOV) is somewhere in the range 30-50 deg and choose 40 deg. Later analysis showed that the real FOV is something like 42 deg. See [RECOVER_FOV.md](RECOVER_FOV.md) for more info on recovering the FOV of a video. If the video has parallax you can run sam_track_video.py with --optimize_intrinsic and it will give you an accurate FOV.

```
./install_mdvtoolbox.sh -megasam # takes a long time to install
python sam_track_video.py --color_video ~/in_office_720p.mp4 --depth_video ~/in_office_720p.mp4_depth.mkv --yfov 40

# The result is a transformations file ~/in_office_720p.mp4_depth.mkv_transformations.json
# and two debug video files called _megasam.mkv
```

### Step 5 Triangulate points to get accurate depth readings and realign the metric depth video to fit the more accurate depth readings

```
python3.11 convert_metric_depth_video_to_other_format.py --color_video ~/in_office_720p.mp4 --depth_video ~/in_office_720p.mp4_depth.mkv --yfov 40 --transformation_file ~/in_office_720p.mp4_depth.mkv_transformations.json --track_file ~/in_office_720p.mp4_tracking.json --mask_video ~/in_office_720p.mp4_mask.mkv --show_scene_point_clouds --use_triangulated_points --tringulation_min_observations 20 --save_rescaled_depth --show_both_point_clouds --global_align

# The result is a rescaled depth video ~/in_office_720p.mp4_depth.mkv_rescaled.mkv
# and two .ply files with point cloud data for the scene:
#   in_office_720p.mp4_depth.mkv_triangulated.ply (triangulated points)
#   in_office_720p.mp4_depth.mkv_avgmonodepth.ply (averages of the depth map)
# You can run the script again with the new _rescaled.mkv file to get a rescaled version of the _avgmonodepth.ply file.
python3.11 convert_metric_depth_video_to_other_format.py --color_video ~/in_office_720p.mp4 --depth_video ~/in_office_720p.mp4_depth.mkv_rescaled.mkv --yfov 40 --transformation_file ~/in_office_720p.mp4_depth.mkv_transformations.json --track_file ~/in_office_720p.mp4_tracking.json --mask_video ~/in_office_720p.mp4_mask.mkv --show_scene_point_clouds
```

### Step 6 View the result where the two subjects are walking through a point cloud

Camera movement has been cancelled out, edges removed, a background .ply file inserted, and we have added visualisation of the camera view frustum. Finally we use the mask video to mask out the background so we only see the point cloud.

```
python3.11 3d_view_depthfile.py --color_video ~/in_office_720p.mp4 --depth_video ~/in_office_720p.mp4_depth.mkv_rescaled.mkv --yfov 40 --transformation_file ~/in_office_720p.mp4_depth.mkv_transformations.json --remove_edges --show_camera --x -0.1 --y 0 --z -3 --mask_video ~/in_office_720p.mp4_mask.mkv --invert_mask --background_ply ~/in_office_720p.mp4_depth.mkv_rescaled.mkv_avgmonodepth.ply
```

[Image: in_the_clouds]

## Part three: generating side by side stereo video

Now that we have our depth video we can create stereo video.

_Technically you don't need to do steps 2 - 6, but the end result will be slightly better if you do them, since you can then use the rescaled depth instead of the raw depth from the monocular depth model._

### Step 7

This renders one frame for the right eye, then one for the left. You can alter the pupillary distance with --pupillary_distance if you want, but the default of 63 mm is more or less industry standard and is good enough for most people. We tell stereo_rerender.py to remove all edges, as we will use infill to fill them in later, and we add an argument to create an infill_mask file.
If you don't want to add infill, just skip the last two arguments.

```
python3.11 stereo_rerender.py --color_video ~/in_office_720p.mp4 --depth_video ~/in_office_720p.mp4_depth.mkv_rescaled.mkv --yfov 40 --infill_mask --remove_edges
```

Raw side by side stereo (black where there is parallax):

[Image: sbs]

Side by side stereo infill mask. It is black where no infill is needed and shows the normal of the projected edge where infill is needed. (From the normal, finding the lower and higher side of the edge is trivial, see mark_lower_side() in [stereo_crafter_infill.py](stereo_crafter_infill.py).) (The example image is from a different frame.)

[Image: normals_infill_format]

### Step 8

Here we will use ML to add parallax infill using the tool stereo_crafter_infill.py. Stereocrafter is based on Stable Diffusion, so it is slow. Be patient.

```
./install_mdvtoolbox.sh -stereocrafter # downloads and installs stereocrafter in the right folder
python3.11 stereo_crafter_infill.py --sbs_color_video ~/in_office_720p.mp4_depth.mkv_rescaled.mkv_stereo.mkv --sbs_mask_video ~/in_office_720p.mp4_depth.mkv_rescaled.mkv_infillmask.mkv
```

The result will be a video file named ~/in_office_720p.mp4_depth.mkv_rescaled.mkv_stereo.mkv_infilled.mkv

As is visible in the image below, stereocrafter does a pretty good job. If you look hard enough you will find discrepancies. These discrepancies are not that bad, however, and since the viewer's focus tends to be outside the infilled areas, a viewer may not notice them.

[Image: sbs]

### Final step compress and add back audio

Here we use ffmpeg to extract the original audio and add it back into the video, and to compress the large uncompressed video file into a video format/size that a modern VR headset or other stereo capable device can handle.

```
# Extract audio as a wave file (if you have audio.
# The example video actually does not have any audio)
ffmpeg -i ~/in_office_720p.mp4 ~/in_office_720p.wav

# Compress video for viewing on other devices and add back audio
ffmpeg -i ~/in_office_720p.mp4_depth.mkv_rescaled.mkv_stereo.mkv_infilled.mkv -i ~/in_office_720p.wav -c:v libx265 -crf 18 -tag:v hvc1 -pix_fmt yuv420p -c:a aac -map 0:v:0 -map 1:a:0 ~/in_office_720p_final_stero.mp4
```
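Since the example clip has no audio, the second ffmpeg command only makes sense when the extracted .wav file exists. The Python sketch below shows one way to build the compression command with or without the audio input. The encoder flags are copied from the command above. The helper names `has_audio_stream` and `build_compress_command` are made up for this illustration and are not part of the toolbox, and the ffprobe check assumes ffprobe is installed alongside ffmpeg.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def has_audio_stream(video: Path) -> bool:
    """Return True if ffprobe lists at least one audio stream in `video`."""
    probe = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a",
         "-show_entries", "stream=index", "-of", "csv=p=0", str(video)],
        capture_output=True, text=True, check=True,
    )
    return bool(probe.stdout.strip())


def build_compress_command(stereo_video: Path, output: Path,
                           audio: Optional[Path] = None) -> List[str]:
    """Build the ffmpeg argument list; audio options are added only if `audio` is given."""
    cmd = ["ffmpeg", "-i", str(stereo_video)]
    if audio is not None:
        cmd += ["-i", str(audio)]
    # Video settings taken from the walkthrough command
    cmd += ["-c:v", "libx265", "-crf", "18", "-tag:v", "hvc1",
            "-pix_fmt", "yuv420p", "-map", "0:v:0"]
    if audio is not None:
        cmd += ["-c:a", "aac", "-map", "1:a:0"]
    cmd.append(str(output))
    return cmd


# Example usage (not run here):
#   source = Path.home() / "in_office_720p.mp4"
#   wav = Path.home() / "in_office_720p.wav"
#   audio = wav if has_audio_stream(source) else None
#   subprocess.run(build_compress_command(infilled_video, final_video, audio), check=True)
```

Passing the command as a list avoids shell quoting problems with the long file names that the toolbox produces.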