Initial commit
fbshipit-source-id: da6be2f26e3a1202f4bffde8cb980e2dcb851294
scripts/eval/veval/README.md
# SA-Co/VEval Dataset

**License**: each domain has its own license

* SA-Co/VEval - SA-V: CC-BY-NC 4.0
* SA-Co/VEval - YT-Temporal-1B: CC-BY-NC 4.0
* SA-Co/VEval - SmartGlasses: CC-BY 4.0

**SA-Co/VEval** is an evaluation dataset comprising 3 domains; each domain has a val and a test split.

* SA-Co/VEval - SA-V: videos are from the [SA-V dataset](https://ai.meta.com/datasets/segment-anything-video/)
* SA-Co/VEval - YT-Temporal-1B: videos are from [YT-Temporal-1B](https://cove.thecvf.com/datasets/704)
* SA-Co/VEval - SmartGlasses: egocentric videos from [Smart Glasses](https://huggingface.co/datasets/facebook/SACo-VEval/blob/main/media/saco_sg.tar.gz)

## Environment

Install the SA-Co/VEval required environment:

```
pip install -e ".[veval]"
```

This enables running:

* `scripts/eval/veval/saco_yt1b_downloader.py`: prepares frames for SA-Co/VEval - YT-Temporal-1B
* `examples/saco_veval_eval_example.ipynb`: example of running an offline evaluator
* `examples/saco_veval_vis_example.ipynb`: example of loading and visualizing the data

## Download

### The expected folder structure

The following folder structure is expected after finishing all the download and pre-processing steps in this section:

```
data/
├── annotation/
│   ├── saco_veval_sav_test.json
│   ├── saco_veval_sav_val.json
│   ├── saco_veval_smartglasses_test.json
│   ├── saco_veval_smartglasses_val.json
│   ├── saco_veval_yt1b_test.json
│   └── saco_veval_yt1b_val.json
└── media/
    ├── saco_sav
    │   └── JPEGImages_24fps
    ├── saco_sg
    │   └── JPEGImages_6fps
    └── saco_yt1b
        └── JPEGImages_6fps
```

### Download ready-to-use data

The following links provide ready-to-use data, hosted on Roboflow, with the pre-processing steps outlined in the next section already applied.

For each domain:

- [SA-Co/VEval - SA-V](https://universe.roboflow.com/sa-co-veval/sa-v-test/)
- [SA-Co/VEval - YT-Temporal-1B](https://universe.roboflow.com/sa-co-veval/yt-temporal-1b-test/)
- [SA-Co/VEval - SmartGlasses](https://universe.roboflow.com/sa-co-veval/smartglasses-test/)

For all three domains:

- [SA-Co/VEval](https://universe.roboflow.com/sa-co-veval)

### Download via pre-processing steps

#### Download annotations

The GT annotations are available on Hugging Face:

* [SA-Co/VEval](https://huggingface.co/datasets/facebook/SACo-VEval/tree/main)
  * SA-Co/VEval SA-V
    * Test: `annotation/saco_veval_sav_test.json`
    * Val: `annotation/saco_veval_sav_val.json`
  * SA-Co/VEval YT-Temporal-1B
    * Test: `annotation/saco_veval_yt1b_test.json`
    * Val: `annotation/saco_veval_yt1b_val.json`
  * SA-Co/VEval SmartGlasses
    * Test: `annotation/saco_veval_smartglasses_test.json`
    * Val: `annotation/saco_veval_smartglasses_val.json`

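Once the annotation files are in place, a quick sanity check can confirm that all six expected files are present. The sketch below is a minimal helper (the file names come from the list above; the `annotation_dir` argument is whatever path you downloaded into):

```python
import os

# The six annotation files listed above.
EXPECTED_ANNOTATIONS = [
    "saco_veval_sav_test.json",
    "saco_veval_sav_val.json",
    "saco_veval_smartglasses_test.json",
    "saco_veval_smartglasses_val.json",
    "saco_veval_yt1b_test.json",
    "saco_veval_yt1b_val.json",
]


def missing_annotations(annotation_dir):
    """Return the expected annotation files not yet present in annotation_dir."""
    present = set(os.listdir(annotation_dir)) if os.path.isdir(annotation_dir) else set()
    return [name for name in EXPECTED_ANNOTATIONS if name not in present]
```

An empty return value means the `data/annotation/` folder matches the expected structure.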
#### Download videos or frames

##### SA-Co/VEval - SA-V

Follow the instructions in the [SA-V dataset](https://ai.meta.com/datasets/segment-anything-video/). Only the following two archives are needed:

* sav_test.tar
* sav_val.tar

After untarring:

```
sav_test/
├── Annotations_6fps [ignore; this is the SAM 2 annotation]
└── JPEGImages_24fps
sav_val/
├── Annotations_6fps [ignore; this is the SAM 2 annotation]
└── JPEGImages_24fps
```

Then merge the two JPEGImages_24fps folders together to match our annotation json file paths, e.g.

```
media/
└── saco_sav
    └── JPEGImages_24fps [merged from the two JPEGImages_24fps above]
```

Example commands to download and merge the folders:

```
cd ../data/media/saco_sav
wget -O sav_test.tar <sav_test.tar download link from the SA-V dataset page>
wget -O sav_val.tar <sav_val.tar download link from the SA-V dataset page>
tar -xf sav_test.tar
tar -xf sav_val.tar
mkdir JPEGImages_24fps
chmod -R u+w sav_test/
chmod -R u+w sav_val/
mv sav_test/JPEGImages_24fps/* JPEGImages_24fps/
mv sav_val/JPEGImages_24fps/* JPEGImages_24fps/
```
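For environments where the shell `mv` steps are inconvenient, the same merge can be sketched in Python. This is a hedged equivalent of the commands above, not part of the released scripts; the source/destination paths are assumptions matching the layout shown:

```python
import os
import shutil


def merge_frame_dirs(src_dirs, dst_dir):
    """Move every video folder from each source JPEGImages_24fps into one merged folder."""
    os.makedirs(dst_dir, exist_ok=True)
    for src in src_dirs:
        for name in os.listdir(src):
            target = os.path.join(dst_dir, name)
            if os.path.exists(target):
                # The test/val splits should not share video names; refuse to clobber.
                raise FileExistsError(f"{name} exists in both sources")
            shutil.move(os.path.join(src, name), target)


# Hypothetical usage, run from data/media/saco_sav:
# merge_frame_dirs(
#     ["sav_test/JPEGImages_24fps", "sav_val/JPEGImages_24fps"],
#     "JPEGImages_24fps",
# )
```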

##### SA-Co/VEval - YT-Temporal-1B

Two files are needed to download the SA-Co/VEval - YT-Temporal-1B YouTube videos.

* Download `media/yt1b_start_end_time.json` from [SA-Co/VEval](https://huggingface.co/datasets/facebook/SACo-VEval/tree/main). It contains the YouTube video ids and the start and end times used in SA-Co/VEval - YT-Temporal-1B.
* Prepare the `cookies.txt` file. Follow the yt-dlp instructions in [exporting-youtube-cookies](https://github.com/yt-dlp/yt-dlp/wiki/Extractors#exporting-youtube-cookies) and [pass-cookies-to-yt-dlp](https://github.com/yt-dlp/yt-dlp/wiki/FAQ#how-do-i-pass-cookies-to-yt-dlp) to prepare the cookies file.
  * Please see the full **WARNINGS** in yt-dlp regarding the risk of a YouTube account ban!

Then run `scripts/eval/veval/saco_yt1b_downloader.py` to download the videos and prepare the frames, e.g.

```
python saco_yt1b_downloader.py \
    --data_dir ../data/media/saco_yt1b \
    --cookies_file ../data/media/saco_yt1b/cookies.txt \
    --yt1b_start_end_time_file ../data/media/saco_yt1b/yt1b_start_end_time.json \
    --yt1b_frame_prep_log_file ../data/media/saco_yt1b/yt1b_frame_prep.log
```

* `data_dir`: the directory where the YouTube videos are downloaded and the extracted frames are stored
* `cookies_file`: the `cookies.txt` prepared above
* `yt1b_start_end_time_file`: the `yt1b_start_end_time.json` downloaded above
* `yt1b_frame_prep_log_file`: a log file to track the video downloading and frame extraction status
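To get a feel for what the downloader will do, you can summarize `yt1b_start_end_time.json` before running it. The field names below (`saco_yt1b_id`, `start_time`, `end_time`, `length`) follow `scripts/eval/veval/saco_yt1b_frame_prep_util.py`; the list-of-records layout is an assumption for this sketch:

```python
import json


def summarize_start_end_time(path):
    """Count clips, total expected frames, and total clip seconds in yt1b_start_end_time.json."""
    with open(path) as f:
        records = json.load(f)
    total_frames = sum(r["length"] for r in records)
    clip_seconds = sum(r["end_time"] - r["start_time"] for r in records)
    return {
        "num_clips": len(records),
        "total_frames": total_frames,
        "clip_seconds": clip_seconds,
    }
```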

Then run `scripts/eval/veval/saco_yt1b_annot_update.py` to update the annotations based on video availability, e.g.

```
python saco_yt1b_annot_update.py \
    --yt1b_media_dir ../data/media/saco_yt1b/JPEGImages_6fps \
    --yt1b_input_annot_path ../data/annotation/saco_veval_yt1b_val.json \
    --yt1b_output_annot_path ../data/annotation/saco_veval_yt1b_val_updated.json \
    --yt1b_annot_update_log_path ../data/annotation/saco_veval_yt1b_val_updated.log
```

**NOTE**:

* Not all YouTube videos may still be available, as videos can be deleted or made private. The script `saco_yt1b_annot_update.py` removes the annotations of the unavailable videos.
* **Frame Shifting Alert!!** Even when the videos are still available, their specifications, such as fps and duration, may differ from those used during annotation when re-downloaded from YouTube. Additionally, `ffmpeg` may not guarantee consistent frame extraction from the same video across different environments. This can cause the re-downloaded and re-extracted frames to be misaligned with our annotations due to frame shifting. Please be aware of this caveat when evaluating on SA-Co/VEval - YT-Temporal-1B.
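One cheap signal for frame shifting is a mismatch between the number of extracted frames and the `length` field of each video entry in the annotation json. The helper below is a minimal sketch of that check (it reuses the `videos` field described in the Annotation Format section; matching counts do not *guarantee* alignment, only a count mismatch proves a problem):

```python
import os


def videos_with_frame_count_mismatch(media_dir, videos):
    """Compare extracted frame counts against the `length` field of each video entry.

    `videos` is the `videos` list from an annotation json. Returns
    (video_name, expected, found) tuples for every mismatch.
    """
    mismatched = []
    for video in videos:
        frame_dir = os.path.join(media_dir, video["video_name"])
        n = len(os.listdir(frame_dir)) if os.path.isdir(frame_dir) else 0
        if n != video["length"]:
            mismatched.append((video["video_name"], video["length"], n))
    return mismatched
```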

##### SA-Co/VEval - SmartGlasses

From [SACo-VEval](https://huggingface.co/datasets/facebook/SACo-VEval/tree/main), download `media/saco_sg.tar.gz`:

```
cd ../data
hf download facebook/SACo-VEval media/saco_sg.tar.gz --repo-type dataset --local-dir .
cd ../data/media
tar -xzf saco_sg.tar.gz
```

## Annotation Format

The format is similar to the [YTVIS](https://youtube-vos.org/dataset/vis/) format.

In the annotation json, e.g. `saco_veval_sav_test.json`, there are 5 fields:

* info
  * A dict containing the dataset info
  * E.g. {'version': 'v1', 'date': '2025-09-24', 'description': 'SA-Co/VEval SA-V Test'}
* videos
  * A list of the videos that are used in the current annotation json
  * Each entry contains {id, video_name, file_names, height, width, length}
* annotations
  * A list of **positive** masklets and their related info
  * Each entry contains {id, segmentations, bboxes, areas, iscrowd, video_id, height, width, category_id, noun_phrase}
  * video_id should match the `videos - id` field above
  * category_id should match the `categories - id` field below
  * segmentations is a list of [RLE](https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/mask.py)s
* categories
  * A **globally** used noun phrase id map, shared across all 3 domains
  * Each entry contains {id, name}
  * name is the noun phrase
* video_np_pairs
  * A list of video-np pairs, both **positive** and **negative**, used in the current annotation json
  * Each entry contains {id, video_id, category_id, noun_phrase, num_masklets}
  * video_id should match the `videos - id` above
  * category_id should match the `categories - id` above
  * when `num_masklets > 0`, it is a positive video-np pair, and the corresponding masklets can be found in the annotations field
  * when `num_masklets = 0`, it is a negative video-np pair, meaning no masklet is present at all

```
data {
    "info": info
    "videos": [video]
    "annotations": [annotation]
    "categories": [category]
    "video_np_pairs": [video_np_pair]
}
video {
    "id": int
    "video_name": str  # e.g. sav_000000
    "file_names": List[str]
    "height": int
    "width": int
    "length": int
}
annotation {
    "id": int
    "segmentations": List[RLE]
    "bboxes": List[List[int, int, int, int]]
    "areas": List[int]
    "iscrowd": int
    "video_id": str
    "height": int
    "width": int
    "category_id": int
    "noun_phrase": str
}
category {
    "id": int
    "name": str
}
video_np_pair {
    "id": int
    "video_id": str
    "category_id": int
    "noun_phrase": str
    "num_masklets": int
}
```

[sam3/examples/saco_veval_vis_example.ipynb](https://github.com/facebookresearch/sam3/blob/main/examples/saco_veval_vis_example.ipynb) shows examples of the data format and data visualization.
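As a concrete illustration of the `num_masklets` rule described above, the sketch below splits `video_np_pairs` into positive and negative pairs. The toy dict mirrors the five top-level fields; its values are made up for illustration only (a real file would be loaded with `json.load`):

```python
def split_video_np_pairs(data):
    """Split video_np_pairs into positive and negative pairs, per the num_masklets rule."""
    positive = [p for p in data["video_np_pairs"] if p["num_masklets"] > 0]
    negative = [p for p in data["video_np_pairs"] if p["num_masklets"] == 0]
    return positive, negative


# Toy annotation dict with the same top-level fields (illustrative values only).
toy = {
    "info": {"version": "v1", "description": "toy"},
    "videos": [{"id": 1, "video_name": "sav_000000"}],
    "annotations": [],
    "categories": [{"id": 7, "name": "red kite"}],
    "video_np_pairs": [
        {"id": 0, "video_id": 1, "category_id": 7, "noun_phrase": "red kite", "num_masklets": 2},
        {"id": 1, "video_id": 1, "category_id": 8, "noun_phrase": "blue door", "num_masklets": 0},
    ],
}

pos, neg = split_video_np_pairs(toy)
# pos holds the "red kite" pair; neg holds the "blue door" pair.
```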

## Run Offline Eval

An example notebook and an eval script are provided for offline evaluation.

```
sam3/
├── examples/
│   └── saco_veval_eval_example.ipynb  # loads eval results or runs the eval on the fly, then prints the results
└── sam3/eval/
    └── saco_veval_eval.py  # runs the offline evaluator
```

`saco_veval_eval.py` supports two modes, `one` and `all`.

* `one`: evaluates a single pair of GT and pred files
* `all`: evaluates all 6 SA-Co/VEval datasets
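For `all` mode, one way to see which GT/pred pairs would be picked up is to match files by name across the two directories. This sketch is only an illustration; the same-file-name convention is an assumption, so check `saco_veval_eval.py` for the actual pairing logic:

```python
import os


def pair_gt_and_pred(gt_annot_dir, pred_dir):
    """Return (gt_path, pred_path) pairs for every GT json with a same-named pred file.

    Hypothetical helper: assumes pred files mirror the GT file names.
    """
    pairs = []
    for name in sorted(os.listdir(gt_annot_dir)):
        if not name.endswith(".json"):
            continue
        pred_path = os.path.join(pred_dir, name)
        if os.path.exists(pred_path):
            pairs.append((os.path.join(gt_annot_dir, name), pred_path))
    return pairs
```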

Example usage:

```
python saco_veval_eval.py one \
    --gt_annot_file ../sam3/assets/veval/toy_gt_and_pred/toy_saco_veval_sav_test_gt.json \
    --pred_file ../sam3/assets/veval/toy_gt_and_pred/toy_saco_veval_sav_test_pred.json \
    --eval_res_file ../sam3/assets/veval/toy_gt_and_pred/toy_saco_veval_sav_test_eval_res.json
```

* `gt_annot_file`: the location of the GT file
* `pred_file`: the location of the pred file
* `eval_res_file`: the location where the eval result will be written

```
python saco_veval_eval.py all \
    --gt_annot_dir ../data/annotation \
    --pred_dir ../data/pred \
    --eval_res_dir ../data/pred
```

* `gt_annot_dir`: the location of the GT files
* `pred_dir`: the location of the pred files
* `eval_res_dir`: the location where the eval results will be written
scripts/eval/veval/__init__.py
# Copyright (c) Meta Platforms, Inc. and affiliates. All Rights Reserved
scripts/eval/veval/saco_yt1b_annot_update.py
# Copyright (c) Meta Platforms, Inc. and affiliates. All Rights Reserved
import argparse
import json
import logging
import os

import pandas as pd


logger = logging.getLogger(__name__)


def get_available_saco_yt1b_ids(yt1b_media_dir, data):
    vdf = pd.DataFrame(data["videos"])
    expected_saco_yt1b_ids = vdf.video_name.tolist()

    yt1b_media_folders = os.listdir(yt1b_media_dir)

    available_saco_yt1b_ids = []
    for yt1b_media_folder in yt1b_media_folders:
        if yt1b_media_folder not in expected_saco_yt1b_ids:
            continue
        jpeg_folder_dir = os.path.join(yt1b_media_dir, yt1b_media_folder)
        jpeg_count = len(os.listdir(jpeg_folder_dir))
        if jpeg_count > 0:
            available_saco_yt1b_ids.append(yt1b_media_folder)
        else:
            logger.info(
                f"No JPEG images found for {yt1b_media_folder}. The annotations related to this video will be removed."
            )

    logger.info(
        f"Expected {len(expected_saco_yt1b_ids)} videos for {data['info']}. Found {len(available_saco_yt1b_ids)} videos available in {yt1b_media_dir}."
    )
    return available_saco_yt1b_ids


def update_yt1b_annot_per_field(data, field, id_col, available_ids):
    field_data = data[field]
    new_field_data = []
    for data_entry in field_data:
        if data_entry[id_col] not in available_ids:
            logger.info(
                f"{field}: Removing {data_entry} because the video is unavailable."
            )
            continue
        new_field_data.append(data_entry)

    data[field] = new_field_data
    logger.info(
        f"Updated {field} by {id_col} - Before: {len(field_data)}, After: {len(new_field_data)}, Removed: {len(field_data) - len(new_field_data)}"
    )
    return data


def update_yt1b_annot(yt1b_input_annot_path, yt1b_media_dir, yt1b_output_annot_path):
    with open(yt1b_input_annot_path, "r") as f:
        data = json.load(f)

    available_saco_yt1b_ids = get_available_saco_yt1b_ids(yt1b_media_dir, data)

    data = update_yt1b_annot_per_field(
        data=data,
        field="videos",
        id_col="video_name",
        available_ids=available_saco_yt1b_ids,
    )

    videos_data = data["videos"]
    available_video_incremental_ids = [data_entry["id"] for data_entry in videos_data]

    data = update_yt1b_annot_per_field(
        data=data,
        field="annotations",
        id_col="video_id",
        available_ids=available_video_incremental_ids,
    )
    data = update_yt1b_annot_per_field(
        data=data,
        field="video_np_pairs",
        id_col="video_id",
        available_ids=available_video_incremental_ids,
    )

    with open(yt1b_output_annot_path, "w") as f:
        json.dump(data, f)

    return data


def main():
    parser = argparse.ArgumentParser(
        description="Update SA-Co/VEval YT1B annotations based on video availability"
    )
    parser.add_argument(
        "--yt1b_media_dir",
        type=str,
        help="Path to the directory where the yt1b media is stored, e.g. media/saco_yt1b/JPEGImages_6fps",
    )
    parser.add_argument(
        "--yt1b_input_annot_path",
        type=str,
        help="Path to the saco_veval_yt1b input annotation file, e.g. annotation/saco_veval_yt1b_test.json or annotation/saco_veval_yt1b_val.json",
    )
    parser.add_argument(
        "--yt1b_output_annot_path",
        type=str,
        help="Path to the output annotation file, e.g. annotation/saco_veval_yt1b_test_updated.json or annotation/saco_veval_yt1b_val_updated.json",
    )
    parser.add_argument(
        "--yt1b_annot_update_log_path",
        type=str,
        help="Path to the yt1b annot update log file, e.g. annotation/yt1b_annot_update_log.log",
    )

    args = parser.parse_args()

    os.makedirs(os.path.dirname(args.yt1b_annot_update_log_path), exist_ok=True)
    os.makedirs(os.path.dirname(args.yt1b_output_annot_path), exist_ok=True)

    logging.basicConfig(
        filename=args.yt1b_annot_update_log_path,
        format="%(asctime)s [%(threadName)s] %(levelname)s: %(message)s",
        level=logging.INFO,
        filemode="w",
    )

    _ = update_yt1b_annot(
        yt1b_input_annot_path=args.yt1b_input_annot_path,
        yt1b_media_dir=args.yt1b_media_dir,
        yt1b_output_annot_path=args.yt1b_output_annot_path,
    )

    print("Done!! Check the log at", args.yt1b_annot_update_log_path)


if __name__ == "__main__":
    main()
scripts/eval/veval/saco_yt1b_downloader.py
# Copyright (c) Meta Platforms, Inc. and affiliates. All Rights Reserved
import argparse
import logging
import multiprocessing as mp
import os
from functools import partial

import pandas as pd
from saco_yt1b_frame_prep_util import YtVideoPrep
from tqdm import tqdm

logger = logging.getLogger(__name__)


def download_and_extract_frames(saco_yt1b_id, args):
    video_prep = YtVideoPrep(
        saco_yt1b_id=saco_yt1b_id,
        data_dir=args.data_dir,
        cookies_file=args.cookies_file,
        yt1b_start_end_time_file=args.yt1b_start_end_time_file,
        ffmpeg_timeout=args.ffmpeg_timeout,
        sleep_interval=args.sleep_interval,
        max_sleep_interval=args.max_sleep_interval,
    )

    status = video_prep.download_youtube_video()
    logger.info(f"[video download][{saco_yt1b_id}] download status {status}")

    if status not in ["already exists", "success"]:
        logger.warning(
            f"Video download failed for {saco_yt1b_id}, skipping frame generation"
        )
        return False

    status = video_prep.extract_frames_in_6fps_and_width_1080()
    logger.info(f"[frame extracting][{saco_yt1b_id}] frame extracting status {status}")
    return True


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--data_dir",
        type=str,
        required=True,
    )
    parser.add_argument(
        "--cookies_file",
        type=str,
        required=True,
    )
    parser.add_argument(
        "--yt1b_start_end_time_file",
        type=str,
        required=True,
    )
    parser.add_argument(
        "--yt1b_frame_prep_log_file",
        type=str,
        required=True,
    )
    parser.add_argument(
        "--ffmpeg_timeout",
        type=int,
        default=7200,  # Use a longer timeout in case large videos time out during processing
    )
    parser.add_argument(
        "--sleep_interval",
        type=int,
        default=10,
    )
    parser.add_argument(
        "--max_sleep_interval",
        type=int,
        default=30,
    )
    parser.add_argument(
        "--num_workers",
        type=int,
        default=4,
    )
    args = parser.parse_args()

    log_dir = os.path.dirname(args.yt1b_frame_prep_log_file)
    if log_dir:
        os.makedirs(log_dir, exist_ok=True)

    # Set up logging to both file and console.
    # Configure the ROOT logger so all child loggers inherit the configuration.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s [%(processName)s/%(threadName)s] %(name)s - %(levelname)s: %(message)s",
        handlers=[
            logging.FileHandler(args.yt1b_frame_prep_log_file, mode="w"),
            logging.StreamHandler(),
        ],
        force=True,  # Override any existing configuration
    )

    YT_DLP_WARNING_STR = """ ==========
NOTICE!!
This script uses yt-dlp to download YouTube videos.
See the YouTube account banning risk in https://github.com/yt-dlp/yt-dlp/wiki/Extractors#exporting-youtube-cookies
==========
"""

    logger.info(YT_DLP_WARNING_STR)

    with open(args.yt1b_start_end_time_file, "r") as f:
        yt1b_start_end_time_df = pd.read_json(f)

    saco_yt1b_ids = yt1b_start_end_time_df.saco_yt1b_id.unique()
    num_workers = args.num_workers
    logger.info(
        f"Starting with {num_workers} parallel worker(s) (sleep_interval={args.sleep_interval}-{args.max_sleep_interval}s)"
    )

    with mp.Pool(num_workers) as p:
        download_func = partial(download_and_extract_frames, args=args)
        list(tqdm(p.imap(download_func, saco_yt1b_ids), total=len(saco_yt1b_ids)))

    done_str = f""" ==========
All DONE!!
Download, frame extraction, and frame matching are all done! YT1B frames are now ready to use in {args.data_dir}/JPEGImages_6fps
Check the video frame preparation log at {args.yt1b_frame_prep_log_file}
Some videos might not be available any more, which will affect eval reproducibility.
==========
"""
    logger.info(done_str)


if __name__ == "__main__":
    main()
scripts/eval/veval/saco_yt1b_frame_prep_util.py
# Copyright (c) Meta Platforms, Inc. and affiliates. All Rights Reserved
import argparse
import logging
import os
import subprocess

import pandas as pd
import yt_dlp

logger = logging.getLogger(__name__)


class YtVideoPrep:
    def __init__(
        self,
        saco_yt1b_id: str,
        data_dir: str,
        cookies_file: str,
        yt1b_start_end_time_file: str,
        ffmpeg_timeout: int,
        sleep_interval: int = 10,
        max_sleep_interval: int = 30,
    ):
        self.saco_yt1b_id = saco_yt1b_id  # saco_yt1b_id is like saco_yt1b_000000
        self.data_dir = data_dir
        self.cookies_file = cookies_file
        self.ffmpeg_timeout = ffmpeg_timeout
        self.sleep_interval = sleep_interval
        self.max_sleep_interval = max_sleep_interval

        self.yt1b_start_end_time_df = pd.read_json(yt1b_start_end_time_file)
        (
            self.yt_video_id,
            self.yt_video_id_w_timestamps,
            self.start_time,
            self.end_time,
            self.expected_num_frames,
        ) = self._get_yt_video_id_map_info()

        self.raw_video_dir = os.path.join(self.data_dir, "raw_videos")
        self.raw_video_path = os.path.join(
            self.raw_video_dir, f"{self.yt_video_id}.mp4"
        )

        self.JPEGImages_6fps_dir = os.path.join(
            self.data_dir, "JPEGImages_6fps", self.saco_yt1b_id
        )
        self.JPEGImages_6fps_pattern = os.path.join(
            self.JPEGImages_6fps_dir, "%05d.jpg"
        )

        os.makedirs(self.raw_video_dir, exist_ok=True)
        os.makedirs(self.JPEGImages_6fps_dir, exist_ok=True)

    def _get_yt_video_id_map_info(self):
        df = self.yt1b_start_end_time_df[
            self.yt1b_start_end_time_df.saco_yt1b_id == self.saco_yt1b_id
        ]
        assert (
            len(df) == 1
        ), f"Expected exactly 1 row for saco_yt1b_id: {self.saco_yt1b_id}, found {len(df)}"
        id_and_frame_map_row = df.iloc[0]

        yt_video_id = (
            id_and_frame_map_row.yt_video_id
        )  # yt_video_id is like -06NgWyZxC0
        yt_video_id_w_timestamps = id_and_frame_map_row.yt_video_id_w_timestamps
        start_time = id_and_frame_map_row.start_time
        end_time = id_and_frame_map_row.end_time
        expected_num_frames = id_and_frame_map_row.length

        return (
            yt_video_id,
            yt_video_id_w_timestamps,
            start_time,
            end_time,
            expected_num_frames,
        )

    def download_youtube_video(self):
        video_url = f"https://youtube.com/watch?v={self.yt_video_id}"

        assert os.path.exists(
            self.cookies_file
        ), f"Cookies file '{self.cookies_file}' not found. It is required to download videos."

        outtmpl = self.raw_video_path

        # Check if the output file already exists
        if os.path.exists(outtmpl) and os.path.isfile(outtmpl):
            return "already exists"

        ydl_opts = {
            "format": "best[height<=720]/best",  # 720p or lower
            "outtmpl": outtmpl,
            "merge_output_format": "mp4",
            "noplaylist": True,
            "quiet": True,
            "cookiefile": self.cookies_file,
            "sleep_interval": self.sleep_interval,  # Sleep before each download to avoid rate limiting
            "max_sleep_interval": self.max_sleep_interval,  # Random sleep for more human-like behavior
        }

        if self.yt_video_id in ["euohdDLEMRg", "nzfAn7n4d-0"]:
            # For "euohdDLEMRg", we have to specify the https protocol or the video sometimes can't be downloaded completely.
            # For "nzfAn7n4d-0", without the https protocol, the video will be downloaded as 654x480; however, we need 490x360 for the frame matching after the 1080-width resizing.
            ydl_opts["format"] = (
                "best[height<=720][ext=mp4][protocol^=https]/best[ext=mp4][protocol^=https]/best[height<=720]/best"
            )

        try:
            with yt_dlp.YoutubeDL(ydl_opts) as ydl:
                ydl.download([video_url])
            return "success"
        except Exception as e:
            logger.warning(
                f"[video download][{self.saco_yt1b_id}] Error downloading video {self.yt_video_id}: {e}"
            )
            return f"error {e}"

    def extract_frames_in_6fps_and_width_1080(self):
        """
        Extract target frames at 6 fps and width 1080.
        """
        if not os.path.exists(self.raw_video_path):
            logger.warning(
                f"[frame extracting][{self.saco_yt1b_id}] Raw video file not found at {self.raw_video_path}"
            )
            os.rmdir(self.JPEGImages_6fps_dir)
            return False

        if (
            os.path.exists(self.JPEGImages_6fps_dir)
            and len(os.listdir(self.JPEGImages_6fps_dir)) == self.expected_num_frames
        ):
            logger.info(
                f"[frame extracting][{self.saco_yt1b_id}] JPEGImages_6fps directory already exists at {self.JPEGImages_6fps_dir} and the expected number of frames {self.expected_num_frames} matches"
            )
            return True

        # Clear the directory before extracting new frames
        for file in os.listdir(self.JPEGImages_6fps_dir):
            os.remove(os.path.join(self.JPEGImages_6fps_dir, file))

        args = [
            "-nostdin",
            "-y",
            # select video segment
            "-ss",
            str(self.start_time),
            "-to",
            str(self.end_time),
            "-i",
            self.raw_video_path,
            # set output video to 6 fps and at most 1080 width
            "-vf",
            "fps=6,scale=1080:-2",
            "-vsync",
            "0",  # passthrough mode - no frame duplication/dropping
            "-q:v",
            "2",  # high quality JPEG output
            "-start_number",
            "0",  # start frame numbering from 0
            self.JPEGImages_6fps_pattern,
        ]

        result = subprocess.run(
            ["ffmpeg"] + args,
            timeout=self.ffmpeg_timeout,
            capture_output=True,
            text=True,
        )

        if result.returncode != 0:
            logger.warning(
                f"[frame extracting][{self.saco_yt1b_id}] Failed to extract raw frames: {result.stderr}"
            )
            os.rmdir(self.JPEGImages_6fps_dir)
            return False

        if len(os.listdir(self.JPEGImages_6fps_dir)) != self.expected_num_frames:
            logger.warning(
                f"[frame extracting][{self.saco_yt1b_id}] Expected {self.expected_num_frames} frames but extracted {len(os.listdir(self.JPEGImages_6fps_dir))}"
            )
            # Clear the directory after a failed extraction
            for file in os.listdir(self.JPEGImages_6fps_dir):
                os.remove(os.path.join(self.JPEGImages_6fps_dir, file))
            os.rmdir(self.JPEGImages_6fps_dir)
            return False

        logger.info(
            f"[frame extracting][{self.saco_yt1b_id}] Successfully extracted {self.expected_num_frames} frames to {self.JPEGImages_6fps_dir}"
        )
        return True


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--saco_yt1b_id", type=str, required=True)
    parser.add_argument(
        "--data_dir",
        type=str,
        required=True,
    )
    parser.add_argument(
        "--cookies_file",
        type=str,
        required=True,
    )
    parser.add_argument(
        "--yt1b_start_end_time_file",
        type=str,
        required=True,
    )
    parser.add_argument(
        "--yt1b_frame_prep_log_file",
        type=str,
        required=True,
    )
    parser.add_argument(
        "--ffmpeg_timeout",
        type=int,
        default=7200,  # Use a longer timeout in case large videos time out during processing
    )
    parser.add_argument(
        "--sleep_interval",
        type=int,
        default=10,
    )
    parser.add_argument(
        "--max_sleep_interval",
        type=int,
        default=30,
    )
    args = parser.parse_args()

    logging.basicConfig(
        filename=args.yt1b_frame_prep_log_file,
        format="%(asctime)s [%(threadName)s] %(levelname)s: %(message)s",
        level=logging.INFO,
        filemode="w",
    )

    video_prep = YtVideoPrep(
        saco_yt1b_id=args.saco_yt1b_id,
        data_dir=args.data_dir,
        cookies_file=args.cookies_file,
        yt1b_start_end_time_file=args.yt1b_start_end_time_file,
        ffmpeg_timeout=args.ffmpeg_timeout,
        sleep_interval=args.sleep_interval,
        max_sleep_interval=args.max_sleep_interval,
    )

    status = video_prep.download_youtube_video()
    logger.info(f"[video download][{args.saco_yt1b_id}] download status {status}")

    status = video_prep.extract_frames_in_6fps_and_width_1080()
    logger.info(
        f"[frame extracting][{args.saco_yt1b_id}] frame extracting status {status}"
    )


if __name__ == "__main__":
    main()