first commit

This commit is contained in:
2026-02-12 13:17:11 +08:00
commit caa519e62e
504 changed files with 123004 additions and 0 deletions


@@ -0,0 +1,24 @@
path_annotations: <YOUR_ANNOTATIONS_PATH>/saco_frames_test_sets/annotations/
# Paths with downloaded data
droid_path: <YOUR_DATASET_PATH>/saco_frames_test_sets/droid/
sav_path: <YOUR_DATASET_PATH>/saco_frames_test_sets/sav/
ego4d_path: <YOUR_DATASET_PATH>/saco_frames_test_sets/ego4d/
yt1b_path: <YOUR_DATASET_PATH>/saco_frames_test_sets/yt1b/
# Configuration to download and extract video frames
cookies_path: <YOUR_COOKIES_PATH>/cookies.txt # Required to download YT1B videos
update_annotation_yt1b: true
update_annotation_ego4d: true
sav_videos_fps_6_download_path: ''
remove_downloaded_videos_yt1b: false
remove_downloaded_videos_droid: false
remove_downloaded_videos_ego4d: false
remove_downloaded_videos_sav: false
# Configuration for visualization of data
num_images_show: 5
saco_subset_show: yt1b # Options: [yt1b, ego4d, sav, droid]
directory_save: <YOUR_SAVE_DIR>


@@ -0,0 +1,405 @@
# SA-Co/Silver benchmark
SA-Co/Silver is a benchmark for promptable concept segmentation (PCS) in images. The benchmark contains images paired with text labels (also referred to as noun phrases, or NPs), each annotated exhaustively with masks on all object instances that match the label.
SA-Co/Silver comprises 10 subsets covering a diverse array of domains, including food, art, robotics, and driving. Unlike SA-Co/Gold, each datapoint has only a single ground truth, so results may show somewhat more variance and tend to underestimate model performance, since they don't account for other valid interpretations of each query. The subsets are:
- BDD100k
- DROID
- Ego4D
- MyFoodRepo-273
- GeoDE
- iNaturalist-2017
- National Gallery of Art
- SA-V
- YT-Temporal-1B
- Fathomnet
This README explains how to download the annotations and image data and prepare them for evaluation on SA-Co/Silver.
# Preparation
## Download annotations
The ground-truth (GT) annotations can be downloaded from [Hugging Face](https://huggingface.co/datasets/facebook/SACo-Silver) or [Roboflow](https://universe.roboflow.com/sa-co-silver).
## Download images and video frames
### Image Datasets
#### GeoDE
The processed images needed for evaluation can be downloaded from [Roboflow](https://universe.roboflow.com/sa-co-silver/geode/) OR follow the below steps to prepare the processed images.
1. Download dataset with raw images from [GeoDE](https://geodiverse-data-collection.cs.princeton.edu/).
2. Extract the downloaded file to a location, say `<RAW_GEODE_IMAGES_FOLDER>`
3. Run the command below to pre-process the images and prepare them for evaluation. The processed images will be saved to the location specified in `<PROCESSED_GEODE_IMAGES_FOLDER>`.
```
python preprocess_silver_geode_bdd100k_food_rec.py --annotation_file <FOLDER_WITH_SILVER_ANNOTATIONS>/silver_geode_merged_test.json --raw_images_folder <RAW_GEODE_IMAGES_FOLDER> --processed_images_folder <PROCESSED_GEODE_IMAGES_FOLDER> --dataset_name geode
```
#### National Gallery of Art (NGA)
The processed images needed for evaluation can be downloaded from [Roboflow](https://universe.roboflow.com/sa-co-silver/national-gallery-of-art/) OR follow the below steps to prepare the processed images.
1. Run the command below to download the raw images and pre-process them for evaluation. The processed images will be saved to the location specified in `<PROCESSED_NGA_IMAGES_FOLDER>`.
```
python download_preprocess_nga.py --annotation_file <FOLDER_WITH_SILVER_ANNOTATIONS>/silver_nga_art_merged_test.json --raw_images_folder <RAW_NGA_IMAGES_FOLDER> --processed_images_folder <PROCESSED_NGA_IMAGES_FOLDER>
```
#### Berkeley Driving Dataset (BDD) 100k
The processed images needed for evaluation can be downloaded from [Roboflow](https://universe.roboflow.com/sa-co-silver/bdd100k-gwmh6/) OR follow the below steps to prepare the processed images.
1. Download data with raw images from the `100K Images` dataset in [BDD100k](http://bdd-data.berkeley.edu/download.html)
2. Extract the downloaded file to a location, say `<RAW_BDD_IMAGES_FOLDER>`
3. Run the command below to pre-process the images and prepare them for evaluation. The processed images will be saved to the location specified in `<PROCESSED_BDD_IMAGES_FOLDER>`.
```
python preprocess_silver_geode_bdd100k_food_rec.py --annotation_file <FOLDER_WITH_SILVER_ANNOTATIONS>/silver_bdd100k_merged_test.json --raw_images_folder <RAW_BDD_IMAGES_FOLDER> --processed_images_folder <PROCESSED_BDD_IMAGES_FOLDER> --dataset_name bdd100k
```
#### Food Recognition Challenge 2022
1. Download data with raw images from the [website](https://www.aicrowd.com/challenges/food-recognition-benchmark-2022). Download `[Round 2] public_validation_set_2.0.tar.gz` file.
2. Extract the downloaded file to a location, say `<RAW_FOOD_IMAGES_FOLDER>`
3. Run the command below to pre-process the images and prepare them for evaluation. The processed images will be saved to the location specified in `<PROCESSED_FOOD_IMAGES_FOLDER>`.
```
python preprocess_silver_geode_bdd100k_food_rec.py --annotation_file <FOLDER_WITH_SILVER_ANNOTATIONS>/silver_food_rec_merged_test.json --raw_images_folder <RAW_FOOD_IMAGES_FOLDER> --processed_images_folder <PROCESSED_FOOD_IMAGES_FOLDER> --dataset_name food_rec
```
#### iNaturalist
The processed images needed for evaluation can be downloaded from [Roboflow](https://universe.roboflow.com/sa-co-silver/inaturalist-2017/) OR follow the below steps to prepare the processed images.
1. Run the command below to download the images, extract them into `<RAW_INATURALIST_IMAGES_FOLDER>`, and prepare them for evaluation. The processed images will be saved to the location specified in `<PROCESSED_INATURALIST_IMAGES_FOLDER>`.
```
python download_inaturalist.py --raw_images_folder <RAW_INATURALIST_IMAGES_FOLDER> --processed_images_folder <PROCESSED_INATURALIST_IMAGES_FOLDER>
```
#### Fathomnet
The processed images needed for evaluation can be downloaded from [Roboflow](https://universe.roboflow.com/sa-co-silver/fathomnet-kmz5d/) OR follow the below steps to prepare the processed images.
1. Install the FathomNet API
```
pip install fathomnet
```
2. Run the command below to download the images and prepare them for evaluation. The processed images will be saved to the location specified in `<PROCESSED_FATHOMNET_IMAGES_FOLDER>`.
```
python download_fathomnet.py --processed_images_folder <PROCESSED_FATHOMNET_IMAGES_FOLDER>
```
### Frame Datasets
These datasets correspond to annotations for individual frames coming from videos. The file `CONFIG_FRAMES.yaml` is used to unify the downloads for the datasets, as explained below.
Before following the other dataset steps, update `CONFIG_FRAMES.yaml` with the correct `path_annotations` path where the annotation files are.
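Because the template ships with `<YOUR_...>` placeholders, it can help to confirm that none are left before starting a download. The following is a minimal sketch, not part of the repository: the helper name `find_unfilled_placeholders` is hypothetical, and it scans the raw text of the file, so it needs no YAML parser.

```python
import re

# Matches template placeholders such as <YOUR_DATASET_PATH>.
PLACEHOLDER = re.compile(r"<YOUR_[A-Z0-9_]+>")


def find_unfilled_placeholders(config_text):
    """Return (line_number, line) pairs that still contain a <YOUR_...> placeholder."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        if PLACEHOLDER.search(line):
            hits.append((number, line.strip()))
    return hits


# Example text standing in for the contents of CONFIG_FRAMES.yaml.
example = """\
path_annotations: /data/saco/annotations/
droid_path: <YOUR_DATASET_PATH>/saco_frames_test_sets/droid/
num_images_show: 5
"""
unfilled = find_unfilled_placeholders(example)
for number, line in unfilled:
    print(f"line {number}: {line}")
```

In practice the text would come from reading `CONFIG_FRAMES.yaml` itself. Only the entries for the datasets you actually plan to download need to be filled in.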
#### DROID
The processed frames needed for evaluation can be downloaded from [Roboflow](https://universe.roboflow.com/sa-co-silver/droid-cfual/) OR follow the below steps to prepare the processed frames.
1. Install the gsutil package:
```bash
pip install gsutil
```
2. Modify the `droid_path` variable in `CONFIG_FRAMES.yaml`. This is the path where the DROID data will be downloaded.
3. _\[Optional\]_ Update the variable `remove_downloaded_videos_droid` to (not) remove the videos after the frames have been extracted.
4. Download the data:
```bash
python download_videos.py droid
```
5. Extract the frames:
```bash
python extract_frames.py droid
```
See the [DROID website](https://droid-dataset.github.io/droid/the-droid-dataset#-using-the-dataset) for more information.
#### SA-V
The processed frames needed for evaluation can be downloaded from [Roboflow](https://universe.roboflow.com/sa-co-silver/sa-v) OR follow the below steps to prepare the processed frames.
1. Follow instructions in the [Segment Anything official website](https://ai.meta.com/datasets/segment-anything-video-downloads/) to obtain access to the download links (they are dynamic links).
2. Update `CONFIG_FRAMES.yaml`:
- Update the `sav_path` variable, where the frames will be saved.
- Update the `sav_videos_fps_6_download_path` variable. Copy and paste the link corresponding to `videos_fps_6.tar` from the list that you obtained in step 1.
- _\[Optional\]_ Update the variable `remove_downloaded_videos_sav` to (not) remove the videos after the frames have been extracted.
3. Download the videos:
```bash
python download_videos.py sav
```
4. Extract the frames:
```
python extract_frames.py sav
```
#### Ego4D
The processed frames needed for evaluation can be downloaded from [Roboflow](https://universe.roboflow.com/sa-co-silver/ego4d-w7fiu/) OR follow the below steps to prepare the processed frames.
1. Review and accept the license agreement in the [official Ego4D website](https://ego4d-data.org/docs/start-here/#license-agreement).
2. Configure AWS credentials. Run:
```bash
pip install awscli
aws configure
```
and copy the values shown in the email you received after step 1 (you can leave "region name" and "output format" empty). You can verify that the variables were set up correctly:
```bash
cat ~/.aws/credentials
```
3. Install the Ego4D library:
```bash
pip install ego4d
```
4. Update `CONFIG_FRAMES.yaml`:
- Set up AWS credentials following the instructions in the email you received after step 1. Modify the following variables: `aws_access_key_id` and `aws_secret_access_key`.
- Update the `ego4d_path` variable, where the frames will be saved.
- _\[Optional\]_ Update the variable `remove_downloaded_videos_ego4d` to (not) remove the videos after the frames have been extracted.
5. Download the `clips` subset of the Ego4D dataset:
```bash
python download_videos.py ego4d
```
6. Extract the frames:
```
python extract_frames.py ego4d
```
See the [official CLI](https://ego4d-data.org/docs/CLI/) and the [explanation about the videos](https://ego4d-data.org/docs/data/videos/) for more information.
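As an alternative to `cat ~/.aws/credentials`, the credentials file can be checked programmatically. This is a minimal sketch that assumes the standard INI layout written by `aws configure` and the `default` profile; the helper name `has_aws_keys` is hypothetical and not part of the repository.

```python
import configparser
from pathlib import Path


def has_aws_keys(credentials_path, profile="default"):
    """Return True if the profile defines both an access key ID and a secret key."""
    parser = configparser.ConfigParser()
    # ConfigParser.read silently skips files that do not exist.
    parser.read(credentials_path)
    if not parser.has_section(profile):
        return False
    section = parser[profile]
    return bool(section.get("aws_access_key_id")) and bool(
        section.get("aws_secret_access_key")
    )


if __name__ == "__main__":
    print(has_aws_keys(Path.home() / ".aws" / "credentials"))
```

The check only confirms that both keys are present. It does not verify that they grant access to the Ego4D buckets.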
#### YT1B
The processed frames needed for evaluation can be downloaded from [Roboflow](https://universe.roboflow.com/sa-co-silver/yt-temporal-1b/) OR follow the below steps to prepare the processed frames.
1. Install the yt-dlp library:
```bash
python3 -m pip install -U "yt-dlp[default]"
```
2. Create a `cookies.txt` file following the yt-dlp instructions in [exporting-youtube-cookies](https://github.com/yt-dlp/yt-dlp/wiki/Extractors#exporting-youtube-cookies) and [pass-cookies-to-yt-dlp](https://github.com/yt-dlp/yt-dlp/wiki/FAQ#how-do-i-pass-cookies-to-yt-dlp). This is required to download YouTube videos. Then set the `cookies_path` variable in `CONFIG_FRAMES.yaml` to the path of that file.
3. Update `CONFIG_FRAMES.yaml`:
- Update the `yt1b_path`, where the frames will be saved.
- _\[Optional\]_ Some YouTube videos may not be available on YouTube anymore. Set `update_annotation_yt1b` to `True` in `CONFIG_FRAMES.yaml` to remove the annotations corresponding to such videos. Note that the evaluations will not be directly comparable with other reported evaluations.
- _\[Optional\]_ Update the variable `remove_downloaded_videos_yt1b` to (not) remove the videos after the frames have been extracted.
4. Run the following code to download the videos:
```
python download_videos.py yt1b
```
5. Extract the frames:
```
python extract_frames.py yt1b
```
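To illustrate what removing the annotations of unavailable videos amounts to, the sketch below filters a ground-truth dictionary down to the image-NP pairs whose frame exists on disk. It is an assumption-laden illustration, not the logic used by `extract_frames.py`: the helper name `drop_missing_frames` is hypothetical, and it assumes `file_name` is relative to the frames folder.

```python
import tempfile
from pathlib import Path


def drop_missing_frames(gt, frames_dir):
    """Keep only image-NP pairs whose frame file exists, plus their annotations."""
    frames_dir = Path(frames_dir)
    kept_images = [
        img for img in gt["images"] if (frames_dir / img["file_name"]).exists()
    ]
    kept_ids = {img["id"] for img in kept_images}
    kept_annotations = [
        ann for ann in gt["annotations"] if ann["image_id"] in kept_ids
    ]
    # Copy every other top-level key (e.g. "categories") unchanged.
    return {**gt, "images": kept_images, "annotations": kept_annotations}


# Tiny demonstration with a temporary folder holding a single frame.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "video_a").mkdir()
    (Path(tmp) / "video_a" / "00001.jpg").touch()
    gt = {
        "images": [
            {"id": 1, "file_name": "video_a/00001.jpg", "text_input": "a cup"},
            {"id": 2, "file_name": "video_b/00001.jpg", "text_input": "a dog"},
        ],
        "annotations": [{"id": 10, "image_id": 1}, {"id": 11, "image_id": 2}],
        "categories": [{"id": 1}],
    }
    filtered = drop_missing_frames(gt, tmp)

print(len(filtered["images"]), len(filtered["annotations"]))
```

Any filtering of this kind changes the evaluation set, which is why scores computed this way are not directly comparable with other reported results.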
# Usage
## Visualization
- Visualize GT annotations: [saco_gold_silver_vis_example.ipynb](https://github.com/facebookresearch/sam3/blob/main/examples/saco_gold_silver_vis_example.ipynb)
## Run evaluation
The official metric for SA-Co/Silver is cgF1. Please refer to the SAM3 paper for details.
Unlike Gold, the silver subset only has a single annotation per image. Therefore, the performance may be underestimated, because the model may be wrongly penalized for choosing an interpretation which is valid but different from that of the human annotator.
### Evaluate SAM3
We provide inference configurations to reproduce the evaluation of SAM3.
First, please edit the file [eval_base.yaml](https://github.com/facebookresearch/sam3/blob/main/sam3/train/configs/eval_base.yaml) with the paths where you downloaded the images and annotations above.
There are 10 subsets and as many configurations to be run.
Let's take the first subset as an example. The inference can be run locally using the following command (you can adjust the number of gpus):
```bash
python sam3/train/train.py -c configs/silver_image_evals/sam3_gold_image_bdd100k.yaml --use-cluster 0 --num-gpus 1
```
The predictions will be dumped in the folder specified in eval_base.yaml.
We also provide support for SLURM-based cluster inference. Edit the `eval_base.yaml` file to reflect your SLURM configuration (partition, qos, ...), then run:
```bash
python sam3/train/train.py -c configs/silver_image_evals/sam3_gold_image_bdd100k.yaml --use-cluster 1
```
### Offline evaluation
If you have the predictions in the COCO result format (see [here](https://cocodataset.org/#format-results)), then we provide scripts to easily run the evaluation.
For an example on how to run the evaluator on all subsets and aggregate results, see the following notebook: [saco_gold_silver_eval_example.ipynb](https://github.com/facebookresearch/sam3/blob/main/examples/saco_gold_silver_eval_example.ipynb)
If you have a prediction file for a given subset, you can run the evaluator specifically for that one using the standalone script. Example:
```bash
python scripts/eval/standalone_cgf1.py --pred_file /path/to/coco_predictions_segm.json --gt_files /path/to/annotations/silver_bdd100k_merged_test.json
```
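Before running the evaluator, it can be worth sanity-checking a prediction file against the ground truth. The sketch below assumes the standard COCO segmentation result keys (`image_id`, `category_id`, `segmentation`, `score`) and that each `image_id` refers to an `id` in the `images` list of the annotation file. The helper `check_predictions` is hypothetical and is not one of the provided scripts.

```python
REQUIRED_KEYS = {"image_id", "category_id", "segmentation", "score"}


def check_predictions(predictions, gt):
    """Return human-readable problems found in COCO-style segmentation results."""
    known_ids = {img["id"] for img in gt["images"]}
    problems = []
    for index, pred in enumerate(predictions):
        missing = REQUIRED_KEYS - set(pred)
        if missing:
            problems.append(f"prediction {index}: missing keys {sorted(missing)}")
        elif pred["image_id"] not in known_ids:
            problems.append(f"prediction {index}: unknown image_id {pred['image_id']}")
    return problems


# Illustrative data only; real files would be loaded with json.load.
example_gt = {"images": [{"id": 10000000}], "annotations": []}
example_preds = [
    {"image_id": 10000000, "category_id": 1, "segmentation": {}, "score": 0.9},
    {"image_id": 42, "category_id": 1, "segmentation": {}, "score": 0.5},
    {"image_id": 10000000, "category_id": 1, "segmentation": {}},
]
problems = check_predictions(example_preds, example_gt)
for problem in problems:
    print(problem)
```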
# Results
<table style="border-color:black;border-style:solid;border-width:1px;border-collapse:collapse;border-spacing:0;text-align:right" class="tg"><thead>
<tr style="text-align:center">
<th></th>
<th colspan="3">Average</th>
<th colspan="3">BDD100k</th>
<th colspan="3">DROID</th>
<th colspan="3">Ego4d</th>
<th colspan="3">Food Rec</th>
<th colspan="3">Geode</th>
<th colspan="3">iNaturalist</th>
<th colspan="3">Nga Art</th>
<th colspan="3">SAV</th>
<th colspan="3">YT1B</th>
<th colspan="3">Fathomnet</th>
</tr></thead>
<tbody>
<tr>
<td></td>
<td>cgF1</td>
<td>IL_MCC</td>
<td>PmF1</td>
<td>CGF1</td>
<td>IL_MCC</td>
<td>pmF1</td>
<td>CGF1</td>
<td>IL_MCC</td>
<td>pmF1</td>
<td>CGF1</td>
<td>IL_MCC</td>
<td>pmF1</td>
<td>CGF1</td>
<td>IL_MCC</td>
<td>pmF1</td>
<td>CGF1</td>
<td>IL_MCC</td>
<td>pmF1</td>
<td>CGF1</td>
<td>IL_MCC</td>
<td>pmF1</td>
<td>CGF1</td>
<td>IL_MCC</td>
<td>pmF1</td>
<td>CGF1</td>
<td>IL_MCC</td>
<td>pmF1</td>
<td>CGF1</td>
<td>IL_MCC</td>
<td>pmF1</td>
<td>CGF1</td>
<td>IL_MCC</td>
<td>pmF1</td>
</tr>
<tr>
<td>gDino-T</td> <td>3.09</td> <td>0.12</td> <td>19.75</td> <td>3.33</td> <td>0.17</td> <td>19.54</td> <td>4.26</td> <td>0.15</td> <td>28.38</td> <td>2.87</td> <td>0.1</td>
<td>28.72</td> <td>0.69</td> <td>0.05</td> <td>13.88</td> <td>9.61</td> <td>0.24</td> <td>40.03</td> <td>0</td> <td>0</td> <td>1.97</td> <td>1.31</td> <td>0.09</td>
<td>14.57</td> <td>5.18</td> <td>0.19</td> <td>27.25</td> <td>3.6</td> <td>0.16</td> <td>22.5</td> <td>0</td> <td>0</td> <td>0.64</td>
</tr>
<tr>
<td>OWLv2*</td> <td>11.23</td> <td>0.32</td> <td>31.18</td> <td>14.97</td> <td>0.46</td> <td>32.34</td> <td>10.84</td> <td>0.36</td> <td>30.1</td> <td>7.36</td> <td>0.23</td>
<td>31.99</td> <td>19.35</td> <td>0.44</td> <td>43.98</td> <td>27.04</td> <td>0.5</td> <td>54.07</td> <td>3.92</td> <td>0.14</td> <td>27.98</td> <td>8.05</td> <td>0.31</td>
<td>25.98</td> <td>10.59</td> <td>0.32</td> <td>33.1</td> <td>10.15</td> <td>0.38</td> <td>26.7</td> <td>0.04</td> <td>0.01</td> <td>5.57</td>
</tr>
<tr>
<td>OWLv2</td> <td>8.18</td> <td>0.23</td> <td>32.55</td> <td>8.5</td> <td>0.31</td> <td>27.79</td> <td>7.21</td> <td>0.25</td> <td>28.84</td> <td>5.64</td> <td>0.18</td>
<td>31.35</td> <td>14.18</td> <td>0.32</td> <td>44.32</td> <td>13.04</td> <td>0.28</td> <td>46.58</td> <td>3.62</td> <td>0.1</td> <td>36.23</td> <td>7.22</td> <td>0.25</td>
<td>28.88</td> <td>10.86</td> <td>0.32</td> <td>33.93</td> <td>11.7</td> <td>0.35</td> <td>33.43</td> <td>-0.14</td> <td>-0.01</td> <td>14.15</td>
</tr>
<tr>
<td>LLMDet-L</td> <td>6.73</td> <td>0.17</td> <td>28.19</td> <td>1.69</td> <td>0.08</td> <td>19.97</td> <td>2.56</td> <td>0.1</td> <td>25.59</td> <td>2.39</td>
<td>0.08</td> <td>29.92</td> <td>0.98</td> <td>0.06</td> <td>16.26</td> <td>20.82</td> <td>0.37</td> <td>56.26</td> <td>27.37</td> <td>0.46</td> <td>59.5</td>
<td>2.17</td> <td>0.13</td> <td>16.68</td> <td>5.37</td> <td>0.19</td> <td>28.26</td> <td>3.73</td> <td>0.16</td> <td>23.32</td> <td>0.24</td> <td>0.04</td> <td>6.1</td>
</tr>
<tr>
<td>Gemini 2.5</td> <td>9.67</td> <td>0.19</td> <td>45.51</td> <td>5.83</td> <td>0.19</td> <td>30.66</td> <td>5.61</td> <td>0.14</td> <td>40.07</td>
<td>0.38</td> <td>0.01</td> <td>38.14</td> <td>10.92</td> <td>0.24</td> <td>45.52</td> <td>18.28</td> <td>0.26</td> <td>70.29</td> <td>26.57</td> <td>0.36</td>
<td>73.81</td> <td>8.18</td> <td>0.2</td> <td>40.91</td> <td>9.48</td> <td>0.22</td> <td>43.1</td> <td>8.66</td> <td>0.23</td> <td>37.65</td> <td>2.8</td>
<td>0.08</td> <td>34.99</td>
</tr>
<tr> <td>SAM3</td> <td>49.57</td> <td>0.76</td> <td>65.17</td> <td>46.61</td> <td>0.78</td> <td>60.13</td> <td>45.58</td> <td>0.76</td>
<td>60.35</td> <td>38.64</td> <td>0.62</td> <td>62.56</td> <td>52.96</td> <td>0.79</td> <td>67.21</td> <td>70.07</td> <td>0.89</td>
<td>78.73</td> <td>65.8</td> <td>0.82</td> <td>80.67</td> <td>38.06</td> <td>0.66</td> <td>57.62</td> <td>44.36</td> <td>0.67</td>
<td>66.05</td> <td>42.07</td> <td>0.72</td> <td>58.36</td> <td>51.53</td> <td>0.86</td> <td>59.98</td>
</tr>
</tbody></table>
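The per-subset numbers in the table are consistent with cgF1 being approximately the product of pmF1 and IL_MCC. For example, the SAM3 GeoDE entry gives 78.73 × 0.89 ≈ 70.07. The sketch below works under that assumption, and under the further assumption that IL_MCC is a Matthews correlation coefficient computed over image-level presence decisions. The SAM3 paper remains the authoritative definition, and both function names are hypothetical.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a row or column of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def cgf1_from_parts(pm_f1, il_mcc):
    """Combine pmF1 (0-100 scale) and IL_MCC under the assumed product relation."""
    return pm_f1 * il_mcc


# SAM3 on GeoDE, values taken from the table above.
print(round(cgf1_from_parts(78.73, 0.89), 2))
```

Small mismatches on other entries are expected, because the table reports IL_MCC rounded to two decimals.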
# Annotation format
The annotation format is derived from [COCO format](https://cocodataset.org/#format-data). Notable data fields are:
- `images`: a `list` of `dict` entries, one per image-NP pair. Each entry has the following items:
  - `id`: an `int`, the unique identifier of the image-NP pair.
  - `text_input`: a `string`, the noun phrase of the image-NP pair.
  - `file_name`: a `string`, the image path relative to the corresponding data folder.
  - `height`/`width`: the dimensions of the image.
  - `is_instance_exhaustive`: Boolean (0 or 1). If 1, all instances are correctly annotated. Only these datapoints are used for instance segmentation. Otherwise, there may be missing instances or crowd segments (a segment covering multiple instances).
  - `is_pixel_exhaustive`: Boolean (0 or 1). If 1, the union of all masks covers all pixels corresponding to the prompt. This is weaker than `is_instance_exhaustive` since it allows crowd segments. It can be used for semantic segmentation evaluations.
- `annotations`: a `list` of `dict` entries, one per annotation, including bounding box, segmentation mask, area, etc.
  - `image_id`: an `int`, the identifier of the image-NP pair in `images` that the annotation belongs to.
  - `bbox`: a `list` of `float` values, the bounding box in [x, y, w, h] format, normalized by the image dimensions.
  - `segmentation`: a `dict`, the segmentation mask in RLE format.
  - `category_id`: kept for compatibility with the COCO format. It is always 1 and is unused.
  - `iscrowd`: Boolean (0 or 1). If 1, the segment overlaps several instances (used when instances are not separable, e.g. due to poor image quality).
- `categories`: a `list` of `dict` entries listing all categories. The key is provided for compatibility with the COCO format, but it is not used in open-vocabulary detection. Instead, the text prompt is stored directly in each image entry (`text_input` in `images`). Note that in this setting, a unique image (`id` in `images`) actually corresponds to an (image, text prompt) combination.
For `id` in images that have corresponding annotations (i.e. exist as `image_id` in `annotations`), we refer to them as a "positive" NP. And, for `id` in `images` that don't have any annotations (i.e. they do not exist as `image_id` in `annotations`), we refer to them as a "negative" NP.
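The positive/negative distinction can be computed directly from a loaded annotation file. This is a minimal sketch: the function name `split_positive_negative` is hypothetical, and the dictionary below is illustrative data rather than real annotations.

```python
def split_positive_negative(gt):
    """Split image-NP pairs into those with and without annotations."""
    annotated_ids = {ann["image_id"] for ann in gt["annotations"]}
    positives = [img for img in gt["images"] if img["id"] in annotated_ids]
    negatives = [img for img in gt["images"] if img["id"] not in annotated_ids]
    return positives, negatives


# Illustrative data: the same frame queried with two different noun phrases.
example = {
    "images": [
        {"id": 1, "file_name": "00002.jpg", "text_input": "the large wooden table"},
        {"id": 2, "file_name": "00002.jpg", "text_input": "a red bicycle"},
    ],
    "annotations": [{"id": 1, "image_id": 1}],
}
positives, negatives = split_positive_negative(example)
print([img["text_input"] for img in positives])
print([img["text_input"] for img in negatives])
```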
A sample annotation from DROID domain looks as follows:
#### images
```
[
{
"id": 10000000,
"file_name": "AUTOLab_failure_2023-07-07_Fri_Jul__7_18:50:36_2023_recordings_MP4_22008760/00002.jpg",
"text_input": "the large wooden table",
"width": 1280,
"height": 720,
"queried_category": "3",
"is_instance_exhaustive": 1,
"is_pixel_exhaustive": 1
}
]
```
#### annotations
```
[
{
"area": 0.17324327256944444,
"id": 1,
"image_id": 10000000,
"source": "created by SAM3",
"bbox": [
0.03750000149011612,
0.5083333253860474,
0.8382812738418579,
0.49166667461395264
],
"segmentation": {
"counts": "[^R11]f03O0O100O2N100O1O100O100O100O100O1O100O100O100O100O100O1O10000O1O10000O1O100O10000O1O100O100O100O100O100O100O100O100O100O100O1O100O100O10000O100O100O100O101N100O1O011O0O1O101OO0010O100O1O100O2OO0100O100O100O100O100O10000O100O100O1O100O10000O1O100O100O100O10000O1O100O100O100O10000O1O10000O1O100O100O100O100O100O100O1O100O100O100O100O100O100O100O100O100O100O100O100O100O100O10000O100O100O1O100O10000O100O100O100O100O1O100O100O100O100O100O100O10O0100O100O2O000O1O10000O1O10000O100O100O100O1O100O100O100O100O100O100O100O100O100O100O100O100O1O100O100O100O10000O100O100O100O100O100O100O100O100O100O100O100O100O100O10000O100O100O100O100O100O100O1O10000O1O10000O100O1O100O100O100O100O100O100O100O100O10000O1O100O100O100O100O1O10000O10\\MP@hNo?W1U@gNk?X1W@gNh?Y1Z@fNf?Y1\\@fNc?[1^@dNb?[1`@dN_?]1b@bN^?]1e@aNZ?_1i@_NW?a1l@\\NS?d1RAXNn>h1TAVNk>k1VATNj>k1XATNg>m1YASNg>m1YASNf>m1[ASNe>m1[ASNd>m1]ASNc>m1]ASNb>l1`ATN`>i1cAWN\\>d1jA\\NV>_1oAaNP>^1RBbNn=\\1TBdNk=\\1VBdNj=1`@dNGO02P2Z1h=L_AfNj0^1g=FmC;R<EoC;Q<DPD<o;DRD<n;DQD=n;DjAnN?^1g=DhAQO?\\1h=DhAUO<W1l=EeAZO:R1P>F]ABa0h0Q>Hd@lNDV1e17S>k1iAWNW>i1hAXNW>j1gAWNY>i1fAXNY>j1eAWNZ>k1dAVN\\>k1bAVN^>k1`AVN_>l1`ATN`>m1^ATNa>o1]AQNc>P2[AQNd>P2\\APNd>Q2[AoMd>R2[AoMd>R2\\AnMd>S2ZAnMe>S2[AmMe>T2YAmMf>T2YAmMg>T2WAmMh>U2VAlMj>U2TAlMl>U2PAnMo>U2j@PNV?e4O100O100O100O100O100O100O100O100O100O100O100O100O101N100O100O10O0100O100O100O100O100O100O1000000O1000000O100O100O1O1O1O100O100O1O100O100O100O100O100O100O100O100O100O1O100O100O100O100O100O10000O100O1O100O100O100O100O100O100OkK_B]Oa=7oBEP=4YCKg<1^CNa<1bCN^<OeC1[<LhC4W<KlC4S<KoC5Q<JPD6o;JRD6n;JSD5l;LTD4l;LTD4k;MUD3k;MUD4j;LWD2i;OWD1i;OWD1h;0XD0h;1WDOh;2XDOg;1ZDNe;3[DMe;3[DNc;3]DLd;4\\DLc;5]DKb;7]DIc;7^DHa;9_DGa;9_DG`;:`DF`;;_DE`;<`DCa;=^DDa;=_DC`;>_DCa;>^DBb;[OUCiMW1n2c;YO[CeMn0V3g;TO^CeMf0[3k;POaCdM>b3Q<iNbCfM7f3V<dNeCeMKQ4`<YNgCfMAX4g<RNiCk2W<SMlCl2S<TMnCl2R<SMoCm2Q<RMQDm2n;TMRDl2n;SMTDl2k;UMUDk2k;UMVDj2i;VMXDj2h;VMXDj2g;VM[Di2e;VM\\Dj2c;VM^Dj2b;TMaDk2^;PMhDP3X;aL`CjM`1e5o:\\L^Ed3b:WLdEh3[:n
KPFR4P:jKTFV4k9hKXFX4h9hKXFX4g9hKYFY4f9hKZFX4f9hKZFX4e9iKZFW4g9iKXFX4g9iKPElN\\O\\5c;iKeDYOEo4f;iK]DAJh4g;iKTDJ3^4i;jKkCO;X4i;hMVDX2j;hMUDY2j;iMUDW2k;iMTDW2l;kMSDU2m;kMRDV2m;lMRDT2n;mMPDT2P<mMoCS2P<oMnCR2R<V4O100O100OiInCR2Q<kMWDQ2i;kM_DQ2`;lMoDi1Q;TNWEg1h:XN^Ed1a:\\NdE`1\\:^NjE^1U:aNPF]1o9aNUF]1k9bNXF\\1g9dN]FY1c9fN`FX1_9hNdFV1\\9iNhFT1W9lNmFQ1S9nNQGo0n8QOTGn0l8ROWGk0h8UO[Gi0e8VO^Gh0a8YO`Gf0`8YOcGe0\\8\\OeGc0[8\\OiGa0V8@lG>T8AnG>Q8BQH=o7CRH<m7DVH:j7FWH9h7HYH7g7H[H7d7J^H4b7L^H4b7K`H4_7MbH2^7NcH1\\7OfH0Z70gHOX72iHMW73jHLV74jHLU74mHKS75mHKS75nHJR76oHIQ77oHIR7jMkDP1U4U1S7RM_D0h0g1f3W1^8hNcGV1_8iNaGX1_8gNaGY1`8fNaGY1_8gNaGY1`8fNaGY1_8gNaGY1`8fNaGY1_8gNaGY1`8fNaGY1_8gNaGY1_8gNaGY1_8gNbGX1_8gNaGY1_8gNaGY1_8fNbGY1`8fNaGY1_8gNaGY1_8gNaGY1_8gNaGY1_8gNbGX1^8hNbGX1^8hNbGX1^8hNbGX1^8hNbGX1^8iNbGV1^8jNbGV1^8jNbGV1^8jNbGV1^8jNbGV1^8jNbGV1^8jNbGV1]8lNbGT1^8lNcGS1\\8nNdGR1\\8nNdGR1[8oNeGQ1Z8POfGP1X8SOhGl0W8UOiGk0U8WOkGi0S8YOmGg0P8\\OPHd0n7_ORH`0l7BTH>j7DVH<g7HYH7d7L\\H4b7N^H2`71_HO^74bHL[77eHIY7:fHFX7<hHDV7>jHBT7a0kH_OT7b0mH]OR7d0nH\\OQ7f0nH]OQ7g0oHZOQ7g0oHYOQ7h0nHXOR7h0nHXOR7h0nHXOR7i0mHWOT7h0kHYOU7h0jHXOV7h0iHYOW7g0iHYOW7h0hHXOY7g0fHZOZ7f0eH[O\\7e0cHhNlKSNa;U3bHeNSLTN\\;W3_HbN]LRNU;\\3]H^Nb8c1\\G\\Ng8c1XG\\Nj8e1TGZNo8e1PGYNS9h1lFUNW9l1gFRN]9m1bFRN`9o1^FPNe9o1[FoMg9R2WFnMj9S2TFmMn9R2RFnMn9S2PFmMR:R2nEmMS:T2kEmMU:T2jEkMX:T2gEmMY:T2fElMZ:U2dEkM^:T2aEmM_:T2`ElM`:U2^ElMc:S2\\EmMe:T2YEmMg:T2WEmMj:S2UEmMk:T2SEmMn:S2PEnMP;S2nDoMQ;R2mDoMT;Q2kDoMU;R2iDoMX;Q2fDQNY;P2eDQN[;P2cDQN^;o1`DSN_;n1^DTNc;l1[DVNd;k1ZDVNg;j1WDXNh;j1UDWNk;j1SDWNn;i1oCZNP<h1mCYNS<h1kCZNU<g1gC\\NX<e1fC\\N[<d1cC^N\\<d1aC^N_<c1^C_Na<b1\\CaNc<a1ZCaNf<_1XCcNg<_1UCeNj<^1oBfNP=]1iBiN?gL^;e4hCkNf0dLb;`8YDcGg;^8VDdGk;^8mChGR<_8bCfG_<U900001N101O00001O001O00001O00001O0O2N1O1O2N1O2N100O2N1O1O2N1O2N1O1O2N1O2M200O2M2O2N1N2O2N1N3N1O1N3N1N3M2O2kMkAkKW>Q4RBiKo=8^AR2j0`Mk=:aAP2i0bMh==eAj1g0eMf=?hAh1f0eMd=?lAg1c0gMc=`0nAe1c0hMa=a0oAd1b0iM`=a0QBc1c0iM]=c0SB`1d0iM\\=e0SB^1e0jMY=g0VB[1e0jMV=k0WBW1V`0gNn_OT1T`0lNo_Oo0S`0POS@i0P`0VOT@d0n?\\OT
@`0n?@T@<o?CR@^OUN6ka0=P@XO\\N6ga0a0j@WOY?i0X3O001O00010O00001O0010O0001O00010O001O00001O001O01O01O00001O001O000O2O0O2O0O2N1O2N1O2M3MYl51fSJ3L3O1O100O1O100000000001O000000001O00000000001O01OO1000000000001O000001O000O10000000000000000O10000O10000O10000O100O1O100O1O1O1O1O1O1N2O1O1O1O1O1O1O1O1O1O1O1O1O1O1O1O1N2O1O1O1O1O1O1O100O100N21O00001O001O2N1O1O2N1O2N1O2M3N4IVT_3",
"size": [
720,
1280
]
},
"category_id": 1,
"iscrowd": 0
}
]
```
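Since `bbox` is normalized by the image dimensions, converting it back to pixel units only requires the `width` and `height` of the matching `images` entry. The sketch below applies this to the sample above. The helper name `bbox_to_pixels` is hypothetical, and treating `area` as normalized by `width * height` is an assumption based on the sample value, not something the format description states.

```python
def bbox_to_pixels(bbox, width, height):
    """Convert a normalized [x, y, w, h] box to pixel units."""
    x, y, w, h = bbox
    return (x * width, y * height, w * width, h * height)


# Values copied from the DROID sample above.
width, height = 1280, 720
bbox = [
    0.03750000149011612,
    0.5083333253860474,
    0.8382812738418579,
    0.49166667461395264,
]
pixel_box = tuple(round(v) for v in bbox_to_pixels(bbox, width, height))
print(pixel_box)

# Assumption: "area" is normalized by the number of pixels in the image.
area_pixels = round(0.17324327256944444 * width * height)
print(area_pixels)
```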
### Data Stats
Here are the stats for the 10 annotation domains. The `# Image-NPs` column gives the total number of unique image-NP pairs, including both “positive” and “negative” NPs.
| Domain | # Image-NPs | # Image-NP-Masks|
|--------------------------|--------------| ----------------|
| BDD100k | 5546 | 13210 |
| DROID | 9445 | 11098 |
| Ego4D | 12608 | 24049 |
| MyFoodRepo-273 | 20985 | 28347 |
| GeoDE | 14850 | 7570 |
| iNaturalist-2017 | 1439051 | 48899 |
| National Gallery of Art | 22294 | 18991 |
| SA-V | 18337 | 39683 |
| YT-Temporal-1B | 7816 | 12221 |
| Fathomnet | 287193 | 14174 |


@@ -0,0 +1,64 @@
# Copyright (c) Meta Platforms, Inc. and affiliates. All Rights Reserved
# pyre-unsafe
import argparse
import json
import os
from multiprocessing import Pool
from pathlib import Path
import requests
from fathomnet.api import images
from tqdm import tqdm
def download_imgs(args, image_uuids):
    flag = 0  # number of newly downloaded images
    for uuid in tqdm(image_uuids, desc="Downloading images"):
        image = images.find_by_uuid(uuid)
        file_name = (
            Path(args.processed_images_folder)
            / f"{image.uuid}.{image.url.split('.')[-1]}"
        )
        # Skip images that are already on disk so the script can be re-run.
        if not file_name.exists():
            try:
                resp = requests.get(image.url, stream=True)
                resp.raise_for_status()
                with open(file_name, "wb") as f:
                    for chunk in resp.iter_content(chunk_size=1024):
                        f.write(chunk)
                flag += 1
            except requests.exceptions.RequestException as e:
                print(f"Error downloading {image.url}: {e}")
    print(f"Downloaded {flag} new images to {args.processed_images_folder}")


def main():
    parser = argparse.ArgumentParser(description="Download images from FathomNet")
    parser.add_argument("--processed_images_folder", help="Path to downloaded images")
    parser.add_argument(
        "--image-uuids",
        default="fathomnet_image_uuids.json",
        help="Path to JSON file containing image uuids to download",
    )
    parser.add_argument(
        "--num-procs", type=int, default=16, help="Number of parallel processes"
    )
    args = parser.parse_args()
    with open(args.image_uuids, "r") as f:
        all_uuids = json.load(f)
    Path(args.processed_images_folder).mkdir(parents=True, exist_ok=True)
    # Keep the chunk size at 1 or more: a step of 0 would make range() raise
    # when there are fewer uuids than processes.
    chunk_size = max(1, len(all_uuids) // args.num_procs)
    chunks = [
        all_uuids[i : i + chunk_size] for i in range(0, len(all_uuids), chunk_size)
    ]
    with Pool(processes=args.num_procs) as pool:
        pool.starmap(download_imgs, [(args, chunk) for chunk in chunks])


if __name__ == "__main__":
    main()


@@ -0,0 +1,83 @@
# Copyright (c) Meta Platforms, Inc. and affiliates. All Rights Reserved
# pyre-unsafe
import argparse
import json
import shutil
import subprocess
import sys
import tarfile
from pathlib import Path
from tqdm import tqdm
def download_archive(url, dest_dir):
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive_path = dest_dir / url.split("/")[-1]
    if not archive_path.exists():
        print(f"Downloading archive to {archive_path}...")
        result = subprocess.run(["wget", "-O", str(archive_path), url])
        if result.returncode != 0:
            print("Download failed.")
            sys.exit(1)
    else:
        print(f"Archive already exists at {archive_path}")
    return archive_path


def extract_archive(archive_path, dest_dir):
    print(f"Extracting {archive_path} to {dest_dir}...")
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(path=dest_dir)
    print("Extraction complete.")


def copy_images(subset_json, untar_dir, output_dir):
    # subset_json maps each target file name to its path inside the archive.
    with open(subset_json, "r") as f:
        image_dict = json.load(f)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    copied = 0
    for target_name, rel_path in tqdm(image_dict.items(), "Copying image subset"):
        src = Path(untar_dir) / rel_path
        dst = output_dir / target_name
        if not src.exists():
            print(f"Warning: Source image {src} does not exist, skipping.")
            continue
        shutil.copy2(src, dst)
        copied += 1
    # Report only the images actually copied, not the skipped ones.
    print(f"Copied {copied} images to {output_dir}")


def main():
    parser = argparse.ArgumentParser(
        description="Download, extract, and copy subset of iNaturalist images from archive."
    )
    parser.add_argument(
        "--raw_images_folder",
        help="Path where the archive is downloaded and extracted",
    )
    parser.add_argument("--processed_images_folder", help="Path to processed images")
    parser.add_argument(
        "--subset-json",
        default="inaturalist_image_subset.json",
        help="Path to iNaturalist images subset",
    )
    parser.add_argument(
        "--archive-url",
        default="https://ml-inat-competition-datasets.s3.amazonaws.com/2017/train_val_images.tar.gz",
        help="URL of the archive to download",
    )
    args = parser.parse_args()
    dest_dir = Path(args.raw_images_folder)
    images_dir = Path(args.processed_images_folder)
    archive_path = download_archive(args.archive_url, dest_dir)
    extract_archive(archive_path, dest_dir)
    untar_dir = dest_dir / "train_val_images"
    copy_images(args.subset_json, untar_dir, images_dir)


if __name__ == "__main__":
    main()


@@ -0,0 +1,142 @@
# Copyright (c) Meta Platforms, Inc. and affiliates. All Rights Reserved
# pyre-unsafe
import argparse
import os
from functools import partial
from multiprocessing import Pool
from pathlib import Path
import numpy as np
import pandas as pd
import requests
import utils
from PIL import Image
from tqdm import tqdm
METADATA_FILE = "published_images.csv"
METADATA_URL = "https://raw.githubusercontent.com/NationalGalleryOfArt/opendata/refs/heads/main/data"  # data/published_images.csv from https://github.com/NationalGalleryOfArt/opendata/tree/main
IMG_URL = "https://api.nga.gov/iiif/%s/full/%s/0/default.jpg"
METADATA_FOLDER = "metadata"
EXTENSION = ".jpg"


def download_metadata(annotation_folder):
    output_folder = annotation_folder / METADATA_FOLDER
    output_folder.mkdir(exist_ok=True)
    url = f"{METADATA_URL}/{METADATA_FILE}"
    print(url)
    response = requests.get(url)
    if response.status_code == 200:
        with open(output_folder / METADATA_FILE, "wb") as f:
            f.write(response.content)


def download_url(row):
    # Request the full-size image unless the metadata caps the pixel size.
    if np.isnan(row.maxpixels) or (
        row.maxpixels > row.width and row.maxpixels > row.height
    ):
        url = IMG_URL % (row.uuid, "full")
    else:
        url = IMG_URL % (row.uuid, f"!{row.maxpixels},{row.maxpixels}")
    return url


def download_item(item, output_folder):
    uuid, url = item
    try:
        if (output_folder / f"{uuid}{EXTENSION}").exists():
            print("skipping", uuid, "already downloaded")
            return
        response = requests.get(url)
        if response.status_code == 200:
            with open(output_folder / f"{uuid}{EXTENSION}", "wb") as f:
                f.write(response.content)
    except Exception:
        # Catch Exception rather than using a bare except, so that
        # KeyboardInterrupt and SystemExit still propagate.
        print("errored", item)
        return


def remove_non_compliant_image(item, output_folder):
    uuid, max_pixels = item
    if np.isnan(max_pixels):
        return
    if not (output_folder / f"{uuid}{EXTENSION}").exists():
        return
    img = Image.open(output_folder / f"{uuid}{EXTENSION}")
    if img.width > max_pixels or img.height > max_pixels:
        os.remove(output_folder / f"{uuid}{EXTENSION}")  # delete image
        return uuid


def reshape_image(rel_path, filename_size_map, output_folder):
    # Resize in place to the (width, height) recorded in the annotation file.
    w, h = filename_size_map[rel_path]
    path = output_folder / f"{rel_path}"
    img = Image.open(path)
    if img.width != w or img.height != h:
        new_size = (w, h)
        resized_img = img.resize(new_size)
        resized_img.save(path)
def main(args, workers=20):
raw_folder = Path(args.raw_images_folder)
processed_folder = Path(args.processed_images_folder)
utils.setup(raw_folder)
utils.setup(processed_folder)
uuids = utils.get_image_ids(args.annotation_file)
filename_size_map = utils.get_filename_size_map(args.annotation_file)
if not ((raw_folder / METADATA_FOLDER) / METADATA_FILE).exists():
download_metadata(raw_folder)
metadata = pd.read_csv((raw_folder / METADATA_FOLDER) / METADATA_FILE)
metadata["download_url"] = metadata.apply(download_url, axis=1)
available_uuids = list(uuids.intersection(set(metadata["uuid"].tolist())))
print(len(available_uuids), "available for download out of", len(uuids), "target")
url_data = list(
metadata.set_index("uuid")
.loc[available_uuids]
.to_dict()["download_url"]
.items()
)
download_single = partial(download_item, output_folder=(processed_folder))
print("Preparing to download", len(url_data), "items")
with Pool(workers) as p:
for _ in tqdm(p.imap(download_single, url_data), total=len(url_data)):
continue
check_img_size = partial(
remove_non_compliant_image, output_folder=(processed_folder)
)
max_pixels_dict_all = metadata.set_index("uuid").to_dict()["maxpixels"]
max_pixels_dict = {item[0]: max_pixels_dict_all[item[0]] for item in url_data}
print("Checking all images within size constraints")
non_compliant = set()
with Pool(workers) as p:
for each in tqdm(
p.imap(check_img_size, max_pixels_dict.items()), total=len(max_pixels_dict)
):
if each is not None:
non_compliant.add(each)
print(len(non_compliant), "not compliant size, removed")
reshape_single = partial(
reshape_image,
filename_size_map=(filename_size_map),
output_folder=(processed_folder),
)
rel_paths = os.listdir(args.processed_images_folder)
print("Preparing to reshape", len(rel_paths), "items")
with Pool(workers) as p:
for _ in tqdm(p.imap(reshape_single, rel_paths), total=len(rel_paths)):
continue
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--annotation_file", help="Path to annotation file")
parser.add_argument("--raw_images_folder", help="Path to downloaded images")
parser.add_argument("--processed_images_folder", help="Path to processed images")
args = parser.parse_args()
main(args)
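
The size rule in `download_url` can be restated without pandas. The following is a minimal sketch under stated assumptions, not the script's own code: `build_iiif_url` is a hypothetical helper, and casting `maxpixels` to `int` is an added assumption (the script formats the raw column value). It requests the full image when no cap applies and otherwise asks for a best fit inside a `maxpixels` x `maxpixels` box.

```python
import math

IMG_URL = "https://api.nga.gov/iiif/%s/full/%s/0/default.jpg"


def build_iiif_url(uuid, width, height, maxpixels):
    """Hypothetical helper mirroring download_url for a single image."""
    unrestricted = math.isnan(maxpixels) or (
        maxpixels > width and maxpixels > height
    )
    if unrestricted:
        size = "full"
    else:
        # "!w,h" asks the IIIF server for the best fit inside a w x h box.
        # Assumption: the cap is formatted as an integer.
        size = f"!{int(maxpixels)},{int(maxpixels)}"
    return IMG_URL % (uuid, size)


# No cap recorded: request the full-size image.
print(build_iiif_url("example-uuid", 4000, 3000, float("nan")))
# Cap smaller than the image: request a bounded rendition.
print(build_iiif_url("example-uuid", 4000, 3000, 640.0))
```

Images fetched through the bounded form are later checked by `remove_non_compliant_image`, which deletes anything larger than the cap.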

@@ -0,0 +1,261 @@
# Copyright (c) Meta Platforms, Inc. and affiliates. All Rights Reserved
# pyre-unsafe
import ast
import concurrent.futures
import os
import shutil
import subprocess
import sys
from concurrent.futures import as_completed, ThreadPoolExecutor
from pathlib import Path
import yt_dlp
from utils import (
annotation_files,
config,
load_json,
run_command,
save_json,
update_annotations,
)
def construct_gcs_path(original_video):
"""
Convert original_video string to GCS path.
Example:
'AUTOLab_failure_2023-07-07_Fri_Jul__7_18:50:36_2023_recordings_MP4_22008760.mp4'
->
'gs://gresearch/robotics/droid_raw/1.0.1/AUTOLab/failure/2023-07-07/Fri_Jul__7_18:50:36_2023/recordings/MP4/22008760.mp4'
"""
parts = original_video.split("_")
lab = parts[0]
failure = parts[1]
date = parts[2]
time = "_".join(parts[3:-3])
recordings = parts[-3]
mp4 = parts[-2]
file_id = parts[-1].split(".")[0]
gcs_path = (
f"gs://gresearch/robotics/droid_raw/1.0.1/"
f"{lab}/{failure}/{date}/{time}/{recordings}/{mp4}/{file_id}.mp4"
)
return gcs_path
def download_video(args):
    gcs_path, dst_dir, video_name = args
    # Ensure the destination directory exists
    subdir = Path(dst_dir)
    os.makedirs(subdir, exist_ok=True)
    # Save the video under its original (flattened) name inside the directory
    local_path = subdir / video_name
    cmd = f'gsutil cp "{gcs_path}" "{local_path}"'
    print(f"Running: {cmd}")
    try:
        run_command(cmd)
        return (gcs_path, True, None)
    except Exception as e:
        return (gcs_path, False, str(e))
def download_youtube_video(youtube_id, output_path=None):
    """Download one video; return (youtube_id, None) on success or (youtube_id, error)."""
    try:
        if output_path is None:
            output_path = os.path.join(
                config["yt1b_path"], "downloaded_videos", f"video_{youtube_id}.mp4"
            )
        url = f"https://www.youtube.com/watch?v={youtube_id}"
        if os.path.exists(output_path):
            return youtube_id, None
        # Prefer 720p or lower at 30 fps or lower, then relax the constraints.
        # Named video_format to avoid shadowing the built-in format().
        video_format = "best[height<=720][fps<=30]/best[height<=720]/best"
        ydl_opts = {
            "format": video_format,
            "outtmpl": output_path,
            "merge_output_format": "mp4",
            "quiet": True,
            "cookiefile": config["cookies_path"],
            "socket_timeout": 60,  # seconds; raised to tolerate slow connections
        }
        with yt_dlp.YoutubeDL(ydl_opts) as ydl:
            ydl.download([url])
        return youtube_id, None
    except Exception as e:
        return youtube_id, str(e)
def download_youtube():
all_videos_to_download = set()
for annotation_file in annotation_files["yt1b"]:
ann = load_json(os.path.join(config["path_annotations"], annotation_file))
for video_info in ann["images"]:
youtube_id = video_info["original_video"]
all_videos_to_download.add(youtube_id)
videos_to_download_still = all_videos_to_download
videos_downloaded = set()
videos_unavailable = set()
num_download_retries = 3
for _ in range(num_download_retries):
if len(videos_to_download_still) == 0:
break
videos_error = set()
with concurrent.futures.ThreadPoolExecutor() as executor:
futures = [
executor.submit(download_youtube_video, youtube_id)
for youtube_id in videos_to_download_still
]
for future in concurrent.futures.as_completed(futures):
youtube_id, exception = future.result()
if exception is None:
videos_downloaded.add(youtube_id)
elif "unavailable" in exception or "members-only" in exception:
videos_unavailable.add(youtube_id)
else:
videos_error.add(youtube_id)
videos_to_download_still = (
all_videos_to_download - videos_downloaded - videos_unavailable
)
assert videos_to_download_still == videos_error
if len(videos_unavailable) + len(videos_to_download_still) > 0:
message = "Some videos are either no longer available on YouTube, or are set to private, or resulted in some other error. "
if config["update_annotation_yt1b"]:
message += "The unavailable videos will be ***REMOVED*** from the annotation file. This will make the test results NOT DIRECTLY COMPARABLE to other reported results."
print(message)
update_annotations("yt1b", videos_downloaded)
else:
message += "You may want to either re-try the download, or remove these videos from the evaluation json"
print(message)
def download_droid():
ann_dir = Path(config["path_annotations"])
dst_dir = Path(config["droid_path"]) / "downloaded_videos"
json_files = annotation_files["droid"]
download_tasks = []
original_videos = set()
for json_file in json_files:
json_path = ann_dir / json_file
data = load_json(json_path)
for img in data["images"]:
original_video = img["original_video"]
original_videos.add(original_video)
print(len(original_videos))
for original_video in original_videos:
gcs_path = construct_gcs_path(original_video)
download_tasks.append((gcs_path, dst_dir, original_video))
max_workers = max(1, min(16, len(download_tasks)))  # ThreadPoolExecutor requires max_workers >= 1
with ThreadPoolExecutor(max_workers=max_workers) as executor:
future_to_task = {
executor.submit(download_video, task): task for task in download_tasks
}
for future in as_completed(future_to_task):
gcs_path, success, error = future.result()
if not success:
print(f"Failed to download {gcs_path}: {error}")
def download_ego4d():
    output_dir = os.path.join(config["ego4d_path"], "downloaded_videos")
    ann_dir = Path(config["path_annotations"])
    json_files = annotation_files["ego4d"]
    original_videos = set()
    for json_file in json_files:
        json_path = ann_dir / json_file
        data = load_json(json_path)
        for img in data["images"]:
            original_video = img["original_video"]
            original_videos.add(original_video)
    original_video_uids = [
        video_uid.replace(".mp4", "") for video_uid in original_videos
    ]
    video_ids_download = original_video_uids
    num_download_retries = 2
    download_correct = False
    message = ""
    for _ in range(num_download_retries):
        cmd = (
            [
                # "python", "-m", "ego4d.cli.cli",
                "ego4d",
                "--output_directory",
                output_dir,
                "--datasets",
                "clips",
                "--version",
                "v1",
                "--video_uids",
            ]
            + video_ids_download
            + ["--yes"]
        )
        # Run the command
        result = subprocess.run(cmd, capture_output=True, text=True)
        message = result.stderr
        if (
            "RuntimeError: The following requested video UIDs could not be found in the manifest for version:"
            in result.stderr
        ):
            # The missing UIDs are parsed from the second-to-last stderr line;
            # drop them and retry with the remaining videos.
            not_findable_videos = ast.literal_eval(result.stderr.split("\n")[-2])
            video_ids_download = [
                video_uid
                for video_uid in video_ids_download
                if video_uid not in not_findable_videos
            ]
        else:
            # Only a zero exit code counts as a successful download.
            download_correct = result.returncode == 0
            break
    if not download_correct:
        print(f"There was an error downloading the Ego4D data: {message}")
    if len(video_ids_download) != len(original_video_uids):
        message = "Some videos are no longer available. "
        if config["update_annotation_ego4d"]:
            message += "The unavailable videos will be ***REMOVED*** from the annotation file. This will make the test results NOT DIRECTLY COMPARABLE to other reported results."
            print(message)
            update_annotations("ego4d", video_ids_download)
        else:
            message += "You may want to either re-try the download, or remove these videos from the evaluation json"
            print(message)
def download_sav():
    tar_url = config["sav_videos_fps_6_download_path"]
    assert tar_url, (
        "Set sav_videos_fps_6_download_path in CONFIG_FRAMES.yaml before downloading SA-V"
    )
    tar_file = "videos_fps_6.tar"
    sav_data_dir = os.path.join(config["sav_path"], "downloaded_videos")
    os.makedirs(sav_data_dir, exist_ok=True)
    subprocess.run(["wget", tar_url, "-O", tar_file], cwd=sav_data_dir, check=True)
    subprocess.run(["tar", "-xvf", tar_file], cwd=sav_data_dir, check=True)
    # Remove the archive once it has been extracted
    os.remove(os.path.join(sav_data_dir, tar_file))
def main():
assert len(sys.argv) > 1, "You have to provide the name of the dataset"
dataset_name = sys.argv[1]
assert dataset_name in annotation_files, (
f"The dataset can be one of {list(annotation_files.keys())}"
)
if dataset_name == "yt1b":
download_youtube()
elif dataset_name == "droid":
download_droid()
elif dataset_name == "ego4d":
download_ego4d()
elif dataset_name == "sav":
download_sav()
if __name__ == "__main__":
main()
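
The retry loop in `download_youtube` sorts each video into one of three groups: downloaded, permanently unavailable, or failed with a transient error and worth retrying. The sketch below reproduces that bookkeeping with a stand-in downloader so it runs offline. `download_with_retries` and `fake_fetch` are hypothetical names introduced for illustration; the real script calls `download_youtube_video`, which wraps `yt_dlp`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_with_retries(ids, fetch, retries=3):
    """fetch(video_id) returns None on success or an error string."""
    all_ids = set(ids)
    pending = set(all_ids)
    downloaded, unavailable = set(), set()
    for _ in range(retries):
        if not pending:
            break
        with ThreadPoolExecutor(max_workers=4) as executor:
            futures = {executor.submit(fetch, vid): vid for vid in pending}
            for future in as_completed(futures):
                vid = futures[future]
                error = future.result()
                if error is None:
                    downloaded.add(vid)
                elif "unavailable" in error or "members-only" in error:
                    # Permanent failure: never retried.
                    unavailable.add(vid)
        # Everything else is treated as transient and retried next round.
        pending = all_ids - downloaded - unavailable
    return downloaded, unavailable, pending


# Stand-in downloader: "gone" never works, "flaky" works on the second try.
attempts = {}


def fake_fetch(video_id):
    attempts[video_id] = attempts.get(video_id, 0) + 1
    if video_id == "gone":
        return "Video unavailable"
    if video_id == "flaky" and attempts[video_id] < 2:
        return "connection timed out"
    return None


downloaded, unavailable, pending = download_with_retries(
    ["ok", "gone", "flaky"], fake_fetch
)
print(sorted(downloaded), sorted(unavailable), sorted(pending))
```

Only the `downloaded` set is passed to `update_annotations`, so both unavailable videos and videos that still fail after the last retry are dropped from the annotation file when `update_annotation_yt1b` is enabled.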

@@ -0,0 +1,101 @@
# Copyright (c) Meta Platforms, Inc. and affiliates. All Rights Reserved
# pyre-unsafe
"""
This file extracts the frames for the frame datasets in SA-Co/Gold and SA-Co/Silver.
Call like:
> python extract_frames.py <dataset_name>
"""
import json
import os
import shutil
import sys
from multiprocessing import Pool
from PIL import Image
from tqdm import tqdm
from utils import (
annotation_files,
config,
get_frame_from_video,
is_valid_image,
update_annotations,
)
def extract_frame(path_video, global_frame_idx, path_frame, image_size, file_name):
    frame = get_frame_from_video(path_video, global_frame_idx)
    os.makedirs(os.path.dirname(path_frame), exist_ok=True)
    img = Image.fromarray(frame)
    # image_size is (height, width), matching frame.shape[:2]
    if frame.shape[:2] != image_size:
        print(f"Resizing image {file_name} from {frame.shape[:2]} to {image_size}")
        height, width = image_size
        # PIL expects (width, height). The default resampling filter is
        # BICUBIC on Pillow >= 7.0 (NEAREST on older releases).
        img = img.resize((width, height))
    img.save(path_frame)
def process_image(args):
image, dataset_name, config = args
original_video, global_frame_idx, file_name, image_size = image
extra_subpath = ""
if dataset_name == "ego4d":
extra_subpath = "v1/clips"
elif dataset_name == "yt1b":
original_video = f"video_{original_video}.mp4"
elif dataset_name == "sav":
extra_subpath = "videos_fps_6"
path_video = os.path.join(
config[f"{dataset_name}_path"],
"downloaded_videos",
extra_subpath,
original_video,
)
path_frame = os.path.join(config[f"{dataset_name}_path"], "frames", file_name)
to_return = file_name
try:
extract_frame(path_video, global_frame_idx, path_frame, image_size, file_name)
if not is_valid_image(path_frame):
print(f"Invalid image in {path_frame}")
to_return = None
except Exception as e:
print(f"Failed to extract frame {path_frame}: {e}")
to_return = None
return to_return
def main():
    assert len(sys.argv) > 1, "You have to provide the name of the dataset"
    dataset_name = sys.argv[1]
    assert dataset_name in annotation_files, (
        f"The dataset can be one of {list(annotation_files.keys())}"
    )
    all_outputs = []
    for file in annotation_files[dataset_name]:
        with open(os.path.join(config["path_annotations"], file), "r") as f:
            annotation = json.load(f)
        images = annotation["images"]
        # Deduplicate frame requests; image_size becomes a tuple so it is hashable
        images = set(
            (
                image["original_video"],
                image["global_frame_idx"],
                image["file_name"],
                tuple(image["image_size"]),
            )
            for image in images
        )
        args_list = [(image, dataset_name, config) for image in images]
        with Pool(os.cpu_count()) as pool:
            outputs = list(
                tqdm(pool.imap_unordered(process_image, args_list), total=len(images))
            )
        all_outputs.extend(outputs)
    # Drop frames that failed to extract (returned as None) from the annotation files
    if any(out is None for out in all_outputs):
        valid_file_names = {out for out in all_outputs if out is not None}
        update_annotations(dataset_name, valid_file_names, key="file_name")
    if config[f"remove_downloaded_videos_{dataset_name}"]:
        shutil.rmtree(os.path.join(config[f"{dataset_name}_path"], "downloaded_videos"))
if __name__ == "__main__":
main()
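
Two details of the frame extraction are easy to get wrong: the frame requests are deduplicated through a `set`, which requires every field to be hashable, and `image_size` is stored as (height, width) while PIL's `resize` takes (width, height). A minimal sketch, assuming toy annotation entries with made-up file names; `unique_frame_requests` and `pil_size` are hypothetical helpers, not part of the script.

```python
# Toy annotation entries (hypothetical values); the first two are duplicates.
annotation_images = [
    {"original_video": "clip_a.mp4", "global_frame_idx": 12,
     "file_name": "clip_a/12.jpg", "image_size": [720, 1280]},
    {"original_video": "clip_a.mp4", "global_frame_idx": 12,
     "file_name": "clip_a/12.jpg", "image_size": [720, 1280]},
    {"original_video": "clip_a.mp4", "global_frame_idx": 30,
     "file_name": "clip_a/30.jpg", "image_size": [720, 1280]},
]


def unique_frame_requests(images):
    """Deduplicate requests; lists are unhashable, so image_size becomes a tuple."""
    return {
        (
            img["original_video"],
            img["global_frame_idx"],
            img["file_name"],
            tuple(img["image_size"]),
        )
        for img in images
    }


def pil_size(image_size):
    """Convert the annotation's (height, width) into PIL's (width, height)."""
    height, width = image_size
    return (width, height)


frame_requests = unique_frame_requests(annotation_images)
print(len(frame_requests))
print(pil_size((720, 1280)))
```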

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,72 @@
# Copyright (c) Meta Platforms, Inc. and affiliates. All Rights Reserved
# pyre-unsafe
import argparse
from multiprocessing import Pool
from pathlib import Path
import pandas as pd
import utils
from tqdm import tqdm
def main(args, n_workers=20):
    raw_folder = Path(args.raw_images_folder)
    processed_folder = Path(args.processed_images_folder)
    utils.setup(processed_folder)
    img_ids = utils.get_image_ids(args.annotation_file)
    if args.dataset_name == "geode":
        # GeoDE images live in nested folders; flatten "a/b/c.jpg" to "a_b_c.jpg"
        metadata = pd.read_csv(raw_folder / "index.csv")
        metadata["flat_filepath"] = metadata.file_path.apply(
            lambda x: x.replace("/", "_")
        )
        metadata["original_absolute_path"] = metadata.file_path.apply(
            lambda x: str((raw_folder / "images") / x)
        )
        metadata["new_absolute_path"] = metadata.flat_filepath.apply(
            lambda x: str(processed_folder / x)
        )
        metadata["filestem"] = metadata.new_absolute_path.apply(lambda x: Path(x).stem)
        img_id_mapping = metadata.set_index("filestem").to_dict()
        paths = [
            (
                img_id_mapping["original_absolute_path"][each],
                img_id_mapping["new_absolute_path"][each],
            )
            for each in img_ids
        ]
    elif args.dataset_name == "bdd100k":
        bdd_subfolder = "100k/train"
        img_filenames = utils.get_filenames(args.annotation_file)
        raw_folder_bdd_images = raw_folder / bdd_subfolder
        paths = [
            (raw_folder_bdd_images / each, processed_folder / each)
            for each in img_filenames
        ]
    elif args.dataset_name == "food_rec":
        food_subfolder = "public_validation_set_2.0/images"
        img_filenames = utils.get_filenames(args.annotation_file)
        raw_folder_food_images = raw_folder / food_subfolder
        # The raw file is named after the last "_"-separated token of the annotation file name
        paths = [
            (
                raw_folder_food_images
                / f"{Path(each).stem.split('_')[-1]}{Path(each).suffix}",
                processed_folder / each,
            )
            for each in img_filenames
        ]
    else:
        raise ValueError(
            f"Unknown dataset_name {args.dataset_name!r}; expected geode, bdd100k or food_rec"
        )
    print("Preparing to copy and flatten filename for", len(paths), "images")
    with Pool(n_workers) as p:
        for _ in tqdm(p.imap(utils.copy_file, paths), total=len(paths)):
            continue
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--annotation_file", help="Path to annotation file")
parser.add_argument("--raw_images_folder", help="Path to downloaded images")
parser.add_argument("--processed_images_folder", help="Path to processed images")
parser.add_argument("--dataset_name", help="Dataset to preprocess: geode, bdd100k or food_rec")
args = parser.parse_args()
main(args)
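
The two renaming rules used by the preprocessing script can be checked in isolation. This is a minimal sketch with made-up file names: `flatten_relative_path` and `food_rec_source_name` are hypothetical helpers that restate the lambdas and f-string above, and the example paths are assumptions rather than real dataset entries.

```python
from pathlib import Path


def flatten_relative_path(rel_path):
    """GeoDE rule: turn a nested relative path into a single flat file name."""
    return rel_path.replace("/", "_")


def food_rec_source_name(annotation_file_name):
    """food_rec rule: the raw file is the last "_"-separated token plus the suffix."""
    p = Path(annotation_file_name)
    return f"{p.stem.split('_')[-1]}{p.suffix}"


print(flatten_relative_path("Africa/Nigeria/bag/img_0001.jpg"))
print(food_rec_source_name("food_rec_012345.jpg"))
```

Flattening keeps every image in one folder, which is the layout the annotation `file_name` entries are matched against when the files are copied.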

@@ -0,0 +1,150 @@
# Copyright (c) Meta Platforms, Inc. and affiliates. All Rights Reserved
# pyre-unsafe
import json
import os
import shutil
import subprocess
from io import BytesIO
from pathlib import Path
import cv2
import matplotlib.pyplot as plt
import numpy as np
import yaml
from PIL import Image
from pycocotools import mask as mask_utils
from tqdm import tqdm
annotation_files = {
"droid": [
"silver_droid_merged_test.json",
],
"sav": [
"silver_sav_merged_test.json",
],
"yt1b": [
"silver_yt1b_merged_test.json",
],
"ego4d": [
"silver_ego4d_merged_test.json",
],
}
def load_yaml(filename):
with open(filename, "r") as f:
return yaml.safe_load(f)
def load_json(filename):
with open(filename, "r") as f:
return json.load(f)
def save_json(content, filename):
with open(filename, "w") as f:
json.dump(content, f)
def run_command(cmd):
"""Run a shell command and raise if it fails."""
result = subprocess.run(cmd, shell=True)
if result.returncode != 0:
raise RuntimeError(f"Command failed: {cmd}")
config = load_yaml("CONFIG_FRAMES.yaml")
def is_valid_image(img_path):
    try:
        with Image.open(img_path) as img:
            # convert() forces a full decode, so truncated files fail here
            img.convert("RGB")
        return True
    except Exception:
        return False
def get_frame_from_video(video_path, frame_id):
    """Return frame number frame_id of the video as an RGB array."""
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_id)
    ret, frame = cap.read()
    cap.release()
    if not ret:
        # Some videos cannot be opened with OpenCV; fall back to PyAV and
        # decode sequentially until the requested frame is reached.
        import av

        with av.open(video_path) as container:
            stream = container.streams.video[0]
            for i, frame in tqdm(
                enumerate(container.decode(stream)),
                desc="Decoding with AV",
                total=frame_id + 1,
            ):
                if i == frame_id:
                    return frame.to_ndarray(format="rgb24")
        raise ValueError(
            f"Could not read frame {frame_id} from video {video_path} (out of frame)"
        )
    # OpenCV decodes to BGR; convert to RGB for PIL
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    return frame_rgb
def update_annotations(dataset_name, file_names_keep, key="original_video"):
    """Keep only images whose `key` is in file_names_keep, plus their annotations."""
    file_names_keep = set(file_names_keep)
    for annotation_file in annotation_files[dataset_name]:
        path_ann = os.path.join(config["path_annotations"], annotation_file)
        path_original_ann = os.path.join(
            config["path_annotations"],
            annotation_file.replace(".json", "_original.json"),
        )
        ann = load_json(path_ann)
        # Back up only once, so a later call cannot overwrite the pristine
        # copy with an already filtered file.
        if not os.path.exists(path_original_ann):
            shutil.copy(path_ann, path_original_ann)
        new_images = []
        image_ids_keep = set()
        for image in ann["images"]:
            if image[key].replace(".mp4", "") in file_names_keep:
                new_images.append(image)
                image_ids_keep.add(image["id"])
        new_annotations = []
        for annotation in ann["annotations"]:
            if annotation["image_id"] in image_ids_keep:
                new_annotations.append(annotation)
        ann["images"] = new_images
        ann["annotations"] = new_annotations
        save_json(ann, path_ann)
def get_filename_size_map(annotation_path):
with open(annotation_path) as f:
annotations = json.load(f)
filename_size_map = {}
for each in annotations["images"]:
filename_size_map[each["file_name"]] = (each["width"], each["height"])
return filename_size_map
def get_filenames(annotation_path):
with open(annotation_path) as f:
annotations = json.load(f)
filenames = {Path(each["file_name"]) for each in annotations["images"]}
return filenames
def get_image_ids(annotation_path):
filenames = get_filenames(annotation_path)
filestems = {Path(each).stem for each in filenames}
return filestems
def setup(folder):
    print("Making dir", folder)
    # parents=True so a nested output path does not fail on a missing parent
    folder.mkdir(parents=True, exist_ok=True)
def copy_file(paths):
old_path, new_path = paths
print("Copy from", old_path, "to", new_path)
if not Path(new_path).exists():
shutil.copy2(old_path, new_path)
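
The filtering done by `update_annotations` can be exercised on an in-memory dictionary instead of a JSON file. The sketch below is an illustration under stated assumptions, not the module's code: `filter_annotations` is a hypothetical pure-function variant, and the toy entries use made-up ids and video names. It keeps the images whose key matches the keep-set, then keeps only the annotations that point at a surviving image.

```python
def filter_annotations(ann, names_keep, key="original_video"):
    """Return a filtered copy of a COCO-style dict (hypothetical helper)."""
    names_keep = set(names_keep)
    images = [
        img for img in ann["images"]
        if img[key].replace(".mp4", "") in names_keep
    ]
    kept_ids = {img["id"] for img in images}
    annotations = [a for a in ann["annotations"] if a["image_id"] in kept_ids]
    return {**ann, "images": images, "annotations": annotations}


# Toy data: two images, three annotations; only vid_b was downloaded.
toy = {
    "images": [
        {"id": 1, "original_video": "vid_a.mp4"},
        {"id": 2, "original_video": "vid_b.mp4"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1},
        {"id": 11, "image_id": 2},
        {"id": 12, "image_id": 2},
    ],
}

filtered = filter_annotations(toy, {"vid_b"})
print([img["id"] for img in filtered["images"]])
print([a["id"] for a in filtered["annotations"]])
```

Because the `.mp4` suffix is stripped before the lookup, the keep-set can hold either bare video ids (as for YT-Temporal-1B and Ego4D) or frame file names when `key="file_name"` is used.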