EarthVQA: Bridging Earth Vision and Geoscience Language
1. Description
The multi-modal, multi-task EarthVQA dataset is designed to bridge Earth vision and geoscience language. It includes co-registered remote sensing images, land-cover semantic masks, and task-driven language text. The dataset contains 6,000 images at 0.3 m spatial resolution and 208,593 QA pairs that embed urban and rural governance requirements. The QA pairs cover judging, counting, object situation analysis, and comprehensive analysis types, spanning reasoning tasks such as traffic conditions, educational facilities, green ecology, and cultivated land status. This multi-modal, multi-task dataset poses new challenges, requiring geo-spatial relational reasoning and induction over remote sensing images.
2. Annotation Format

Semantic category labels: background-1, building-2, road-3, water-4, barren-5, forest-6, agriculture-7, playground-8. The no-data regions are assigned 0.

The QA pairs are constructed as illustrated below (a minimal parsing sketch is provided at the end of this page):

```json
"275.png": [
    {
        "Type": "Basic Judging",
        "Question": "Are there any buildings in this scene?",
        "Answer": "Yes"
    },
    {
        "Type": "Comprehensive Analysis",
        "Question": "What are the road materials around the village?",
        "Answer": "There are cement roads, and asphalt roads"
    }
]
```

3. Download

We hope that the release of the EarthVQA dataset will promote the development of multi-modal remote sensing, especially land-cover classification and visual question answering. You can click the link below to download the data.

● Google Drive: download

4. Evaluation Server

If you want to obtain test scores, please join our hosted benchmark platforms: Semantic Segmentation and Visual Question Answering.

5. Copyright

The copyright belongs to the Intelligent Data Extraction, Analysis and Applications of Remote Sensing (RSIDEA) academic research group, State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, China. The EarthVQA dataset can only be used for academic purposes and requires citation of the following papers; any commercial use is prohibited. Any form of secondary development, including annotation work, is also strictly prohibited. Otherwise, RSIDEA of Wuhan University reserves the right to pursue legal responsibility.

[1] Junjue Wang, Zhuo Zheng, Zihang Chen, Ailong Ma, Yanfei Zhong*. EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, pp. 5481-5489, 2024.

[2] Junjue Wang, Ailong Ma*, Zihang Chen, Zhuo Zheng, Yuting Wan, Liangpei Zhang, Yanfei Zhong. EarthVQANet: Multi-task visual question answering for remote sensing image understanding. ISPRS Journal of Photogrammetry and Remote Sensing, vol. 212, pp. 422-439, 2024.

6. Contact

If you have any problems or feedback when using the EarthVQA dataset, please contact:

Mr. Junjue Wang: [email protected]
Prof. Ailong Ma: [email protected]
Prof. Yanfei Zhong: [email protected]
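
For convenience, the sketch below shows one way to read the QA pairs from Section 2 and list the land-cover classes present in a semantic mask. It is a minimal, unofficial example: the file names (Train_QA.json, masks_png/275.png) and the assumption that all QA pairs are stored in a single JSON file mapping image names to QA lists are placeholders, not a guaranteed dataset layout.

```python
# Minimal, unofficial loading sketch for EarthVQA-style annotations.
# Assumptions (hypothetical): QA pairs live in one JSON file mapping
# image names to QA lists, and masks are single-channel PNGs.
import json

import numpy as np
from PIL import Image

# Label values from Section 2 (0 marks no-data regions).
LABELS = {
    0: "no-data", 1: "background", 2: "building", 3: "road", 4: "water",
    5: "barren", 6: "forest", 7: "agriculture", 8: "playground",
}

def load_qa_pairs(qa_json_path):
    """Load the {image_name: [{Type, Question, Answer}, ...]} mapping."""
    with open(qa_json_path, "r", encoding="utf-8") as f:
        return json.load(f)

def mask_class_names(mask_png_path):
    """Return the names of the land-cover classes present in one mask."""
    mask = np.array(Image.open(mask_png_path))
    return [LABELS[int(v)] for v in np.unique(mask) if int(v) in LABELS]

if __name__ == "__main__":
    qa = load_qa_pairs("Train_QA.json")           # hypothetical file name
    for item in qa.get("275.png", []):
        print(item["Type"], "|", item["Question"], "->", item["Answer"])
    print(mask_class_names("masks_png/275.png"))  # hypothetical path
```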