AstroLLaVA 🚀 🌋 🦙

AstroLLaVA is a vision language model for astronomy that enables interaction with astronomical imagery through natural dialogue. Built on the LLaVA architecture, it has been fine-tuned on a dataset of ~30k astronomical images with captions and question-answer pairs.

Quickstart instructions

  • To quickly try out the model: we are currently setting up a demo on Hugging Face Spaces (not yet working).

Dataset

Our training dataset consists of approximately 30,000 image-caption pairs from:

  • NASA's Astronomy Picture of the Day (APOD): ~10k images
  • European Southern Observatory (ESO): ~15k images
  • Hubble Space Telescope Archive: ~5k images

Each image is paired with expert-written captions and synthetic question-answer pairs generated using GPT-4.
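
As an illustration, synthetic QA generation from an expert caption could look roughly like the snippet below. This is not the authors' actual pipeline; the prompt wording, placeholder caption, and use of the openai Python client are assumptions.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholder expert-written caption
caption = "A nearly face-on spiral galaxy with prominent dust lanes."

# Ask GPT-4 to turn the caption into question-answer pairs (hypothetical prompt)
resp = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You write question-answer pairs about astronomical images."},
        {"role": "user", "content": f"Caption: {caption}\nWrite three question-answer pairs about this image."},
    ],
)
print(resp.choices[0].message.content)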

Installation

# Clone the repository
git clone https://github.com/universeTBD/AstroLLaVA
cd AstroLLaVA

# Install dependencies
pip install -e .

Usage

from astrollava import AstroLLaVA
from PIL import Image

# Initialize model
model = AstroLLaVA.from_pretrained("universeTBD/AstroLLaVA")

# Load an image with PIL
image = Image.open("galaxy.jpg")

# Ask questions about the image
response = model.generate_response(image, "What interesting features do you see in this galaxy?")
print(response)
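
If an image lives at a URL rather than on disk (for example an image pulled from APOD or the ESO archive), it can be fetched with requests and opened with PIL before being passed to the model. This is a minimal sketch: the URL is a placeholder, and it assumes generate_response accepts a PIL image exactly as in the example above.

import requests
from io import BytesIO
from PIL import Image

from astrollava import AstroLLaVA

# Load the model as in the snippet above
model = AstroLLaVA.from_pretrained("universeTBD/AstroLLaVA")

# Fetch a remote image (placeholder URL) and open it with PIL
url = "https://example.com/some_nebula.jpg"
image = Image.open(BytesIO(requests.get(url, timeout=30).content))

response = model.generate_response(image, "Which nebula is shown here, and what are its notable features?")
print(response)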

How to finetune your own *LLaVA

To finetune your own *LLaVA model, follow these steps:

  1. Prepare your dataset: Ensure your dataset is in the correct format (image-caption pairs). We use the exact same format as the baseline LLaVA; a minimal sketch of this format appears after this list.

  2. Install dependencies: Make sure all required libraries are installed, and install the additional training dependencies such as flash-attention:

pip install -e ".[train]"

  3. Set up the training script: We use scripts similar to those in the LLaVA repository; you can find them in the scripts directory.

  4. Run the pre-training and fine-tuning scripts.
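
For step 1, the snippet below is a minimal sketch of what a single training record looks like in the upstream LLaVA conversation format (a JSON list of examples, each with an image path and a human/gpt conversation that includes the <image> token). The id, paths, and text shown are placeholders; consult the baseline LLaVA documentation for the authoritative schema.

import json

# Hypothetical training record in the LLaVA-style conversation format.
# The id, image path, and conversation text are placeholders.
example = {
    "id": "apod_2021_06_01",
    "image": "images/apod_2021_06_01.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nWhat object is shown in this image?"},
        {"from": "gpt", "value": "A nearly face-on spiral galaxy with prominent dust lanes."},
    ],
}

# The training scripts expect a JSON list of such records.
with open("astro_train.json", "w") as f:
    json.dump([example], f, indent=2)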

Model Architecture

AstroLLaVA combines:

  • CLIP ViT-L/14 vision encoder (pretrained at 336px resolution)
  • LLaMA 7B language model
  • Custom projection layers for bridging visual and language domains

The model is trained in two stages:

  1. Training of visual-language projection layers
  2. End-to-end instruction tuning with astronomical QA pairs
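
A rough PyTorch-style sketch of this two-stage recipe is shown below. The module and function names are hypothetical rather than the actual AstroLLaVA training code; the point is only that stage 1 updates the projection layers with both backbones frozen, while stage 2 also unfreezes the language model for instruction tuning.

import torch.nn as nn

class ProjectionBridge(nn.Module):
    """Hypothetical projector mapping CLIP ViT-L/14 features into the LM embedding space."""

    def __init__(self, vision_dim=1024, text_dim=4096):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vision_dim, text_dim),
            nn.GELU(),
            nn.Linear(text_dim, text_dim),
        )

    def forward(self, visual_features):
        return self.net(visual_features)

def configure_stage(vision_encoder, language_model, projector, stage):
    # Stage 1: only the projector is trainable; stage 2 unfreezes the language model too.
    for p in vision_encoder.parameters():
        p.requires_grad = False
    for p in language_model.parameters():
        p.requires_grad = (stage == 2)
    for p in projector.parameters():
        p.requires_grad = True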

Environmental Impact

Training was performed on the ITER Teide HPC cluster using 100% renewable energy. Total energy usage was approximately 5 kWh on 4× A100-40G GPUs.

Contributing

We welcome contributions! Join our Discord community to collaborate on improving AstroLLaVA.

License

This project is licensed under the MIT License - see the LICENSE file for details.