Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion
Paper: arXiv:2402.12195
Models for this GitHub repo, which focuses on two modality isolation issues: image-text isolation and inter-image isolation.
Detailed instructions are coming soon.