🛸 Can: Vision-Language Integrated Smart Drone Project

Can is an experimental drone project that aims to integrate Vision-Language Pre-training (VLP) systems with a small-scale autonomous drone. The long-term goal is to create a general-purpose smart drone that can be controlled through natural language and is capable of performing complex real-world tasks based on environmental understanding.


🎯 Project Goals

  1. ๐Ÿ“ Establish Evaluation Metrics
    Develop standardized performance metrics for evaluating and benchmarking intelligent drone systems.

  2. ๐Ÿ”ฅ Applied Smart Drone Use-Cases

  3. Forest fire inspection & early extinguishing (detecting and reacting to fires in initial stages)
  4. Area guarding / autonomous surveillance
  5. Smart delivery systems that can adapt to dynamic environments

🧠 Vision-Language Integration

The project is built upon modern Vision-Language Pre-training (VLP) techniques, aiming to empower the drone with the ability to:

- Interpret natural language commands
- Perceive its environment through onboard cameras
- Make context-aware decisions autonomously


🎬 Demonstrations & Code

Below are three key prototypes that demonstrate different aspects of the project:

1. Prompt-Controlled Real Drone

Hardware: DJI Tello
Language Model: OpenAI GPT via API

📹 Video: Watch
💻 Script: prompt_controlled_drone.py

Description:
A real-world implementation where a Tello drone is controlled via natural language prompts such as:

- "Take off"
- "Move forward and rotate"
- "Land if battery is below threshold"
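As a minimal sketch of this prompt-to-command pipeline (not the actual prompt_controlled_drone.py, where the intent would come from GPT via the API), a keyword parser below stands in for the language model; the command strings, distances, and the `djitellopy` usage in the comment are assumptions:

```python
# Hypothetical sketch of prompt-to-command mapping. In the real script a
# language model produces the intent; here a keyword parser stands in for it.

def parse_prompt(prompt: str, battery: int = 100, threshold: int = 20) -> list:
    """Map a natural-language prompt to a list of Tello SDK command strings."""
    p = prompt.lower()
    commands = []
    if "take off" in p or "takeoff" in p:
        commands.append("takeoff")
    if "forward" in p:
        commands.append("forward 50")   # distance in cm (assumed default)
    if "rotate" in p:
        commands.append("cw 90")        # clockwise rotation in degrees (assumed)
    if "land" in p:
        # honour conditional prompts like "Land if battery is below threshold"
        if "battery" not in p or battery < threshold:
            commands.append("land")
    return commands

if __name__ == "__main__":
    # Sending the commands to a real Tello would use djitellopy (assumed):
    #   from djitellopy import Tello
    #   tello = Tello(); tello.connect()
    #   for cmd in parse_prompt("Take off"):
    #       tello.send_control_command(cmd)
    print(parse_prompt("Move forward and rotate"))
```

The conditional branch shows why prompts need state: "Land if battery is below threshold" only emits `land` when the drone's reported battery is actually below the limit.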


2. Microsoft's AirSim-GPT Integration

Based on Microsoft's open-source PromptCraft-Robotics

📹 Video: Watch

Description:
We tested the integration of GPT with Microsoft's AirSim drone simulation environment. This forms a testbed for simulating and refining prompt-based drone behavior before real-world deployment.
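The idea behind this testbed can be sketched as a thin wrapper exposing high-level actions for the language model to call. This is a hypothetical illustration, not the PromptCraft code itself; the method names follow the real `airsim.MultirotorClient` API, but the client is injected so the wrapper can be exercised without the simulator:

```python
# Minimal PromptCraft-style wrapper sketch (hypothetical). The injected
# client would be airsim.MultirotorClient() when running in the simulator.

class DroneWrapper:
    """High-level commands a language model can be prompted to emit."""

    def __init__(self, client):
        self.client = client

    def takeoff(self):
        self.client.takeoffAsync().join()

    def fly_to(self, x, y, z, speed=5):
        # AirSim uses NED coordinates: negative z is up.
        self.client.moveToPositionAsync(x, y, z, speed).join()

    def land(self):
        self.client.landAsync().join()

# Usage with the real simulator (assumed; requires the airsim package):
#   import airsim
#   client = airsim.MultirotorClient()
#   client.confirmConnection(); client.enableApiControl(True)
#   drone = DroneWrapper(client)
#   drone.takeoff(); drone.fly_to(10, 0, -5); drone.land()
```

Keeping the vocabulary this small makes prompt-based control tractable: the model only has to choose among a few named actions rather than emit raw API calls.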


3. BLIP (VLP) Integration Test

📹 Video: Watch
💻 Script: blip_droidcam.py

Description:
BLIP (Bootstrapped Language-Image Pretraining) was tested to evaluate its capability to understand drone surroundings via images. This is a critical module planned to be embedded in the drone for contextual awareness and decision-making.
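A sketch of how a BLIP caption could feed decision-making is below. This is illustrative, not the actual blip_droidcam.py: the rule set in `caption_to_action` and its action names are invented for the example, while the captioning function uses the Hugging Face `transformers` BLIP checkpoint, loaded lazily so the decision logic can run without downloading the model:

```python
# Hypothetical caption-to-action sketch; rules and action names are
# invented for illustration and may differ from the project's own logic.

def caption_to_action(caption: str) -> str:
    """Map an image caption to a coarse drone action (illustrative rules)."""
    c = caption.lower()
    if "fire" in c or "smoke" in c:
        return "alert_and_approach"
    if "person" in c or "people" in c:
        return "hold_position"
    return "continue_patrol"

def caption_image(image_path: str) -> str:
    """Caption a frame with BLIP (requires transformers, Pillow, torch)."""
    from PIL import Image
    from transformers import BlipProcessor, BlipForConditionalGeneration
    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
    inputs = processor(Image.open(image_path).convert("RGB"), return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(out[0], skip_special_tokens=True)
```

Splitting perception (`caption_image`) from policy (`caption_to_action`) keeps the contextual-awareness module swappable: the same rules could consume captions from a different VLP model later.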


๐Ÿ› ๏ธ Technologies Used


🧪 Future Plans
