Can is an experimental drone project that aims to integrate Vision-Language Pre-training (VLP) systems with a small-scale autonomous drone. The long-term goal is a general-purpose smart drone that can be controlled through natural language and can perform complex real-world tasks based on environmental understanding.
📊 Establish Evaluation Metrics
Develop standardized performance metrics for evaluating and benchmarking intelligent drone systems.
🔥 Applied Smart Drone Use-Cases
The project is built upon modern Vision-Language Pre-training (VLP) techniques, aiming to empower the drone with the ability to:
- Interpret natural language commands
- Perceive its environment through onboard cameras
- Make context-aware decisions autonomously
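Conceptually, these capabilities compose into a perceive-interpret-act loop. The sketch below is illustrative only; every name in it (`drone`, `camera`, `language_model`, and their methods) is hypothetical and stands in for components the prototypes below explore:

```python
# Conceptual perception-language-action loop; all names are hypothetical.
def control_loop(drone, camera, language_model, instruction):
    """One perceive -> interpret -> act cycle per camera frame."""
    while not drone.mission_complete():
        frame = camera.capture()                          # perceive the environment
        scene = language_model.describe(frame)            # ground the image in language
        action = language_model.plan(instruction, scene)  # context-aware decision
        drone.execute(action)                             # act in the real world
```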
Below are three key prototypes that demonstrate different aspects of the project:
Prototype 1: Prompt-Controlled Tello Drone
Hardware: DJI Tello
Language Model: OpenAI GPT via API
🔹 Video: Watch
💻 Script: prompt_controlled_drone.py
Description:
A real-world implementation where a Tello drone is controlled via natural language prompts such as:
- "Take off"
- "Move forward and rotate"
- "Land if battery is below threshold"
Based on Microsoft's open-source PromptCraft-Robotics.
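The script itself is not reproduced here, but a minimal sketch of this prompt-to-command mapping, assuming the djitellopy library, might look as follows. In the real prototype GPT performs the translation from free-form language to SDK calls; this sketch hard-codes it with keyword matching, and the distances, angle, and battery threshold are illustrative:

```python
from djitellopy import Tello

BATTERY_THRESHOLD = 20  # percent; the actual threshold is an assumption

def run_prompt(tello: Tello, prompt: str) -> None:
    """Map a natural-language prompt onto Tello SDK calls.

    In the actual prototype, GPT performs this translation; here it is
    hard-coded keyword matching for illustration only.
    """
    text = prompt.lower()
    if "take off" in text:
        tello.takeoff()
    if "move forward" in text:
        tello.move_forward(50)        # distance in cm
    if "rotate" in text:
        tello.rotate_clockwise(90)    # degrees
    if "land if battery" in text and tello.get_battery() < BATTERY_THRESHOLD:
        tello.land()

tello = Tello()
tello.connect()
run_prompt(tello, "Take off")
run_prompt(tello, "Move forward and rotate")
run_prompt(tello, "Land if battery is below threshold")
```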
Prototype 2: GPT + AirSim Simulation
🔹 Video: Watch
Description:
We tested the integration of GPT with Microsoft's AirSim drone simulation environment, which serves as a testbed for refining prompt-based drone behavior before real-world deployment.
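Whatever plan GPT produces ultimately bottoms out in AirSim's Python API. A minimal sketch against a locally running simulator; the coordinates, rates, and durations are illustrative:

```python
import airsim

# Connect to a running AirSim simulator (e.g. the Blocks environment).
client = airsim.MultirotorClient()
client.confirmConnection()
client.enableApiControl(True)
client.armDisarm(True)

# A GPT-generated plan would be reduced to calls like these.
client.takeoffAsync().join()
client.moveToPositionAsync(0, 0, -10, 5).join()  # NED frame: z=-10 is 10 m up
client.rotateByYawRateAsync(30, 3).join()        # yaw at 30 deg/s for 3 s
client.landAsync().join()

client.armDisarm(False)
client.enableApiControl(False)
```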
Prototype 3: BLIP Scene Understanding
🔹 Video: Watch
💻 Script: blip_droidcam.py
Description:
BLIP (Bootstrapping Language-Image Pre-training) was tested to evaluate its ability to understand the drone's surroundings from camera images. It is a critical module planned to be embedded on the drone for contextual awareness and decision-making.
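A minimal sketch of this setup, assuming the Hugging Face transformers BLIP captioning checkpoint and a DroidCam phone-camera stream as the image source (the stream URL is a placeholder):

```python
import cv2
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load a pretrained BLIP captioning model from the Hugging Face hub.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
)

# DroidCam exposes the phone camera as an MJPEG stream; URL is a placeholder.
stream = cv2.VideoCapture("http://192.168.1.10:4747/video")
ok, frame = stream.read()
if ok:
    # OpenCV delivers BGR; BLIP expects an RGB PIL image.
    image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    inputs = processor(images=image, return_tensors="pt")
    caption_ids = model.generate(**inputs, max_new_tokens=30)
    print(processor.decode(caption_ids[0], skip_special_tokens=True))
stream.release()
```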