A brief overview of why and how to collect data from robots
Robotics companies collect data for a myriad of reasons. Common use cases include calculating performance analytics, building machine learning datasets, and finding edge-case scenarios.
This data can be streamed over a network using protocols such as MQTT, or written to disk and offloaded later.
Modern robots produce complex data, like imagery, at high frequencies. This enables them to understand their surroundings and make decisions, but the volume is often too large to stream in full or to keep on disk indefinitely.
Let’s walk through the data challenge and the various strategies available to get data from your fleet.
Sensors on robots generally produce data at remarkable rates.
A single self-driving car may carry multiple cameras, LIDAR sensors, and more, producing upwards of 19 terabytes per hour.
This data lets a robot map its dynamic environment in real time, which is a necessity for performance and safety.
But most of this data won’t offer insights into a robot’s performance. A self-driving car spends nearly all its time driving straight.
To help manage this challenge, we must understand what data is important and what data can be thrown away.
Robots tend to be deployed in locations with poor connectivity, such as farms, the open ocean, and warehouses.
This places a natural limit on how much data can be uploaded. High-resolution data must be written to disk and slowly uploaded later.
It may even be days before this data can be offloaded over a good connection.
Onboard drives and offboard cloud storage can help, but recording dozens of gigabytes per minute will fill any storage medium.
With limited onboard storage and bandwidth, a robotics company must be selective about what data is streamed over a network and what is stored on disk.
Even if a company is fortunate enough to collect data from a large fleet, cloud storage costs may explode: 100 robots each collecting just 10 GB per hour could cost $182,000 per month after a year of operation!
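The $182,000 figure only works out if the robots record around the clock and nothing is ever deleted. The sketch below reproduces the arithmetic under those assumptions plus a flat, hypothetical price of $0.021 per GB-month; real providers use tiered and regional pricing, so treat the rate as a placeholder.

```python
# Back-of-the-envelope storage bill for a fleet that keeps every recording.
# Assumptions (not from the article): robots record 24 hours a day, nothing is
# deleted, and storage is billed at a flat price per GB-month.


def accumulated_gb(robots: int, gb_per_hour: float, hours_per_day: float, days: int) -> float:
    """Total data stored after `days` of continuous operation."""
    return robots * gb_per_hour * hours_per_day * days


def monthly_storage_bill(stored_gb: float, usd_per_gb_month: float) -> float:
    """Cost of keeping `stored_gb` in cloud storage for one month."""
    return stored_gb * usd_per_gb_month


if __name__ == "__main__":
    stored = accumulated_gb(robots=100, gb_per_hour=10, hours_per_day=24, days=365)
    # Hypothetical flat rate; check your provider's current pricing.
    bill = monthly_storage_bill(stored, usd_per_gb_month=0.021)
    print(f"{stored:,.0f} GB stored -> ${bill:,.0f}/month")
```

Run as a script, this prints "8,760,000 GB stored -> $183,960/month", in the same range as the figure above. Compression, lifecycle rules, and cheaper storage tiers all move the number.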
The simplest method is to record data the entire time the robot is in use, then upload the whole recording to the cloud.
You may also consider recording data only during planned or contrived scenarios.
The most effective option for a deployed fleet is generally to record on targeted events. For example, a self-driving car might record the 10 seconds before and after a heavy braking event.
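As a rough sketch of the braking example, the Python below flags heavy braking from a stream of (timestamp, speed) samples and turns each event into a clip spanning 10 seconds on either side. The 4 m/s² threshold, the sample format, and the names heavy_braking_clips and merge_clips are assumptions for illustration, not taken from any particular stack.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical threshold: deceleration above this many m/s^2 counts as heavy braking.
HEAVY_BRAKING_MPS2 = 4.0
# Seconds to keep on each side of the event, matching the 10-second example.
WINDOW_S = 10.0


@dataclass(frozen=True)
class Clip:
    start: float  # seconds
    end: float  # seconds


def heavy_braking_clips(
    samples: Iterable[Tuple[float, float]],
    threshold: float = HEAVY_BRAKING_MPS2,
    window: float = WINDOW_S,
) -> Iterator[Clip]:
    """Yield a clip around every sample whose deceleration exceeds `threshold`.

    `samples` are (timestamp_s, speed_mps) pairs in increasing time order.
    """
    prev: Optional[Tuple[float, float]] = None
    for t, v in samples:
        if prev is not None:
            t0, v0 = prev
            dt = t - t0
            # Deceleration is the drop in speed per second since the previous sample.
            if dt > 0 and (v0 - v) / dt > threshold:
                yield Clip(start=t - window, end=t + window)
        prev = (t, v)


def merge_clips(clips: Iterable[Clip]) -> List[Clip]:
    """Merge overlapping clips so one long braking event yields one recording."""
    merged: List[Clip] = []
    for clip in sorted(clips, key=lambda c: c.start):
        if merged and clip.start <= merged[-1].end:
            merged[-1] = Clip(merged[-1].start, max(merged[-1].end, clip.end))
        else:
            merged.append(clip)
    return merged
```

For example, 1 Hz samples with speeds of 20, 20, 20, 14, 8, and 8 m/s trigger at t = 3 and t = 4, and merge_clips collapses the two overlapping clips into a single clip from -7 to 14 seconds.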
Note that these strategies can be mixed and matched for different use cases.
Early-stage robotics companies often record everything to save engineering hours and generate comprehensive analytics.
It’s not uncommon for a company with more sophisticated sensors, like cameras, to collect all data as well.
Collecting everything is an effective way to go when your robot only has a few sensors producing quantitative data.
Smaller amounts of data can often be stored on an onboard hard drive. But when you run out of space, data will need to be offloaded to another location, such as cloud storage.
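When everything is recorded to an onboard drive, something has to decide what gets offloaded once space runs low. A minimal sketch, assuming recordings are individual files in one directory and that the oldest should leave first; the 20% free-space target and the function names are hypothetical. It only plans the offload, since local copies should be deleted only after an upload is confirmed.

```python
import shutil
from pathlib import Path
from typing import List, Tuple


def oldest_first(files: List[Tuple[str, float, int]], bytes_needed: int) -> List[str]:
    """Pick the oldest files until their combined size covers `bytes_needed`.

    `files` holds (path, modified_time, size_bytes) tuples.
    """
    selected: List[str] = []
    freed = 0
    for path, _mtime, size in sorted(files, key=lambda f: f[1]):
        if freed >= bytes_needed:
            break
        selected.append(path)
        freed += size
    return selected


def plan_offload(recordings_dir: str, min_free_fraction: float = 0.2) -> List[str]:
    """List recordings to offload so the disk gets back to `min_free_fraction` free.

    Only plans the offload; delete local copies after the upload is confirmed.
    """
    usage = shutil.disk_usage(recordings_dir)
    bytes_needed = max(0, int(min_free_fraction * usage.total) - usage.free)
    files = []
    for p in Path(recordings_dir).iterdir():
        if p.is_file():
            st = p.stat()
            files.append((str(p), st.st_mtime, st.st_size))
    return oldest_first(files, bytes_needed)
```

Oldest-first is the simplest policy; a fleet that tags recordings could sort by priority instead.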
This approach may make sense for a team with a small number of robots, or for a team with a machine-learning-focused workflow that needs comprehensive data.
Due to cost and time constraints, however, this strategy will not work in the long term.
Most robotics companies record planned scenarios to work within resource constraints.
During my time at Uber ATG, we would drive the vehicles through contrived situations on a test track, like making repeated unprotected left turns.
We would collect all data from the beginning to the end of each scenario, which required someone to manually start and stop the recording.
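If recordings are started and stopped by hand, a small wrapper keeps the operator from forgetting to stop the recorder. This sketch assumes a recorder that flushes its output when it receives SIGINT (the signal Ctrl-C sends); the ros2 bag record command in the trailing comment is one possible recorder, not what Uber ATG used, and the recording function is a hypothetical name.

```python
import signal
import subprocess
from contextlib import contextmanager
from typing import Iterator, Sequence


@contextmanager
def recording(cmd: Sequence[str], stop_timeout_s: float = 10.0) -> Iterator[subprocess.Popen]:
    """Run a recorder command for exactly the duration of a `with` block.

    The recorder is stopped with SIGINT, the signal Ctrl-C sends, so it can
    flush and close its output; it is killed only if it fails to exit in time.
    """
    proc = subprocess.Popen(list(cmd))
    try:
        yield proc
    finally:
        if proc.poll() is None:
            proc.send_signal(signal.SIGINT)
            try:
                proc.wait(timeout=stop_timeout_s)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()


# Example with ROS 2 (the output name is hypothetical):
# with recording(["ros2", "bag", "record", "-a", "-o", "unprotected_left_01"]):
#     input("Recording. Press Enter when the scenario ends. ")
```

Because the stop happens in a finally clause, the recorder is also stopped if the operator's script raises an exception mid-scenario.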
This strategy is helpful for 1) reducing the quantity and 2) increasing the quality of data collected.
Roboticists use this to diagnose known problems, but it doesn’t help discover unknown ones. We recommend pairing it with event-based recording.
Most robotics companies can greatly benefit from event-based recording: data is automatically collected when conditions you specify are satisfied (e.g., model confidence drops too low).
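Event conditions can be as simple as named predicates evaluated against the latest signals. The sketch below is hypothetical throughout: the signal names, the 0.5 confidence and 4 m/s² thresholds, and the EventRule type are illustrative, not Woeden's configuration format or any library's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping, Sequence

# Latest value of each monitored signal, keyed by a name of your choosing.
Signals = Mapping[str, float]


@dataclass(frozen=True)
class EventRule:
    name: str
    condition: Callable[[Signals], bool]


# Hypothetical rules: signal names and thresholds are illustrative only.
RULES: List[EventRule] = [
    EventRule("low_model_confidence", lambda s: s.get("model_confidence", 1.0) < 0.5),
    EventRule("heavy_braking", lambda s: s.get("decel_mps2", 0.0) > 4.0),
]


def triggered_events(rules: Sequence[EventRule], signals: Signals) -> List[str]:
    """Names of all rules whose condition holds for the current signals."""
    return [rule.name for rule in rules if rule.condition(signals)]
```

Missing signals fall back to values that do not trigger, so a rule never fires on data it has not received.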
As a perception engineer at Sea Machines Robotics, I needed data on rare events while the boat was out at sea.
Our network connection was practically nonexistent, and we were generating 4K imagery so we could detect ships miles away.
Programmable events would have helped us easily collect data while the boat was offline and know precisely what triggered each recording.
Unfortunately, the engineering cost of hardcoding monitors for these events was too high. This is one of the reasons I founded Woeden.
Event-based recording usually includes a rolling buffer, where the last X seconds of data are written to disk at all times.
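The rolling buffer mentioned above can be sketched as a time-windowed queue: it always holds the last pre_s seconds, and a trigger snapshots that history and keeps capturing for post_s more seconds. For brevity this version holds messages in memory rather than on disk, and the RollingBuffer class and its parameters are assumptions, not a description of any specific product.

```python
from collections import deque
from typing import Any, Deque, List, Optional, Tuple

Message = Tuple[float, Any]  # (timestamp_s, payload)


class RollingBuffer:
    """Keeps the last `pre_s` seconds of messages; a trigger adds `post_s` more."""

    def __init__(self, pre_s: float, post_s: float) -> None:
        self.pre_s = pre_s
        self.post_s = post_s
        self._buf: Deque[Message] = deque()
        self._capture: Optional[List[Message]] = None
        self._capture_until = 0.0
        self.clips: List[List[Message]] = []  # finished clips, ready to save

    def add(self, t: float, msg: Any) -> None:
        self._buf.append((t, msg))
        # Drop messages that have fallen out of the pre-trigger window.
        while self._buf and self._buf[0][0] < t - self.pre_s:
            self._buf.popleft()
        if self._capture is not None:
            self._capture.append((t, msg))
            if t >= self._capture_until:
                self.clips.append(self._capture)
                self._capture = None

    def trigger(self, t: float) -> None:
        if self._capture is None:
            # Snapshot the history leading up to the event.
            self._capture = list(self._buf)
        # A trigger during an active capture extends it.
        self._capture_until = t + self.post_s
```

With pre_s = post_s = 2 and one message per second, a trigger at t = 5 produces a single clip holding the messages from t = 3 through t = 7.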
On-device filtering like this makes offloading easier, since you can selectively upload only the scenarios of interest.
It also reduces unnecessary data transfer costs, lets you prioritize disk space, and doesn’t require an engineer to monitor the robot. It’s automatic.
We recommend this approach to any robotics company looking to collect data, as it saves your engineers time and produces the highest-quality, organic data.
We’ll walk you through a few options for collecting data from your robots, from easiest to most difficult.
Woeden supports all of the strategies above.
If you follow our guide here, all you need to do is install our agent on your robot, which takes only five minutes.
You can easily record data on demand or on events you specify, and a configurable rolling buffer catches the moments leading up to an event.
You can even limit the quantity of data collected by choosing which data to record and at what intervals.
Metadata for all collected data is registered in our web interface, where you can selectively offload data of interest and share it with your team.
We handle the additional complexity of offloading data over spotty networks, and we make your data easily searchable.
If you prefer something more manual for now, you can use ssh to access your robot remotely.
This requires your robot to have an active internet connection, and it requires you to be present during data collection.
If you are not using a common framework like ROS, you may also need to write your own software to write the data to disk.
Read on here for more information on the ssh protocol.
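For the manual route, the two everyday operations are running a command on the robot and copying recordings back. A minimal sketch that builds standard OpenSSH ssh and scp invocations; the host, paths, and 10-second ConnectTimeout are hypothetical values to adapt.

```python
import subprocess
from typing import List

# Hypothetical default; a short timeout fails fast on a spotty link.
CONNECT_TIMEOUT_S = 10


def ssh_command(host: str, remote_cmd: str, timeout_s: int = CONNECT_TIMEOUT_S) -> List[str]:
    """argv that runs `remote_cmd` on `host` (e.g. "robot@10.0.0.12") over ssh."""
    return ["ssh", "-o", f"ConnectTimeout={timeout_s}", host, remote_cmd]


def scp_pull(host: str, remote_path: str, local_dir: str, timeout_s: int = CONNECT_TIMEOUT_S) -> List[str]:
    """argv that copies `remote_path` (file or directory) from `host` into `local_dir`."""
    return ["scp", "-r", "-o", f"ConnectTimeout={timeout_s}", f"{host}:{remote_path}", local_dir]


def run(argv: List[str]) -> None:
    """Run the command, raising CalledProcessError if ssh/scp reports failure."""
    subprocess.run(argv, check=True)


# Example (host and paths are hypothetical; relative remote paths start in the
# remote user's home directory):
# run(ssh_command("robot@10.0.0.12", "ls -lh recordings"))
# run(scp_pull("robot@10.0.0.12", "recordings/run_01", "./data"))
```

Building the argument list separately from running it keeps the commands easy to log and test without a robot on the network.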
If you are recording all data from your robots, you can start your data collection system as soon as the device boots.
This can be accomplished with a cron job that runs at system startup, using cron’s @reboot schedule.
As above, if you are not using a common framework like ROS, you may need to write your own data collection system.
For more details on this strategy, check out this guide for how to accomplish it on Linux.
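On Linux systems with a Vixie-style cron (such as cronie), a crontab line beginning with @reboot runs its command once at startup. This sketch adds such a line to the current user's crontab without duplicating it; the script path /opt/robot/start_recorder.sh is hypothetical. Cron jobs start with a minimal environment, so a ROS setup script may need to be sourced inside that script.

```python
import subprocess


def with_reboot_entry(crontab_text: str, command: str) -> str:
    """Return crontab text with an `@reboot <command>` line, adding it only if absent."""
    entry = f"@reboot {command}"
    lines = crontab_text.splitlines()
    if entry not in (line.strip() for line in lines):
        lines.append(entry)
    # crontab(5): the last entry must end with a newline or cron may reject the file.
    return "\n".join(lines) + "\n"


def install_reboot_entry(command: str) -> None:
    """Add the @reboot line to the current user's crontab."""
    current = subprocess.run(["crontab", "-l"], capture_output=True, text=True)
    # `crontab -l` exits non-zero when the user has no crontab yet.
    existing = current.stdout if current.returncode == 0 else ""
    subprocess.run(["crontab", "-"], input=with_reboot_entry(existing, command), text=True, check=True)


# Example (the script path is hypothetical):
# install_reboot_entry("/opt/robot/start_recorder.sh")
```

Because the edit is idempotent, the installer can safely run on every provisioning pass.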
You may opt to modify your stack to run your data collection system when certain conditions are met.
This could be a simple code change or a very complex one, depending on your stack. It would also be challenging to maintain, and you’d need something to keep track of every event.
Collecting data from robots is no easy feat. Robots produce huge quantities of data, which should be filtered to capture only the most notable moments.
Our software provides an incredibly simple interface to collect and keep track of data from your organization’s fleets of robots.
Get started with us by following our guide here.