A walk-through on the challenges and strategies of offloading robotics data under a poor connection.
Once data is collected to disk on a robot (read our post last month on how this works), it must be offloaded (i.e. moved from the robot) to be put to use.
Offloaded data can be used to calculate performance analytics, build machine learning datasets, simulate edge-case scenarios, and more.
Poor connectivity is the norm for robotics applications, though, which makes offloading a challenge. From remote farms to warehouses with weak Wi-Fi, a good connection is far from guaranteed.
The sheer volume of robotics data (often terabytes per day) is too large to be offloaded from remote areas using standard procedures.
We’ll dive deeper into the challenges of connectivity in varying environments and strategies to successfully offload data.
At Woeden, we like to think of data collection and offload as separate tasks with their own challenges.
Data collection is the process of identifying data worth recording and actually recording it. We went into depth last month on various data collection strategies (see here).
Data offload is the process of moving data off of a robot and into cloud storage or some other storage medium.
Numerous strategies exist for collecting data, including event-based and on-demand paradigms as well as enhancements like rolling buffers. These methods are essential because modern networks lack the bandwidth to stream all of the data a robot produces.
While an intelligent data collection strategy can save disk space, ultimately a powerful offload system must also be implemented to actually free up the disk space for future use.
It is common to see data offloaded from a fleet in one of two ways: directly from the robot, or from a detached drive.
Both of these strategies have their own advantages and disadvantages, so let’s dive into the details!
One of the simplest ways to characterize a robotics application is to identify whether the robots are primarily deployed indoors or outdoors.
Indoor robots may be used in applications like manufacturing, warehousing, and food preparation. Robots deployed indoors can have complex connectivity issues, including:
They also tend to have physical access barriers that make direct hardware offload challenging. Food preparation robots, for example, tend to be inaccessible once deployed for sanitation purposes, which makes it difficult to simply fetch the drive with the recorded data.
Outdoor robots are used in applications such as agriculture, robo-taxiing, and defense. It can be a challenge to offload data from robots in these settings, because cellular connectivity can be limited or nonexistent and drives may need to be shipped hundreds to thousands of miles.
Let’s take a brief look at the various strategies you may employ to actually upload the data and what transmission mediums are available to you.
Direct offload of data from a robot may take one of three forms.
This method is as simple as it sounds: no strategy for uploading portions of the data. Just upload it all!
This is usually only achievable for companies with lower data volumes and high budgets for data storage, and it usually involves a hardwired uplink to a high-bandwidth internet connection. You might see this in an R&D lab setting or when an autonomous car returns to a garage.
This approach is easy and requires little engineering effort. Once you offload data from your robot, you can send it right back into the field. But it’s rare that you will collect so little data or a network will be so easily available.
It’s often undesirable to offload all collected data from robots due to large transmission and cloud storage costs.
An engineer may opt to selectively offload a subset of the collected data. The selection may be manual, automatic, or both.
Manual review usually involves looking at a preview of the data, such as a GIF, and using visualization or monitoring tools to decide what is worth uploading.
Automated review can help for events that are known in advance to be important. If a serious error or system fault occurs – like an emergency stop button push or an autonomous car disengagement – this data could be uploaded immediately.
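As a rough illustration of automated selection, the sketch below filters recordings by event tags. The `Recording` type and the tag names are hypothetical and chosen only for this example; your own recorder will have its own metadata format.

```python
from dataclasses import dataclass, field

# Hypothetical tag names; a real system would define its own vocabulary.
CRITICAL_TAGS = {"emergency_stop", "disengagement", "system_fault"}


@dataclass
class Recording:
    path: str
    size_bytes: int
    tags: set = field(default_factory=set)


def select_for_immediate_upload(recordings, critical_tags=CRITICAL_TAGS):
    """Return recordings carrying at least one critical tag, smallest first."""
    flagged = [r for r in recordings if r.tags & critical_tags]
    # Smallest first so that some critical data lands quickly on a slow link.
    return sorted(flagged, key=lambda r: r.size_bytes)
```

Sorting by size is one possible design choice. You might instead prioritize by severity or by recording time.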
When a robot needs to upload over limited bandwidth, the data can be broken into chunks and uploaded when the time is right. This is often the best way to offload from a robot that operates for long periods of time in areas with poor connectivity.
It is valuable to have a bandwidth-aware upload process since unrestricted offload could:
Event tags can be used to determine what data should be offloaded automatically.
This strategy increases data availability and ensures your team has access to all critical data as soon as possible.
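To make the chunked, bandwidth-aware idea concrete, here is a minimal sketch and not a description of any particular product's implementation. It assumes a hypothetical `send_chunk(index, data)` callable that performs the actual transfer and raises `OSError` when the connection drops. The chunk size, the JSON manifest format, and the sleep-based rate cap are all illustrative choices.

```python
import json
import time
from pathlib import Path

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB; an arbitrary illustrative value


def load_progress(manifest: Path) -> int:
    """Return the index of the next chunk to send (0 if nothing was sent)."""
    if manifest.exists():
        return json.loads(manifest.read_text())["next_chunk"]
    return 0


def trickle_upload(recording: Path, manifest: Path, send_chunk,
                   max_bytes_per_sec: float, chunk_size: int = CHUNK_SIZE,
                   sleep=time.sleep) -> int:
    """Upload `recording` chunk by chunk, resuming from `manifest`.

    `send_chunk` is a hypothetical callable supplied by the caller.
    Returns the number of chunks sent during this call.
    """
    next_chunk = load_progress(manifest)
    sent = 0
    with recording.open("rb") as f:
        f.seek(next_chunk * chunk_size)
        while True:
            data = f.read(chunk_size)
            if not data:
                break  # reached the end of the file
            started = time.monotonic()
            try:
                send_chunk(next_chunk, data)
            except OSError:
                break  # connection dropped; earlier progress is already saved
            next_chunk += 1
            sent += 1
            manifest.write_text(json.dumps({"next_chunk": next_chunk}))
            # Sleep so the average rate stays at or below the cap.
            min_duration = len(data) / max_bytes_per_sec
            elapsed = time.monotonic() - started
            if elapsed < min_duration:
                sleep(min_duration - elapsed)
    return sent
```

Because progress is written after every chunk, calling `trickle_upload` again after a dropped connection picks up at the first unsent chunk instead of starting over.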
A number of different options are available for transmitting the data.
There are a number of other approaches that may make sense for your application, including combined approaches where lightweight data is uploaded wirelessly and heavyweight sensor data is uploaded via a trickle stream or by direct connection. We expect Starlink, especially the RV variety, to make accessing data from your robots much easier in the future!
There are two major strategies for offloading data from a detached drive. Both of these strategies enable you to collect enormous amounts of data, so you will need to be selective about what data to keep after it’s uploaded to the cloud to regulate costs.
You may be fortunate enough to have a parking or storage facility for robots when they have completed a session. This may look like a garage for self-driving cars or autonomous tractors.
Many mobile robots have a “mission” or “session” oriented deployment model, where the robot is in the field for a limited amount of time and then returns to “home base”. It’s common for a robot operator to manually remove storage devices from robots and connect them to a drive dock. Then the data can be automatically uploaded.
The saying goes: “Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.” Even the fastest hardwired internet connections can be slower than shipping hard drives.
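A quick back-of-the-envelope calculation shows why. The figures used below (10 TB of data, a 100 Mbit/s uplink, 48 hours of shipping time) are made-up inputs for illustration, not measurements.

```python
def transfer_hours(data_terabytes: float, link_megabits_per_sec: float) -> float:
    """Hours needed to move the data over a link at a sustained rate."""
    bits = data_terabytes * 1e12 * 8
    seconds = bits / (link_megabits_per_sec * 1e6)
    return seconds / 3600


def shipping_throughput_mbps(data_terabytes: float, shipping_hours: float) -> float:
    """Effective throughput, in Mbit/s, of shipping the data on a drive."""
    bits = data_terabytes * 1e12 * 8
    return bits / (shipping_hours * 3600) / 1e6


print(round(transfer_hours(10, 100), 2))           # 222.22 hours, over 9 days
print(round(shipping_throughput_mbps(10, 48), 2))  # 462.96 Mbit/s
```

Under these assumptions, a two-day shipment outpaces the network link by more than a factor of four, and the gap widens as the data volume grows.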
For robotics applications involving long deployments in extremely remote settings, like marine robotics or defense, the best approach may be to ship detached drives using a service like AWS Snowcone.
Alternatively, you may roll your own shipping infrastructure and collect data on a NAS, which is then shipped long distances, potentially even around the world, and offloaded en masse later.
This strategy allows you to upload high fidelity data without much of a software engineering effort. However, it tends to require significant physical and operational infrastructure that may be unrealistic for your business.
We’ll walk you through a few options to begin offloading data from your robots, starting from easiest to most difficult.
Perhaps the simplest approach to copying data off your robot is to use tools like ssh, scp, or rsync to copy the data directly. You can then inspect your data in your local machine’s filesystem in the normal way.
While this approach is simple to implement and easy to understand, it does require that a connection is maintained throughout the entire upload.
It can also make it hard to share data between colleagues since the data is often stored on a single machine. This approach also lacks a unified web interface to browse and preview the data from your browser, making it much more difficult to effectively manage and share data.
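If you go this route, rsync’s `--partial` flag keeps partially transferred files so that a retry does not have to start from scratch, and `--bwlimit` caps the transfer rate. The sketch below wraps rsync in a small retry loop. The host name and paths in the usage example are placeholders, and the retry policy is an assumption you would tune for your own network.

```python
import subprocess
import time


def build_rsync_command(source, destination, bwlimit_kbps=None):
    """Build an rsync command that keeps partial files between attempts."""
    cmd = ["rsync", "--archive", "--partial", "--timeout=30"]
    if bwlimit_kbps is not None:
        cmd.append(f"--bwlimit={bwlimit_kbps}")
    cmd += [source, destination]
    return cmd


def copy_with_retries(cmd, attempts=5, wait_seconds=10,
                      run=subprocess.run, sleep=time.sleep):
    """Run `cmd` until it exits with status 0 or attempts run out."""
    for attempt in range(attempts):
        if run(cmd).returncode == 0:
            return True
        if attempt < attempts - 1:
            sleep(wait_seconds)  # give the connection time to recover
    return False
```

For example, `copy_with_retries(build_rsync_command("robot@robot-01:/data/recordings/", "./recordings/", bwlimit_kbps=500))` would pull recordings from a hypothetical robot at a capped rate of roughly 500 KiB/s.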
Another solution is to mount a cloud storage bucket as a drive on your robot (or local computer if you have a detached disk), and you can copy data directly to it.
This makes it extremely easy to move files and get data onto a number of machines, and you can share references to logfiles via object keys in storage buckets. However, bucket mounting can be unstable under flaky internet connections, and uploads and downloads can be fairly slow.
Similar to the above, you’ll lack access to a unified web interface to effectively manage and share data. Read on here for mounting buckets as drives.
The last option is to build some sort of system that continually checks the status of the network connection.
This should upload any newly recorded data as soon as a connection to the internet is re-established.
Data is often lost or corrupted due to loss of an established connection.
This can be a challenging system to maintain and similarly does not offer a unified web interface to browse data.
This is a large system to build in-house, especially if you wish to benefit from more advanced functionality such as a trickle stream.
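To give a feel for the core loop of such a system, here is a deliberately small sketch. It assumes recordings are `.bag` files in a single directory, checks connectivity by opening a TCP connection to a public DNS server, and takes a hypothetical `upload(path)` callable that raises `OSError` on failure. A production system would also persist the set of uploaded files and verify integrity, which is where much of the real effort goes.

```python
import socket
import time
from pathlib import Path


def is_online(host="8.8.8.8", port=53, timeout=3.0) -> bool:
    """Best-effort connectivity check; the host and port are assumptions."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pending_recordings(directory: Path, uploaded: set) -> list:
    """List recordings in `directory` that have not been uploaded yet."""
    return sorted(p for p in directory.glob("*.bag") if p.name not in uploaded)


def offload_once(directory: Path, uploaded: set, upload, online=is_online) -> int:
    """Upload pending recordings while the connection holds; return the count."""
    count = 0
    for path in pending_recordings(directory, uploaded):
        if not online():
            break
        try:
            upload(path)  # hypothetical uploader supplied by the caller
        except OSError:
            break  # leave the file pending and retry on the next pass
        uploaded.add(path.name)
        count += 1
    return count


def watch(directory: Path, upload, poll_seconds=30):
    """Poll forever, uploading new recordings whenever a connection exists."""
    uploaded = set()  # in memory only; a real system would persist this
    while True:
        offload_once(directory, uploaded, upload)
        time.sleep(poll_seconds)
```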
If you follow our guide here, all you need to do is install our agent on your robot. It only takes five minutes.
Woeden offers the ability to preview and offload recordings directly from your robot and via a detached disk.
You can manually select recordings of interest and notify your robot to begin uploading them.
We upload a small GIF with each recording for you to preview, along with metadata such as the duration, messages collected, etc.
Our infrastructure is aware of various network constraints. Spotty connections are expected, so we keep track of data that’s in the process of being uploaded even if a connection is lost.
Paired with our data collection capabilities, you can quickly begin building the ultimate database for your robotics data.
It’s not easy to offload data from your fleet. The amount of data is enormous, and a plethora of network limitations exist that make it a challenge.
Our approach immediately provides you with a number of critical data offload features, such as trickle streams, selective offload, and visual previews.
Get started with us by following our guide here.