A New Way of Managing Deep Learning Datasets




What is Hub?

Hub by Activeloop is an open-source Python package that arranges data in NumPy-like arrays. It integrates smoothly with deep learning frameworks such as TensorFlow and PyTorch for faster GPU processing and training. We can update the data, visualize it, and create machine learning pipelines using the Hub API.
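
To make "NumPy-like arrays" a little more concrete, here is a minimal sketch in plain NumPy (not Hub itself) of the layout this implies: one array per field, with the first axis indexing samples. The array names, shapes, and random pixel values are made up for illustration and are not taken from Hub's API.

```python
import numpy as np

# Hypothetical toy data: 4 RGB images of 32x32 pixels, one label per image.
rng = np.random.default_rng(seed=0)
images = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)
labels = np.array([0, 1, 1, 0], dtype=np.uint32)

# Indexing the first axis selects one sample; slicing selects a batch.
first_image = images[0]      # a single image, shape (32, 32, 3)
batch_images = images[1:3]   # two images, shape (2, 32, 32, 3)
batch_labels = labels[1:3]   # the matching two labels

print(first_image.shape, batch_images.shape, batch_labels.tolist())
# → (32, 32, 3) (2, 32, 32, 3) [1, 1]
```

Because images and labels share the same first axis, the same index or slice picks out matching samples from both arrays.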


Hub allows us to store images, audio, video, and time-series data in a way that can be accessed at lightning speed. The data can be stored in GCS/S3 buckets, in local storage, or on Activeloop cloud. It can then be used directly to train a PyTorch model, so you don't need to set up separate data pipelines. Hub also comes with data version control, dataset search queries, and distributed workloads.


My experience with Hub was amazing, as I was able to create and push data to the cloud within a couple of minutes. In this blog post, we are going to see how Hub can be used to create and manage a dataset:

  • Initializing a dataset on Activeloop cloud

  • Processing the images

  • Pushing the data to the cloud

  • Data version control

  • Data visualization

Activeloop Storage

Activeloop provides free storage for both open-source and private datasets. You can also earn up to 200 GB of free storage by referring people. Activeloop's Hub interfaces with the Database for AI, which allows us to visualize datasets with their labels and to run complex search queries so that we can analyze the data effectively. The platform also contains more than 100 datasets for image segmentation, classification, and object detection.


To create the account you can sign up using the