DVC Commands for Data Science

What are DVC Commands?

DVC is a command-line tool written in Python. Its commands and workflow mirror Git's, so users can quickly fold DVC into their regular Git practice. If you haven't worked with Git before, be sure to check out Introduction to Git and GitHub for Python Developers.

DVC is built to make ML models shareable and reproducible. It is designed to handle large files, data sets, machine learning models, and metrics as well as code.

List of DVC Commands

1. init

By default, DVC initialization depends on Git. In a new directory, first initialize a Git repository and then initialize DVC, as shown below.

git init
dvc init

The init command creates a .dvc directory, which holds DVC's configuration and internal metadata.

2. remote

The dvc remote command configures remote storage, which you use to share data with your team or to keep a copy of it off your machine.

To add a remote, give it a name and a URL. As mentioned earlier, the syntax is similar to Git's.

dvc remote add dagshub <remote-url>

To view the list of remote storage, use:

dvc remote list

This prints each configured remote's name followed by its URL; here, the dagshub remote.

To modify an existing remote, use the command below. It takes the remote name, the option to change, and the new value; to change the URL, the option is url.

dvc remote modify dagshub url <new-url>

Renaming and removing a remote follow the same pattern: `dvc remote rename <old-name> <new-name>` and `dvc remote remove <remote-name>`.

3. add

Use this command to track single or multiple files and directories.

dvc add ./model ./data

When you add files or directories to DVC, it lists them in .gitignore so that Git no longer tracks the data itself. Instead, Git tracks small .dvc pointer files (here, model.dvc and data.dvc), which you commit to record changes.

After running the add command, add these pointer files and the updated .gitignore to the Git staging area.

git add model.dvc data.dvc .gitignore

4. remove

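To stop tracking a file or directory, use the `dvc remove <file>` command on its .dvc pointer file (for example, model.dvc) rather than on the data path itself. The data stays in the workspace unless you add the --outs option. You can also use it to remove a stage from dvc.yaml by passing the stage name.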

dvc remove model.dvc

5. status

dvc status displays changes in the project's pipelines and shows differences between the cache and the workspace. To compare the cache with remote storage instead, add the --cloud (-c) option.

dvc status

6. commit

The commit command records changes to files and directories tracked by DVC: it saves their current versions to the cache and updates the corresponding .dvc and dvc.lock files.

dvc commit

7. checkout

After you use `git checkout` to switch the repository to an older version, run `dvc checkout` to update the DVC-tracked files in the workspace so they match that version's dvc.lock and .dvc files.

dvc checkout

8. push

Similar to Git, `dvc push` uploads tracked data from the local cache to the default remote. Pushing is what makes the data available to teammates, and it keeps an extra copy of your data in case the local one is lost.

For the default remote:

dvc push

For a specific remote:

dvc push -r <remote-name>

9. pull

The pull command downloads tracked data from remote storage and updates the local workspace with it (a fetch followed by a checkout). Push and pull work much like their Git counterparts.

To pull from the default remote:

dvc pull

To pull from a specific remote:

dvc pull -r <remote-name>

10. run

The run command creates or modifies a pipeline stage in dvc.yaml and executes its command. Chaining stages this way lets you assemble machine learning and data pipelines.

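dvc run -n printer -d <dependency> -o pages <command>
  • -n is the name of the stage

  • -d declares a dependency of the stage

  • -o declares an output of the stage

  • the remaining arguments are the command the stage runs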

The Tech Platform