DVC Commands for Data Science
What are DVC Commands?
DVC (Data Version Control) is a command-line tool written in Python. It mimics Git commands and workflows so that users can quickly incorporate it into their regular Git practice. If you haven't worked with Git before, be sure to check out Introduction to Git and GitHub for Python Developers.
DVC is built to make ML models shareable and reproducible. It is designed to handle large files, data sets, machine learning models, and metrics as well as code.
List of DVC Commands
1. init
DVC initialization depends on Git. If you are in a new directory, first initialize Git and then initialize DVC as shown below.
git init
dvc init
The init command creates a .dvc directory, which holds all the metadata related to your DVC configuration and files.
2. remote
The `dvc remote` command is used to share data with a team or to keep a copy in remote storage.
To add a remote, supply a remote name and a remote URL. As noted earlier, the command closely resembles its Git counterpart.
dvc remote add dagshub https://dagshub.com/kingabzpro/Urdu-ASR-SOTA.dvc
To view the list of remotes, use:
dvc remote list
>>> dagshub https://dagshub.com/kingabzpro/Urdu-ASR-SOTA.dvc
To modify an existing remote, use the command below. It requires the remote name, the option to change (here `url`), and the new value.
dvc remote modify dagshub url https://dagshub.com/kingabzpro/solar-radiation-ISB-MLOps.dvc
You can rename or remove a remote in the same way, using `dvc remote rename` and `dvc remote remove`.
3. add
Use this command to track one or more files and directories.
dvc add ./model ./data
When you add files to DVC, the command lists them in .gitignore so that Git ignores them. Git instead tracks small pointer files with the .dvc extension, and committing those pointers records each change to the data.
After running the add command, add the generated .dvc files and .gitignore to the Git staging area.
git add model.dvc data.dvc .gitignore
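To make the pointer idea concrete, here is a toy sketch in Python of a content-addressed add: hash the file, copy it into a cache under that hash, and write a small pointer file for Git to track. This is not DVC's implementation. The function name `toy_add`, the flat cache layout, and the JSON pointer format are all assumptions made for illustration (real .dvc files are YAML).

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def toy_add(path: Path, cache_dir: Path) -> Path:
    """Hypothetical helper: cache a file under its MD5 and write a pointer next to it."""
    digest = hashlib.md5(path.read_bytes()).hexdigest()
    cache_dir.mkdir(parents=True, exist_ok=True)
    # The cached copy is named after the content hash, so identical data is stored once.
    shutil.copy2(path, cache_dir / digest)
    # The pointer is the small file Git would track instead of the data itself.
    pointer = path.with_name(path.name + ".dvc")
    pointer.write_text(json.dumps({"md5": digest, "path": path.name}))
    return pointer


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data.csv"
        data.write_text("a,b\n1,2\n")
        pointer = toy_add(data, Path(tmp) / "cache")
        print(pointer.name, pointer.read_text())
```

Because the pointer holds only a hash and a file name, it stays tiny no matter how large the data file is, which is why it is cheap to commit to Git.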
4. remove
To stop tracking files and directories, use the `dvc remove <file>.dvc` command. Pass it the .dvc pointer file rather than the data file itself. You can also use it to remove a stage from dvc.yaml.
dvc remove model.dvc
5. status
It displays changes in the project pipelines, as well as differences between the cache and the workspace. To compare against remote storage instead, add the `-c` flag.
dvc status
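The comparison behind a status check can be sketched in a few lines of Python: hash the file in the workspace and compare it with the hash recorded earlier. This is a simplified illustration, not DVC's code. The function name `toy_status` and the returned strings are assumptions.

```python
import hashlib
from pathlib import Path


def toy_status(path: Path, recorded_md5: str) -> str:
    """Hypothetical helper: report how a workspace file differs from its recorded hash."""
    if not path.exists():
        return "deleted"
    current = hashlib.md5(path.read_bytes()).hexdigest()
    # Any change in content changes the hash, so equality means nothing to report.
    return "up to date" if current == recorded_md5 else "modified"
```

Calling `toy_status(Path("data.csv"), recorded)` with the hash taken when the file was added would return "modified" as soon as a single byte of the file changes.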
6. commit
The commit command is used to record changes in files and folders tracked by DVC.
dvc commit
7. checkout
When you use `git checkout` to switch the repository to an older version, use `dvc checkout` to update the tracked files in the workspace so that they match dvc.lock and the .dvc files.
dvc checkout
8. push
Similar to Git, you can push locally tracked files to the default remote using `dvc push`. Pushing is necessary for team collaboration and for keeping multiple copies of the data as a safeguard against loss.
For the default remote:
dvc push
For a specific remote:
dvc push -r <remote-name>
9. pull
The pull command updates the local workspace from remote storage. Push and pull work similarly to their Git counterparts.
For pulling files from the default remote:
dvc pull
For pulling files from a specific remote:
dvc pull -r <remote-name>
10. run
It helps you create and modify pipeline stages in dvc.yaml. The run command can be used to assemble machine learning and data pipelines. Note that newer DVC releases replace `dvc run` with `dvc stage add` followed by `dvc repro`.
dvc run -n printer -d write.sh -o pages ./write.sh
-n is the name of the stage
-d is a dependency of the stage
-o is an output of the stage
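As a rough guide to what the command above records, the Python sketch below arranges the same arguments into the nested shape that a dvc.yaml stage uses (a `stages` mapping, keyed by stage name, holding `cmd`, `deps`, and `outs`). The helper name `stage_entry` is hypothetical, and the result is printed as JSON only to stay within the standard library. DVC itself writes YAML.

```python
import json


def stage_entry(name, cmd, deps, outs):
    """Hypothetical helper: nest run-style arguments the way a dvc.yaml stage is laid out."""
    return {"stages": {name: {"cmd": cmd, "deps": list(deps), "outs": list(outs)}}}


if __name__ == "__main__":
    # Mirrors: -n printer -d write.sh -o pages ./write.sh
    entry = stage_entry("printer", "./write.sh", deps=["write.sh"], outs=["pages"])
    print(json.dumps(entry, indent=2))
```

Seeing the flags side by side with the resulting structure makes it easier to edit the stage file by hand later.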
The Tech Platform