DVC - Introducing an open-source version control system for machine learning projects (dvc.org)

The homepage describes it as follows:

DVC tracks ML models and data sets

DVC is built to make ML models shareable and reproducible. It is designed to handle large files, data sets, machine learning models, and metrics as well as code.

In short, you can think of DVC as an open-source version control system for machine learning projects.

The key points are summarized below:

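  • "Data Version Control": Git for Data & Models, Makefiles for ML
  • Designed to track and handle large files, datasets, machine learning models, and metrics alongside code
  • Stores data and models in AWS S3, Google Drive/GCS, Azure Blob Storage, SSH/SFTP, HDFS, and more, while version information is managed with Git
  • Experiments are tracked in the local Git repo
  • Provides a CLI and a VS Code extension
  • Runs on Windows, macOS, and Linux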
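To make the "Git for Data" idea concrete, here is a simplified Python sketch of content-addressed tracking. It is an illustration under assumptions, not DVC's code: the `md5_of` and `track` helpers are hypothetical, and the cache layout and pointer-file fields only loosely resemble what `dvc add` produces. The point it shows is that the large file is copied into a cache keyed by its MD5 hash (data that DVC can later push to remote storage such as S3), while only the tiny `.dvc` pointer file would be committed to Git.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in chunks so large files never load fully into memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def track(path: Path, cache_dir: Path) -> Path:
    """Hypothetical helper: copy `path` into a content-addressed cache and write a pointer file.

    Simplified illustration only. The big file lives in the cache, and only the
    small pointer file (e.g. data.csv.dvc) would be committed to Git.
    """
    digest = md5_of(path)
    # Two-character directory plus the rest of the hash, similar in spirit to DVC's cache.
    target = cache_dir / digest[:2] / digest[2:]
    if not target.exists():  # identical content is stored only once
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
    pointer = path.with_name(path.name + ".dvc")
    pointer.write_text(
        "outs:\n"
        f"- md5: {digest}\n"
        f"  size: {path.stat().st_size}\n"
        f"  path: {path.name}\n"
    )
    return pointer


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "data.csv"
        data.write_text("id,label\n1,cat\n2,dog\n")
        print(track(data, root / "cache").read_text())
```

Because the cache path is derived from the content hash, adding an unchanged file again stores nothing new, and after checking out an older Git commit the old pointer file is enough to locate the matching data (in DVC, `dvc checkout` restores it into the workspace).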

You can install it as follows.

Installation

There are several ways to install DVC: in VS Code; using snap, choco, brew, conda, or pip; or with an OS-specific package. Full instructions are available in the installation guide on dvc.org.

Snapcraft (Linux)

snap install dvc --classic

This corresponds to the latest tagged release. Add --beta for the latest tagged release candidate, or --edge for the latest main version.

Chocolatey (Windows)

choco install dvc

Brew (macOS)

brew install dvc

Anaconda (Any platform)

conda install -c conda-forge mamba # installs much faster than conda
mamba install -c conda-forge dvc

Depending on the remote storage type you plan to use to keep and share your data, you might need to install optional dependencies: dvc-s3, dvc-azure, dvc-gdrive, dvc-gs, dvc-oss, dvc-ssh.

PyPI (Python)

pip install dvc

Depending on the remote storage type you plan to use to keep and share your data, you might need to specify one of the optional dependencies: s3, gs, azure, oss, ssh, or all to include them all. The command should look like this: pip install 'dvc[s3]' (in this case, AWS S3 dependencies such as boto3 will be installed automatically).

To install the development version, run:

pip install git+https://github.com/iterative/dvc

Package (Platform-specific)

Self-contained packages for Linux, Windows, and Mac are available. The latest version of the packages can be found on the GitHub releases page.

Ubuntu / Debian (deb)

sudo wget https://dvc.org/deb/dvc.list -O /etc/apt/sources.list.d/dvc.list
wget -qO - https://dvc.org/deb/iterative.asc | sudo apt-key add -
sudo apt update
sudo apt install dvc

Fedora / CentOS (rpm)

sudo wget https://dvc.org/rpm/dvc.repo -O /etc/yum.repos.d/dvc.repo
sudo rpm --import https://dvc.org/rpm/iterative.asc
sudo yum update
sudo yum install dvc

Below is the quick start.

Quick start

See the DVC Command Reference for the complete list of commands.

A common CLI workflow includes:

Track data
$ git add train.py params.yaml
$ dvc add images/

Connect code and data
$ dvc stage add -n featurize -d images/ -o features/ python featurize.py
$ dvc stage add -n train -d features/ -d train.py -o model.p -M metrics.json python train.py

Make changes and experiment
$ dvc exp run -n exp-baseline
$ vi train.py
$ dvc exp run -n exp-code-change

Compare and select experiments
$ dvc exp show
$ dvc exp apply exp-baseline

Share code
$ git add .
$ git commit -m 'The baseline model'
$ git push

Share data and ML models
$ dvc remote add myremote -d s3://mybucket/image_cnn
$ dvc push
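The `train` stage above only declares a contract: `train.py` must read `features/` and write `model.p` and `metrics.json`. Below is a hypothetical `train.py` that honors that contract using only the standard library. It assumes (this is not stated in the quick start) that `featurize.py` writes `features/features.csv` with numeric `x` and `y` columns, and it fits a simple least-squares line as a stand-in for a real model. It also ignores `params.yaml`, since the `train` stage as written declares no `-p` parameters.

```python
"""Hypothetical train.py for the `train` stage (a sketch, not DVC's example project).

Assumes featurize.py wrote features/features.csv with numeric columns `x` and `y`.
"""
import csv
import json
import pickle
from pathlib import Path


def load_features(features_dir):
    """Read the (assumed) features/features.csv into two lists of floats."""
    xs, ys = [], []
    with (Path(features_dir) / "features.csv").open(newline="") as f:
        for row in csv.DictReader(f):
            xs.append(float(row["x"]))
            ys.append(float(row["y"]))
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def main(features_dir="features", model_path="model.p", metrics_path="metrics.json"):
    xs, ys = load_features(features_dir)
    slope, intercept = fit_line(xs, ys)
    mse = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    # -o model.p: a declared output, so DVC stores it in its cache rather than Git.
    with open(model_path, "wb") as f:
        pickle.dump({"slope": slope, "intercept": intercept}, f)
    # -M metrics.json: a metrics file kept out of the DVC cache (so Git tracks it).
    with open(metrics_path, "w") as f:
        json.dump({"mse": mse}, f, indent=2)
    return mse


if __name__ == "__main__":
    # Paths are relative to the repo root, where DVC runs the stage command by default.
    main()
```

Running `dvc exp run` would execute this script through the `train` stage; because `metrics.json` is declared with `-M`, its `mse` value would then show up as a column in `dvc exp show`, which is what makes comparing `exp-baseline` and `exp-code-change` side by side possible.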

For more details, see the project's GitHub repository (github.com/iterative/dvc).

That's all for today's post.
Thank you, as always, for reading.
