Changed notebooks to support backup use case also #227

Open
wants to merge 3 commits into
base: main
2 changes: 1 addition & 1 deletion 00_notebooks/00_index.ipynb
Original file line number Diff line number Diff line change
@@ -91,7 +91,7 @@
 "* [*Labelbox* integration](https://github.com/treeverse/lakeFS-samples/blob/main/01_standalone_examples/labelbox-integration/)\n",
 "* [*Kafka* integration](https://github.com/treeverse/lakeFS-samples/blob/main/01_standalone_examples/kafka/)\n",
 "* [*Flink* integration](https://github.com/treeverse/lakeFS-samples/blob/main/01_standalone_examples/flink/)\n",
-"* [How to **migrate or clone** a repo](https://github.com/treeverse/lakeFS-samples/blob/main/01_standalone_examples/migrate-or-clone-repo/)"
+"* [How to **back up, migrate or clone** a repo](https://github.com/treeverse/lakeFS-samples/blob/main/01_standalone_examples/backup-migrate-or-clone-repo/)"
 ]
 },
 {
Original file line number Diff line number Diff line change
@@ -5,9 +5,9 @@
 "id": "c4e81663-80e6-4d27-b5d8-788660b20453",
 "metadata": {},
 "source": [
-"# Migrate or clone a lakeFS repository on AWS\n",
+"# Back up, migrate or clone a lakeFS repository on AWS\n",
 "\n",
-"#### Use this notebook if you want to migrate/clone a source repository to a target repository within the same lakeFS environment or in different lakeFS environments"
+"#### Use this notebook if you want to back up and restore, migrate, or clone a source repository to a target repository within the same lakeFS environment or across different lakeFS environments"
 ]
 },
 {
@@ -45,7 +45,8 @@
 "import random\n",
 "import os\n",
 "import datetime\n",
-"from awscliv2.api import AWSAPI"
+"from awscliv2.api import AWSAPI\n",
+"import json"
 ]
 },
 {
@@ -288,7 +289,8 @@
 "source": [
 "for branchList in sourceRepo.branches():\n",
 "    for diff in sourceRepo.branch(branchList.id).uncommitted():\n",
-"        print('Branch with uncommitted data: ' + branchList.id)"
+"        print('Branch with uncommitted data: ' + branchList.id)\n",
+"        break"
 ]
 },
 {
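Review note: the added `break` leaves the inner loop after the first uncommitted change, so each branch is reported once rather than once per changed object. The same "at least one element" check can be sketched without the lakeFS SDK. In the sketch below, `branches_with_uncommitted` and the `uncommitted_by_branch` dict are hypothetical stand-ins for the notebook's `sourceRepo.branch(...).uncommitted()` calls, not part of any library.

```python
from typing import Dict, Iterable, List


def branches_with_uncommitted(changes: Dict[str, Iterable[str]]) -> List[str]:
    """Return every branch that has at least one uncommitted change, once each."""
    dirty = []
    for branch_id, diffs in changes.items():
        # any() stops consuming the iterable at the first element,
        # which mirrors the `break` in the notebook loop.
        if any(True for _ in diffs):
            dirty.append(branch_id)
    return dirty


# Hypothetical data standing in for the per-branch uncommitted listings.
uncommitted_by_branch = {
    "main": ["a.parquet", "b.parquet", "c.parquet"],
    "dev": [],
    "feature": ["d.csv"],
}

for branch_id in branches_with_uncommitted(uncommitted_by_branch):
    print("Branch with uncommitted data: " + branch_id)
```

Because the check stops at the first change, it also avoids paging through a long list of uncommitted objects on large branches.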
@@ -312,15 +314,17 @@
 "for branchList in sourceRepo.branches():\n",
 "    for diff in sourceRepo.branch(branchList.id).uncommitted():\n",
 "        ref = sourceRepo.branch(branchList.id).commit(message='Committed changes during the migration of the repository')\n",
-"        print(ref.get_commit())"
+"        print(ref.get_commit())\n",
+"        break"
 ]
 },
 {
 "cell_type": "markdown",
 "id": "d174ba40-7a8b-428a-94f8-868a2cb5fecc",
 "metadata": {},
 "source": [
-"# Step 2 - Dump Metadata of Source Repository"
+"# Step 2 - Dump Metadata of Source Repository\n",
+"### IMPORTANT: Shut down lakeFS services immediately after dumping the metadata so that nobody can change the source repository"
 ]
 },
 {
@@ -338,7 +342,8 @@
 "id": "e789b515-2333-4d45-879c-130afbd8ef85",
 "metadata": {},
 "source": [
-"# Step 3 - Copy Data from Source to Target"
+"# Step 3 - Copy Data from Source to Target\n",
+"### You can restart lakeFS services after copying the data from source to target"
 ]
 },
 {
@@ -359,7 +364,9 @@
 "id": "20314a68-533e-45f3-bd09-ac6be17a34cc",
 "metadata": {},
 "source": [
-"## Step 4 - Create Target Bare Repository"
+"## Step 4 - Create Target Bare Repository\n",
+"\n",
+"#### IMPORTANT: For the Backup & Restore process, run this step only when you want to restore the repository"
 ]
 },
 {
@@ -377,7 +384,36 @@
 "id": "95570b12-b39d-41d7-852c-f09dc4b05bdf",
 "metadata": {},
 "source": [
-"## Step 5 - Restore Metadata to Target Repository"
+"## Step 5 - Restore Metadata to Target Repository\n",
+"\n",
+"#### IMPORTANT: For the Backup & Restore process, run this step only when you want to restore the repository"
+]
+},
+{
+"cell_type": "markdown",
+"id": "5f31ade7-3aad-4d8f-b3b9-57cd39dc1226",
+"metadata": {},
+"source": [
+"### Download the metadata file (refs_manifest.json) created by \"Step 2\""
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "e4ba304e-2b23-4f5c-b2e1-7bd8405359d1",
+"metadata": {},
+"outputs": [],
+"source": [
+"s3DownloadRefsManifestFileCommand = 'aws s3 cp ' + target_storage_namespace + '/_lakefs/refs_manifest.json .'\n",
+"! $s3DownloadRefsManifestFileCommand"
+]
+},
+{
+"cell_type": "markdown",
+"id": "2a7c0d4a-ddf9-4d25-a774-50fee7f4c1ed",
+"metadata": {},
+"source": [
+"### Read the refs_manifest.json file and restore metadata to the new repository"
 ]
 },
 {
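Review note: the download cell assembles a shell string by concatenation, which is sensitive to a trailing slash or spaces in the namespace. A possible alternative, sketched under assumptions, is to build an argument list and pass it to `subprocess.run`. The helper name `build_refs_manifest_download` and the example bucket are hypothetical; only the `aws s3 cp <source> <destination>` form is taken from the cell above.

```python
import shlex
from typing import List


def build_refs_manifest_download(storage_namespace: str, dest: str = ".") -> List[str]:
    """Build the argv for `aws s3 cp <namespace>/_lakefs/refs_manifest.json <dest>`."""
    # Normalize a trailing slash so the key never contains "//".
    source = storage_namespace.rstrip("/") + "/_lakefs/refs_manifest.json"
    return ["aws", "s3", "cp", source, dest]


# Hypothetical namespace, for illustration only.
argv = build_refs_manifest_download("s3://example-bucket/target-repo/")
print(shlex.join(argv))

# To run the command for real (requires the AWS CLI and credentials):
#   import subprocess
#   subprocess.run(argv, check=True)
```

With `check=True`, a failed download raises an exception instead of letting the restore cell run against a stale or missing `refs_manifest.json`.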
@@ -387,7 +423,11 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"target_lakefs_sdk_client.internal_api.restore_refs(target_repo_name, source_lakefs_sdk_client.internal_api.dump_refs(source_repo_name))"
+"with open('./refs_manifest.json') as file:\n",
+"    refs_manifest_json = json.load(file)\n",
+"    print(refs_manifest_json)\n",
+" \n",
+"target_lakefs_sdk_client.internal_api.restore_refs(target_repo_name, refs_manifest_json)"
 ]
 },
 {
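Review note: since the restore now reads the manifest from a file rather than from a live `dump_refs` call, it may be worth checking the file before handing it to `restore_refs`. The sketch below is one way to do that. The three key names in `EXPECTED_KEYS` are an assumption about the refs dump layout and should be verified against the lakeFS SDK version in use; `load_refs_manifest` and the sample file are hypothetical.

```python
import json
import os
import tempfile

# ASSUMPTION: field names of the refs dump; verify against your lakeFS SDK version.
EXPECTED_KEYS = ("commits_meta_range_id", "tags_meta_range_id", "branches_meta_range_id")


def load_refs_manifest(path: str) -> dict:
    """Load the manifest and fail early if an expected field is absent or empty."""
    with open(path) as file:
        manifest = json.load(file)
    missing = [key for key in EXPECTED_KEYS if not manifest.get(key)]
    if missing:
        raise ValueError(f"{path} is missing or has empty fields: {missing}")
    return manifest


# Demo with a throwaway sample file instead of a real backup.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "refs_manifest.json")
    with open(sample_path, "w") as file:
        json.dump({key: "example-id" for key in EXPECTED_KEYS}, file)
    refs_manifest_json = load_refs_manifest(sample_path)
    print(refs_manifest_json)
```

In the notebook, the returned dict would take the place of `refs_manifest_json` in the `restore_refs` call.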
Original file line number Diff line number Diff line change
@@ -5,9 +5,9 @@
 "id": "c4e81663-80e6-4d27-b5d8-788660b20453",
 "metadata": {},
 "source": [
-"# Migrate or clone a lakeFS repository on Azure\n",
+"# Back up, migrate or clone a lakeFS repository on Azure\n",
 "\n",
-"#### Use this notebook if you want to migrate/clone a source repository to a target repository within the same lakeFS environment or in different lakeFS environments"
+"#### Use this notebook if you want to back up and restore, migrate, or clone a source repository to a target repository within the same lakeFS environment or across different lakeFS environments"
 ]
 },
 {
@@ -44,7 +44,8 @@
 "from lakefs_sdk.client import LakeFSClient\n",
 "import random\n",
 "import os\n",
-"import datetime"
+"import datetime\n",
+"import json"
 ]
 },
 {
@@ -281,7 +282,8 @@
 "source": [
 "for branchList in sourceRepo.branches():\n",
 "    for diff in sourceRepo.branch(branchList.id).uncommitted():\n",
-"        print('Branch with uncommitted data: ' + branchList.id)"
+"        print('Branch with uncommitted data: ' + branchList.id)\n",
+"        break"
 ]
 },
 {
@@ -305,15 +307,17 @@
 "for branchList in sourceRepo.branches():\n",
 "    for diff in sourceRepo.branch(branchList.id).uncommitted():\n",
 "        ref = sourceRepo.branch(branchList.id).commit(message='Committed changes during the migration of the repository')\n",
-"        print(ref.get_commit())"
+"        print(ref.get_commit())\n",
+"        break"
 ]
 },
 {
 "cell_type": "markdown",
 "id": "998b31bd-87d8-4e5d-9fa9-42c3f2bf920e",
 "metadata": {},
 "source": [
-"# Step 2 - Dump Metadata of Source Repository"
+"# Step 2 - Dump Metadata of Source Repository\n",
+"### IMPORTANT: Shut down lakeFS services immediately after dumping the metadata so that nobody can change the source repository"
 ]
 },
 {
@@ -331,7 +335,8 @@
 "id": "e789b515-2333-4d45-879c-130afbd8ef85",
 "metadata": {},
 "source": [
-"# Step 3 - Copy Data from Source to Target"
+"# Step 3 - Copy Data from Source to Target\n",
+"### You can restart lakeFS services after copying the data from source to target"
 ]
 },
 {
@@ -352,7 +357,9 @@
 "id": "20314a68-533e-45f3-bd09-ac6be17a34cc",
 "metadata": {},
 "source": [
-"## Step 4 - Create Target Bare Repository"
+"## Step 4 - Create Target Bare Repository\n",
+"\n",
+"#### IMPORTANT: For the Backup & Restore process, run this step only when you want to restore the repository"
 ]
 },
 {
@@ -370,17 +377,51 @@
 "id": "95570b12-b39d-41d7-852c-f09dc4b05bdf",
 "metadata": {},
 "source": [
-"## Step 5 - Restore Metadata to Target Repository"
+"## Step 5 - Restore Metadata to Target Repository\n",
+"\n",
+"#### IMPORTANT: For the Backup & Restore process, run this step only when you want to restore the repository"
+]
+},
+{
+"cell_type": "markdown",
+"id": "bbd2617d-8605-43f6-b8cd-b9d6c5cb039c",
+"metadata": {},
+"source": [
+"### Download the metadata file (refs_manifest.json) created by \"Step 2\""
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "d61ff81e-3809-4351-b695-0d42a7de74dd",
+"metadata": {},
+"outputs": [],
+"source": [
+"azureDownloadRefsManifestFileCommand = \"azcopy copy '\" + target_storage_namespace + \"/_lakefs/refs_manifest.json?\" + target_container_SAS_token + \"' .\"\n",
+"\n",
+"! $azureDownloadRefsManifestFileCommand"
+]
+},
+{
+"cell_type": "markdown",
+"id": "b520223e-8c7b-495b-b611-5a2ecd0f32a4",
+"metadata": {},
+"source": [
+"### Read the refs_manifest.json file and restore metadata to the new repository"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "cef6e90a-fd2d-42ff-b1ae-053a619ec560",
+"id": "44e850ef-3126-4aec-9000-3a915892c329",
 "metadata": {},
 "outputs": [],
 "source": [
-"target_lakefs_sdk_client.internal_api.restore_refs(target_repo_name, source_lakefs_sdk_client.internal_api.dump_refs(source_repo_name))"
+"with open('./refs_manifest.json') as file:\n",
+"    refs_manifest_json = json.load(file)\n",
+"    print(refs_manifest_json)\n",
+" \n",
+"target_lakefs_sdk_client.internal_api.restore_refs(target_repo_name, refs_manifest_json)"
 ]
 },
 {
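Review note: the Azure download cell wraps the URL in single quotes because a SAS token contains `&`, which a shell would otherwise treat as a command separator. A hedged alternative is to build an argument list so no shell quoting is involved. `build_azcopy_download`, the example account URL and the example token are hypothetical; only the `azcopy copy <source> <destination>` form and the `?` + token suffix come from the cell above.

```python
from typing import List


def build_azcopy_download(storage_namespace: str, sas_token: str, dest: str = ".") -> List[str]:
    """Build the argv for downloading refs_manifest.json with azcopy."""
    blob_url = storage_namespace.rstrip("/") + "/_lakefs/refs_manifest.json"
    # Accept tokens given with or without a leading "?".
    return ["azcopy", "copy", blob_url + "?" + sas_token.lstrip("?"), dest]


# Hypothetical values, for illustration only.
azcopy_argv = build_azcopy_download(
    "https://exampleaccount.blob.core.windows.net/target-container/repo",
    "?sv=2022-11-02&sig=EXAMPLE",
)
print(azcopy_argv)

# To run the command for real (requires azcopy on PATH):
#   import subprocess
#   subprocess.run(azcopy_argv, check=True)
```

Passing the list to `subprocess.run` keeps the `&` characters in the token as plain data.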
Original file line number Diff line number Diff line change
@@ -45,7 +45,8 @@
 "import random\n",
 "import os\n",
 "import datetime\n",
-"from awscliv2.api import AWSAPI"
+"from awscliv2.api import AWSAPI\n",
+"import json"
 ]
 },
 {
@@ -286,7 +287,8 @@
 "source": [
 "for branchList in sourceRepo.branches():\n",
 "    for diff in sourceRepo.branch(branchList.id).uncommitted():\n",
-"        print('Branch with uncommitted data: ' + branchList.id)"
+"        print('Branch with uncommitted data: ' + branchList.id)\n",
+"        break"
 ]
 },
 {
@@ -310,15 +312,17 @@
 "for branchList in sourceRepo.branches():\n",
 "    for diff in sourceRepo.branch(branchList.id).uncommitted():\n",
 "        ref = sourceRepo.branch(branchList.id).commit(message='Committed changes during the migration of the repository')\n",
-"        print(ref.get_commit())"
+"        print(ref.get_commit())\n",
+"        break"
 ]
 },
 {
 "cell_type": "markdown",
 "id": "89dcf103-090d-43a2-bb60-080a0dbe828d",
 "metadata": {},
 "source": [
-"# Step 2 - Dump Metadata of Source Repository"
+"# Step 2 - Dump Metadata of Source Repository\n",
+"### IMPORTANT: Shut down lakeFS services immediately after dumping the metadata so that nobody can change the source repository"
 ]
 },
 {
@@ -346,7 +350,9 @@
 "source": [
 "#### You can directly copy data from local storage to target storage on your own\n",
 "#### or you can run following printed command on your local machine to copy data from local Docker container to local machine first\n",
-"#### (change the Docker container name for lakeFS and go to the folder where you cloned lakefs-samples Git repo before running the command)"
+"#### (change the Docker container name for lakeFS and go to the folder where you cloned lakefs-samples Git repo before running the command)\n",
+"\n",
+"#### You can restart lakeFS services after copying the data from source to target"
 ]
 },
 {
@@ -408,14 +414,45 @@
 "## Step 5 - Restore Metadata to Target Repository"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "3f36b60d-2971-445d-8772-7134ef4c9b6d",
+"metadata": {},
+"source": [
+"### Download the metadata file (refs_manifest.json) created by \"Step 2\""
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "d1a74484-c1e0-4ed4-ab5c-f468d2c46d32",
+"metadata": {},
+"outputs": [],
+"source": [
+"s3DownloadRefsManifestFileCommand = 'aws s3 cp ' + target_storage_namespace + '/_lakefs/refs_manifest.json .'\n",
+"! $s3DownloadRefsManifestFileCommand"
+]
+},
+{
+"cell_type": "markdown",
+"id": "128838da-0048-49a6-95e0-020d10e1c78b",
+"metadata": {},
+"source": [
+"### Read the refs_manifest.json file and restore metadata to the new repository"
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "cef6e90a-fd2d-42ff-b1ae-053a619ec560",
+"id": "d8799812-9434-4d8e-b810-81266b938a7e",
 "metadata": {},
 "outputs": [],
 "source": [
-"target_lakefs_sdk_client.internal_api.restore_refs(target_repo_name, source_lakefs_sdk_client.internal_api.dump_refs(source_repo_name))"
+"with open('./refs_manifest.json') as file:\n",
+"    refs_manifest_json = json.load(file)\n",
+"    print(refs_manifest_json)\n",
+" \n",
+"target_lakefs_sdk_client.internal_api.restore_refs(target_repo_name, refs_manifest_json)"
 ]
 },
 {