Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: use official action to setup rust toolchain #585

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions .github/dependabot.yaml
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh nice! are we planning on enabling dependabot for the repo?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well - i guess thats up for debate. but since this was literally just copy and paste from somewhere else, it was low cost to spark the discussion by just providing the solution 😆.

From my end, I think the utility may unfortunately be limited as auto-updates for the major dependencies (e.g. arrow) almost never work out of the box - at least in delta-rs. That said, I still landed on "let's add it" to keep the actions fresh and have the occasional rust version bump that works.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea I think it is helpful insofar as it just automatically opens small PRs to bump versions and we may (probably) have to manually help upgrade but the 'reminder' to upgrade seems useful? curious to hear what others think

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the dependabot-quickstart-guide doesn't recommend setting up manually like this, but rather suggests going into settings and letting github create this file. Wondering if we should just do it that way?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay trying this out, but also a bit skeptical of the utility here. As far as I can tell it only runs cargo update. Generally with how we've set up our Cargo.toml this will only do minor updates (i.e. it'll never update arrow 53->54). Maybe that's good, but we need a process to also run cargo outdated and try and stay on top of more major updates as well, so I'm not sure how much time this saves us.

Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
# Please see the documentation for all configuration options:
# https://help.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This link doesn't seem to work. it just goes to a generic support page


version: 2
updates:
- package-ecosystem: "cargo"
directory: "/"
schedule:
interval: "weekly"

- package-ecosystem: "github-actions"
directory: "/"
schedule:
interval: "weekly"
58 changes: 18 additions & 40 deletions .github/workflows/build.yml
Original file line number Diff line number Diff line change
Expand Up @@ -10,27 +10,20 @@ jobs:
format:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: Install minimal stable with clippy and rustfmt
uses: actions-rs/toolchain@v1
uses: actions-rust-lang/setup-rust-toolchain@v1
with:
profile: default
toolchain: stable
override: true
- uses: Swatinem/rust-cache@v2
components: rustfmt
- name: format
run: cargo fmt -- --check

msrv:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: Install minimal stable and cargo msrv
uses: actions-rs/toolchain@v1
with:
profile: default
toolchain: stable
override: true
- uses: Swatinem/rust-cache@v2
uses: actions-rust-lang/setup-rust-toolchain@v1
- name: Install cargo-msrv
shell: bash
run: |
Expand All @@ -41,21 +34,18 @@ jobs:
cargo msrv --path derive-macros/ verify --all-features
cargo msrv --path ffi/ verify --all-features
cargo msrv --path ffi-proc-macros/ verify --all-features

docs:
runs-on: ubuntu-latest
env:
RUSTDOCFLAGS: -D warnings
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: Install minimal stable
uses: actions-rs/toolchain@v1
with:
profile: default
toolchain: stable
override: true
- uses: Swatinem/rust-cache@v2
uses: actions-rust-lang/setup-rust-toolchain@v1
- name: build docs
run: cargo doc

build:
runs-on: ${{ matrix.os }}
strategy:
Expand All @@ -65,18 +55,16 @@ jobs:
- ubuntu-latest
- windows-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: Install minimal stable with clippy and rustfmt
uses: actions-rs/toolchain@v1
uses: actions-rust-lang/setup-rust-toolchain@v1
with:
profile: default
toolchain: stable
override: true
- uses: Swatinem/rust-cache@v2
components: clippy
- name: build and lint with clippy
run: cargo clippy --benches --tests --all-features -- -D warnings
- name: lint without default features
run: cargo clippy --no-default-features -- -D warnings

test:
runs-on: ${{ matrix.os }}
strategy:
Expand All @@ -86,14 +74,9 @@ jobs:
- ubuntu-latest
- windows-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: Install minimal stable with clippy and rustfmt
uses: actions-rs/toolchain@v1
with:
profile: default
toolchain: stable
override: true
- uses: Swatinem/rust-cache@v2
uses: actions-rust-lang/setup-rust-toolchain@v1
- name: test
run: cargo test --workspace --verbose --all-features -- --skip read_table_version_hdfs

Expand All @@ -109,7 +92,7 @@ jobs:
- name: Setup cmake
uses: jwlawson/actions-setup-cmake@v2
with:
cmake-version: '3.30.x'
cmake-version: "3.30.x"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In yaml double quotes allow escapes and other things, so I tend to prefer single quotes unless I need that, as, fewer surprises. Any reason you changed this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto-formatting ... maybe i need to make prettier or whoever it was less agressive :).

- name: Install arrow-glib
run: |
if [ "$RUNNER_OS" == "Linux" ]; then
Expand Down Expand Up @@ -186,14 +169,9 @@ jobs:
steps:
- uses: actions/checkout@v4
- name: Install rust
uses: actions-rs/toolchain@v1
with:
profile: default
toolchain: stable
override: true
uses: actions-rust-lang/setup-rust-toolchain@v1
- name: Install cargo-llvm-cov
uses: taiki-e/install-action@cargo-llvm-cov
- uses: Swatinem/rust-cache@v2
- name: Generate code coverage
run: cargo llvm-cov --all-features --workspace --codecov --output-path codecov.json -- --skip read_table_version_hdfs
- name: Upload coverage to Codecov
Expand Down
12 changes: 3 additions & 9 deletions .github/workflows/run_integration_test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -18,17 +18,11 @@ jobs:
- name: Skip job for pull requests on Windows
if: ${{ matrix.skip }}
run: echo "Skipping job for pull requests on Windows."
- uses: actions/checkout@v3
- uses: actions/checkout@v4
if: ${{ !matrix.skip }}
- name: Install minimal stable rust
if: ${{ !matrix.skip }}
uses: actions-rs/toolchain@v1
with:
profile: default
toolchain: stable
override: true
- uses: Swatinem/rust-cache@v2
- name: Setup rust toolchain
if: ${{ !matrix.skip }}
uses: actions-rust-lang/setup-rust-toolchain@v1
- name: Run integration tests
if: ${{ !matrix.skip }}
shell: bash
Expand Down
7 changes: 1 addition & 6 deletions .github/workflows/semver-checks.yml
Original file line number Diff line number Diff line change
Expand Up @@ -25,12 +25,7 @@ jobs:
fetch-depth: 0
ref: ${{ github.event.pull_request.head.sha }}
- name: Install minimal stable
uses: actions-rs/toolchain@v1
with:
profile: default
toolchain: stable
override: true
- uses: Swatinem/rust-cache@v2
uses: actions-rust-lang/setup-rust-toolchain@v1
- name: Install cargo-semver-checks
shell: bash
run: |
Expand Down
83 changes: 49 additions & 34 deletions kernel/src/engine/arrow_data.rs
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was my auto-formatter in the IDE, since there are cargo fmt checks in ci, nut sure why that happened. In most cases I do like the prior formatting better though, do we a have some line-length config somewhere I am not picking up?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm that's interesting - guessing your editor is applying something 'extra' on top of rustfmt? I do prefer the old formatting, can we revert these changes?

Original file line number Diff line number Diff line change
Expand Up @@ -4,9 +4,11 @@ use crate::{DeltaResult, Error};

use arrow_array::cast::AsArray;
use arrow_array::types::{Int32Type, Int64Type};
use arrow_array::{Array, ArrayRef, GenericListArray, MapArray, OffsetSizeTrait, RecordBatch, StructArray};
use arrow_schema::{FieldRef, DataType as ArrowDataType};
use tracing::{debug};
use arrow_array::{
Array, ArrayRef, GenericListArray, MapArray, OffsetSizeTrait, RecordBatch, StructArray,
};
use arrow_schema::{DataType as ArrowDataType, FieldRef};
use tracing::debug;

use std::collections::{HashMap, HashSet};

Expand Down Expand Up @@ -138,14 +140,20 @@ impl EngineData for ArrowEngineData {
self.data.num_rows()
}

fn visit_rows(&self, leaf_columns: &[ColumnName], visitor: &mut dyn RowVisitor) -> DeltaResult<()> {
fn visit_rows(
&self,
leaf_columns: &[ColumnName],
visitor: &mut dyn RowVisitor,
) -> DeltaResult<()> {
// Make sure the caller passed the correct number of column names
let leaf_types = visitor.selected_column_names_and_types().1;
if leaf_types.len() != leaf_columns.len() {
return Err(Error::MissingColumn(format!(
"Visitor expected {} column names, but caller passed {}",
leaf_types.len(), leaf_columns.len()
)).with_backtrace());
leaf_types.len(),
leaf_columns.len()
))
.with_backtrace());
}

// Collect the names of all leaf columns we want to extract, along with their parents, to
Expand All @@ -154,20 +162,19 @@ impl EngineData for ArrowEngineData {
let mut mask = HashSet::new();
for column in leaf_columns {
for i in 0..column.len() {
mask.insert(&column[..i+1]);
mask.insert(&column[..i + 1]);
}
}
debug!("Column mask for selected columns {leaf_columns:?} is {mask:#?}");

let mut getters = vec![];
Self::extract_columns(&mut vec![], &mut getters, leaf_types, &mask, &self.data)?;
if getters.len() != leaf_columns.len() {
return Err(Error::MissingColumn(
format!(
"Visitor expected {} leaf columns, but only {} were found in the data",
leaf_columns.len(), getters.len()
)
));
return Err(Error::MissingColumn(format!(
"Visitor expected {} leaf columns, but only {} were found in the data",
leaf_columns.len(),
getters.len()
)));
}
visitor.visit(self.len(), &getters)
}
Expand All @@ -185,14 +192,11 @@ impl ArrowEngineData {
path.push(field.name().to_string());
if column_mask.contains(&path[..]) {
if let Some(struct_array) = column.as_struct_opt() {
debug!("Recurse into a struct array for {}", ColumnName::new(path.iter()));
Self::extract_columns(
path,
getters,
leaf_types,
column_mask,
struct_array,
)?;
debug!(
"Recurse into a struct array for {}",
ColumnName::new(path.iter())
);
Self::extract_columns(path, getters, leaf_types, column_mask, struct_array)?;
} else if column.data_type() == &ArrowDataType::Null {
debug!("Pushing a null array for {}", ColumnName::new(path.iter()));
getters.push(&());
Expand All @@ -215,16 +219,20 @@ impl ArrowEngineData {
col: &'a dyn Array,
) -> DeltaResult<&'a dyn GetData<'a>> {
use ArrowDataType::Utf8;
let col_as_list = || if let Some(array) = col.as_list_opt::<i32>() {
(array.value_type() == Utf8).then_some(array as _)
} else if let Some(array) = col.as_list_opt::<i64>() {
(array.value_type() == Utf8).then_some(array as _)
} else {
None
let col_as_list = || {
if let Some(array) = col.as_list_opt::<i32>() {
(array.value_type() == Utf8).then_some(array as _)
} else if let Some(array) = col.as_list_opt::<i64>() {
(array.value_type() == Utf8).then_some(array as _)
} else {
None
}
};
let col_as_map = || {
col.as_map_opt().and_then(|array| {
(array.key_type() == &Utf8 && array.value_type() == &Utf8).then_some(array as _)
})
};
let col_as_map = || col.as_map_opt().and_then(|array| {
(array.key_type() == &Utf8 && array.value_type() == &Utf8).then_some(array as _)
});
let result: Result<&'a dyn GetData<'a>, _> = match data_type {
&DataType::BOOLEAN => {
debug!("Pushing boolean array for {}", ColumnName::new(path));
Expand All @@ -236,11 +244,15 @@ impl ArrowEngineData {
}
&DataType::INTEGER => {
debug!("Pushing int32 array for {}", ColumnName::new(path));
col.as_primitive_opt::<Int32Type>().map(|a| a as _).ok_or("int")
col.as_primitive_opt::<Int32Type>()
.map(|a| a as _)
.ok_or("int")
}
&DataType::LONG => {
debug!("Pushing int64 array for {}", ColumnName::new(path));
col.as_primitive_opt::<Int64Type>().map(|a| a as _).ok_or("long")
col.as_primitive_opt::<Int64Type>()
.map(|a| a as _)
.ok_or("long")
}
DataType::Array(_) => {
debug!("Pushing list for {}", ColumnName::new(path));
Expand All @@ -252,14 +264,17 @@ impl ArrowEngineData {
}
data_type => {
return Err(Error::UnexpectedColumnType(format!(
"On {}: Unsupported type {data_type}", ColumnName::new(path)
"On {}: Unsupported type {data_type}",
ColumnName::new(path)
)));
}
};
result.map_err(|type_name| {
Error::UnexpectedColumnType(format!(
"Type mismatch on {}: expected {}, got {}",
ColumnName::new(path), type_name, col.data_type()
ColumnName::new(path),
type_name,
col.data_type()
))
})
}
Expand Down
Loading