more wip + started categorical raster
asinghvi17 committed Sep 24, 2024
1 parent e262fff commit c78468c
Showing 2 changed files with 186 additions and 3 deletions.
4 changes: 4 additions & 0 deletions Project.toml
@@ -1,7 +1,9 @@
[deps]
ArchGDAL = "c9ce4bd3-c3d5-55b8-8973-c0e20141b8c3"
CairoMakie = "13f3f980-e62b-5c42-98c6-ff1f3baf88f0"
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
DataFramesMeta = "1313f7d8-7da2-5740-9ea0-a2ca25f37964"
DimensionalData = "0703355e-b756-11e9-17c0-8b28908087d0"
GeoDataFrames = "62cb38b5-d8d2-4862-a48e-6a340996859f"
GeoFormatTypes = "68eda718-8dee-11e9-39e7-89f7f65f511f"
@@ -14,9 +16,11 @@ LightOSM = "d1922b25-af4e-4ba3-84af-fe9bea896051"
OSMToolset = "a1c25ae6-0f93-4b3a-bddf-c248cb99b9fa"
OpenStreetMapX = "86cd37e6-c0ff-550b-95fe-21d72c8d4fc9"
Proj = "c94c279d-25a6-4763-9509-64d165bea63e"
Query = "1a8c2f83-1ff3-5112-b086-8aa67b057ba1"
Rasters = "a3a2b9e3-a471-40c9-b274-f788e487c689"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
TidierData = "fe2206b3-d496-4ee9-a338-6a095c4ece80"

[compat]
GeometryOps = "0.1.12"
185 changes: 182 additions & 3 deletions chapters/02-attribute-operations.qmd
@@ -135,7 +135,7 @@ Notice that the first column, `:geom`, is composed of `IGeometry{wkbMultiPolygon

We can also get some geospatial information - `GI.geometrycolumns(world)` returns `{julia} GI.geometrycolumns(world)`, and `GI.crs(world)` returns `{julia} GI.crs(world)`.

### Dropping geometries

We can drop the geometry column by subsetting the `DataFrame`:

@@ -150,7 +150,24 @@ Becoming skilled at geographic attribute data manipulation means becoming skille

### Vector attribute subsetting

There are multiple ways to subset data in Julia.
First, and probably most simply, we can index into the DataFrame object using a few kinds of selectors, which can pick out rows, columns, or both.

Indices placed inside square brackets directly after the name of a data frame object specify the elements to keep.

The row selector always comes first and the column selector second. We can select the first 5 rows of the `:pop` column like so:

::: {.callout-note collapse="true"}

## Recap: indexing in Julia

Indexing in Julia is 1-based, as in R and unlike Python (which is 0-based).

It's performed with square brackets, as in `x[inds...]`. A bare `:` selects all elements along that dimension, and `start:stop` selects a range.
You can also pass vectors of indices or of boolean values to select specific elements.

In DataFrames.jl, using `!` as the row selector, as in `world[!, :pop]` (in place of `world[:, :pop]`), returns the column vector stored in the data frame without copying it, whereas `:` returns a copy. The `!` form is also commonly used when replacing an entire column or adding a new one.
:::

```{julia}
world[1:5, :pop]
@@ -164,4 +181,166 @@ world[5:end, [:pop, :continent]]

and note that this returns a new DataFrame with only the selected columns.
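
The difference between `!` and `:` as row selectors matters once you modify the result. The following minimal sketch uses a small, made-up table `df` as a stand-in for `world`; its column names and values are purely illustrative.

```julia
using DataFrames

# Hypothetical toy table standing in for `world`
df = DataFrame(name = ["A", "B", "C"], pop = [10, 20, 30])

# `:` as the row selector returns a copy of the column
pop_copy = df[:, :pop]
pop_copy[1] = -1          # `df` is unchanged

# `!` returns the vector stored in `df` itself (no copy)
pop_shared = df[!, :pop]
pop_shared[2] = -2        # this modifies `df` in place

# Selecting rows and several columns returns a new DataFrame
sub = df[2:3, [:name, :pop]]
```

Only `pop_shared` is tied to `df`, so writing to it changes the data frame, while `pop_copy` and `sub` are independent copies.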

We can also drop all rows that have missing values in a given column using the `dropmissing` function:

```{julia}
world_with_pop = dropmissing(world, :pop)
```

There is also a mutating version of `dropmissing`, called `dropmissing!`, which modifies the input in place.
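
The difference between the two can be seen on a hypothetical three-row table with one missing value; the names and numbers below are made up for illustration.

```julia
using DataFrames

# Hypothetical toy data with one missing population value
df = DataFrame(name = ["A", "B", "C"], pop = [10, missing, 30])

# Non-mutating: returns a new DataFrame, `df` still has 3 rows here
df_complete = dropmissing(df, :pop)

# Mutating: removes the incomplete row from `df` itself
dropmissing!(df, :pop)
```

By default, `dropmissing` also narrows the column's element type from `Union{Missing, Int}` to `Int`, which is why comparisons on `world_with_pop.pop` need no special handling of `missing`.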

We can also subset using a boolean vector computed from a predicate. Let's select all countries whose population is greater than 30 million but less than 1 billion:

```{julia}
countries_to_select = 30_000_000 .< world_with_pop.pop .< 1_000_000_000
```

```{julia}
world_with_pop[countries_to_select, :]
```

A more concise way to achieve the same result is `world_with_pop[30_000_000 .< world_with_pop.pop .< 1_000_000_000, :]`.
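
The mask works because Julia broadcasts chained comparisons elementwise: `a .< x .< b` yields one `true`/`false` per element, equivalent to combining two comparisons with `.&`. A sketch on a made-up vector of population figures (not real data):

```julia
# Hypothetical population figures, not taken from `world`
pop = [5_000_000, 45_000_000, 1_400_000_000, 120_000_000]

# Broadcast chained comparison: a BitVector with one entry per element
mask = 30_000_000 .< pop .< 1_000_000_000

# Equivalent explicit form, combining two masks with `.&`
mask2 = (pop .> 30_000_000) .& (pop .< 1_000_000_000)

# Boolean indexing keeps the elements where the mask is true
selected = pop[mask]
```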


Here's a small exercise: guess the number of rows and columns in the `DataFrame` objects returned by each of the following commands, then check your answer by executing the commands in Julia.

```{julia}
#| eval: false
world[1:6, :] # subset rows by position
world[:, 1:3] # subset columns by position
world[1:6, 1:3] # subset rows and columns by position
world[:, [:name_long, :pop]] # columns by name
world[:, [true, true, false, false, false, false, false, true, true, false, false]] # by logical indices
world[:, 888] # an index representing a non-existent column
```



The population-based subset shown above can also be achieved with each of the DataFrame manipulation packages mentioned above.


::: {.panel-tabset}

## DataFrames.jl

DataFrames.jl also defines a `subset` function, which is another way to achieve this result:

```{julia}
subset(world_with_pop, :pop => ByRow(x -> !ismissing(x) && 30_000_000 < x < 1_000_000_000))
```

## DataFramesMeta.jl

DataFramesMeta.jl provides macros such as `@subset` and `@select` that wrap DataFrames.jl functions in a concise syntax, similar in spirit to dplyr.

```{julia}
using DataFramesMeta
@chain world_with_pop begin
@subset @byrow (!ismissing(:pop) && 30_000_000 < :pop < 1_000_000_000)
select(:name_long, :pop)
end
```

## TidierData.jl

TidierData.jl is a Julia implementation of the dplyr and tidyr R packages, so its syntax closely resembles the tidyverse.

```{julia}
#| eval: false
using TidierData
@chain world_with_pop begin
    @filter(pop > 30_000_000, pop < 1_000_000_000)
    @select(name_long, pop)
end
```

## Query.jl

Query.jl provides a query syntax inspired by LINQ, which works with DataFrames as well as other tabular data sources.

```{julia}
using Query
@from row in world_with_pop begin
    @where 30_000_000 < row.pop < 1_000_000_000
    @select {name_long = row.name_long, pop = row.pop}
    @collect DataFrame
end
```

:::
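
In DataFrames.jl, `subset` passes each selected column to the predicate as a whole vector unless the function is wrapped in `ByRow`, so a predicate written for single values needs `ByRow`, while a vectorized one needs broadcasting. A minimal sketch on a hypothetical table shows that the two forms select the same rows:

```julia
using DataFrames

# Hypothetical toy table, not the `world` dataset
df = DataFrame(name = ["A", "B", "C"], pop = [5_000_000, 45_000_000, 120_000_000])

# Without ByRow: the function receives the whole `pop` column,
# so it must broadcast and return a vector of Bools
big1 = subset(df, :pop => p -> 30_000_000 .< p .< 1_000_000_000)

# With ByRow: the function is applied to one value at a time
big2 = subset(df, :pop => ByRow(p -> 30_000_000 < p < 1_000_000_000))
```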

## Manipulating raster objects

In contrast to the vector data model underlying simple features (which represents points, lines and polygons as discrete entities in space), raster data represent continuous surfaces.
This section shows how raster objects work by creating them *from scratch*, building on Section \@ref(an-introduction-to-terra).
Because of their unique structure, subsetting and other operations on raster datasets work in a different way, as demonstrated in Section \@ref(raster-subsetting).


The following code recreates the raster dataset used in Section \@ref(raster-classes), the result of which is illustrated in Figure \@ref(fig:cont-raster).
This demonstrates how the `Raster` constructor can create an example raster named `elev` (representing elevations) from a matrix of values and a tuple of dimensions.

```{julia}
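vals = reshape(1:36, 6, 6)
# Cell centres of a 6 × 6 grid (resolution 0.5, extent -1.5 to 1.5); Y descends,
# so the values 1 to 6 (the first column of `vals`) form the top row
elev = Raster(vals, (X(-1.25:0.5:1.25), Y(1.25:-0.5:-1.25)))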
```


The result is a raster object with 6 rows and 6 columns, and spatial lookup vectors for the dimensions `X` (horizontal) and `Y` (vertical).
The `vals` matrix sets the values that each cell contains: integers ranging from 1 to 36 in this case.
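
Which cell receives which value follows from Julia's column-major storage: `reshape` fills the first dimension fastest. With the dimensions given in the order `(X, Y)`, the values therefore run along `X` first, one `Y` position at a time; whether that first `Y` position is the top or bottom row depends on the order of the `Y` lookup. A quick check in plain Julia, without Rasters.jl:

```julia
# Same values as `vals` above: 1:36 reshaped into a 6×6 matrix
vals = reshape(1:36, 6, 6)

# Column-major: the first index (X) varies fastest
first_y = vals[:, 1]   # the six X positions at the first Y: 1 to 6
next_y  = vals[1, 2]   # first X position at the second Y: 7
last    = vals[6, 6]   # last cell: 36
```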


Raster objects can also contain categorical values, such as strings, or integer codes that stand for named categories.
The following code creates the categorical raster `grain` shown in Figure \@ref(fig:cont-raster):

```{julia}
# First, construct a categorical array
using CategoricalArrays
grain_order = ["clay", "silt", "sand"]
grain_char = rand(grain_order, 6, 6)
grain_fact = CategoricalArray(grain_char, levels = grain_order)
# Then, wrap the categorical array in a Raster object
grain = Raster(grain_fact, (X(-1.25:0.5:1.25), Y(1.25:-0.5:-1.25)))
```

Alternatively, the `elev` and `grain` rasters can be read from GeoTIFF files on disk:

```{julia}
elev = Raster("raster/elev.tif")
grain = Raster("raster/grain.tif")
```

This `CategoricalArray` is stored in two parts: an array of integer reference codes, and a pool of levels that maps each code to its string value.
We can retrieve the levels of a `CategoricalArray` with the `levels()` function, and reorder or extend them with `levels!()`.
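
A minimal sketch of this two-part storage, using a small hypothetical vector rather than the `grain` raster: `levelcode` returns the integer code behind each value, and reordering the levels with `levels!` remaps those codes while leaving the values themselves unchanged.

```julia
using CategoricalArrays

# Hypothetical example with an explicit level order
g = CategoricalArray(["sand", "clay", "silt", "clay"]; levels = ["clay", "silt", "sand"])

codes_before = levelcode.(g)   # [3, 1, 2, 1]: positions in the level order

# Reorder the levels; the stored values stay the same,
# but their integer codes are remapped
levels!(g, ["sand", "silt", "clay"])
codes_after = levelcode.(g)    # [1, 3, 2, 3]
```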
