Load from CSV #128

kwinkunks · 2021-03-25T00:40:20Z

Needs improving. For example:

Less strict about column names etc. Allow to use_cols like NumPy/Pandas
Allow to load from URL
Encodings
Handling bad intervals (ignore or error), nulls for missing zones

mtb-za · 2021-04-18T12:34:21Z

There are a number of cases that should probably be handled:

Only tops are given - bases are inferred to be the next top.
Only bases are given - tops are inferred to be the next base.
Both bases and tops are given
Either bases or tops are given along with a thickness - the missing value is calculated using the thickness.

Currently we can handle the first and third cases. The second should not be too difficult, and if that is working both of the fourth case become essentially analogous. This probably should happen when we build a list of intervals, rather than being something that the from_csv method handles specially. This will let other from_* methods to do the same.

One major change that I am making is to explicitly require a top, base and/or thickness column to be specified, unless names=True is passed, in which case it should find them automatically. We are still assuming that there is a component column or similar exists, which can be used to define those for the interval.

We still need to think about the possible things in an Interval object, to decide what we are going to give to the Interval constructor:

Top - self-evident
Base - self-evident -- if one of these are missing, we can infer from the next one, or possibly a thickness and one of these.
Component - A list of Component objects.
Description - Plain-text description that gets parsed into a list of Components. Probably this is what we are going to mostly use, when reading a CSV? Do we need to get a number of Components somewhere? That feels like a tricky problem though.
Data - Anything else gets added to this dictionary.

Simplifying and extending `from_csv`. Initial test uses `np.genfromtext` internally. Explicitly expects to be told which columns the `top`, `base` and/or `thickness` are in. Other columns will be added to a dictionary from which the intervals will be created. Addresses, but does not entirely close #128

mtb-za · 2021-04-20T20:04:56Z

https://gist.github.com/mtb-za/3f94ffc426e804e7b2c778c2f0c6f051 has a couple of approaches, one using np.genfromtxt and one using csv.DictReader. Not sure if one appeals to you more than another one.

If we want to get input as something other than strings using either base approach, we need to cast them. We can start with the most specific type: int, then try float, and finally leave them as str. We might be able to handle other things, but that might be tricky to decide what that needs to be cleanly. genfromtxt allows for a sequence of dtypes, which is probably possible, but more difficult with csv.DictReader.

* Update striplog.py Simplifying and extending `from_csv`. Initial test uses `np.genfromtext` internally. Explicitly expects to be told which columns the `top`, `base` and/or `thickness` are in. Other columns will be added to a dictionary from which the intervals will be created. Addresses, but does not entirely close #128 * Update `from_csv` Expanded docs. The will need to be restyled to match the rest of the library, but this will work for now. Added a series of FutureWarnings for args that we want to remove going forward. Added some new args. * Updated `from_csv` Updated docs and args to be the same. Changed version for deprecation to 0.9.1. * Updated test suite Changed tests to use new `from_csv` method. The tests were passing in strings. These no need to be explicitly converted to `io.StringIO` objects, along with the `names=True`. * Updated `from_csv()` Removed extra args. These are not currently being used, but we might want them later. No need for them for now. * Remove old `from_csv` Removed old `from_csv` function - it is no longer being used anyway.

kwinkunks added the enhancement label Mar 25, 2021

mtb-za self-assigned this Apr 17, 2021

This was referenced Apr 19, 2021

Next release number? #133

Open

IO changes #134

Merged

mtb-za mentioned this issue Apr 21, 2021

Load more properties from CSV #55

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Load from CSV #128

Load from CSV #128

kwinkunks commented Mar 25, 2021

mtb-za commented Apr 18, 2021

mtb-za commented Apr 20, 2021

Load from CSV #128

Load from CSV #128

Comments

kwinkunks commented Mar 25, 2021

mtb-za commented Apr 18, 2021

mtb-za commented Apr 20, 2021