First Notebook example doesn't work: apparently expects a state data file to already be there? #2

akkana · 2020-02-14T18:27:02Z

I'm trying to run the gerrymandertests, but apparently it relies on my separately downloading state-specific files (I'm particularly interested in New Mexico) and I can't find any documentation on where to get them.

If I just run the notebook, here's the error:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-1-54dcfe840d25> in <module>
     41 
     42 for chamber in chambers:
---> 43     chambers[chamber]['elections_df'] = g.parse_results(chambers[chamber]['filepath'])
     44     chambers[chamber]['tests_df'] = g.tests_df(g.run_all_tests(
     45         chambers[chamber]['elections_df'],

~/outsrc/gerrymandertests/gerrymetrics/utils.py in parse_results(input_filepath, start_year, coerce_odd_years)
     12     '''
     13 
---> 14     df = pd.read_csv(input_filepath)
     15 
     16     df = df[df['Year'] >= start_year]

~/pythonenv/gerry/lib/python3.7/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
    674         )
    675 
--> 676         return _read(filepath_or_buffer, kwds)
    677 
    678     parser_f.__name__ = name

~/pythonenv/gerry/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    446 
    447     # Create the parser.
--> 448     parser = TextFileReader(fp_or_buf, **kwds)
    449 
    450     if chunksize or iterator:

~/pythonenv/gerry/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    878             self.options["has_index_names"] = kwds["has_index_names"]
    879 
--> 880         self._make_engine(self.engine)
    881 
    882     def close(self):

~/pythonenv/gerry/lib/python3.7/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
   1112     def _make_engine(self, engine="c"):
   1113         if engine == "c":
-> 1114             self._engine = CParserWrapper(self.f, **self.options)
   1115         else:
   1116             if engine == "python":

~/pythonenv/gerry/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1889         kwds["usecols"] = self.usecols
   1890 
-> 1891         self._reader = parsers.TextReader(src, **kwds)
   1892         self.unnamed_cols = self._reader.unnamed_cols
   1893 

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: [Errno 2] File election_data/state_legislative/state_legislative_election_results_post1971.csv does not exist: 'election_data/state_legislative/state_legislative_election_results_post1971.csv'

election_data/congressional_election_results_post1948.csv comes as part of the repository, but election_data/state_legislative/ is an empty directory. Where can I get the files that it expected there?

In NM we're actively fighting for better redistricting (I'm webmaster for fairdistrictsnm.org) and I'd love to get some quantitative measurements I could show to legislators and display on the website.

hjohns12 · 2020-02-14T20:03:09Z

Hi @akkana, the file you're looking for is here: https://github.com/PrincetonUniversity/historic_state_legislative_election_results/blob/2bf28f2ac1a74636b09dfb700eef08a4324d2650/state_legislative_election_results_post1971.csv

I'll update the notebook so that its file path points to this data set!
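
Until the notebook is updated, a small check before calling g.parse_results can turn the long pandas traceback into a one-line hint. This is only a sketch: the helper name require_data_file is hypothetical and not part of gerrymetrics, and the path is the one shown in the traceback above.

```python
from pathlib import Path

# Path copied from the FileNotFoundError in this issue.
STATE_CSV = "election_data/state_legislative/state_legislative_election_results_post1971.csv"


def require_data_file(path):
    """Return `path` as a Path, or raise a FileNotFoundError that says what to do.

    Hypothetical helper for illustration; it is not part of gerrymetrics.
    """
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(
            f"{p} is missing. Download state_legislative_election_results_post1971.csv "
            "from the historic_state_legislative_election_results repository "
            f"and save it as {p}."
        )
    return p
```

In the notebook this could be called as require_data_file(STATE_CSV) just before the loop over chambers.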

akkana · 2020-02-14T20:29:11Z

Thanks! I downloaded that and put it in election_data/state_legislative and got past that error. Now it's dying with a different error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-54dcfe840d25> in <module>
     41 
     42 for chamber in chambers:
---> 43     chambers[chamber]['elections_df'] = g.parse_results(chambers[chamber]['filepath'])
     44     chambers[chamber]['tests_df'] = g.tests_df(g.run_all_tests(
     45         chambers[chamber]['elections_df'],

~/outsrc/gerrymandertests/gerrymetrics/utils.py in parse_results(input_filepath, start_year, coerce_odd_years)
     34     new['District Numbers'] = grouped['District'].apply(list)
     35 
---> 36     if df.columns.contains('Dem Votes'):
     37         new['Weighted Voteshare'] = grouped['Dem Votes'].apply(sum) / (grouped['Dem Votes'].apply(sum) +
     38                                                          grouped['GOP Votes'].apply(sum))

AttributeError: 'Index' object has no attribute 'contains'

akkana · 2020-02-14T21:59:15Z

I realized that was with the pip install gerrymetrics; but I tried pip uninstall gerrymetrics followed by pip install . from the checked-out code, and got the same error. If it matters, this virtualenv's pandas reports version 1.0.1 (Python version 3.7.5).
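
For reference, the error suggests that Index.contains no longer exists in pandas 1.0 (it appears to have been deprecated in the 0.25 series). A plain membership test with `in` should behave the same way on both old and new pandas. The sketch below uses a made-up two-row frame, not the real election data.

```python
import pandas as pd

# Toy frame with the two column names used in parse_results.
df = pd.DataFrame({"Dem Votes": [10, 20], "GOP Votes": [5, 25]})

# `in` calls Index.__contains__, so it does not depend on the removed
# Index.contains method.
has_dem_votes = "Dem Votes" in df.columns
has_other_votes = "Other Votes" in df.columns

print(has_dem_votes, has_other_votes)
```

Applied to utils.py, the failing condition would become: if 'Dem Votes' in df.columns: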

hjohns12 · 2020-04-20T15:15:06Z

Hi @akkana,

I tried to reproduce your issue but was not able to do so. I created a virtual environment (python version 3.7.4) and successfully installed gerrymetrics just now. I wonder if your issue is coming up because your version of pandas does not agree with the version of pandas automatically installed by this package.

What I recommend is that you create a virtual environment, and before installing any other packages, install gerrymetrics with the following code:

python3 -m venv install_ve
source install_ve/bin/activate
pip install gerrymetrics

Let me know if that works, thanks so much!

akkana · 2020-04-20T18:17:08Z

I get exactly the same error as before when I type those three lines followed by
jupyter-notebook run_gerrymandering_metrics.ipynb
I tried it outside of jupyter-notebook and got the same error, still AttributeError: 'Index' object has no attribute 'contains'

akkana · 2020-04-20T18:27:55Z

If I edit utils.py and put double underscores at either end of the "contains" (that is, __contains__) in the line that's erroring in parse_results(), I get a little farther and it even appears to download something (some data?), but then it dies with

  File "<stdin>", line 6, in <module>
  File "/home/akkana/outsrc/gerrymandertests/gerrymetrics/utils.py", line 66, in tests_df
    df = yearstatedf()
  File "/home/akkana/outsrc/gerrymandertests/gerrymetrics/utils.py", line 55, in yearstatedf
    names=['Year', 'State'])
TypeError: __new__() got an unexpected keyword argument 'labels'

(I should warn you my utils.py line numbers will be a little off because I've inserted some print()s).
And that does look like a Pandas difference, since the line with the error is creating a pd.MultiIndex with labels as a keyword arg.

This is Python 3.7.5 on Ubuntu 19.10, so probably the pandas the virtualenv is pulling in is tied to that. pandas.__version__ is 1.0.3.
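
A possible fix, sketched under the assumption that only the keyword name changed: newer pandas releases call this MultiIndex argument codes instead of labels (the rename appears to date from pandas 0.24, with labels dropped in 1.0). The index below mirrors the empty Year/State index that yearstatedf() builds.

```python
import pandas as pd

# Same empty two-level index as in yearstatedf(), with `codes` in place of
# the removed `labels` keyword.
index = pd.MultiIndex(levels=[[], []],
                      codes=[[], []],
                      names=["Year", "State"])

# An empty frame keyed by (Year, State), ready to be filled in per test.
empty_df = pd.DataFrame(index=index)

print(index.names, len(empty_df))
```

Note that codes would not be accepted by pandas releases older than the rename, so the package would need to require a recent enough pandas.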

hjohns12 · 2020-04-24T15:32:45Z

@akkana I just pushed some code that updates the pandas syntax and data path. Will you try cloning again with the updated code and run in a virtual environment with:

python3 -m venv install_ve
source install_ve/bin/activate
pip install gerrymetrics
jupyter-notebook run_gerrymandering_metrics.ipynb

Thanks so much!

akkana · 2020-05-02T16:20:40Z

Sorry for the delay, I've been super busy with election stuff.

Following those instructions (after git pull in the gerrymandertests repo) gives this mysterious error:

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-4-9649b5edd3ef> in <module>
----> 1 import gerrymetrics as g
      2 import IPython.display as ipd
      3 
      4 from collections import defaultdict
      5 

~/outsrc/gerrymandertests/gerrymetrics/__init__.py in <module>
----> 1 from .metrics import *
      2 from .plots import *
      3 from .utils import *

~/outsrc/gerrymandertests/gerrymetrics/metrics.py in <module>
     11 from __future__ import division  # for python 2
     12 import numpy as np
---> 13 import scipy.stats as sps
     14 
     15 

ModuleNotFoundError: No module named 'scipy'

It's mysterious because clearly scipy is there; if I run python inside the venv and run import scipy.stats as sps, it works fine. But it doesn't work inside the notebook.

Aha: that's because Ubuntu's jupyter-notebook begins with: #!/usr/bin/python3.
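
A quick way to confirm this kind of mismatch is to ask the running interpreter where it lives, both from the venv's python and from a notebook cell. This is a minimal sketch using only the standard library; interpreter_info is a hypothetical helper name.

```python
import sys


def interpreter_info():
    """Report which Python is running and whether it looks like a virtualenv."""
    # In a venv, sys.prefix points at the venv while sys.base_prefix points
    # at the Python installation the venv was created from.
    base = getattr(sys, "base_prefix", sys.prefix)
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "in_venv": sys.prefix != base,
    }


print(interpreter_info())
```

If the notebook cell reports /usr/bin/python3 as the executable while the shell reports something under install_ve/bin, the notebook is not running inside the virtual environment.
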
So I ran a pip install jupyterlab, then ran install_ve/bin/jupyter-notebook run_gerrymandering_metrics.ipynb
That gets me past the import error and now it dies with:

Traceback (most recent call last):

  File "/home/akkana/outsrc/gerrymandertests/install_ve/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-2-9649b5edd3ef>", line 1, in <module>
    import gerrymetrics as g

  File "/home/akkana/outsrc/gerrymandertests/gerrymetrics/__init__.py", line 3, in <module>
    from .utils import *

  File "/home/akkana/outsrc/gerrymandertests/gerrymetrics/utils.py", line 37
    if 'Dem Votes' in df.columns:
    ^
IndentationError: unexpected indent

Sure enough, that line is indented more than the lines before it. If I fix the indentation, I get a little farther:

TypeError                                 Traceback (most recent call last)
<ipython-input-1-9649b5edd3ef> in <module>
     39     print(chamber)
     40     chambers[chamber]['elections_df'] = g.parse_results(chambers[chamber]['filepath'])
---> 41     chambers[chamber]['tests_df'] = g.tests_df(g.run_all_tests(
     42         chambers[chamber]['elections_df'],
     43         impute_val=impute_val,

~/outsrc/gerrymandertests/gerrymetrics/utils.py in tests_df(tests_dict)
     63     '''
     64 
---> 65     df = yearstatedf()
     66 
     67     for year in tests_dict:

~/outsrc/gerrymandertests/gerrymetrics/utils.py in yearstatedf()
     50     '''
     51 
---> 52     index = pd.MultiIndex(levels=[[], []],
     53                           labels=[[], []],
     54                           names=['Year', 'State'])

TypeError: __new__() got an unexpected keyword argument 'labels'

so alas, now I'm just back to the error from two weeks ago.

hjohns12 self-assigned this Feb 14, 2020
