First Notebook example doesn't work: apparently expects a state data file to already be there? #2

Open
akkana opened this issue Feb 14, 2020 · 8 comments

@akkana

akkana commented Feb 14, 2020

I'm trying to run the gerrymandertests, but apparently it relies on my separately downloading state-specific files (I'm particularly interested in New Mexico) and I can't find any documentation on where to get them.

If I just run the notebook, here's the error:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-1-54dcfe840d25> in <module>
     41 
     42 for chamber in chambers:
---> 43     chambers[chamber]['elections_df'] = g.parse_results(chambers[chamber]['filepath'])
     44     chambers[chamber]['tests_df'] = g.tests_df(g.run_all_tests(
     45         chambers[chamber]['elections_df'],

~/outsrc/gerrymandertests/gerrymetrics/utils.py in parse_results(input_filepath, start_year, coerce_odd_years)
     12     '''
     13 
---> 14     df = pd.read_csv(input_filepath)
     15 
     16     df = df[df['Year'] >= start_year]

~/pythonenv/gerry/lib/python3.7/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
    674         )
    675 
--> 676         return _read(filepath_or_buffer, kwds)
    677 
    678     parser_f.__name__ = name

~/pythonenv/gerry/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    446 
    447     # Create the parser.
--> 448     parser = TextFileReader(fp_or_buf, **kwds)
    449 
    450     if chunksize or iterator:

~/pythonenv/gerry/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    878             self.options["has_index_names"] = kwds["has_index_names"]
    879 
--> 880         self._make_engine(self.engine)
    881 
    882     def close(self):

~/pythonenv/gerry/lib/python3.7/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
   1112     def _make_engine(self, engine="c"):
   1113         if engine == "c":
-> 1114             self._engine = CParserWrapper(self.f, **self.options)
   1115         else:
   1116             if engine == "python":

~/pythonenv/gerry/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1889         kwds["usecols"] = self.usecols
   1890 
-> 1891         self._reader = parsers.TextReader(src, **kwds)
   1892         self.unnamed_cols = self._reader.unnamed_cols
   1893 

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: [Errno 2] File election_data/state_legislative/state_legislative_election_results_post1971.csv does not exist: 'election_data/state_legislative/state_legislative_election_results_post1971.csv'

election_data/congressional_election_results_post1948.csv comes as part of the repository, but election_data/state_legislative/ is an empty directory. Where can I get the files that it expected there?

In NM we're actively fighting for better redistricting (I'm webmaster for fairdistrictsnm.org) and I'd love to get some quantitative measurements I could show to legislators and display on the website.

@hjohns12
Contributor

Hi @akkana, the file you're looking for is here: https://github.com/PrincetonUniversity/historic_state_legislative_election_results/blob/2bf28f2ac1a74636b09dfb700eef08a4324d2650/state_legislative_election_results_post1971.csv

I'll update the notebook so that its file path points to this data set!
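
In the meantime, a small guard before the pd.read_csv call makes the missing-file case easier to diagnose. This is only a sketch: the path is copied from the FileNotFoundError above, and check_data_file is a hypothetical helper, not part of gerrymetrics.

```python
from pathlib import Path

# Path the notebook expects, copied from the FileNotFoundError message.
EXPECTED_CSV = Path(
    "election_data/state_legislative/"
    "state_legislative_election_results_post1971.csv"
)


def check_data_file(path: Path) -> bool:
    """Return True if the CSV exists; otherwise print a hint and return False."""
    if path.is_file():
        return True
    print(f"Missing data file: {path}")
    print(f"Download it and place it in: {path.parent}")
    return False


# Hypothetical usage: call this before handing the path to parse_results().
check_data_file(EXPECTED_CSV)
```

The check is relative to the current working directory, so it should be run from the same directory as the notebook.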

@hjohns12 hjohns12 self-assigned this Feb 14, 2020
@akkana
Author

akkana commented Feb 14, 2020

Thanks! I downloaded that and put it in election_data/state_legislative and got past that error. Now it's dying with a different error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-54dcfe840d25> in <module>
     41 
     42 for chamber in chambers:
---> 43     chambers[chamber]['elections_df'] = g.parse_results(chambers[chamber]['filepath'])
     44     chambers[chamber]['tests_df'] = g.tests_df(g.run_all_tests(
     45         chambers[chamber]['elections_df'],

~/outsrc/gerrymandertests/gerrymetrics/utils.py in parse_results(input_filepath, start_year, coerce_odd_years)
     34     new['District Numbers'] = grouped['District'].apply(list)
     35 
---> 36     if df.columns.contains('Dem Votes'):
     37         new['Weighted Voteshare'] = grouped['Dem Votes'].apply(sum) / (grouped['Dem Votes'].apply(sum) +
     38                                                          grouped['GOP Votes'].apply(sum))

AttributeError: 'Index' object has no attribute 'contains'

@akkana
Author

akkana commented Feb 14, 2020

I realized that was with the pip install gerrymetrics; but I tried pip uninstall gerrymetrics followed by pip install . from the checked-out code, and got the same error. If it matters, this virtualenv's pandas reports version 1.0.1 (Python version 3.7.5).
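
For context on this AttributeError: Index.contains was deprecated in pandas 0.25 and removed in pandas 1.0, so code that calls df.columns.contains(...) fails on the 1.0.1 reported here. The `in` operator performs the same membership test and works on both older and newer pandas. A minimal sketch, using made-up vote counts rather than the repository's data:

```python
import pandas as pd

# Illustrative data only; the real CSV has many more rows and columns.
df = pd.DataFrame({"Dem Votes": [1200, 900], "GOP Votes": [800, 1100]})

# `in` tests whether a label is present in the column index.
# It replaces the removed Index.contains method.
has_dem_votes = "Dem Votes" in df.columns

if has_dem_votes:
    dem_total = df["Dem Votes"].sum()
    gop_total = df["GOP Votes"].sum()
    weighted_voteshare = dem_total / (dem_total + gop_total)
    print(weighted_voteshare)
```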

@hjohns12
Contributor

Hi @akkana,

I tried to reproduce your issue but was not able to do so. I created a virtual environment (python version 3.7.4) and successfully installed gerrymetrics just now. I wonder if your issue is coming up because your version of pandas does not agree with the version of pandas automatically installed by this package.

What I recommend is that you create a virtual environment, and before installing any other packages, install gerrymetrics with the following code:

python3 -m venv install_ve
source install_ve/bin/activate
pip install gerrymetrics

Let me know if that works, thanks so much!

@akkana
Author

akkana commented Apr 20, 2020

I get exactly the same error as before when I type those three lines followed by
jupyter-notebook run_gerrymandering_metrics.ipynb
I tried it outside of jupyter-notebook and got the same error, still AttributeError: 'Index' object has no attribute 'contains'

@akkana
Author

akkana commented Apr 20, 2020

If I edit utils.py and, in parse_results(), put double underscores at either end of the "contains" on the line that's erroring (I can't illustrate that because apparently double underscores have a meaning in markdown), I get a little farther and it even appears to download something (some data?), but then it dies with

  File "<stdin>", line 6, in <module>
  File "/home/akkana/outsrc/gerrymandertests/gerrymetrics/utils.py", line 66, in tests_df
    df = yearstatedf()
  File "/home/akkana/outsrc/gerrymandertests/gerrymetrics/utils.py", line 55, in yearstatedf
    names=['Year', 'State'])
TypeError: __new__() got an unexpected keyword argument 'labels'

(I should warn you my utils.py line numbers will be a little off because I've inserted some print()s).
And that does look like a Pandas difference, since the line with the error is creating a pd.MultiIndex with labels as a keyword arg.

This is Python 3.7.5 on Ubuntu 19.10, so probably the pandas the virtualenv is pulling in is tied to that. pandas double-underscore version is 1.0.3.
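
For context on this TypeError: the labels keyword of pd.MultiIndex was renamed to codes in pandas 0.24, and the old name was removed in pandas 1.0. A minimal sketch of the same empty two-level index written with codes; whether this matches the fix the maintainers end up shipping is an assumption:

```python
import pandas as pd

# Same construction as in yearstatedf(), with `labels` replaced by `codes`.
index = pd.MultiIndex(
    levels=[[], []],
    codes=[[], []],
    names=["Year", "State"],
)

# An empty frame indexed by (Year, State), ready to have rows added.
empty_df = pd.DataFrame(index=index)
print(empty_df.index.names)
```

Note that codes only exists in pandas 0.24 and later, so this form will not run on older installs.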

@hjohns12
Contributor

@akkana I just pushed some code that updates the pandas syntax and data path. Could you try cloning again to get the updated code and run it in a virtual environment with:

python3 -m venv install_ve
source install_ve/bin/activate
pip install gerrymetrics
jupyter-notebook run_gerrymandering_metrics.ipynb

Thanks so much!

@akkana
Author

akkana commented May 2, 2020

Sorry for the delay, I've been super busy with election stuff.

Following those instructions (after git pull in the gerrymandertests repo) gives this mysterious error:

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-4-9649b5edd3ef> in <module>
----> 1 import gerrymetrics as g
      2 import IPython.display as ipd
      3 
      4 from collections import defaultdict
      5 

~/outsrc/gerrymandertests/gerrymetrics/__init__.py in <module>
----> 1 from .metrics import *
      2 from .plots import *
      3 from .utils import *

~/outsrc/gerrymandertests/gerrymetrics/metrics.py in <module>
     11 from __future__ import division  # for python 2
     12 import numpy as np
---> 13 import scipy.stats as sps
     14 
     15 

ModuleNotFoundError: No module named 'scipy'

It's mysterious because clearly scipy is there; if I run python inside the venv and run import scipy.stats as sps, it works fine. But it doesn't work inside the notebook.

Aha: that's because Ubuntu's jupyter-notebook begins with #!/usr/bin/python3, so it runs the system Python rather than the venv's.
So I ran pip install jupyterlab, then ran install_ve/bin/jupyter-notebook run_gerrymandering_metrics.ipynb
That gets me past the import error and now it dies with:

Traceback (most recent call last):

  File "/home/akkana/outsrc/gerrymandertests/install_ve/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-2-9649b5edd3ef>", line 1, in <module>
    import gerrymetrics as g

  File "/home/akkana/outsrc/gerrymandertests/gerrymetrics/__init__.py", line 3, in <module>
    from .utils import *

  File "/home/akkana/outsrc/gerrymandertests/gerrymetrics/utils.py", line 37
    if 'Dem Votes' in df.columns:
    ^
IndentationError: unexpected indent

Sure enough, that line is indented more than the lines before it. If I fix the indentation, I get a little farther:

TypeError                                 Traceback (most recent call last)
<ipython-input-1-9649b5edd3ef> in <module>
     39     print(chamber)
     40     chambers[chamber]['elections_df'] = g.parse_results(chambers[chamber]['filepath'])
---> 41     chambers[chamber]['tests_df'] = g.tests_df(g.run_all_tests(
     42         chambers[chamber]['elections_df'],
     43         impute_val=impute_val,

~/outsrc/gerrymandertests/gerrymetrics/utils.py in tests_df(tests_dict)
     63     '''
     64 
---> 65     df = yearstatedf()
     66 
     67     for year in tests_dict:

~/outsrc/gerrymandertests/gerrymetrics/utils.py in yearstatedf()
     50     '''
     51 
---> 52     index = pd.MultiIndex(levels=[[], []],
     53                           labels=[[], []],
     54                           names=['Year', 'State'])

TypeError: __new__() got an unexpected keyword argument 'labels'

so alas, now I'm just back to the error from two weeks ago.
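
A side note on the ModuleNotFoundError earlier in this comment: one way to confirm that a notebook is running under the system interpreter rather than the venv is to inspect sys.executable and compare sys.prefix with sys.base_prefix from a notebook cell. A minimal sketch, assuming the environment was created with python3 -m venv as in the instructions above; running_in_venv is a hypothetical helper name:

```python
import sys


def running_in_venv() -> bool:
    """Return True when the current interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix points at the base installation; otherwise they match.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


# Path of the interpreter actually executing this cell.
print(sys.executable)
print(running_in_venv())
```

If the printed path is /usr/bin/python3 instead of something under install_ve/, packages installed into the venv (such as scipy here) will not be importable from the notebook.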
