Linux (Ubuntu) build problems. #9
From @jhkennedy on April 23, 2015 22:59
On a fresh Ubuntu 14.04 LTS system (virtual machine), this is the script I am using to install the necessary dependencies:
#!/bin/bash
sudo apt-get update
sudo apt-get install -y git git-doc
sudo apt-get install -y gfortran
sudo apt-get install -y build-essential cmake cmake-curses-gui
sudo apt-get install -y libblas-dev liblapack-dev
sudo apt-get install -y openmpi-bin libopenmpi-dev
sudo apt-get install -y libhdf5-dev hdf5-tools hdf5-helpers flex
sudo apt-get install -y libnetcdf-dev netcdf-bin netcdf-doc nco ncview
sudo apt-get install -y python-dev python-numpy python-matplotlib python-scientific python-pip
sudo -E pip install netcdf4
sudo apt-get update
sudo apt-get upgrade
sudo apt-get autoremove
I then install Trilinos as outlined in the CISM documentation. And I install CISM using the
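Before building CISM, it can help to confirm that every package from the script above actually installed. The sketch below is not part of the CISM documentation: the function name `missing_packages` is hypothetical, and it assumes a Debian/Ubuntu system where `dpkg-query` is available.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each named package that dpkg does not report as
# "install ok installed"; return non-zero if any package is missing.
missing_packages() {
  local pkg missing=0
  for pkg in "$@"; do
    # dpkg-query prints the status field for known packages and fails for
    # unknown ones; in both "not installed" cases grep finds no match.
    if ! dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null | grep -q 'install ok installed'; then
      echo "$pkg"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `missing_packages gfortran cmake libblas-dev liblapack-dev libopenmpi-dev libhdf5-dev libnetcdf-dev` prints nothing and returns 0 when all of those packages are present.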
From @jhkennedy on April 23, 2015 23:4
For LIVV, the … Since we are trying to sync up these LIVV tests with the tests in the CISM code base, I'll focus on the higher-order test cases that work (not slab). Therefore, this issue isn't time-sensitive, but it is worth looking into at some point.
From @stephenprice on April 23, 2015 23:9
Hi Joe, to clarify: you are no longer seeing the problem with slap that we talked about earlier today?
From @jhkennedy on April 23, 2015 23:12
@stephenprice I don't think so. The problems I was having looked really similar to the parallel slab errors, but I found an error in the … So the only issues I'm having now are with the tests detailed above.
From @matthewhoffman on April 24, 2015 1:4
@jhkennedy, I just pushed a branch matthewhoffman/fix_testcase_errors_ubuntu to work on these issues. The fix for the slab test case is easy and is in 9e16db8. (It was an error on all platforms, but since we've never blessed the slab test case, it has not been run regularly.) I will look into the EISMINT errors next, although those look a bit mysterious. I may ask for more information after I've thought about it a bit. I'll also try running them on the Ubuntu box in my office tomorrow.
From @jhkennedy on April 24, 2015 12:28
@matthewhoffman Great! Thanks. Let me know what more info you need.
From @matthewhoffman on April 24, 2015 18:11
I see the exact same error on my Ubuntu machine running 14.04. I spent about 2 hours trying to understand what is happening, and it got me pretty deep into SLAP without seeing light at the end of the tunnel. There have been some known problems with SLAP (or, more likely, with how we are calling it) for a while. Specifically, if you compile CISM with
I am wondering if this error on EISMINT is somehow related. However, thinking a bit more about why we would see this for EISMINT but not for other test cases that use the Glide dycore, I realized that the distinguishing feature of EISMINT is that the domain starts with no ice. I confirmed that if the SLAP solve for thickness evolution is skipped when the domain has no ice, the EISMINT run proceeds normally on my Ubuntu machine. I pushed a new commit to the same branch with this change. However, the fix is a little weird because Glide is mysterious in places. There was an existing check for … @whlipscomb and @stephenprice, does either of you have any insight into the … This does oddly change answers in the EISMINT-1 fixed-margin case; the EISMINT-1 moving-margin case is BFB.
From @stephenprice on April 24, 2015 18:14
@matthewhoffman, unfortunately, I have no insight into anything that goes on inside of the SIA code (or at least the subroutines that are specific to that dycore).
From @whlipscomb on April 24, 2015 18:25
Hi Matt, thanks for the sleuthing and the fix. A while back I tried to understand the logic of glide_thck_index and model%geometry%empty, and I found it pretty confusing. Given that we have more pressing issues, I suggest that we consider the problem fixed for now. We can always look at it later if/when we have time. Bill
From @matthewhoffman on April 24, 2015 18:35
OK, thanks for the feedback. @jhkennedy, when you've had a chance to check that this works for you and your needs for LIVV, let me know and I will merge the branch into develop.
From @jhkennedy on April 24, 2015 19:33
@matthewhoffman it's much better now, but I'm still getting some errors on my Ubuntu 14.04 LTS machine and on the "clean" virtual machines of Ubuntu 12.04 LTS and 14.04 LTS. These all now work:
These still do not:
e1-fm.1.config output:
$ ./cism_driver e1-fm.1.config
CISM dycore type (0=Glide, 1=Glam, 2=Glissade, 3=AlbanyFelix, 4 = BISICLES) = 0
g2c%which_gcm (1 = data, 2 = minimal) = 0
call cism_init_dycore
Setting halo values: nhalo = 0
WARNING: parallel dycores tested only with nhalo = 2
Layout(EW,NS) = 31 31 total procs = 1
Global idiag, jdiag: 1 1
Local idiag, jdiag, task: 1 1 0
*** Error in `./cism_driver': free(): invalid pointer: 0x000000000270e380 ***
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
#0 0x7F674A8327D7
#1 0x7F674A832DDE
#2 0x7F6749F66D3F
#3 0x7F6749F66CC9
#4 0x7F6749F6A0D7
#5 0x7F6749FA3393
#6 0x7F6749FAF66D
#7 0x5B0511 in __glimmer_sparse_slap_MOD_slap_solve at glimmer_sparse_slap.F90:210 (discriminator 1)
#8 0x524597 in __glimmer_sparse_MOD_sparse_solve at glimmer_sparse.F90:237
#9 0x524938 in __glimmer_sparse_MOD_sparse_easy_solve at glimmer_sparse.F90:373 (discriminator 1)
#10 0x466ABD in thck_evolve at glide_thck.F90:561
#11 0x469CE5 in __glide_thck_MOD_thck_lin_evolve at glide_thck.F90:170
#12 0x452B1A in __glide_MOD_glide_tstep_p2 at glide.F90:862
#13 0x418394 in __cism_front_end_MOD_cism_run_dycore at cism_front_end.F90:302
#14 0x4189D6 in __gcm_cism_interface_MOD_gci_run_model at gcm_cism_interface.F90:118
#15 0x417D53 in cism_driver at cism_driver.F90:49
Aborted
e1-fm.2.config output:
$ ./cism_driver e1-fm.2.config
CISM dycore type (0=Glide, 1=Glam, 2=Glissade, 3=AlbanyFelix, 4 = BISICLES) = 0
g2c%which_gcm (1 = data, 2 = minimal) = 0
call cism_init_dycore
Setting halo values: nhalo = 0
WARNING: parallel dycores tested only with nhalo = 2
Layout(EW,NS) = 31 31 total procs = 1
Global idiag, jdiag: 1 1
Local idiag, jdiag, task: 1 1 0
*** Error in `./cism_driver': free(): invalid pointer: 0x00000000018d97f0 ***
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
#0 0x7FB9A26977D7
#1 0x7FB9A2697DDE
#2 0x7FB9A1DCBD3F
#3 0x7FB9A1DCBCC9
#4 0x7FB9A1DCF0D7
#5 0x7FB9A1E08393
#6 0x7FB9A1E1466D
#7 0x5B0511 in __glimmer_sparse_slap_MOD_slap_solve at glimmer_sparse_slap.F90:210 (discriminator 1)
#8 0x524597 in __glimmer_sparse_MOD_sparse_solve at glimmer_sparse.F90:237
#9 0x524938 in __glimmer_sparse_MOD_sparse_easy_solve at glimmer_sparse.F90:373 (discriminator 1)
#10 0x466ABD in thck_evolve at glide_thck.F90:561
#11 0x469CE5 in __glide_thck_MOD_thck_lin_evolve at glide_thck.F90:170
#12 0x452B1A in __glide_MOD_glide_tstep_p2 at glide.F90:862
#13 0x418394 in __cism_front_end_MOD_cism_run_dycore at cism_front_end.F90:302
#14 0x4189D6 in __gcm_cism_interface_MOD_gci_run_model at gcm_cism_interface.F90:118
#15 0x417D53 in cism_driver at cism_driver.F90:49
Aborted
e1-fm.3.config output:
$ ./cism_driver e1-fm.3.config
CISM dycore type (0=Glide, 1=Glam, 2=Glissade, 3=AlbanyFelix, 4 = BISICLES) = 0
g2c%which_gcm (1 = data, 2 = minimal) = 0
call cism_init_dycore
Setting halo values: nhalo = 0
WARNING: parallel dycores tested only with nhalo = 2
Layout(EW,NS) = 31 31 total procs = 1
Global idiag, jdiag: 1 1
Local idiag, jdiag, task: 1 1 0
*** Error in `./cism_driver': free(): invalid pointer: 0x00000000022747f0 ***
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
#0 0x7FB9289F17D7
#1 0x7FB9289F1DDE
#2 0x7FB928125D3F
#3 0x7FB928125CC9
#4 0x7FB9281290D7
#5 0x7FB928162393
#6 0x7FB92816E66D
#7 0x5B0511 in __glimmer_sparse_slap_MOD_slap_solve at glimmer_sparse_slap.F90:210 (discriminator 1)
#8 0x524597 in __glimmer_sparse_MOD_sparse_solve at glimmer_sparse.F90:237
#9 0x524938 in __glimmer_sparse_MOD_sparse_easy_solve at glimmer_sparse.F90:373 (discriminator 1)
#10 0x466ABD in thck_evolve at glide_thck.F90:561
#11 0x469CE5 in __glide_thck_MOD_thck_lin_evolve at glide_thck.F90:170
#12 0x452B1A in __glide_MOD_glide_tstep_p2 at glide.F90:862
#13 0x418394 in __cism_front_end_MOD_cism_run_dycore at cism_front_end.F90:302
#14 0x4189D6 in __gcm_cism_interface_MOD_gci_run_model at gcm_cism_interface.F90:118
#15 0x417D53 in cism_driver at cism_driver.F90:49
Aborted
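The three backtraces above differ only in the invalid-pointer value and the addresses of frames #0 to #6; every frame with source information is the same, with the innermost one in `slap_solve` at glimmer_sparse_slap.F90:210. When rerunning these cases on other machines, a small sketch like the following can drop the addresses and compare just the source-level frames. It assumes each run's output has been saved to a log file; the function names `cism_frames` and `same_crash` are hypothetical, and the `__module_MOD_procedure` demangling relies on gfortran's usual naming convention for module procedures.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for triaging saved cism_driver logs.

# Print one "procedure file:line" entry per backtrace frame that carries
# source information. Raw addresses are dropped, and gfortran module
# procedure names such as __glide_MOD_glide_tstep_p2 are shown as
# glide::glide_tstep_p2.
cism_frames() {
  sed -n 's/^[[:space:]]*#[0-9]*[[:space:]]*0x[0-9A-Fa-f]* in \([^ ]*\) at \([^ ]*\).*$/\1 \2/p' "$1" |
    sed 's/^__\(.*\)_MOD_/\1::/'
}

# Exit with status 0 when two logs abort through the same source-level frames.
same_crash() {
  [ "$(cism_frames "$1")" = "$(cism_frames "$2")" ]
}
```

For example, after `./cism_driver e1-fm.1.config > e1-fm.1.log 2>&1` (and likewise for e1-fm.2), `same_crash e1-fm.1.log e1-fm.2.log && echo identical` reports whether both runs failed in the same place. Comparing frames rather than whole logs avoids spurious differences from pointer and shared-library addresses, which change from run to run.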
From @jhkennedy on April 24, 2015 19:38
As for LIVV, LIVV wasn't running these FO test cases at all. With the fix for slab, I can now incorporate all the higher-order test cases (which we were mostly running already). That will at least allow me to set up the structure I want and put the LIVV workflow together. Do we want to include these (working) FO test cases in LIVV as well? (@stephenprice, @kevans32) If not, I'm good! At this point, fixing these last three EISMINT-1 cases is more for CISM in general than for LIVV in particular.
From @stephenprice on April 24, 2015 20:6
@jhkennedy, @kevans32, at some point in the last week or two I sent out an email with a list of existing test cases we want to support and others we might want to add. Which test cases in particular are you referring to?
From @jhkennedy on April 24, 2015 20:12
@stephenprice yep, sorry, I forgot to pull that list out. So then, everything is working for LIVV; working on the
From @matthewhoffman on April 24, 2015 21:8
@jhkennedy, thanks for catching the e1-fm.* errors. I looked into it a bit and am not sure what to do next. I've confirmed that there is indeed ice present, so the code should be attempting a SLAP solve when the error occurs (testing on my Ubuntu machine). So perhaps my assumption that this error occurs only when there is no ice is not correct. Without thoroughly digging into SLAP (which I don't think is a good use of time right now), I'm not sure what else to try. @stephenprice, @whlipscomb, does either of you have a preference about how to proceed? I hate to punt on fully supporting such a basic test case, but I'd suggest the fallback of leaving this for now and adding an issue to the public repo that documents this particular error and machine configuration, in case any users attempt it. As Joe points out, it is listed in the User's Guide as a good place to start.
From @whlipscomb on April 27, 2015 14:33
@matthewhoffman, I agree that it's troubling to have such a basic test case failing, and also that digging into SLAP isn't a good use of your time right now. I talked last week with @jhkennedy, and we're planning to test the new LIVV on my Mac after the ACME meeting. Maybe at that point I could try this test myself and see if I reproduce the error. I could then try to track it down. (By then I hope to have made enough progress with CICE IR that I could leave it for a day or two.) @stephenprice, does that sound OK to you?
From @matthewhoffman on April 27, 2015 15:15
@whlipscomb, that sounds fine with me. Note that I do not see this error on my Mac, only on the Ubuntu machine in my office.
From @stephenprice on April 27, 2015 15:28
@whlipscomb, yes, fine with me too. I agree that we probably don't want to spend any significant amount of time trying to track down SLAP errors. At some point I worry that maintaining backward compatibility with the serial SIA code is going to drain more of our resources than it is worth.
From @whlipscomb on April 27, 2015 15:47
If the problem is seen only on Ubuntu machines, I'm not sure I can be as helpful (unless @matthewhoffman wants to loan me his laptop for a couple of days). So I agree this should go on the back burner.
From @jhkennedy on April 23, 2015 22:47
CISM (dev and public) builds (with errors) on my Ubuntu systems. However, some of the test cases in the CISM dev branch do not run (detailed below).
Tested on both Ubuntu 14.04 LTS and 12.04 LTS, with fresh installs.
Build output:
With a parallel build, these tests work (run in serial and, where applicable, in parallel):
And these don't (errors detailed below):
Typical EISMINT-1 and EISMINT-2 output:
higher-order/slab output (serial):
higher-order/slab output (parallel):
Copied from original issue: E3SM-Project/cism-piscees#28