gaia rpc node stop sync at block 15214022 #2476

Closed
Tracked by #2617
lcgogo opened this issue May 9, 2023 · 25 comments
@lcgogo

lcgogo commented May 9, 2023

Problem

I was running gaia 9.0.1, and it stopped syncing at block 15214022 today.
I upgraded to 9.1.0, but then hit the following error:

panic: precommit step; +2/3 prevoted for an invalid block: wrong Block.Header.AppHash.  Expected 0ED75E0CE39A6D85C6BC634DF17C73F193B6DB8F5B9C08580F26A4E2CC174713, got 0E5196A05C8ACA00C7B130A3FE5B6855CD4DD0195BDB9F5F00B36C825479C2FE

After the panic, gaia restarts.

Problem details

3:30AM INF Replay: Vote blockID={"hash":"B1F35612D2B8EA1652781010DDE1058B8475CDDAA020D8B48F39D90F8A91515C","parts":{"hash":"014DF211A143DE3BD009850D5D242E71254BAB91273DB8CB3C74E7997D342406","total":1}} height=15214023 module=consensus peer=eb644d5ede024ce6083c0f1ca038eb41b257b795 round=1 type=1
3:30AM INF Replay: Vote blockID={"hash":"B1F35612D2B8EA1652781010DDE1058B8475CDDAA020D8B48F39D90F8A91515C","parts":{"hash":"014DF211A143DE3BD009850D5D242E71254BAB91273DB8CB3C74E7997D342406","total":1}} height=15214023 module=consensus peer=eb644d5ede024ce6083c0f1ca038eb41b257b795 round=1 type=1
3:30AM INF Replay: Vote blockID={"hash":"B1F35612D2B8EA1652781010DDE1058B8475CDDAA020D8B48F39D90F8A91515C","parts":{"hash":"014DF211A143DE3BD009850D5D242E71254BAB91273DB8CB3C74E7997D342406","total":1}} height=15214023 module=consensus peer=eb644d5ede024ce6083c0f1ca038eb41b257b795 round=1 type=1
panic: precommit step; +2/3 prevoted for an invalid block: wrong Block.Header.AppHash.  Expected 0ED75E0CE39A6D85C6BC634DF17C73F193B6DB8F5B9C08580F26A4E2CC174713, got 0E5196A05C8ACA00C7B130A3FE5B6855CD4DD0195BDB9F5F00B36C825479C2FE

goroutine 211 [running]:
github.com/tendermint/tendermint/consensus.(*State).enterPrecommit(0xc000265180, 0xe825c7, 0x1)
	github.com/tendermint/[email protected]/consensus/state.go:1414 +0x179f
github.com/tendermint/tendermint/consensus.(*State).addVote(0xc000265180, 0xc08259a140, {0xc01d03d020, 0x28})
	github.com/tendermint/[email protected]/consensus/state.go:2137 +0x188f
github.com/tendermint/tendermint/consensus.(*State).tryAddVote(0xc000265180, 0xc08259a140, {0xc01d03d020?, 0xc30511c400?})
	github.com/tendermint/[email protected]/consensus/state.go:1963 +0x2c
github.com/tendermint/tendermint/consensus.(*State).handleMsg(0xc000265180, {{0x264a0e0?, 0xc08254a5b8?}, {0xc01d03d020?, 0xc08257d320?}})
	github.com/tendermint/[email protected]/consensus/state.go:861 +0x44b
github.com/tendermint/tendermint/consensus.(*State).readReplayMessage(0xc000265180, 0x1db9141?, {0x0?, 0x0?})
	github.com/tendermint/[email protected]/consensus/replay.go:81 +0x87f
github.com/tendermint/tendermint/consensus.(*State).catchupReplay(0xc000265180, 0xe825c7)
	github.com/tendermint/[email protected]/consensus/replay.go:160 +0x71d
github.com/tendermint/tendermint/consensus.(*State).OnStart(0xc000265180)
	github.com/tendermint/[email protected]/consensus/state.go:324 +0x19b
github.com/tendermint/tendermint/libs/service.(*BaseService).Start(0xc000265180)
	github.com/tendermint/[email protected]/libs/service/service.go:144 +0x2e9
github.com/tendermint/tendermint/consensus.(*Reactor).SwitchToConsensus(_, {{{0xb, 0x0}, {0x0, 0x0}}, {0xc3051179a0, 0xb}, 0x4f5b97, 0xe825c6, {{0xc01c2fecc0, ...}, ...}, ...}, ...)
	github.com/tendermint/[email protected]/consensus/reactor.go:129 +0x197
github.com/tendermint/tendermint/blockchain/v0.(*BlockchainReactor).poolRoutine(0xc000180000, 0x0)
	github.com/tendermint/[email protected]/blockchain/v0/reactor.go:324 +0x115d
created by github.com/tendermint/tendermint/blockchain/v0.(*BlockchainReactor).OnStart
	github.com/tendermint/[email protected]/blockchain/v0/reactor.go:112 +0x7a
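
For anyone triaging a similar crash, the two hashes can be pulled out of the panic line, and the hex arguments in the stack trace can be converted to decimal. This is a minimal Python sketch: the helper name `parse_apphash_panic` is hypothetical, and reading the second argument of `enterPrecommit` (`0xe825c7`) as the block height is an assumption, although it matches the `height=15214023` in the replayed votes.

```python
import re

# Panic line copied from the log above.
PANIC = (
    "panic: precommit step; +2/3 prevoted for an invalid block: "
    "wrong Block.Header.AppHash.  Expected "
    "0ED75E0CE39A6D85C6BC634DF17C73F193B6DB8F5B9C08580F26A4E2CC174713, got "
    "0E5196A05C8ACA00C7B130A3FE5B6855CD4DD0195BDB9F5F00B36C825479C2FE"
)

# Two 32-byte hashes printed as 64 uppercase hex characters each.
APPHASH_RE = re.compile(
    r"wrong Block\.Header\.AppHash\.\s+Expected ([0-9A-F]{64}), got ([0-9A-F]{64})"
)


def parse_apphash_panic(line):
    """Return (expected, got) from an AppHash mismatch message, or None."""
    match = APPHASH_RE.search(line)
    return match.groups() if match else None


expected, got = parse_apphash_panic(PANIC)
print("expected:", expected)
print("got:     ", got)

# Assumption: 0xe825c7 in enterPrecommit(...) is the block height.
print("height:  ", int("0xe825c7", 16))
```

In the 15056920 log further down this thread, the `Expected` value equals the `app_hash` the node had just committed, which suggests that `Expected` is the locally computed hash and `got` is the one carried in the block header.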
@lcgogo lcgogo changed the title from "atom rpc node stop sync at block 15214022" to "gaia rpc node stop sync at block 15214022" May 9, 2023
@flynnji

flynnji commented May 9, 2023

exactly same error....

@ricewang666

same error too...

@flynnji

flynnji commented May 9, 2023

I upgraded my nodes from v9.0.3 to v9.1.0 this morning and hit the same error as described in this issue.

@bb4L

bb4L commented May 9, 2023

same error on gaia v9.0.1

@nuaays

nuaays commented May 9, 2023

same error too...

@ghost

ghost commented May 9, 2023

same error here...

@lightmelv

I have the same problem too. I use v9.0.3.

@yihuang

yihuang commented May 9, 2023

https://github.com/cosmos/gaia/releases/tag/v9.1.0
I guess this is related to the emergency release linked above.

@bb4L

bb4L commented May 9, 2023

How can we recover from it?

Is there a way other than resyncing?

@nddeluca

nddeluca commented May 9, 2023

@bb4L You can try gaiad rollback with v9.0.x, then restart with v9.1.0

@bb4L

bb4L commented May 10, 2023

> @bb4L You can try gaiad rollback with v9.0.x, then restart with v9.1.0

When I try this, I get the same result as described in #2478.

@lcgogo
Author

lcgogo commented May 10, 2023

> @bb4L You can try gaiad rollback with v9.0.x, then restart with v9.1.0

I tried the rollback, but the rollback command only rolls back one block:

/ # which gaiad
/usr/bin/gaiad
/ # /usr/bin/gaiad --home /data/atom rollback
Rolled back state to height 15214021 and hash FF2C91F1F92CB7FC3AB0C96353138C40298A3338B23AE5DD707A02B67AA432EE/ #
/ #

After the rollback, I started gaia 9.1.0 but hit another error, and the head block is stuck at 15214023:

5:53AM INF Reconnecting to peer addr={"id":"d6318b3bd51a5e2b8ed08f2e520d50289ed32bf1","ip":"52.79.43.100","port":26656} module=p2p
5:53AM INF Error reconnecting to peer. Trying again addr={"id":"ec779a2741da6dd2ccdaa6dfc0bebb10e595dfa4","ip":"50.18.113.67","port":26656} err="auth failure: secret conn failed: read tcp 192.168.131.2:38026->50.18.113.67:26656: i/o timeout" module=p2p tries=0
5:53AM INF VSCPacket enqueued: chainID=neutron-1 len unbonding ops=0 len updates=1 module=x/ibc-provider vscID=743523
5:53AM INF executed block height=15214023 module=state num_invalid_txs=0 num_valid_txs=2
5:53AM INF commit synced commit=436F6D6D697449447B5B39342031303820313131203232352032323220363820393620313731203131302032392033352032313820323433203937203239203433203136362031373220313020313135203138392031313820332032333020313931203137342032323920363620313120323230203234332034395D3A4538323543377D
5:53AM INF committed state app_hash=5E6C6FE1DE4460AB6E1D23DAF3611D2BA6AC0A73BD7603E6BFAEE5420BDCF331 height=15214023 module=state num_txs=2
5:53AM INF indexed block exents height=15214023 module=txindex
panic: couldn't find validators at height 15214022 (height 15214023 was originally requested): %!w(<nil>)

goroutine 197 [running]:
github.com/tendermint/tendermint/state.getBeginBlockValidatorInfo(0xc2026d01e0, {0x2671ca8, 0xc0125d2180}, 0x4f5b97)
	github.com/tendermint/[email protected]/state/execution.go:346 +0x3ea
github.com/tendermint/tendermint/state.execBlockOnProxyApp({0x26649e8?, 0xc2334b5aa0}, {0x266ab70, 0xc00dd83880}, 0xc2026d01e0, {0x2671ca8, 0xc0125d2180}, 0xe825c7?)
	github.com/tendermint/[email protected]/state/execution.go:293 +0x219
github.com/tendermint/tendermint/state.(*BlockExecutor).ApplyBlock(_, {{{0xb, 0x0}, {0xc2334d8040, 0x8}}, {0xc2334d8050, 0xb}, 0x4f5b97, 0xe825c7, {{0xc21522b3e0, ...}, ...}, ...}, ...)
	github.com/tendermint/[email protected]/state/execution.go:140 +0x171
github.com/tendermint/tendermint/blockchain/v0.(*BlockchainReactor).poolRoutine(0xc230e24a80, 0x0)
	github.com/tendermint/[email protected]/blockchain/v0/reactor.go:400 +0xbda
created by github.com/tendermint/tendermint/blockchain/v0.(*BlockchainReactor).OnStart
	github.com/tendermint/[email protected]/blockchain/v0/reactor.go:112 +0x7a

So I used the old gaia binary to roll back multiple times, but it only ever rolls back one block, so I can no longer get back to 15214021:

/ # /usr/bin/gaiad --home /data/atom rollback
Rolled back state to height 15214023 and hash 5E6C6FE1DE4460AB6E1D23DAF3611D2BA6AC0A73BD7603E6BFAEE5420BDCF331/ #
/ # /usr/bin/gaiad --home /data/atom rollback
Rolled back state to height 15214023 and hash 5E6C6FE1DE4460AB6E1D23DAF3611D2BA6AC0A73BD7603E6BFAEE5420BDCF331/ #
/ # /usr/bin/gaiad --home /data/atom rollback
Rolled back state to height 15214023 and hash 5E6C6FE1DE4460AB6E1D23DAF3611D2BA6AC0A73BD7603E6BFAEE5420BDCF331/ #
/ # /usr/bin/gaiad --home /data/atom rollback
Rolled back state to height 15214023 and hash 5E6C6FE1DE4460AB6E1D23DAF3611D2BA6AC0A73BD7603E6BFAEE5420BDCF331/ #
/ # /usr/bin/gaiad --home /data/atom rollback 15214021
Rolled back state to height 15214023 and hash 5E6C6FE1DE4460AB6E1D23DAF3611D2BA6AC0A73BD7603E6BFAEE5420BDCF331/ #
/ # /usr/bin/gaiad --home /data/atom rollback 15214021
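
The `commit synced commit=...` value in the log above is hex-encoded ASCII text. Decoding it shows which app hash and height the application actually committed, which helps when checking what a rollback left behind. The sketch below is a hedged illustration: the `CommitID{[bytes]:HEX_HEIGHT}` layout is inferred from this single log value rather than taken from the SDK source, and `decode_commit` is a hypothetical helper name.

```python
import re

# Value of `commit=` from the "commit synced" line above, split for readability.
COMMIT_HEX = (
    "436F6D6D697449447B5B"
    "3934203130382031313120323235203232322036382039362031373120"
    "313130203239203335203231382032343320393720323920343320"
    "313636203137322031302031313520313839203131382033203233302031393120"
    "3137342032323920363620313120323230203234332034395D3A4538323543377D"
)

# Assumed layout: CommitID{[<decimal bytes separated by spaces>]:<hex height>}
COMMIT_RE = re.compile(r"CommitID\{\[([0-9 ]*)\]:([0-9A-F]+)\}")


def decode_commit(commit_hex):
    """Return (app_hash_hex, height) decoded from a `commit=` log value."""
    text = bytes.fromhex(commit_hex).decode("ascii")
    match = COMMIT_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unexpected commit format: {text!r}")
    app_hash = bytes(int(b) for b in match.group(1).split()).hex().upper()
    height = int(match.group(2), 16)
    return app_hash, height


app_hash, height = decode_commit(COMMIT_HEX)
print(app_hash, height)
```

The decoded hash and height agree with the `committed state app_hash=... height=15214023` line that follows it in the log.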

@lcgogo
Author

lcgogo commented May 11, 2023

I used old backup data with gaia 9.1.0 to sync and hit the same problem at block 15056920:

{"log":"\u001b[90m8:46AM\u001b[0m \u001b[32mINF\u001b[0m committed state \u001b[36mapp_hash=\u001b[0mADD4BCA73239366EB1
89AC61144120F4A20A1C7EB06A203254756F6483CC0145 \u001b[36mheight=\u001b[0m15056920 \u001b[36mmodule=\u001b[0mstate \u001
b[36mnum_txs=\u001b[0m8\n","stream":"stderr","time":"2023-05-10T08:46:10.225916076Z"}
{"log":"\u001b[90m8:46AM\u001b[0m \u001b[32mINF\u001b[0m indexed block exents \u001b[36mheight=\u001b[0m15056920 \u001b
[36mmodule=\u001b[0mtxindex\n","stream":"stderr","time":"2023-05-10T08:46:10.234153363Z"}
{"log":"\u001b[90m8:46AM\u001b[0m \u001b[1m\u001b[31mERR\u001b[0m\u001b[0m Error in validation \u001b[36merr=\u001b[0m\
"wrong Block.Header.AppHash.  Expected ADD4BCA73239366EB189AC61144120F4A20A1C7EB06A203254756F6483CC0145, got A50ACCE304
13B02C02380D5BCEDDC45EFDA1DB84322D3190D0C4F075430B981C\" \u001b[36mmodule=\u001b[0mblockchain\n","stream":"stderr","tim
e":"2023-05-10T08:46:10.234606891Z"}
{"log":"\u001b[90m8:46AM\u001b[0m \u001b[32mINF\u001b[0m IBC fungible token transfer \u001b[36mamount=\u001b[0m2100000
\u001b[36mmodule=\u001b[0mx/ibc-transfer \u001b[36mreceiver=\u001b[0mjuno13gd97ke6erejqk2p050xkpc63jhtujrejmvj90 \u001b
[36msender=\u001b[0mcosmos13gd97ke6erejqk2p050xkpc63jhtujreyf0fzn \u001b[36mtoken=\u001b[0muatom\n","stream":"stderr","
time":"2023-05-10T08:46:09.773051368Z"}
{"log":"\u001b[90m8:46AM\u001b[0m \u001b[32mINF\u001b[0m executed block \u001b[36mheight=\u001b[0m15056920 \u001b[36mmo
dule=\u001b[0mstate \u001b[36mnum_invalid_txs=\u001b[0m1 \u001b[36mnum_valid_txs=\u001b[0m7\n","stream":"stderr","time"
:"2023-05-10T08:46:09.863515696Z"}
{"log":"\u001b[90m8:46AM\u001b[0m \u001b[32mINF\u001b[0m commit synced \u001b[36mcommit=\u001b[0m436F6D6D697449447B5B31
37332032313220313838203136372035302035372035342031313020313737203133372031373220393720323020363520333220323434203136322
03130203238203132362031373620313036203332203530203834203131372031313120313030203133312032303420312036395D3A453543303138
7D\n","stream":"stderr","time":"2023-05-10T08:46:10.225885506Z"}
{"log":"\u001b[90m8:46AM\u001b[0m \u001b[32mINF\u001b[0m committed state \u001b[36mapp_hash=\u001b[0mADD4BCA73239366EB1
89AC61144120F4A20A1C7EB06A203254756F6483CC0145 \u001b[36mheight=\u001b[0m15056920 \u001b[36mmodule=\u001b[0mstate \u001
b[36mnum_txs=\u001b[0m8\n","stream":"stderr","time":"2023-05-10T08:46:10.225916076Z"}
{"log":"\u001b[90m8:46AM\u001b[0m \u001b[32mINF\u001b[0m indexed block exents \u001b[36mheight=\u001b[0m15056920 \u001b
[36mmodule=\u001b[0mtxindex\n","stream":"stderr","time":"2023-05-10T08:46:10.234153363Z"}
{"log":"\u001b[90m8:46AM\u001b[0m \u001b[1m\u001b[31mERR\u001b[0m\u001b[0m Error in validation \u001b[36merr=\u001b[0m\
"wrong Block.Header.AppHash.  Expected ADD4BCA73239366EB189AC61144120F4A20A1C7EB06A203254756F6483CC0145, got A50ACCE304
13B02C02380D5BCEDDC45EFDA1DB84322D3190D0C4F075430B981C\" \u001b[36mmodule=\u001b[0mblockchain\n","stream":"stderr","tim
e":"2023-05-10T08:46:10.234606891Z"}
{"log":"\u001b[90m8:46AM\u001b[0m \u001b[1m\u001b[31mERR\u001b[0m\u001b[0m Stopping peer for error \u001b[36merr=\u001b
[0m\"blockchainReactor validation error: wrong Block.Header.AppHash.  Expected ADD4BCA73239366EB189AC61144120F4A20A1C7E
B06A203254756F6483CC0145, got A50ACCE30413B02C02380D5BCEDDC45EFDA1DB84322D3190D0C4F075430B981C\" \u001b[36mmodule=\u001
b[0mp2p \u001b[36mpeer=\u001b[0m{\"Data\":{},\"Logger\":{}}\n","stream":"stderr","time":"2023-05-10T08:46:10.238583317Z
"}
{"log":"\u001b[90m8:46AM\u001b[0m \u001b[32mINF\u001b[0m service stop \u001b[36mimpl=\u001b[0m{\"Logger\":{}} \u001b[36
mmodule=\u001b[0mp2p \u001b[36mmsg=\u001b[0m{} \u001b[36mpeer=\u001b[0m{\"id\":\"213857e741833d17275ea559bb2d0342398cec
99\",\"ip\":\"35.245.206.45\",\"port\":26656}\n","stream":"stderr","time":"2023-05-10T08:46:10.239251689Z"}
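
The log above looks like Docker `json-file` output that was hard-wrapped when pasted, with ANSI color codes still embedded in the `log` field. Below is a minimal Python sketch for making one such entry readable. It assumes the wrapped fragments can be joined without inserting any characters (a space that fell exactly on a wrap boundary may have been lost), and `decode_entry` is a hypothetical helper name.

```python
import json
import re

# One entry from the log above, kept as the two wrapped fragments.
WRAPPED = [
    r'{"log":"\u001b[90m8:46AM\u001b[0m \u001b[32mINF\u001b[0m indexed block exents \u001b[36mheight=\u001b[0m15056920 \u001b',
    r'[36mmodule=\u001b[0mtxindex\n","stream":"stderr","time":"2023-05-10T08:46:10.234153363Z"}',
]

# ANSI SGR (color/style) escape sequences such as ESC[36m or ESC[0m.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")


def decode_entry(fragments):
    """Join wrapped fragments, parse the JSON record, and strip color codes."""
    record = json.loads("".join(fragments))
    message = ANSI_RE.sub("", record["log"]).rstrip("\n")
    return record["time"], message


timestamp, message = decode_entry(WRAPPED)
print(timestamp, message)
```

Sorting the decoded entries by their `time` field also helps here, because the pasted entries are not in chronological order and some appear twice.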

@mpoke mpoke added this to Cosmos Hub May 12, 2023
@mpoke mpoke moved this from 🩹 Triage to 📥 Todo in Cosmos Hub May 12, 2023
@github-project-automation github-project-automation bot moved this to 🩹 Triage in Cosmos Hub May 12, 2023
@faddat
Contributor

faddat commented May 14, 2023

Hey guys, to me this is expected behavior.

Except of course for the block height. The Cosmos Hub went down and came back up at block 15213800.

This was a breaking change. So from now on, validators are going to need to do the very same thing: go down at that height, and then come back up at 15213800. I don't think that there's any way around this. I believe that this is a permanent matter without a clear solution.

Update: there is a somewhat clear solution for this. Specifically, once your node is on version 9, set a halt height of 15213800. After your node halts, start it again without the halt height set. This will avoid the issue.
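
As a rough illustration of the halt-height step, the Python sketch below rewrites the `halt-height` setting in the text of a Cosmos SDK `app.toml`. The helper name `set_halt_height` and the sample config text are assumptions made for this example; the comment above does not say which binary to use after the halt, though other replies in this thread point to v9.1.0.

```python
import re

HALT_HEIGHT = 15213800  # height given in the comment above

# Matches a top-level `halt-height = <number>` line without touching newlines.
HALT_RE = re.compile(r"(?m)^halt-height[ \t]*=[ \t]*\d+[ \t]*$")


def set_halt_height(app_toml, height):
    """Return app.toml text with its single halt-height line set to `height`."""
    new_text, count = HALT_RE.subn(f"halt-height = {height}", app_toml)
    if count != 1:
        raise ValueError("expected exactly one halt-height line")
    return new_text


# Hypothetical excerpt of an app.toml, for illustration only.
sample = 'minimum-gas-prices = "0uatom"\nhalt-height = 0\nhalt-time = 0\n'

halting = set_halt_height(sample, HALT_HEIGHT)  # before syncing up to the halt
resumed = set_halt_height(halting, 0)           # after the node has halted
print(halting)
```

Setting the value back to 0 corresponds to "start it again without the halt height set" in the comment above.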

@maomaozhou

I also encountered the same problem. How can we solve it?

@MuchaFortyeighth

I got the same error. Why can't I roll back to a specified block height?

@lightmelv

Try restarting it from a snapshot:
https://quicksync.io/networks/cosmos.html

@maomaozhou

> Try to restart it with snapshot https://quicksync.io/networks/cosmos.html

This method is too expensive. Is it equivalent to resynchronizing all nodes?

@lightmelv

@maomaozhou or you can try this snapshot https://polkachu.com/tendermint_snapshots/cosmos

@faddat
Contributor

faddat commented May 19, 2023

I don't recommend snapshots since they're not as secure as state sync.

To state sync gaia v9.1.0, run:

git clone https://github.com/cosmos/gaia
cd gaia
bash contrib/statesync.bash

@faddat
Contributor

faddat commented May 19, 2023

@mpoke wdyt about giving v9.0.x a halt height by default, so that users don't encounter this issue?

To users: what specifically are you trying to accomplish?

@MSalopek
Contributor

MSalopek commented Jun 26, 2023

Unfortunately, as already stated above, this is expected behaviour for a co-ordinated emergency upgrade.

The OP reported an app hash mismatch at block 15214022 with v9.0.1; however, v9.1.0 should have been used from height >= 15213800.

An app hash error will always occur if you don't upgrade your node. The node then runs a different version from the rest of the network, one that cannot process certain messages (data) or update the state database correctly. Because of this incorrect data processing, the node cannot participate in consensus.

Emergency releases are a last resort but they can happen on any blockchain if a critical vulnerability is found in any of the modules used by the chain binary.

If possible, follow the releases and Discord channels where regular updates are posted.

Please let us know if you want to be notified about upcoming upgrades through different channels and reach out so we can set those up.
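
The height rule in the comment above can be written down as a one-line check. This is a sketch only: the constant comes from the comment, the function name `needs_v9_1_0` is hypothetical, and it says nothing about which binary is correct for heights before the v9 upgrade.

```python
EMERGENCY_UPGRADE_HEIGHT = 15213800  # "v9.1.0 should have been used from height >= 15213800"


def needs_v9_1_0(height):
    """Return True if a block at `height` should be processed with v9.1.0."""
    return height >= EMERGENCY_UPGRADE_HEIGHT


# Heights reported in this thread.
for h in (15056920, 15213799, 15213800, 15214022):
    print(h, "v9.1.0" if needs_v9_1_0(h) else "pre-v9.1.0 binary")
```

Block 15214022, where the OP's v9.0.1 node stalled, is past the upgrade height. Block 15056920, where v9.1.0 failed on old backup data, is before it; reading that failure as the same rule applied in reverse is an inference from this thread, not something the maintainers state explicitly.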

@mmulji-ic mmulji-ic moved this from 📥 Todo to 🏗 In progress in Cosmos Hub Jun 27, 2023
@mmulji-ic
Contributor

> @mpoke wdyt about giving v9.0.x a halt height by default, so that users don't encounter this issue?
>
> To users: what specifically are you trying to accomplish?

Hey @faddat what do you mean by this?

@MSalopek MSalopek removed their assignment Jul 10, 2023
@yaruwangway yaruwangway self-assigned this Jul 10, 2023
@yaruwangway
Contributor

Hi @lcgogo, please let me know if your issue is solved. If yes, we will close this issue.

@yaruwangway
Contributor

Hi @lcgogo, please let us know if you still experience this issue. Otherwise, we will close it at the end of this week; you can always reopen it if the issue persists.

@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in Cosmos Hub Jul 21, 2023