Skip to content
This repository has been archived by the owner on Sep 17, 2019. It is now read-only.

Added a framework to be able to diff shitty/ios config #277

Open
wants to merge 10 commits into
base: develop
Choose a base branch
from

Conversation

dbarrosop
Copy link
Member

@dbarrosop dbarrosop commented Jul 15, 2017

This framework does two things:

  1. Given a nested configuration it creates a nested structure:
    https://github.com/napalm-automation/napalm-base/compare/indent_differ?expand=1#diff-cae653bb93fa20df55fada139a153965R22
  2. Adds code to be able to diff two IndentedConfig objects.

What I would like to do is have people help testing (2). To do it you need to create a folder like this one; test/unit/test_indent_differ/test_case_1/. Inside, you have to put three files:

  1. cli.1.show_running_config.0. A configuration file retrieved from a device with a poor configuration model like IOS uses.
  2. candidate.txt. A bunch of commands you want to merge on the device.
  3. diff.txt. The expected diff.

Finally, run the test with py.test test/unit/test_indent_differ.py::Test_Indent_differ

Feel free to send PRs against the branch indent_differ for tests both passing and failing. The more we have the more assurance we will have this actually works.

@itdependsnetworks
Copy link

You can ignore my comment, the double git confused me

@ogenstad
Copy link
Contributor

Just tested briefly, it looks nice. It will need some more ordering logic though, for example with access-lists. I sent in #278, which Travis seems happy with but it doesn't look like the tests are run. It fails locally for me due to ordering issues. It could possibly work sometimes but it will be random. Perhaps an ordered dict would do it.

@mirceaulinic mirceaulinic added this to the APPROVED milestone Jul 15, 2017

It is important this is up to date or the diff might fail to detect changes properly.
"""
EXACT_MATCHES = [
Copy link
Member

@mirceaulinic mirceaulinic Jul 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have two questions:

  1. What happens if a command that can have multiple instances does not appear in this list?
  2. How long is this list going to be?

Copy link
Member Author

@dbarrosop dbarrosop Jul 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. It might fail to generate a proper diff.
  2. Only god and John Chamber know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am worried that item 2 might be huge and that might cause this diff to be unreliable and quite a bit of work to maintain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, maintenance is easy, we just have to add the missing statements as we find them. Lots could actually be autodiscovered as the identifier is going to be an IP address or a number so you can regex the command and try to figure out if there is an identifier. For example, logging syslog 1.1.1.1, ntp server 1.1.1.1, ip address 1.1.1.1. Not sure if it's worth adding the extra complexity/computation as commands are finite, at some point you will have a complete list.

Also, this is best effort. People is free to not trust our diff and go and yell their vendors until they provide a reliable mechanism.

@mirceaulinic
Copy link
Member

Out of curiosity: what happens if you load a configuration that removes an entire (sub-)block?
Let's have a better example:

running-config:

ntp server 1.2.3.4
ntp peer 5.6.7.8

Loading no ntp I would expect to provide the diff:

- ntp server 1.2.3.4
- ntp peer 5.6.7.8

Would that be the case?

parsed[last] = parse_indented_config(config, leading_spaces, current_indent, True)
elif leading_spaces < current_indent:
config.insert(0, line)
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

break are hidden traps in general, it's good practice to avoid.
I would suggest having a simple for loop going through the lines.

Copy link
Member Author

@dbarrosop dbarrosop Jul 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are break hidden traps? Is there a PEP or something I can look at? I don’t think you can build this with a for loop but if you have ideas I am all ears.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, what is the argument here a break is very common pattern. Why should they be avoided?

Copy link
Member

@mirceaulinic mirceaulinic Jul 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using / avoiding break tends to be very subjective (everyone has a different opinion: https://stackoverflow.com/questions/3922599/is-it-a-bad-practice-to-use-break-in-a-for-loop, https://ubuntuforums.org/showthread.php?t=810001), so this is more like a suggestion.

I tend to avoid, in particular when the length is deterministic, like here (i.e., you are not waiting for some input you don't know how much it is going to take, but the config has a fixed size).
In this case, I think it is even simpler, as you can replace the break with return parsed.

I've dug a bit and finally found what I was looking for: http://wiki.c2.com/?IsBreakStatementArchaic, although speaking mainly about the switch-case, it covers also the usage inside the loops; it's a good article, well explained.

Some snippets:

It should be noted that break originally existed because C was originally (and for the most part still is) essentially a mid-high-level portable assembler.

I don't recommend that it be removed from the language, only that a better alternative is also offered. In other words, extend the language with a better construct. The old one is simply deprecated, but supported.

And I think there are very good alternatives in Python.

Anyway, the main reasoning on why better return in the favor of break:

Use a return statement, which causes execution to leave the current subroutine and resume at the point in the code immediately after where the subroutine was called (return address).

Which is a bit more optimized approach rather than a forced interrupt.

Copy link
Member Author

@dbarrosop dbarrosop Jul 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

particular when the length is deterministic, like here

That's where you are wrong. The length is not deterministic. I don't know where the block ends until I find the end of the block. This is a parser, I am consuming tokens and building an object as I make sense of the unstructured text. I have no prior knowledge or clue of how the original text look like until I look at it and I certainly can't predict how the resulting object will look like.

In this case, I think it is even simpler, as you can replace the break with return parsed.

Consider this case:

while True:
    if not data:
        break
    token = data.pop(0)
    # things happen
    if condition:
        break
return a

What you are proposing is to do this instead:

while True:
    if not data:
        return a
    token = data.pop(0)
    # things happen
    if condition:
        return a

Which is IMHO duplicating code because I have the same return statement twice and if I now have to process a before returning I will have to duplicate more code:

while True:
    if not data:
        # code to process a
        return a
    token = data.pop(0)
    # things happen
    if condition:
        # code to process a
        return a

With a break statement the code is as clear and I avoid duplicating code:

while True:
    if not data:
       break
    token = data.pop(0)
    # things happen
    if condition:
        break
# code to process a
return a

This is an extremely common pattern in parsers and search algorithms:

https://github.com/python/cpython/blob/6f0eb93183519024cb360162bdd81b9faec97ba6/Lib/email/parser.py#L42

If you look at the stdlib and search for parse you will see most parsers replicate that pattern.

And I think there are very good alternatives in Python.

For example?

Which is a bit more optimized approach rather than a forced interrupt.

I tested that statement:

def test_break():
    while True:
        break
    return

for _ in range(0, 10000000):
    test_break()

python test_break.py  1.71s user 0.15s system 99% cpu 1.874 total
def test_return():
    while True:
        return

for _ in range(0, 10000000):
    test_return()

python test_return.py  1.62s user 0.15s system 99% cpu 1.778 total

So, every 10.000.0000 executions we save 0.1s. I think we can leave with the extra cost of break if we save a few bytes of memory and avoid duplicating code :P

@dbarrosop
Copy link
Member Author

@ogenstad, yes an ordereddict is probably a good idea. Will look at your PR and at changing the logic to use an ordereddict to preserve order.

@dbarrosop
Copy link
Member Author

@mirceaulinic regarding the ntp case you gave, that’s exactly how it works. Check test case 1, there is a logging and a bgp neighbor examples that I think behave as you described.

@ubaumann ubaumann mentioned this pull request Jul 16, 2017
return parsed


def _can_have_multiple(command):
Copy link

@sjtarik sjtarik Jul 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a platform dependent list, although most Cisco rip-offs use pretty much the same syntax. For NXOS "ip access-list", ""object-group ip" should be added. I wonder if we can scan through the show running config lines or candidate config lines, split the words and check if one word, two words, and three words key have collusion (multiple occurences) if so build this EXACT_MATCHES list dynamically from this collusion list. This will only work if existing config (running or candidate) have collusions though.

* Add vlan example

* Fix typo
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants