diff --git a/.gitignore b/.gitignore
index f1c6544..2a040e5 100644
--- a/.gitignore
+++ b/.gitignore
@@ -105,3 +105,9 @@ venv.bak/
 
 # data files
 referencePathways/reactome/*.gOut
+
+# output files
+*.sif
+
+# .DS_store
+*.DS_Store
\ No newline at end of file
diff --git a/README.md b/README.md
index 049a4d8..27dd3a1 100644
--- a/README.md
+++ b/README.md
@@ -8,6 +8,18 @@ More details of the algorithm can be found in:
 Chris S Magnano, Anthony Gitter.
 *npj Systems Biology and Applications*, 7:12, 2021.
 
+## Edge Handling
+The code processes both undirected and directed edges. When a directed edge and an undirected edge connect the same pair of nodes, the directed edge is kept; when an edge is duplicated, the higher edge weight is kept.
+
+## Input Format Example
+The input should be formatted as follows, with columns for node1, node2, rank, and direction:
+```
+A B 0.9 U
+B A 0.1 D
+...
+```
+In this format, "U" represents an undirected edge, and "D" represents a directed edge.
+
 ## Dependencies
 Google's [OR-Tools library](https://developers.google.com/optimization/flow/mincostflow) is required to run this script.
 
@@ -32,3 +44,8 @@ Python 3 is required to run this script
 > --output Prefix for all output files.
 >
 > --capacity The amount of flow which can pass through a single edge.
+
+## Testing
+`python test_minCostFlow.py`
+
+The test script runs two series of graphs, the 'graph series' and the 'test series'. The graph series is used to check the code's correctness. Except for internal tiebreaking by the solver, each result is deterministic. The test series is used to verify that the code behaves correctly on distinct edge cases. The expected results can be found in graphs/correct_outputs.txt for the graph series and tests/correct_outputs.txt for the test series.
\ No newline at end of file
diff --git a/graphs/correct_outputs.txt b/graphs/correct_outputs.txt
new file mode 100644
index 0000000..7a36547
--- /dev/null
+++ b/graphs/correct_outputs.txt
@@ -0,0 +1,53 @@
+The graph series is used to check the code's correctness. Each result is deterministic.
+
+graph1:
+A B D
+B D D
+
+graph2:
+A B D
+B D D
+
+graph3:
+A C D
+C D D
+
+graph4:
+A B D
+B D D
+
+graph5:
+A B U
+B D U
+
+graph6:
+A B U
+B D U
+
+graph7:
+A B U
+B D U
+
+graph8:
+A B U
+B D U
+
+graph9:
+A B D
+B D U
+
+graph10:
+A B D
+B D D
+
+graph11:
+A B U
+B D U
+
+graph12:
+A B D
+B D D
+
+graph13:
+A B U
+B D U
diff --git a/graphs/graph1/edges.txt b/graphs/graph1/edges.txt
index e002179..c771da0 100644
--- a/graphs/graph1/edges.txt
+++ b/graphs/graph1/edges.txt
@@ -1,4 +1,4 @@
-A B 0.9
-A C 0.1
-B D 0.9
-C D 0.1
\ No newline at end of file
+A B 0.9 D
+A C 0.1 D
+B D 0.9 D
+C D 0.1 D
\ No newline at end of file
diff --git a/graphs/graph10/edges.txt b/graphs/graph10/edges.txt
new file mode 100644
index 0000000..71c125d
--- /dev/null
+++ b/graphs/graph10/edges.txt
@@ -0,0 +1,4 @@
+A B 0.9 D
+A C 0.1 U
+B D 0.9 D
+C D 0.1 U
diff --git a/graphs/graph10/sources.txt b/graphs/graph10/sources.txt
new file mode 100644
index 0000000..8c7e5a6
--- /dev/null
+++ b/graphs/graph10/sources.txt
@@ -0,0 +1 @@
+A
\ No newline at end of file
diff --git a/graphs/graph10/targets.txt b/graphs/graph10/targets.txt
new file mode 100644
index 0000000..02358d2
--- /dev/null
+++ b/graphs/graph10/targets.txt
@@ -0,0 +1 @@
+D
\ No newline at end of file
diff --git a/graphs/graph11/edges.txt b/graphs/graph11/edges.txt
new file mode 100644
index 0000000..eb56b16
--- /dev/null
+++ b/graphs/graph11/edges.txt
@@ -0,0 +1,4 @@
+A B 0.9 U
+A C 0.1 D
+B D 0.9 U
+C D 0.1 D
diff --git a/graphs/graph11/sources.txt b/graphs/graph11/sources.txt
new file mode 100644
index 0000000..8c7e5a6
--- /dev/null
+++ b/graphs/graph11/sources.txt
@@ -0,0 +1 @@
+A
\ No newline at end of file
diff --git a/graphs/graph11/targets.txt b/graphs/graph11/targets.txt
new file mode 100644
index 0000000..02358d2
--- /dev/null
+++ b/graphs/graph11/targets.txt
@@ -0,0 +1 @@
+D
\ No newline at end of file
diff --git a/graphs/graph12/edges.txt b/graphs/graph12/edges.txt
new file mode 100644
index 0000000..828be53
--- /dev/null
+++ b/graphs/graph12/edges.txt
@@ -0,0 +1,4 @@
+A B 0.9 D
+A C 0.1 D
+B D 0.8 D
+C D 0.2 D
diff --git a/graphs/graph12/sources.txt b/graphs/graph12/sources.txt
new file mode 100644
index 0000000..8c7e5a6
--- /dev/null
+++ b/graphs/graph12/sources.txt
@@ -0,0 +1 @@
+A
\ No newline at end of file
diff --git a/graphs/graph12/targets.txt b/graphs/graph12/targets.txt
new file mode 100644
index 0000000..02358d2
--- /dev/null
+++ b/graphs/graph12/targets.txt
@@ -0,0 +1 @@
+D
\ No newline at end of file
diff --git a/graphs/graph13/edges.txt b/graphs/graph13/edges.txt
new file mode 100644
index 0000000..5b5ce23
--- /dev/null
+++ b/graphs/graph13/edges.txt
@@ -0,0 +1,4 @@
+A B 0.9 U
+A C 0.1 U
+B D 0.8 U
+C D 0.2 U
diff --git a/graphs/graph13/sources.txt b/graphs/graph13/sources.txt
new file mode 100644
index 0000000..8c7e5a6
--- /dev/null
+++ b/graphs/graph13/sources.txt
@@ -0,0 +1 @@
+A
\ No newline at end of file
diff --git a/graphs/graph13/targets.txt b/graphs/graph13/targets.txt
new file mode 100644
index 0000000..02358d2
--- /dev/null
+++ b/graphs/graph13/targets.txt
@@ -0,0 +1 @@
+D
\ No newline at end of file
diff --git a/graphs/graph2/edges.txt b/graphs/graph2/edges.txt
index d5212d2..33cf94d 100644
--- a/graphs/graph2/edges.txt
+++ b/graphs/graph2/edges.txt
@@ -1,5 +1,5 @@
-A B 0.9
-A C 0.1
-B D 0.9
-C D 0.1
-A D 0.8
\ No newline at end of file
+A B 0.9 D
+A C 0.1 D
+B D 0.9 D
+C D 0.1 D
+A D 0.8 D
\ No newline at end of file
diff --git a/graphs/graph3/edges.txt b/graphs/graph3/edges.txt
index 15e48ba..ccef531 100644
--- a/graphs/graph3/edges.txt
+++ b/graphs/graph3/edges.txt
@@ -1,4 +1,4 @@
-A B 0.9
-A C 0.1
-B D 0.1
-C D 0.9
+A B 0.9 D
+A C 0.1 D
+B D 0.1 D
+C D 0.9 D
diff --git a/graphs/graph4/edges.txt b/graphs/graph4/edges.txt
index e1cf8cd..5d440a3 100644
--- a/graphs/graph4/edges.txt
+++ b/graphs/graph4/edges.txt
@@ -1,4 +1,4 @@
-A B 0.9
-A C 0.9
-B D 0.9
-C D 0.9
+A B 0.9 D
+A C 0.9 D
+B D 0.9 D
+C D 0.9 D
diff --git a/graphs/graph5/edges.txt b/graphs/graph5/edges.txt
new file mode 100644
index 0000000..74294cf
--- /dev/null
+++ b/graphs/graph5/edges.txt
@@ -0,0 +1,4 @@
+A B 0.9 U
+A C 0.1 U
+B D 0.9 U
+C D 0.1 U
\ No newline at end of file
diff --git a/graphs/graph5/sources.txt b/graphs/graph5/sources.txt
new file mode 100644
index 0000000..8c7e5a6
--- /dev/null
+++ b/graphs/graph5/sources.txt
@@ -0,0 +1 @@
+A
\ No newline at end of file
diff --git a/graphs/graph5/targets.txt b/graphs/graph5/targets.txt
new file mode 100644
index 0000000..02358d2
--- /dev/null
+++ b/graphs/graph5/targets.txt
@@ -0,0 +1 @@
+D
\ No newline at end of file
diff --git a/graphs/graph6/edges.txt b/graphs/graph6/edges.txt
new file mode 100644
index 0000000..1064f97
--- /dev/null
+++ b/graphs/graph6/edges.txt
@@ -0,0 +1,5 @@
+A B 0.9 U
+A C 0.1 U
+B D 0.9 U
+C D 0.1 U
+A D 0.8 U
\ No newline at end of file
diff --git a/graphs/graph6/sources.txt b/graphs/graph6/sources.txt
new file mode 100644
index 0000000..8c7e5a6
--- /dev/null
+++ b/graphs/graph6/sources.txt
@@ -0,0 +1 @@
+A
\ No newline at end of file
diff --git a/graphs/graph6/targets.txt b/graphs/graph6/targets.txt
new file mode 100644
index 0000000..02358d2
--- /dev/null
+++ b/graphs/graph6/targets.txt
@@ -0,0 +1 @@
+D
\ No newline at end of file
diff --git a/graphs/graph7/edges.txt b/graphs/graph7/edges.txt
new file mode 100644
index 0000000..a964d9f
--- /dev/null
+++ b/graphs/graph7/edges.txt
@@ -0,0 +1,4 @@
+A B 0.9 U
+A C 0.1 U
+B D 0.1 U
+C D 0.9 U
diff --git a/graphs/graph7/sources.txt b/graphs/graph7/sources.txt
new file mode 100644
index 0000000..8c7e5a6
--- /dev/null
+++ b/graphs/graph7/sources.txt
@@ -0,0 +1 @@
+A
\ No newline at end of file
diff --git a/graphs/graph7/targets.txt b/graphs/graph7/targets.txt
new file mode 100644
index 0000000..02358d2
--- /dev/null
+++ b/graphs/graph7/targets.txt
@@ -0,0 +1 @@
+D
\ No newline at end of file
diff --git a/graphs/graph8/edges.txt b/graphs/graph8/edges.txt
new file mode 100644
index 0000000..f5496da
--- /dev/null
+++ b/graphs/graph8/edges.txt
@@ -0,0 +1,4 @@
+A B 0.9 U
+A C 0.9 U
+B D 0.9 U
+C D 0.9 U
diff --git a/graphs/graph8/sources.txt b/graphs/graph8/sources.txt
new file mode 100644
index 0000000..8c7e5a6
--- /dev/null
+++ b/graphs/graph8/sources.txt
@@ -0,0 +1 @@
+A
\ No newline at end of file
diff --git a/graphs/graph8/targets.txt b/graphs/graph8/targets.txt
new file mode 100644
index 0000000..02358d2
--- /dev/null
+++ b/graphs/graph8/targets.txt
@@ -0,0 +1 @@
+D
\ No newline at end of file
diff --git a/graphs/graph9/edges.txt b/graphs/graph9/edges.txt
new file mode 100644
index 0000000..6edc4bb
--- /dev/null
+++ b/graphs/graph9/edges.txt
@@ -0,0 +1,4 @@
+A B 0.9 D
+A C 0.1 U
+B D 0.9 U
+C D 0.1 D
diff --git a/graphs/graph9/sources.txt b/graphs/graph9/sources.txt
new file mode 100644
index 0000000..8c7e5a6
--- /dev/null
+++ b/graphs/graph9/sources.txt
@@ -0,0 +1 @@
+A
\ No newline at end of file
diff --git a/graphs/graph9/targets.txt b/graphs/graph9/targets.txt
new file mode 100644
index 0000000..02358d2
--- /dev/null
+++ b/graphs/graph9/targets.txt
@@ -0,0 +1 @@
+D
\ No newline at end of file
diff --git a/minCostFlow.py b/minCostFlow.py
index ebd10b9..0dde89f 100644
--- a/minCostFlow.py
+++ b/minCostFlow.py
@@ -11,6 +11,10 @@ import argparse
 
 from ortools.graph.python.min_cost_flow import SimpleMinCostFlow
 
+# (node1, node2) : weight
+directed_dict = dict()
+undirected_dict = dict()
+
 def parse_nodes(node_file):
     ''' Parse a list of sources or targets and return a set '''
     with open(node_file) as node_f:
@@ -26,13 +30,17 @@ def construct_digraph(edges_file, cap):
     capacity of 1.
     '''
     G = SimpleMinCostFlow()
-    idDict = dict() #Hold names to number ids
+    idDict = dict() # Hold names to number ids
     curID = 0
     default_capacity = int(cap)
 
     with open(edges_file) as edges_f:
         for line in edges_f:
             tokens = line.strip().split()
+
+            if len(tokens) != 4 :
+                raise ValueError (f"Each row in the edges file {edges_file} should contain 4 values to define an edge. Currently a row has {len(tokens)} values.")
+
             node1 = tokens[0]
             if not node1 in idDict:
                 idDict[node1] = curID
@@ -41,10 +49,42 @@ def construct_digraph(edges_file, cap):
             if not node2 in idDict:
                 idDict[node2] = curID
                 curID += 1
-            #Google's solver can only handle int weights, so round to the 100th
-            w = int((1-(float(tokens[2])))*100)
-            G.add_arc_with_capacity_and_unit_cost(idDict[node1],idDict[node2], default_capacity, int(w))
-            G.add_arc_with_capacity_and_unit_cost(idDict[node2],idDict[node1], default_capacity, int(w))
+            # Google's solver can only handle int weights, so round to the 100th
+            w = int((1-(float(tokens[2])))*100) # the lower the weight in tokens[2], the higher the cost
+            d = tokens[3]
+            edge = (node1, node2)
+            sorted_edge = tuple(sorted(edge, reverse=False)) # all undirected edges are sorted edges
+            sorted_edge_reverse = tuple(sorted(edge, reverse=True))
+
+            if d == "D":
+                if edge in directed_dict:
+                    if w < directed_dict[edge]: # w is a cost: keep the lower cost, i.e. the higher input weight
+                        directed_dict[edge] = w
+                elif sorted_edge in undirected_dict: # prioritize directed edges over undirected edges
+                    del undirected_dict[sorted_edge]
+                    directed_dict[edge] = w
+                else: # edge not in directed_dict
+                    directed_dict[edge] = w
+
+            elif d == "U":
+                # add new edge to undirected_dict; check for edge existing in directed_dict or undirected_dict
+                # if edge == sorted_edge, there is a chance reverse of edge (sorted_edge_reverse) is still in the directed_dict
+                if edge not in directed_dict and sorted_edge not in directed_dict and sorted_edge_reverse not in directed_dict and sorted_edge not in undirected_dict:
+                    undirected_dict[sorted_edge] = w
+                elif sorted_edge in undirected_dict:
+                    if w < undirected_dict[sorted_edge]: # w is a cost: keep the lower cost, i.e. the higher input weight
+                        undirected_dict[sorted_edge] = w
+            else:
+                raise ValueError (f"Cannot add edge: d = {d}")
+
+
+    # go through and add the edges from directed_dict and undirected_dict to G
+    for key, value in directed_dict.items():
+        G.add_arc_with_capacity_and_unit_cost(idDict[key[0]],idDict[key[1]], default_capacity, int(value))
+    for key, value in undirected_dict.items():
+        G.add_arc_with_capacity_and_unit_cost(idDict[key[0]],idDict[key[1]], default_capacity, int(value))
+        G.add_arc_with_capacity_and_unit_cost(idDict[key[1]],idDict[key[0]], default_capacity, int(value))
+
     idDict["maxID"] = curID
     return G,idDict
 
@@ -87,8 +127,9 @@ def write_output_to_sif(G,out_file_name,idDict):
     names = {v: k for k, v in idDict.items()}
     numE = 0
     for i in range(G.num_arcs()):
-        node1 = names[G.head(i)]
-        node2 = names[G.tail(i)]
+        node1 = names[G.tail(i)]
+        node2 = names[G.head(i)]
+
         flow = G.flow(i)
         if flow <= 0:
             continue
@@ -97,7 +138,17 @@ def write_output_to_sif(G,out_file_name,idDict):
         if node2 in ["source","target"]:
             continue
         numE+=1
-        out_file.write(node1+"\t"+node2+"\n")
+
+        edge = (node1, node2)
+        sorted_edge = tuple(sorted(edge))
+
+        if edge in directed_dict:
+            out_file.write(edge[0]+"\t"+edge[1]+"\t"+"D"+"\n")
+        elif sorted_edge in undirected_dict:
+            out_file.write(sorted_edge[0]+"\t"+sorted_edge[1]+"\t"+"U"+"\n")
+        else:
+            raise KeyError(f"edge {edge} is not in the dicts")
+
     print("Final network had %d edges" % numE)
     out_file.close()
 
diff --git a/test_minCostFlow.py b/test_minCostFlow.py
new file mode 100644
index 0000000..5833cf2
--- /dev/null
+++ b/test_minCostFlow.py
@@ -0,0 +1,34 @@
+import subprocess
+
+command = "python"
+script = "minCostFlow.py"
+
+print("TEST SERIES")
+for i in range (1,8):
+
+    print("test: ",i)
+    args = [
+        "--edges_file", f"tests/test{i}/edges.txt",
+        "--sources_file", f"tests/test{i}/sources.txt",
+        "--targets_file", f"tests/test{i}/targets.txt",
+        "--output", f"test{i}"
+    ]
+    cmd = [command, script] + args
+
+    # Run the command
+    subprocess.run(cmd)
+
+
+print("\nGRAPHS SERIES")
+for i in range (1,14):
+    print("graph: ",i)
+    args = [
+        "--edges_file", f"graphs/graph{i}/edges.txt",
+        "--sources_file", f"graphs/graph{i}/sources.txt",
+        "--targets_file", f"graphs/graph{i}/targets.txt",
+        "--output", f"graph{i}"
+    ]
+    cmd = [command, script] + args
+
+    # Run the command
+    subprocess.run(cmd)
\ No newline at end of file
diff --git a/tests/correct_outputs.txt b/tests/correct_outputs.txt
new file mode 100644
index 0000000..fd5ea5f
--- /dev/null
+++ b/tests/correct_outputs.txt
@@ -0,0 +1,35 @@
+The test series is used to verify that the code behaves correctly on distinct edge cases.
+
+test1: check if unique directed edges are added to directed_dict
+Output:
+A B D
+B C D
+
+test2: check if higher edge weight is selected for the same directed edge from the input
+Output:
+A B D
+B C D
+
+test3: check that if a directed edge is present in the input and an undirected edge between the same nodes already exists, the directed edge is prioritized and added to directed_dict, and the undirected edge is deleted from undirected_dict
+Output:
+B C D
+A B D
+
+test4: check if unique undirected edges are added to undirected_dict
+Output:
+A B U
+B C U
+
+test5: check that an undirected edge is not added if a directed edge of that edge already exists
+Output:
+A B D
+B C D
+
+test6: check if higher edge weight is selected for the same undirected edge from the input
+Output:
+A B U
+B C U
+
+test7: check that code still runs and outputs an error message with an empty edges.txt
+Output:
+N/A
diff --git a/tests/test1/edges.txt b/tests/test1/edges.txt
new file mode 100644
index 0000000..768b395
--- /dev/null
+++ b/tests/test1/edges.txt
@@ -0,0 +1,3 @@
+A B 0.1 D
+B C 0.1 D
+B C 0.1 D
diff --git a/tests/test1/sources.txt b/tests/test1/sources.txt
new file mode 100644
index 0000000..8c7e5a6
--- /dev/null
+++ b/tests/test1/sources.txt
@@ -0,0 +1 @@
+A
\ No newline at end of file
diff --git a/tests/test1/targets.txt b/tests/test1/targets.txt
new file mode 100644
index 0000000..96d80cd
--- /dev/null
+++ b/tests/test1/targets.txt
@@ -0,0 +1 @@
+C
\ No newline at end of file
diff --git a/tests/test2/edges.txt b/tests/test2/edges.txt
new file mode 100644
index 0000000..de81734
--- /dev/null
+++ b/tests/test2/edges.txt
@@ -0,0 +1,4 @@
+A B 0.2 D
+B C 0.2 D
+A B 0.9 D
+B C 0.1 D
\ No newline at end of file
diff --git a/tests/test2/sources.txt b/tests/test2/sources.txt
new file mode 100644
index 0000000..8c7e5a6
--- /dev/null
+++ b/tests/test2/sources.txt
@@ -0,0 +1 @@
+A
\ No newline at end of file
diff --git a/tests/test2/targets.txt b/tests/test2/targets.txt
new file mode 100644
index 0000000..96d80cd
--- /dev/null
+++ b/tests/test2/targets.txt
@@ -0,0 +1 @@
+C
\ No newline at end of file
diff --git a/tests/test3/edges.txt b/tests/test3/edges.txt
new file mode 100644
index 0000000..85d22e4
--- /dev/null
+++ b/tests/test3/edges.txt
@@ -0,0 +1,4 @@
+A B 0.1 U
+B C 0.1 D
+A B 0.1 D
+B C 0.1 U
\ No newline at end of file
diff --git a/tests/test3/sources.txt b/tests/test3/sources.txt
new file mode 100644
index 0000000..8c7e5a6
--- /dev/null
+++ b/tests/test3/sources.txt
@@ -0,0 +1 @@
+A
\ No newline at end of file
diff --git a/tests/test3/targets.txt b/tests/test3/targets.txt
new file mode 100644
index 0000000..96d80cd
--- /dev/null
+++ b/tests/test3/targets.txt
@@ -0,0 +1 @@
+C
\ No newline at end of file
diff --git a/tests/test4/edges.txt b/tests/test4/edges.txt
new file mode 100644
index 0000000..48feaa9
--- /dev/null
+++ b/tests/test4/edges.txt
@@ -0,0 +1,3 @@
+A B 0.1 U
+B C 0.1 U
+B C 0.1 U
diff --git a/tests/test4/sources.txt b/tests/test4/sources.txt
new file mode 100644
index 0000000..8c7e5a6
--- /dev/null
+++ b/tests/test4/sources.txt
@@ -0,0 +1 @@
+A
\ No newline at end of file
diff --git a/tests/test4/targets.txt b/tests/test4/targets.txt
new file mode 100644
index 0000000..96d80cd
--- /dev/null
+++ b/tests/test4/targets.txt
@@ -0,0 +1 @@
+C
\ No newline at end of file
diff --git a/tests/test5/edges.txt b/tests/test5/edges.txt
new file mode 100644
index 0000000..1647c45
--- /dev/null
+++ b/tests/test5/edges.txt
@@ -0,0 +1,5 @@
+A B 0.1 D
+A B 0.1 U
+B A 0.1 U
+B C 0.1 D
+B C 0.1 U
\ No newline at end of file
diff --git a/tests/test5/sources.txt b/tests/test5/sources.txt
new file mode 100644
index 0000000..8c7e5a6
--- /dev/null
+++ b/tests/test5/sources.txt
@@ -0,0 +1 @@
+A
\ No newline at end of file
diff --git a/tests/test5/targets.txt b/tests/test5/targets.txt
new file mode 100644
index 0000000..96d80cd
--- /dev/null
+++ b/tests/test5/targets.txt
@@ -0,0 +1 @@
+C
\ No newline at end of file
diff --git a/tests/test6/edges.txt b/tests/test6/edges.txt
new file mode 100644
index 0000000..aad1cb9
--- /dev/null
+++ b/tests/test6/edges.txt
@@ -0,0 +1,4 @@
+A B 0.2 U
+B C 0.2 U
+A B 0.9 U
+B C 0.1 U
\ No newline at end of file
diff --git a/tests/test6/sources.txt b/tests/test6/sources.txt
new file mode 100644
index 0000000..8c7e5a6
--- /dev/null
+++ b/tests/test6/sources.txt
@@ -0,0 +1 @@
+A
\ No newline at end of file
diff --git a/tests/test6/targets.txt b/tests/test6/targets.txt
new file mode 100644
index 0000000..96d80cd
--- /dev/null
+++ b/tests/test6/targets.txt
@@ -0,0 +1 @@
+C
\ No newline at end of file
diff --git a/tests/test7/edges.txt b/tests/test7/edges.txt
new file mode 100644
index 0000000..e69de29
diff --git a/tests/test7/sources.txt b/tests/test7/sources.txt
new file mode 100644
index 0000000..8c7e5a6
--- /dev/null
+++ b/tests/test7/sources.txt
@@ -0,0 +1 @@
+A
\ No newline at end of file
diff --git a/tests/test7/targets.txt b/tests/test7/targets.txt
new file mode 100644
index 0000000..96d80cd
--- /dev/null
+++ b/tests/test7/targets.txt
@@ -0,0 +1 @@
+C
\ No newline at end of file
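
Review note: the edge-handling rules this patch adds to `construct_digraph` (directed beats undirected, higher weight beats lower) can be restated as a small pure function, which may be easier to unit-test than the subprocess runs in `test_minCostFlow.py`. The sketch below is not the code in the patch: `merge_edges` is a hypothetical helper, it takes rows directly instead of reading a file, and it uses `round()` where the patch uses `int()`. That last substitution is deliberate, since in floating point `int((1 - 0.9) * 100)` evaluates to 9 rather than 10.

```python
def merge_edges(rows):
    """Collapse duplicate edges from (node1, node2, weight, direction) rows.

    Hypothetical helper, not part of the patch. Returns (directed, undirected),
    two dicts mapping node pairs to integer costs (lower cost = higher weight).
    """
    directed = {}    # (node1, node2) -> cost
    undirected = {}  # (smaller_node, larger_node) -> cost
    for node1, node2, weight, direction in rows:
        # Assumption: round() instead of the int() truncation used in the patch.
        cost = round((1 - float(weight)) * 100)
        edge = (node1, node2)
        key = tuple(sorted(edge))
        if direction == "D":
            # A directed edge replaces any undirected edge on the same pair.
            undirected.pop(key, None)
            if edge not in directed or cost < directed[edge]:
                directed[edge] = cost
        elif direction == "U":
            # Skip if a directed edge already exists in either orientation.
            if edge in directed or edge[::-1] in directed:
                continue
            if key not in undirected or cost < undirected[key]:
                undirected[key] = cost
        else:
            raise ValueError(f"Unknown direction: {direction!r}")
    return directed, undirected


if __name__ == "__main__":
    # Same rows as tests/test3/edges.txt.
    rows = [("A", "B", "0.1", "U"), ("B", "C", "0.1", "D"),
            ("A", "B", "0.1", "D"), ("B", "C", "0.1", "U")]
    print(merge_edges(rows))
    # prints ({('B', 'C'): 90, ('A', 'B'): 90}, {})
```

Returning the two dictionaries rather than mutating module-level state would also let repeated calls in one process start from a clean slate, which the module-level `directed_dict` and `undirected_dict` in the patch do not.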