| 1 | -Coming soon... |
| 1 | +Source: https://www.hackerrank.com/challenges/torque-and-development/ |
| 2 | + |
| 3 | +TODO(domfarolino): revise this post. |
| 4 | + |
| 5 | +This is a pretty interesting graph problem. It vexed me for a bit until I made some crucial realizations. |
| 6 | + |
| 7 | +# Divide the problem into connected components |
| 8 | + |
| 9 | +When starting with this problem I fumbled around quite a bit. Eventually I came to some good realizations: |
| 10 | + |
| 11 | + - We'll need at least one library per connected component |
| 12 | + - In each component, there are two extremes: |
| 13 | + - Every city in a connected component has a library |
| 14 | + - Only one city in a connected component has a library |
| 15 | + |
| 16 | +My next thought was that the naive solution would be to find all possible combinations of library/road |
| 17 | +allocations in between the extremes, which seems combinatorially explosive. For example, what if there |
| 18 | +were not the extreme `n` libraries and `0` roads in a component, but instead `n - 1` libraries and `1` |
| 19 | +road, or `n - 2` libraries and `2` roads? How many different ways can we |
| 20 | +[*choose*](https://en.wikipedia.org/wiki/Binomial_coefficient) how to allocate which cities have libraries |
| 21 | +and which cities to connect, and more importantly, does the choosing of these actually affect the cost? |
| 22 | +Determining the number of possible choices we can make when allocating libraries to cities is actually pretty |
| 23 | +easy (it's just the summation of binomial coefficients, [see here](https://math.stackexchange.com/questions/519832/)), |
| 24 | +it would just be combinatorially explosive to go through each one; was it necessary? |
| 25 | + |
| 26 | +When looking at an example graph with five (once-) connected cities I realized that the allocation of libraries |
| 27 | +doesn't matter at all and won't affect the cost. (I was considering the idea that perhaps the degree of each city |
| 28 | +might have an effect on, or indicate priority of, library assignment). The allocation makes no difference as long |
| 29 | +as we don't waste a road connecting two library-bearing cities, because why would we do that? |
| 30 | + |
| 31 | +[Enter a tangent]... |
| 32 | + |
| 33 | +The whole reason this accidentally-connecting-two-library-bearing-cities issue came up is because I was examining a |
| 34 | +quite feasible 5-city graph with a cycle trying to allocate `3` libraries and `2` roads. I wondered if I could choose |
| 35 | +a "bad" allocation of libraries and roads, namely one that doesn't actually connect each city in the component. This is |
| 36 | +certainly possible in a graph with cycles when only dealing with `numberOfCities` resources (`3` libraries and `2` roads). |
| 37 | + |
| 38 | +I was then worried about making sure my implementation would not accidentally waste a road on two |
| 39 | +library-bearing cities. Then I realized that if the allocation doesn't matter, we just have to know that |
| 40 | +some working allocation exists, and that will be the minimum total cost for such choices of the number of libraries |
| 41 | +and roads for that connected component. |
| 42 | + |
| 43 | +# A connected component is at least a tree |
| 44 | + |
| 45 | +The "choice" of which roads to build dissolves when you realize that the connected component by definition is at least a |
| 46 | +tree, and thus always has valid allocations of libraries and roads in the form of: |
| 47 | + |
| 48 | +`N - K` libraries + `K` roads, `∀ K, 0 ≤ K < N` (remember, we need at least one library). |
| 49 | + |
| 50 | +This means each connected component had `N` possible solutions, one for each value of `K`, and we needed to choose the |
| 51 | +minimum one. Going through some examples I realized the best answer always seemed to be one of the extreme allocations, namely |
| 52 | +an allocation with all `N` libraries or only `1` library. I tried to find an example where one of the middleground less |
| 53 | +extreme allocations could be more optimal, but I came to the conclusion that that will never be the case, because we greedily |
| 54 | +want to choose to employ as many of the cheapest resource (either libraries or roads) as possible. In other words, if roads were |
| 55 | +cheaper to build than libraries, and there exist the possible roads to repair to connect the entire component (the definition!), |
| 56 | +then we'd want to only build `1` library, and as many remaining roads as we'd need. We could build two libraries, and one less |
| 57 | +road, but that would give us the same connected result but with a higher cost, unnecessarily. |
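
To make the greedy argument concrete, here is a short derivation. The symbols are my own shorthand, not from the problem statement: $c_{lib}$ is the cost of one library, $c_{road}$ the cost of one road, and $C(K)$ the cost of a component of $N$ cities that uses $K$ roads and $N - K$ libraries.

$$
C(K) = (N - K)\,c_{lib} + K\,c_{road} = N\,c_{lib} + K\,(c_{road} - c_{lib}), \qquad 0 \le K \le N - 1
$$

$C(K)$ is linear in $K$, so its minimum over the allowed range sits at an endpoint:

$$
\min_{0 \le K \le N-1} C(K) =
\begin{cases}
N\,c_{lib} & \text{if } c_{road} \ge c_{lib} \quad (K = 0) \\
c_{lib} + (N - 1)\,c_{road} & \text{if } c_{road} < c_{lib} \quad (K = N - 1)
\end{cases}
$$

When the two costs are equal every value of $K$ gives the same total, so picking an extreme still loses nothing.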
| 58 | + |
| 59 | +# Implementation design |
| 60 | + |
| 61 | +When thinking about the implementation, I knew the number of connected components was relevant to this problem. I also knew |
| 62 | +we could get an entire connected component (but more importantly its size) using a trivial-to-implement BFS algorithm. I figured |
| 63 | +I'd use an adjacency list to store the graph, since I wasn't going to perform any operations that a matrix would be more suited |
| 64 | +for. The necessary steps were something like this: |
| 65 | + |
| 66 | + - Build the graph's adjacency list |
| 67 | + - For each connected component |
| 68 | + - Get the size of the component |
| 69 | + - Minimal cost of connecting this component was `min(a, b)` where: |
| 70 | + - `a = numCities * costLib` |
| 71 | + - `b = costLib + (numCities - 1) * costRoad` |
| 72 | + - With the minimal cost of the component in hand, add the value to the running sum, and perform the same operation for the next component. |
| 73 | + |
| 74 | +Moving from component-to-component is as easy as just using BFS with some sort of global visitation store. |
| 75 | +We can try to find a connected component from each given city. The first time we run BFS, we'll mark *all* nodes in |
| 76 | +the discovered component as visited. Then in the next given city, we'll try to find another connected component *if* |
| 77 | +the city has not already been visited (does not exist as a part of an already-discovered connected component). We keep |
| 78 | +a running sum, adding to it the minimum cost required to connect a once-connected component, and eventually return the |
| 79 | +final value. |
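
The steps above could be sketched roughly as follows. This is a minimal Python sketch, not the solution from this repository: the function name `min_total_cost`, its parameter names, and the assumption that cities are numbered `1..num_cities` with roads given as `(u, v)` pairs are all mine.

```python
from collections import deque


def min_total_cost(num_cities, cost_lib, cost_road, roads):
    """Sum the cheapest library/road allocation over every connected component.

    Hypothetical helper: assumes cities are numbered 1..num_cities and
    `roads` is an iterable of (u, v) pairs of repairable roads.
    """
    # Build the graph's adjacency list (index 0 is unused).
    adjacency = [[] for _ in range(num_cities + 1)]
    for u, v in roads:
        adjacency[u].append(v)
        adjacency[v].append(u)

    # Global visitation store shared by every BFS run.
    visited = [False] * (num_cities + 1)
    total = 0

    for start in range(1, num_cities + 1):
        if visited[start]:
            continue  # already part of a discovered component

        # BFS to measure the size of the component containing `start`.
        visited[start] = True
        queue = deque([start])
        size = 0
        while queue:
            city = queue.popleft()
            size += 1
            for neighbor in adjacency[city]:
                if not visited[neighbor]:
                    visited[neighbor] = True
                    queue.append(neighbor)

        # The two extremes: a library everywhere, or one library plus a spanning tree.
        all_libraries = size * cost_lib
        one_library = cost_lib + (size - 1) * cost_road
        total += min(all_libraries, one_library)

    return total
```

For example, with three cities joined in a triangle, a library cost of `2` and a road cost of `1`, `min_total_cost(3, 2, 1, [(1, 2), (3, 1), (2, 3)])` returns `4`: one library plus two roads beats three libraries.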
| 80 | + |
| 81 | +Time complexity: O(n) (by marking nodes as visited, we avoid repeating ourselves) |
| 82 | +Space complexity: O(n) |
| 83 | + |
| 84 | +*It should be noted that the complexity of this algorithm could easily be O(n^2) (due to edge processing in the complete |
| 85 | +graph K~n~), however the problem description on HackerRank specifically limits the number of edges to `n`* |