Top K Problem – Intro to Algorithms

Top K Problem – Intro to Algorithms

Articles, Blog , , , , , , , , 2 Comments

This brings us to a slightly more interesting problem called the Top K Problem. Here’s the idea. Imagine we’ve got our list of values L, and there’s n values in the list. We can compute the max or the second max, the second largest, just like we’ve been doing, but in some sense, it doesn’t really give you a very representative sample of what the list is about. Sometimes, it’s nice to have a bit more. Let’s imagine that this list is sorted though it turns out at one half to actually be sorted. And imagine, they’re going to focus on just the ones that would be at the k best out of the n. It could be the k small as to the k largest depending on the particular application that you’re considering. But knowing a bunch of the ones near the top instead of just the very tippe top can very instructive. One natural strategy is to iterate the max procedure so go through the list, find the max, pull it out of the list, so the list is now in -1 long and then repeat this. Each time we compute the max, it will be actually the second, the third, the fourth largest from original list, but it’s always the max of the current list. The simply pretty reasonable thing to do especially if k is small. You just need one or two or three at the top values. Another natural thing to do is just sort the whole list. Once we sorted the whole list, the first k elements of that sorted list are exactly the ones that we want. This is actually a very simple way of doing it. In Python, it can just be essentially one or two statements. Insertion is another strategy. This is the generalization of the solution that I gave for the find the second best example before. In that example, I kept track of both of the best and the second best, and then it was running through the list. In each new element of a list that was encountered, figure out where it would fit into that top two lists. If it was smaller than both of those, ignore it. If it’s bigger than both of those, put it at the top of the list and bump the list down and so forth. As always, we’re trying to find the best algorithm that we can use to run on our data. Which of these algorithms has the best Θ? And I’m going to give a hint. We’re looking for the top k out of the list of n elements, but we’re going to look at this question for different values of k. In particular, if k is half of the list, we want the largest half of the list. The question is are we better of using selection or insertion? Both of these are going to have the same Θ for all these examples so I just grouped them together or is it better to sort the whole list keeping in mind that sorting is a Θ of a log n operation. Let’s say you want to find the best root and items. Is it better to use selection or sorting? If you want the top log n items, which one is better? This function is the logarithm of the logarithm of n. As slow growing as logarithm turns out to be, this is even slower or growing, the logarithm of the logarithm of n, so which one would you choose? In that case, n finally, let’s say you want the top 100. This is a common thing that you see on the internet a lot, the top 100 movies, the top 100 Beatles songs, the top 100 times I’ve used the phrase top 100. There’s a lot of different contexts in which this appears. And if you’re going to do that, are you better of using a selection or insertion algorithm or to sort the whole list? What I would like you to do is for each of these columns, check this box or that box or if they are Θ of each other. In other words, from an asymptotic standpoint, in this algorithm, we’ll run the same worse case asymptotically then check both of the boxes.

2 thoughts on “Top K Problem – Intro to Algorithms

Leave a Reply

Your email address will not be published. Required fields are marked *