Skip to main content

Invention as a Reincarnation of an Old Idea


In all creative disciplines fundamental ideas reoccur in different shapes. This cannot be truer in computer science and software engineering. Advances in technology even enables old ideas to appear as pure novelties. In this article we will focus on an intellectual toolbox associated with morphing existing ideas, concepts, and technique to fit new scenarios. This is tightly connected with generalization and pattern matching, which are both prominent ingredients in achieving success in professional practice and various competitive activities. The latter is examined in more detail in my article about competitive software engineering.

To make the point clear let us use a concrete example from Petlja, a website maintained by the Serbian Mathematical Society, which also organizes official competitions in mathematics and programming. The problem we will use is called Mingo. Below is its description translated to English.

Problem Description

Now, after completing the competition, Miroslav realized how strenuous programming is, and he decided to escape to Mars. He came across the Mingo sweepstakes on the internet, where the main prize is a trip to Mars.

The sweepstakes consists of each contestant choosing exactly K different natural numbers on their ticket, where each number is from the set 1, 2, 3, ..., M. After that, Q drawings are conducted, where each drawing involves a random selection of exactly K different numbers from the set. Each ticket is valid for all Q drawings, and the contestant who has a ticket with all the drawn numbers in any of the Q drawings wins a trip to Mars.

To increase his chances, Miroslav filled out N tickets. As it is now difficult to keep track of all these tickets, he has asked you for help. He wants to know for each drawing at every moment how many tickets contain each of the numbers drawn up to that point, i.e., for each drawing separately, after the i-th number is drawn, he wants to know how many of his tickets contain each of the i numbers drawn in that drawing.

Input description

The first line of the standard input contains the numbers N, K, and M, which respectively indicate the number of tickets Miroslav filled out, the number of different numbers that need to be marked on the ticket, and the largest number that can be marked. The following N lines describe the tickets filled out by Miroslav, where the i-th line contains K numbers that indicate the chosen numbers on the i-th ticket. After that, there is one line with the number Q representing the number of drawings. The next Q lines contain a description of all the drawings. The i-th line contains K numbers that represent the numbers in the order they were drawn.

Output description

It is necessary to print Q lines on the standard output, where the i-th line corresponds to the solution for the i-th drawing, i.e., the j-th number in the i-th line represents the number of tickets that contain each of the j numbers drawn in the i-th drawing.

Example 1

Input
3 3 15
2 1 3
1 10 4
4 11 1
2
1 4 11 
4 3 11
Output
3 2 1
2 0 0

In the first test example, during the first drawing, the solution is "3 2 1" because the number 1 is on all 3 of Miroslav's tickets. After the second drawn number, we have the combination "1 4" which appears on 2 tickets. At the end of the third drawing, we have the number combination "1 4 11" and the only ticket that has all 3 numbers from the combination is the third ticket, so the answer is 1. In the second drawing of the first test example, the number 4 is contained in tickets 2 and 3. While there are no tickets that contain all the numbers from the combinations "4 3" and "4 3 11".

Example 2

Input
2 7 39
7 6 5 4 3 2 1
1 2 3 4 5 6 8
2
4 3 5 1 2 6 7
6 5 4 3 2 8 39
Output
2 2 2 2 2 2 1
2 2 2 2 2 1 0

Constraints

1≤N≤10,000 
1≤K≤8
K≤M≤500 
1≤Q≤20,000

Time limit 1.5 seconds.
Memory limit 256MB.

Basic Solution

The core idea is very simple. We need to count all subsets of numbers for each ticket. In each test case we just read out the counter for the matching subset. Depending on how we arrange numbers in each subset we may reduce complexity by considering subsets in some predefined order, like numbers being sorted. In the initial solution below we use another approach leveraging bitset. At any rate, the provided solution below is correct, but unfortunately produces a time limit exception (TLE) for larger test cases.

#include <bitset>
#include <iostream>
#include <unordered_map>
#include <vector>

using namespace std;

const int MAXM = 500;

typedef bitset<MAXM + 1> Selection;

unordered_map<Selection,int> counts;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    cout.tie(nullptr);

    int n, k, m;
    cin >> n >> k >> m;

    vector<int> numbers(k);
    Selection group;

    auto registerSelection = [&] () {
        group.reset();
        for (int pivot = 0; pivot < k; pivot++) {
            group.set(numbers[pivot]);
            for (int i = 0; i < (1 << pivot); i++) {
                for (int j = 0; j < pivot; j++)
                    if (i & (1 << j))
                        group.set(numbers[j]);
                    else
                        group.reset(numbers[j]);
                counts[group]++;
            }
        }
    };

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < k; j++)
            cin >> numbers[j];
        registerSelection();
    }

    int q;
    cin >> q;

    while (q--) {
        group.reset();
        for (int i = 0; i < k; i++) {
            int number;
            cin >> number;
            group.set(number);
            cout << counts[group] << " ";
        }
        cout << endl;
    }
    return 0;
}

Optimized Solution

There are many possible ways to manage subsets and count them properly. The official solution mimics the behavior of a sorted set and iteratively applies binary search to filter out relevant subsets. Nevertheless, this approach entails a quite complex and error prone code, especially when developed under huge time pressure.

Let us think about what makes our first version slow. Using a bitset with 500 bits as a key together with the fact that there are lots of subsets to store turns out to be the reason. We might ask ourselves can we perhaps handle special cases much faster? Obviously, subsets with only a single element would require a simple static integer array with maximum of 500 elements. Managing subsets with two elements is also straightforward by using a 2D integer array. Unfortunately, we cannot continue this pattern for subsets with 3 elements since 5003 integers cannot fit into allowed memory limit. But we can be creative and turn each 3-element subset into a unique integer by treating members as digits of a base 500 number system (for this to work we must also sort the elements of a subset). Interestingly, just by extracting these 3 special cases we can speed up the code with minimal additional complexity up to the point of passing all test cases successfully. Below you can see the source code.

#include <algorithm>
#include <bitset>
#include <iostream>
#include <unordered_map>

using namespace std;

const int MAXM = 501; // Maximum M increased by 1 to avoid zero indexing.
const int MAXM_SQUARED = MAXM * MAXM;
const int MAXK = 8;

int ticket[MAXK];
int combCount1[MAXM];
int combCount2[MAXM][MAXM];
unordered_map<int,int> combCount3;
unordered_map<bitset<MAXM>,int> combCount;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    cout.tie(nullptr);

    int n, k, m;
    cin >> n >> k >> m;

    const int maxSubsets = 1 << k;
    bitset<MAXM> selection;

    auto registerTicket = [&] () {
        for (int mask = 0; mask < maxSubsets; mask++) {
            selection.reset();
            int a, b, c;
            int combSize = 0;
            for (int i = 0; i < k; i++)
                if (mask & (1 << i)) {
                    a = b;
                    b = c;
                    selection.set(c = ticket[i]);
                    combSize++;
                }
            if (combSize == 1)
                combCount1[c]++;
            else if (combSize == 2)
                combCount2[c][b] = ++combCount2[b][c];
            else if (combSize == 3)
                combCount3[a + b * MAXM + c * MAXM_SQUARED]++;
            else
                combCount[selection]++;
        }
    };

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < k; j++)
            cin >> ticket[j];
        sort(ticket, ticket + k);
        registerTicket();
    }

    int q;
    cin >> q;

    while (q--) {
        selection.reset();
        for (int i = 0; i < k; i++) {
            cin >> ticket[i];
            selection.set(ticket[i]);
            if (i == 0)
                cout << combCount1[ticket[0]] << " ";
            else if (i == 1)
                cout << combCount2[ticket[0]][ticket[1]] << " ";
            else if (i == 2) {
                sort(ticket, ticket + 3);
                cout << combCount3[ticket[0] + ticket[1] * MAXM + ticket[2] * MAXM_SQUARED] << " ";
            } else
                cout << combCount[selection] << " ";
        }
        cout << endl;
    }
    return 0;
}

In retrospect, what we have done above is nothing else than employing a well-known technique called unrolling. The most famous form of unrolling is loop unrolling. It is also possible to apply this technique on data structures, like unrolled linked list. Nothing stops us to be creative and use unrolling as a mechanism for handling special cases that were previously buried inside a single abstraction. Hence, our new code base builds upon a reincarnation of an old idea in a new setting.

Conclusion

This article has showcased the power of unrolling when approached in creative fashion. There is always a need to stop, rethink what we are doing and seek for opportunities to reuse artifacts. An ability to produce relevant abstractions, recognize patterns and figure out how to reuse a vast amount of accumulated knowledge in a particular domain are all mandatory constituent elements of a professional's intellectual toolbox. Colossal scientific breakthroughs and brand-new ideas are rare commodities. It is more often the case that we will need to pick up an existing stuff and transmogrify it to our needs. What does it mean is hopefully illuminated in this post.

Comments

Popular posts from this blog

The Power of a Command-Line Processor

A command-line processor, as its name implies, is a software appliance intended to be executed from a command-line in pipelined fashion. Most operating systems are equipped with bunch of utilities that can be ingeniously combined to create powerful mini programs for transforming data. We will focus our attention here on jq specialized to mangle JSON data similarly how sed crunches textual content. You can easily start using it by issuing brew install jq on macOS (or download it for other operating systems). Nonetheless, even without placing anything on your machine, there is also a nice playground for trying out things online. The following example illustrates what sort of actions could be crafted into a unified instruction, i.e., mini program that may be reused as a whole: > echo "A test string." | jq -R "ascii_upcase | gsub(\"STRING\"; \"CONTENT\")" "A TEST CONTENT." The input is piped into jq as an ordinary string (this is hi

Brief Introduction to the JSON API v1 Specification

JSON API provides a systematic method of specifying Level 3 REST APIs in JSON. It has a registered hypermedia type of application/vnd.api+json issued by the Internet Assigned Numbers Authority (IANA). The next list briefly enrolls the most salient features and benefits of JSON API (see my book Creating Maintainable APIs for more information): It is a hypermedia-driven conventional interface : This is a fundamental difference between JSON API and other pure message formats. The JSON API, besides establishing a shared convention regarding message formats, also encompasses rules for how to interact with services (it specifies an assumed API). For example, JSON API contains instructions on how services should fetch, create, update, and delete resources, specify pagination, pass query parameters, and so on. This trait of JSON API is tightly related to the concept of openness in distributed systems. An open distributed system mandates certain rules regarding how services should be implem