Best solutions for Codility Lessons. Lesson 5 Prefix Sums

Alexander Molchevskyi
8 min readDec 14, 2019

Task 1 PassingCars

Solution:

The solution is simple. We need to define two variables and count separately ones and zeroes. Each time we meet one we add to number of ones the number of zeroes which were passed by so far. Thus we getting number of pairs passed by at the moment. When we reach end of the array we will have full number of the pairs.

In C++ it looks like:

int get_pairs_passed_by(vector<int> &v) {
size_t v_size = v.size();
size_t pairs_passed_by = 0;
size_t zeroes = 0;
for(size_t i = 0; i < v_size; ++i) { if(0 == v[i]) {
++zeroes;
}
else if(1 == v[i]) {
pairs_passed_by += zeroes;
if(pairs_passed_by > 1'000'000'000) {
return -1;
}
}
}
return pairs_passed_by;
}

Task 2 GenomicRangeQuery

Solution:

It seems this is the only task in the lesson which really needs to use cumulative sums for it’s solution because we need to make a lot of queries to the same set of data. So the queries should be as fast as possible and time for building of additional data structure to increase speed of these queries is not so important, because the more queries we make the lower the part of the time for building of the additional data. In our case this additional data is the array of cumulative sums. Of course it is possible to use hash tables or trees but they are not free they use additional memory, need for more computations to access to theirs elements and most of them has non constant time complexity to access to and adding of their elements. So the simple array of cumulative sums with constant size is more than enough for this task.

What the cumulative (prefix, postfix doesn’t matter) sums are and how to use them you may read in the description of the lesson. Here is my solution in C++:

vector<int> DNA_impact(const string &s, vector<int> &p, 
vector<int> &q) {
enum impact_t { A = 1, C = 2, G = 3, T = 4 };
struct counters_t { int A = 0; int C = 0; int G = 0; };
size_t dna_len = s.length();
size_t p_size = p.size();
vector<int> result(p_size, 0);
vector<counters_t> cumulative_sums;
cumulative_sums.reserve(dna_len+1);
counters_t counters = counters_t();
for(size_t i = 0; i <= dna_len; ++i) {
cumulative_sums.push_back(counters);
switch (s[i]) {
case 'A': { counters.A++; break; }
case 'C': { counters.C++; break; }
case 'G': { counters.G++; break; }
}
}
for(size_t i = 0; i < p_size; ++i) {
int from = p[i], to = q[i]+1;
// If the number of A is increased throughout
// the query the it contains A
if(cumulative_sums[to].A - cumulative_sums[from].A > 0) {
result[i] = impact_t::A;
}
// the same for C and other nucleotides with higher
// impact factor
else if(cumulative_sums[to].C-cumulative_sums[from].C > 0) {
result[i] = impact_t::C;
}
else if(cumulative_sums[to].G-cumulative_sums[from].G > 0) {
result[i] = impact_t::G;
}
else {//one of the counters must be changed, so it is the T
result[i] = impact_t::T;
}
}
return result;
}

Task 3 MinAvgTwoSlice

Solution:

There are two brilliant solutions of this task and both of them are much more advanced than assumed authors of the task and both of them don’t use the Prefix sums.

The first solution proposed here https://codesays.com/2014/solution-to-min-avg-two-slice-by-codility/

It is based on several statements which need for rigorous mathematical proof so I don’t I don’t consider it as the best solution. But it is so wonderful that I should describe it here.

It is based on the following two statements:

(1) There must be some slices, with length of two or three, having the minimal average value among all the slices.

(2) And all the longer slices with minimal average are built up with these 2-element and/or 3-element small slices.

Firstly we will prove the statement (1). In all the following discussion, we assume the slices have two or more element. In every array, there must be at lease one slice, to say S, having the Minimal Average (MA). And PLEASE PAY ATTENTION, the following proof is done with the S, which HAS the global minimal average.

  1. If the length of S is two or three, it follows our conclusion.
  2. If the length of S is odd, we could divide it into a 3-element sub-slice and some 2-element sub-slice. For example, for the slice [1, 2, 3, 4, 5], we could get a 3-element sub-slice [1, 2, 3] and a 2-element sub-slice [4, 5]. Or differently we could get [1, 2] and [3, 4, 5]. But the split does not matter in the following prove.
  • If the sub-slices have different averages, some of them will have smaller average than MA. But it conflicts with the condition that, the MA is known as the global minimal average for all slices. By this contradictory, it’s proved that all the sub-slices MUST have the same average.
  • If all the sub-slices have the same average, the average of them must be MA. So all these sub-slices have overall minimal average and length of two or three.
  1. If the length of S is even, we could divide it into some 2-element sub-slice. For the slice [1, 2, 3, 4], the only possible split is [1, 2] and [3, 4]. Then, similar with the previous case, all these 2-element sub-slices have the global minimal average.

And in the construction in step b, we have already proven the statement (2).

Please read more details by the URL above.

In C++ this solution looks like:

int min_avg_two_slice(vector<int> &v) {
size_t v_size = v.size();
// The minimal average of the first slice
double min_avg_value = (v[0] + v[1]) / 2.0;
// The start position of the first
// slice with minimal average
size_t min_avg_pos = 0;
for (size_t i = 0; i < v_size - 2; ++i) {
// Try the next 2-element slice
double min_avg_slice_2 = (v[i] + v[i + 1]) / 2.0;
if (min_avg_value > min_avg_slice_2) {
min_avg_value = min_avg_slice_2;
min_avg_pos = i;
}
// Try the next 3-element slice
double min_avg_slice_3 = (v[i] + v[i + 1] + v[i + 2]) / 3.0;
if (min_avg_value > min_avg_slice_3) {
min_avg_value = min_avg_slice_3;
min_avg_pos = i;
}
}
// Try the last 2-element slice
double min_avg_last_slice = (v[v_size - 2] + v[v_size - 1]) / 2.0;
if (min_avg_value > min_avg_last_slice) {
min_avg_pos = v_size - 2;
}
return min_avg_pos;
}

This code may be a little bit optimized.

Here is implementation of the same algorithm but without divisions and floating point variables because divisions and operations with floating point numbers are significantly slower than work with integer numbers.

int min_avg_two_slice_optimized(vector<int> &v) {
int v_size = v.size();
if (v_size == 2) return 0;
int min_slice_2 = v[0] + v[1];
int min_slice_2_pos = 0;
int min_slice_3 = INT_MAX;
int min_slice_3_pos = 0;
for (int i = 2; i < v_size; i++) {
int slice_2 = v[i - 1] + v[i];
if (slice_2 < min_slice_2) {
min_slice_2 = slice_2;
min_slice_2_pos = i - 1;
}
int slice_3 = slice_2 + v[i - 2];
if (slice_3 < min_slice_3) {
min_slice_3 = slice_3;
min_slice_3_pos = i - 2;
}
}
int avg_min_2 = min_slice_2 * 3;
int avg_min_3 = min_slice_3 * 2;
if (avg_min_2 == avg_min_3) {
return min(min_slice_2_pos, min_slice_3_pos);
}
else {
return avg_min_2 < avg_min_3 ? min_slice_2_pos : min_slice_3_pos;
}
}

The last and IMHO the best algorithm based on Kadane’s algorithm which is the best theoretically possible algorithm for finding of a subarray with maximal sum of elements. This algorithm on pseudocode looks pretty simple:

maxsofar = 0
maxendinghere = 0
for i = 0 to n-1
maxendinghere = max(maxendinghere + x[i], 0)
maxsofar = max(maxendinghere, maxsofar)

We moving through the array and adding all positive elements to maxendinghere variable which keeps the local maximum, if the local maximum appears more than the global maximum which keeps in maxsofar we save the new maximum as a global one.

In our cases we just need to change calculation of the maximum to calculation of the minimal average. In C++ it looks like:

int min_avg_two_best(vector<int> &v) {    // keep current sum of the processed elements here
int sum = v[0] + v[1];
int v_size = v.size(); int left_index, min_left_index;
double avg_here, min_avg, avg_of_two, avg_with_prev;
// Initialize variables at the first possible slice (v[0:1]).
left_index = min_left_index = 0;
min_avg = (v[0] + v[1]) / 2.0;
// Find min average of every slice that ends at ith element,
// starting at i = 2.
for (int i = 2; i < v_size; ++i) {
sum += v[i];

// average of v[left_index : i]
avg_with_prev = ((double) sum) / (i - left_index + 1);
// average of v[i - 1 : i]
avg_of_two = (v[i - 1] + v[i]) / 2.0;
// Find minimum and update left_index of slice
// (previous left_index or i - 1).
if (avg_of_two < avg_with_prev) {
avg_here = avg_of_two;
left_index = i - 1;
sum = v[i - 1] + v[i];
}
else {
avg_here = avg_with_prev;
}
// Keep track of minimum so far and its left index.
if (avg_here < min_avg) {
min_avg = avg_here;
min_left_index = left_index;
}
}
return min_left_index;
}

Task 4 CountDiv

Solution:

Solution of this task based on a simple mathematical trick: Number of integers in range [1..B] that are divisible by K is B/K. It is easy to notice if you try to divide a sequence by a number, for example lets divide 1,2,3,4,5,6,7,8,9 by 3. You notice that you can divide without remainder only the multiples of 3 that is 3,6,9. So 9/3 = 3 that is the number of integers divisible by 4 in this sequence.

Therefore to solve this task we need to find how many integers divisible by K consists sequence [1..B] then exclude from this number the integers divisible by K which consists sequence [1..A-1].

That is the result = B/K — (A — 1)/K

In C++ the solution will looks like:

int count_div(int A, int B, int K) {    int b = B/K, a = 0;    if (A > 0) {
a = (A – 1) / K;
}
else {
// If A == 0 we need to count
// it in because 0 is divisible by any positive number
b++;
}
return b – a;
}

Go back to the table of contents.

--

--