preamble
Okay.It's been a long time.The second half of this post is finally here!
Previous Post Link:Technology-Engineering: Constructing a Data Artifact - Up
In the previous post, we wrote a batch data generator, and in this post, you'll learn how some of the common ones are built.
Common data
What are some of the common data to be built? It's just trees, graphs, multiple measurements, etc. We'll explore some of them today and fill in the holes when I think of something else
random number
rand and srand
The two functions areC
The random number function in the header file<cstdlib>
Under Definitions, function prototypes:
int rand();
void srand( unsigned seed );
rand()
is a pseudo-randomized function that returns a\([0,RAND\_MAX]\) The number in the middle.
srand( unsigned seed )
is used to set the random seed, the initial random seed is\(1\)。
So, we commonly usesrand(time(0))
to initialize the random seed, which ensures that the random number comes out differently each time it is run.
stillrand()
There's a big problem. Let's see.cppreference A quote from the top:
...... does not guarantee the quality of the random number series generated. In the past, some
rand()
There are serious flaws in the randomness, distribution, and periodicity of the resulting sequence (in one well-known example, the lowest bit between calls is simply in the\(1\) together with\(0\) (switching between them).
Not recommended for serious random number generation needsrand()
. RecommendedC++11
The random number generation facility replaces therand()
。 (C++11
up)
From this text we know thatrand()
The quality of the generated random numbers is not very high, it is fine for daily use, you should look for another way when building data.
mt19937
this isC++11
One of the random number engines in the currentOI
A common tool for generating random numbers in
This random number engine usesMersenne rotation algorithmThe cycle length is up to\(2^{19937}-1\)That's where it got its name.
Its domain of values is roughly the entire\(\text{int}\) range, and it also has\(64\) The version of the bit -mt19937_64
。
We set the random seed at definition time, again using time as the random seed:
std::mt19937_64 rnd(std::chrono::steady_clock::now().time_since_epoch().count());
It is then called with thernd()
Ready to go.
We use this same random number engine next.
At the same time, we can also write a function in the\([l,r]\) A function that generates random numbers between
ll rand(ll l, ll r) {
return rnd() % (r - l + 1) + l;
}
confuse
Disrupting can obviously be done with<algorithm>
lowershuffle
function, please refer to other syntax information for the use of this function:
template<class RandomIt>
void shuffle(RandomIt first, RandomIt last) {
std::default_random_engine generator(rnd());
std::shuffle(first, last, generator);
}
template<class RandomIt>
void shuffle(RandomIt* first, RandomIt* last) {
std::default_random_engine generator(rnd());
std::shuffle(first, last, generator);
}
Notice that the native pointer is notclass template
, so we need to bias specialization for the case of native pointers.
multi-test
It is very common to find questions on multiple tests where some of the parameters are usually given for the\(\footnotesize\sum\) as the data range, then we need to [randomize out the\(n\) and is\(s\) The number of]
In addition, these numbers will often have(of Gods) descend to the world of mortals, this lower bound is generally a smaller constant, and it is common that it should be\(1\) until (a time)\(3\)。
So the problem shifts to [random out\(n\) and is\(s\) of numbers, each of which is at least\(mn\)】。
It's easy to think of just randomizing out the prefixes and doing the diff again.\(mn\) The restriction is equivalent to [the difference between the sum of the two prefixes randomized out is at least\(mn\)], and sostd::set
A heavy sentence will suffice.
std::vector<ll> split(ll n, ll s, ll mn = 1) {
assert(n > 0 && s >= n * mn);
std::vector<ll> res;
std::set<ll> st;
while ((ll) () < n) {
assert((ll) () < s - mn + 1);
int tmp = () ? s : rand(mn, s);
if ((tmp).second) {
res.push_back(tmp);
for (ll i = 1; i < mn; i++) {
if(tmp - i >= mn) (tmp - i);
if(tmp + i <= s) (tmp + i);
}
}
}
std::sort((), ());
for (ll i = () - 1; i >= 1; i--) res[i] -= res[i - 1];
return res;
}
Note that this code requires\(mn\gt1\)Otherwise it would not have been possible to build a new system for the\(0\) The number of numbers that are (beingstd::set
(Ruled out), change the code/handwrite it yourself if needed.
Also, this code relies purely on randomization. That is, if there is only one optional number left, we think to\(O(s)\) times to pick this number, so the complexity\(O(s\log s)\), which is completely unaffordable when the data range is too large. But as we said.\(mn\) is generally a small constant, and besides\(n\) The order of magnitude should be higher than\(s\) It's a lot smaller (otherwise each number would be too small), so it's extremely unlikely that this will be the case, and if there's a real need for each number to be small then write your own.
set up
How do you build a random tree? Easy to think of\(\text{prufer}\) Sequence: first randomize the\(\text{prufer}\) Sequence, then\(\text{prufer to tree}\) That's it. Expect depth.\(O(\log n)\)If your needs are more complex, you can research/use other tools, such as ouuan's\(\text{Tree Generator}\)。
std::vector<std::pair<int, int>> tree(int n) {
assert(n > 0);
std::vector<std::pair<int, int>> res;
if (n == 1) return res;
std::vector<int> prufer(n - 2);
for (auto &x : prufer) x = rand(1, n);
std::vector<int> deg(n + 1, 1);
for (int x : prufer) ++deg[x];
int cur = 0;
while (deg[++cur] != 1);
int leaf = cur;
for (int x : prufer) {
res.push_back({x, leaf});
if (--deg[x] == 1 && x < cur) leaf = x;
else {
while (deg[++cur] != 1);
leaf = cur;
}
}
res.push_back({n, leaf});
return res;
}
seek
Just the connectivity map, build the rest yourself.
Just add random edges to the tree withstd::set
Sentencing off heavy edges.
You can add aminor optimizationThat's when\(m>\frac{n(n-1)}{2}-(n-1)\) When, at this point, the graph is very dense and the randomized complexity is high, it is sufficient to build the complete graph, followed by randomly deleting edges. (But this would mean that\(n\) (It's very small and doesn't seem to be a big problem).
std::vector<std::pair<int, int>> graph(int n, ll m) {
assert(n > 0);
assert(m >= n - 1 && m <= n * (n - 1) / 2);
if(m > n * (n - 1) / 2 - (n - 1)){
std::vector<std::pair<int, int>> res;
for(int u = 1; u <= n; u++){
for(int v = u + 1; v <= n; v++){
res.push_back({u, v});
}
}
shuffle((),());
for(int i = m + 1; i <= n * (n - 1) / 2; i++) res.pop_back();
return res;
}
auto res = tree(n);
std::set<std::pair<int, int>> st;
for (const auto &e : res) {
(e);
({, });
}
m -= (n - 1);
while (m--) {
generate_edge:;
int u = rand(1, n), v = u;
while (u == v) v = rand(1, n);
if (!({u, v}).second || !({v, u}).second) goto generate_edge;
res.push_back({u, v});
}
return res;
}
herald
This work utilizes all content except for that within the code blockCC BY-SA 4.0 License is granted and additional terms can be used.
All code in this work is subject toMIT License Protection, copyright notice below:
MIT License
Copyright (c) 2024 godmoo
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
To get the full code, visit:Hint。