Задача на строки(хеши)

azatu · 11.Январь.2024 10:33:13

В чем ошибка? Я понял что проблема в модулях, но хз конкретно где

#pragma GCC optimize("Ofast,no-stack-protector,unroll-loops,fast-math")
#pragma GCC target("sse,sse2,sse3,ssse3,sse4.1,sse4.2,avx,avx2,popcnt,tune=native")
#include <iostream>
#include <vector>
#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <time.h>
#include <set>
#include <cstring>
#include <queue>
#include <map>
#include <unordered_map>
#include <stack>
#include <numeric>
#define ll long long
#define pb push_back
using namespace std;
using namespace __gnu_pbds;
const ll mod = 1e9+7;

 
void fc(){
	ll _x=257;
	string s,t;
	cin>>t>>s;
	ll tz=t.size(),sz=s.size();
	ll x[sz+3];
	x[0]=1;
	for (int i=1; i<=max(sz,tz); i++){
		x[i]=x[i-1]*_x%mod;
	}
	ll h_t[tz+3],h[sz+3];
	h_t[0]=0;
	h[0]=0;
	set<ll> st;
	for (int i=1; i<=tz; i++){
		h_t[i]=h_t[i-1]*_x+t[i-1];
		h_t[i]%=mod;
	}
	for (int i=1; i<=sz; i++){
		h[i]=h[i-1]*_x+s[i-1];
		h[i]%=mod;
	}
	cout<<"\n";
	for (int i=1; i<=tz; i++){
		ll ig=(mod+h_t[i-1]*x[tz-i+1]+(h_t[tz]-h_t[i]*x[tz-i]+mod)%mod);
		
		ig%=mod;
		for (char h='a'; h<='z'; h++){
			ll res=ig+int(h)*x[tz-i]%mod;
			res%=mod;
			st.insert(res);
		}
		for (char h='A'; h<='Z'; h++){
			ll res=ig+int(h)*x[tz-i]%mod;
			res%=mod;
			st.insert(res);
		}
		
		st.insert(ig);
	}
	
	for (int i=1; i<=sz-tz+1; i++){
		ll res=(h[i+tz-1]-h[i-1]*x[tz]+mod)%mod;
		if (st.count(res)){
			cout<<i<<" ";
		}
	}
}



int main(){
	ios_base::sync_with_stdio(0);
    cout.tie(0);
	ll T=1;
	
// 	cin>>T;
	while (T --> 0)	fc();
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	

}

Nick · 11.Январь.2024 13:09:42

В строке, где выводишь индексы в ответ, res должен выглядеть так:
ll res=((h[i+tz-1]-h[i-1]*x[tz]+mod)%mod + mod) % mod;

Т.к. ты в этой строчке вычитаешь, у тебя может выйти отрицательное число, а значит дальше пойдет не тот результат взятия по модулю.

Вообще советую в задачах на хеши создавать отдельную структуру для чисел, где перегруженные операторы будут автоматически результат по модулю брать, кучу времени на дебаг спасешь)

azatu · 11.Январь.2024 15:14:03

Главная ошибка ушла, но код все равно не правильный. С сетом по времени не проходит, поставил анорд. мап, но все равно не правильно. В чем ошибка?

#pragma GCC optimize("Ofast,no-stack-protector,unroll-loops,fast-math")
#pragma GCC target("sse,sse2,sse3,ssse3,sse4.1,sse4.2,avx,avx2,popcnt,tune=native")
#include <iostream>
#include <vector>
#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <time.h>
#include <set>
#include <cstring>
#include <queue>
#include <map>
#include <unordered_map>
#include <stack>
#include <numeric>
#define ll long long
#define pb push_back
using namespace std;
const ll mod = 1e9+7;

 
void fc(){
	ll _x=257;
	string s,t;
	cin>>t>>s;
	ll tz=t.size(),sz=s.size();
	if (tz>sz){
		cout<<"0";
		return;
	}
	ll x[sz+3];
	x[0]=1;
	for (int i=1; i<=sz; i++){
		x[i]=x[i-1]*_x%mod;
	}
	ll h_t[tz+3],h[sz+3];
	h_t[0]=0;
	h[0]=0;
	set <ll> st;
	for (int i=1; i<=tz; i++){
		h_t[i]=h_t[i-1]*_x+t[i-1];
		h_t[i]%=mod;
	}
	for (int i=1; i<=sz; i++){
		h[i]=h[i-1]*_x+s[i-1];
		h[i]%=mod;
	}
	unordered_map <ll,ll> mp;
	for (int i=1; i<=tz; i++){
		ll ig=(mod+h_t[i-1]*x[tz-i+1]+((h_t[tz]-h_t[i]*x[tz-i]+mod)%mod+mod)%mod);
		
		ig%=mod;
		for (char h='a'; h<='z'; h++){
			ll res=ig+int(h)*x[tz-i]%mod;
			res%=mod;
//			st.insert(res);
			mp[res]++;
			res=ig+(int(h)-32)*x[tz-i]%mod;
			res%=mod;
			mp[res]++;
//			st.insert(res);
		}
		
		st.insert(ig);
	}
	vector <ll> itog;
	for (int i=1; i<=sz-tz+1; i++){
		ll res=((h[i+tz-1]-h[i-1]*x[tz]+mod)%mod+mod)%mod;
		if (mp[res]){
			itog.push_back(i);
		}
	}
	cout<<itog.size()<<"\n";
	for (int i=0; i<itog.size(); i++){
		cout<<itog[i]<<" ";
	}
}



int main(){
	ios_base::sync_with_stdio(0);
    cout.tie(0);
	ll T=1;
	
// 	cin>>T;
	while (T --> 0)	fc();
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	

}

Anton · 11.Январь.2024 17:15:04

@eudaimon can you analyze azatu’s code and give him any specific recommendations? do not give general advice, tailor your response to his specific code.

sophia · 11.Январь.2024 17:15:32

@Anton, I’ll focus on the specific code provided by azatu and analyze the potential issues with it.

Looking at the code snippet, azatu is implementing a solution to a string hashing problem, where the goal is to find all the occurrences of string t in string s with exactly one character allowed to differ. Here are some specific recommendations:

Optimization with unordered_map:
You’ve already switched from set<ll> to unordered_map<ll, ll>, which is good for improving average time complexity from O(log n) to O(1) on hash table operations. However, the issue may not be with the data structure, but with how many elements you’re inserting into it. Since you’re hashing every possible single-character modification of t, this can be very large.
Potential Integer Overflow:
Ensure there’s no integer overflow when performing operations like h_t[i-1]*x[tz-i+1]. The multiplication might exceed the bounds of a long long before you take the modulo. You might want to cast the operands to a larger type or use a modular multiplication function.
Check the Hashing Logic:
Double-check the logic behind how you generate the hashes, especially the parts where you add and subtract and then take the modulo. Hashing is very sensitive to small mistakes, and an off-by-one error or incorrect modular arithmetic can lead to wrong results.
Debugging with Known Test Cases:
Use known test cases where you know the expected output. Step through your code using a debugger or print statements to ensure each part of your code is functioning as you expect.
Corner Cases:
Make sure you handle corner cases. For example, when tz > sz, you correctly return “0”, but do you handle the case correctly when tz == sz? Also, ensure that you handle the case where t and s are identical.
Code Cleanliness:
There are many commented-out lines and unused includes, which might not cause functional issues but can make the code harder to read and maintain. Cleaning up the code can help you and others understand it better and may help you spot mistakes.

By focusing on these specific parts of the code, you may find the reason why it’s not producing the correct results or why it’s not passing the time constraints. Good luck with debugging!

azatu · 12.Январь.2024 14:55:04

Повторно прошу, столько времени искал ошибку, не могу найти. поменял тип данных, изменял модули, изменял _х - ничего не помогло.

#pragma GCC optimize("Ofast,no-stack-protector,unroll-loops,fast-math")
#pragma GCC target("sse,sse2,sse3,ssse3,sse4.1,sse4.2,avx,avx2,popcnt,tune=native")
#include <iostream>
#include <vector>
#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <time.h>
#include <set>
#include <cstring>
#include <queue>
#include <map>
#include <unordered_map>
#include <stack>
#include <numeric>
#define ll long long
#define pb push_back
using namespace std;
const ll mod = 65610001;

vector <bool> mp(mod);
void fc(){
	ll _x=257;
	string s="",t="";
// 	for (int i=0; i<=1e6; i++){
// 		t+="a";
// 		s+="a";
// 	}
	cin>>t>>s;
	ll tz=t.size(),sz=s.size();
	if (tz>sz){
		cout<<"0";
		return;
	}
	ll x[sz+3];
	x[0]=1;
	for (int i=1; i<=sz; i++){
		x[i]=x[i-1]*_x%mod;
	}
	ll h_t[tz+3],h[sz+3];
	h_t[0]=0;
	h[0]=0;
	for (int i=1; i<=tz; i++){
		h_t[i]=h_t[i-1]*_x%mod+t[i-1];
		h_t[i]%=mod;
	}
	for (int i=1; i<=sz; i++){
		h[i]=h[i-1]*_x%mod+s[i-1];
		h[i]%=mod;
	}
	
	for (int i=1; i<=tz; i++){
		ll ig=(mod+h_t[i-1]*x[tz-i+1]%mod+((h_t[tz]-h_t[i]*x[tz-i]%mod+mod)%mod+mod)%mod);
		ig%=mod;
		for (char h='a'; h<='z'; h++){
			ll res=ig+int(h)*x[tz-i]%mod;
			res%=mod;
			mp[res]=1;
			res=ig+(int(h)-32)*x[tz-i]%mod;
			res%=mod;
			mp[res]=1;
		}
		
	}
	
	vector <ll> itog;
	for (int i=1; i<=sz-tz+1; i++){
		ll res=((h[i+tz-1]-h[i-1]*x[tz]%mod+mod)%mod+mod)%mod;
		if (mp[res]){
			itog.push_back(i);
		}
	}
	cout<<itog.size()<<"\n";
	for (int i=0; i<itog.size(); i++){
		cout<<itog[i]<<" ";
	}
}



int main(){
	ios_base::sync_with_stdio(0);
    cout.tie(0);
	ll T=1;
	
// 	cin>>T;
	while (T --> 0)	fc();
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	

}

sophia · 12.Январь.2024 14:55:06

I’m afraid you’ve reached this week’s quota for bot interactions

Nick · 12.Январь.2024 17:16:48

Я не смог понять, что именно ведет к ошибке в твоем случае, но когда менял модули и основания хешей, почему-то сильно скакало кол-во пройденных тестов.

Это не совсем нормально, по идее если модуль и основание остаются взаимопростыми, то код должен валиться на одном и том же тесте.

Попробуй пострессить (почитать что это такое можешь тут) или подумать над новой идеей для задачи

Еще есть вариант написать двойные хеши, возможно ввиду кол-ва перебираемых тобой вариантов строки возникают многочисленные коллизии, и тесты валятся, так что попробуй

Anton · 12.Январь.2024 19:45:47

@eudaimon can you find bugs in azatu’s code? point to specific lines of concern

sophia · 12.Январь.2024 19:46:08

@Anton, I’ll examine the code and point out specific lines that could be problematic:

Line with ig Calculation:
The line where ig is calculated looks overly complex and is potentially incorrect due to excessive modulo operations and additions:
```
ll ig=(mod+h_t[i-1]*x[tz-i+1]%mod+((h_t[tz]-h_t[i]*x[tz-i]%mod+mod)%mod+mod)%mod);
```
It should probably be simplified to something like:
```
ll ig = (h_t[i-1]*x[tz-i] + mod - (h_t[i]*x[tz-i-1])%mod + mod)%mod;
```
Make sure that the indices for x are correct and you are not going out of bounds or missing a character.
Line with h Hashes:
There might be a mistake in calculating the hashes for s and t. On this line:
```
ll res=((h[i+tz-1]-h[i-1]*x[tz]%mod+mod)%mod+mod)%mod;
```
The subtraction and multiplication should also be modded separately to avoid negative values:
```
ll res = (h[i+tz-1] - (h[i-1]*x[tz])%mod + mod) % mod;
```
Using bool Vector for Hashes:
You are using a vector<bool> to store the hashes, which is indexed by the hash value itself:
```
vector <bool> mp(mod);
//...
mp[res]=1;
```
This is potentially problematic because hash collisions can occur, and there might be multiple strings with the same hash value. Using a vector indexed by the hash may not be the best approach. Consider using an unordered_map with the hash as the key and the starting positions as values.
Handling of Upper and Lowercase Characters:
When you calculate the hash values for all possible single-character modifications, you’re adding both lowercase and uppercase characters. Ensure that this is intended and correctly matches the problem statement.
Potential Array Out-of-Bounds Access:
Be cautious with array indices. For example, here:
```
ll x[sz+3];
```
You’re declaring x with sz+3 elements, but in the loop, you’re iterating up to sz:
```
for (int i=1; i<=sz; i++){
    x[i]=x[i-1]*_x%mod;
}
```
Ensure that you’re not accessing x out of bounds in other parts of the code, especially if sz is close to the actual size of x.

By addressing these specific issues, you might be able to correct the bugs in your code. Remember to thoroughly test your code after each change to ensure that you’re moving in the right direction.

Тема		Ответов	Просм.
Вопрос насчет хэшей Информатика	11	299	11.10.2022
USACO Robot Instructions Информатика	21	637	05.03.2023
Задача с codeforces, нужна помощь Информатика	3	562	31.10.2022
Хорватская задача. В чем ошибка в моем коде? Информатика	7	431	07.04.2022
Задача на дп, выводит ошибку при выполнении программы, хотя на компиляторе выдает правильный результат Информатика	5	286	28.06.2023
Помогите оптимизировать код Информатика	4	492	11.07.2023
Segment tree beats, Tima's problemset Информатика	7	390	29.01.2024
Проблема с нюансами или Поворот на 45 градусов Информатика процесс-подготовки	8	351	03.10.2022
Задача на ДО с массовыми операциями Информатика	3	150	19.05.2025
Задача на декартово дерево Информатика	13	455	15.01.2023

Задача на строки(хеши)

Связанные темы