chenglongchen / kaggle-crowdflower
1st Place Solution for CrowdFlower Product Search Results Relevance Competition on Kaggle.
Home Page: https://www.kaggle.com/c/crowdflower-search-relevance
Hi,
Is it possible to run the whole codebase in Python 3?
When I ran run_all.py under Python 3, importing pickle instead of cPickle, I got the following errors:
File "./preprocess.py", line 79, in
pickle.dump(dfTrain, f, -1)
_pickle.PicklingError: Can't pickle <function <lambda> at 0x7f8a35868ae8>: attribute lookup <lambda> on __main__ failed
Traceback (most recent call last):
File "./genFeat_id_feat.py", line 36, in
dfTrain = pickle.load(f)
EOFError: Ran out of input
Traceback (most recent call last):
File "./genFeat_counting_feat.py", line 172, in
dfTrain = pickle.load(f)
EOFError: Ran out of input
Traceback (most recent call last):
File "./genFeat_distance_feat.py", line 236, in
dfTrain = pickle.load(f)
EOFError: Ran out of input
Then I imported dill and replaced the pickle.dump(dfTrain, f, -1) call with:
dill.dump(dfTrain, f, -1)
But I got new errors when loading the files:
File "./genFeat_id_feat.py", line 36, in
dfTrain = pickle.load(f)
ModuleNotFoundError: No module named '__builtin__'
Traceback (most recent call last):
File "./genFeat_counting_feat.py", line 176, in
skf = dill.load(f)
File "/home/mwp141/anaconda3/envs/chenQA/lib/python3.6/site-packages/dill/_dill.py", line 270, in load
return Unpickler(file, ignore=ignore, **kwds).load()
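A note on the first set of errors: the PicklingError has the shape Python 3 reports when it is asked to pickle a function it cannot look up by name, such as a lambda, and the later "EOFError: Ran out of input" messages are what pickle.load raises on an empty file, which is what a failed dump leaves behind. The sketch below only illustrates those two behaviours. The names clean_row, clean_lambda, can_pickle and load_empty_raises_eof are hypothetical and do not come from the repository, and it is an assumption that a lambda is what ended up inside the pickled object here.

```python
import io
import pickle


def clean_row(text):
    """Module-level function: pickle stores it by its importable name."""
    return text.strip().lower()


# Same logic as a lambda: it has no importable name, so pickle rejects it.
clean_lambda = lambda text: text.strip().lower()


def can_pickle(obj):
    """Return True if obj can be serialized with the highest protocol."""
    try:
        pickle.dumps(obj, protocol=-1)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


def load_empty_raises_eof():
    """Loading from an empty stream raises EOFError, as in the log above."""
    try:
        pickle.load(io.BytesIO(b""))
    except EOFError:
        return True
    return False
```

If a lambda is indeed the culprit, replacing it with a named module-level function such as clean_row is usually enough for plain pickle, without switching to dill.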
Hello. I ran into the errors below when running the code. Is there something I have not configured correctly?
[zhouge@fly Feat]$ python run_all.py
Load data...
Done.
Pre-process data...
./preprocess.py:54: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
dfTrain["median_relevance_%d" % (i+1)][dfTrain["median_relevance"]==(i+1)] = 1
Traceback (most recent call last):
File "./preprocess.py", line 67, in
dfTrain = dfTrain.apply(clean, axis=1)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.16.2-py2.7-linux-x86_64.egg/pandas/core/frame.py", line 3718, in apply
return self.apply_standard(f, axis, reduce=reduce)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.16.2-py2.7-linux-x86_64.egg/pandas/core/frame.py", line 3808, in apply_standard
results[i] = func(v)
File "./preprocess.py", line 66, in
clean = lambda line: clean_text(line, drop_html_flag=config.drop_html_flag)
File "/home/zhouge/software/tool/kaggle/Kaggle_CrowdFlower/Code/Feat/nlp_utils.py", line 184, in clean_text
l = drop_html(l)
File "/home/zhouge/software/tool/kaggle/Kaggle_CrowdFlower/Code/Feat/nlp_utils.py", line 211, in drop_html
return BeautifulSoup(html).get_text(separator=" ")
TypeError: ("'NoneType' object is not callable", u'occurred at index 0')
Traceback (most recent call last):
File "./genFeat_id_feat.py", line 35, in
with open(config.processed_train_data_path, "rb") as f:
IOError: [Errno 2] No such file or directory: '../../Feat/solution/train.processed.csv.pkl'
Traceback (most recent call last):
File "./genFeat_counting_feat.py", line 171, in
with open(config.processed_train_data_path, "rb") as f:
IOError: [Errno 2] No such file or directory: '../../Feat/solution/train.processed.csv.pkl'
Traceback (most recent call last):
File "./genFeat_distance_feat.py", line 235, in
with open(config.processed_train_data_path, "rb") as f:
IOError: [Errno 2] No such file or directory: '../../Feat/solution/train.processed.csv.pkl'
Traceback (most recent call last):
File "./genFeat_basic_tfidf_feat.py", line 46, in
from sklearn.manifold import TSNE
ImportError: cannot import name TSNE
Traceback (most recent call last):
File "./genFeat_cooccurrence_tfidf_feat.py", line 144, in
with open(config.processed_train_data_path, "rb") as f:
IOError: [Errno 2] No such file or directory: '../../Feat/solution/train.processed.csv.pkl'
Traceback (most recent call last):
File "./combine_feat[LSA_and_stats_feat_Jun09][Low].py", line 387, in
gen_info(feat_path_name="LSA_and_stats_feat_Jun09")
File "/home/zhouge/software/tool/kaggle/Kaggle_CrowdFlower/Code/Feat/gen_info.py", line 38, in gen_info
with open(config.processed_train_data_path, "rb") as f:
IOError: [Errno 2] No such file or directory: '../../Feat/solution/train.processed.csv.pkl'
Traceback (most recent call last):
File "./combine_feat_[LSA_svd150_and_Jaccard_coef_Jun14][Low].py", line 387, in
gen_info(feat_path_name="LSA_svd150_and_Jaccard_coef_Jun14")
File "/home/zhouge/software/tool/kaggle/Kaggle_CrowdFlower/Code/Feat/gen_info.py", line 38, in gen_info
with open(config.processed_train_data_path, "rb") as f:
IOError: [Errno 2] No such file or directory: '../../Feat/solution/train.processed.csv.pkl'
Traceback (most recent call last):
File "./combine_feat[svd100_and_bow_Jun23][Low].py", line 391, in
gen_info(feat_path_name="svd100_and_bow_Jun23")
File "/home/zhouge/software/tool/kaggle/Kaggle_CrowdFlower/Code/Feat/gen_info.py", line 38, in gen_info
with open(config.processed_train_data_path, "rb") as f:
IOError: [Errno 2] No such file or directory: '../../Feat/solution/train.processed.csv.pkl'
Traceback (most recent call last):
File "./combine_feat[svd100_and_bow_Jun27]_[High].py", line 437, in
gen_info(feat_path_name="svd100_and_bow_Jun27")
File "/home/zhouge/software/tool/kaggle/Kaggle_CrowdFlower/Code/Feat/gen_info.py", line 38, in gen_info
with open(config.processed_train_data_path, "rb") as f:
IOError: [Errno 2] No such file or directory: '../../Feat/solution/train.processed.csv.pkl'
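Only the first traceback in this log is a root failure: preprocess.py stops inside drop_html, so train.processed.csv.pkl is never written and every later script fails with "No such file or directory". One common cause of "'NoneType' object is not callable" on get_text is that the imported BeautifulSoup is the old version 3 package, which has no get_text method and resolves the unknown attribute to None. Whether that is the cause here is an assumption. The sketch below is a hypothetical stand-in for drop_html, not the repository's code: it uses BeautifulSoup 4 (imported from bs4) when available and otherwise falls back to the standard library.

```python
from html.parser import HTMLParser

try:
    from bs4 import BeautifulSoup  # BeautifulSoup 4 (package "beautifulsoup4")
except ImportError:
    BeautifulSoup = None  # use the standard-library fallback below


class _TextCollector(HTMLParser):
    """Stdlib fallback: collects text nodes and ignores tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def drop_html(html):
    """Return the text content of an HTML fragment with tags removed."""
    if BeautifulSoup is not None:
        return BeautifulSoup(html, "html.parser").get_text(separator=" ")
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(collector.parts)
```

For example, drop_html("<p>Red <b>shoes</b></p>") returns the words "Red" and "shoes" with no markup left in the string.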
How do you generate the statistical distance features (described in Sect. 3.2.2 of your notes) for the test data? There are no median_relevance labels for the test data, so how is it possible to group the test data by median_relevance?
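One way such features can be computed without test labels is to group only the training rows by median_relevance and then measure how close each test row is to every group. The source does not confirm that this is what the solution does, so the sketch below is an illustration of that reading only. All function and feature names are hypothetical, and Jaccard similarity on token sets is an assumed stand-in for whatever distance the notes use.

```python
from statistics import mean


def jaccard(a, b):
    """Jaccard similarity between two token collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_by_label(train_tokens, train_labels):
    """Map each relevance label to the training rows that carry it."""
    groups = {}
    for tokens, label in zip(train_tokens, train_labels):
        groups.setdefault(label, []).append(tokens)
    return groups


def distance_stats(tokens, groups, labels=(1, 2, 3, 4)):
    """Summarize similarity of one row to each labeled training group."""
    feats = {}
    for label in labels:
        sims = [jaccard(tokens, other) for other in groups.get(label, [])]
        feats["mean_sim_rel%d" % label] = mean(sims) if sims else 0.0
        feats["max_sim_rel%d" % label] = max(sims) if sims else 0.0
    return feats
```

A test row never needs its own label under this scheme: the labels come entirely from the training groups. When the same features are built for training rows, each row would have to be excluded from its own group to avoid leaking its label.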
Hi, I am trying to reproduce your solution, but the following error was raised when I ran
python3 genFeat_id_feat.py
Generate id features...
For cross-validation...
Traceback (most recent call last):
File "genFeat_id_feat.py", line 56, in
for fold, (validInd, trainInd) in enumerate(skf[run]):
File "/usr/local/lib/python3.5/dist-packages/sklearn/cross_validation.py", line 82, in iter
ind = np.arange(self.n)
AttributeError: 'StratifiedKFold' object has no attribute 'n'
So this looks like a deprecation issue with that attribute. Could you share the code used to generate the two files stratifiedKFold.query.pkl and stratifiedKFold.relevance.pkl?
Thanks!
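Until the original generation code is shared, the fold files can be approximated with the current scikit-learn API, where StratifiedKFold lives in sklearn.model_selection and the labels are passed to split() rather than to the constructor. The sketch below is not the author's script. The function names, the number of runs and folds, and the seed are assumptions, and the resulting folds will not match the original ones.

```python
import pickle

import numpy as np
from sklearn.model_selection import StratifiedKFold


def make_folds(labels, n_runs=3, n_folds=3, seed=2015):
    """Build skf[run] as a list of (train_idx, test_idx) index pairs."""
    labels = np.asarray(labels)
    # split() only uses X for its length, so a placeholder is enough.
    placeholder_X = np.zeros((len(labels), 1))
    skf = []
    for run in range(n_runs):
        splitter = StratifiedKFold(
            n_splits=n_folds, shuffle=True, random_state=seed + run
        )
        skf.append(list(splitter.split(placeholder_X, labels)))
    return skf


def save_folds(skf, path):
    """Pickle plain index arrays, which load under any scikit-learn version."""
    with open(path, "wb") as f:
        pickle.dump(skf, f, -1)
```

Storing plain index arrays instead of the StratifiedKFold object keeps a loop such as `for fold, (a, b) in enumerate(skf[run])` working while avoiding the version-dependent attributes that caused the AttributeError. Passing the median_relevance column as labels would correspond to the relevance file, and the query column to the query file.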