princetonlips / SketchGraphs
A dataset of 15 million CAD sketches with geometric constraint graphs.
Home Page: https://princetonlips.github.io/SketchGraphs/
License: MIT License
To provide a bit more context, I cloned a fresh repo and downloaded the data as well as the metadata. I ran
python -m sketchgraphs_models.graph.train --dataset_train sg_t16_train.npy
This produced a bus error.
The caveat is that I did not install the Python package via pip install . or python setup.py install, as I'm currently still trying to work out some issues with nvcc. I did install all the relevant packages in requirements.txt. Is the bus error an expected consequence of not installing the CUDA extension?
Pardon my vagueness, as I don't yet know how to articulate this properly, but please try to meet me halfway.
I'm using your dataset for a kind of cognitive experiment, i.e. I just want a sample of cool shapes that look different from one another.
I downloaded the validation set, which contained more than enough interesting shapes. However, I noticed that most of the shapes in it are duplicates in the sense that they're conceptually the same: a single circle, albeit of different sizes; a single rectangle, albeit of different widths/heights; or some combination thereof.
What would you suggest to get a set of "interesting" shapes? I thought about generating a kind of "fingerprint" for each shape, such as the number of lines and so on, and keeping one shape per fingerprint, but that does seem a bit restrictive. I just want some idea that I can play with.
The "gold standard" would be to use the constraint solver and check whether some shapes can be made isomorphic to others by modifying their parameters while still satisfying the constraints, but there's no way I'm doing that, lol. Something quick and dirty would be preferable.
thanks in advance
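Something quick and dirty along the fingerprint lines could look like the sketch below. Note that fingerprint and deduplicate are hypothetical helpers operating on plain lists of entity-type names, not SketchGraphs API; with the real dataset the type names would come from each entity's label.

```python
# A minimal sketch of the "fingerprint" idea (not SketchGraphs code): hash the
# multiset of entity-type names in each sketch and keep one sketch per hash.
from collections import Counter

def fingerprint(entity_types):
    """Order-independent fingerprint: sorted (type, count) pairs."""
    return tuple(sorted(Counter(entity_types).items()))

def deduplicate(sketches):
    """Keep the first sketch seen for each fingerprint."""
    seen, kept = set(), []
    for entities in sketches:
        fp = fingerprint(entities)
        if fp not in seen:
            seen.add(fp)
            kept.append(entities)
    return kept

sketches = [
    ["Circle"],                        # a lone circle
    ["Circle"],                        # another lone circle -> duplicate
    ["Line", "Line", "Line", "Line"],  # a rectangle-like sketch
    ["Circle", "Line", "Line", "Line", "Line"],
]
unique = deduplicate(sketches)  # 3 fingerprint classes survive
```

Extending the fingerprint with quantized size ratios or constraint counts would make it less coarse, at the cost of fewer merges.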
Is anybody else experiencing issues making OnShape calls? I'm currently getting
status_code:500
An internal error has occurred; support code xxxxx
I was previously able to make OnShape calls with the same code. Has there been a change to the FeatureScript microversion?
Hello SketchGraphs Team,
I have been trying to look at the output of your generative model and hit a problem. I'm not sure if perhaps I did something wrong at the training or sampling stage.
I tried the generative model as follows
python -m sketchgraphs_models.graph.train --dataset_train /path/to/data/sg_t16_train.npy
This generated the following files
0219/time_104135
0219/time_104135/args.txt
0219/time_104135/model_state_20.pt
0219/time_104135/model_state_40.pt
0219/time_104135/model_state_30.pt
0219/time_104135/model_state_10.pt
0219/time_104135/eval
0219/time_104135/eval/events.out.tfevents.1613731298.lamboujdevbox.10253.1
0219/time_104135/model_state_50.pt
0219/time_104135/events.out.tfevents.1613731298.lamboujdevbox.10253.0
I then tried to run sketchgraphs_models/graph/sample.py to extract some generated examples from the model. I did this using
python -m sketchgraphs_models.graph.sample --output_path /path/to/output/sampled_data.pkl --model_state path/to/output/0219/time_104135/model_state_50.pt
I hit an error
Exception has occurred: KeyError
'xCenter'
File "xxxx/sketchgraphs/pipeline/graph_model/quantization.py", line 258, in _numerical_features
feature[i + offset] = int(np.searchsorted(edges, params[param_name]))
Digging into what is going on, I see that the params dictionary is generated by the following code:
sketchgraphs/pipeline/graph_model/quantization.py#L314-L315
for i, (param_name, centers) in enumerate(self._bin_centers.get(target, {}).items()):
    features[param_name] = centers[index[i + offset]]
Here target has type <TargetType.NodeCircle: 8>, but the keys of the self._bin_centers dictionary have type <EntityType.Circle: 2>. Hence the features dictionary isn't built up correctly.
I found that adding the code

_entity_label_from_target_type_dict = {
    TargetType.NodeArc: datalib.EntityType.Arc,
    TargetType.NodeCircle: datalib.EntityType.Circle,
    TargetType.NodeLine: datalib.EntityType.Line,
    TargetType.NodePoint: datalib.EntityType.Point,
}

target_entity = _entity_label_from_target_type_dict[target]
for i, (param_name, centers) in enumerate(self._bin_centers.get(target_entity, {}).items()):
    features[param_name] = centers[index[i + offset]]

fixed the problem.
Could you let me know whether I did something wrong at the training or sampling stage? If this fix would be useful to you, I'm happy to create a PR to submit it back to the main repo.
Thank you for your help.
I don't quite understand some schemata, e.g. the Horizontal constraint. A Horizontal constraint with just one reference to local0 makes sense, but in what context would a Horizontal constraint that refers to both local0 and local1 make sense?
Not quite sure I get that.
I have a sequence of sketch entities, i.e. arcs, lines, points, etc.
How do I find the bounding box of these entities put together? Is there a line of code somewhere in the library that does this?
Many thanks
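Absent a library helper, a quick way to do this is to collect bounding points per entity and take the min/max over all of them. The sketch below is hypothetical code: points_of and the dict-based entities are illustrations, not the SketchGraphs entity API.

```python
# A minimal sketch (not SketchGraphs API): compute the joint axis-aligned
# bounding box of a set of entities, assuming each entity can report the 2-D
# points that bound it (line endpoints, circle center +/- radius, etc.).

def points_of(entity):
    """Return a list of (x, y) points that bound the entity."""
    kind = entity["type"]
    if kind == "Line":
        return [entity["start"], entity["end"]]
    if kind == "Circle":
        (cx, cy), r = entity["center"], entity["radius"]
        return [(cx - r, cy - r), (cx + r, cy + r)]
    if kind == "Point":
        return [(entity["x"], entity["y"])]
    raise ValueError(f"unhandled entity type: {kind}")

def bounding_box(entities):
    """Axis-aligned bounding box (min_x, min_y, max_x, max_y) of all entities."""
    pts = [p for e in entities for p in points_of(e)]
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    return min(xs), min(ys), max(xs), max(ys)

entities = [
    {"type": "Line", "start": (0.0, 0.0), "end": (2.0, 1.0)},
    {"type": "Circle", "center": (3.0, 0.0), "radius": 1.0},
]
box = bounding_box(entities)  # (0.0, -1.0, 4.0, 1.0)
```

For arcs, a tight box would require sampling points along the arc or checking which axis extremes of the circle lie within the arc's angular span.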
I tried to train the autoconstraint model, but I don't have a GPU. I fell back to an M1 (MPS), where it takes 30 hours per epoch, and Colab is paid.
If possible, I would appreciate it if you could upload a pre-trained autoconstraint model.
I'm running the autoconstraint task, and I noticed that the default batch size in the released training code is 2048, while the paper states that a batch size of 8192 was used.
I'm currently playing with the code using a batch size of 128 so that things fit on my local desktop GPU, and everything runs smoothly so far. But I was wondering whether the original choice of batch size was mainly to speed up training given a fixed number of epochs, or whether it affected performance somehow (in the sense that it may affect test log-likelihood/precision/recall).
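For what it's worth, if the concern is fitting the paper's larger batch size onto a small GPU, gradient accumulation is a common workaround. The toy below (plain Python, not SketchGraphs code) illustrates that for a loss which is a mean over examples, weighted accumulation over micro-batches reproduces the full-batch gradient, so a smaller per-step batch mainly trades off throughput.

```python
# Toy illustration: accumulating appropriately weighted micro-batch gradients
# reproduces the full-batch gradient for a mean-over-examples loss.

def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
w = 0.5

full = grad_mse(w, xs, ys)

# Two micro-batches of size 2, each weighted by its share of the full batch.
micro = 0.0
for lo in range(0, len(xs), 2):
    micro += grad_mse(w, xs[lo:lo + 2], ys[lo:lo + 2]) * (2 / len(xs))
```

Whether the batch size itself changes final test metrics (via the optimization dynamics, learning-rate interaction, or batch statistics) is a separate empirical question that the authors would need to answer.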
Notice that seq[-1].label is <EntityType.Stop: 8> even if the last element of the sequence is a stop node. To actually get the string label we'd write seq[-1].label.name, which indeed is 'stop'. Is the desired behavior to have two stop nodes, maybe each communicating something different? Otherwise, should we change this line to check that seq[-1].label.name != 'stop' instead?
There's also something interesting going on here, in that the string being compared against is 'Stop' (capital S) while the node appended to the sequence has label 'stop' (lowercase s).
Again, maybe this is the desired behavior; I'm just looking for some clarity on what's going on.
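The pitfall described above can be reproduced in isolation. NodeLabel below is a hypothetical stand-in for the real enum; the point is that comparing an enum member to a string silently fails, and that string comparisons against .name are case-sensitive, so comparing enum members directly avoids both problems.

```python
# Toy illustration (not SketchGraphs code) of enum-vs-string comparison.
from enum import Enum

class NodeLabel(Enum):
    stop = 8

last = NodeLabel.stop

assert last != 'stop'           # an enum member never equals a plain string
assert last.name == 'stop'      # .name yields the string form
assert last.name != 'Stop'      # and string comparison is case-sensitive
assert last == NodeLabel.stop   # comparing members sidesteps both pitfalls
```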
Hi SketchGraphs experts,
I'm trying to train on a subset of the dataset sg_t16_train.npy with some "geometric" duplicates removed. It was a very trivial change to the GraphDataset in sketchgraphs_models/graph/dataset/__init__.py to make the data loader choose only sequences from my de-duplicated list.
When training on the subset, I find that some NaNs are reported in the kappa edge statistics:
Kappa Edges
EdgeAngle: aligned (nan); clockwise (nan); angle (nan)
EdgeLength: direction (1.000); length (-0.043)
EdgeDistance: direction (0.000); halfSpace0 (0.019); halfSpace1 (0.371); length (0.000)
EdgeDiameter: length (-0.013)
EdgeRadius: length (0.033)
Debugging, I see that the NaNs are generated in the cohen_kappa() function in sketchgraphs_models/nn/summary.py. For the EdgeAngle TargetType, the value of self.recorded is [0, 0, 0, 0].
pm = self.prediction_matrix.float()
N = self.recorded.sum().float() <------ N == 0
p_observed = pm.diag().sum() / N <---- NaNs in here
p_expected = torch.dot(pm.sum(dim=0), pm.sum(dim=1)) / (N * N) <---- and here
My understanding of the problem is that the data subset doesn't contain any EdgeAngle constraints, and consequently I can fix this with

if N == 0:
    return 1

Does this sound like a valid solution, or are there other parts of the code I would need to change to work with a subset of the data?
Thank you for your help with this!
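The guard described above can be sketched in isolation. This is a toy re-implementation of Cohen's kappa from a confusion matrix, not the code in sketchgraphs_models/nn/summary.py; returning 1.0 when no examples were recorded is the convention being proposed, not an established one.

```python
# Cohen's kappa with an N == 0 guard: when no examples of a target type were
# recorded, kappa is undefined (0/0), so we return 1.0 instead of NaN.

def cohen_kappa(confusion):
    """confusion[i][j] = count of (true class i, predicted class j)."""
    n = sum(sum(row) for row in confusion)
    if n == 0:
        return 1.0  # no recorded examples: avoid 0/0 -> NaN
    k = len(confusion)
    p_observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(confusion[i]) for i in range(k)]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    if p_expected == 1.0:
        return 1.0  # degenerate: all mass in one class
    return (p_observed - p_expected) / (1.0 - p_expected)

assert cohen_kappa([[0, 0], [0, 0]]) == 1.0  # empty matrix: guarded
assert cohen_kappa([[5, 0], [0, 5]]) == 1.0  # perfect agreement
```

An alternative convention is to skip reporting kappa for absent constraint types entirely rather than report 1.0, which avoids inflating the summary statistics.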
Hey,
I was trying to work through your code, and this strikes me as a bug.
Two entities (an arc and a circle, say) are concentric if they have the same center, but the code tests whether the center and the start point are the same.
I don't think a Circle even has a start_point. Does this sound right?
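For reference, a minimal version of the test being suggested could look like this; the dict-based entities are illustrations, not the SketchGraphs entity classes.

```python
# A minimal sketch (not SketchGraphs code): two circular entities are
# concentric iff their centers coincide, up to a numerical tolerance.
import math

def concentric(a, b, tol=1e-9):
    """True if the two entities share a center (within tol)."""
    return math.dist(a["center"], b["center"]) <= tol

arc = {"type": "Arc", "center": (1.0, 2.0), "radius": 3.0}
circle = {"type": "Circle", "center": (1.0, 2.0), "radius": 0.5}
other = {"type": "Circle", "center": (4.0, 2.0), "radius": 0.5}

assert concentric(arc, circle)
assert not concentric(arc, other)
```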
Hello,
Thank you for sharing an interesting dataset.
I am currently exploring this dataset of sketches. How can I derive a unique label from the sequence information, so that it could be used for a classification or object detection task?
Also, what is the distribution of graphs/sketches? Are any graphs repeated across the train, validation, and test sets?
Thank you,
Anshu
Hi, thanks for the amazing work!
This is a really minor issue, but it would be nice to be able to install the requirements via pip install -r requirements.txt. This currently breaks due to the pytorch>=1.5 and python>=3.7 lines.
I can't find the auxiliary dataset. How do I get the auxiliary dataset used by make_quantization_statistics.py?
When interacting with Onshape's API following the demo, I find that some sketches in the dataset produce the warning "Some constraints are not applicable to the current external references and have not been solved."
Will this affect Onshape's solver? What causes the warning, and how should I deal with it?
Code below:
from sketchgraphs.data import flat_array
import sketchgraphs.data as datalib
import sketchgraphs.onshape.call as onshape_call
dataset = 'validation'
url = R'https://cad.onshape.com/documents/xxxxxx' # onshape document url
seq_data = flat_array.load_dictionary_flat('sequence_data/sg_t16_validation.npy')
seq = seq_data['sequences'][100746]
sketch = datalib.sketch_from_sequence(seq)
onshape_call.add_feature(url, sketch.to_dict(), 'my sketch')
Hello,
I was trying to compile the package with setup.py under Ubuntu 18.04 with PyTorch 1.7.0, but I cannot compile the extensions. I get the following error:
/usr/include/c++/6/tuple: In instantiation of ‘static constexpr bool std::_TC<, _Elements>::_MoveConstructibleTuple() [with _UElements = {std::tuple<at::Tensor, at::Tensor, at::Tensor>}; bool = true; _Elements = {at::Tensor, at::Tensor, at::Tensor}]’:
/usr/include/c++/6/tuple:626:248: required by substitution of ‘template<class ... _UElements, typename std::enable_if<(((std::_TC<(sizeof... (_UElements) == 1), at::Tensor, at::Tensor, at::Tensor>::_NotSameTuple<_UElements ...>() && std::_TC<(1ul == sizeof... (_UElements)), at::Tensor, at::Tensor, at::Tensor>::_MoveConstructibleTuple<_UElements ...>()) && std::_TC<(1ul == sizeof... (_UElements)), at::Tensor, at::Tensor, at::Tensor>::_ImplicitlyMoveConvertibleTuple<_UElements ...>()) && (3ul >= 1)), bool>::type > constexpr std::tuple< >::tuple(_UElements&& ...) [with _UElements = {std::tuple<at::Tensor, at::Tensor, at::Tensor>}; typename std::enable_if<(((std::_TC<(sizeof... (_UElements) == 1), at::Tensor, at::Tensor, at::Tensor>::_NotSameTuple<_UElements ...>() && std::_TC<(1ul == sizeof... (_UElements)), at::Tensor, at::Tensor, at::Tensor>::_MoveConstructibleTuple<_UElements ...>()) && std::_TC<(1ul == sizeof... (_UElements)), at::Tensor, at::Tensor, at::Tensor>::_ImplicitlyMoveConvertibleTuple<_UElements ...>()) && (3ul >= 1)), bool>::type = ]’
/home/parawr/anaconda3/lib/python3.7/site-packages/torch/include/ATen/core/TensorMethods.h:5613:173: required from here
/usr/include/c++/6/tuple:483:67: error: mismatched argument pack lengths while expanding ‘std::is_constructible<_Elements, _UElements&&>’
return _and<is_constructible<_Elements, _UElements&&>...>::value;
Could you help me with this?
Hi SketchGraphs Team,
I'm running the sketchgraphs generative model like this
python -m sketchgraphs_models.graph.train \
--dataset_train /data/sg_t16_train.npy \
--dataset_test /data/sg_t16_test.npy
I'm seeing a worrying warning about a non-writeable NumPy array:
UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/conda/conda-bld/pytorch_1607370156314/work/torch/csrc/utils/tensor_numpy.cpp:141.)
self._offsets = torch.as_tensor(self._offsets).share_memory_()
I'm using Ubuntu 18.04.5 with a Quadro RTX 6000 GPU
Python 3.7.7
Pytorch 1.7.1
Cuda 11.0.221
Numpy 1.19.2
Full output from conda list is at the bottom of this message.
I find I can fix the problem, following the PyTorch thread here, like this in flat_array.py:
def __init__(self, offsets, pickle_data):
"""
pickle_data : array_like
an array of bytes representing the serialized data for all objects.
"""
- self._offsets = offsets
- self._pickle_data = pickle_data
+ self._offsets = np.copy(offsets)
+ self._pickle_data = np.copy(pickle_data)
This gets rid of the warning, but I'm worried it might be having side effects, since I'm seeing some very odd behavior. Could you let me know if you think this fix is sensible? I'm happy to make a PR to submit the change if it looks useful.
Also please let me know if you need any other details of my setup.
Full output from conda list
# packages in environment at /home/lambouj/anaconda3/envs/sketchgraphs_fresh:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
absl-py 0.12.0 pyhd8ed1ab_0 conda-forge
blas 1.0 mkl
blinker 1.4 py_1 conda-forge
brotlipy 0.7.0 py37hb5d75c8_1001 conda-forge
c-ares 1.17.1 h36c2ea0_0 conda-forge
ca-certificates 2020.10.14 0 anaconda
cachetools 4.2.1 pyhd8ed1ab_0 conda-forge
cairo 1.14.12 h8948797_3
certifi 2020.6.20 py37_0 anaconda
cffi 1.14.4 py37h11fe52a_0 conda-forge
chardet 4.0.0 py37h89c1867_1 conda-forge
click 7.1.2 pyh9f0ad1d_0 conda-forge
cryptography 3.2.1 py37hc72a4ac_0 conda-forge
cudatoolkit 11.0.221 h6bb024c_0
cycler 0.10.0 py_2 conda-forge
dbus 1.13.6 he372182_0 conda-forge
expat 2.2.10 he6710b0_2
fontconfig 2.13.1 h6c09931_0
freetype 2.10.4 h5ab3b9f_0
fribidi 1.0.10 h7b6447c_0
glib 2.63.1 h5a9c865_0
google-auth 1.21.3 py_0 conda-forge
google-auth-oauthlib 0.4.1 py_2 conda-forge
graphite2 1.3.14 h23475e2_0
graphviz 2.40.1 h21bd128_2
grpcio 1.33.2 py37haffed2e_2 conda-forge
gst-plugins-base 1.14.5 h0935bb2_2 conda-forge
gstreamer 1.14.5 h36ae1b5_2 conda-forge
harfbuzz 1.8.8 hffaf4a1_0
icu 58.2 he6710b0_3
idna 2.10 pyh9f0ad1d_0 conda-forge
importlib-metadata 3.7.3 py37h89c1867_0 conda-forge
intel-openmp 2020.2 254
jpeg 9b h024ee3a_2
kiwisolver 1.3.1 py37hc928c03_0 conda-forge
lcms2 2.11 h396b838_0
ld_impl_linux-64 2.33.1 h53a641e_7
libffi 3.2.1 hf484d3e_1007
libgcc-ng 9.1.0 hdf63c60_0
libpng 1.6.37 hbc83047_0
libprotobuf 3.13.0.1 h8b12597_0 conda-forge
libstdcxx-ng 9.1.0 hdf63c60_0
libtiff 4.2.0 h3942068_0
libuuid 1.0.3 h1bed415_2
libuv 1.40.0 h7b6447c_0
libwebp-base 1.2.0 h27cfd23_0
libxcb 1.14 h7b6447c_0
libxml2 2.9.10 hb55368b_3
lz4 3.1.0 py37h7b6447c_0 anaconda
lz4-c 1.9.3 h2531618_0
markdown 3.3.4 pyhd8ed1ab_0 conda-forge
matplotlib 3.3.4 py37h89c1867_0 conda-forge
matplotlib-base 3.3.4 py37h62a2d02_0
mkl 2020.2 256
mkl-service 2.3.0 py37he8ac12f_0
mkl_fft 1.3.0 py37h54f3939_0
mkl_random 1.1.1 py37h0573a6f_0
ncurses 6.2 he6710b0_1
ninja 1.10.2 py37hff7bd54_0
numpy 1.19.2 py37h54aff64_0
numpy-base 1.19.2 py37hfa32c7d_0
oauthlib 3.0.1 py_0 conda-forge
olefile 0.46 py37_0
openssl 1.1.1h h7b6447c_0 anaconda
pango 1.42.4 h049681c_0
pcre 8.44 he6710b0_0
pillow 8.1.2 py37he98fc37_0
pip 21.0.1 py37h06a4308_0
pixman 0.40.0 h7b6447c_0
protobuf 3.13.0.1 py37h745909e_1 conda-forge
pyasn1 0.4.8 py_0 conda-forge
pyasn1-modules 0.2.7 py_0 conda-forge
pycparser 2.20 pyh9f0ad1d_2 conda-forge
pygraphviz 1.3 py37h14c3975_1
pyjwt 2.0.1 pyhd8ed1ab_0 conda-forge
pyopenssl 20.0.1 pyhd8ed1ab_0 conda-forge
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pyqt 5.9.2 py37hcca6a23_4 conda-forge
pysocks 1.7.1 py37h89c1867_3 conda-forge
python 3.7.7 hcf32534_0_cpython
python-dateutil 2.8.1 py_0 conda-forge
python_abi 3.7 1_cp37m conda-forge
pytorch 1.7.1 py3.7_cuda11.0.221_cudnn8.0.5_0 pytorch
qt 5.9.7 h5867ecd_1
readline 8.1 h27cfd23_0
requests 2.25.1 pyhd3deb0d_0 conda-forge
requests-oauthlib 1.3.0 pyh9f0ad1d_0 conda-forge
rsa 4.7.2 pyh44b312d_0 conda-forge
setuptools 52.0.0 py37h06a4308_0
sip 4.19.8 py37hf484d3e_1000 conda-forge
six 1.15.0 py37h06a4308_0
sqlite 3.35.2 hdfb4753_0
tensorboard 2.4.1 pyhd8ed1ab_0 conda-forge
tensorboard-plugin-wit 1.8.0 pyh44b312d_0 conda-forge
tk 8.6.10 hbc83047_0
torchaudio 0.7.2 py37 pytorch
torchvision 0.8.2 py37_cu110 pytorch
tornado 6.1 py37h4abf009_0 conda-forge
typing_extensions 3.7.4.3 pyha847dfd_0
urllib3 1.26.4 pyhd8ed1ab_0 conda-forge
werkzeug 1.0.1 pyh9f0ad1d_0 conda-forge
wheel 0.36.2 pyhd3eb1b0_0
xz 5.2.5 h7b6447c_0
zipp 3.4.1 pyhd8ed1ab_0 conda-forge
zlib 1.2.11 h7b6447c_3
zstd 1.4.5 h9ceee32_0
Hey there,
I am trying to parse the dataset. I don't completely grok the external node, especially the fact that there are certain points that are coincident with it.
What is it used for? Is it expected by the Onshape API? What would happen if one submitted a sequence to Onshape without (a) the external node, or (b) the constraints on the external node? I am guessing those constraints are mostly coincidences.
Thanks!
In the documentation, there is a mention that training can be greatly accelerated using the native extensions, but no comment is given on how to build them.
Here is where the documentation says there should be native extensions.