Comments (17)
It would be amazing to learn how to train from scratch, i.e. on a bunch of folders with SVGs.
Super excited by this project. :D
Hello, Alex.
Great work and thank you for this library :)
I have been playing around with it and, inspired by preprocess.py, modified it into a (much needed) simple script to batch convert SVGs to *.pkl tensors.
To anyone interested:
from concurrent import futures
import os
from argparse import ArgumentParser
import logging
from tqdm import tqdm
import glob
import pickle
import sys

sys.path.append('..')
from deepsvg.svglib.svg import SVG


def convert_svg(svg_file, output_folder):
    filename = os.path.splitext(os.path.basename(svg_file))[0]
    svg = SVG.load_svg(svg_file)
    tensor_data = svg.to_tensor()
    with open(os.path.join(output_folder, f"{filename}.pkl"), "wb") as f:
        dict_data = {
            "tensors": [[tensor_data]],
            "fillings": [0]
        }
        pickle.dump(dict_data, f, pickle.HIGHEST_PROTOCOL)


def main(args):
    with futures.ThreadPoolExecutor(max_workers=args.workers) as executor:
        svg_files = glob.glob(os.path.join(args.input_folder, "*.svg"))
        with tqdm(total=len(svg_files)) as pbar:
            preprocess_requests = [executor.submit(convert_svg, svg_file, args.output_folder)
                                   for svg_file in svg_files]
            for _ in futures.as_completed(preprocess_requests):
                pbar.update(1)
    logging.info("SVG files' conversion to tensors complete.")


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    parser = ArgumentParser()
    parser.add_argument("--input_folder")
    parser.add_argument("--output_folder")
    parser.add_argument("--workers", default=4, type=int)
    args = parser.parse_args()

    if not os.path.exists(args.output_folder):
        os.makedirs(args.output_folder)

    main(args)
All the best.
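If it helps anyone debugging their pipeline, here is a minimal round-trip check of the pickle format the script writes (a plain nested list stands in for the tensor returned by svg.to_tensor()):

```python
import os
import pickle
import tempfile

# Stand-in for svg.to_tensor(); any picklable object works for this check.
tensor_data = [[0, 1, 2], [3, 4, 5]]

dict_data = {
    "tensors": [[tensor_data]],   # list of groups, each a list of path tensors
    "fillings": [0],              # one filling flag per group
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.pkl")
    with open(path, "wb") as f:
        pickle.dump(dict_data, f, pickle.HIGHEST_PROTOCOL)
    with open(path, "rb") as f:
        loaded = pickle.load(f)

assert loaded["tensors"][0][0] == tensor_data
assert loaded["fillings"] == [0]
```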
Will close when I've written the custom dataset creation notebook :)
Many months ago, I retrained DeepSVG from scratch and developed a new library for preprocessing SVGs. Please ping me (here or on Twitter: @wichmaennchen) if the problems persist. I may be able to invest some time and help out.
Hey, great question!
I'll write a simple notebook this evening or tomorrow explaining the process step by step.
- You're correct, you need to have individual glyphs in SVG format. There is already a method implemented to convert from FontForge's SplineSet to SVG. So if you don't use FontForge, you will need to convert .ttf to .svg yourself.
- It's almost like that, although you also need to add data augmentation. All tensors are then added to a dictionary and saved in .pkl format.
- Please create a separate issue for this! I'll add your use case to the notebook, although it may take a little longer since I didn't have to use it yet, but it sounds very feasible :)
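A rough sketch of the second point, producing several augmented variants of each tensor and pickling them together (the augment helper and its jitter ranges below are illustrative stand-ins, not DeepSVG's actual augmentation code, which as far as I understand transforms the SVG object before to_tensor()):

```python
import pickle
import random

def augment(tensor, n_variants=4):
    """Toy augmentation: jitter every point a little. This only
    illustrates the shape of the data; the real augmentation operates
    on the SVG itself (random scaling/translation)."""
    variants = []
    for _ in range(n_variants):
        dx, dy = random.uniform(-2, 2), random.uniform(-2, 2)
        variants.append([[x + dx, y + dy] for x, y in tensor])
    return variants

# Stand-in for a glyph tensor: a list of (x, y) points.
glyph_tensor = [[0.0, 0.0], [10.0, 0.0], [10.0, 10.0]]

dict_data = {
    "tensors": [augment(glyph_tensor)],  # one list of augmented variants per group
    "fillings": [0],
}

with open("glyph.pkl", "wb") as f:
    pickle.dump(dict_data, f, pickle.HIGHEST_PROTOCOL)
```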
OK, I will create a new issue about point 3.
Thank you for your reply!
You are so nice!
Hi Alex,
Great work. Congrats!
I want to try the network on my own data, which are raster images.
I dynamically convert them to SVG using potrace (let me know if there exists a more efficient way please!), and then use the above code to get the tensor:
svg = SVG.load_svg("some_char.svg").normalize().zoom(0.9).canonicalize().simplify_heuristic()
tensor_data = svg.to_tensor()
svg_data = SVGTensor.from_data(tensor_data)
Here are my questions:
1- Do I need these: ".normalize().zoom(0.9).canonicalize().simplify_heuristic()"?
2- Is there any other preprocessing I have to do?
3- How can I convert svg_data to the format you pass to the model? They are tensors, but this one is an SVGTensor.
Thanks
@alexandre01 you mentioned that you have already added the "custom dataset creation notebook" but I am not sure which one it is. Am I missing something?
Hi @alexandre01
Thank you for sharing this repo! Very interesting work!
I'm also trying to train deepsvg on a custom dataset, but I'm unsure how the data should be structured.
I've tried to train and got into an indexing issue I don't fully understand:
Traceback (most recent call last):
File "c:\users\george.profenza\.pyenv\pyenv-win\versions\3.7.4-amd64\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\users\george.profenza\.pyenv\pyenv-win\versions\3.7.4-amd64\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\george.profenza\Downloads\gp\deepsvg\deepsvg\train.py", line 150, in <module>
train(cfg, model_name, experiment_name, log_dir=args.log_dir, debug=args.debug, resume=args.resume)
File "C:\Users\george.profenza\Downloads\gp\deepsvg\deepsvg\train.py", line 26, in train
dataset = dataset_load_function(cfg)
File "C:\Users\george.profenza\Downloads\gp\deepsvg\deepsvg\svgtensor_dataset.py", line 242, in load_dataset
cfg.filter_uni, cfg.filter_platform, cfg.filter_category, cfg.train_ratio)
File "C:\Users\george.profenza\Downloads\gp\deepsvg\deepsvg\svgtensor_dataset.py", line 57, in __init__
loaded_tensor = self._load_tensor(self.idx_to_id(0))
File "C:\Users\george.profenza\Downloads\gp\deepsvg\deepsvg\svgtensor_dataset.py", line 111, in idx_to_id
return self.df.iloc[idx].id
File "C:\Users\george.profenza\Downloads\gp\deepsvg-env\lib\site-packages\pandas\core\indexing.py", line 931, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "C:\Users\george.profenza\Downloads\gp\deepsvg-env\lib\site-packages\pandas\core\indexing.py", line 1566, in _getitem_axis
self._validate_integer(key, axis)
File "C:\Users\george.profenza\Downloads\gp\deepsvg-env\lib\site-packages\pandas\core\indexing.py", line 1500, in _validate_integer
raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds
(self.idx_to_id(0) seems to be the issue)
I've tried using the preprocess script and noticed it's augmenting svgs, but it wasn't saving the pickle files.
I've attempted to use the comments above and got it to save .pkl files in different ways:
just using to_tensor():

tensor_data = svg.to_tensor()
with open(os.path.join(output_folder, f"{filename}.pkl"), "wb") as f:
    dict_data = {
        "tensors": [[tensor_data]],
        "fillings": [0]
    }
    pickle.dump(dict_data, f, pickle.HIGHEST_PROTOCOL)
a variation of the above (spotted in the svglib notebook): tensor_data = svg.copy().numericalize().to_tensor()
and also using SVGTensor:
tensor_data = svg.copy().numericalize().to_tensor()
tensor_data = SVGTensor.from_data(tensor_data)
I'm not sure what the correct method of converting the processed svg to pickle is so I can train.
Printing the pandas object from the loaded fonts dataset I do see relevant data:
self.df.iloc.obj id binary_fp uni total_len nb_groups len_groups max_len_group
0 5658657305760304754_99 5658657305760304754 99 22 1 [22] 22
1 11280665330421698568_108 11280665330421698568 108 19 1 [19] 19
2 6786671966848343352_97 6786671966848343352 97 27 2 [18, 9] 18
3 17302457245611577159_121 17302457245611577159 121 22 1 [22] 22
5 18110689581214114864_66 18110689581214114864 66 44 3 [27, 9, 8] 27
... ... ... ... ... ... ... ...
99994 13209403418406559934_117 13209403418406559934 117 15 1 [15] 15
99996 9524159807492630733_50 9524159807492630733 50 23 1 [23] 23
99997 17351593260041237331_51 17351593260041237331 51 49 5 [26, 5, 6, 6, 6] 26
99998 14735752356892000110_110 14735752356892000110 110 26 1 [26] 26
99999 3067464349541363522_50 3067464349541363522 50 25 1 [25] 25
However, when loading my converted dataset (either using SVGTensor (larger pickle file) or just to_tensor() (smaller pickle file)), obj is empty:
self.df.iloc.obj Empty DataFrame
For reference, here's a raw svg:
<?xml version="1.0"?>
<!DOCTYPE svg PUBLIC '-//W3C//DTD SVG 1.0//EN'
'http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd'>
<svg xmlns:xlink="http://www.w3.org/1999/xlink" style="fill-opacity:1; color-rendering:auto; color-interpolation:auto; text-rendering:auto; stroke:black; stroke-linecap:square; stroke-miterlimit:10; shape-rendering:auto; stroke-opacity:1; fill:black; stroke-dasharray:none; font-weight:normal; stroke-width:1; font-family:'Dialog'; font-style:normal; stroke-linejoin:miter; font-size:12px; stroke-dashoffset:0; image-rendering:auto;" width="500" height="500" xmlns="http://www.w3.org/2000/svg"
><!--Generated by the Batik Graphics2D SVG Generator--><defs id="genericDefs"
/><g
><g style="stroke-linecap:round;"
><line y2="324.067" style="fill:none;" x1="236.4454" x2="109.2297" y1="204.986"
/></g
><g style="stroke-linecap:round;"
><line y2="422.2296" style="fill:none;" x1="109.2297" x2="263.5546" y1="324.067"
/><line y2="303.1487" style="fill:none;" x1="263.5546" x2="390.7703" y1="422.2296"
/><line y2="204.986" style="fill:none;" x1="390.7703" x2="236.4454" y1="303.1487"
/><line y2="77.7704" style="fill:none;" x1="109.2297" x2="236.4454" y1="196.8513"
/><line y2="175.9331" style="fill:none;" x1="236.4454" x2="390.7703" y1="77.7704"
/><line y2="295.014" style="fill:none;" x1="390.7703" x2="263.5546" y1="175.9331"
/><line y2="196.8513" style="fill:none;" x1="263.5546" x2="109.2297" y1="295.014"
/><line y2="422.2296" style="fill:none;" x1="390.7703" x2="263.5546" y1="303.1487"
/><line y2="295.014" style="fill:none;" x1="263.5546" x2="263.5546" y1="422.2296"
/><line y2="175.9331" style="fill:none;" x1="263.5546" x2="390.7703" y1="295.014"
/><line y2="303.1487" style="fill:none;" x1="390.7703" x2="390.7703" y1="175.9331"
/><line y2="204.986" style="fill:none;" x1="109.2297" x2="236.4454" y1="324.067"
/><line y2="77.7704" style="fill:none;" x1="236.4454" x2="236.4454" y1="204.986"
/><line y2="196.8513" style="fill:none;" x1="236.4454" x2="109.2297" y1="77.7704"
/><line y2="324.067" style="fill:none;" x1="109.2297" x2="109.2297" y1="196.8513"
/><line y2="175.9331" style="fill:none;" x1="390.7703" x2="390.7703" y1="303.1487"
/><line y2="77.7704" style="fill:none;" x1="390.7703" x2="236.4454" y1="175.9331"
/><line y2="204.986" style="fill:none;" x1="236.4454" x2="236.4454" y1="77.7704"
/><line y2="303.1487" style="fill:none;" x1="236.4454" x2="390.7703" y1="204.986"
/><line y2="196.8513" style="fill:none;" x1="109.2297" x2="109.2297" y1="324.067"
/><line y2="295.014" style="fill:none;" x1="109.2297" x2="263.5546" y1="196.8513"
/><line y2="422.2296" style="fill:none;" x1="263.5546" x2="263.5546" y1="295.014"
/><line y2="324.067" style="fill:none;" x1="263.5546" x2="109.2297" y1="422.2296"
/></g
></g
></svg
>
I've uploaded a few converted pkl as well (1, 2, 3)
Can you please advise on how I might get my own deepsvg dataset trained?
(You've mentioned a training notebook (or Google Colab notebooks): would you happen to still have that around to share?)
Thank you so much for your time,
George
Update
I've managed to get past the empty data frame issue by hackily commenting out this section in svgtensor_dataset.py:
# df = df[(df.nb_groups <= max_num_groups) & (df.max_len_group <= max_seq_len)]
# if max_total_len is not None:
# df = df[df.total_len <= max_total_len]
however this landed me right at this error:
Traceback (most recent call last):
File "c:\users\george.profenza\.pyenv\pyenv-win\versions\3.7.4-amd64\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\users\george.profenza\.pyenv\pyenv-win\versions\3.7.4-amd64\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\george.profenza\Downloads\gp\deepsvg\deepsvg\train.py", line 150, in <module>
train(cfg, model_name, experiment_name, log_dir=args.log_dir, debug=args.debug, resume=args.resume)
File "C:\Users\george.profenza\Downloads\gp\deepsvg\deepsvg\train.py", line 51, in train
cfg.set_train_vars(train_vars, dataloader)
File "C:\Users\george.profenza\Downloads\gp\deepsvg\configs\deepsvg\default_icons.py", line 77, in set_train_vars
for idx in random.sample(range(len(dataloader.dataset)), k=10)]
File "C:\Users\george.profenza\Downloads\gp\deepsvg\configs\deepsvg\default_icons.py", line 77, in <listcomp>
for idx in random.sample(range(len(dataloader.dataset)), k=10)]
File "C:\Users\george.profenza\Downloads\gp\deepsvg\deepsvg\svgtensor_dataset.py", line 177, in get
return self.get_data(t_sep, fillings, model_args=model_args, label=label)
File "C:\Users\george.profenza\Downloads\gp\deepsvg\deepsvg\svgtensor_dataset.py", line 208, in get_data
res[arg] = torch.stack([t.cmds() for t in t_list])
RuntimeError: stack expects each tensor to be equal size, but got [66] at entry 0 and [32] at entry 1
Suspecting it's related, but currently I don't fully understand how the data should be structured.
Any hints/tip on how I may train on a custom dataset are highly appreciated.
Thank you so much,
George
I had another shot and spotted the default model parameters that act as filters for the metadata frames.
However, I'm still stuck in the same get_data section.
I'm getting slightly different conditions hitting torch.stack errors, but it's pretty much the same area.
@pwichmann If you have a working version of deepSVG, I'd like to give that a go.
(Will DM)
Thank you so much for offering to help.
Update
I've managed to get past the empty data frame issue by hackily commenting out the dataframe filter in svgtensor_dataset.py, however this landed me right at this error: RuntimeError: stack expects each tensor to be equal size, but got [66] at entry 0 and [32] at entry 1
Any hints/tips on how I may train on a custom dataset are highly appreciated.
Thank you so much, George
Hi, I had the exact same problem as you. Do you have a solution now?
RuntimeError: stack expects each tensor to be equal size, but ...
This problem occurs because the number of commands in a path of your SVG file is greater than the limit. The limit is max_seq_len + 2 in deepsvg/model/config.py, where the +2 accounts for the SOS and EOS tokens.
So, the following code is used to select SVGs that meet the requirement:
df = df[(df.nb_groups <= max_num_groups) & (df.max_len_group <= max_seq_len)]
if max_total_len is not None:
    df = df[df.total_len <= max_total_len]
Besides, if you want to construct your own dataset, you have to run preprocess.py. In this file, an important operation is drop_z() in svg.canonicalize(), which removes the Z command from SVG files. This is because, in my experiments, SVG images remain the same after removing the Z command.
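To check a single sample against these limits before pickling, one can mirror the dataframe filter per sample. A small sketch (the default limit values below are illustrative; the real ones come from the model config):

```python
def passes_filter(len_groups, max_num_groups=8, max_seq_len=30, max_total_len=50):
    """Per-sample version of the dataframe filter above.
    len_groups is the list of command counts per path group; the limit
    values are illustrative defaults, not necessarily the config's."""
    nb_groups = len(len_groups)
    max_len_group = max(len_groups)
    total_len = sum(len_groups)
    return (nb_groups <= max_num_groups
            and max_len_group <= max_seq_len
            and total_len <= max_total_len)

# A single 22-command group passes; a 66-command path exceeds max_seq_len.
assert passes_filter([22])
assert not passes_filter([66])
```

With limits like these, the 66-command sample from the torch.stack traceback above would have been filtered out of the metadata frame instead of crashing the collate step.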
In this file, an important operation is drop_z() in svg.canonicalize(), which removes the Z command from SVG files. This is because, in my experiments, SVG images remain the same after removing the Z command.
I agree with the previous statement, but I don't understand the operation of dropping Z. The Z command moves the brush back to the beginning of the path so that the path is closed. It's important, and Z is one of the 7 encoded command types, so removing it seems to make the command-type encoding nonsensical.
Agree with previous statement, but don't understand the operation to drop Z. ...
Well, I agree that the Z command is important in SVG files, and in general we should not drop Z.
But drop_z() seems reasonable here because, usually, Z is the last command of a path and M is the first command of the next path. This means that if we delete Z, we can still draw the correct SVG, because the M command will move the cursor to the right position. Errors would only occur when other commands like C or L follow the Z that was deleted.
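The rule described above can be written as a small check (z_safe_to_drop is a hypothetical helper for illustration, not part of deepsvg):

```python
def z_safe_to_drop(commands):
    """Return True if every 'Z' in the command sequence is either the
    last command or immediately followed by an 'M', i.e. dropping it
    cannot change where the next subpath starts drawing."""
    for i, cmd in enumerate(commands):
        if cmd == "Z" and i + 1 < len(commands) and commands[i + 1] != "M":
            return False
    return True

# Z before an M, or at the very end, is safe to drop:
assert z_safe_to_drop(["M", "L", "L", "Z", "M", "L", "Z"])
# A drawing command right after Z would start from the wrong point:
assert not z_safe_to_drop(["M", "L", "Z", "L"])
```

Note that Z also draws the closing straight segment; the thread's observation that rendered images stay identical suggests the preprocessing makes closing segments explicit before Z is dropped.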
Hello, Alex. Great work and thank you for this library :) I have been playing around with it and, inspired by preprocess.py, modified it to a (much needed) simple script to batch convert SVGs to *.pkl tensors. ...
This doesn't generate a meta.csv, am I right? It's necessary when using the SVGDataloader included in the library.
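For reference, a minimal sketch of what writing such a meta.csv could look like (column names inferred from the dataframe dump earlier in this thread; the full schema, including font-specific columns like uni and binary_fp, should be checked against deepsvg's preprocess.py):

```python
import csv

def write_meta(entries, out_path="meta.csv"):
    """entries: list of (id, len_groups) pairs, where len_groups is the
    number of commands in each path group of that sample. The derived
    columns mirror those the dataset filter reads."""
    fieldnames = ["id", "total_len", "nb_groups", "len_groups", "max_len_group"]
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for sample_id, len_groups in entries:
            writer.writerow({
                "id": sample_id,
                "total_len": sum(len_groups),
                "nb_groups": len(len_groups),
                "len_groups": len_groups,
                "max_len_group": max(len_groups),
            })

# Hypothetical sample ids; one single-group and one two-group sample.
write_meta([("icon_0001", [22]), ("icon_0002", [18, 9])])
```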