Comments (1)
Can descriptions be added? Otherwise, this collection of a bunch of URLs (albeit alphabetized) has little use. Maybe a script that goes through them and retrieves document.title would do?
I agree. The list is fine as it is for technologies and for companies, but for the individuals category at least a description should be added. I do not know 100% of them, and checking each site one by one is a nightmare. I am not sure whether it is legal, but something like this should do the job.
import requests
from bs4 import BeautifulSoup
from transformers import pipeline

# Initialize a summarization pipeline (uses the library's default model)
summarizer = pipeline("summarization")


def crawl_and_summarize(url):
    # Crawl the webpage; time out on slow hosts and raise on HTTP errors
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, 'html.parser')
    # Extract main content, could be more specific based on site structure
    text = soup.get_text(separator=' ', strip=True)
    # Summarize the text; truncation=True keeps long pages within the model's input limit
    summary = summarizer(text, max_length=130, min_length=30,
                         do_sample=False, truncation=True)
    return summary[0]['summary_text']


def read_urls_from_file(file_path):
    # Return the non-empty lines of the file with surrounding whitespace removed
    with open(file_path, 'r') as file:
        return [line.strip() for line in file if line.strip()]


# File containing URLs, one per line
file_path = 'urls.txt'

# Read URLs from the file
urls = read_urls_from_file(file_path)

# Crawl and summarize each URL
for url in urls:
    try:
        summary = crawl_and_summarize(url)
        print(f"URL: {url}\nSummary: {summary}\n")
    except Exception as e:
        # Report the failure and continue with the next URL
        print(f"Error processing {url}: {e}")
In this script:
- URLs are read from a file named 'urls.txt', but you can change the file_path variable to the actual path of your file.
- The script reads each line from the file, strips any leading/trailing whitespace, and ignores empty lines.
- Error handling is added to continue processing even if an error occurs with a specific URL.
Remember to place the 'urls.txt' file in the same directory as your script, or provide the absolute path to the file. Also, ensure that each URL in the file is on a new line.
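For the lighter-weight idea raised at the top of the thread (retrieving only each page's title instead of summarizing it), a minimal sketch could look like the following. It is an illustration, not part of the script above: the names TitleParser, extract_title, and fetch_title are hypothetical, the User-Agent string and the 64 KiB read limit are arbitrary assumptions, and it uses only the Python standard library rather than requests and BeautifulSoup.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is of interest
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Join the collected pieces and collapse runs of whitespace
        return " ".join("".join(self._parts).split())


def extract_title(html):
    """Return the page title from an HTML string, or None if there is none."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title or None


def fetch_title(url, timeout=10):
    """Download the start of a page and return its title (assumes UTF-8-compatible text)."""
    request = Request(url, headers={"User-Agent": "title-fetcher-sketch"})
    with urlopen(request, timeout=timeout) as response:
        # Titles sit near the top of a page, so a bounded read is usually enough
        html = response.read(65536).decode("utf-8", errors="replace")
    return extract_title(html)
```

Calling fetch_title(url) for each entry returned by read_urls_from_file would give a one-line description per URL without needing a summarization model, at the cost of relying on each site having a meaningful title.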
from engineering-blogs.