TL;DR
- Research: I have concluded that it is currently difficult to build a visual inspection AI with sufficient accuracy using only zero-shot LLMs. I am considering applying Data-Centric methods such as auto-labeling and mislabel detection, while keeping an eye on new model releases.
- Life: I held a farewell party for a friend returning home in March. To stay physically and mentally healthy, I played basketball with someone I met at the gym and went hiking (photo: Stanford Dish).
1. Research
I had a productive discussion with the AI startup Snorkel, and we were able to refine our ideas for implementing Data-Centric features for VIHub (tentative name), resulting in satisfactory progress.
03/12 📝 Meeting w/ Snorkel CV Team
03/21 🍽️ CS Visiting Scholars Lunch
Progress
Chat with Snorkel Team
- I had a meeting with the CV Team where I introduced our research and they demonstrated their SaaS.
- Their core technology, Weak Supervision, is particularly useful for NLP tasks and structured datasets, and they are expanding into consulting, with BigTech companies as their main clients. Users write simple Labeling Functions, and the Snorkel library uses a graphical model to combine them and perform auto-labeling. For example, Gmail has used Snorkel in developing its spam detection AI.
- Their image-based services still seem to be at an early stage, but the examples of applying Weak Supervision in practice were insightful.
- The Snorkel library is available on GitHub, and we will consider how it can be utilized in our future research.
- There are no immediate plans for joint research or development, but I will continue to exchange information.
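As a rough illustration of the Labeling Function idea described above, here is a minimal pure-Python sketch. It does not use the Snorkel library: the three labeling functions, the label constants, and `auto_label` are hypothetical names of my own, and a simple majority vote stands in for the graphical model that Snorkel uses to combine the votes.

```python
from collections import Counter

# Label constants (hypothetical; chosen for this sketch only).
ABSTAIN, HAM, SPAM = -1, 0, 1


def lf_contains_link(text):
    """Vote SPAM if the message contains a URL, otherwise abstain."""
    return SPAM if "http://" in text or "https://" in text else ABSTAIN


def lf_mentions_prize(text):
    """Vote SPAM if the message mentions a prize, otherwise abstain."""
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_message(text):
    """Vote HAM for messages of five words or fewer, otherwise abstain."""
    return HAM if len(text.split()) <= 5 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_message]


def auto_label(text):
    """Combine the labeling-function votes by majority vote.

    Majority voting is a simplification used here in place of a learned
    label model. Ties and all-abstain cases return ABSTAIN.
    """
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # the labeling functions disagree evenly
    return ranked[0][0]
```

Each labeling function is cheap to write and allowed to be noisy or to abstain; the value comes from combining many of them. For image data, the same pattern would need labeling functions that operate on image metadata or model outputs rather than raw text.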
PoC Development
- Implemented a monitoring feature for collected images: when an image is stored in Image Storage, an embedding is generated and linked to the image in a Vector Database.
- Technology selection:
  - Multimodal Vector Embedding: CLIP
  - Similarity Metrics: Cosine Distance
  - Vector Database: Pinecone
- The Python SDK was also upgraded.
- Planning to release the following features built on this:
  - Similarity Search: Display images whose embeddings are similar to that of a given image.
  - Semantic Search: Display images whose embeddings are close to that of a Text Query.
  - Duplication Detection: Detect images with a Cosine Distance of 0 (i.e., a Cosine Similarity of 1) and delete the duplicates.
  - Mislabel Detection: Detect images that are closely embedded but labeled differently.
  - Anomaly Detection: Detect problems such as shooting defects, changes in shooting conditions, and other anomalies by flagging images whose embeddings lie far from those of typical images.
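A minimal sketch of how Similarity Search and Duplication Detection could work on top of the embeddings. This is an illustration under assumptions, not the PoC code: small hand-written vectors stand in for CLIP embeddings, a plain dictionary stands in for the Vector Database, and the function names and the `threshold` value are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def similarity_search(query, index, top_k=3):
    """Return the ids of the top_k embeddings closest to the query vector."""
    ranked = sorted(index, key=lambda image_id: cosine_distance(query, index[image_id]))
    return ranked[:top_k]


def find_duplicates(index, threshold=1e-6):
    """Return id pairs whose cosine distance is (numerically) zero.

    The threshold is an assumed tolerance for floating-point error.
    """
    ids = sorted(index)
    pairs = []
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            if cosine_distance(index[first], index[second]) <= threshold:
                pairs.append((first, second))
    return pairs


# Toy index: image id -> embedding (stand-ins for CLIP vectors).
index = {
    "img_a.png": [1.0, 0.0, 0.0],
    "img_b.png": [2.0, 0.0, 0.0],  # same direction as img_a
    "img_c.png": [0.0, 1.0, 0.0],
    "img_d.png": [0.9, 0.1, 0.0],  # close to img_a but not identical
}

print(similarity_search([1.0, 0.0, 0.0], index, top_k=2))
print(find_duplicates(index))
```

Because CLIP embeds text and images into a shared space, Semantic Search would follow the same ranking step with the embedding of a Text Query as `query`. Cosine distance ignores vector length, which is why `img_a.png` and `img_b.png` count as duplicates here; a real Vector Database would replace the brute-force loops with an indexed query.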
https://image-search.msrks.dev
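The Mislabel Detection and Anomaly Detection ideas in the feature list above can be sketched in the same way. This is one possible approach under assumptions, not the planned implementation: mislabels are flagged when an image's label disagrees with the majority label of its `k` nearest neighbours, and anomalies are flagged when an embedding is far from the centroid of all embeddings. The function names, the toy 2-D vectors, `k`, and `max_distance` are all hypothetical.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def suspect_mislabels(embeddings, labels, k=3):
    """Flag images whose label differs from the majority of their k nearest neighbours."""
    suspects = []
    for image_id, vector in embeddings.items():
        neighbours = sorted(
            (other for other in embeddings if other != image_id),
            key=lambda other: cosine_distance(vector, embeddings[other]),
        )[:k]
        majority_label, _ = Counter(labels[n] for n in neighbours).most_common(1)[0]
        if majority_label != labels[image_id]:
            suspects.append((image_id, majority_label))  # (image, suggested label)
    return suspects


def find_anomalies(embeddings, max_distance=0.5):
    """Flag images whose embedding is far from the centroid of all embeddings."""
    vectors = list(embeddings.values())
    dims = len(vectors[0])
    centroid = [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]
    return [
        image_id
        for image_id, vector in embeddings.items()
        if cosine_distance(vector, centroid) > max_distance
    ]


# Two tight clusters; "a4" sits in the "ok" cluster but carries the label "ng".
embeddings = {
    "a1": [1.00, 0.00], "a2": [0.98, 0.05], "a3": [0.99, -0.03], "a4": [0.97, 0.02],
    "b1": [0.00, 1.00], "b2": [0.05, 0.98], "b3": [-0.03, 0.99], "b4": [0.02, 0.97],
}
labels = {"a1": "ok", "a2": "ok", "a3": "ok", "a4": "ng",
          "b1": "ng", "b2": "ng", "b3": "ng", "b4": "ng"}

# Four typical shots and one that points in a very different direction.
shots = {
    "shot_1": [1.00, 0.00], "shot_2": [0.98, 0.05], "shot_3": [0.99, -0.03],
    "shot_4": [0.97, 0.02], "shot_5": [-0.20, 1.00],
}

print(suspect_mislabels(embeddings, labels, k=3))
print(find_anomalies(shots))
```

Both checks only produce candidates for human review. A neighbour disagreement can also mean the embedding fails to separate the classes, and a single global centroid is a crude notion of "typical" when the data contains several distinct product types.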
Literature Research, etc.
I believe the following Visual Foundation Models are well-suited for a Data-Centric approach and would like to incorporate them:
- Grounding DINO: Zero-shot Object Detection
- SAM (Segment Anything Model): Zero-shot promptable Segmentation
Future Plan
- Continue implementing the Data-Centric AI features mentioned above.
2. Life
03/05 ✈️ Goodbye 🇨🇳Alan @SanJoseAirport
03/08 🍽️ Global Chef
03/10 🥾 Hiking @StanfordDish
03/16 🍽️ Farewell Party @🇰🇷Ed
03/24 ☕️ Tea Party @🇺🇸Tom&Steffi
03/28 Meeting 🇨🇳Valerie @SanMateoCollege
03/28 🥾 Hiking @MuirWoodsNationalMonument
Laoshi Village Website Project
I went to the airport to see off my friend Alan, whom I met at a Stanford English conversation class; he is temporarily returning to his home country in March. He is the mayor of a village on Hainan Island in China, and he has asked me to build a website to promote tourism, which I have taken on as a private project. We will keep in touch remotely until he returns here in September.
Despite struggling with China's Great Firewall, I have made some progress on the technology selection and design proposals, and the project is starting to take shape. My latest challenge is authentication: I can't use major social authentication services like Google, so I applied for developer access to WeChat (Tencent) and TikTok, but both applications were rejected.
Farewell Party
We held a farewell party for Ed, my roommate at the Gates Building, who is returning to his home country in March. He mentioned that he would continue to provide advice on research and career matters in the future, and I expressed my gratitude to him.
Hiking @ Muir Woods National Monument
I went hiking in Muir Woods and was able to refresh myself in nature.