TL;DR
- Research: I concretized the future direction while deepening my understanding of the lab's vision and achievements.
- Life: I built connections and enjoyed every day. Photo: Lab Dinner.
My lab's fundamental approach to problem-solving was stimulating, and it also raised some questions about my previous work environment.
- How valuable is it to engage in data analysis when the mechanisms for data accumulation are not yet established?
- Regarding data accumulation, isn't it inefficient to construct databases for each method of analysis?
- Wouldn't a more centralized data management system, accessible by analysts, be more fundamental?
- In an era where data analysis is increasingly possible through natural language, lowering the entry barrier, can analysts without domain knowledge still maintain their value? Are they even necessary?
- What should an organization look like to provide maximum value?
While refining my research theme, I want to contemplate these issues as well.
1. Chris Re Lab
The Chris Re Lab is a unique laboratory situated at the cutting edge of Stanford CS, maintaining a strong presence in both academic and industrial fields.
- Vision: Chris Re is one of the industry leaders pushing forward Data-Centric AI, alongside figures like Andrew Ng. He has a strong commitment to contributing to Open Source.
- Members: The team is diverse, including individuals with professional experience at companies like Google, Meta, and MS-Research, as well as those with degrees in Biology, Law, and more.
- Locations: The lab operates both on-campus (Gates) and off-campus (Factory), a venture capital firm where Chris Re is a Founder.
- Operations: There are mandatory in-person meetings for all members twice a week (Mondays and Wednesdays).
- Tools: The lab uses tools such as GitHub, Slack, and Google Sheets.
- Education 1: Chris Re teaches Foundation Models in CS324.
- Education 2: The lab manages the MLSys Seminar, which has 16,000 subscribers on YouTube.
- Startup 1: The lab has spawned several startups, notably Snorkel, whose main clients include companies like GAFAM. In Japan, Hitachi plays a role in promoting Snorkel's technology, with Auto Labeling and Data Curation as representative products. Snorkel lets analysts write simple heuristic functions that provide weak supervision, i.e., noisy hints for automatic data labeling. The final labels are then estimated by solving a graphical model over these hints.
- Startup 2: Chris Re is also a founder of Together, known for the RedPajama project in the open-source LLM space. Earlier, Alpaca instruction-tuned LLaMA on text-davinci outputs, which raised licensing issues. RedPajama aims to provide a fully open-source resource, including the datasets.
- Achievements: The lab consistently publishes at top conferences such as ICML, NeurIPS, and ICLR while focusing on highly practical research. A recent notable work is FlashAttention, a hardware-aware implementation of the attention mechanism that has been adopted by almost all AI libraries, including PyTorch and TensorFlow. Follow-up studies such as H3, Hyena, and M2 are also very promising. In essence, they aim to replace the $O(N^2)$ attention mechanism with a combination of global convolution and dense layers, achieving similar performance at $O(N \log N)$ computational cost by using the FFT. This may mean that "Attention Is All You Need" is not entirely accurate. Accelerating LLMs extends the feasible context length, which makes it possible to handle multi-modal inputs, which tend to have high token counts, within LLMs. This is a fundamental step toward the ultimate goal of AI research: the realization of AGI. Building on these foundations, some lab members are working on projects such as feeding DNA sequences into LLMs with HyenaDNA and reasoning over legal documents.
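To make the $O(N^2)$ versus $O(N \log N)$ point concrete, here is a minimal NumPy sketch. It is not the lab's implementation of H3, Hyena, or M2; it only checks that a global (sequence-length) causal convolution computed directly gives the same result as one computed through the FFT. The function names and the random test signal are my own illustrative choices.

```python
import numpy as np

def causal_conv_direct(u, k):
    """Direct O(N^2) causal convolution: y[t] = sum_{s<=t} k[s] * u[t-s]."""
    n = len(u)
    y = np.zeros(n)
    for t in range(n):
        for s in range(t + 1):
            y[t] += k[s] * u[t - s]
    return y

def causal_conv_fft(u, k):
    """Same convolution in O(N log N) via the FFT.

    Zero-padding to 2N keeps the circular convolution from wrapping around.
    """
    n = len(u)
    fft_size = 2 * n
    spectrum = np.fft.rfft(u, n=fft_size) * np.fft.rfft(k, n=fft_size)
    return np.fft.irfft(spectrum, n=fft_size)[:n]

rng = np.random.default_rng(0)
u = rng.standard_normal(256)  # input sequence (illustrative)
k = rng.standard_normal(256)  # a filter as long as the sequence ("global" convolution)

max_abs_diff = float(np.max(np.abs(causal_conv_direct(u, k) - causal_conv_fft(u, k))))
```

The two outputs agree up to floating-point error, while the FFT version avoids the nested loop over all pairs of positions.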
- Vision
  - Data-Centric AI
  - Open-Source will win
  - from GPT-X to GPT-You
- Interest and Recent Achievements
  - LLM
    - Longer Context-Size
      - FlashAttention, S4, H3, Hyena, M2
    - In-Context Learning
      - AMA
  - Data-Centric AI
    - Weak Supervision
      - Snorkel
    - Data Validation
      - Meerkat
  - Application
    - Bio
      - HyenaDNA
    - Law
      - LegalBench
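The weak supervision idea behind Snorkel can be sketched in a few lines of plain Python. This does not use the Snorkel library, and it replaces the graphical-model label estimation with a simple majority vote, so treat it only as an illustration of the workflow. The labeling functions and the inspection notes are hypothetical.

```python
from collections import Counter

ABSTAIN, OK, DEFECT = -1, 0, 1

# Hypothetical heuristic "labeling functions": each votes on a note or abstains.
def lf_mentions_scratch(note):
    return DEFECT if "scratch" in note.lower() else ABSTAIN

def lf_mentions_crack(note):
    return DEFECT if "crack" in note.lower() else ABSTAIN

def lf_says_passed(note):
    return OK if "passed" in note.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_mentions_scratch, lf_mentions_crack, lf_says_passed]

def majority_vote(note):
    """Combine noisy votes; abstain when nobody votes or when the top votes tie."""
    votes = [lf(note) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]

notes = [
    "Deep scratch near the edge",
    "Passed visual check",
    "Passed, but hairline crack and scratch found",
    "No comment",
]
weak_labels = [majority_vote(n) for n in notes]
```

A label model goes further than a majority vote by estimating how accurate and how correlated the labeling functions are, and weighting their votes accordingly.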
2. Research
# every week
Mon: Lab Meeting at Factory, Menlo Park
Wed: Lab Meeting at Gates, Stanford
Wed: ML Lunch at Gates, Stanford
# Networking w/ Research Discussion
10/20 🇯🇵Morio(Hitachi) on Campus
10/20 🇯🇵Kawaguchi(MUFG) on Campus
10/23 🇯🇵Hoshi(JR-East) at my-Home(BBQ🍖)
10/26 🇯🇵Hirose(ToyotaUSA) on Campus
10/26 🇯🇵Kobayashi(Bridgestone) on Campus
10/26 🇯🇵Mori(Toyota) on Campus
# Gates 282
Roommate: 🇰🇷Ed(Samsung, Visual Inspection AI)
Efforts This Month
- While attending meetings, I deepened my understanding of the Lab's vision and achievements.
- I exchanged contact information with 6 Japanese Visiting Scholars, who are also my peers in Computer Science. I also have regular conversations with my roommate, Ed.
- I worked on materializing and refining the vision for my First Project Idea, an LLM-based Visual Inspection Solution.
- For image embedding, CLIP is currently the mainstream choice, but there are higher-performing alternatives such as BLIP, and OpenAI recently released GPT-4V, so the choice is still open. For the Proof of Concept (PoC), however, I am thinking of adopting CLIP.
- I also considered how to input the visual inspection spec into the LLM: whether showing few-shot examples and relying on in-context learning would be sufficient, or whether fine-tuning is necessary. LoRA would make fine-tuning relatively easy, but this is Plan B. I would like to start the PoC with in-context learning, since the trend toward longer context lengths makes it an attractive and straightforward option.
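As a rough sketch of the in-context learning option, the inspection spec and a few labeled examples can simply be assembled into one prompt. Everything below is an assumption for illustration: the prompt layout, the function name, and the use of short text descriptions in place of image features (how CLIP embeddings would actually be passed to the LLM is still undecided).

```python
def build_few_shot_prompt(spec, examples, query):
    """Assemble an inspection spec, labeled examples, and one new case into a prompt."""
    lines = [
        "You are a visual inspection assistant.",
        "Inspection spec:",
        spec.strip(),
        "",
    ]
    for description, verdict in examples:
        lines += [f"Observation: {description}", f"Verdict: {verdict}", ""]
    # The final case is left open so the model completes the verdict.
    lines += [f"Observation: {query}", "Verdict:"]
    return "\n".join(lines)

# Hypothetical spec and examples, purely for illustration.
spec = "Reject any part with a scratch longer than 2 mm or any visible crack."
examples = [
    ("Surface is clean, no marks.", "PASS"),
    ("A 5 mm scratch runs across the top face.", "REJECT"),
]
prompt = build_few_shot_prompt(spec, examples, "A hairline crack is visible near the bolt hole.")
```

Changing the spec or adding examples only changes the prompt, with no retraining, which is why this route is easier to try first than fine-tuning.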
Future Direction
- For the LLM-based Visual Inspection, I want to keep an eye on the release of the GPT-4V API while continuing my research and technical investigations, and also start working on the Proof of Concept (PoC).
- In parallel, since the manufacturing industry, unlike the tech industry, places greater importance on extracting value from limited and precious data, I want to expand my ideas to the development of solutions using Data-Centric AI methodologies.
- Examples include building high-quality label sets through multi-labeler collaboration, detecting erroneous labels, and using data augmentation, all as tools for improving data and label quality.
- From a certain perspective, LLM-based Visual Inspection can also be considered a methodology of Data-Centric AI, as it utilizes general knowledge that has been pre-trained and infuses priors through prompting to extract maximum value from a limited dataset.
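As one example of what detecting erroneous labels could look like, the sketch below flags samples whose given label disagrees with a confident model prediction. It is a simplified heuristic in the spirit of confident learning, not a specific library's algorithm. The function name, the thresholding rule, and the toy probabilities are assumptions, and the code assumes every class appears at least once in the labels.

```python
import numpy as np

def flag_suspect_labels(labels, pred_probs):
    """Return indices whose given label conflicts with a confident prediction.

    labels:     (n,) integer class labels as annotated
    pred_probs: (n, n_classes) out-of-sample predicted probabilities
    """
    labels = np.asarray(labels)
    pred_probs = np.asarray(pred_probs, dtype=float)
    rows = np.arange(len(labels))
    n_classes = pred_probs.shape[1]

    # Per-class threshold: average confidence the model has in the given label.
    self_confidence = pred_probs[rows, labels]
    thresholds = np.array([self_confidence[labels == c].mean() for c in range(n_classes)])

    predicted = pred_probs.argmax(axis=1)
    confident = pred_probs[rows, predicted] >= thresholds[predicted]
    return np.flatnonzero((predicted != labels) & confident)

# Toy data: sample 2 is annotated as class 0, but the model strongly predicts class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
    [0.2, 0.8], [0.1, 0.9], [0.3, 0.7],
]
suspects = flag_suspect_labels(labels, pred_probs)
```

Flagged samples would then go back to human labelers for review rather than being relabeled automatically.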
3. Daily Life
# every week
Tue: English Debate Class
- 10/03 🇹🇼Stephanie (Love at First Sight)
- 10/10 🇨🇳Alan (Pessimism from Chinese History)
- 10/17 my turn🚀🚀 (Role of Prison) 🇦🇷Matthias (Octagonal Food Label)
- 10/24 🇫🇷Cremontine (Climate Change) 🇯🇵Koganei ("Old Enough!")
- 10/31 🇨🇳Rongkun (The Purpose of Life) 🇺🇸Mary (Pros/Cons of Halloween)
10/08 Test Drive w/ 🇮🇳Monoj
10/13 Buy Car from 🇮🇳Monoj
10/14 Video Call w/ 🇨🇦Clyde
10/14 Game Night on Campus
10/18 Lab Dinner at Palo Alto
10/19 Video Call w/ 🇧🇼Mika
10/21 Halloween Decoration Tour w/ 🇺🇸Liz
10/23 Driving Test at Santa Clara DMV w/ 🇯🇵Nakanishi(OFS)
10/26 Modern Music Concert on Campus
10/27 Driving Test at Santa Clara DMV
10/27 Global Chef Dinner on Campus
10/28 Halloweekend on Campus
10/29 Filoli Garden w/ 🇺🇸Liz
10/29 Jazz Concert on Campus
English Debate Class
I enrolled in an English class for International Students, which is held for free on campus. The class I am participating in is the Debate class on Tuesday afternoons, where students take turns acting as facilitators, providing a Debate Topic along with some background knowledge. The students come from various countries around the world, and this class provides an opportunity not only to practice English conversation but also to deepen understanding of different cultures and build diverse friendships. It is a wonderful opportunity and a highlight of my week.
On October 17th, it was my turn to facilitate. I introduced the example of Norway's "freest prison in the world" and set the theme as the role prisons play (punishment or rehabilitation). We exchanged opinions while sharing examples from different countries, and the class was very lively and engaging.
Game Night
An irresistible event for board game enthusiasts, held on campus approximately once per quarter.
Lab Dinner at Palo Alto
I attended a dinner where we engaged in lively discussions about tech news, such as AI regulations, and shared our thoughts on what we would do if we were billionaires. We also exchanged travel recommendations and learned about each other’s experiences living outside of California.
Title Transfer at DMV
In the United States, prices are soaring due to inflation and supply shortages, so used cars that are a few years old sell for about the same price as new cars (which are not available without a wait). While it would have been easier to rely on a used car dealership like Gulliver USA, I learned that I could save on Sales Tax (~10%) through a private sale, so I decided to give it a try. After navigating scams and being disappointed by last-minute cancellations, I eventually found an affordable car on Facebook Marketplace. Through this process, I gained experience in drafting my own contract, negotiating the price using KBB and CarFax, and completing the title transfer at the DMV. In the end, I was able to acquire a car (2015 Nissan Altima with 59k miles).
Halloween Decoration Tour
Liz, whom I met at Stanford, took me on a Halloween Decoration Tour. Although it was scarier than the Christmas illuminations I've seen in Japan, the neighborhood was filled with luxurious and animated objects, transforming an ordinary residential area into something akin to a theme park. At the same time, I could really sense the wealth of this region (Menlo Park).
Driving Test
I took the exam at the Santa Clara DMV. I was required to be accompanied by someone with a valid license, and Mr. Nakanishi from OFS kindly agreed to come with me. Fortunately, I had a friendly examiner and passed with a perfect score.
Modern Music Concert
I attended a contemporary music concert by Matthew Goodheart held at The Knoll. It was a bit too complex for me, but I felt I could grasp some of the effort being put into expanding the possibilities of music.
Halloweekend
I participated in the Halloween event held at the Mausoleum. The Haunted House was impressive and well done.
Filoli Garden
I was guided through Filoli Garden by Liz. I learned that "Filoli" stands for "Fight for a just cause. Love your Fellow Man. Live a Good Life." It was an incredibly well-maintained and beautiful garden.
Jazz Concert
I attended a Jazz Concert held at Memorial Church. The atmosphere was wonderful, enhanced by the setting of the venue.