Karmem - a binary serialization format 10x faster than Google Flatbuffers (github.com/inkeliz)

Introducing Karmem, a binary serialization format that claims to be 10 times faster than Google Flatbuffers.

Karmem's GitHub page introduces it as follows.
Karmem is a fast binary serialization format. The priority of Karmem is to be easy to use while being as fast as possible. It's optimized to get the maximum performance out of Golang and TinyGo and is efficient for repeatable reads, reading different content of the same type. Karmem has been shown to be ten times faster than Google Flatbuffers, with the additional overhead of bounds-checking included.

⚠️ Karmem is still under development, and the API is not stable. However, the serialization format itself is unlikely to change and should remain backward compatible with older versions.

In short, you can think of Karmem as a binary serialization format that, by its own benchmarks, is roughly 10 times faster than Google Flatbuffers.

For reference, here is how to use it.

Usage

This is a small example of how to use Karmem.

Schema

karmem app @golang.package(`app`);  
  
enum SocialNetwork uint8 { Unknown; Facebook; Instagram; Twitter; TikTok; }  
  
struct ProfileData table {  
    Network SocialNetwork;  
    Username []char;  
    ID uint64;  
}  
  
struct Profile inline {  
    Data ProfileData;  
}  
  
struct AccountData table {  
    ID uint64;  
    Email []char;  
    Profiles []Profile;  
}

Generate the code using go run karmem.org/cmd/karmem build --golang -o "km" app.km.
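
For orientation, the native Go types that the encoding example relies on would look roughly like the sketch below. It is a hand-written approximation inferred from the schema and from the field names used in that example, not the actual output of the generator; the numeric enum values are assumed to follow schema order, and the real generated file also contains the read/write methods and Viewer types.

```go
package main

import "fmt"

// SocialNetwork mirrors `enum SocialNetwork uint8` from the schema.
type SocialNetwork uint8

// Constant names follow the app.SocialNetworkFacebook style used in the
// encoding example; the values are ASSUMED to follow schema order.
const (
	SocialNetworkUnknown SocialNetwork = iota
	SocialNetworkFacebook
	SocialNetworkInstagram
	SocialNetworkTwitter
	SocialNetworkTikTok
)

// ProfileData mirrors `struct ProfileData table`.
type ProfileData struct {
	Network  SocialNetwork
	Username string // `[]char` in the schema
	ID       uint64
}

// Profile mirrors `struct Profile inline`.
type Profile struct {
	Data ProfileData
}

// AccountData mirrors `struct AccountData table`.
type AccountData struct {
	ID       uint64
	Email    string // `[]char` in the schema
	Profiles []Profile
}

func main() {
	acc := AccountData{
		ID:    42,
		Email: "example@email.com",
		Profiles: []Profile{
			{Data: ProfileData{Network: SocialNetworkInstagram, Username: "inkeliz", ID: 312}},
		},
	}
	fmt.Println(acc.Email, acc.Profiles[0].Data.Username)
}
```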

Encoding

To encode, you should create a native struct and then encode it.

var writerPool = sync.Pool{New: func() any { return karmem.NewWriter(1024) }}

func main() {
	var writer = writerPool.Get().(*karmem.Writer)

	content := app.AccountData{
		ID:    42,
		Email: "example@email.com",
		Profiles: []app.Profile{
			{Data: app.ProfileData{
				Network:  app.SocialNetworkFacebook,
				Username: "inkeliz",
				ID:       123,
			}},
			{Data: app.ProfileData{
				Network:  app.SocialNetworkFacebook,
				Username: "karmem",
				ID:       231,
			}},
			{Data: app.ProfileData{
				Network:  app.SocialNetworkInstagram,
				Username: "inkeliz",
				ID:       312,
			}},
		},
	}

	if _, err := content.WriteAsRoot(writer); err != nil {
		panic(err)
	}

	encoded := writer.Bytes()
	_ = encoded // Do something with encoded data

	writer.Reset()
	writerPool.Put(writer)
}
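
The Get / Reset / Put sequence around the writer is the usual sync.Pool reuse pattern. The sketch below shows the same pattern in isolation, using only the standard library: *bytes.Buffer stands in for *karmem.Writer, and the encode helper is hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool stands in for writerPool; *bytes.Buffer plays the role of the writer.
var bufPool = sync.Pool{New: func() any { return bytes.NewBuffer(make([]byte, 0, 1024)) }}

// encode is a hypothetical helper: it writes msg into a pooled buffer and
// returns a copy of the bytes, because the buffer's memory is reused after Put.
func encode(msg string) []byte {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset() // drop the content but keep the allocated capacity
		bufPool.Put(buf)
	}()

	buf.WriteString(msg)
	return append([]byte(nil), buf.Bytes()...)
}

func main() {
	fmt.Printf("%s\n", encode("first"))
	fmt.Printf("%s\n", encode("second"))
}
```

If, as is common for pooled buffers, the slice returned by the writer aliases its internal memory, the encoded data should be consumed or copied before Reset and Put are called.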

Reading

Instead of decoding it to another struct, you can read some fields directly, without any additional decoding. In this example, we only need the username of each profile.

func decodes(encoded []byte) {
	reader := karmem.NewReader(encoded)
	account := app.NewAccountDataViewer(reader, 0)

	profiles := account.Profiles(reader)
	for i := range profiles {
		fmt.Println(string(profiles[i].Data(reader).Username(reader)))
	}
}

Note that we use NewAccountDataViewer: a Viewer is only a view over the encoded bytes and doesn't copy the underlying data.
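
To make the "view without copying" idea concrete, here is a minimal sketch of a viewer type over a byte slice. The layout (8-byte little-endian ID, 1-byte name length, then the name bytes) and the profileView type are made up for illustration; this is not Karmem's wire format or its generated Viewer.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// profileView is a hypothetical viewer over an encoded record with this
// invented layout: 8-byte little-endian ID, 1-byte name length, name bytes.
type profileView struct{ buf []byte }

// ID decodes the fixed-size field directly from the buffer.
func (v profileView) ID() uint64 {
	if len(v.buf) < 8 {
		return 0
	}
	return binary.LittleEndian.Uint64(v.buf[:8])
}

// Username returns a sub-slice of the original buffer: no copy is made.
func (v profileView) Username() []byte {
	if len(v.buf) < 9 {
		return nil
	}
	n := int(v.buf[8])
	if 9+n > len(v.buf) {
		return nil
	}
	return v.buf[9 : 9+n]
}

func main() {
	encoded := make([]byte, 8, 16)
	binary.LittleEndian.PutUint64(encoded, 123)
	encoded = append(encoded, 7)
	encoded = append(encoded, "inkeliz"...)

	v := profileView{buf: encoded}
	fmt.Println(v.ID(), string(v.Username()))

	// Because the view aliases the buffer, a change to the buffer is
	// visible through the view.
	encoded[9] = 'I'
	fmt.Println(string(v.Username()))
}
```

The practical consequence is that a view is only valid while the underlying bytes stay alive and unmodified.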

Decoding

You can also decode it to an existing struct. In some cases, it's better to re-use the same struct for multiple reads.

var accountPool = sync.Pool{New: func() any { return new(app.AccountData) }}

func decodes(encoded []byte) {
	account := accountPool.Get().(*app.AccountData)
	account.ReadAsRoot(karmem.NewReader(encoded))

	profiles := account.Profiles
	for i := range profiles {
		fmt.Println(profiles[i].Data.Username)
	}

	accountPool.Put(account)
}
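
One reason re-using a struct can pay off is that its slices keep their backing arrays between reads. The sketch below illustrates that effect with a hypothetical decodeInto function and a simplified account type; whether Karmem's generated ReadAsRoot reuses memory in exactly this way is an assumption here, not something the README excerpt states.

```go
package main

import "fmt"

// account is a simplified stand-in for a generated struct.
type account struct {
	ID       uint64
	Profiles []string
}

// decodeInto is a hypothetical stand-in for a ReadAsRoot-style method: it
// overwrites dst in place and keeps the backing array of dst.Profiles.
func decodeInto(dst *account, id uint64, names []string) {
	dst.ID = id
	dst.Profiles = dst.Profiles[:0] // length 0, capacity preserved
	dst.Profiles = append(dst.Profiles, names...)
}

func main() {
	var acc account
	decodeInto(&acc, 1, []string{"inkeliz", "karmem", "inkeliz"})
	capAfterFirst := cap(acc.Profiles)

	// The second read fits into the existing capacity, so the append
	// does not need to allocate a new backing array.
	decodeInto(&acc, 2, []string{"karmem"})
	fmt.Println(acc.ID, acc.Profiles, cap(acc.Profiles) == capAfterFirst)
}
```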


Below are the benchmarks published on the GitHub page, which are worth a look for reference.

Benchmark

Flatbuffers vs Karmem

Using similar schema with Flatbuffers and Karmem. Karmem is almost 10 times faster than Google Flatbuffers.

Native (MacOS/ARM64 - M1):

name               flatbuffers time/op  karmem time/op  delta
EncodeObjectAPI-8    1.46ms ± 0%    0.32ms ± 0%   -78.22%  (p=0.008 n=5+5)
DecodeObjectAPI-8    2.16ms ± 0%    0.15ms ± 0%   -93.14%  (p=0.008 n=5+5)
DecodeSumVec3-8       887µs ± 1%      99µs ± 1%   -88.86%  (p=0.008 n=5+5)

name               flatbuffers alloc/op  karmem alloc/op  delta
EncodeObjectAPI-8    12.1kB ± 0%     0.0kB       -100.00%  (p=0.008 n=5+5)
DecodeObjectAPI-8    2.74MB ± 0%    0.03MB ± 0%   -98.83%  (p=0.008 n=5+5)
DecodeSumVec3-8       0.00B          0.00B           ~     (all equal)    

name               flatbuffers allocs/op  karmem allocs/op  delta
EncodeObjectAPI-8     1.00k ± 0%     0.00k       -100.00%  (p=0.008 n=5+5)
DecodeObjectAPI-8      108k ± 0%        1k ± 0%   -99.07%  (p=0.008 n=5+5)
DecodeSumVec3-8        0.00           0.00           ~     (all equal)

WebAssembly on Wazero (MacOS/ARM64 - M1):

name               flatbuffers time/op  karmem time/op  delta
EncodeObjectAPI-8    10.1ms ± 0%     2.5ms ± 0%  -75.27%  (p=0.016 n=4+5)
DecodeObjectAPI-8    31.1ms ± 0%     1.2ms ± 0%  -96.18%  (p=0.008 n=5+5)
DecodeSumVec3-8      4.44ms ± 0%    0.47ms ± 0%  -89.41%  (p=0.008 n=5+5)

name               flatbuffers alloc/op  karmem alloc/op  delta
EncodeObjectAPI-8    3.02kB ± 0%    3.02kB ± 0%     ~     (all equal)
DecodeObjectAPI-8    2.16MB ± 0%    0.01MB ± 0%  -99.45%  (p=0.008 n=5+5)
DecodeSumVec3-8      1.25kB ± 0%    1.25kB ± 0%     ~     (all equal)

name               flatbuffers allocs/op  karmem allocs/op  delta
EncodeObjectAPI-8      4.00 ± 0%      4.00 ± 0%     ~     (all equal)
DecodeObjectAPI-8      5.00 ± 0%      5.00 ± 0%     ~     (all equal)
DecodeSumVec3-8        5.00 ± 0%      5.00 ± 0%     ~     (all equal)
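
The delta column is a percentage change, which is easy to misread as a speedup factor. As a small worked example, the sketch below converts the native time/op deltas from the table above into old/new ratios; the speedup helper is introduced here for illustration only.

```go
package main

import "fmt"

// speedup converts a benchstat-style delta in percent (negative means the
// new measurement is lower) into an old/new ratio.
func speedup(deltaPercent float64) float64 {
	return 1 / (1 + deltaPercent/100)
}

func main() {
	// Native (MacOS/ARM64 - M1) time/op deltas, copied from the table.
	deltas := []struct {
		name  string
		delta float64
	}{
		{"EncodeObjectAPI", -78.22},
		{"DecodeObjectAPI", -93.14},
		{"DecodeSumVec3", -88.86},
	}
	for _, d := range deltas {
		fmt.Printf("%-16s %.1fx\n", d.name, speedup(d.delta))
	}
}
```

On these numbers the ratios come out at about 4.6x for encoding, 14.6x for object decoding, and 9.0x for DecodeSumVec3, so "almost 10 times faster" describes the read paths more closely than the encoding path.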

Raw-Struct vs Karmem

The performance is nearly the same when comparing reading non-serialized data from a native struct with reading it from Karmem-serialized data.

Native (MacOS/ARM64 - M1):

name             old time/op    new time/op    delta
DecodeSumVec3-8    93.7µs ± 0%    98.8µs ± 1%  +5.38%  (p=0.008 n=5+5)

name             old alloc/op   new alloc/op   delta
DecodeSumVec3-8     0.00B          0.00B         ~     (all equal)

name             old allocs/op  new allocs/op  delta
DecodeSumVec3-8      0.00           0.00         ~     (all equal)

Karmem vs Karmem

This is a comparison across all supported languages.

WebAssembly on Wazero (MacOS/ARM64 - M1):

name \ time/op     result/wasi-go-km.out  result/wasi-as-km.out  result/wasi-zig-km.out  result/wasi-c-km.out  result/wasi-swift-km.out
DecodeSumVec3-8               470µs ± 0%             932µs ± 0%              231µs ± 0%            230µs ± 0%              97822µs ± 5%
DecodeObjectAPI-8            1.19ms ± 0%            3.70ms ± 0%             0.62ms ± 0%           0.56ms ± 0%              74.72ms ± 4%
EncodeObjectAPI-8            2.52ms ± 0%            2.98ms ± 2%             0.71ms ± 0%           0.67ms ± 0%              42.45ms ± 7%

name \ alloc/op    result/wasi-go-km.out  result/wasi-as-km.out  result/wasi-zig-km.out  result/wasi-c-km.out  result/wasi-swift-km.out
DecodeSumVec3-8              1.25kB ± 0%           12.72kB ± 0%             1.25kB ± 0%           1.25kB ± 0%               2.99kB ± 0%
DecodeObjectAPI-8            11.9kB ± 1%            74.2kB ± 0%            164.3kB ± 0%            1.2kB ± 0%              291.7kB ± 3%
EncodeObjectAPI-8            3.02kB ± 0%           38.38kB ± 0%             1.23kB ± 0%           1.23kB ± 0%               2.98kB ± 0%

name \ allocs/op   result/wasi-go-km.out  result/wasi-as-km.out  result/wasi-zig-km.out  result/wasi-c-km.out  result/wasi-swift-km.out
DecodeSumVec3-8                5.00 ± 0%              5.00 ± 0%               5.00 ± 0%             5.00 ± 0%                35.00 ± 0%
DecodeObjectAPI-8              5.00 ± 0%              4.00 ± 0%               4.00 ± 0%             4.00 ± 0%                35.00 ± 0%
EncodeObjectAPI-8              4.00 ± 0%              3.00 ± 0%               3.00 ± 0%             3.00 ± 0%                33.00 ± 0%

 

 

The supported languages are as shown above: the benchmark columns cover Go, AssemblyScript, Zig, C, and Swift.

To summarize, Karmem's main characteristics are as follows.

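  • A binary serialization format built to make data transfer between a WebAssembly host and guest fast and easy
    → "event-command pattern"
    → Encode once, then share the same content with multiple guests regardless of language, which is efficient
  • Optimized for TinyGo and WASM
  • Efficient for repeatedly reading different content of the same type
  • Provides an Object API, yet is still fast
  • Comparison
    • Witx is too complex and defines functions as well as data structures
    • Flatbuffers is not as fast as desired and has no bounds checking
    • Cap'n'Proto is good, but has no Zig or AssemblyScript implementation, and its API is also difficult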
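
Bounds checking comes up several times above, so here is a minimal sketch of what it means when reading from an encoded buffer: an offset and length taken from the data are validated before slicing. The readBytes helper is hypothetical and is not part of Karmem's API.

```go
package main

import "fmt"

// readBytes returns buf[off:off+n] only if the whole range lies inside buf.
// The arithmetic is done in uint64 so that off+n cannot wrap around.
func readBytes(buf []byte, off, n uint32) ([]byte, bool) {
	end := uint64(off) + uint64(n)
	if end > uint64(len(buf)) {
		return nil, false
	}
	return buf[off:end], true
}

func main() {
	buf := []byte("karmem")

	if b, ok := readBytes(buf, 0, 3); ok {
		fmt.Println(string(b))
	}

	// A corrupted or malicious length is rejected instead of causing a panic.
	_, ok := readBytes(buf, 4, 100)
	fmt.Println(ok)
}
```

The check costs a comparison on every read, which is the "additional overhead of bounds-checking" that the README says is already included in its benchmark results.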

 

For more details, see the Karmem GitHub page (github.com/inkeliz).

That's all for today's post.
Thank you, as always, for reading.
